
[jira] [Created] (ARROW-2592) [Python] AssertionError in to_pandas()

Dima Ryazanov created ARROW-2592:

             Summary: [Python] AssertionError in to_pandas()
                 Key: ARROW-2592
                 URL: https://issues.apache.org/jira/browse/ARROW-2592
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.9.0, 0.8.0
            Reporter: Dima Ryazanov

pyarrow 0.8 and 0.9 raise an AssertionError in to_pandas() for one of my datasets (created using an older version of pyarrow). Repro steps:

{{In [1]: from pyarrow.parquet import ParquetDataset}}

{{In [2]: d = ParquetDataset(['bug.parq'])}}

{{In [3]: t = d.read()}}

{{In [4]: t.to_pandas()}}
{{AssertionError                            Traceback (most recent call last)}}
{{<ipython-input-4-d17c9e2818f1> in <module>()}}
{{----> 1 t.to_pandas()}}

{{table.pxi in pyarrow.lib.Table.to_pandas()}}

{{~/envs/cli3/lib/python3.6/site-packages/pyarrow/pandas_compat.py in table_to_blockmanager(options, table, memory_pool, nthreads, categories)}}
{{    529     # There must be the same number of field names and physical names}}
{{    530     # (fields in the arrow Table)}}
{{--> 531     assert len(logical_index_names) == len(index_columns_set)}}
{{    532 }}
{{    533     # It can never be the case in a released version of pyarrow that}}

{{AssertionError: }}


Here's the file: [https://www.dropbox.com/s/oja3khjsc5tycfh/bug.parq]

(I was not able to attach it here due to a "missing token", whatever that means.)
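
The failing assertion compares index names against index columns, so it may help to look at the pandas metadata stored in the table's schema. The sketch below assumes the metadata lives under the {{b'pandas'}} key of {{t.schema.metadata}} as UTF-8 JSON with {{index_columns}} and {{columns}} entries. The helper name {{summarize_pandas_metadata}} is hypothetical and not part of pyarrow, and this is only a diagnostic aid, not a confirmed explanation of the bug.

```python
import json


def summarize_pandas_metadata(schema_metadata):
    """Summarize index/column info from an Arrow schema's b'pandas' metadata.

    Hypothetical helper: `schema_metadata` is assumed to be a mapping of
    bytes keys to bytes values (e.g. `t.schema.metadata`), or None.
    Returns None when no pandas metadata is present.
    """
    if not schema_metadata or b"pandas" not in schema_metadata:
        return None

    meta = json.loads(schema_metadata[b"pandas"].decode("utf8"))
    index_columns = meta.get("index_columns", [])
    column_names = [col.get("name") for col in meta.get("columns", [])]

    # Serialize each entry so that non-string descriptors stay comparable.
    keys = [json.dumps(entry, sort_keys=True) for entry in index_columns]

    return {
        "index_columns": index_columns,
        "column_names": column_names,
        # True when the same index column is listed more than once.
        "has_duplicate_index_columns": len(keys) != len(set(keys)),
    }
```

With the table from the repro steps, this would be called as {{summarize_pandas_metadata(t.schema.metadata)}}. Repeated or unexpected entries in {{index_columns}} would be worth including in the report.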
