[jira] [Created] (ARROW-2587) [Python] can read StructArrays from parquet but unable to write them


jacques created ARROW-2587:
------------------------------

             Summary: [Python] can read StructArrays from parquet but unable to write them
                 Key: ARROW-2587
                 URL: https://issues.apache.org/jira/browse/ARROW-2587
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.9.0
            Reporter: jacques


Although I can read a StructArray column from a Parquet file, I still cannot write it back from a pa.Table to Parquet.

Here is a quick example:

```python
In [2]: import pyarrow.parquet as pq

In [3]: table = pq.read_table('test.parquet')

In [4]: table
Out[4]: 
pyarrow.Table
weight: double
animal_type: string
animal_interpretation: struct<is_large_animal: bool, is_mammal: bool>
  child 0, is_large_animal: bool
  child 1, is_mammal: bool
metadata
--------
{'org.apache.spark.sql.parquet.row.metadata': '{"type":"struct","fields":[{"name":"weight","type":"double","nullable":true,"metadata":{}},{"name":"animal_type","type":"string","nullable":true,"metadata":{}},{"name":"animal_interpretation","type":{"type":"struct","fields":[{"name":"is_large_animal","type":"boolean","nullable":true,"metadata":{}},{"name":"is_mammal","type":"boolean","nullable":true,"metadata":{}}]},"nullable":false,"metadata":{}}]}'}

In [5]: table.schema
Out[5]: 
weight: double
animal_type: string
animal_interpretation: struct<is_large_animal: bool, is_mammal: bool>
  child 0, is_large_animal: bool
  child 1, is_mammal: bool
metadata
--------
{'org.apache.spark.sql.parquet.row.metadata': '{"type":"struct","fields":[{"name":"weight","type":"double","nullable":true,"metadata":{}},{"name":"animal_type","type":"string","nullable":true,"metadata":{}},{"name":"animal_interpretation","type":{"type":"struct","fields":[{"name":"is_large_animal","type":"boolean","nullable":true,"metadata":{}},{"name":"is_mammal","type":"boolean","nullable":true,"metadata":{}}]},"nullable":false,"metadata":{}}]}'}

In [6]: pq.write_table(table,"test_write.parquet")
---------------------------------------------------------------------------
ArrowInvalid                              Traceback (most recent call last)
<ipython-input-6-bd9d7deee437> in <module>()
----> 1 pq.write_table(table,"test_write.parquet")

/usr/local/lib/python2.7/dist-packages/pyarrow/parquet.pyc in write_table(table, where, row_group_size, version, use_dictionary, compression, use_deprecated_int96_timestamps, coerce_timestamps, flavor, **kwargs)
    982                 use_deprecated_int96_timestamps=use_int96,
    983                 **kwargs) as writer:
--> 984             writer.write_table(table, row_group_size=row_group_size)
    985     except Exception:
    986         if is_path(where):

/usr/local/lib/python2.7/dist-packages/pyarrow/parquet.pyc in write_table(self, table, row_group_size)
    325             table = _sanitize_table(table, self.schema, self.flavor)
    326         assert self.is_open
--> 327         self.writer.write_table(table, row_group_size=row_group_size)
    328 
    329     def close(self):

/usr/local/lib/python2.7/dist-packages/pyarrow/_parquet.so in pyarrow._parquet.ParquetWriter.write_table()

/usr/local/lib/python2.7/dist-packages/pyarrow/lib.so in pyarrow.lib.check_status()

ArrowInvalid: Nested column branch had multiple children

```
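
The failing column is the only one whose type is a struct with more than one child, so a possible stopgap is to split it into one top-level column per child before writing. The sketch below is plain Python and only an illustration: it assumes the columns have already been pulled into Python lists (for example with `table.to_pydict()`), the helper name `flatten_struct_columns`, the `.` separator, and the sample values are hypothetical, and it handles a single level of nesting only.

```python
def flatten_struct_columns(columns, sep="."):
    """Split each struct-like column (a list of dicts) into one flat column per child.

    `columns` maps a column name to a list of values. Hypothetical helper,
    not part of pyarrow.
    """
    flat = {}
    for name, values in columns.items():
        if not any(isinstance(v, dict) for v in values):
            # Not a struct column: copy it through unchanged.
            flat[name] = list(values)
            continue
        # Collect the child field names in first-seen order.
        children = []
        for v in values:
            if isinstance(v, dict):
                for key in v:
                    if key not in children:
                        children.append(key)
        # Emit one column per child; a null struct row becomes None in every child.
        for child in children:
            flat[name + sep + child] = [
                v.get(child) if isinstance(v, dict) else None for v in values
            ]
    return flat


# Made-up sample rows that follow the schema shown above.
columns = {
    "weight": [4.2, 5400.0],
    "animal_type": ["cat", "elephant"],
    "animal_interpretation": [
        {"is_large_animal": False, "is_mammal": True},
        {"is_large_animal": True, "is_mammal": True},
    ],
}
flat_columns = flatten_struct_columns(columns)
```

The flattened columns contain only primitive types, which should be writable, but the struct type is lost in the written file, so this is not a substitute for a real fix.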

 

I would really appreciate a fix for this.

Best,

Jacques


