[jira] [Created] (ARROW-3956) [Python] ParquetWriter.write_table isn't working


David Lee created ARROW-3956:
--------------------------------

             Summary: [Python] ParquetWriter.write_table isn't working
                 Key: ARROW-3956
                 URL: https://issues.apache.org/jira/browse/ARROW-3956
             Project: Apache Arrow
          Issue Type: Bug
    Affects Versions: 0.11.1
            Reporter: David Lee


ParquetWriter.write_table raises a ValueError saying the table schema does not match the schema used to create the file, even though the column names and types are identical.

 

Error:
{code:java}
>>> writer.write_table(arrow_table)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "../lib/python3.6/site-packages/pyarrow/parquet.py", line 374, in write_table
    raise ValueError(msg)
ValueError: Table schema does not match schema used to create file:
table:
col1: int64
col2: int64
metadata
--------
{b'pandas': b'{"index_columns": [], "column_indexes": [], "columns": [{"name":'
b' "col1", "field_name": "col1", "pandas_type": "int64", "numpy_ty'
b'pe": "int64", "metadata": null}, {"name": "col2", "field_name": '
b'"col2", "pandas_type": "int64", "numpy_type": "int64", "metadata'
b'": null}], "pandas_version": "0.23.4"}'} vs.
file:
col1: int64
col2: int64
{code}
Test Script:
{code:python}
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

d = {'col1': [1, 2], 'col2': [3, 4]}
df = pd.DataFrame(data=d)

# from_pandas attaches pandas metadata to the table schema
arrow_table = pa.Table.from_pandas(df, preserve_index=False)

# The module-level function writes the table without error
pq.write_table(arrow_table, "test.parquet")

# Same column names and types as the table, but no metadata
test_schema = pa.schema([
    pa.field('col1', pa.int64()),
    pa.field('col2', pa.int64())
])

writer = pq.ParquetWriter("test2.parquet", schema=test_schema, use_dictionary=True, compression='snappy')
writer.write_table(arrow_table)  # raises the ValueError shown above
writer.close()
{code}
pq.write_table() works, but ParquetWriter.write_table() does not.

I think something is wrong with the schema comparison: the only difference in the output above is the pandas metadata attached to the table schema.

 

 


