
[jira] [Created] (ARROW-2679) pyarrow dataframe streaming to/from parquet is type-lossy

Rob Ambalu created ARROW-2679:

             Summary: pyarrow dataframe streaming to/from parquet is type-lossy
                 Key: ARROW-2679
                 URL: https://issues.apache.org/jira/browse/ARROW-2679
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.9.0
            Reporter: Rob Ambalu

While round-tripping a DataFrame through pyarrow to a Parquet file and back, I noticed that my date column's dtype changed from "object" (which I would have expected to load back as dates) to "datetime":

from datetime import date
import pandas as pd
import pyarrow.parquet as pp
import pyarrow as pa

df = pd.DataFrame( { 'a' : [ date( 2017, 1, 1), date( 2017, 2, 1 ) ] })
table = pa.Table.from_pandas( df )
pp.write_table( table, 'C:\\Temp\\parquet_test')
table2 = pp.read_table( 'C:\\Temp\\parquet_test' )
df2 = table2.to_pandas()

>>> df['a'].dtype
dtype('O')
>>> df2['a'].dtype
dtype('<M8[ns]')
