[jira] [Created] (ARROW-3703) [Python] DataFrame.to_parquet crashes if datetime column has time zones


Diego Argueta created ARROW-3703:
------------------------------------

             Summary: [Python] DataFrame.to_parquet crashes if datetime column has time zones
                 Key: ARROW-3703
                 URL: https://issues.apache.org/jira/browse/ARROW-3703
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.11.1
         Environment: pandas 0.23.4
                      pyarrow 0.11.1
                      Python 3.5 - 3.7
                      macOS High Sierra (10.13.6)
            Reporter: Diego Argueta


On CPython 3.5.6, 3.6.6, and 3.7.0, a Pandas DataFrame containing {{datetime.datetime}} objects serializes to Parquet just fine, but {{DataFrame.to_parquet}} crashes with an {{AttributeError}} if those datetimes use the built-in {{datetime.timezone}} objects as their {{tzinfo}}.

To reproduce:
{code:python}
import datetime as dt
import pandas as pd

df = pd.DataFrame({'foo': [dt.datetime(2018, 1, 1, 1, 23, 45, tzinfo=dt.timezone.utc)]})
df.to_parquet('data.parq')
{code}
The following exception results:
{noformat}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/tux/.pyenv/versions/3.7.0/lib/python3.7/site-packages/pandas/core/frame.py", line 1945, in to_parquet
    compression=compression, **kwargs)
  File "/Users/tux/.pyenv/versions/3.7.0/lib/python3.7/site-packages/pandas/io/parquet.py", line 257, in to_parquet
    return impl.write(df, path, compression=compression, **kwargs)
  File "/Users/tux/.pyenv/versions/3.7.0/lib/python3.7/site-packages/pandas/io/parquet.py", line 118, in write
    table = self.api.Table.from_pandas(df)
  File "pyarrow/table.pxi", line 1217, in pyarrow.lib.Table.from_pandas
  File "/Users/tux/.pyenv/versions/3.7.0/lib/python3.7/site-packages/pyarrow/pandas_compat.py", line 381, in dataframe_to_arrays
    convert_types)]
  File "/Users/tux/.pyenv/versions/3.7.0/lib/python3.7/site-packages/pyarrow/pandas_compat.py", line 380, in <listcomp>
    for c, t in zip(columns_to_convert,
  File "/Users/tux/.pyenv/versions/3.7.0/lib/python3.7/site-packages/pyarrow/pandas_compat.py", line 370, in convert_column
    return pa.array(col, type=ty, from_pandas=True, safe=safe)
  File "pyarrow/array.pxi", line 167, in pyarrow.lib.array
  File "/Users/tux/.pyenv/versions/3.7.0/lib/python3.7/site-packages/pyarrow/pandas_compat.py", line 409, in get_datetimetz_type
    type_ = pa.timestamp(unit, tz)
  File "pyarrow/types.pxi", line 1038, in pyarrow.lib.timestamp
  File "pyarrow/types.pxi", line 955, in pyarrow.lib.tzinfo_to_string
AttributeError: 'datetime.timezone' object has no attribute 'zone'

'datetime.timezone' object has no attribute 'zone'
{noformat}
 
 This doesn't happen if you use {{pytz.UTC}} as the timezone object.
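
To illustrate the difference between the two kinds of timezone object, here is a minimal standard-library sketch of a name lookup that tolerates both. The helper {{tz_name}} is hypothetical and is not pyarrow's implementation of {{tzinfo_to_string}}; it only assumes that pytz-style objects expose a {{zone}} attribute and that other {{tzinfo}} objects accept {{tzname(None)}}, as {{datetime.timezone}} does.

```python
import datetime as dt


def tz_name(tz):
    """Return a string name for a tzinfo object, or None if tz is None.

    Hypothetical helper for illustration only; not pyarrow's code.
    """
    if tz is None:
        return None
    # pytz timezones carry their name in a `zone` attribute.
    zone = getattr(tz, "zone", None)
    if zone is not None:
        return zone
    # datetime.timezone has no `zone`, but tzinfo.tzname() is part of the
    # standard tzinfo interface, and datetime.timezone accepts None here.
    return tz.tzname(None)


# The attribute the traceback complains about is indeed missing:
print(hasattr(dt.timezone.utc, "zone"))
print(tz_name(dt.timezone.utc))
```

The exact string returned for {{datetime.timezone.utc}} depends on the Python version, so a real fix would probably need to normalize it before passing it on as a timezone name.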



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)