[jira] [Created] (ARROW-3543) crazy timestamp bug in feather?


Olaf created ARROW-3543:
---------------------------

             Summary: crazy timestamp bug in feather? 
                 Key: ARROW-3543
                 URL: https://issues.apache.org/jira/browse/ARROW-3543
             Project: Apache Arrow
          Issue Type: Bug
            Reporter: Olaf


Hello the dream team,

Pasting from https://github.com/wesm/feather/issues/351

Thanks for this wonderful package. I was playing with feather and some timestamps and I noticed some dangerous behavior. Maybe it is a bug.

Consider this:

```
import pandas as pd
import feather
import numpy as np


df = pd.DataFrame({'string_time_utc' : [pd.to_datetime('2018-02-01 14:00:00.531'), 
 pd.to_datetime('2018-02-01 14:01:00.456'),
 pd.to_datetime('2018-03-05 14:01:02.200')]})

df['timestamp_est'] = pd.to_datetime(df.string_time_utc).dt.tz_localize('UTC').dt.tz_convert('US/Eastern').dt.tz_localize(None)

df
Out[17]: 
          string_time_utc           timestamp_est
0 2018-02-01 14:00:00.531 2018-02-01 09:00:00.531
1 2018-02-01 14:01:00.456 2018-02-01 09:01:00.456
2 2018-03-05 14:01:02.200 2018-03-05 09:01:02.200
```
Here I create the `EST` wall-clock equivalent of my original timestamps (which are in `UTC`), then drop the time zone so the column is naive again.
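To make the state of `timestamp_est` explicit, here is a minimal sketch of the same three steps on a single value. The variable names are illustrative and not part of the original session. It only shows that the last step leaves a column with no time zone attached, so whatever reads it back has to guess one.

```python
import pandas as pd

# One of the sample values from above, parsed as a naive timestamp.
utc_naive = pd.Series(pd.to_datetime(['2018-02-01 14:00:00.531']))

# Step 1: declare that the naive values are UTC.
aware_utc = utc_naive.dt.tz_localize('UTC')

# Step 2: convert the same instants to US/Eastern wall-clock time.
aware_eastern = aware_utc.dt.tz_convert('US/Eastern')

# Step 3: drop the zone again, keeping the Eastern wall-clock reading.
naive_eastern = aware_eastern.dt.tz_localize(None)

print(aware_eastern.dt.tz)    # a tz object for US/Eastern
print(naive_eastern.dt.tz)    # None: no zone travels with the column
print(naive_eastern.iloc[0])
```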

Saving this dataframe to `csv` and to `feather` gives two completely different results once it is read back.

```
df.to_csv('P://testing.csv')
df.to_feather('P://testing.feather')
```
Switching to R.

Reading the good old `csv` gives me something a bit annoying, but expected. R assumes the timestamps are `UTC` by default and wrongly attaches that time zone to `timestamp_est`. No big deal: I can always use `with_tz` or, even better, import the column as character and parse it as a timestamp in R.

```
> dataframe <- read_csv('P://testing.csv')
Parsed with column specification:
cols(
 X1 = col_integer(),
 string_time_utc = col_datetime(format = ""),
 timestamp_est = col_datetime(format = "")
)
Warning message:
Missing column names filled in: 'X1' [1] 
> 
> dataframe %>% mutate(mytimezone = tz(timestamp_est))
# A tibble: 3 x 4
     X1 string_time_utc         timestamp_est
  <int> <dttm>                  <dttm>
1     0 2018-02-01 14:00:00.530 2018-02-01 09:00:00.530
2     1 2018-02-01 14:01:00.456 2018-02-01 09:01:00.456
3     2 2018-03-05 14:01:02.200 2018-03-05 09:01:02.200
  mytimezone
  <chr>
1 UTC
2 UTC
3 UTC
```
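For the import-as-character route, the Python side could also write the column as plain text so that no reader has to guess a time zone. This is only a sketch: the column name `timestamp_est_str` and the format string are illustrative assumptions, not something from the original session.

```python
import pandas as pd

# Two of the naive Eastern wall-clock values from the example above.
df = pd.DataFrame({
    'timestamp_est': pd.to_datetime(['2018-02-01 09:00:00.531',
                                     '2018-03-05 09:01:02.200']),
})

# Hypothetical extra column: the same values rendered as text.
# %f writes microseconds, so the milliseconds are preserved.
df['timestamp_est_str'] = df['timestamp_est'].dt.strftime('%Y-%m-%d %H:%M:%S.%f')

print(df['timestamp_est_str'].tolist())
```

A text column like this survives any file format unchanged, at the cost of having to parse it again on the R side.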
Now look at what happens with `feather`:

```
> dataframe <- read_feather('P://testing.feather')
> 
> dataframe %>% mutate(mytimezone = tz(timestamp_est))
# A tibble: 3 x 3
  string_time_utc         timestamp_est           mytimezone
  <dttm>                  <dttm>                  <chr>
1 2018-02-01 09:00:00.531 2018-02-01 04:00:00.531 ""
2 2018-02-01 09:01:00.456 2018-02-01 04:01:00.456 ""
3 2018-03-05 09:01:02.200 2018-03-05 04:01:02.200 ""
```
My timestamps have been converted!!! Pure insanity. Both columns come back five hours earlier than what I saved, and `tz()` returns an empty string.
Am I missing something here?
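One reading that would be consistent with the output above, offered as an assumption and not something confirmed in this report: the naive values are treated as `UTC` instants and then printed in the reader's local zone. The sketch below uses only the standard library and assumes a local zone fixed at UTC-5 (US Eastern standard time, which applies to all three sample dates). The helper `shown_locally` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the machine reading the file is on US Eastern time,
# which was UTC-5 on the sample dates (before the 2018 DST switch).
LOCAL = timezone(timedelta(hours=-5))

def shown_locally(naive):
    """Treat a naive value as a UTC instant, then render it in LOCAL."""
    as_utc = naive.replace(tzinfo=timezone.utc)
    return as_utc.astimezone(LOCAL).replace(tzinfo=None)

# First row of the dataframe as it was saved from Python.
string_time_utc = datetime(2018, 2, 1, 14, 0, 0, 531000)
timestamp_est = datetime(2018, 2, 1, 9, 0, 0, 531000)

print(shown_locally(string_time_utc))  # 2018-02-01 09:00:00.531000
print(shown_locally(timestamp_est))    # 2018-02-01 04:00:00.531000
```

Under that assumption both columns land five hours earlier, which matches the first row that `read_feather` shows.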

Thanks!!


