Reading 'scientific' csv using Pandas?


On 2018-11-18 19:22, Martin Schön wrote:
> On 2018-11-18, Shakti Kumar <shakti.shrivastava13 at gmail.com> wrote:
>> On Sun, 18 Nov 2018 at 18:18, Martin Schön <martin.schoon at gmail.com> wrote:
>>>
>>> Now I hit a bump in the road when some of the data is not in plain
>>> decimal notation (xxx,xx) but in 'scientific' (xx,xxxe-xx) notation.
>>>
>>
>> Martin, I believe this should be done by pandas itself while reading
>> the csv file,
>> I took an example in scientific notation and checked this out,
>>
>> my sample.csv file is,
>> col1,col2
>> 1.1,0
>> 10.24e-05,1
>> 9.492e-10,2
>>
> That was a quick answer!
> 
> My pandas is up to date.
> 
> In your example you use the US convention of using "." for decimals
> and "," to separate data. This works perfectly for me too.
> 
> However, my data files use European conventions: decimal "," and TAB
> to separate data:
> 
> col1	col2
> 1,1	0
> 10,24e-05	1
> 9,492e-10	2
> 
> I use 
> 
> EUData = pd.read_csv('file.csv', skiprows=1, sep='\t',
> decimal=',', engine='python')
> 
> to read from such files. This works so-so. 'Common floats' (3,1415 etc.)
> work just fine, but 'scientific' stuff (1,6023e23) does not work.
> 
> /Martin
> 


This looks like a bug in the 'python' engine specifically. I suggest you
write a bug report at https://github.com/pandas-dev/pandas/issues

(conda:nb) /tmp
0:jollans at mn70% cat test.csv
Index	Value
0	1,674
1	3,48e+3
2	8,1834e-10
3	3984,109
4	2830812370

(conda:nb) /tmp
0:jollans at mn70% ipython
Python 3.7.0 (default, Oct  9 2018, 10:31:47)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.1.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import pandas as pd

In [2]: pd.read_csv('test.csv', header=[0], index_col=0, decimal=',', sep='\t')

Out[2]:
              Value
Index
0      1.674000e+00
1      3.480000e+03
2      8.183400e-10
3      3.984109e+03
4      2.830812e+09

In [3]: pd.read_csv('test.csv', header=[0], index_col=0, decimal=',', sep='\t', engine='python')

Out[3]:
            Value
Index
0           1.674
1         3,48e+3
2      8,1834e-10
3        3984.109
4      2830812370

In [4]: pd.__version__
Out[4]: '0.23.4'
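
If engine='python' is needed for some other reason, one possible workaround
is to skip the engine's number parsing altogether: read the column as text
and convert it explicitly. The following is only a sketch under a few
assumptions: the helper name read_decimal_comma is made up for illustration,
the column name 'Value' comes from the test.csv above, and the data is fed in
through io.StringIO instead of a file so that the snippet is self-contained.

```python
import io

import pandas as pd

# Same content as test.csv above: TAB-separated, decimal comma.
RAW = (
    "Index\tValue\n"
    "0\t1,674\n"
    "1\t3,48e+3\n"
    "2\t8,1834e-10\n"
    "3\t3984,109\n"
    "4\t2830812370\n"
)


def read_decimal_comma(source, column="Value"):
    """Hypothetical helper: parse a decimal-comma column by hand."""
    # Read the column as plain strings so that no engine-specific
    # number parsing is applied to it.
    df = pd.read_csv(source, sep="\t", index_col=0,
                     dtype={column: str}, engine="python")
    # Swap the decimal comma for a point, then convert to numbers.
    as_text = df[column].str.replace(",", ".", regex=False)
    df[column] = pd.to_numeric(as_text)
    return df


df = read_decimal_comma(io.StringIO(RAW))
print(df.dtypes)
print(df)
```

Dropping engine='python' and letting pandas use its default C parser, as in
In [2] above, is the simpler fix whenever nothing else depends on the Python
engine.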



-- 
Cheers,
 Thomas