
CSVSplitter - Splittable DoFn

Hi All,

I noticed that Apache Beam has no support for reading CSV files (e.g. RFC 4180), at least not as a native transform. There's an issue to add this support: https://issues.apache.org/jira/browse/BEAM-51

I've seen examples that use the Apache Commons CSV parser. I took a shot at implementing the reader as a Splittable DoFn; the full code and some questions are in a gist here: https://gist.github.com/pbrumblay/9474dcc6cd238c3f1d26d869a20e863d.

I suspect my implementation could be improved quite a bit. If anyone has time to provide feedback, I would really appreciate it.
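
The hard part of splitting RFC 4180 input is that a quoted field may contain line breaks, so an arbitrary offset can land in the middle of a record. One common convention (an assumption here, not necessarily what the gist does) is that each range owns exactly the records that *start* inside it. The dependency-free sketch below illustrates that rule on an in-memory String using character offsets rather than file byte offsets. `CsvRecordSplits`, `recordStarts` and `recordsInRange` are hypothetical names, and rescanning from offset 0 to recover the quote state is a simplification that a real restriction tracker would need to avoid for large files.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of Beam or Commons CSV): locates RFC 4180
 * record boundaries while honoring quoted fields that contain line breaks.
 */
final class CsvRecordSplits {

    /** Returns the offset of the first character of every record. */
    static List<Integer> recordStarts(String csv) {
        List<Integer> starts = new ArrayList<>();
        if (csv.isEmpty()) {
            return starts;
        }
        starts.add(0);
        boolean inQuotes = false;
        for (int i = 0; i < csv.length(); i++) {
            char c = csv.charAt(i);
            if (c == '"') {
                // An escaped quote ("") toggles twice, leaving the state unchanged.
                inQuotes = !inQuotes;
            } else if (c == '\n' && !inQuotes) {
                // A line break outside quotes ends a record; the next one starts after it.
                if (i + 1 < csv.length()) {
                    starts.add(i + 1);
                }
            }
        }
        return starts;
    }

    /** Returns the records whose first character lies in [start, end). */
    static List<String> recordsInRange(String csv, int start, int end) {
        List<Integer> starts = recordStarts(csv);
        List<String> records = new ArrayList<>();
        for (int k = 0; k < starts.size(); k++) {
            int s = starts.get(k);
            if (s < start || s >= end) {
                continue; // this record is owned by another range
            }
            // A record runs to the next record start (or end of input).
            int e = (k + 1 < starts.size()) ? starts.get(k + 1) : csv.length();
            String rec = csv.substring(s, e);
            // Strip the CRLF (RFC 4180) or bare LF terminator.
            if (rec.endsWith("\r\n")) {
                rec = rec.substring(0, rec.length() - 2);
            } else if (rec.endsWith("\n")) {
                rec = rec.substring(0, rec.length() - 1);
            }
            records.add(rec);
        }
        return records;
    }
}
```

For example, with the input `id,note\n1,"line one\nline two"\n2,plain\n` (38 characters), the range [0, 15) yields the header and the two-line quoted record, while [15, 38) yields only `2,plain`. A naive line-based splitter would instead emit `line two"` as a record of its own.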


Peter Brumblay
Fearless Technology Group, Inc.