Beam Custom I/O Read Transform


My company, SwiftIQ, uses Google Dataflow for our large-scale data processing pipeline. Our codebase is currently in Java. We are looking at Python, but I'm having trouble working out whether our Dataflow pipeline can be supported using Python.

The first step of our pipeline should be an I/O Read Transform of an XML file. I see that a package for this exists in Java, but I can't find an equivalent module in Python.

Is there a Python module that does this? If not, is there a way to write our own custom Read Transform that reads an XML file into a PCollection?
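
To make the question concrete, here is roughly the parsing step I'd want such a transform to wrap. This is only a sketch using Python's standard library: the function name `read_xml_records`, the `order` tag, and the sample data are made up for illustration, and it deliberately makes no Beam calls since I don't know what the Python SDK provides.

```python
import io
import xml.etree.ElementTree as ET


def read_xml_records(source, record_tag):
    """Yield one dict per <record_tag> element found in the XML source."""
    # iterparse streams the document, so the whole file is never held in memory.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == record_tag:
            # Flatten the element's direct children into {tag: text}.
            yield {child.tag: child.text for child in elem}
            elem.clear()  # release the finished element


# Hypothetical sample data; our real files have a different schema.
sample = io.BytesIO(
    b"<orders>"
    b"<order><id>1</id><sku>A-100</sku></order>"
    b"<order><id>2</id><sku>B-200</sku></order>"
    b"</orders>"
)
records = list(read_xml_records(sample, "order"))
```

Ideally, each dict yielded here would become one element of the resulting PCollection.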

A quick response would be greatly appreciated.


Sean Schwartz

Data Engineer
Cell: 847.772.0240