Beam Python expects DoFns to return an iterable that contains the actual output elements. This is documented, and visible in examples, but it is also a bit counter-intuitive.
We should definitely add a check in _OutputProcessor that raises a more descriptive error when it receives a non-iterable.
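A minimal sketch of what such a check could look like, written in plain Python so it runs without Beam. The helper name `validate_dofn_output` and the error wording are hypothetical; this is not the actual `_OutputProcessor` code, only an illustration of the idea.

```python
from collections.abc import Iterable


def validate_dofn_output(result):
    """Hypothetical helper: return an iterable of outputs or raise a clear error."""
    # Assumption: a DoFn returning None means "no output", so it is allowed.
    if result is None:
        return []
    # Anything that cannot be iterated over is rejected with an explicit message
    # instead of failing later with a generic "object is not iterable" error.
    if not isinstance(result, Iterable):
        raise TypeError(
            "DoFn returned %r of type %s; expected an iterable of output "
            "elements (for a single element, wrap it in a list)."
            % (result, type(result).__name__))
    return result
```

Note that a `str` passes this check, because strings are themselves iterable.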
Should we also make Beam raise an error when a DoFn returns a string?
For example, consider the following pipeline:
p | Create(['abc']) | ParDo(lambda x: x) | WriteToFile('myfile')
This pipeline would write three separate elements ('a', 'b', and 'c') instead of the single string 'abc'. Is this not a bit awkward?
Raising an error when a string is returned would be the least surprising behavior for users, as opposed to having their strings broken down into single-character elements.
One drawback is that some users may already rely on this behavior, so this could be a breaking change. But I think it is still worth discussing.
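To make the string case discussed above concrete, here is a small plain-Python sketch. The functions `emit_outputs` and `emit_outputs_strict` are hypothetical stand-ins for how a returned iterable is unpacked into output elements; they are not Beam APIs.

```python
def emit_outputs(result):
    """Hypothetical: unpack a returned iterable into individual output elements."""
    return [element for element in result]


def emit_outputs_strict(result):
    """Hypothetical stricter variant that rejects a bare string."""
    if isinstance(result, str):
        raise TypeError(
            "DoFn returned the string %r; it would be split into single "
            "characters. Return [%r] to emit it as one element."
            % (result, result))
    return [element for element in result]


# Current behavior: the string is iterated character by character.
print(emit_outputs('abc'))             # ['a', 'b', 'c']
# What the user most likely intended: one element wrapped in a list.
print(emit_outputs_strict(['abc']))    # ['abc']
```

With the strict variant, `emit_outputs_strict('abc')` raises a `TypeError` instead of silently producing three elements, which is also exactly the change that could break pipelines relying on the current behavior.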