[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Evolving a Coder for an added field

+Reuven Lax for update proposal

Dataflow is the only Apache Beam runner which has the capability of updating pipelines. This page[1] describes many of the aspects of how it works and specifically talks about coder changes:

  • Changing the Coder for a step. When you update a job, the Cloud Dataflow service preserves any data records currently buffered (for example, while windowing is resolving) and handles them in the replacement job. If the replacement job uses different or incompatible data encoding, the Cloud Dataflow service will not be able to serialize or deserialize these records.
  • Caution: The Cloud Dataflow service currently cannot guarantee that changing a coder in your prior pipeline to an incompatible coder will cause the compatibility check to fail. It is recommended that you do not attempt to make backwards-incompatible changes to Coders when updating your pipeline; if your pipeline update succeeds but you encounter issues or errors in the resulting data, ensure that your replacement pipeline uses data encoding that's the same as, or at least compatible with, your prior job.

There has been a proposal[2] for general update support within Apache Beam with little traction for implementation outside of Dataflow.

Looking at your code, it wouldn't work with update because encoded values concatenated together without an element delimiter in many situations. Hence when you decode a value using the past format with your new coder you would read from the next value corrupting your read. If you really need to change the encoding in a backwards incompatible way, you would need to change the "name" of the coder which currently defaults to the class name.

On Fri, Nov 2, 2018 at 5:44 AM Jeff Klukas <jklukas@xxxxxxxxxxx> wrote:
I'm adding a new lastModifiedMillis field to MatchResult.Metadata [0] which requires also updating MetadataCoder, but it's not clear to me whether there are guidelines to follow when evolving a type when that changes the encoding.

Is a user allowed to update Beam library versions as part of updating a pipeline? If so, there could be a situation where an updated pipeline is reading state that includes Metadata encoded without the new lastModifiedMillis field, which would cause a CodingException to be thrown.

Is there prior art for evolving a type and its Coder? Should I be defensive and catch CodingException when attempting to decode the new field, providing a default value?