[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[PROPOSAL] Bundle splitting (https://s.apache.org/beam-checkpoint-and-split-bundles)

I build off of the work performed by Eugene et al. within Breaking the fusion barrier[2] and propose[1] a way of how to support splitting of bundles (primarily for SplittableDoFn) within the portability layer. This also builds off of a lot of past work[3, 4, 5, 6, 7] related to splitting.

Note that this proposal[1] discusses the portability API changes and "control" flow needed. It also discusses implementation details recommended during implementation by SDKs and runners. Interestingly, I believe there is a way to have a limited form of dynamic work rebalancing for all runners[8] that exist today that should be easily extensible by Runners to provide a meaningful solution but until implemented and tried out, hard to say what gains if any there could be. 

Note that follow-up proposals/discussions about any SplittableDoFn API changes specific to each language implementation should follow by those interested in getting SplittableDoFn working with portability. There are a few that are needed to support backlog reporting/splitting at backlog[6] and also bundle finalization[9].

This topic has a lot of historical context so I apologize upfront for the complicated read, but feel free to comment on the doc or this thread.