I build on the work by Eugene et al. in "Breaking the fusion barrier" and propose a way to support splitting of bundles (primarily for SplittableDoFn) within the portability layer. This also builds on a lot of past work [3, 4, 5, 6, 7] related to splitting.
Note that this proposal discusses the portability API changes and the "control" flow needed. It also discusses implementation details recommended for SDKs and runners. Interestingly, I believe there is a way to provide a limited form of dynamic work rebalancing for all runners that exist today, which runners should be able to extend easily into a more meaningful solution. Until it is implemented and tried out, though, it is hard to say what gains, if any, there could be.
Note that follow-up proposals/discussions about any SplittableDoFn API changes specific to each language implementation should come from those interested in getting SplittableDoFn working with portability. A few such changes are needed to support backlog reporting, splitting at backlog, and bundle finalization.
This topic has a lot of historical context, so I apologize up front for the complicated read. Feel free to comment on the doc or on this thread.