[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[DISCUSS] SplittableDoFn Java SDK User Facing API

I have started to work on how to change the user facing API within the Java SDK to support splitting/checkpointing[1], backlog reporting[2] and bundle finalization[3].

I have this PR[4] which contains minimal interface/type definitions to convey how the API surface would change with these 4 changes:
1) Exposes the ability for @SplitRestriction to take a backlog suggestion on how to perform splitting and for how many restrictions should be returned.
2) Adds the ability for RestrictionTrackers to report backlog
4) Adds the ability for @StartBundle/@ProcessElement/@FinishBundle to register a callback that is invoked after bundle finalization.

The details are in the javadoc comments as to how I would expect the contract to play out.
Feel free to comment on the ML/PR around the contract and after the feedback is received/digested/implemented, I would like to get the changes submitted so that work can start  towards providing an implementation in the Java SDK, Python SDK, and Go SDK and the shared runner portability library.

I would like to call out special attention to 3 since with this change it will enable us to remove the synchronization requirement for users as we will wrap the underlying restriction tracker allowing us to add appropriate synchronization as needed and also to watch any calls that pass through the object such as the claim calls. I also believe this prevents people from writing RestrictionTrackers where the contract of tryClaim is subverted since markDone is outside the purview of tryClaim as in ByteKeyRangeTracker[5].