Re: Portable streaming side inputs


I have been using this inside Dataflow with the PushbackSideInputDoFnRunner as the window mapping fn:
https://github.com/lukecwik/incubator-beam/commit/4324185192b27026270ee342b01720ff40e71df8

The input is a KV whose key is a nonce and whose value is a window. The output must be a KV whose key is the same nonce as the input and whose value is the mapped window.
All window mapping fns are deterministic and all window encodings are deterministic. This means that once a window has been mapped, you can cache the encoded output window, keyed by the encoded input window, indefinitely.
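
To make the caching idea concrete, here is a rough sketch in Python, not the code from the commit above. The class name WindowMappingCache, the remote_map callable, and its (nonce, encoded_window) signature are assumptions made for illustration; remote_map stands in for whatever call the runner makes into the SDK harness.

```python
import os
from typing import Callable, Dict, Tuple

# Hypothetical shape of the call into the SDK harness:
# (nonce, encoded_window) -> (nonce, encoded_mapped_window)
MapFn = Callable[[bytes, bytes], Tuple[bytes, bytes]]


class WindowMappingCache:
    """Caches encoded mapped windows, keyed by the encoded input window."""

    def __init__(self, remote_map: MapFn) -> None:
        self._remote_map = remote_map
        self._cache: Dict[bytes, bytes] = {}
        self.remote_calls = 0  # only here to make the caching visible

    def map_window(self, encoded_window: bytes) -> bytes:
        cached = self._cache.get(encoded_window)
        if cached is not None:
            return cached
        # A fresh nonce lets us match the response to this request.
        nonce = os.urandom(8)
        self.remote_calls += 1
        out_nonce, mapped = self._remote_map(nonce, encoded_window)
        if out_nonce != nonce:
            raise ValueError("response nonce does not match request nonce")
        # Safe to keep forever: the mapping and the encoding are deterministic.
        self._cache[encoded_window] = mapped
        return mapped


def fake_sdk_map(nonce: bytes, encoded_window: bytes) -> Tuple[bytes, bytes]:
    """Stand-in for the SDK side; echoes the nonce and 'maps' the window."""
    return nonce, b"mapped:" + encoded_window
```

With this, repeated lookups for the same encoded window reach the SDK only once. An unbounded dict is used because the entries never go stale; a real runner would probably still want to bound its size.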


On Wed, Aug 8, 2018 at 9:08 AM Thomas Weise <thw@xxxxxxxxxx> wrote:
I'm working on support for side inputs in streaming mode in the portable Flink runner [1].

The runner would be responsible for holding main inputs (durably) when side inputs are not available.

To check if side inputs are available, the window mapping function is required (see SimplePushbackSideInputDoFnRunner.isReady).
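
As a rough illustration of the hold-and-release behavior described here, the following Python sketch keeps main-input elements until the side input window they map to is ready. It is not the logic of SimplePushbackSideInputDoFnRunner; PushbackBuffer and its method names are made up for this example, map_window is assumed to be whatever function the runner has for mapping an encoded main window to an encoded side input window, and the buffer is in memory rather than durable.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set


class PushbackBuffer:
    """Holds main-input elements until their side input window is ready."""

    def __init__(self, map_window: Callable[[bytes], bytes]) -> None:
        self._map_window = map_window
        self._ready: Set[bytes] = set()
        # Held elements, grouped by the side input window they wait for.
        # A real runner would keep these in durable state.
        self._held: Dict[bytes, List[object]] = defaultdict(list)

    def is_ready(self, main_window: bytes) -> bool:
        return self._map_window(main_window) in self._ready

    def offer(self, element: object, main_window: bytes) -> List[object]:
        """Returns the elements that can be processed now (possibly none)."""
        side_window = self._map_window(main_window)
        if side_window in self._ready:
            return [element]
        self._held[side_window].append(element)
        return []

    def side_input_arrived(self, side_window: bytes) -> List[object]:
        """Marks a side input window ready and releases held elements."""
        self._ready.add(side_window)
        return self._held.pop(side_window, [])
```

The point of the sketch is that both the readiness check and the grouping of held elements depend on the mapped window, which is why the runner needs some way to evaluate the window mapping function.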

The window mapping function (like the viewFn) is SDK-specific and is serialized in the proto (see PCollectionViewTranslation).

With Python SDK, the Flink runner cannot rehydrate these objects.

Is my understanding correct, and what assumptions can the runner make about the window mapping?

Thanks,
Thomas