[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Sharing state between subtasks

On Wed, Oct 10, 2018 at 9:33 AM Fabian Hueske <fhueske@xxxxxxxxx> wrote:

> I think the new source interface would be designed to be able to leverage
> shared state to achieve time alignment.
> I don't think this would be possible without some kind of shared state.
> The problem of tasks that are far ahead in time cannot be solved with
> back-pressure.
> That's because a task cannot choose from which source task it accepts
> events and from which doesn't.
> If it blocks an input, all downstream tasks that are connected to the
> operator are affected. This can easily lead to deadlocks.
> Therefore, all operators need to be able to handle events when they arrive.
> If they cannot process them yet because they are too far ahead in time,
> they are put in state.

The idea I was suggesting is not for operators to block an input.  Rather,
it is that they selectively choose from which input to process the next
message from based on their timestamp, so long as there are buffered
messages waiting to be processed.  That is a best-effort alignment
strategy.  Seems to work relatively well in practice, at least within Kafka

E.g. at the moment StreamTwoInputProcessor uses a UnionInputGate for both
its inputs.  Instead, it could keep them separate and selectively consume
from the one that had a buffer available, and if both have buffers
available, from the buffer with the messages with a lower timestamp.