[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Multiple stream operator watermark handling

On Thu, May 24, 2018 at 7:26 AM, Piotr Nowojski <piotr@xxxxxxxxxxxxxxxxx> wrote:
From top of my head I can imagine two solutions:

1. Override the default behaviour of the operator via for example org.apache.flink.streaming.api.datastream.ConnectedStreams#transform

That seems the safer, but more complicated path.

2. Can you set control stream’s watermark to Watermark#MAX_WATERMARK or maybe Watermark#MAX_WATERMARK - 1 ?

That seems simpler, put potentially perilous if at some point in the future there was some use to control stream watermarks.  Also, would it work if there are no messages in the control stream?  Wouldn't that mean no watermark would be emitted, even if they were hardcoded to Long.MAX_VALUE? In which case, the operator default for the stream would be used, which would still be Long.MIN_VALUE.

BTW, this reminds me of an issue I've mentioned previously, the documentation is lacking on a description of how watermarks are processed by operators.  E.g. when does a window emit watermarks?  what watermarks does it emit?  That seems like a rather large omission, as one of the main features of Flink is event time processing, which puts watermarks almost on equal footing to data and data operations.  Just as the docs describe how data is process, merged, etc, the same should be true for watermarks.