OSDir

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: MapWithState for two keyed stream


CoRichFlatMap or union will work.  If you need to know which is historical the flatmap will be better as you can tell which stream it cam from.  But, be careful about reading historical data and trying to process it all before processing the new data.  That can lead to buffering a lot of incoming data while processing the historical data.

Michael

> On May 9, 2018, at 9:49 AM, Peter Zende <peter.zende@xxxxxxxxx> wrote:
> 
> Hi all,
> 
> Is it possible to define two DataStream sources - one which reads from Kafka, the other reads from HDFS -  and apply mapWithState with CoFlatMapFunction? The idea would be to read historical data from HDFS along with the live stream from Kafka and based on some business  write the output to the sink in correct update order?
> 
> Or is it easier to just union those two streams? In mapWithState we should tell from which stream the record originates from to be able to correctly build up the state store.
> 
> Many thanks.
> Peter