I have stream_A of type "Dog", which needs to be transformed using data from
stream_C of type "Name_Mapping". As stream_C is a slow one (data is not
being updated frequently), to do the transformation I connect two streams,
do a keyBy, and then use a RichCoFlatMapFunction in which mapping data from
stream_C is saved into a State (flatMap1 generates 1 output, while flatMap2
is just to update State table, not generating any output).
Now I have another stream B of type "Cat", which also needs to be
transformed using data from stream_C. After that transformation,
transformed_B will go through a completely different pipeline from
I can see two approaches for this:
1. duplicate stream_C and the RichCoFlatMapFunction and apply on stream_B
2. create a new stream D of type "Animal", transform it with C, then split
the result into two streams using split/select using case class pattern
My question is which option should I choose?
With option 1, at least I need to maintain two State tables, let alone the
cost for duplicating stream (I am not sure how expensive this is in term of
resource), and the requirement on duplicating the CoFlatMapFunction (*).
With option 2, there's additional cost coming from unioning,
splitting/selecting, and type-casting at the final streams.
Is there any better option for me?
Thank you very much for your support.
(*) I am using Scala, and I tried to create a RichCoFlatMapFunction of type
[Animal, Name_Mapping] but it cannot be used for a stream of [Dog,
Name_Mapping] or [Cat, Name_Mapping]. Thus I needed to duplicate the
Function as well.
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/