osdir.com

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Implementing a “join” between a DataStream and a “set of rules”


Hi Sandybayev,

In the current state, Flink does not provide a solution to the
mentioned use case. However, there is open FLIP[1] [2] which has been
created to address the same.

I can see in your current approach, you are not able to update the
rule set data. I think you can update rule set data by building
DataStream around changelogs which are stored in message
queue/distributed file system.
OR
You can store rule set data in the external system where you can query
for incoming keys from Flink.

[1]: https://cwiki.apache.org/confluence/display/FLINK/FLIP-17+Side+Inputs+for+DataStream+API
[2]: https://issues.apache.org/jira/browse/FLINK-6131

On Tue, Jun 5, 2018 at 5:52 PM, Sandybayev, Turar (CAI - Atlanta)
<Turar.Sandybayev@xxxxxxxxxxxxxx> wrote:
> Hi,
>
>
>
> What is the best practice recommendation for the following use case? We need
> to match a stream against a set of “rules”, which are essentially a Flink
> DataSet concept. Updates to this “rules set" are possible but not frequent.
> Each stream event must be checked against all the records in “rules set”,
> and each match produces one or more events into a sink. Number of records in
> a rule set are in the 6 digit range.
>
>
>
> Currently we're simply loading rules into a local List of rules and using
> flatMap over an incoming DataStream. Inside flatMap, we're just iterating
> over a list comparing each event to each rule.
>
>
>
> To speed up the iteration, we can also split the list into several batches,
> essentially creating a list of lists, and creating a separate thread to
> iterate over each sub-list (using Futures in either Java or Scala).
>
>
>
> Questions:
>
> 1.            Is there a better way to do this kind of a join?
>
> 2.            If not, is it safe to add additional parallelism by creating
> new threads inside each flatMap operation, on top of what Flink is already
> doing?
>
>
>
> Thanks in advance!
>
> Turar
>
>