Re: Questions about join and exactly-once

Hi Minglei,

1. I'm not sure whether you are asking about a specific problem, but IMO the
main challenge is that there are many different ways (and meanings) to join
two streams. The required semantics always depend on the concrete use case.
If you want to perform a simple equality join with SQL semantics, you
typically need to fully materialize both streams, which is often too
expensive. Often, you want to add some kind of time-based predicate that
allows the join to be evaluated more efficiently (both in terms of state and
computation). Flink's DataStream API adds a new "interval" join with the
upcoming 1.6.0 release. Flink's Table API offers a few more built-in joins.
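
To make the time-based predicate concrete, here is a minimal sketch of
interval-join semantics in plain Python. It does not use Flink's API; the
function name `interval_join`, the (key, timestamp, value) event layout, and
the sample data are all assumptions made for illustration.

```python
from collections import defaultdict


def interval_join(left, right, lower, upper):
    """Join events with equal keys whose timestamps satisfy
    left_ts + lower <= right_ts <= left_ts + upper.

    Events are (key, timestamp, value) tuples (a hypothetical layout).
    """
    # Index the right side by key so each left event only scans candidates
    # that share its key.
    right_by_key = defaultdict(list)
    for key, ts, value in right:
        right_by_key[key].append((ts, value))

    out = []
    for key, left_ts, left_value in left:
        for right_ts, right_value in right_by_key[key]:
            # The time bound is the extra predicate on top of key equality.
            if left_ts + lower <= right_ts <= left_ts + upper:
                out.append((key, left_value, right_value))
    return out


# Hypothetical sample data: orders joined with shipments that occur
# between 0 and 5 time units after the order.
orders = [("a", 10, "order-1"), ("b", 20, "order-2")]
shipments = [("a", 12, "ship-1"), ("a", 50, "ship-2"), ("b", 19, "ship-3")]
joined = interval_join(orders, shipments, lower=0, upper=5)
```

The sketch works on finite lists, but the bound is what matters for
streaming: an event can only match partners inside its interval, so an
operator may drop buffered events once time has advanced past that interval
instead of keeping both streams materialized forever.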

2. Flink provides exactly-once consistency for application state using
checkpoints and resettable sources. Of course, checkpointing does not come
for free and can be quite expensive, but the overhead compared to
at-least-once checkpointing is typically not too large. The real challenge,
though, is end-to-end exactly-once, which requires sophisticated sink
connectors. Again, the complexity depends on the concrete use case. The
stricter the guarantees, the more expensive the application becomes.
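
The difference between exactly-once state and end-to-end exactly-once can be
sketched with a toy simulation in plain Python. This is not Flink code; the
function `run`, its parameters, and the in-memory "sink" list are assumptions
made for illustration. The state and the source offset are snapshotted
together, and on failure both are restored, so the source is rewound to the
checkpointed position.

```python
def run(events, checkpoint_every, fail_at=None):
    """Sum `events`, checkpointing every `checkpoint_every` records.

    If `fail_at` is given, simulate one failure just before processing the
    record at that offset. Returns (final_state, records_seen_by_sink).
    """
    state, offset = 0, 0
    checkpoint = (0, 0)  # (state, source offset), snapshotted together
    sink = []            # naive, non-transactional sink
    failed = False

    while offset < len(events):
        if offset == fail_at and not failed:
            failed = True
            # Recovery: restore the state AND rewind the resettable source.
            state, offset = checkpoint
            continue

        state += events[offset]
        sink.append(events[offset])  # written immediately, never rolled back
        offset += 1

        if offset % checkpoint_every == 0:
            checkpoint = (state, offset)

    return state, sink


clean_state, clean_sink = run([1, 2, 3, 4, 5], checkpoint_every=2)
failed_state, failed_sink = run([1, 2, 3, 4, 5], checkpoint_every=2, fail_at=3)
```

In the run with the failure, the final state is the same as in the clean run
because the replayed record is applied to the restored state. The naive sink,
however, receives the replayed record a second time. Avoiding that duplicate
is the part that needs support from the sink connector.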

Best, Fabian

2018-07-30 11:59 GMT+02:00 zhangminglei <18717838093@xxxxxxx>:

> Hi, I would like to ask 2 questions.
> 1.      Currently, what is the problem of flink join ? And what is the
> essential difference between batch join and stream join ?
> 2.      What are the shortcomings of current exactly-once ?
> Thanks
> minglei.