Re: [DISCUSS][TABLE] How to handle empty delete for UpsertSource
For the record I was also in favour of option 3. regarding empty deletes.
About stateful Filters on upsert stream. Hequn, I don’t fully understand this:
> However, if UpsertSource can also output empty deletes, these
> empty deletes will be more difficult to control. We don't know where these
> deletes come from, and whether should be filtered out. The ambiguity of the
> semantics of processing empty deletes makes the user unable to judge
> whether there will be empty deletes output.
What’s the problem with best effort “filtering out subsequent/redundant deletes”? Our best effort deletes cache could have semantic: always pass first delete, try to filter out subsequent ones.
Also, instead of doing this deletes caching inside Filter, shouldn’t this be implemented as some separate operator, that we can place inside the plan wherever we deem best (one at the end of the pipeline just before sink?). It could give us more elasticity in the future, including things like extending it into “frequent updates cache”.
Regarding filter following GROUP BY. We might need to think this through. I might be wrong here (haven’t thought this through myself), but isn’t it true, that all filters are pushed down until they reach one of the following:
- join/correlate (?)
- set operations (like intersect/difference)
Except of the source, all of them already have a state, thus “best effort cache” could be merged with them.
Another thing to consider would be primary key issue of the upsert stream. For example, if the primary key matches aggregation key of the GROUP BY, such aggregation is trivial and could be converted to simple projection. In Fabian’s example:
SELECT user, count(*) FROM clicks GROUP BY user HAVING count(*) < 10
If the primary key is user, query is trivial - count will always be at most 1. If primary key differs, to deduplicate deletes we would need to add separate index to handle deduplication efficiently and in this case, such index’s size might be equal to state needed to fully deduplicate deletes in separate operator.
> On 21 Aug 2018, at 16:17, Xingcan Cui <xingcanc@xxxxxxxxx> wrote:
> Hi Hequn,
> Thanks for this discussion.
> Personally, I’m also in favor of option 3. There are two reasons for that:
> (1) A proctime-based upsert table source does not guarantee the records’ order, which means empty delete messages may not really be "empty". Simply discarding them may cause semantics problems.
> (2) Materializing the table in the source doesn't sound like an efficient solution, especially considering the downstream operators may also need to materialize the immediate tables many times.
> Therefore, why not choosing a "lazy strategy", i.e., just forward the messages and let the operators that are sensitive with empty delete to tackle them.
> As for the filtering problem, maybe the best approach would be to cache all the keys that meet the criteria and send a retract message when it changes.
> BTW, recently, I’m getting a more and more intense feeling that maybe we should merge the retract message and upsert message into a unified “update message”. (Append Stream VS Update Stream).
>> On Aug 20, 2018, at 7:51 PM, Piotr Nowojski <piotr@xxxxxxxxxxxxxxxxx> wrote:
>> Thanks for bringing up this issue here.
>> I’m not sure whether sometimes swallowing empty deletes could be a problem or always swallowing/forwarding them is better. I guess for most use cases it doesn't matter. Maybe the best for now would be to always forward them, since if they are a problem, user could handle them somehow, either in custom sink wrapper or in system that’s downstream from Flink. Also maybe we could have this configurable in the future.
>> However this thing seems to me like a much lower priority compared to performance implications. Forcing upsert source to always keep all of the keys on the state is not only costly, but in many cases it can be a blocker from executing a query at all. Not only for the UpsertSource -> Calc -> UpsertSink, but also for example in the future for joins or ORDER BY (especially with LIMIT) as well.
>> I would apply same reasoning to FLINK-9528.
>>> On 19 Aug 2018, at 08:21, Hequn Cheng <chenghequn@xxxxxxxxx> wrote:
>>> Hi all,
>>> Currently, I am working on FLINK-8577 Implement proctime DataStream to
>>> Table upsert conversion <https://issues.apache.org/jira/browse/FLINK-8577>.
>>> And a design doc can be found here
>>> It received many valuable suggestions. Many thanks to all of you.
>>> However there are some problems I think may need more discussion.
>>> 1. *Upsert Stream:* Stream that include a key definition and will be
>>> updated. Message types include insert, update and delete.
>>> 2. *Upsert Source:* Source that ingest Upsert Stream.
>>> 3. *Empty Delete:* For a specific key, the first message is a delete
>>> *Problem to be discussed*
>>> How to handle empty deletes for UpsertSource？
>>> *Ways to solve the problem*
>>> 1. Throw away empty delete messages in the UpsertSource(personally in
>>> favor of this option)
>>> - advantages
>>> - This makes sense in semantics. An empty table + delete message is
>>> still an empty table. Losing deletes does not affect the final results.
>>> - At present, the operators or functions in flink are assumed to
>>> process the add message first and then delete. Throw away
>>> empty deletes in
>>> source, so that the downstream operators do not need to
>>> consider the empty
>>> - disadvantages
>>> - Maintaining the state in source is expensive, especially for some
>>> simple sql like: UpsertSource -> Calc -> UpsertSink.
>>> 2. Throw away empty delete messages when source generate
>>> retractions, otherwise pass empty delete messages down
>>> - advantages
>>> - Downstream operator does not need to consider empty delete messages
>>> when the source generates retraction.
>>> - Performance is better since source don't have to maintain state
>>> if it doesn't generate retractions.
>>> - disadvantages
>>> - The judgment that whether the downstream operator will receive
>>> empty delete messages is complicated. Not only take source into
>>> consideration, but also should consider the operators that
>>> are followed by
>>> source. Take join as an example, for the sql: upsert_source
>>> -> upsert_join,
>>> the join receives empty deletes while in sql(upsert_source ->
>>> group_by ->
>>> upsert_join), the join doesn't since empty deletes are ingested by
>>> - The semantics of how to process empty deletes are not clear.
>>> Users may be difficult to understand, because sometimes empty
>>> deletes are
>>> passed down, but sometimes don't.
>>> 3. Pass empty delete messages down always
>>> - advantages
>>> - Performance is better since source don't have to maintain state if
>>> it doesn't generate retractions.
>>> - disadvantages
>>> - All table operators and functions in flink need to consider empty
>>> *Another related problem*
>>> Another related problem is FLINK-9528 Incorrect results: Filter does not
>>> treat Upsert messages correctly
>>> <https://issues.apache.org/jira/browse/FLINK-9528> which I think should be
>>> considered together.
>>> The problem in FLINK-9528 is, for sql like upsert_source -> filter ->
>>> upsert_sink, when the data of a key changes from non-filtering to
>>> filtering, the filter only removes the upsert message such that the
>>> previous version remains in the result.
>>> 1. One way to solve the problem is to make UpserSource generates
>>> 2. Another way is to make a filter aware of the update semantics
>>> (retract or upsert) and convert the upsert message into a delete message if
>>> the predicate evaluates to false.
>>> The second way will also generate many empty delete messages. To avoid too
>>> many empty deletes, the solution is to maintain a filter state at sink to
>>> prevent the empty deletes from causing devastating pressure on the physical
>>> database. However, if UpsertSource can also output empty deletes, these
>>> empty deletes will be more difficult to control. We don't know where these
>>> deletes come from, and whether should be filtered out. The ambiguity of the
>>> semantics of processing empty deletes makes the user unable to judge
>>> whether there will be empty deletes output.
>>> *My personal opinion*
>>> From my point of view, I think the first option(Throw away empty delete
>>> messages in the UpsertSource) is the best, not only because the semantics
>>> are more clear but also the processing logic of the entire table layer can
>>> be more simple thus more efficient. Furthermore the performance loss is
>>> acceptable (We can even only store key in state when source doesn't
>>> generate retraction).
>>> Any suggestions are greatly appreciated!
>>> Best, Hequn