Re: [Discuss] Proposing FLIP-25 - Support User State TTL Natively in Flink


Hi,

I like the separation of visibility and clean-up.
So far the design only addressed the clean-up aspect and aimed to have
state accessible as long as possible, i.e., until it was cleared.
We did not consider the use case of compliance. Support for strict
visibility is a good idea, IMO.

Moreover, the proposal addresses other aspects that have been discussed
before, such as support for event & processing time and different timer
reset strategies.

Best, Fabian





2018-06-04 9:52 GMT+02:00 sihua zhou <summerleafs@xxxxxxx>:

> Hi Andrey,
>
> Thanks for this doc! TBH, I personally prefer the approach you outlined in
> the doc over the previous one that was purely based on timers. I think this
> approach looks very similar to the one I outlined in this thread
> before, so it still faces the challenges that @Bowen outlined, but I think
> we can try to overcome them. I will have a closer look at the doc you
> posted and leave some comments if I can.
>
> Best, Sihua
>
>
>
> On 06/4/2018 15:27, Andrey Zagrebin <andrey@xxxxxxxxxxxxxxxxx> wrote:
>
> Hi everybody,
>
> We have been recently brainstorming ideas around state TTL in Flink
> and compiled our thoughts in the following design doc:
> https://docs.google.com/document/d/1SI_WoXAfOd4_NKpGyk4yh3mf59g12pSGNXRtNFi-tgM
>
> Thanks to the community, many things in the doc are based on the previous
> discussions.
>
> As there are a lot of TTL requirements related to data privacy regulations
> (quite a hot topic in the EU),
> and better cleanup strategies sometimes need more research and maybe POCs,
> we suggest starting by implementing the TTL API itself,
> without major changes to current state performance.
>
> In a nutshell, the approach requires only appending expiration timestamp
> bytes to each state value/entry.
> First, we just block access to expired state and clean it up when it is
> explicitly touched,
> then gradually adopt cleanup strategies with different guarantees to
> address space concerns better,
> including:
> - filtering out expired state during the checkpointing process
> - exact cleanup with the timer service (though this still requires storing
> keys twice in both backends)
> - piggybacking on RocksDB compaction using a custom filter by TTL (similar to
> the Cassandra custom filter)
> - cleanup of heap regions around a randomly accessed bucket
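
The first step described above (append expiration timestamp bytes to each value, hide expired state, clean it up when touched) might look roughly like the sketch below. It is only an illustration under stated assumptions: `TtlByteStore`, its methods, and the use of a plain `HashMap` as the "backend" are hypothetical and are not Flink APIs; the clock is passed in explicitly so the behavior is easy to follow.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch; none of these names are Flink APIs. */
final class TtlByteStore {
    // Stand-in for a state backend: key -> serialized value + 8 timestamp bytes.
    private final Map<String, byte[]> backend = new HashMap<>();
    private final long ttlMillis;

    TtlByteStore(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    /** Stores the serialized value followed by 8 bytes holding the expiration timestamp. */
    void put(String key, byte[] value, long nowMillis) {
        ByteBuffer buf = ByteBuffer.allocate(value.length + Long.BYTES);
        buf.put(value).putLong(nowMillis + ttlMillis);
        backend.put(key, buf.array());
    }

    /** Returns the value, or null if absent or expired; an expired entry is removed when touched. */
    byte[] get(String key, long nowMillis) {
        byte[] stored = backend.get(key);
        if (stored == null) {
            return null;
        }
        int valueLength = stored.length - Long.BYTES;
        long expiresAt = ByteBuffer.wrap(stored, valueLength, Long.BYTES).getLong();
        if (nowMillis >= expiresAt) {
            backend.remove(key); // lazy cleanup on explicit access
            return null;
        }
        return Arrays.copyOf(stored, valueLength);
    }

    /** Number of entries physically held, including expired ones not yet touched. */
    int size() {
        return backend.size();
    }
}
```

Note that entries which are never read again stay in the map, which is why the additional cleanup strategies in the list are still needed.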
>
> Please, feel free to give any feedback and comments.
>
> Thanks,
> Andrey
>
> On 27 May 2018, at 09:46, sihua zhou <summerleafs@xxxxxxx> wrote:
>
> Hi Bowen,
>
>
> Thanks for your clarification. I agree that we should wait for the timer
> on RocksDB to be finished; after that we could even do some micro-benchmarks
> before we start implementing.
>
>
> Best, Sihua
>
>
>
>
>
>
> On 05/27/2018 15:07,Bowen Li<bowenli86@xxxxxxxxx> wrote:
> Thank you Fabian and Sihua. I explained more in the doc itself and its
> comments. I believe the bottom lines for v1 are: 1) it shouldn't be worse than
> how users implement TTL themselves today, 2) we should develop a generic
> architecture for TTL for all (currently two) state backends (impl can
> vary), 3) optimizations and improvements can come in v2 or a later version.
>
> For Sihua's proposal, which is similar to the old plan we came up with, I
> have the same concerns as before and wonder if you have answers:
>
> - it requires building custom compaction for both state backends, and it's a
> bit unclear:
> - when, who, and how? The 'when' might be the hardest one, because
> it really depends on the user's use cases. E.g. if it's once a day, at what
> time of day?
> - how well will it integrate with Flink's checkpoint/savepoint
> mechanism?
> - are there any indications of performance regression in either state backend?
> - how much is the ROI if it requires a very complicated implementation?
> - I'm afraid that, eventually, the optimization will easily go in a tricky
> direction we may want to avoid - shall we come up with extra design to
> amortize the cost?
> - I'm afraid the custom compaction logic will have to make quite
> different assumptions for different state backends. E.g. it's harder to
> estimate the total memory required for a user's app with the heap state
> backend, because it depends on when you trigger the compaction and how strictly you
> stick to the schedule every day. Any nondeterministic behavior may lead
> to users allocating too little memory, and eventually cause users'
> apps to be unstable
> - I want to highlight that lots of users actually want the exact TTL
> feature. How users implement TTL with timers today actually implies that
> their logic depends on exact TTL for both shrinking their state size and
> expiring a key at exactly the expected time. I chatted with a few different
> Flink users recently and confirmed it. That's why I want to add exact TTL
> as a potential goal and motivation if possible, along with relaxed TTL and
> avoiding indefinitely growing state. If we don't provide that out of the box,
> many users may still use the timer approach themselves
>
> To the concern of doubling keys - in the heap state backend, the key is only a
> reference so there's only one copy, and that's not a problem; in the RocksDB state
> backend, yes, the state size will be bigger. First, I believe this is a
> tradeoff for a clearer architecture. Frankly, unlike memory, disk space (even
> SSD) is relatively cheap and accessible, and we don't normally take it as a
> big constraint. Second, w.r.t. performance, I believe RocksDB timers
> will sit in a different column family than other state, which may not cause
> a noticeable perf issue. The RocksDB timer service is on its way, and I want
> to see how it's implemented first before asserting whether there is truly any
> perf burden. Finally, there are also improvements we can make
> after v1, including relaxed TTL and smaller timer state size. E.g. Flink
> can merge timers that fall within a user-configured time range (say within 5
> sec) into a single timer. I don't have a concrete plan for that yet, but
> it's doable.
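
One possible reading of "approximate timers within a time range into a single timer" is to round each expiration time up to the next multiple of a configured resolution, so that all keys expiring within the same window share one timer. The sketch below assumes that reading; `CoalescedTimers` and its methods are hypothetical names, timestamps are assumed non-negative, and rounding up (rather than down) is chosen here so that state never expires earlier than requested.

```java
import java.util.TreeSet;

/** Hypothetical sketch of timer coalescing; not a Flink API. */
final class CoalescedTimers {
    private final long resolutionMillis;
    // Distinct timer timestamps that would actually be registered.
    private final TreeSet<Long> timers = new TreeSet<>();

    CoalescedTimers(long resolutionMillis) {
        this.resolutionMillis = resolutionMillis;
    }

    /** Rounds the expiration time up to the next multiple of the resolution and registers it. */
    long register(long expirationMillis) {
        long rounded =
                Math.floorDiv(expirationMillis + resolutionMillis - 1, resolutionMillis)
                        * resolutionMillis;
        timers.add(rounded); // duplicates collapse into a single timer
        return rounded;
    }

    /** Number of distinct timers held after coalescing. */
    int size() {
        return timers.size();
    }
}
```

With a 5-second resolution, expirations at 1000, 4999 and 5000 ms all map to a single timer at 5000 ms, while 5001 ms maps to 10000 ms. The price is that cleanup can be late by up to one resolution interval.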
>
> Stefan is adding the RocksDB timer and bringing the timer service closer to
> the keyed backends, which aligns very well with what we want in this FLIP. I
> suggest we wait and keep a close eye on those efforts, and as they mature,
> we'll have a much better idea of the whole picture.
>
> Thanks, Bowen
>
>
>
> On Sat, May 26, 2018 at 7:52 AM, sihua zhou <summerleafs@xxxxxxx> wrote:
>
>
>
> Hi,
>
>
> Thanks for your reply, Fabian. About the overhead of storing the key bytes
> twice, I think the situation may be even a bit worse: in general, it
> means that the total amount of data to be stored is doubled (for each key,
> we need to store two records, one for the timer and one for the state). This
> may be a bit uncomfortable when the state backend is based on RocksDB,
> because the timers live together with the other states in the same RocksDB
> instance, which means that with TTL, the number of records in
> RocksDB is doubled. I'm afraid this may hurt its performance.
>
>
> Concerning the approach of adding a timestamp to each value, TBH, I haven't
> thought about it deeply yet and am not sure about it either. In general, it
> can be described as follows:
>
>
> - We attach a TS to every state record.
> - When getting the record, we check the TS to see if it's outdated.
> - For the records that we will never touch again, we use compaction to
> remove them. Maybe one compaction per day is enough.
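
The three steps above can be sketched as follows. This is a minimal illustration, not an implementation for any real backend: `RelaxedTtlMap`, `Stamped`, and `compact` are hypothetical names, an in-memory `HashMap` stands in for the state backend, and the current time is passed in explicitly. Reads only hide outdated records; physical removal is left to the periodic compaction pass.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

/** Hypothetical sketch of timestamp-per-record TTL with a periodic compaction pass. */
final class RelaxedTtlMap<K, V> {
    /** A value together with the time it was written (the attached TS). */
    private static final class Stamped<T> {
        final T value;
        final long writtenAt;

        Stamped(T value, long writtenAt) {
            this.value = value;
            this.writtenAt = writtenAt;
        }
    }

    private final Map<K, Stamped<V>> entries = new HashMap<>();
    private final long ttlMillis;

    RelaxedTtlMap(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    private boolean isExpired(Stamped<V> s, long now) {
        return now - s.writtenAt >= ttlMillis;
    }

    /** Step 1: attach a timestamp to every record. */
    void put(K key, V value, long now) {
        entries.put(key, new Stamped<>(value, now));
    }

    /** Step 2: check the timestamp on read; outdated records are hidden but not removed. */
    V get(K key, long now) {
        Stamped<V> s = entries.get(key);
        return (s == null || isExpired(s, now)) ? null : s.value;
    }

    /** Step 3: a compaction pass drops all outdated records and reports how many it removed. */
    int compact(long now) {
        int removed = 0;
        Iterator<Stamped<V>> it = entries.values().iterator();
        while (it.hasNext()) {
            if (isExpired(it.next(), now)) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    /** Number of records physically stored, including hidden ones. */
    int physicalSize() {
        return entries.size();
    }
}
```

Between compactions the physical size can exceed the visible size, which is the relaxed guarantee being discussed here.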
>
>
> Best, Sihua
>
>
> On 05/16/2018 16:38,Fabian Hueske<fhueske@xxxxxxxxx> wrote:
> Hi,
>
>
> Yes. IMO it makes sense to put the logic into the abstract base classes to
> share the implementation across different state backends and state
> primitives.
>
> The overhead of storing the key twice is a valid concern, but I'm not sure
> about the approach to add a timestamp to each value.
> How would we discard state then? Iterating always over all (or a range of)
> keys to check if their state should be expired?
> That would only work efficiently if we relax the clean-up logic which
> could be a valid design decision.
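
One way to picture a relaxed clean-up is a bounded scan: instead of iterating over all keys each time, every call inspects at most a fixed number of entries and resumes where the previous call stopped. The sketch below assumes this interpretation; `IncrementalCleaner` and `cleanupStep` are hypothetical names, and a sorted in-memory map stands in for the keyed state.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of bounded, resumable clean-up over a range of keys. */
final class IncrementalCleaner {
    // key -> expiration timestamp
    private final TreeMap<String, Long> expirations = new TreeMap<>();
    // Last key visited by the scan; null means "start from the first key".
    private String cursor = null;

    void put(String key, long expiresAt) {
        expirations.put(key, expiresAt);
    }

    /** Inspects at most maxChecks entries after the cursor and removes the expired ones. */
    int cleanupStep(long now, int maxChecks) {
        int removed = 0;
        for (int i = 0; i < maxChecks; i++) {
            Map.Entry<String, Long> e =
                    (cursor == null) ? expirations.firstEntry() : expirations.higherEntry(cursor);
            if (e == null) {
                cursor = null; // reached the end; the next call restarts from the beginning
                break;
            }
            cursor = e.getKey();
            if (now >= e.getValue()) {
                expirations.remove(cursor);
                removed++;
            }
        }
        return removed;
    }

    int size() {
        return expirations.size();
    }
}
```

Each step costs a bounded amount of work, but an expired entry may survive until the scan reaches it, so this gives only a relaxed guarantee.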
>
>
>
> Best, Fabian
>
>
>
> 2018-05-14 9:33 GMT+02:00 sihua zhou <summerleafs@xxxxxxx>:
>
> Hi Fabian,
> Thank you very much for the reply. Just an alternative: can we implement
> the TTL logic in `AbstractStateBackend` and `AbstractState`? The simplest
> way is to append the `ts` to the state's value, and use the backend's
> `current time` (it can be event time or processing time) to judge
> whether the data is outdated. The pros are:
> - state is purely backed by the state backend.
> - for each key-value, we only need to store one copy of the key (if we
> implement TTL based on timers, we need to store two copies of the key, one
> for the timer and one for the keyed state)
>
>
> What do you think?
>
>
> Best,
> Sihua
>
>
> On 05/14/2018 15:20,Fabian Hueske<fhueske@xxxxxxxxx> wrote:
> Hi Sihua,
>
>
> I think it makes sense to couple state TTL to the timer service. We'll
> need some kind of timers to expire state, so I think we should reuse
> components that we have instead of implementing another timer service.
>
> Moreover, using the same timer service and using the public state APIs
> helps to have consistent TTL behavior across different state backends.
>
>
> Best, Fabian
>
>
>
> 2018-05-14 8:51 GMT+02:00 sihua zhou <summerleafs@xxxxxxx>:
>
> Hi Bowen,
> Thanks for your doc! I left some comments on the doc. My main concern
> is that making the TTL depend on `timer` feels like an unwanted coupling.
> I think the TTL is a property of the state, so it should
> be backed by the state backend. If we implement the TTL based on the timer,
> then it looks like a coupling... it makes me feel that the backend for
> state becomes `state backend` + `timer`. And in fact, IMO, even the `timer`
> should depend on the `state backend` in theory: it's a type of HeapState that
> is scoped to the `key group` (not scoped per key like the current keyed
> state).
>
>
> I also found that the doc is for exact TTL. I wonder if we can support a
> relaxed TTL that could provide better performance. To me, the reason
> that I need TTL is just to prevent the state size from growing infinitely (I
> believe I'm not the only one like this), so a relaxed version is enough. If
> there is a relaxed TTL which has better performance, I would prefer that.
>
>
> What do you think?
>
>
> Best,
> Sihua
>
>
>
>
>
>
> On 05/14/2018 14:31,Bowen Li<bowenli86@xxxxxxxxx> wrote:
> Thank you, Fabian! I've created the FLIP-25 page
> <https://cwiki.apache.org/confluence/display/FLINK/FLIP-25%3A+Support+User+State+TTL+Natively>.
>
> To continue our discussion points:
> 1. I see what you mean now. I totally agree. Since we don't completely know
> it now, shall we experiment or prototype a little bit before deciding this?
> 2. -
> 3. Adding tags to timers is an option.
>
> Another option I came up with recently is this: let InternalTimerService
> maintain user timers and TTL timers separately. Implementation classes of
> InternalTimerService should add two new collections of timers, e.g.
> TtlProcessingTimeTimersQueue
> and TtlEventTimeTimersQueue for HeapInternalTimerService. Upon
> InternalTimerService#onProcessingTime() and advanceWatermark(), they will
> first iterate through ProcessingTimeTimers and EventTimeTimers (user
> timers) and then through TtlProcessingTimeTimers and TtlEventTimeTimers
> (TTL timers).
>
> We'll also add the following new internal APIs to register Ttl timers:
>
> ```
> @Internal
> public void registerTtlProcessingTimeTimer(N namespace, long time);
>
> @Internal
> public void registerTtlEventTimeTimer(N namespace, long time);
> ```
>
> The biggest advantage, compared to option 1, is that it doesn't impact
> existing timer-related checkpoint/savepoint, restore and migrations.
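
The idea of keeping user timers and TTL timers in separate collections, and firing user timers first, can be sketched as below. This is only a toy model under stated assumptions: `TwoQueueTimerService` and its methods are hypothetical and do not mirror the real InternalTimerService; timers are reduced to bare timestamps, and the method returns labels instead of invoking callbacks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical toy model of separate user and TTL timer queues; not the Flink timer service. */
final class TwoQueueTimerService {
    private final PriorityQueue<Long> userTimers = new PriorityQueue<>();
    private final PriorityQueue<Long> ttlTimers = new PriorityQueue<>();

    void registerUserTimer(long time) {
        userTimers.add(time);
    }

    void registerTtlTimer(long time) {
        ttlTimers.add(time);
    }

    /** Fires all due user timers first, then all due TTL timers, and returns the firing order. */
    List<String> advanceTo(long time) {
        List<String> fired = new ArrayList<>();
        drain(userTimers, time, "user", fired);
        drain(ttlTimers, time, "ttl", fired);
        return fired;
    }

    // Pops every timer with a timestamp <= time, in ascending timestamp order.
    private static void drain(PriorityQueue<Long> queue, long time, String label, List<String> fired) {
        while (!queue.isEmpty() && queue.peek() <= time) {
            fired.add(label + "@" + queue.poll());
        }
    }
}
```

Because user timers are drained before TTL timers, user code still sees the state at a given time before the TTL timer for that state clears it, even when the TTL timer has an earlier timestamp.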
>
> What do you think? And do any other Flink committers want to chime in with
> ideas? I've also documented the above two discussion points on the FLIP
> page.
>
> Thanks,
> Bowen
>
>
> On Wed, May 2, 2018 at 5:36 AM, Fabian Hueske <fhueske@xxxxxxxxx> wrote:
>
>
> Hi Bowen,
>
> 1. The motivation to keep the TTL logic outside of the state backend was
> mainly to avoid state backend custom implementations. If we have a generic
> approach that would work for all state backends, we could try to put the
> logic into a base class like AbstractStateBackend. After all, state cleanup
> is tightly related to the responsibilities of state backends.
> 2. -
> 3. You're right. We should first call the user code before cleaning up.
> The main problem that I see right now is that we have to distinguish
> between user and TTL timers. AFAIK, the timer service does not support
> timer tags (or another method) to distinguish timers.
>
> I've given you the permissions to create and edit wiki pages.
>
> Best, Fabian
>
> 2018-04-30 7:47 GMT+02:00 Bowen Li <bowenli86@xxxxxxxxx>:
>
>
> Thanks Fabian! Here're my comments inline, and let me know your thoughts.
>
> 1. Where should the TTL code reside? In the state backend or in the
> operator?
>
> I believe TTL code should not reside in state backend, because a critical
> design is that TTL is independent of and transparent to state backends.
>
> According to my current knowledge, I think it probably should live with
> operators in flink-streaming-java.
>
>
> 2. How to get notified about state accesses? I guess this depends on 1.
>
> You previously suggested using callbacks. I believe that's the right way
> to do decoupling.
>
>
> 3. How to avoid conflicts of TTL timers and user timers?
>
> User timers might always be invoked first? This is not urgent, shall we
> bake it for more time and discuss it along the way?
>
>
>
> Besides, I don't have access to create a FLIP page under
> https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals.
> Can you grant me the proper access?
>
> Thanks,
>
> Bowen
>
>
>
>
> On Tue, Apr 24, 2018 at 2:40 AM, Fabian Hueske <fhueske@xxxxxxxxx> wrote:
>
>
> Hi Bowen,
>
> Thanks for updating the proposal. This looks pretty good (as I said
> before).
> There are a few areas that are not yet fully fleshed out:
>
> 1. Where should the TTL code reside? In the state backend or in the
> operator?
> 2. How to get notified about state accesses? I guess this depends on 1.
> 3. How to avoid conflicts of TTL timers and user timers?
>
> @Stefan (in CC) might have some ideas on these issues as well.
>
> Cheers, Fabian
>
> 2018-04-22 21:14 GMT+02:00 Bowen <bowenli86@xxxxxxxxx>:
>
>
> Hello community,
>
> We've come up with a completely new design for Flink state TTL, documented
> here
> <https://docs.google.com/document/d/1PPwHMDHGWMOO8YGv08BNi8Fqe_h7SjeALRzmW-ZxSfY/edit?usp=sharing>,
> and have run it by a few Flink PMC members/committers.
>
> What do you think? We'd love to hear your feedback.
>
> Thanks,
> Bowen
>
>
> On Wed, Feb 7, 2018 at 12:53 PM, Fabian Hueske <fhueske@xxxxxxxxx>
> wrote:
>
> Hi Bowen,
>
> Thanks for the proposal! I think state TTL would be a great feature!
> Actually, we have implemented this for SQL / Table API [1].
> I've added a couple of comments to the design doc.
>
> In principle, I'm not sure if this functionality should be added to the
> state backends.
> We could also use the existing timer service which would have a few nice
> benefits (see my comments in the docs).
>
> Best, Fabian
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/table/streaming.html#idle-state-retention-time
>
> 2018-02-06 8:26 GMT+01:00 Bowen Li <bowenli86@xxxxxxxxx>:
>
> Hi guys,
>
> I want to propose a new FLIP -- FLIP-25 - Support User State TTL Natively
> in Flink. This has been one of the handiest and most frequently requested
> features in the Flink community. The JIRA ticket is FLINK-3089
> <https://issues.apache.org/jira/browse/FLINK-3089>.
>
> I've written a rough design doc
> <https://docs.google.com/document/d/1qiFqtCC80n4oFmmfxBWXrCd37mYKcuureyEr_nPAvSo/edit#>,
> and developed prototypes for both the heap and RocksDB state backends.
>
> My question is: shall we create a FLIP page for this? Can I be granted the
> privileges to create pages in
> https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals
> ?
>
> Thanks,
> Bowen
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>