
Re: [Proposal] Utilities for reading, transforming and creating Streaming savepoints

This is great, Gyula!  A colleague here at Lyft has also done some work
around bootstrapping DataStream programs, and we've talked a bit about
doing this by running DataSet programs.

On Fri, Aug 17, 2018 at 3:28 AM, Gyula Fóra <gyula.fora@xxxxxxxxx> wrote:

> Hi All!
> I want to share with you a little project we have been working on at King
> (with some help from some dataArtisans folks). I think this would be a
> valuable addition to Flink and solve a bunch of outstanding production
> use-cases and headaches around state bootstrapping and state analytics.
> We have built a quick and dirty POC implementation on top of Flink 1.6.
> Please check the README for some nice examples to get a quick idea:
> https://github.com/king/bravo
> *Short story*
> Bravo is a convenient state reader and writer library leveraging
> Flink’s batch processing capabilities. It supports processing and writing
> Flink streaming savepoints. At the moment it only supports processing
> RocksDB savepoints, but this can be extended in the future to other state
> backends and checkpoint types.
> Our goal is to cover a few basic features:
>    - Converting keyed states to Flink DataSets for processing and analytics
>    - Reading/writing non-keyed operator states
>    - Bootstrapping keyed states from Flink DataSets and creating new valid
>    savepoints
>    - Transforming existing savepoints by replacing/changing some states
> Some example use-cases:
>    - Point-in-time state analytics across all operators and keys
>    - Bootstrap the state of a streaming job from external resources such as
>    a database or filesystem
>    - Validate and potentially repair corrupted state of a streaming job
>    - Change max parallelism of a job
> Our main goal is to start working together with other Flink production
> users and make this something useful that can be part of Flink. So if you
> have use-cases please talk to us :)
> I have also started a Google doc which contains a little bit more info than
> the readme and could be a starting place for discussions:
> https://docs.google.com/document/d/103k6wPX20kMu5H3SOOXSg5PZIaYpwdhqBMr-ppkFL5E/edit?usp=sharing
> I know there are a bunch of rough edges and bugs (and no tests) but our
> motto is: If you are not embarrassed, you released too late :)
> Please let me know what you think!
> Cheers,
> Gyula