


Re: [Proposal] Utilities for reading, transforming and creating Streaming savepoints


Hi all!

Thanks for the feedback; I'm happy there is some interest :)
Tomorrow I will start improving the proposal based on the feedback and get
back to work on it.

If you are interested in working together on this, please ping me and we can
discuss some ideas/plans and how to share the work.

Cheers,
Gyula

Paris Carbone <parisc@xxxxxx> wrote (on Sat, 18 Aug 2018, 9:03):

> +1
>
> This might also be a good starting point for implementing queryable stream
> state with snapshot isolation using that mechanism.
>
> Paris
>
> > On 17 Aug 2018, at 12:28, Gyula Fóra <gyula.fora@xxxxxxxxx> wrote:
> >
> > Hi All!
> >
> > I want to share with you a little project we have been working on at King
> > (with some help from some dataArtisans folks). I think this would be a
> > valuable addition to Flink and would address a number of open production
> > use cases and pain points around state bootstrapping and state analytics.
> >
> > We have built a quick and dirty POC implementation on top of Flink 1.6,
> > please check the README for some nice examples to get a quick idea:
> >
> > https://github.com/king/bravo
> >
> > *Short story*
> > Bravo is a convenient state reader and writer library that leverages
> > Flink's batch processing capabilities. It supports processing and writing
> > Flink streaming savepoints. At the moment it only supports processing
> > RocksDB savepoints, but this can be extended in the future to other state
> > backends and checkpoint types.
> >
> > Our goal is to cover a few basic features:
> >
> >   - Converting keyed states to Flink DataSets for processing and analytics
> >   - Reading/writing non-keyed operator states
> >   - Bootstrapping keyed states from Flink DataSets and creating new valid
> >   savepoints
> >   - Transforming existing savepoints by replacing/changing some states
> >
> >
> > Some example use-cases:
> >
> >   - Point-in-time state analytics across all operators and keys
> >   - Bootstrapping the state of a streaming job from external sources, such
> >   as a database or filesystem
> >   - Validating and potentially repairing the corrupted state of a streaming
> >   job
> >   - Changing the max parallelism of a job
> >
> >
> > Our main goal is to start working together with other Flink production
> > users and make this something useful that can become part of Flink. So if
> > you have use cases, please talk to us :)
> > I have also started a Google Doc which contains a bit more info than the
> > README and could be a starting place for discussions:
> >
> > https://docs.google.com/document/d/103k6wPX20kMu5H3SOOXSg5PZIaYpwdhqBMr-ppkFL5E/edit?usp=sharing
> >
> > I know there are a bunch of rough edges and bugs (and no tests), but our
> > motto is: If you are not embarrassed, you released too late :)
> >
> > Please let me know what you think!
> >
> > Cheers,
> > Gyula
>
>