


Re: [Proposal] Utilities for reading, transforming and creating Streaming savepoints


Thanks for the feedback :) I agree that combining this with SQL would give
an extremely nice layer to analyse the states.

Our goal is to contribute this to Flink; I think it should live as part of
the Flink project to make deeper integration possible in the long run. Of
course, a prerequisite for this is that there is enough production interest
in such a tool, but I believe there should be :)

Gyula

Piotr Nowojski <piotr@xxxxxxxxxxxxxxxxx> wrote (on 17 Aug 2018, Fri,
15:07):

> Hi,
>
> A very big +1 from my side. I have found the lack of such a tool to be a big
> problem for the long-term maintainability of Flink jobs.
>
> In the long run, I would be delighted to see Flink SQL support for these
> things as well. Ad hoc analysis is one of the prime use cases of SQL. This
> tool would make such analysis possible, while SQL could make it easy to use
> and shorten the feedback loop, especially in cases when you are not sure
> what you are looking for in the state.
>
> Just to clarify: is your end goal to contribute such a tool to Apache Flink,
> or do you want it to be a separate tool?
>
> Piotrek
>
> > On 17 Aug 2018, at 12:28, Gyula Fóra <gyula.fora@xxxxxxxxx> wrote:
> >
> > Hi All!
> >
> > I want to share with you a little project we have been working on at King
> > (with some help from some dataArtisans folks). I think this would be a
> > valuable addition to Flink and would solve a bunch of outstanding
> > production use cases and headaches around state bootstrapping and state
> > analytics.
> >
> > We have built a quick-and-dirty POC implementation on top of Flink 1.6;
> > please check the README for some nice examples to get a quick idea:
> >
> > https://github.com/king/bravo
> >
> > *Short story*
> > Bravo is a convenient state reader and writer library leveraging Flink's
> > batch processing capabilities. It supports processing and writing Flink
> > streaming savepoints. At the moment it only supports processing RocksDB
> > savepoints, but this can be extended in the future to other state
> > backends and checkpoint types.
> >
> > Our goal is to cover a few basic features:
> >
> >   - Converting keyed states to Flink DataSets for processing and
> >   analytics
> >   - Reading/writing non-keyed operator states
> >   - Bootstrapping keyed states from Flink DataSets and creating new valid
> >   savepoints
> >   - Transforming existing savepoints by replacing/changing some states
> >
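> > To give a feel for the first feature, reading a keyed ValueState into a
> > DataSet could look roughly like the sketch below. Class and method names
> > (StateMetadataUtils, OperatorStateReader, KeyedStateReader) follow the
> > style of the examples in the repo's README, but treat them as an
> > illustrative sketch of a moving-target API, not the final interface:

```java
// Illustrative sketch only: the bravo API is a POC and may change.
// Read a keyed ValueState from a RocksDB savepoint into a DataSet.
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

// Load the savepoint metadata from its path (name assumed from the README)
Savepoint savepoint = StateMetadataUtils.loadSavepoint(savepointPath);

// Point a reader at a single operator, identified by its uid
OperatorStateReader reader =
    new OperatorStateReader(env, savepoint, "CountPerKey");

// Turn the operator's "Count" ValueState into a DataSet of (key, value)
// pairs for regular batch processing and analytics
DataSet<Tuple2<Integer, Integer>> counts = reader.readKeyedStates(
    KeyedStateReader.forValueStateKVPairs(
        "Count", new TypeHint<Tuple2<Integer, Integer>>() {}));
```

> > The resulting DataSet can then be filtered, joined, or aggregated like
> > any other batch data source before (optionally) being written back into
> > a new savepoint.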
> >
> > Some example use-cases:
> >
> >   - Point-in-time state analytics across all operators and keys
> >   - Bootstrapping the state of a streaming job from external resources
> >   such as a database or filesystem
> >   - Validating and potentially repairing corrupted state of a streaming
> >   job
> >   - Changing the max parallelism of a job
> >
> >
> > Our main goal is to start working with other Flink production users and
> > make this something useful that can be part of Flink. So if you have
> > use cases, please talk to us :)
> > I have also started a Google doc which contains a bit more info than the
> > readme and could be a starting point for discussions:
> >
> >
> https://docs.google.com/document/d/103k6wPX20kMu5H3SOOXSg5PZIaYpwdhqBMr-ppkFL5E/edit?usp=sharing
> >
> > I know there are a bunch of rough edges and bugs (and no tests), but our
> > motto is: if you are not embarrassed, you released too late :)
> >
> > Please let me know what you think!
> >
> > Cheers,
> > Gyula
>
>