Re: [Proposal] Utilities for reading, transforming and creating Streaming savepoints


+1 on the tooling. Also, you mentioned the state bootstrapping problem.
Could you please elaborate on how we can leverage the tooling to solve
state bootstrapping? I think this is a common problem in stream processing,
and it would be great if the community could work on it. Thanks.

Shuyi

On Wed, Aug 22, 2018 at 11:51 AM Gyula Fóra <gyula.fora@xxxxxxxxx> wrote:

> Thanks,
>
> I guess the most helpful first step for anyone interested in contributing
> is to try it on some streaming state :)
>
> We have tested these tools at King to analyze, transform and perform some
> aggregations on our user states. The major limitation is that it only works
> with RocksDB savepoints, but other than that we successfully analyzed a
> few hundred gigabytes of state, including reading keyed and broadcast
> states from different operators. Also, you need an existing savepoint before
> you can create a new savepoint (with whatever state).
>
> Once some people have played with it, we can probably greatly improve the
> API and user experience, as it is pretty low level at the moment. I suggest
> we use the King git repo <https://github.com/king/bravo> for now to track
> features until it is in a shape that deserves a Flink PR. We are super happy
> to take any improvements or code contributions from anyone, so don't
> hesitate to reach out to me if you have some ideas.
>
> Gyula
>
>
> Rong Rong <walterddr@xxxxxxxxx> wrote (on Wed, Aug 22, 2018 at 17:06):
>
> > +1. Being able to analyze the state is a huge operational advantage.
> > Thanks, Gyula, for the POC. I would be very interested in contributing
> > to the work.
> >
> > --
> > Rong
> >
> > On Tue, Aug 21, 2018 at 4:26 AM Till Rohrmann <trohrmann@xxxxxxxxxx>
> > wrote:
> >
> > > big +1 for this feature. A tool to get your state out of and into Flink
> > > will be tremendously helpful.
> > >
> > > > On Mon, Aug 20, 2018 at 10:21 AM Aljoscha Krettek <aljoscha@xxxxxxxxxx>
> > > > wrote:
> > >
> > > > +1 I'd like to have something like this in Flink a lot!
> > > >
> > > > > > On 19. Aug 2018, at 11:57, Gyula Fóra <gyula.fora@xxxxxxxxx>
> > > > > > wrote:
> > > > >
> > > > > Hi all!
> > > > >
> > > > > Thanks for the feedback and I'm happy there is some interest :)
> > > > > > Tomorrow I will start improving the proposal based on the feedback
> > > > > > and will get back to work.
> > > > >
> > > > > > If you are interested in working together on this, please ping me
> > > > > > and we can discuss some ideas/plans and how to share work.
> > > > >
> > > > > Cheers,
> > > > > Gyula
> > > > >
> > > > > > Paris Carbone <parisc@xxxxxx> wrote (on Sat, Aug 18, 2018 at 9:03):
> > > > >
> > > > >> +1
> > > > >>
> > > > > >> It might also be a good starting point for implementing queryable
> > > > > >> stream state with snapshot isolation using that mechanism.
> > > > >>
> > > > >> Paris
> > > > >>
> > > > > >>> On 17 Aug 2018, at 12:28, Gyula Fóra <gyula.fora@xxxxxxxxx>
> > > > > >>> wrote:
> > > > >>>
> > > > >>> Hi All!
> > > > >>>
> > > > > >>> I want to share with you a little project we have been working on
> > > > > >>> at King (with some help from some dataArtisans folks). I think this
> > > > > >>> would be a valuable addition to Flink and solve a bunch of
> > > > > >>> outstanding production use-cases and headaches around state
> > > > > >>> bootstrapping and state analytics.
> > > > >>>
> > > > > >>> We have built a quick and dirty POC implementation on top of Flink
> > > > > >>> 1.6. Please check the README for some nice examples to get a quick
> > > > > >>> idea:
> > > > >>>
> > > > >>> https://github.com/king/bravo
> > > > >>>
> > > > >>> *Short story*
> > > > > >>> Bravo is a convenient state reader and writer library leveraging
> > > > > >>> Flink's batch processing capabilities. It supports processing and
> > > > > >>> writing Flink streaming savepoints. At the moment it only supports
> > > > > >>> processing RocksDB savepoints, but this can be extended in the
> > > > > >>> future to other state backends and checkpoint types.
> > > > >>>
> > > > >>> Our goal is to cover a few basic features:
> > > > >>>
> > > > > >>>  - Converting keyed states to Flink DataSets for processing and
> > > > > >>>  analytics
> > > > > >>>  - Reading/writing non-keyed operator states
> > > > > >>>  - Bootstrapping keyed states from Flink DataSets and creating new
> > > > > >>>  valid savepoints
> > > > > >>>  - Transforming existing savepoints by replacing/changing some
> > > > > >>>  states
> > > > >>>
> > > > >>>
> > > > >>> Some example use-cases:
> > > > >>>
> > > > >>>  - Point-in-time state analytics across all operators and keys
> > > > > >>>  - Bootstrap state of a streaming job from external resources
> > > > > >>>  such as a database or filesystem
> > > > > >>>  - Validate and potentially repair corrupted state of a streaming
> > > > > >>>  job
> > > > >>>  - Change max parallelism of a job
> > > > >>>
> > > > >>>
> > > > > >>> Our main goal is to start working together with other Flink
> > > > > >>> production users and make this something useful that can be part
> > > > > >>> of Flink. So if you have use-cases, please talk to us :)
> > > > > >>> I have also started a Google doc which contains a little bit more
> > > > > >>> info than the README and could be a starting place for discussions:
> > > > >>>
> > > > >>>
> > > > >>
> > > >
> > >
> >
> https://docs.google.com/document/d/103k6wPX20kMu5H3SOOXSg5PZIaYpwdhqBMr-ppkFL5E/edit?usp=sharing
> > > > >>>
> > > > > >>> I know there are a bunch of rough edges and bugs (and no tests),
> > > > > >>> but our motto is: If you are not embarrassed, you released too
> > > > > >>> late :)
> > > > >>>
> > > > >>> Please let me know what you think!
> > > > >>>
> > > > >>> Cheers,
> > > > >>> Gyula
> > > > >>
> > > > >>
> > > >
> > > >
> > >
> >
>
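
For readers following the "change max parallelism" use case in the quoted proposal: Flink partitions keyed state into key groups, and the number of key groups equals the job's max parallelism. The sketch below is not Bravo's API and not Flink's actual code. It is a minimal, self-contained illustration of that assignment scheme, with hypothetical names (KeyGroupSketch, mix, keyGroupFor, operatorIndexFor) and a stand-in hash function in place of Flink's own. It suggests why changing max parallelism means every key's state may have to be rewritten into a different key group, which is the kind of job a savepoint-transforming tool would do.

```java
import java.util.List;

public class KeyGroupSketch {

    // Stand-in bit mixer (an assumption for this sketch). Flink applies its own
    // murmur-style hash to key.hashCode(); the exact function does not matter here.
    static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h & Integer.MAX_VALUE; // force a non-negative result
    }

    // Every key maps to one of maxParallelism key groups.
    static int keyGroupFor(Object key, int maxParallelism) {
        return mix(key.hashCode()) % maxParallelism;
    }

    // Each parallel operator instance owns a contiguous range of key groups.
    static int operatorIndexFor(int keyGroup, int maxParallelism, int parallelism) {
        return keyGroup * parallelism / maxParallelism;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("user-1", "user-2", "user-3", "user-4");
        for (String key : keys) {
            int oldGroup = keyGroupFor(key, 128); // savepoint taken with max parallelism 128
            int newGroup = keyGroupFor(key, 256); // target job uses max parallelism 256
            System.out.printf("%s: key group %d (max 128) -> %d (max 256)%n",
                    key, oldGroup, newGroup);
        }
    }
}
```

Changing only the parallelism moves whole key groups between operator instances, so existing state can be redistributed as-is. Changing max parallelism changes the key-to-group mapping itself, so the state has to be read out and written back under the new grouping.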


-- 
"So you have to trust that the dots will somehow connect in your future."