osdir.com


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Proposal] Utilities for reading, transforming and creating Streaming savepoints


Hi Shuyi,

The tool allows you to convert a Flink DataSet containing the individual
state rows, (key, state) pairs, into a state for a streaming operator.

So if you want to bootstrap your state from data on HDFS, you would read
the file in Flink, convert it to the required DataSet format, then you can
write the state for the desired operator.
As you can read many different input formats in Flink this gives large
flexibility to where you can bootstrap the state from.

Gyula

Shuyi Chen <suez1224@xxxxxxxxx> ezt írta (időpont: 2018. aug. 22., Sze,
21:28):

> +1 on the tooling. Also, you mentioned about state bootstrapping problem.
> Could you please elaborate on how we can leverage the tooling to solve
> state bootstrapping? I think this is a common problem to stream processing,
> and it will be great the community can work on it. Thanks.
>
> Shuyi
>
> On Wed, Aug 22, 2018 at 11:51 AM Gyula Fóra <gyula.fora@xxxxxxxxx> wrote:
>
> > Thanks,
> >
> > I guess the first thing that would be great help from anyone interested
> in
> > helping is to try it for some streaming state :)
> >
> > We have tested these tools at King to analyze, transform and perform some
> > aggregations on our user-states. The major limitation is that it requires
> > RocksDB savepoints to work but other than that we successfully analyzed a
> > few hundred gigabytes of state including reading keyed, and broadcast
> > states from different operators. Also you need to have a savepoint before
> > you can create a new savepoint (with whatever state).
> >
> > Once we have some people who have played with it we can probably greatly
> > improve the API and user experience as it is pretty low level at the
> > moment. I suggest we use the King git repo <
> https://github.com/king/bravo>
> > for
> > now to track some features before it is in a shape that deserves a Flink
> > PR. We are super happy to take any improvements, code contributions from
> > anyone so dont hesitate to reach out to me if you have some ideas.
> >
> > Gyula
> >
> >
> > Rong Rong <walterddr@xxxxxxxxx> ezt írta (időpont: 2018. aug. 22., Sze,
> > 17:06):
> >
> > > +1. Being able to analyze the state is a huge operational advantage.
> > > Thanks Gyula for the POC and I would be very interested in contributing
> > to
> > > the work.
> > >
> > > --
> > > Rong
> > >
> > > On Tue, Aug 21, 2018 at 4:26 AM Till Rohrmann <trohrmann@xxxxxxxxxx>
> > > wrote:
> > >
> > > > big +1 for this feature. A tool to get your state out of and into
> Flink
> > > > will be tremendously helpful.
> > > >
> > > > On Mon, Aug 20, 2018 at 10:21 AM Aljoscha Krettek <
> aljoscha@xxxxxxxxxx
> > >
> > > > wrote:
> > > >
> > > > > +1 I'd like to have something like this in Flink a lot!
> > > > >
> > > > > > On 19. Aug 2018, at 11:57, Gyula Fóra <gyula.fora@xxxxxxxxx>
> > wrote:
> > > > > >
> > > > > > Hi all!
> > > > > >
> > > > > > Thanks for the feedback and I'm happy there is some interest :)
> > > > > > Tomorrow I will start improving the proposal based on the
> feedback
> > > and
> > > > > will
> > > > > > get back to work.
> > > > > >
> > > > > > If you are interested working together in this please ping me and
> > we
> > > > can
> > > > > > discuss some ideas/plans and how to share work.
> > > > > >
> > > > > > Cheers,
> > > > > > Gyula
> > > > > >
> > > > > > Paris Carbone <parisc@xxxxxx> ezt írta (időpont: 2018. aug. 18.,
> > > Szo,
> > > > > 9:03):
> > > > > >
> > > > > >> +1
> > > > > >>
> > > > > >> Might also be a good start to implement queryable stream state
> > with
> > > > > >> snapshot isolation using that mechanism.
> > > > > >>
> > > > > >> Paris
> > > > > >>
> > > > > >>> On 17 Aug 2018, at 12:28, Gyula Fóra <gyula.fora@xxxxxxxxx>
> > wrote:
> > > > > >>>
> > > > > >>> Hi All!
> > > > > >>>
> > > > > >>> I want to share with you a little project we have been working
> on
> > > at
> > > > > King
> > > > > >>> (with some help from some dataArtisans folks). I think this
> would
> > > be
> > > > a
> > > > > >>> valuable addition to Flink and solve a bunch of outstanding
> > > > production
> > > > > >>> use-cases and headaches around state bootstrapping and state
> > > > analytics.
> > > > > >>>
> > > > > >>> We have built a quick and dirty POC implementation on top of
> > Flink
> > > > 1.6,
> > > > > >>> please check the README for some nice examples to get a quick
> > idea:
> > > > > >>>
> > > > > >>> https://github.com/king/bravo
> > > > > >>>
> > > > > >>> *Short story*
> > > > > >>> Bravo is a convenient state reader and writer library
> leveraging
> > > the
> > > > > >>> Flink’s batch processing capabilities. It supports processing
> and
> > > > > writing
> > > > > >>> Flink streaming savepoints. At the moment it only supports
> > > processing
> > > > > >>> RocksDB savepoints but this can be extended in the future for
> > other
> > > > > state
> > > > > >>> backends and checkpoint types.
> > > > > >>>
> > > > > >>> Our goal is to cover a few basic features:
> > > > > >>>
> > > > > >>>  - Converting keyed states to Flink DataSets for processing and
> > > > > >> analytics
> > > > > >>>  - Reading/Writing non-keyed operators states
> > > > > >>>  - Bootstrap keyed states from Flink DataSets and create new
> > valid
> > > > > >>>  savepoints
> > > > > >>>  - Transform existing savepoints by replacing/changing some
> > states
> > > > > >>>
> > > > > >>>
> > > > > >>> Some example use-cases:
> > > > > >>>
> > > > > >>>  - Point-in-time state analytics across all operators and keys
> > > > > >>>  - Bootstrap state of a streaming job from external resources
> > such
> > > as
> > > > > >>>  reading from database/filesystem
> > > > > >>>  - Validate and potentially repair corrupted state of a
> streaming
> > > job
> > > > > >>>  - Change max parallelism of a job
> > > > > >>>
> > > > > >>>
> > > > > >>> Our main goal is to start working together with other Flink
> > > > production
> > > > > >>> users and make this something useful that can be part of Flink.
> > So
> > > if
> > > > > you
> > > > > >>> have use-cases please talk to us :)
> > > > > >>> I have also started a google doc which contains a little bit
> more
> > > > info
> > > > > >> than
> > > > > >>> the readme and could be a starting place for discussions:
> > > > > >>>
> > > > > >>>
> > > > > >>
> > > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/103k6wPX20kMu5H3SOOXSg5PZIaYpwdhqBMr-ppkFL5E/edit?usp=sharing
> > > > > >>>
> > > > > >>> I know there are a bunch of rough edges and bugs (and no tests)
> > but
> > > > our
> > > > > >>> motto is: If you are not embarrassed, you released too late :)
> > > > > >>>
> > > > > >>> Please let me know what you think!
> > > > > >>>
> > > > > >>> Cheers,
> > > > > >>> Gyula
> > > > > >>
> > > > > >>
> > > > >
> > > > >
> > > >
> > >
> >
>
>
> --
> "So you have to trust that the dots will somehow connect in your future."
>