osdir.com


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Proposal] Utilities for reading, transforming and creating Streaming savepoints


Hi Gyula,

+1 for this feature. State bootstrapping and state analytics will be very
helpful.

On Thu, Aug 23, 2018 at 4:09 AM Gyula Fóra <gyula.fora@xxxxxxxxx> wrote:

> Hi Shuyi,
>
> The tool allows you to convert a Flink DataSet containing the individual
> state rows, (key, state) pairs, into a state for a streaming operator.
>
> So if you want to bootstrap your state from data on HDFS, you would read
> the file in Flink, convert it to the required DataSet format, then you can
> write the state for the desired operator.
> As you can read many different input formats in Flink this gives large
> flexibility to where you can bootstrap the state from.
>
> Gyula
>
> Shuyi Chen <suez1224@xxxxxxxxx> ezt írta (időpont: 2018. aug. 22., Sze,
> 21:28):
>
> > +1 on the tooling. Also, you mentioned about state bootstrapping problem.
> > Could you please elaborate on how we can leverage the tooling to solve
> > state bootstrapping? I think this is a common problem to stream
> processing,
> > and it will be great the community can work on it. Thanks.
> >
> > Shuyi
> >
> > On Wed, Aug 22, 2018 at 11:51 AM Gyula Fóra <gyula.fora@xxxxxxxxx>
> wrote:
> >
> > > Thanks,
> > >
> > > I guess the first thing that would be great help from anyone interested
> > in
> > > helping is to try it for some streaming state :)
> > >
> > > We have tested these tools at King to analyze, transform and perform
> some
> > > aggregations on our user-states. The major limitation is that it
> requires
> > > RocksDB savepoints to work but other than that we successfully
> analyzed a
> > > few hundred gigabytes of state including reading keyed, and broadcast
> > > states from different operators. Also you need to have a savepoint
> before
> > > you can create a new savepoint (with whatever state).
> > >
> > > Once we have some people who have played with it we can probably
> greatly
> > > improve the API and user experience as it is pretty low level at the
> > > moment. I suggest we use the King git repo <
> > https://github.com/king/bravo>
> > > for
> > > now to track some features before it is in a shape that deserves a
> Flink
> > > PR. We are super happy to take any improvements, code contributions
> from
> > > anyone so dont hesitate to reach out to me if you have some ideas.
> > >
> > > Gyula
> > >
> > >
> > > Rong Rong <walterddr@xxxxxxxxx> ezt írta (időpont: 2018. aug. 22.,
> Sze,
> > > 17:06):
> > >
> > > > +1. Being able to analyze the state is a huge operational advantage.
> > > > Thanks Gyula for the POC and I would be very interested in
> contributing
> > > to
> > > > the work.
> > > >
> > > > --
> > > > Rong
> > > >
> > > > On Tue, Aug 21, 2018 at 4:26 AM Till Rohrmann <trohrmann@xxxxxxxxxx>
> > > > wrote:
> > > >
> > > > > big +1 for this feature. A tool to get your state out of and into
> > Flink
> > > > > will be tremendously helpful.
> > > > >
> > > > > On Mon, Aug 20, 2018 at 10:21 AM Aljoscha Krettek <
> > aljoscha@xxxxxxxxxx
> > > >
> > > > > wrote:
> > > > >
> > > > > > +1 I'd like to have something like this in Flink a lot!
> > > > > >
> > > > > > > On 19. Aug 2018, at 11:57, Gyula Fóra <gyula.fora@xxxxxxxxx>
> > > wrote:
> > > > > > >
> > > > > > > Hi all!
> > > > > > >
> > > > > > > Thanks for the feedback and I'm happy there is some interest :)
> > > > > > > Tomorrow I will start improving the proposal based on the
> > feedback
> > > > and
> > > > > > will
> > > > > > > get back to work.
> > > > > > >
> > > > > > > If you are interested working together in this please ping me
> and
> > > we
> > > > > can
> > > > > > > discuss some ideas/plans and how to share work.
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Gyula
> > > > > > >
> > > > > > > Paris Carbone <parisc@xxxxxx> ezt írta (időpont: 2018. aug.
> 18.,
> > > > Szo,
> > > > > > 9:03):
> > > > > > >
> > > > > > >> +1
> > > > > > >>
> > > > > > >> Might also be a good start to implement queryable stream state
> > > with
> > > > > > >> snapshot isolation using that mechanism.
> > > > > > >>
> > > > > > >> Paris
> > > > > > >>
> > > > > > >>> On 17 Aug 2018, at 12:28, Gyula Fóra <gyula.fora@xxxxxxxxx>
> > > wrote:
> > > > > > >>>
> > > > > > >>> Hi All!
> > > > > > >>>
> > > > > > >>> I want to share with you a little project we have been
> working
> > on
> > > > at
> > > > > > King
> > > > > > >>> (with some help from some dataArtisans folks). I think this
> > would
> > > > be
> > > > > a
> > > > > > >>> valuable addition to Flink and solve a bunch of outstanding
> > > > > production
> > > > > > >>> use-cases and headaches around state bootstrapping and state
> > > > > analytics.
> > > > > > >>>
> > > > > > >>> We have built a quick and dirty POC implementation on top of
> > > Flink
> > > > > 1.6,
> > > > > > >>> please check the README for some nice examples to get a quick
> > > idea:
> > > > > > >>>
> > > > > > >>> https://github.com/king/bravo
> > > > > > >>>
> > > > > > >>> *Short story*
> > > > > > >>> Bravo is a convenient state reader and writer library
> > leveraging
> > > > the
> > > > > > >>> Flink’s batch processing capabilities. It supports processing
> > and
> > > > > > writing
> > > > > > >>> Flink streaming savepoints. At the moment it only supports
> > > > processing
> > > > > > >>> RocksDB savepoints but this can be extended in the future for
> > > other
> > > > > > state
> > > > > > >>> backends and checkpoint types.
> > > > > > >>>
> > > > > > >>> Our goal is to cover a few basic features:
> > > > > > >>>
> > > > > > >>>  - Converting keyed states to Flink DataSets for processing
> and
> > > > > > >> analytics
> > > > > > >>>  - Reading/Writing non-keyed operators states
> > > > > > >>>  - Bootstrap keyed states from Flink DataSets and create new
> > > valid
> > > > > > >>>  savepoints
> > > > > > >>>  - Transform existing savepoints by replacing/changing some
> > > states
> > > > > > >>>
> > > > > > >>>
> > > > > > >>> Some example use-cases:
> > > > > > >>>
> > > > > > >>>  - Point-in-time state analytics across all operators and
> keys
> > > > > > >>>  - Bootstrap state of a streaming job from external resources
> > > such
> > > > as
> > > > > > >>>  reading from database/filesystem
> > > > > > >>>  - Validate and potentially repair corrupted state of a
> > streaming
> > > > job
> > > > > > >>>  - Change max parallelism of a job
> > > > > > >>>
> > > > > > >>>
> > > > > > >>> Our main goal is to start working together with other Flink
> > > > > production
> > > > > > >>> users and make this something useful that can be part of
> Flink.
> > > So
> > > > if
> > > > > > you
> > > > > > >>> have use-cases please talk to us :)
> > > > > > >>> I have also started a google doc which contains a little bit
> > more
> > > > > info
> > > > > > >> than
> > > > > > >>> the readme and could be a starting place for discussions:
> > > > > > >>>
> > > > > > >>>
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/103k6wPX20kMu5H3SOOXSg5PZIaYpwdhqBMr-ppkFL5E/edit?usp=sharing
> > > > > > >>>
> > > > > > >>> I know there are a bunch of rough edges and bugs (and no
> tests)
> > > but
> > > > > our
> > > > > > >>> motto is: If you are not embarrassed, you released too late
> :)
> > > > > > >>>
> > > > > > >>> Please let me know what you think!
> > > > > > >>>
> > > > > > >>> Cheers,
> > > > > > >>> Gyula
> > > > > > >>
> > > > > > >>
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> >
> > --
> > "So you have to trust that the dots will somehow connect in your future."
> >
>