osdir.com



Re: [DISCUSS] Support Interactive Programming in Flink Table API


Thanks for the suggestion, Jincheng.

Yes, I think it makes sense to have a persist() with a defined
lifecycle/scope. I just added a section on this to the future work part of
the document.
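
To make the lifecycle idea concrete, here is a minimal sketch in Python. It
is not Flink code: LifeCycle, PersistedTableRegistry, persist() and
close_session() are hypothetical names. SESSION_GROUP is modeled, as an
assumption, as "dropped when the last open session in the group closes".

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional, Set


class LifeCycle(Enum):
    """Hypothetical lifecycle scopes for persisted tables (not a Flink API)."""
    SESSION = "session"              # dropped when the owning session closes
    SESSION_GROUP = "session_group"  # dropped when every session in the group has closed


@dataclass
class PersistedTable:
    name: str
    lifecycle: LifeCycle
    owner: str  # session id for SESSION, group id for SESSION_GROUP


class PersistedTableRegistry:
    """Tracks persisted tables and drops them according to their lifecycle."""

    def __init__(self) -> None:
        self.tables: Dict[str, PersistedTable] = {}
        self.group_members: Dict[str, Set[str]] = {}  # group id -> open session ids

    def open_session(self, session_id: str, group_id: Optional[str] = None) -> None:
        if group_id is not None:
            self.group_members.setdefault(group_id, set()).add(session_id)

    def persist(self, name: str, lifecycle: LifeCycle, owner: str) -> None:
        self.tables[name] = PersistedTable(name, lifecycle, owner)

    def close_session(self, session_id: str) -> Set[str]:
        """Close a session and return the names of the tables dropped as a result."""
        emptied_groups: Set[str] = set()
        for group_id, members in self.group_members.items():
            if session_id in members:
                members.discard(session_id)
                if not members:
                    emptied_groups.add(group_id)
        for group_id in emptied_groups:
            del self.group_members[group_id]

        dropped = {
            t.name for t in self.tables.values()
            if (t.lifecycle is LifeCycle.SESSION and t.owner == session_id)
            or (t.lifecycle is LifeCycle.SESSION_GROUP and t.owner in emptied_groups)
        }
        for name in dropped:
            del self.tables[name]
        return dropped
```

With two sessions "s1" and "s2" in group "g", closing "s1" drops only a
SESSION-scoped table owned by "s1"; a SESSION_GROUP table owned by "g"
survives until "s2", the last session in the group, closes.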

 Thanks,

Jiangjie (Becket) Qin

On Fri, Nov 23, 2018 at 1:55 PM jincheng sun <sunjincheng121@xxxxxxxxx>
wrote:

> Hi Jiangjie,
>
> Thank you for explaining the naming of `cache()`; I understand why you
> designed it this way!
>
> Another idea: could we specify a lifecycle for persisted data? For example,
> persist(LifeCycle.SESSION), so that users need not worry about data loss
> and can state explicitly how long the data is kept. If we want to extend
> this later, the data could also be shared within a group of sessions, for
> example LifeCycle.SESSION_GROUP(...). I am not sure about this; it is just
> an immature suggestion, for reference only!
>
> Bests,
> Jincheng
>
> Becket Qin <becket.qin@xxxxxxxxx> wrote on Fri, Nov 23, 2018 at 1:33 PM:
>
> > Re: Jincheng,
> >
> > Thanks for the feedback. Regarding cache() vs. persist(), I personally
> > find that cache() describes the behavior more accurately, i.e. the Table
> > is cached for the session but will be deleted after the session is
> > closed. persist() seems a little misleading, as people might think the
> > table will still be there even after the session is gone.
> >
> > Great point about mixing batch and stream processing in the same job. We
> > should absolutely move towards that goal. I imagine that would be a huge
> > change across the board, including sources, operators, and optimizations,
> > to name a few. We will likely need several separate in-depth discussions.
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> > On Fri, Nov 23, 2018 at 5:14 AM Xingcan Cui <xingcanc@xxxxxxxxx> wrote:
> >
> > > Hi all,
> > >
> > > @Shaoxuan, I think the lifecycle and the access domain are both
> > > orthogonal to the cache problem. Essentially, this may be the first time
> > > we plan to introduce a storage mechanism other than state. Maybe it’s
> > > better to first draw the big picture and then concentrate on a specific
> > > part?
> > >
> > > @Becket, yes, actually I am more concerned about the underlying service.
> > > It seems to be quite a major change to the existing codebase. As you
> > > said, the service should be extensible to support other components, and
> > > we’d better discuss it in another thread.
> > >
> > > All in all, I am also eager to enjoy a more interactive Table API,
> > > provided the service mechanism is general and flexible enough.
> > >
> > > Best,
> > > Xingcan
> > >
> > > > On Nov 22, 2018, at 10:16 AM, Xiaowei Jiang <xiaoweij@xxxxxxxxx>
> > > > wrote:
> > > >
> > > > Relying on a callback to clean up temp tables is not very reliable.
> > > > There is no guarantee that it will execute successfully, and we risk
> > > > leaks when it does not. I think it is safer to associate each temp
> > > > table with a session ID, so that we can always clean up temp tables
> > > > that are no longer associated with any active session.
> > > >
> > > > Regards,
> > > > Xiaowei
> > > >
> > > > On Thu, Nov 22, 2018 at 12:55 PM jincheng sun <sunjincheng121@xxxxxxxxx>
> > > > wrote:
> > > >
> > > >> Hi Jiangjie & Shaoxuan,
> > > >>
> > > >> Thanks for initiating this great proposal!
> > > >>
> > > >> Interactive Programming is very useful and user-friendly, as your
> > > >> examples show. It is especially valuable when a business process has
> > > >> to be executed in several dependent stages, such as a Flink ML
> > > >> pipeline: in order to use the intermediate results, we have to submit
> > > >> a job via env.execute().
> > > >>
> > > >> About `cache()`, I think it is better to name it `persist()` and let
> > > >> the Flink framework decide whether to cache the data internally in
> > > >> memory or persist it to a storage system, perhaps saving it into a
> > > >> state backend (MemoryStateBackend, RocksDBStateBackend, etc.).
> > > >>
> > > >> BTW, in my view, supporting switching between streaming and batch
> > > >> mode within the same job will also benefit "Interactive Programming"
> > > >> in the future. I am looking forward to your JIRAs and FLIP!
> > > >>
> > > >> Best,
> > > >> Jincheng
> > > >>
> > > >>
> > > >> Becket Qin <becket.qin@xxxxxxxxx> wrote on Tue, Nov 20, 2018 at 9:56 PM:
> > > >>
> > > >>> Hi all,
> > > >>>
> > > >>> As a few recent email threads have pointed out, there is a promising
> > > >>> opportunity to enhance the Flink Table API in various aspects,
> > > >>> including functionality and ease of use. One scenario where we feel
> > > >>> Flink could improve is interactive programming. To explain the issues
> > > >>> and facilitate discussion of a solution, we have put together the
> > > >>> following document with our proposal.
> > > >>>
> > > >>> https://docs.google.com/document/d/1d4T2zTyfe7hdncEUAxrlNOYr4e5IMNEZLyqSuuswkA0/edit?usp=sharing
> > > >>>
> > > >>> Feedback and comments are very welcome!
> > > >>>
> > > >>> Thanks,
> > > >>>
> > > >>> Jiangjie (Becket) Qin
> > > >>>
> > > >>
> > >
> > >
> >
>
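
Xiaowei's suggestion in the thread above, associating each temp table with
a session ID instead of relying on a cleanup callback, can be sketched as
follows. This is a minimal Python illustration, not Flink code: the class
TempTableJanitor, the heartbeat-based notion of an "active" session, and the
session_timeout_s parameter are all hypothetical assumptions.

```python
from typing import Dict, Set


class TempTableJanitor:
    """Hypothetical sketch: every temp table is tagged with the session that
    created it, and a periodic sweep drops any table whose session is no longer
    active. Cleanup never depends on a per-table callback actually running."""

    def __init__(self, session_timeout_s: float) -> None:
        self.session_timeout_s = session_timeout_s
        self.last_heartbeat: Dict[str, float] = {}  # session id -> last heartbeat time
        self.table_owner: Dict[str, str] = {}       # temp table name -> session id

    def heartbeat(self, session_id: str, now: float) -> None:
        # Assumption: a live session reports in periodically.
        self.last_heartbeat[session_id] = now

    def create_temp_table(self, name: str, session_id: str) -> None:
        self.table_owner[name] = session_id

    def active_sessions(self, now: float) -> Set[str]:
        # A session counts as active if it sent a heartbeat within the timeout.
        return {s for s, t in self.last_heartbeat.items()
                if now - t <= self.session_timeout_s}

    def sweep(self, now: float) -> Set[str]:
        """Drop and return every temp table not owned by an active session."""
        active = self.active_sessions(now)
        orphaned = {name for name, owner in self.table_owner.items()
                    if owner not in active}
        for name in orphaned:
            del self.table_owner[name]
        return orphaned
```

Because ownership is derived from the session registry rather than from a
callback, a session that crashes without cleaning up simply stops sending
heartbeats, and its tables are reclaimed on a later sweep.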