
Re: [DISCUSS] Support Interactive Programming in Flink Table API

Hi Xingcan,

These are great points. We are on the same page regarding the potential
capabilities of the proposed changes. The proposal actually has two main
parts: the API and the underlying service. Both parts can be extended in
the future.

We made a few design choices when drafting the doc to restrict the scope of
this proposal while leaving room for future extensions. For example, we did
not specify the interface of the underlying TableService, because in the
future we may use it not only as a caching service but also as unified
storage with features such as stream/batch storage with indexing,
columnar/row-oriented formatting, schema awareness, etc.
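
A minimal sketch of why leaving the TableService interface open keeps those options available, assuming a simple put/get/remove contract. The names TableService, InMemoryCacheService, ColumnarService and round_trip are hypothetical illustrations, not Flink APIs: caller code depends only on the abstract interface, so a plain row cache and a column-oriented store can be swapped without touching it.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional, Tuple

Row = Tuple[object, ...]


class TableService(ABC):
    """Hypothetical storage contract; each backend decides how rows are kept."""

    @abstractmethod
    def put(self, table_id: str, rows: List[Row]) -> None: ...

    @abstractmethod
    def get(self, table_id: str) -> Optional[List[Row]]: ...

    @abstractmethod
    def remove(self, table_id: str) -> None: ...


class InMemoryCacheService(TableService):
    """Plain row-oriented cache backed by a dict."""

    def __init__(self) -> None:
        self._tables: Dict[str, List[Row]] = {}

    def put(self, table_id: str, rows: List[Row]) -> None:
        self._tables[table_id] = list(rows)

    def get(self, table_id: str) -> Optional[List[Row]]:
        rows = self._tables.get(table_id)
        return None if rows is None else list(rows)

    def remove(self, table_id: str) -> None:
        self._tables.pop(table_id, None)


class ColumnarService(TableService):
    """Same contract, but the data is stored column by column."""

    def __init__(self) -> None:
        self._columns: Dict[str, List[List[object]]] = {}

    def put(self, table_id: str, rows: List[Row]) -> None:
        # Transpose rows into columns (assumes every row has the same arity).
        self._columns[table_id] = [list(col) for col in zip(*rows)]

    def get(self, table_id: str) -> Optional[List[Row]]:
        columns = self._columns.get(table_id)
        if columns is None:
            return None
        # Transpose the columns back into row tuples.
        return [tuple(row) for row in zip(*columns)]

    def remove(self, table_id: str) -> None:
        self._columns.pop(table_id, None)


def round_trip(service: TableService, table_id: str, rows: List[Row]) -> Optional[List[Row]]:
    """Caller code that only knows the abstract interface."""
    service.put(table_id, rows)
    return service.get(table_id)
```

Either backend can be passed to round_trip unchanged; indexing or schema awareness would be further implementations of the same contract rather than changes to its callers.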

Similarly, regarding the API changes, right now we are only adding a cache()
method, and the cached table is only available within the session (it will
not be lost until the session exits). We found that this already addresses
most of our concerns. We can always add a persist(String tableId) method in
the future to make the table accessible globally. But this may raise a lot
of interesting questions, such as: what if table names conflict? Should
there be a session group? What should the life cycle of such tables look
like? Again, we are trying to restrict the scope and leave such questions
to future discussions.
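
A minimal sketch of the session-scoped cache() semantics described above, written in Python for brevity. The Table and Session classes, the internal table_id key and the jobs_run counter are hypothetical stand-ins, not the Flink API, and the sketch assumes cache() is lazy, i.e. the result is materialized the first time the table is computed within a session.

```python
import itertools
from typing import Callable, Dict, List, Tuple

Row = Tuple[object, ...]
_next_id = itertools.count()  # hypothetical internal id generator


class Table:
    """Hypothetical lazily evaluated table (not Flink's Table class)."""

    def __init__(self, compute: Callable[[], List[Row]]) -> None:
        self.compute = compute
        self.table_id = next(_next_id)  # internal key, never chosen by the user
        self.cache_requested = False

    def cache(self) -> "Table":
        # Only marks the table; nothing is computed until it is first used.
        self.cache_requested = True
        return self


class Session:
    """Hypothetical session: cached results live only until close()."""

    def __init__(self) -> None:
        self._cached: Dict[int, List[Row]] = {}
        self.jobs_run = 0  # counts how often a "job" actually had to run

    def collect(self, table: Table) -> List[Row]:
        if table.table_id in self._cached:
            return list(self._cached[table.table_id])  # served from the cache
        self.jobs_run += 1
        rows = list(table.compute())
        if table.cache_requested:
            self._cached[table.table_id] = rows
        return list(rows)

    def is_cached(self, table: Table) -> bool:
        return table.table_id in self._cached

    def close(self) -> None:
        self._cached.clear()  # cached tables do not outlive the session

    def __enter__(self) -> "Session":
        return self

    def __exit__(self, *exc_info: object) -> bool:
        self.close()
        return False  # do not swallow exceptions


if __name__ == "__main__":
    squares = Table(lambda: [(i, i * i) for i in range(3)]).cache()
    with Session() as session:
        session.collect(squares)
        session.collect(squares)
        print(session.jobs_run)  # 1
    print(session.is_cached(squares))  # False
```

Because the session, not the user, owns the cache key, no naming conflicts can arise here; those questions only appear once a user-chosen, globally visible persist(String tableId) is introduced.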


Jiangjie (Becket) Qin

On Thu, Nov 22, 2018 at 12:45 AM Xingcan Cui <xingcanc@xxxxxxxxx> wrote:

> Hi all,
> Thanks for the replies.
> @Becket I think whether to put the persist/cache methods in a separate
> util class or inside the DataSet/Table depends on what we want to
> introduce. The former sounds more like a data storage component, where
> users may even retrieve a stored DataSet/Table via an ID or something,
> whereas the latter sounds only like a cache mechanism. I’m not quite
> sure what we really need, but either approach is acceptable to me.
> @Shaoxuan Yes, maybe “generally” is a more accurate word here. As the
> TableAPI only works with row-type records, I just wondered whether a cache
> for it could be generalized to arbitrary data types. Anyway, if
> contributions can be made to enhance the TableAPI and rebuild other libs on
> it, that’s not a problem. Another point is, as I replied to @Becket,
> whether we introduce only a cache mechanism or a data storage component.
> IMO, compared to data storage, a cache could be volatile, which means it
> only serves to (possibly) accelerate computation and does not need to
> guarantee the existence of DataSets/Tables.
> What do you think?
> Best,
> Xingcan
> > On Nov 21, 2018, at 5:44 AM, Ruidong Li <leonxpray@xxxxxxxxx> wrote:
> >
> > Hi Becket,
> >
> > I think the Flink Service is a good abstraction, with which we can easily
> > build Interactive Programming or some other features.
> > We might bring in the concept of 'Session'; then we can think of Flink
> > Services as system processes and user jobs as user processes, so
> > lifecycle management needs to be discussed.
> >
> > Kind Regards
> > Xpray
> >
> >
> >
> > On Wed, Nov 21, 2018 at 1:10 AM Xingcan Cui <xingcanc@xxxxxxxxx> wrote:
> >
> >> Hi Becket,
> >>
> >> Thanks for bringing this up! For a long time, the intermediate cache
> >> problem has been a pain point of the Flink streaming model. As far
> >> as I know, it is quite a blocker for iterative operations in batch-related
> >> libs such as Gelly and FlinkML.
> >>
> >> Actually, there’s an old JIRA[1] aiming to solve the cache problem more
> >> “thoroughly”. Compared with your proposal, it places persistence at the
> >> DataSet level, which also allows internal operations built on the
> >> DataSet API to benefit.
> >>
> >> I totally understand the importance of the Table API, but I just wonder
> >> whether we should consider this problem from a broader view, i.e., adding a
> >> `PersistentService` rather than a `TablePersistentService` (as described
> >> in the "Flink Services" section).
> >>
> >> Thanks,
> >> Xingcan
> >>
> >> [1] https://issues.apache.org/jira/browse/FLINK-1730
> >>
> >>> On Nov 20, 2018, at 8:56 AM, Becket Qin <becket.qin@xxxxxxxxx> wrote:
> >>>
> >>> Hi all,
> >>>
> >>> As a few recent email threads have pointed out, it is a promising
> >>> opportunity to enhance Flink Table API in various aspects, including
> >>> functionality and ease of use, among others. One of the scenarios where we
> >>> feel Flink could improve is interactive programming. To explain the issues
> >>> and facilitate the discussion on the solution, we put together the
> >>> following document with our proposal.
> >>>
> >>>
> >>> https://docs.google.com/document/d/1d4T2zTyfe7hdncEUAxrlNOYr4e5IMNEZLyqSuuswkA0/edit?usp=sharing
> >>>
> >>> Feedback and comments are very welcome!
> >>>
> >>> Thanks,
> >>>
> >>> Jiangjie (Becket) Qin
> >>
> >>