Re: [DISCUSS] Support Interactive Programming in Flink Table API
I think you probably misunderstood our proposal. The proposed “cache()” API
basically implies that the data is only available within its session, not
permanently available for other sessions to access. It will be cleaned up
when the session exits. “cache” does not imply that the underlying
implementation only relies on a cache; in fact, the default implementation
we proposed is based on the file system. In the future, we may want to add a
“persistent” interface that allows the application to actually materialize
the data in a QoS/lifetime
I just left a comment clarifying this in the Google doc. Feel free to
leave comments in the Google doc if you have any further questions.
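To make the session-scoped semantics concrete, here is a minimal sketch in Python. `TableSession`, `cache()` and `read()` are hypothetical names, not Flink API, and a local temporary directory is assumed to stand in for the file-system storage mentioned above. Cached data is written to files owned by the session and deleted when the session exits, so no other session can reach it.

```python
import json
import os
import shutil
import tempfile


class TableSession:
    """Hypothetical session object (not Flink API).

    Cached results are visible only through this session and are backed by
    files in a temporary directory, standing in for a file-system store.
    """

    def __init__(self):
        self.directory = tempfile.mkdtemp(prefix="session-cache-")
        self._paths = {}

    def cache(self, name, rows):
        # Materialize the rows to a file that this session owns.
        path = os.path.join(self.directory, name + ".json")
        with open(path, "w") as f:
            json.dump(rows, f)
        self._paths[name] = path

    def read(self, name):
        # Only this session knows where its cached data lives.
        with open(self._paths[name]) as f:
            return json.load(f)

    def close(self):
        # Session exit: all cached data is removed from the file system.
        shutil.rmtree(self.directory, ignore_errors=True)
        self._paths.clear()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False


with TableSession() as session:
    session.cache("clicks", [[1, "a"], [2, "b"]])  # JSON stores rows as lists
    print(session.read("clicks"))
    cache_dir = session.directory
print(os.path.exists(cache_dir))
```

A “persistent” variant would differ mainly in `close()`: it would leave the files in place under some agreed lifetime instead of deleting them.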
On Thu, Nov 22, 2018 at 12:45 AM Xingcan Cui <xingcanc@xxxxxxxxx> wrote:
> Hi all,
> Thanks for the replies.
> @Becket I think whether to put the persist/cache methods in a separate
> util class or inside DataSet/Table depends on what we want to
> introduce. The former sounds more like a data storage component, where
> users might even retrieve a stored DataSet/Table via an ID or similar,
> whereas the latter sounds like just a cache mechanism. I’m not quite
> sure which one we really need, but either approach is acceptable to me.
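The difference between the two placements can be sketched as follows, in Python for brevity. Both `TableStore` (the separate util class) and `Table.cache()` (the method on the table) are hypothetical illustrations, not existing Flink classes: the store hands back an ID that anyone holding it can use to retrieve the data, while `cache()` only lets the same table object skip recomputation.

```python
import uuid


class TableStore:
    """Hypothetical util-class style: a storage component with its own IDs."""

    def __init__(self):
        self._tables = {}

    def persist(self, rows):
        # Store the rows and return an ID that can be used to fetch them later.
        table_id = str(uuid.uuid4())
        self._tables[table_id] = list(rows)
        return table_id

    def get(self, table_id):
        return self._tables[table_id]


class Table:
    """Hypothetical Table with a cache() method: no ID, just reuse of its own result."""

    def __init__(self, compute):
        self._compute = compute
        self._cache_enabled = False
        self._cached = None
        self.evaluations = 0  # how many times the underlying computation ran

    def cache(self):
        self._cache_enabled = True
        return self

    def collect(self):
        if self._cache_enabled and self._cached is not None:
            return self._cached
        self.evaluations += 1
        rows = self._compute()
        if self._cache_enabled:
            self._cached = rows
        return rows


store = TableStore()
table_id = store.persist([(1, "a")])
print(store.get(table_id))

table = Table(lambda: [(1, "a"), (2, "b")]).cache()
table.collect()
table.collect()
print(table.evaluations)
```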
> @Shaoxuan Yes, maybe “generally” is a more accurate word here. As the
> Table API only works with row-type records, I just wondered whether a cache
> for it could be generalized to arbitrary data types. Anyway, if
> contributions can be made to enhance the Table API and rebuild other libs on
> top of it, that’s not a problem. Another point is, as I replied to @Becket,
> whether we introduce only a cache mechanism or a data storage component.
> IMO, compared to data storage, a cache could be volatile, which means it
> only serves to (possibly) accelerate computation and doesn’t need to
> guarantee the existence of DataSets/Tables.
> What do you think?
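A volatile cache in this sense can be treated purely as an optimization. The sketch below is a hypothetical illustration, not Flink code; it assumes an LRU eviction policy and a `compute` callback that can always rebuild an evicted result, so correctness never depends on an entry still being present.

```python
from collections import OrderedDict


class VolatileCache:
    """Hypothetical best-effort cache: entries may be evicted at any time,
    so callers must always be able to recompute the result."""

    def __init__(self, capacity):
        self._capacity = capacity
        self._entries = OrderedDict()

    def get_or_compute(self, key, compute):
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        value = compute()  # miss or evicted: fall back to recomputation
        self._entries[key] = value
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return value


cache = VolatileCache(capacity=1)
print(cache.get_or_compute("t1", lambda: [1, 2, 3]))
print(cache.get_or_compute("t2", lambda: [4, 5]))     # evicts "t1"
print(cache.get_or_compute("t1", lambda: [1, 2, 3]))  # recomputed, same result
```

A data storage component would instead have to guarantee that `t1` is still there after `t2` is added.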
> > On Nov 21, 2018, at 5:44 AM, Ruidong Li <leonxpray@xxxxxxxxx> wrote:
> > Hi Becket,
> > I think the Flink Service is a good abstraction, with which we can easily
> > build Interactive Programming or other features.
> > We might introduce the concept of a 'Session'; then we can think of Flink
> > Services as system processes and user jobs as user processes, so
> > lifecycle management needs to be discussed.
> > Kind Regards
> > Xpray
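The process analogy can be sketched as a session that owns the lifecycle of long-running services while user jobs merely use them. `ServiceSession` and `Service` are hypothetical names, and stopping services in reverse start order is an assumption of this sketch, not something the thread specifies.

```python
class Service:
    """Hypothetical long-lived 'system process', e.g. a persistence service."""

    def __init__(self, name, log):
        self.name = name
        self._log = log

    def start(self):
        self._log.append("start " + self.name)

    def stop(self):
        self._log.append("stop " + self.name)


class ServiceSession:
    """Hypothetical session: owns service lifecycles; jobs are short-lived 'user processes'."""

    def __init__(self, services):
        self._services = services
        self._started = []

    def __enter__(self):
        for svc in self._services:
            svc.start()
            self._started.append(svc)
        return self

    def run_job(self, job):
        # A user job may use the running services but never starts or stops them.
        return job(self)

    def __exit__(self, exc_type, exc, tb):
        # Session end: stop services in reverse order of startup.
        while self._started:
            self._started.pop().stop()
        return False


log = []
with ServiceSession([Service("persistence", log), Service("catalog", log)]) as s:
    s.run_job(lambda session: "job 1 done")
    s.run_job(lambda session: "job 2 done")
print(log)
```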
> > On Wed, Nov 21, 2018 at 1:10 AM Xingcan Cui <xingcanc@xxxxxxxxx> wrote:
> >> Hi Becket,
> >> Thanks for bringing this up! For a long time, the intermediate cache
> >> problem has been a pain point of the Flink streaming model. As far
> >> as I know, it’s quite a blocker for iterative operations in batch-related
> >> libraries such as Gelly and FlinkML.
> >> Actually, there’s an old JIRA [1] aiming to solve the cache problem more
> >> “thoroughly”. Compared with your proposal, it handles persistence at the
> >> DataSet level, which also allows internal operations based on the
> >> DataSet API to benefit.
> >> I totally understand the importance of the Table API, but I just wonder
> >> whether we should consider this problem from a broader perspective, i.e.,
> >> adding a `PersistentService` rather than a `TablePersistentService` (as
> >> described in the "Flink Services" section).
> >> Thanks,
> >> Xingcan
> >> [1] https://issues.apache.org/jira/browse/FLINK-1730
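The difference between a general `PersistentService` and a `TablePersistentService` can be sketched with a generic type parameter. This is a hypothetical Python illustration, not the design in the proposal document: the general service accepts elements of any type (e.g., graph edges reused across Gelly-style iterations), and the table-only service is simply its row-typed special case.

```python
from typing import Dict, Generic, List, Tuple, TypeVar

T = TypeVar("T")
Row = Tuple[object, ...]  # stand-in for a Table API row


class PersistentService(Generic[T]):
    """Hypothetical service persisting intermediate results of any element type."""

    def __init__(self) -> None:
        self._data: Dict[str, List[T]] = {}

    def persist(self, key: str, elements: List[T]) -> None:
        self._data[key] = list(elements)

    def load(self, key: str) -> List[T]:
        return self._data[key]

    def drop(self, key: str) -> None:
        self._data.pop(key, None)


class TablePersistentService(PersistentService[Row]):
    """Hypothetical row-only variant: covers the Table API but not DataSet-based libraries."""


# DataSet-level use, e.g. edges of a graph reused across iterations.
edges = PersistentService[Tuple[int, int]]()
edges.persist("graph", [(1, 2), (2, 3)])

# Table-level use is just the row-typed case.
tables = TablePersistentService()
tables.persist("orders", [(1, "book"), (2, "pen")])
print(edges.load("graph"), tables.load("orders"))
```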
> >>> On Nov 20, 2018, at 8:56 AM, Becket Qin <becket.qin@xxxxxxxxx> wrote:
> >>> Hi all,
> >>> As a few recent email threads have pointed out, there is a promising
> >>> opportunity to enhance the Flink Table API in various aspects, including
> >>> functionality and ease of use, among others. One of the scenarios where we
> >>> feel Flink could improve is interactive programming. To explain the issues
> >>> and facilitate the discussion on the solution, we put together the
> >>> following document with our proposal.
> >>> Feedback and comments are very welcome!
> >>> Thanks,
> >>> Jiangjie (Becket) Qin