Re: [DISCUSS] Support Interactive Programming in Flink Table API


Hi all,

Thanks for the replies.

@Becket I think whether to put the persist/cache methods in a separate util class or inside DataSet/Table depends on what we want to introduce. The former sounds more like a data storage component, where users might even retrieve a stored DataSet/Table via an ID or something similar, whereas the latter sounds like a pure cache mechanism. I’m not sure which one we really need, but either approach is acceptable to me.

@Shaoxuan Yes, maybe “generally” is a more accurate word here. As the Table API only works with row-typed records, I just wondered whether a cache for it can be generalized to arbitrary data types. Anyway, if contributions can be made to enhance the Table API and rebuild other libs on top of it, that’s not a problem. Another point is, as I replied to @Becket, whether we introduce only a cache mechanism or a data storage component. IMO, compared to data storage, a cache can be volatile: it only serves to (possibly) accelerate computation and does not need to guarantee that the cached DataSets/Tables still exist.
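
To make the distinction concrete, here is a minimal sketch in plain Python. It is not Flink code: `VolatileCache`, `DataStore`, and all method names are hypothetical and only illustrate the two contracts. The cache may drop an entry at any time and falls back to recomputation, whereas the store must return whatever was put under an ID or fail loudly.

```python
from typing import Callable, Dict, Generic, TypeVar

T = TypeVar("T")


class VolatileCache(Generic[T]):
    """Hypothetical best-effort cache: an entry may disappear at any time."""

    def __init__(self) -> None:
        self._entries: Dict[str, T] = {}

    def get_or_compute(self, key: str, compute: Callable[[], T]) -> T:
        # A hit only saves work; a miss is never an error.
        if key in self._entries:
            return self._entries[key]
        value = compute()  # recompute from the original plan
        self._entries[key] = value
        return value

    def evict(self, key: str) -> None:
        # Eviction is always allowed because existence is not guaranteed.
        self._entries.pop(key, None)


class DataStore(Generic[T]):
    """Hypothetical storage component: a stored result stays retrievable by ID."""

    def __init__(self) -> None:
        self._entries: Dict[str, T] = {}

    def put(self, dataset_id: str, value: T) -> None:
        self._entries[dataset_id] = value

    def get(self, dataset_id: str) -> T:
        # There is no recomputation path, so a missing ID is a hard failure.
        if dataset_id not in self._entries:
            raise KeyError(f"no stored dataset with id {dataset_id!r}")
        return self._entries[dataset_id]
```

With the first contract, losing a cached result only costs time; with the second, users can depend on the result being there, which is a much stronger promise to keep.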

What do you think?

Best,
Xingcan

> On Nov 21, 2018, at 5:44 AM, Ruidong Li <leonxpray@xxxxxxxxx> wrote:
> 
> Hi Becket,
> 
> I think the Flink Service is a good abstraction, with which we can easily
> build Interactive Programming or some other features.
> We might bring in the concept of a 'Session'; then we can think of Flink
> Services as system processes and user jobs as user processes, so the
> management of their life cycles needs to be discussed.
> 
> Kind Regards
> Xpray
> 
> 
> 
> On Wed, Nov 21, 2018 at 1:10 AM, Xingcan Cui <xingcanc@xxxxxxxxx> wrote:
> 
>> Hi Becket,
>> 
>> Thanks for bringing this up! The intermediate cache problem has long
>> been a pain point of the Flink streaming model. As far as I know, it’s
>> quite a blocker for iterative operations in batch-related libs such as
>> Gelly and FlinkML.
>> 
>> Actually, there’s an old JIRA[1] aiming to solve the cache problem more
>> “thoroughly”. Compared with your proposal, it implements persistence at
>> the DataSet level, which also allows internal operations based on the
>> DataSet API to benefit.
>> 
>> I totally understand the importance of the Table API, but I just wonder
>> whether we should consider this problem from a broader view, i.e., adding
>> a `PersistentService` rather than a `TablePersistentService` (as described
>> in the "Flink Services" section).
>> 
>> Thanks,
>> Xingcan
>> 
>> [1] https://issues.apache.org/jira/browse/FLINK-1730
>> 
>>> On Nov 20, 2018, at 8:56 AM, Becket Qin <becket.qin@xxxxxxxxx> wrote:
>>> 
>>> Hi all,
>>> 
>>> As a few recent email threads have pointed out, it is a promising
>>> opportunity to enhance Flink Table API in various aspects, including
>>> functionality and ease of use among others. One of the scenarios where we
>>> feel Flink could improve is interactive programming. To explain the
>>> issues and facilitate the discussion on the solution, we put together the
>>> following document with our proposal.
>>> 
>>> https://docs.google.com/document/d/1d4T2zTyfe7hdncEUAxrlNOYr4e5IMNEZLyqSuuswkA0/edit?usp=sharing
>>> 
>>> Feedback and comments are very welcome!
>>> 
>>> Thanks,
>>> 
>>> Jiangjie (Becket) Qin
>> 
>>