osdir.com


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [DISCUSS] Support Interactive Programming in Flink Table API


Hi Xiaowei,

Thanks for the comment. That is a valid point.

The callback is not only associated with a particular temp table. It is a
clean up logic provided by the user. The temp table to session ID mapping
is tracked internally. We also need to associate the callback with the
session lifecycle and make sure it will be invoked when the session exits,
whether normally or abnormally. We haven't decided how exactly that should
be done yet. Several options being explored are:
1. Invoke the callback the Yarn application session shutdown hook if there
is one. (probably the best option if available)
2. Put the logic into Yarn AM.
3. Launch a WatchDog service and let it heartbeat to the client. If the
client indicates the session is closed or the client goes away
accidentally, the cleanup service will just kick in.

In any case, the callback is unlikely to be invoked on the client side.

Thanks,

Jiangjie (Becket) Qin


On Fri, Nov 23, 2018 at 1:32 PM Becket Qin <becket.qin@xxxxxxxxx> wrote:

> Re: Jincheng,
>
> Thanks for the feedback. Regarding cache() v.s. persist(), personally I
> find cache() to be more accurately describing the behavior, i.e. the Table
> is cached for the session, but will be deleted after the session is closed.
> persist() seems a little misleading as people might think the table will
> still be there even after the session is gone.
>
> Great point about mixing the batch and stream processing in the same job.
> We should absolutely move towards that goal. I imagine that would be a huge
> change across the board, including sources, operators and optimizations, to
> name some. Likely we will need several separate in-depth discussions.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Fri, Nov 23, 2018 at 5:14 AM Xingcan Cui <xingcanc@xxxxxxxxx> wrote:
>
>> Hi all,
>>
>> @Shaoxuan, I think the lifecycle or access domain are both orthogonal to
>> the cache problem. Essentially, this may be the first time we plan to
>> introduce another storage mechanism other than the state. Maybe it’s better
>> to first draw a big picture and then concentrate on a specific part?
>>
>> @Becket, yes, actually I am more concerned with the underlying service.
>> This seems to be quite a major change to the existing codebase. As you
>> claimed, the service should be extendible to support other components and
>> we’d better discussed it in another thread.
>>
>> All in all, I also eager to enjoy the more interactive Table API, in case
>> of a general and flexible enough service mechanism.
>>
>> Best,
>> Xingcan
>>
>> > On Nov 22, 2018, at 10:16 AM, Xiaowei Jiang <xiaoweij@xxxxxxxxx> wrote:
>> >
>> > Relying on a callback for the temp table for clean up is not very
>> reliable.
>> > There is no guarantee that it will be executed successfully. We may risk
>> > leaks when that happens. I think that it's safer to have an association
>> > between temp table and session id. So we can always clean up temp tables
>> > which are no longer associated with any active sessions.
>> >
>> > Regards,
>> > Xiaowei
>> >
>> > On Thu, Nov 22, 2018 at 12:55 PM jincheng sun <sunjincheng121@xxxxxxxxx
>> >
>> > wrote:
>> >
>> >> Hi Jiangjie&Shaoxuan,
>> >>
>> >> Thanks for initiating this great proposal!
>> >>
>> >> Interactive Programming is very useful and user friendly in case of
>> your
>> >> examples.
>> >> Moreover, especially when a business has to be executed in several
>> stages
>> >> with dependencies,such as the pipeline of Flink ML, in order to
>> utilize the
>> >> intermediate calculation results we have to submit a job by
>> env.execute().
>> >>
>> >> About the `cache()`  , I think is better to named `persist()`, And The
>> >> Flink framework determines whether we internally cache in memory or
>> persist
>> >> to the storage system,Maybe save the data into state backend
>> >> (MemoryStateBackend or RocksDBStateBackend etc.)
>> >>
>> >> BTW, from the points of my view in the future, support for streaming
>> and
>> >> batch mode switching in the same job will also benefit in "Interactive
>> >> Programming",  I am looking forward to your JIRAs and FLIP!
>> >>
>> >> Best,
>> >> Jincheng
>> >>
>> >>
>> >> Becket Qin <becket.qin@xxxxxxxxx> 于2018年11月20日周二 下午9:56写道:
>> >>
>> >>> Hi all,
>> >>>
>> >>> As a few recent email threads have pointed out, it is a promising
>> >>> opportunity to enhance Flink Table API in various aspects, including
>> >>> functionality and ease of use among others. One of the scenarios
>> where we
>> >>> feel Flink could improve is interactive programming. To explain the
>> >> issues
>> >>> and facilitate the discussion on the solution, we put together the
>> >>> following document with our proposal.
>> >>>
>> >>>
>> >>>
>> >>
>> https://docs.google.com/document/d/1d4T2zTyfe7hdncEUAxrlNOYr4e5IMNEZLyqSuuswkA0/edit?usp=sharing
>> >>>
>> >>> Feedback and comments are very welcome!
>> >>>
>> >>> Thanks,
>> >>>
>> >>> Jiangjie (Becket) Qin
>> >>>
>> >>
>>
>>