
Re: [DISCUSS] Support Interactive Programming in Flink Table API

Hi Weihua,

Thanks for the comments. These are great questions!

To answer question 1, I think it depends on what we want from the cache
service. At this point, it is not quite clear to me whether Flink needs
different caching levels. For example, in Spark, memory-level caching is
mostly used for iterations. I think it is a little ugly to ask users to
explicitly call cache() and uncache() when writing iterations. In Flink,
iterations are executed more efficiently without requiring users to manage
the cache explicitly. BTW, the Table API does not support iterations at this
point, but we have been working on this and will send a design doc shortly.
Another consideration here is that if we allow pluggable temp table
services, those implementations may not be able to provide all levels of
caching, which will make the cache level a bit confusing.
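
To make the concern concrete, here is a minimal sketch, assuming a hypothetical CacheLevel hint (IN_MEM / IN_DISK, as in your question) and a hypothetical pluggable service that can only persist to disk. None of these names are Flink APIs; they only illustrate what happens when a service cannot honor the requested level.

```python
from enum import Enum


class CacheLevel(Enum):
    # Hypothetical levels, mirroring the IN_MEM / IN_DISK hint suggested in the thread.
    IN_MEM = 1
    IN_DISK = 2


class DiskOnlyTempTableService:
    """A hypothetical pluggable temp table service that can only persist to disk."""

    supported_levels = frozenset({CacheLevel.IN_DISK})

    def resolve_level(self, requested: CacheLevel) -> CacheLevel:
        # If the requested level is supported, honor it as-is.
        if requested in self.supported_levels:
            return requested
        # IN_MEM cannot be honored, so the service silently downgrades to disk.
        # The user's hint no longer describes what actually happens.
        return CacheLevel.IN_DISK
```

A service that rejects the unsupported hint instead of downgrading it has the opposite problem: the same program would succeed on one service implementation and fail on another.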

Regarding the cleanup of the temp tables, that is a great point. As of now,
the cleanup is done in a callback when the session exits, i.e. when the
application program finishes. This assumes that the caching service can
host all the cached tables created during the entire session. I agree that
an explicit uncache() could be useful; we should probably add it.
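
A rough model of the two cleanup paths, assuming a hypothetical SessionTableCache class (not a Flink API) that holds materialized rows in memory: the session-exit callback drops everything, while an explicit uncache() releases a single table earlier. The session() context manager stands in for "the application program finishes".

```python
from contextlib import contextmanager
from typing import Dict, Iterator, List, Optional


class SessionTableCache:
    """Hypothetical model of a session-scoped temp table cache (not a Flink API)."""

    def __init__(self) -> None:
        self._tables: Dict[str, List[tuple]] = {}

    def cache(self, name: str, rows: List[tuple]) -> None:
        # Materialize the table's result so later jobs in the session can reuse it.
        self._tables[name] = list(rows)

    def get(self, name: str) -> Optional[List[tuple]]:
        return self._tables.get(name)

    def uncache(self, name: str) -> bool:
        # Explicit release, so a long session need not hold every cached table.
        # Returns True only if the table was actually cached.
        return self._tables.pop(name, None) is not None

    def on_session_exit(self) -> None:
        # Session-exit callback: drop every table cached during the session.
        self._tables.clear()

    def __len__(self) -> int:
        return len(self._tables)


@contextmanager
def session() -> Iterator[SessionTableCache]:
    # Hypothetical session scope: the exit callback runs when the program finishes.
    cache = SessionTableCache()
    try:
        yield cache
    finally:
        cache.on_session_exit()
```

For example, inside `with session() as c:` a program could cache "t1" and "t2", release "t1" with `c.uncache("t1")` once it is no longer needed, and rely on the exit callback to drop "t2".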

We haven't thought through the FlinkService API yet. A rough idea is that
there will be a ServiceDescriptor/ServiceConfig serving as the contract
between Flink and the user-defined service. The service could be configured
to run either in a standalone process or within the TMs. That said,
FlinkService itself is probably a big topic that justifies a discussion
thread of its own. In this proposal, it only affects how the default caching
service is launched, so we can always adapt that to the FlinkService API once
it is nailed down.
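
Purely to illustrate the rough idea, here is a sketch of what such a descriptor might carry. ServiceDescriptor, DeploymentMode, launch_plan and the "one instance per TM" rule are all hypothetical assumptions for this example, not part of any existing or agreed-upon Flink API.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class DeploymentMode(Enum):
    # Hypothetical: the two options mentioned above.
    STANDALONE_PROCESS = "standalone"
    IN_TASK_MANAGER = "in_tm"


@dataclass(frozen=True)
class ServiceDescriptor:
    """Hypothetical contract between Flink and a user-defined service."""

    service_class: str
    mode: DeploymentMode
    config: Dict[str, str] = field(default_factory=dict)


def launch_plan(desc: ServiceDescriptor, num_task_managers: int) -> List[str]:
    # Standalone: a single process outside the TMs.
    if desc.mode is DeploymentMode.STANDALONE_PROCESS:
        return [f"process:{desc.service_class}"]
    # Within TMs: assumed here to be one service instance per TM.
    return [f"tm-{i}:{desc.service_class}" for i in range(num_task_managers)]
```

Under this sketch, the default caching service would just be one descriptor among others, which is why switching it over later should be cheap.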


Jiangjie (Becket) Qin

On Wed, Nov 21, 2018 at 10:42 AM Weihua Jiang <weihua.jiang@xxxxxxxxx> wrote:

> Hi Becket,
> The design is quite interesting and useful.
> I have several questions about your design:
> 1. Shall we add some persistence level hint to the cache() function for
> data of different temperatures? e.g. IN_MEM, IN_DISK, etc., or HOTTEST, HOT,
> 2. When will the corresponding cached data be cleaned up? By some kind of GC?
> Shall we add an uncache() function to allow users to manually delete the
> cached data?
> 3. Must the FlinkService be a running service, or will Flink run the
> service in the TM?
> Thanks
> Weihua
> Becket Qin <becket.qin@xxxxxxxxx> wrote on Tue, Nov 20, 2018 at 9:56 PM:
> > Hi all,
> >
> > As a few recent email threads have pointed out, there is a promising
> > opportunity to enhance the Flink Table API in various aspects, including
> > functionality and ease of use, among others. One of the scenarios where we
> > feel Flink could improve is interactive programming. To explain the issues
> > and facilitate the discussion on the solution, we put together the
> > following document with our proposal.
> >
> > https://docs.google.com/document/d/1d4T2zTyfe7hdncEUAxrlNOYr4e5IMNEZLyqSuuswkA0/edit?usp=sharing
> >
> > Feedback and comments are very welcome!
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >