Re: [DISCUSS] Support Interactive Programming in Flink Table API


Hi Xingcan,

Thanks for the comments. Yes, caching/persisting intermediate data is
useful and can benefit many scenarios, but different scenarios may call
for different solutions. For instance, as I replied in
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Embracing-Table-API-in-Flink-ML-td25368.html,
I expect FlinkML to be implemented on top of the Table API in the near
future. We already have some ideas/prototypes for doing iterations on the
Table API and will share them with the dev list soon.

I am not sure what you mean by "more thoroughly". If you mean
"more general", I think the underlying implementation of our proposal can
indeed be extended to other APIs. For now, though, we want to focus on the
Table API, as we see much more user interest in the Table API than in the
DataSet API. As you may have already read, our proposal consists of two
parts. The first is the changes to the Table API, including table.cache()
and how to hook the table/store service into the table environment. The
second is a table/store service interface, through which users can plug in
and configure different table/store services for their own environment.
Implementing the same functionality for the DataSet API would not be
difficult.
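
To make the two-part split concrete, here is a minimal Java sketch under
stated assumptions. Every name in it (`PersistentService`,
`InMemoryPersistentService`, `CachingTable`, `cache()`, `collect()`) is
hypothetical and not a Flink API; `PersistentService` only borrows the
generic name Xingcan suggests. It illustrates a pluggable store interface
plus a table-side caching hook, and why keeping the interface generic over
the payload type would let a DataSet-level cache reuse it. The actual design
is in the proposal document linked in Becket's message.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Supplier;

// HYPOTHETICAL storage-service interface (not a Flink API). It is generic over
// the payload type T, so the same contract could back Table or DataSet caching.
interface PersistentService<T> {
    void put(String key, T data);
    Optional<T> get(String key);
    void invalidate(String key);
}

// Simplest pluggable implementation: keeps cached results in process memory.
// A real deployment would plug in an implementation backed by external storage.
class InMemoryPersistentService<T> implements PersistentService<T> {
    private final Map<String, T> store = new HashMap<>();

    public void put(String key, T data) { store.put(key, data); }

    public Optional<T> get(String key) { return Optional.ofNullable(store.get(key)); }

    public void invalidate(String key) { store.remove(key); }
}

// HYPOTHETICAL stand-in for a table: an id plus a computation that produces rows.
class CachingTable {
    private final String id;
    private final Supplier<List<String>> compute;
    private final PersistentService<List<String>> service;
    private boolean cacheRequested = false;
    int computations = 0; // how many times the underlying computation actually ran

    CachingTable(String id, Supplier<List<String>> compute,
                 PersistentService<List<String>> service) {
        this.id = id;
        this.compute = compute;
        this.service = service;
    }

    // Only marks the table; the result is stored the first time it is collected.
    CachingTable cache() {
        cacheRequested = true;
        return this;
    }

    List<String> collect() {
        if (cacheRequested) {
            Optional<List<String>> hit = service.get(id);
            if (hit.isPresent()) {
                return hit.get(); // served from the store, no recomputation
            }
        }
        computations++;
        List<String> result = compute.get();
        if (cacheRequested) {
            service.put(id, result);
        }
        return result;
    }
}

class CacheSketch {
    public static void main(String[] args) {
        PersistentService<List<String>> service = new InMemoryPersistentService<>();
        CachingTable orders =
            new CachingTable("orders", () -> List.of("a", "b"), service).cache();
        orders.collect(); // computes the rows and stores them
        orders.collect(); // reuses the stored rows
        System.out.println("computations = " + orders.computations);
    }
}
```

Running `CacheSketch` prints `computations = 1`, since the second `collect()`
is served from the in-memory store. Swapping `InMemoryPersistentService` for
another implementation requires no change to `CachingTable`, which is the
point of making the store a plug-in interface.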

Regards,
Shaoxuan


On Wed, Nov 21, 2018 at 1:10 AM Xingcan Cui <xingcanc@xxxxxxxxx> wrote:

> Hi Becket,
>
> Thanks for bringing this up! For a long time, the intermediate cache
> problem has been a pain point of the Flink streaming model. As far
> as I know, it's quite a blocker for iterative operations in batch-related
> libs such as Gelly and FlinkML.
>
> Actually, there's an old JIRA[1] that aims to solve the cache problem more
> "thoroughly". Compared with your proposal, it applies persistence at the
> DataSet level, which also lets internal operations built on the
> DataSet API benefit.
> I totally understand the importance of the Table API, but I just wonder
> whether we should consider this problem from a broader view, i.e., adding a
> `PersistentService` rather than a `TablePersistentService` (as described in
> the "Flink Services" section).
>
> Thanks,
> Xingcan
>
> [1] https://issues.apache.org/jira/browse/FLINK-1730
>
> > On Nov 20, 2018, at 8:56 AM, Becket Qin <becket.qin@xxxxxxxxx> wrote:
> >
> > Hi all,
> >
> > As a few recent email threads have pointed out, there is a promising
> > opportunity to enhance the Flink Table API in various aspects, including
> > functionality and ease of use. One of the scenarios where we
> > feel Flink could improve is interactive programming. To explain the
> > issues and facilitate discussion of a solution, we put together the
> > following document with our proposal.
> >
> >
> > https://docs.google.com/document/d/1d4T2zTyfe7hdncEUAxrlNOYr4e5IMNEZLyqSuuswkA0/edit?usp=sharing
> >
> > Feedback and comments are very welcome!
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
>
>