Re: [DISCUSS] Embracing Table API in Flink ML


Hi Becket,

Thanks a lot for the Table API enhancement design doc.

I am working on some simple ML algorithms using this new ML pipeline. I will
let you know if any Table API enhancement is needed.

Thanks
Weihua


On Tue, Nov 20, 2018 at 10:43 PM Becket Qin <becket.qin@xxxxxxxxx> wrote:

> Hi Weihua,
>
> Thanks for the well-written design doc!
>
> The ML pipeline abstraction is pretty handy for AI engineers. As
> Jincheng mentioned, there is an ongoing effort to enhance the Table API
> for ML. But it would still be helpful to understand what is missing in
> the Table API to fully support the ML pipeline. Given that there are quite
> a few proposed APIs and various related items to discuss, do you think
> having some examples of how the pipeline works would facilitate the
> discussion?
>
> Again, thanks for kicking off the discussion.
>
> Jiangjie (Becket) Qin
>
>
> On Tue, Nov 20, 2018 at 9:17 PM jincheng sun <sunjincheng121@xxxxxxxxx>
> wrote:
>
> > Hi Weihua,
> > Thanks for bringing up this discussion!
> >
> > I quickly read the Google doc, and I fully agree that ML can be well
> > supported on the Table API (at some stage in the future).
> > In fact, Xiaowei and I have already started a discussion on enhancing
> > the Table API. In the first phase, we will add support for
> > map/flatmap/agg/flatagg in the Table API.
> > So I am very happy to be involved in this discussion and will leave
> > comments in the Google doc later.
> >
> > I think it would be great if you could add a phased implementation plan
> > to the Google doc. What do you think?
> >
> > Thanks,
> > Jincheng
> >
> >
> > On Tue, Nov 20, 2018 at 8:53 PM Weihua Jiang <weihua.jiang@xxxxxxxxx> wrote:
> >
> > > ML Pipeline is an idea introduced by scikit-learn
> > > <https://arxiv.org/abs/1309.0238>. Both Spark and Flink have borrowed
> > > this idea and made their own implementations [Spark ML Pipeline
> > > <https://spark.apache.org/docs/latest/ml-pipeline.html>, Flink ML
> > > Pipeline
> > > <https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/libs/ml/pipelines.html>].
> > >
> > >
> > >
> > > NOTE: though I am using the term "ML", ML Pipeline is meant to apply to
> > > both ML and DL pipelines.
> > >
> > >
> > > ML Pipeline is quite helpful for model composition (i.e. using model(s)
> > > for feature engineering). It also enables logic reuse between the
> > > training and inference phases (via pipeline persistence and loading),
> > > which is essential for AI engineering. ML Pipeline can also be a good
> > > base for a Flink-based AI engineering platform if we can give it good
> > > tooling support (i.e. human-readable metadata).
> > >
> > >
> > > As the Table API will be the unified high-level API for both stream and
> > > batch processing, I want to initiate the design discussion of a new
> > > Table-based Flink ML Pipeline.
> > >
> > >
> > > I drafted a design document [1] for this discussion. This design tries
> > > to create a new ML Pipeline implementation so that concrete ML/DL
> > > algorithms can fit this new API to achieve interoperability.
> > >
> > >
> > > Any feedback is highly appreciated.
> > >
> > >
> > > Thanks
> > >
> > > Weihua
> > >
> > >
> > > [1]
> > > https://docs.google.com/document/d/1PLddLEMP_wn4xHwi6069f3vZL7LzkaP0MN9nAB63X90/edit?usp=sharing
> > >
> >
>
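
For readers less familiar with the pipeline idea discussed in this thread (composing stages, then reusing the same logic for training and inference via persistence and loading), here is a minimal sketch in plain Python. It is not the Flink, Spark, or scikit-learn API and not the design from the linked document. All names (`MeanCenterer`, `MaxAbsScaler`, `Pipeline`, `to_json`, `from_json`, `STAGE_TYPES`) are hypothetical and chosen only for illustration, and plain lists stand in for tables.

```python
import json


class MeanCenterer:
    """Hypothetical stage: learns the mean in fit(), subtracts it in transform()."""

    def __init__(self, mean=0.0):
        self.mean = mean

    def fit(self, rows):
        self.mean = sum(rows) / len(rows)
        return self

    def transform(self, rows):
        return [x - self.mean for x in rows]

    def params(self):
        return {"mean": self.mean}


class MaxAbsScaler:
    """Hypothetical stage: learns the largest absolute value and divides by it."""

    def __init__(self, scale=1.0):
        self.scale = scale

    def fit(self, rows):
        # Fall back to 1.0 so an all-zero input does not cause division by zero.
        self.scale = max(abs(x) for x in rows) or 1.0
        return self

    def transform(self, rows):
        return [x / self.scale for x in rows]

    def params(self):
        return {"scale": self.scale}


# Registry used to rebuild stages from their persisted type names.
STAGE_TYPES = {cls.__name__: cls for cls in (MeanCenterer, MaxAbsScaler)}


class Pipeline:
    """Hypothetical pipeline: fits stages in order, each on the previous output."""

    def __init__(self, stages):
        self.stages = list(stages)

    def fit(self, rows):
        data = list(rows)
        for stage in self.stages:
            data = stage.fit(data).transform(data)
        return self

    def transform(self, rows):
        data = list(rows)
        for stage in self.stages:
            data = stage.transform(data)
        return data

    def to_json(self):
        # Persist learned parameters as human-readable metadata.
        return json.dumps(
            [{"type": type(s).__name__, "params": s.params()} for s in self.stages]
        )

    @classmethod
    def from_json(cls, text):
        return cls(STAGE_TYPES[d["type"]](**d["params"]) for d in json.loads(text))


# Training phase: fit once, then persist.
saved = Pipeline([MeanCenterer(), MaxAbsScaler()]).fit([1.0, 2.0, 3.0]).to_json()

# Inference phase: load the persisted pipeline and apply the same logic.
restored = Pipeline.from_json(saved)
result = restored.transform([2.0, 4.0])
```

The persisted form here is a small JSON list of stage types and learned parameters, which is one possible reading of the "human-readable metadata" point raised in the original message. A real implementation would also need versioning and model artifacts, which this sketch leaves out.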