Re: [DISCUSS] Embracing Table API in Flink ML


Hi Jincheng,

Thanks a lot for the warm feedback.

I have already read your Google doc on the Table API enhancements. Those
enhancements are essential for implementing any ML/DL algorithm on the
Table API. Our two designs complement each other well. :)

I will add a section to my Google doc for the phased implementation plan.

Thanks
Weihua

jincheng sun <sunjincheng121@xxxxxxxxx> wrote on Tue, Nov 20, 2018 at 9:17 PM:

> Hi Weihua,
> Thanks for bringing up this discussion!
>
> I quickly read the Google doc, and I fully agree that ML can be well
> supported on the Table API (at some stage in the future).
> In fact, Xiaowei and I have already started a discussion on enhancing
> the Table API. In the first phase, we will add support for
> map/flatmap/agg/flatagg to the Table API.
> So I am very happy to be involved in this discussion and will leave
> comments in the Google doc later.
>
> I think it would be great if you could add a phased implementation plan
> to the Google doc. What do you think?
>
> Thanks,
> Jincheng
>
>
> Weihua Jiang <weihua.jiang@xxxxxxxxx> wrote on Tue, Nov 20, 2018 at 8:53 PM:
>
> > ML Pipeline is an idea introduced by Scikit-learn
> > <https://arxiv.org/abs/1309.0238>. Both Spark and Flink have borrowed
> > this idea and made their own implementations [Spark ML Pipeline
> > <https://spark.apache.org/docs/latest/ml-pipeline.html>, Flink ML
> > Pipeline
> > <https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/libs/ml/pipelines.html>].
> >
> >
> >
> > NOTE: although I use the term "ML", the ML Pipeline is meant to apply
> > to both ML and DL pipelines.
> >
> >
> > ML Pipeline is quite helpful for model composition (i.e. using model(s)
> > for feature engineering). It also enables logic reuse across the training
> > and inference phases (via pipeline persistence and loading), which is
> > essential for AI engineering. ML Pipeline can also be a good base for a
> > Flink-based AI engineering platform if we give it good tooling support
> > (i.e. human-readable metadata).
> >
> >
> > As the Table API will be the unified high-level API for both stream and
> > batch processing, I want to initiate a design discussion for a new
> > Table-based Flink ML Pipeline.
> >
> >
> > I drafted a design document [1] for this discussion. The design tries to
> > create a new ML Pipeline implementation so that concrete ML/DL algorithms
> > can fit into this new API and achieve interoperability.
> >
> >
> > Any feedback is highly appreciated.
> >
> >
> > Thanks
> >
> > Weihua
> >
> >
> > [1]
> > https://docs.google.com/document/d/1PLddLEMP_wn4xHwi6069f3vZL7LzkaP0MN9nAB63X90/edit?usp=sharing
> >
>
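
For readers unfamiliar with the pipeline idea in the quoted proposal, here is
a minimal sketch in plain Python of the two properties it highlights: model
composition (a fitted stage feeding the next one) and reuse of the same logic
at training and inference time through persistence and loading. This is not
the Flink ML API and not the design in the Google doc. The names Scaler,
ThresholdModel, Pipeline and STAGE_TYPES are hypothetical, and JSON is only an
assumed stand-in for "human-readable metadata".

```python
import json


class Scaler:
    """Hypothetical feature stage: learns mean/std in training, reuses them later."""

    def __init__(self, mean=0.0, std=1.0):
        self.mean, self.std = mean, std

    def fit(self, xs, ys):
        self.mean = sum(xs) / len(xs)
        variance = sum((x - self.mean) ** 2 for x in xs) / len(xs)
        self.std = variance ** 0.5 or 1.0  # fall back to 1.0 for constant input
        return self

    def transform(self, xs):
        return [(x - self.mean) / self.std for x in xs]

    def params(self):
        return {"mean": self.mean, "std": self.std}


class ThresholdModel:
    """Hypothetical model stage: splits two classes at the midpoint of their means."""

    def __init__(self, threshold=0.0):
        self.threshold = threshold

    def fit(self, xs, ys):
        pos = [x for x, y in zip(xs, ys) if y == 1]
        neg = [x for x, y in zip(xs, ys) if y == 0]
        self.threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
        return self

    def transform(self, xs):
        return [1 if x > self.threshold else 0 for x in xs]

    def params(self):
        return {"threshold": self.threshold}


# Registry used to rebuild stages from their persisted type names.
STAGE_TYPES = {"Scaler": Scaler, "ThresholdModel": ThresholdModel}


class Pipeline:
    """Chains stages; each stage is fitted on the output of the previous one."""

    def __init__(self, stages):
        self.stages = stages

    def fit(self, xs, ys):
        for stage in self.stages:
            xs = stage.fit(xs, ys).transform(xs)
        return self

    def transform(self, xs):
        for stage in self.stages:
            xs = stage.transform(xs)
        return xs

    def to_json(self):
        # Persist stage types and learned parameters as readable JSON.
        return json.dumps(
            [{"type": type(s).__name__, "params": s.params()} for s in self.stages],
            indent=2,
        )

    @classmethod
    def from_json(cls, text):
        return cls([STAGE_TYPES[d["type"]](**d["params"]) for d in json.loads(text)])


# Training phase: fit once, then persist.
trained = Pipeline([Scaler(), ThresholdModel()]).fit(
    [1, 2, 3, 10, 11, 12], [0, 0, 0, 1, 1, 1]
)
saved = trained.to_json()

# Inference phase: load the persisted pipeline and apply the same logic.
restored = Pipeline.from_json(saved)
print(restored.transform([2, 11]))  # prints [0, 1]
```

The saved string lists each stage's type and learned parameters, so it can be
inspected or diffed by hand. A real design would also need versioning and
support for stages whose state does not fit in a small parameter dictionary.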