Re: [DISCUSS] Embracing Table API in Flink ML


Hi Yun,

Can't wait to see your design.

Thanks
Weihua

On Wed, Nov 21, 2018 at 12:43 AM, Yun Gao <yungao.gy@xxxxxxxxxx.invalid> wrote:

> Hi Weihua,
>
>     Thanks for the exciting proposal!
>
>     I have read through it quickly, and I really appreciate the idea of
> providing an ML Pipeline API similar to the commonly used library
> scikit-learn, since it greatly reduces the learning cost for AI
> engineers moving to the Flink platform.
>
>     We are currently working on a related issue, namely enhancing the
> stream iteration of Flink to support both SGD and online learning; it
> also supports batch training as a special case. We have a rough design
> and will start a new discussion in the next few days. I think the enhanced
> stream iteration will help to implement Estimators directly in Flink, and
> it may help to simplify the online learning pipeline by eliminating the
> requirement to load the models from external file systems.
>
>     I will read the design doc more carefully. Thanks again for sharing
> the design doc!
>
> Yours sincerely
>     Yun Gao
>
>
> ------------------------------------------------------------------
> From: Weihua Jiang <weihua.jiang@xxxxxxxxx>
> Sent: Tuesday, November 20, 2018, 20:53
> To: dev <dev@xxxxxxxxxxxxxxxx>
> Subject: [DISCUSS] Embracing Table API in Flink ML
>
> ML Pipeline is an idea introduced by scikit-learn
> <https://arxiv.org/abs/1309.0238>. Both Spark and Flink have borrowed this
> idea and made their own implementations [Spark ML Pipeline
> <https://spark.apache.org/docs/latest/ml-pipeline.html>, Flink ML Pipeline
> <https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/libs/ml/pipelines.html>].
>
>
>
> NOTE: though I am using the term "ML", ML Pipeline shall apply to both ML
> and DL pipelines.
>
>
> ML Pipeline is quite helpful for model composition (e.g. using model(s) for
> feature engineering). It also enables logic reuse across the training and
> inference phases (via pipeline persistence and loading), which is essential
> for AI engineering. ML Pipeline can also be a good base for a Flink-based AI
> engineering platform if we give it good tooling support (e.g.
> human-readable metadata).
>
>
> As the Table API will be the unified high-level API for both stream and
> batch processing, I want to initiate a design discussion of a new
> Table-based Flink ML Pipeline.
>
>
> I drafted a design document [1] for this discussion. This design tries to
> create a new ML Pipeline implementation so that concrete ML/DL algorithms
> can fit into this new API to achieve interoperability.
>
>
> Any feedback is highly appreciated.
>
>
> Thanks
>
> Weihua
>
>
> [1]
>
> https://docs.google.com/document/d/1PLddLEMP_wn4xHwi6069f3vZL7LzkaP0MN9nAB63X90/edit?usp=sharing
>
>