Re: [DISCUSS] Embracing Table API in Flink ML
Can't wait to see your design.
On Wed, Nov 21, 2018 at 12:43 AM, Yun Gao <firstname.lastname@example.org> wrote:
> Hi Weihua,
> Thanks for the exciting proposal!
> I have quickly read through it, and I really appreciate the idea of
> providing an ML Pipeline API similar to the commonly used library
> scikit-learn, since it greatly reduces the learning cost for AI
> engineers moving to the Flink platform.
> Currently we are also working on a related issue, namely enhancing the
> stream iteration of Flink to support both SGD and online learning; it
> also supports batch training as a special case. We have a rough design
> and will start a new discussion in the next few days. I think the enhanced
> stream iteration will help to implement Estimators directly in Flink, and
> it may help to simplify the online learning pipeline by eliminating the
> requirement to load the models from external file systems.
> I will read the design doc more carefully. Thanks again for sharing
> the design doc!
> Yours sincerely
> Yun Gao
> From: Weihua Jiang <weihua.jiang@xxxxxxxxx>
> Sent: Tuesday, November 20, 2018, 20:53
> To: dev <dev@xxxxxxxxxxxxxxxx>
> Subject: [DISCUSS] Embracing Table API in Flink ML
> ML Pipeline is an idea introduced by scikit-learn
> <https://arxiv.org/abs/1309.0238>. Both Spark and Flink have borrowed this
> idea and made their own implementations [Spark ML Pipeline
> <https://spark.apache.org/docs/latest/ml-pipeline.html>, Flink ML Pipeline
> NOTE: though I am using the term "ML", ML Pipeline shall apply to both ML
> and DL pipelines.
> ML Pipeline is quite helpful for model composition (e.g. using model(s) for
> feature engineering). It also enables logic reuse across the training and
> inference phases (via pipeline persistence and loading), which is essential
> for AI engineering. ML Pipeline can also be a good base for a Flink-based AI
> engineering platform if we can give it good tooling support
> (e.g. human-readable metadata).
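To make the composition and persist/load points above concrete, here is a minimal, self-contained Python sketch. It is not Flink ML, Spark ML, or scikit-learn code: every class name (Scale, Center, CenterModel, Pipeline, PipelineModel) and the JSON layout are assumptions made up for illustration. It only shows the general shape of the idea: an estimator's fit() yields a model, a fitted pipeline is a chain of transformers, and the fitted pipeline can be saved as human-readable metadata and reloaded for inference.

```python
import json


class Scale:
    """Stateless transformer (hypothetical): multiplies values by a factor."""

    def __init__(self, factor):
        self.factor = factor

    def transform(self, values):
        return [v * self.factor for v in values]

    def to_dict(self):
        return {"type": "Scale", "factor": self.factor}


class CenterModel:
    """Fitted model (hypothetical): subtracts the mean learned in training."""

    def __init__(self, mean):
        self.mean = mean

    def transform(self, values):
        return [v - self.mean for v in values]

    def to_dict(self):
        return {"type": "CenterModel", "mean": self.mean}


class Center:
    """Estimator (hypothetical): fit() learns the mean, returns a model."""

    def fit(self, values):
        return CenterModel(sum(values) / len(values))


class PipelineModel:
    """A chain of fitted transformers, reusable at inference time."""

    # Maps the "type" field in the saved metadata back to a constructor.
    REGISTRY = {
        "Scale": lambda d: Scale(d["factor"]),
        "CenterModel": lambda d: CenterModel(d["mean"]),
    }

    def __init__(self, stages):
        self.stages = stages

    def transform(self, values):
        for stage in self.stages:
            values = stage.transform(values)
        return values

    def to_json(self):
        # Human-readable metadata: one small dict per stage.
        return json.dumps([s.to_dict() for s in self.stages], indent=2)

    @classmethod
    def from_json(cls, text):
        return cls([cls.REGISTRY[d["type"]](d) for d in json.loads(text)])


class Pipeline:
    """Unfitted pipeline: a mix of estimators and plain transformers."""

    def __init__(self, stages):
        self.stages = stages

    def fit(self, values):
        fitted = []
        for stage in self.stages:
            # Estimators are fitted on the output of the previous stages;
            # plain transformers are used as they are.
            model = stage.fit(values) if hasattr(stage, "fit") else stage
            values = model.transform(values)
            fitted.append(model)
        return PipelineModel(fitted)


# Training phase: fit once, then persist the fitted pipeline.
train = [1.0, 2.0, 3.0]
model = Pipeline([Scale(2.0), Center()]).fit(train)
saved = model.to_json()

# Inference phase: reload and apply exactly the same logic.
restored = PipelineModel.from_json(saved)
```

In this sketch the same scaling and centering logic runs in both phases without being rewritten, and the saved JSON can be inspected or edited by hand, which is the kind of tooling friendliness the paragraph above asks for.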
> As the Table API will be the unified high-level API for both stream and
> batch processing, I want to initiate the design discussion of a new
> Table-based Flink ML Pipeline.
> I drafted a design document for this discussion. This design tries to
> create a new ML Pipeline implementation so that concrete ML/DL algorithms
> can fit into this new API to achieve interoperability.
> Any feedback is highly appreciated.