Re: [DISCUSS] Embracing Table API in Flink ML
Thanks for bringing up this discussion!
I quickly read the Google doc, and I fully agree that ML can be well
supported on the Table API (at some stage in the future).
In fact, Xiaowei and I have already brought up a discussion on enhancing
the Table API. In the first phase, we will add support for
map/flatmap/agg/flatagg in the Table API.
So I am very happy to be involved in this discussion and will leave a
comment in the Google doc later.
I think it would be great if you could add a phased implementation plan to
the Google doc. What do you think?
Weihua Jiang <weihua.jiang@xxxxxxxxx> wrote on Tue, Nov 20, 2018 at 8:53 PM:
> ML Pipeline is an idea introduced by Scikit-learn
> <https://arxiv.org/abs/1309.0238>. Both Spark and Flink have borrowed this
> idea and made their own implementations [Spark ML Pipeline
> <https://spark.apache.org/docs/latest/ml-pipeline.html>, Flink ML Pipeline].
> NOTE: though I am using the term "ML", ML Pipeline shall apply to both ML
> and DL pipelines.
> ML Pipeline is quite helpful for model composition (i.e. using model(s) for
> feature engineering). It also enables logic reuse across the training and
> inference phases (via pipeline persistence and loading), which is essential
> for AI engineering. ML Pipeline can also be a good base for a Flink-based AI
> engineering platform if we can give it good tooling support (i.e.
> human-readable metadata).
> As the Table API will be the unified high-level API for both stream and
> batch processing, I want to initiate the design discussion of a new
> Table-based Flink ML Pipeline.
> I drafted a design document for this discussion. This design tries to
> create a new ML Pipeline implementation so that concrete ML/DL algorithms
> can fit into this new API to achieve interoperability.
> Any feedback is highly appreciated.