Re: [DISCUSS] Embracing Table API in Flink ML


Hi Shaoxuan,

You are perfectly right. What I want to achieve is a combination of all
your 3 points. Let me rephrase here:
1. Define a Table-based ML Pipeline interface with the same
functionality as the current DataSet-based implementation.
2. Support new features like online learning, streaming inference.
3. Provide a base for Flink AI tooling (i.e. AI platform) and ML/DL SQL
support.

This will definitely have to be done step by step and will need a lot of
help from Table API enhancements. I am currently working on #1.
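
To make #1 a bit more concrete, here is a rough sketch of the Estimator/Transformer/Pipeline shape being discussed. It is only an illustration and is not taken from the design document: it is written in plain Python rather than against any Flink API, `Table` is a stand-in (a list of row dicts) for a real Flink Table, and `MeanCenter`, `MeanCenterModel`, and `PipelineModel` are hypothetical names invented for this example.

```python
from typing import Dict, List, Sequence, Union

# Assumption: a "Table" is modelled as a list of rows (column name -> value).
# A real implementation would operate on a Flink Table instead.
Table = List[Dict[str, float]]


class Transformer:
    """A stage that maps one table to another (e.g. a fitted model)."""

    def transform(self, table: Table) -> Table:
        raise NotImplementedError


class Estimator:
    """A stage that learns from a table and returns a fitted Transformer."""

    def fit(self, table: Table) -> Transformer:
        raise NotImplementedError


class MeanCenterModel(Transformer):
    """Hypothetical fitted stage: subtracts a learned mean from one column."""

    def __init__(self, column: str, mean: float) -> None:
        self.column = column
        self.mean = mean

    def transform(self, table: Table) -> Table:
        return [{**row, self.column: row[self.column] - self.mean} for row in table]


class MeanCenter(Estimator):
    """Hypothetical estimator: learns the mean of one column."""

    def __init__(self, column: str) -> None:
        self.column = column

    def fit(self, table: Table) -> Transformer:
        values = [row[self.column] for row in table]
        return MeanCenterModel(self.column, sum(values) / len(values))


class PipelineModel(Transformer):
    """A fitted pipeline: applies every fitted stage in order."""

    def __init__(self, stages: Sequence[Transformer]) -> None:
        self.stages = list(stages)

    def transform(self, table: Table) -> Table:
        for stage in self.stages:
            table = stage.transform(table)
        return table


class Pipeline(Estimator):
    """Chains stages; fitting it yields a PipelineModel reusable at inference."""

    def __init__(self, stages: Sequence[Union[Estimator, Transformer]]) -> None:
        self.stages = list(stages)

    def fit(self, table: Table) -> Transformer:
        fitted = []
        current = table
        for stage in self.stages:
            # Estimators are fitted on the output of the preceding stages;
            # plain Transformers are used as they are.
            model = stage.fit(current) if isinstance(stage, Estimator) else stage
            current = model.transform(current)
            fitted.append(model)
        return PipelineModel(fitted)
```

Usage would look like `model = Pipeline([MeanCenter("x")]).fit(train)` followed by `model.transform(new_rows)`, so the logic learned during training is reused unchanged for inference.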

Thanks
Weihua

On Tue, Nov 20, 2018 at 11:11 PM, Shaoxuan Wang <wshaoxuan@xxxxxxxxx> wrote:

> Hi Weihua,
>
> Thanks for the proposal. I have quickly read through it. It looks great.
> A quick question: do you also plan to move the ML library itself (the
> implementations of Estimator/Predictor/Transformer) onto the Table API? I
> would be very happy if this were included in the scope. It is not
> easy and needs a lot of new Table API functionality, which is exactly
> one of the reasons that motivated us to "enhance the Table API", as
> discussed in other threads.
>
> The entire scope of your proposal is so big that I would suggest we
> complete it step by step. I think you have mainly proposed 3
> things:
> 1. Redesign the ML pipeline based on the Table API
> 2. Take streaming ML pipelines into account
> 3. Enhance the ML pipeline with some new features for a better user experience
> Maybe we should first replace the ML pipeline interface with a Table API one,
> then move on to #2 and #3. In the meantime, we can also explore the
> possibility of moving the ML library onto the Table API as well. What do
> you think?
>
> BTW, we should not break the current ML pipeline interface (which is
> based on DataSet) when we introduce the new one. Let us keep it
> until the new interface is complete and well adopted. Then
> we can deprecate the old one.
>
> I will take a more thorough look at your proposal and leave comments
> directly on the doc.
>
> Regards,
> Shaoxuan
>
>
> On 11/20/18, Weihua Jiang <weihua.jiang@xxxxxxxxx> wrote:
> > ML Pipeline is an idea introduced by Scikit-learn
> > <https://arxiv.org/abs/1309.0238>. Both Spark and Flink have borrowed this
> > idea and made their own implementations [Spark ML Pipeline
> > <https://spark.apache.org/docs/latest/ml-pipeline.html>, Flink ML Pipeline
> > <https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/libs/ml/pipelines.html>].
> >
> >
> >
> > NOTE: though I am using the term "ML", ML Pipeline shall apply to both ML
> > and DL pipelines.
> >
> >
> > ML Pipeline is quite helpful for model composition (i.e. using model(s) for
> > feature engineering). It also enables logic reuse across the training and
> > inference phases (via pipeline persistence and loading), which is essential
> > for AI engineering. ML Pipeline can also be a good base for a Flink-based AI
> > engineering platform if we give it good tooling support (i.e. human-readable
> > metadata).
> >
> >
> > As the Table API will be the unified high-level API for both stream and
> > batch processing, I want to initiate a design discussion of a new
> > Table-based Flink ML Pipeline.
> >
> >
> > I drafted a design document [1] for this discussion. This design tries to
> > create a new ML Pipeline implementation so that concrete ML/DL algorithms
> > can fit to this new API to achieve interoperability.
> >
> >
> > Any feedback is highly appreciated.
> >
> >
> > Thanks
> >
> > Weihua
> >
> >
> > [1]
> > https://docs.google.com/document/d/1PLddLEMP_wn4xHwi6069f3vZL7LzkaP0MN9nAB63X90/edit?usp=sharing
> >
>
>
> --
>
> -----------------------------------------------------------------------------------
>
> *Rome was not built in one day*
>
>
> -----------------------------------------------------------------------------------
>