
Re: [DISCUSS] Enhancing the functionality and productivity of Table API


Hi, Aljoscha,

Thanks for your feedback and suggestions. I think you are right: a
detailed design/FLIP is necessary. Before writing a detailed design or opening
a FLIP, I would like to hear the community's views on enhancing the
functionality and productivity of the Table API, to make sure that it is worth
the effort. If most community members agree with my proposal, I will list the
changes and discuss them with the community. Does that make sense to you?

Thanks,
Jincheng

Aljoscha Krettek <aljoscha@xxxxxxxxxx> wrote on Thursday, November 1, 2018, at 8:12 PM:

> Hi Jincheng,
>
> these points sound very good! Are there any concrete proposals for
> changes? For example a FLIP/design document?
>
> See here for FLIPs:
> https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals
>
> Best,
> Aljoscha
>
> > On 1. Nov 2018, at 12:51, jincheng sun <sunjincheng121@xxxxxxxxx> wrote:
> >
> > *Sorry for the formatting of my previous email. Here is the content,
> > reformatted:*
> >
> > *Hi ALL,*
> >
> > With the continuous efforts of the community, the Flink system has been
> > continuously improved, which has attracted more and more users. Flink SQL
> > is based on SQL, a canonical and widely used relational query language.
> > However, there are still some scenarios where Flink SQL fails to meet user
> > needs in terms of functionality and ease of use, such as:
> >
> > *1. In terms of functionality*
> >    Iterations, user-defined windows, user-defined joins, user-defined
> > GroupReduce, etc., which users cannot express in SQL.
> >
> > *2. In terms of ease of use*
> >
> >   - Map - e.g. “dataStream.map(mapFun)”. Although “table.select(udf1(),
> >   udf2(), udf3(), ...)” can be used to achieve the same result, for a
> >   map() function that returns 100 columns one has to define or call 100
> >   UDFs when using SQL, which is quite cumbersome.
> >   - FlatMap - e.g. “dataStream.flatMap(flatMapFun)”. Similarly, it can be
> >   implemented with “table.join(udtf).select()”. However, the DataStream
> >   API is clearly easier to use than SQL in this case.
> >
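The map/flatMap contrast above can be illustrated with a small, self-contained Java sketch. It does not use Flink at all: a "table" is modeled as a plain `List` of rows, and the names `TableOpsSketch`, `selectWithUdfs`, `map`, `flatMap`, `upperAndScale` and `splitWords` are hypothetical, chosen only for this illustration; they are not Table API or DataStream API signatures. The sketch only shows the difference in call shape: in the select style every output column needs its own function, a single map function can produce the whole row, and a flatMap function can emit several rows per input row (roughly the role that `table.join(udtf).select()` plays in the proposal's description).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Illustration only: a "table" is a List of rows, each row a List<Object>.
// None of these methods are Flink APIs; they only model the call shapes.
public class TableOpsSketch {

    // select(udf1(), udf2(), ...) style: one single-column function per output column.
    static List<List<Object>> selectWithUdfs(List<List<Object>> table,
                                             List<Function<List<Object>, Object>> udfs) {
        List<List<Object>> out = new ArrayList<>();
        for (List<Object> row : table) {
            List<Object> newRow = new ArrayList<>();
            for (Function<List<Object>, Object> udf : udfs) {
                newRow.add(udf.apply(row)); // each column comes from a separate function
            }
            out.add(newRow);
        }
        return out;
    }

    // map(mapFun) style: a single function produces the whole output row.
    static List<List<Object>> map(List<List<Object>> table,
                                  Function<List<Object>, List<Object>> mapFun) {
        List<List<Object>> out = new ArrayList<>();
        for (List<Object> row : table) {
            out.add(mapFun.apply(row));
        }
        return out;
    }

    // flatMap(flatMapFun) style: each input row yields zero or more output rows.
    static List<List<Object>> flatMap(List<List<Object>> table,
                                      Function<List<Object>, List<List<Object>>> flatMapFun) {
        List<List<Object>> out = new ArrayList<>();
        for (List<Object> row : table) {
            out.addAll(flatMapFun.apply(row));
        }
        return out;
    }

    // Hypothetical map function: both output columns are computed in one place.
    static List<Object> upperAndScale(List<Object> row) {
        return List.<Object>of(((String) row.get(0)).toUpperCase(), (Integer) row.get(1) * 10);
    }

    // Hypothetical flatMap function: one output row per space-separated word.
    static List<List<Object>> splitWords(List<Object> row) {
        List<List<Object>> rows = new ArrayList<>();
        for (String word : ((String) row.get(0)).split(" ")) {
            rows.add(List.<Object>of(word, row.get(1)));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<List<Object>> input = List.of(
                List.<Object>of("hello world", 1),
                List.<Object>of("flink", 2));

        // Select style: N output columns need N separate functions.
        List<Function<List<Object>, Object>> udfs = List.of(
                row -> ((String) row.get(0)).toUpperCase(),
                row -> (Integer) row.get(1) * 10);
        System.out.println(selectWithUdfs(input, udfs));

        // Map style: the same two columns from one function.
        System.out.println(map(input, TableOpsSketch::upperAndScale));

        // FlatMap style: each row is expanded into one row per word.
        System.out.println(flatMap(input, TableOpsSketch::splitWords));
    }
}
```

Running `main` prints `[[HELLO WORLD, 10], [FLINK, 20]]` for both the select and the map variants, and `[[hello, 1], [world, 1], [flink, 2]]` for the flatMap variant, since "hello world" is split into two rows.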
> > For these two reasons, some users have to use the DataStream API or
> > the DataSet API. But when they do that, they lose the unification of batch
> > and streaming. They also lose the sophisticated optimizations that Flink
> > SQL provides, such as code generation, aggregate join transpose and
> > multi-stage aggregation.
> >
> > We believe that enhancing the functionality and productivity of the Table
> > API is vital for its successful adoption. To this end, the Table API still
> > requires more effort from every contributor in the community. We see a great
> > opportunity to improve our users' experience through this work. Any
> > feedback is welcome.
> >
> > Regards,
> >
> > Jincheng
> >
jincheng sun <sunjincheng121@xxxxxxxxx> wrote on Thursday, November 1, 2018, at 5:07 PM:
>
>