osdir.com


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [DISCUSS] Enhancing the functionality and productivity of Table API


Hi Jincheng,

I was also thinking about introducing a process function for the Table API several times. This would allow to define more complex logic (custom windows, timers, etc.) embedded into a relational API with schema awareness and optimization around the black box. Of course this would mean that we diverge with Table API from SQL API, however, it would open the Table API also for more event-driven applications.

Maybe it would be possible to define timers and firing logic using Table API expressions and UDFs. Within planning this would be treated as a special Calc node.

Just some ideas that might be interesting for new use cases.

Regards,
Timo


Am 01.11.18 um 13:12 schrieb Aljoscha Krettek:
Hi Jincheng,

these points sound very good! Are there any concrete proposals for changes? For example a FLIP/design document?

See here for FLIPs: https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals

Best,
Aljoscha

On 1. Nov 2018, at 12:51, jincheng sun <sunjincheng121@xxxxxxxxx> wrote:

*--------I am sorry for the formatting of the email content. I reformat
the **content** as follows-----------*

*Hi ALL,*

With the continuous efforts from the community, the Flink system has been
continuously improved, which has attracted more and more users. Flink SQL
is a canonical, widely used relational query language. However, there are
still some scenarios where Flink SQL failed to meet user needs in terms of
functionality and ease of use, such as:

*1. In terms of functionality*
    Iteration, user-defined window, user-defined join, user-defined
GroupReduce, etc. Users cannot express them with SQL;

*2. In terms of ease of use*

   - Map - e.g. “dataStream.map(mapFun)”. Although “table.select(udf1(),
   udf2(), udf3()....)” can be used to accomplish the same function., with a
   map() function returning 100 columns, one has to define or call 100 UDFs
   when using SQL, which is quite involved.
   - FlatMap -  e.g. “dataStrem.flatmap(flatMapFun)”. Similarly, it can be
   implemented with “table.join(udtf).select()”. However, it is obvious that
   dataStream is easier to use than SQL.

Due to the above two reasons, some users have to use the DataStream API or
the DataSet API. But when they do that, they lose the unification of batch
and streaming. They will also lose the sophisticated optimizations such as
codegen, aggregate join transpose and multi-stage agg from Flink SQL.

We believe that enhancing the functionality and productivity is vital for
the successful adoption of Table API. To this end,  Table API still
requires more efforts from every contributor in the community. We see great
opportunity in improving our user’s experience from this work. Any feedback
is welcome.

Regards,

Jincheng

jincheng sun <sunjincheng121@xxxxxxxxx> 于2018年11月1日周四 下午5:07写道:

Hi all,

With the continuous efforts from the community, the Flink system has been
continuously improved, which has attracted more and more users. Flink SQL
is a canonical, widely used relational query language. However, there are
still some scenarios where Flink SQL failed to meet user needs in terms of
functionality and ease of use, such as:


   -

   In terms of functionality

Iteration, user-defined window, user-defined join, user-defined
GroupReduce, etc. Users cannot express them with SQL;

   -

   In terms of ease of use
   -

      Map - e.g. “dataStream.map(mapFun)”. Although “table.select(udf1(),
      udf2(), udf3()....)” can be used to accomplish the same function., with a
      map() function returning 100 columns, one has to define or call 100 UDFs
      when using SQL, which is quite involved.
      -

      FlatMap -  e.g. “dataStrem.flatmap(flatMapFun)”. Similarly, it can
      be implemented with “table.join(udtf).select()”. However, it is obvious
      that datastream is easier to use than SQL.


Due to the above two reasons, some users have to use the DataStream API or
the DataSet API. But when they do that, they lose the unification of batch
and streaming. They will also lose the sophisticated optimizations such as
codegen, aggregate join transpose  and multi-stage agg from Flink SQL.

We believe that enhancing the functionality and productivity is vital for
the successful adoption of Table API. To this end,  Table API still
requires more efforts from every contributor in the community. We see great
opportunity in improving our user’s experience from this work. Any feedback
is welcome.

Regards,

Jincheng