Re: [DISCUSS] Enhancing the functionality and productivity of Table API


Hi Piotrek, Fabian:

I am very glad to see your reply. Thank you very much Piotrek for asking
very good questions. I will share my opinion:


   - The TableAPI enhancement I proposed is aimed at user friendliness.
   After the enhancement, TableAPI will keep the characteristics it shares
   with SQL, such as being declarative and open to optimization.


   - For the difference between DataStreamAPI and TableAPI, I think there
   are two points:
      - a) State management: DataStreamAPI users can use the state API to
      perform state operations directly. TableAPI/SQL users can only use
      state indirectly, for example through DataView.
      - b) Physical operations: DataStreamAPI can call physical operations
      such as rebalance()/rescale()/shuffle(). TableAPI will use the
      optimizer to decide the underlying strategy. If necessary, we will
      follow the hint mechanism of databases and add hints to TableAPI
      that affect physical operations, but I don't recommend adding
      operations such as rebalance()/rescale()/shuffle()/sortPartition()
      to TableAPI.
   - About the management of time attributes, we can continue to discuss it
   in the TableAPI Enhancement Outline:
   https://mail.google.com/mail/u/0/#search/xiaowei/FMfcgxvzLWzfvCnmvMzzSfxHTSfdwLkB


   - Regarding the proposed Enhanced TableAPI, my core goal is to solve the
   usability problem. At this stage, the following four APIs will be proposed:
      - Table.Map
      - Table.FlatMap
      - GroupedTable.agg
      - GroupedTable.flatAgg
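
To make the shape of the four proposed operations concrete, here is a minimal sketch in plain Python. It is a toy model, not Flink code: the helper names (`table_map`, `table_flat_map`, `group_by`, `grouped_agg`, `grouped_flat_agg`) are hypothetical, and the semantics are assumed from the operation names only (map: one row in, one row out; flatMap: one row in, any number of rows out; agg: one group in, one row out; flatAgg: one group in, any number of rows out).

```python
from collections import defaultdict

# Toy model only: a "table" is a list of dict rows. None of these helpers
# are Flink APIs; they only illustrate the assumed shape of each operation.

def table_map(rows, fn):
    """Assumed Table.Map shape: one input row -> exactly one output row."""
    return [fn(row) for row in rows]

def table_flat_map(rows, fn):
    """Assumed Table.FlatMap shape: one input row -> zero or more output rows."""
    return [out for row in rows for out in fn(row)]

def group_by(rows, key):
    """Group rows by one column, keeping first-seen key order."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)

def grouped_agg(groups, fn):
    """Assumed GroupedTable.agg shape: one group -> exactly one output row."""
    return [fn(key, group) for key, group in groups.items()]

def grouped_flat_agg(groups, fn):
    """Assumed GroupedTable.flatAgg shape: one group -> zero or more output rows."""
    return [out for key, group in groups.items() for out in fn(key, group)]

orders = [
    {"user": "a", "amount": 3},
    {"user": "b", "amount": 5},
    {"user": "a", "amount": 7},
    {"user": "a", "amount": 1},
]

# map: add a derived column to every row
doubled = table_map(orders, lambda r: {**r, "double": r["amount"] * 2})

# flatMap: split each line into one row per word
words = table_flat_map(
    [{"line": "a b"}, {"line": "c"}],
    lambda r: [{"word": w} for w in r["line"].split()],
)

groups = group_by(orders, "user")

# agg: one total per user
totals = grouped_agg(
    groups,
    lambda user, g: {"user": user, "total": sum(r["amount"] for r in g)},
)

# flatAgg: the two largest amounts per user (several rows per group)
top2 = grouped_flat_agg(
    groups,
    lambda user, g: [
        {"user": user, "amount": r["amount"]}
        for r in sorted(g, key=lambda r: r["amount"], reverse=True)[:2]
    ],
)
```

In this sketch the only difference between the plain and the "flat" variants is whether the user function returns a single row or a sequence of rows.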

These APIs are added for ease of use. Take map as an example, in the style
of “dataStream.map(mapFun)”. Although “table.select(udf1(), udf2(),
udf3()....)” can accomplish the same thing, replacing a map() function that
returns 100 columns means defining or calling 100 UDFs when using SQL,
which is quite involved.
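
The usability gap described above can be sketched with a toy model in plain Python. This is not the Flink Table API: `select`, `map_rows`, `col_udf`, and `split_all` are hypothetical names used only to contrast "one function per output column" with "one function per row".

```python
# Toy model only: a "table" is a list of dict rows; nothing here is Flink code.

def select(rows, columns):
    """select style: `columns` maps each output column name to its own function."""
    return [{name: fn(row) for name, fn in columns.items()} for row in rows]

def map_rows(rows, fn):
    """map style: a single function returns the whole output row."""
    return [fn(row) for row in rows]

rows = [{"id": 1, "text": "a,b,c"}, {"id": 2, "text": "x,y,z"}]

# select style: every output column needs its own UDF, so N columns
# means N functions (and the input is parsed N times per row).
def col_udf(i):
    return lambda row: row["text"].split(",")[i]

selected = select(rows, {f"c{i}": col_udf(i) for i in range(3)})

# map style: one function parses the input once and emits all columns.
def split_all(row):
    parts = row["text"].split(",")
    return {f"c{i}": part for i, part in enumerate(parts)}

mapped = map_rows(rows, split_all)
```

Both styles produce the same rows here; the difference is that the select style grows by one function per output column, while the map style stays a single function however many columns it returns.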

I totally agree that we have to discuss the API changes in depth so that
our community's APIs continue to develop in the right direction. Thanks
again for the reply, and I look forward to your feedback! :)

Best,
Jincheng

Fabian Hueske <fhueske@xxxxxxxxx> wrote on Tue, Nov 13, 2018 at 9:31 PM:

> Yes, that is my understanding as well.
>
> Manual time management would be another difference.
> Something still to be discussed would be whether (or to what extent) it
> would be possible to define the physical execution plan with hints or
> methods like partitionByHash and sortPartition.
>
> Best, Fabian
>
> On Tue, Nov 13, 2018, 13:57 Piotr Nowojski <piotr@xxxxxxxxxxxxxxxxx> wrote:
>
> > Hi,
> >
> > > This thread is meant to enhance the functionalities of TableAPI. I don't
> > > think that anyone is suggesting either reducing the effort in SQL or
> > > DataStream. So let's focus on how we can enhance TableAPI.
> >
> > I wasn’t thinking about that. As I said before, I was raising a question:
> > what should Table API look like in the future if we want it to diverge
> > more and more from SQL? It looks to me that the rough consensus is that
> > it should be expanded and evolve in parallel with the DataStream API, but
> > to better suit different needs, with the following differences:
> > - declarative
> > - predefined schema/types
> > - no custom state operations (?)
> > - optimisations allowed by the above points
> >
> > Piotrek
> >
> > > On 7 Nov 2018, at 16:01, Xiaowei Jiang <xiaoweij@xxxxxxxxx> wrote:
> > >
> > > Hi Piotr:
> > >
> > > I want to clarify one thing first: I think that we will keep the
> > > interoperability between TableAPI and DataStream in any case, so users
> > > can switch between the two whenever needed. Given that, it would still
> > > be very helpful if users could use one API to achieve most of what they
> > > do. Currently, TableAPI/SQL is good enough for most data analytics
> > > scenarios, but there are some limitations that, when removed, will help
> > > us go even further in this direction. An initial list of these is
> > > provided in another thread. These are natural extensions to TableAPI
> > > which we need to do just for the sake of making TableAPI more usable.
> > >
> > > TableAPI and SQL share the same underlying implementation, so
> > > enhancements in one will end up helping the other. I don't see them as
> > > competing. TableAPI is easier to extend because we have a bit more
> > > freedom in adding new functionality. In reality, TableAPI can be mixed
> > > with SQL as well.
> > >
> > > On the implementation side, I agree that Table API/SQL and DataStream
> > > should try to share as much as possible. But that question is
> > > orthogonal to the API discussion.
> > >
> > > This thread is meant to enhance the functionalities of TableAPI. I don't
> > > think that anyone is suggesting either reducing the effort in SQL or
> > > DataStream. So let's focus on how we can enhance TableAPI.
> > >
> > > Regards,
> > > Xiaowei
> >
> >
>