Re: [DISCUSS] Enhancing the functionality and productivity of Table API


Thanks Jincheng,

That makes sense to me.
Another difference between the Table API and the DataStream API would be
access to the timer service.
The DataStream API can register and act on timers, while the Table API would
not have this feature.
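
To make "register and act on timers" concrete, here is a minimal sketch in
plain Java. It is a toy model, not Flink code: the class and method names
(TimerSketch, processElement, advanceWatermark) are made up for illustration.
In the DataStream API the corresponding pieces would be a process function
that registers timers through its timer service and reacts in onTimer().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Toy model only: NOT Flink's API. All names here are hypothetical.
final class TimerSketch {
    // Pending timers, ordered by timestamp (duplicates collapse into one).
    private final TreeSet<Long> timers = new TreeSet<>();
    private final List<String> output = new ArrayList<>();

    // Called per record: emit nothing now, but register a callback for later.
    void processElement(long eventTime, long timeoutMillis) {
        timers.add(eventTime + timeoutMillis);
    }

    // Called when event time advances: fire every timer at or before the watermark.
    void advanceWatermark(long watermark) {
        while (!timers.isEmpty() && timers.first() <= watermark) {
            long ts = timers.pollFirst();
            output.add("timer fired at " + ts);
        }
    }

    List<String> output() {
        return output;
    }
}
```

The point of the sketch is only that the operator itself decides when to be
called back. A declarative Table API program has no place to express that.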

Best, Fabian

On Wed, 14 Nov 2018 at 02:02, jincheng sun <sunjincheng121@xxxxxxxxx> wrote:

> Hi Piotrek, Fabian:
>
> I am very glad to see your reply. Thank you very much Piotrek for asking
> very good questions. I will share my opinion:
>
>
>    - The TableAPI enhancement that I proposed is aimed at user
>    friendliness. After the enhancement, it will keep the characteristics of
>    TableAPI&SQL, such as being declarative and open to optimization.
>
>
>    - For the difference between DataStreamAPI and TableAPI, I think there
>    are two points:
>       - a) State management: DataStreamAPI users can use the state API to
>       perform state operations directly. TableAPI/SQL can only do so
>       indirectly, for example through DataView.
>       - b) Physical operations: DataStreamAPI can call physical operations
>       such as rebalance()/rescale()/shuffle(), while TableAPI will use the
>       optimizer to decide the underlying strategy. If necessary, we will
>       follow the database hint mechanism and add hints to the TableAPI to
>       influence physical operations, but I don't recommend adding
>       operations such as rebalance()/rescale()/shuffle()/sortPartition()
>       to the TableAPI.
>    - The management of time attributes can be discussed further in the
>    TableAPI Enhancement Outline:
>
> https://mail.google.com/mail/u/0/#search/xiaowei/FMfcgxvzLWzfvCnmvMzzSfxHTSfdwLkB
>    .
>
>
>    - Regarding the proposed Enhanced TableAPI, my core goal is to solve the
>    usability problem. At this stage, the following four APIs will be
>    proposed:
>       - Table.Map
>       - Table.FlatMap
>       - GroupedTable.agg
>       - GroupedTable.flatAgg
>
> The above APIs are added for ease of use. Taking map as an example: in the
> DataStream API one writes “dataStream.map(mapFun)”. Although
> “table.select(udf1(), udf2(), udf3()....)” can be used to accomplish the
> same thing, for a map() function returning 100 columns one has to define
> or call 100 UDFs when using SQL, which is quite involved.
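
The map-versus-select point above can be sketched in plain Java. This is a toy
model under stated assumptions, not the Table API: rows are modeled as
Object[], and select(), map(), wideUdfs() and wideMap() are hypothetical
helpers that only show the shape of the two styles.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Toy model only: NOT the Table API. All names here are hypothetical.
final class MapVsSelectSketch {
    // select-style: one scalar function per output column.
    static Object[] select(Object[] row, List<Function<Object[], Object>> udfs) {
        Object[] out = new Object[udfs.size()];
        for (int i = 0; i < udfs.size(); i++) {
            out[i] = udfs.get(i).apply(row);
        }
        return out;
    }

    // map-style: a single function produces the whole output row.
    static Object[] map(Object[] row, Function<Object[], Object[]> mapFun) {
        return mapFun.apply(row);
    }

    // One function per column: 'columns' separate UDFs must exist.
    static List<Function<Object[], Object>> wideUdfs(int columns) {
        List<Function<Object[], Object>> udfs = new ArrayList<>();
        for (int c = 0; c < columns; c++) {
            final int factor = c + 1;
            udfs.add(row -> (Integer) row[0] * factor);
        }
        return udfs;
    }

    // One function for all columns: a single map function is enough.
    static Function<Object[], Object[]> wideMap(int columns) {
        return row -> {
            int x = (Integer) row[0];
            Object[] out = new Object[columns];
            for (int c = 0; c < columns; c++) {
                out[c] = x * (c + 1);
            }
            return out;
        };
    }
}
```

Both styles compute the same row. The difference is that the select style
needs one function per output column, while the map style needs one function
in total.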
>
> I totally agree that we have to discuss in depth the changes in the API and
> let our community APIs continue to develop in the right direction.  Thanks
> again for the reply, and looking forward to your feedback!:)
>
> Best,
> Jincheng
>
> On Tue, 13 Nov 2018 at 9:31 PM, Fabian Hueske <fhueske@xxxxxxxxx> wrote:
>
> > Yes, that is my understanding as well.
> >
> > Manual time management would be another difference.
> > Something still to be discussed would be whether (or to what extent) it
> > would be possible to define the physical execution plan with hints or
> > methods like partitionByHash and sortPartition.
> >
> > Best, Fabian
> >
> > On Tue, 13 Nov 2018, 13:57, Piotr Nowojski <piotr@xxxxxxxxxxxxxxxxx>
> > wrote:
> >
> > > Hi,
> > >
> > > > This thread is meant to enhance the functionality of TableAPI. I
> > > > don't think that anyone is suggesting either reducing the effort in
> > > > SQL or DataStream. So let's focus on how we can enhance TableAPI.
> > >
> > > I wasn’t thinking about that. As I said before, I was raising the
> > > question of what the Table API should look like in the future if we
> > > want it to diverge more and more from SQL. It looks to me that the
> > > rough consensus is that it should be expanded and evolve in parallel
> > > to the DataStream API, but, in order to better suit different needs,
> > > with the following differences:
> > > - declarative
> > > - predefined schema/types
> > > - no custom state operations (?)
> > > - optimisations allowed by the above points
> > >
> > > Piotrek
> > >
> > > > On 7 Nov 2018, at 16:01, Xiaowei Jiang <xiaoweij@xxxxxxxxx> wrote:
> > > >
> > > > Hi Piotr:
> > > >
> > > > I want to clarify one thing first: I think that we will keep the
> > > > interoperability between TableAPI and DataStream in any case, so
> > > > users can switch between the two whenever needed. Given that, it
> > > > would still be very helpful if users could use one API to achieve
> > > > most of what they do. Currently, TableAPI/SQL is good enough for most
> > > > data analytics scenarios, but there are some limitations that, when
> > > > removed, will help us go even further in this direction. An initial
> > > > list of these is provided in another thread. These are natural
> > > > extensions to TableAPI which we need to do just for the sake of
> > > > making TableAPI more usable.
> > > >
> > > > TableAPI and SQL share the same underlying implementation, so an
> > > > enhancement in one will end up helping the other. I don't see them
> > > > as competing. TableAPI is easier to extend because we have a bit more
> > > > freedom in adding new functionality. In practice, TableAPI can be
> > > > mixed with SQL as well.
> > > >
> > > > On the implementation side, I agree that Table API/SQL and DataStream
> > > > should try to share as much as possible. But that question is
> > > > orthogonal to the API discussion.
> > > >
> > > > This thread is meant to enhance the functionality of TableAPI. I
> > > > don't think that anyone is suggesting either reducing the effort in
> > > > SQL or DataStream. So let's focus on how we can enhance TableAPI.
> > > >
> > > > Regards,
> > > > Xiaowei
> > >
> > >
> >
>