Re: [DISCUSS] Enhancing the functionality and productivity of Table API


Hi Fabian,

Yes, timers are not only a difference between Table and DataStream, but
also a difference between DataStream and DataSet. We need to unify batch
and stream processing in Table, so the difference regarding timers needs
to be considered in depth. :)

Thanks, Jincheng


Fabian Hueske <fhueske@xxxxxxxxx> wrote on Thu, Nov 15, 2018 at 8:58 PM:

> Thanks Jincheng,
>
> That makes sense to me.
> Another difference between the Table API and the DataStream API would be
> access to the timer service.
> The DataStream API can register and act on timers, while the Table API
> would not have this feature.
>
> Best, Fabian
>
> On Wed, Nov 14, 2018 at 02:02, jincheng sun <sunjincheng121@xxxxxxxxx>
> wrote:
>
> > Hi Piotrek, Fabian:
> >
> > I am very glad to see your reply. Thank you very much Piotrek for asking
> > very good questions. I will share my opinion:
> >
> >
> >    - The TableAPI enhancement that I proposed is aimed at user
> >    friendliness. After the enhancement, it will keep the
> >    characteristics of TableAPI&SQL, such as being declarative and
> >    allowing optimization.
> >
> >
> >    - For the difference between DataStreamAPI and TableAPI, I think
> >    there are two points:
> >       a) State management: DataStreamAPI users can use the state API to
> >       perform state operations directly. TableAPI/SQL can only use
> >       state indirectly, for example through DataView.
> >       b) Physical operations: DataStreamAPI can call physical
> >       operations such as rebalance()/rescale()/shuffle(), whereas
> >       TableAPI will use the optimizer to decide the underlying
> >       strategy. If necessary, we will follow the hint mechanism of
> >       databases and add hints to TableAPI to influence physical
> >       operations, but I don't recommend adding operations such as
> >       rebalance()/rescale()/shuffle()/sortPartition() to TableAPI.
> >    - About the management of time attributes, we can continue the
> >    discussion in the TableAPI Enhancement Outline:
> >    https://mail.google.com/mail/u/0/#search/xiaowei/FMfcgxvzLWzfvCnmvMzzSfxHTSfdwLkB
> >
> >
> >    - Regarding the proposed enhanced TableAPI, my core goal is to solve
> >    the usability problem. At this stage, the following four APIs will
> >    be proposed:
> >       - Table.Map
> >       - Table.FlatMap
> >       - GroupedTable.agg
> >       - GroupedTable.flatAgg
> >
> > These APIs are added for ease of use. Taking map as an example: Map -
> > e.g. "dataStream.map(mapFun)". Although
> > "table.select(udf1(), udf2(), udf3()....)" can accomplish the same
> > thing, with a map() function returning 100 columns one has to define or
> > call 100 UDFs when using SQL, which is quite involved.
> >
> > I totally agree that we have to discuss the API changes in depth and
> > let our community's APIs continue to develop in the right direction.
> > Thanks again for the reply, and looking forward to your feedback! :)
> >
> > Best,
> > Jincheng
> >
> > Fabian Hueske <fhueske@xxxxxxxxx> wrote on Tue, Nov 13, 2018 at 9:31 PM:
> >
> > > Yes, that is my understanding as well.
> > >
> > > Manual time management would be another difference.
> > > Something still to be discussed would be whether (or to what extent) it
> > > would be possible to define the physical execution plan with hints or
> > > methods like partitionByHash and sortPartition.
> > >
> > > Best, Fabian
> > >
> > > On Tue, Nov 13, 2018, 13:57, Piotr Nowojski
> > > <piotr@xxxxxxxxxxxxxxxxx> wrote:
> > >
> > > > Hi,
> > > >
> > > > > This thread is meant to enhance the functionality of TableAPI. I
> > > > > don't think that anyone is suggesting either reducing the effort
> > > > > in SQL or DataStream. So let's focus on how we can enhance
> > > > > TableAPI.
> > > >
> > > > I wasn't thinking about that. As I said before, I was raising the
> > > > question of what the Table API should look like in the future if we
> > > > want it to diverge more and more from SQL. It looks to me like the
> > > > rough consensus is that it should be expanded and evolve in parallel
> > > > to the DataStream API but, in order to better suit different needs,
> > > > with the following differences:
> > > > - declarative
> > > > - predefined schema/types
> > > > - no custom state operations (?)
> > > > - optimisations allowed by the above points
> > > >
> > > > Piotrek
> > > >
> > > > > On 7 Nov 2018, at 16:01, Xiaowei Jiang <xiaoweij@xxxxxxxxx> wrote:
> > > > >
> > > > > Hi Piotr:
> > > > >
> > > > > I want to clarify one thing first: I think that we will keep the
> > > > > interoperability between TableAPI and DataStream in any case, so
> > > > > users can switch between the two whenever needed. Given that, it
> > > > > would still be very helpful if users could use one API to achieve
> > > > > most of what they do. Currently, TableAPI/SQL is good enough for
> > > > > most data analytics scenarios, but there are some limitations
> > > > > that, when removed, will help us go even further in this
> > > > > direction. An initial list of these is provided in another thread.
> > > > > These are natural extensions to TableAPI, which we need to do
> > > > > just for the sake of making TableAPI more usable.
> > > > >
> > > > > TableAPI and SQL share the same underlying implementation, so an
> > > > > enhancement in one will end up helping the other. I don't see them
> > > > > as competing. TableAPI is easier to extend because we have a bit
> > > > > more freedom in adding new functionality. In reality, TableAPI can
> > > > > be mixed with SQL as well.
> > > > >
> > > > > On the implementation side, I agree that Table API/SQL and
> > > > > DataStream should try to share as much as possible. But that
> > > > > question is orthogonal to the API discussion.
> > > > >
> > > > > This thread is meant to enhance the functionality of TableAPI. I
> > > > > don't think that anyone is suggesting either reducing the effort
> > > > > in SQL or DataStream. So let's focus on how we can enhance
> > > > > TableAPI.
> > > > >
> > > > > Regards,
> > > > > Xiaowei
> > > >
> > > >
> > >
> >
>
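
The usability argument made in the thread for a row-level map (one function returning all output columns, versus one UDF per column in a select) can be illustrated with a minimal sketch. This is plain Python, not Flink code: `select`, `map_rows`, `wide_map`, and the dictionary row model are hypothetical stand-ins chosen for illustration and do not correspond to any actual Table API signature.

```python
from typing import Callable, Dict, List

# A row is modelled as a plain dict of column name -> value (an assumption
# made for this sketch; it is not how Flink represents rows).
Row = Dict[str, int]


def select(rows: List[Row], column_fns: Dict[str, Callable[[Row], int]]) -> List[Row]:
    """select-style: one function (a UDF stand-in) per output column."""
    return [{name: fn(row) for name, fn in column_fns.items()} for row in rows]


def map_rows(rows: List[Row], map_fn: Callable[[Row], Row]) -> List[Row]:
    """map-style: a single function produces the whole output row."""
    return [map_fn(row) for row in rows]


NUM_COLUMNS = 100
rows = [{"x": 1}, {"x": 2}]

# select-style: NUM_COLUMNS separate per-column functions must be defined.
column_fns = {
    f"c{i}": (lambda row, i=i: row["x"] * i) for i in range(NUM_COLUMNS)
}
via_select = select(rows, column_fns)


# map-style: one function computes all NUM_COLUMNS output columns at once.
def wide_map(row: Row) -> Row:
    return {f"c{i}": row["x"] * i for i in range(NUM_COLUMNS)}


via_map = map_rows(rows, wide_map)
```

Both calls produce the same rows; the difference is only in how many functions the user has to write and pass, which is the ease-of-use point raised for Table.Map.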