osdir.com



Re: [DISCUSS] Long-term goal of making flink-table Scala-free


Hi hequn,

I am very glad to hear that you are interested in this work.
As we all know, this process involves a lot of work.
The migration has already begun: I started with the
Kafka connector's dependency on flink-table and moved the
related dependencies to flink-table-common.
This work is tracked by FLINK-9461 [1].
I don't know whether it will conflict with what you plan to do, but judging
from the impact I have observed, it will touch many classes that are
currently in flink-table.

*I'm just mentioning this to avoid unnecessary conflicts.*

Thanks, vino.

[1]: https://issues.apache.org/jira/browse/FLINK-9461

On Sat, Nov 24, 2018 at 7:20 PM Hequn Cheng <chenghequn@xxxxxxxxx> wrote:

> Hi Timo,
>
> Thanks for the effort and for writing up this document. I like the idea of
> making flink-table Scala-free, so +1 for the proposal!
>
> It's good to make Java a first-class citizen. For a long time, we have
> neglected Java, so many Table features are missing from the Java test
> cases, such as the one [1] I found recently. I think we may also need
> to migrate our test cases, i.e., add Java tests.
>
> This is definitely a big change and will break API compatibility. To
> reduce the impact on users, I think we should move fast when we migrate
> user-facing APIs. It's better to introduce the user-visible changes
> within a single release. However, that may not be easy. I can help to
> contribute.
>
> Separating interface and implementation is a good idea. It would let users
> depend on a minimal set of dependencies, or even none. I saw your
> reply in the Google doc. Java 8 already supports static methods in
> interfaces; maybe we can make use of that?
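A minimal sketch of the interface/implementation split with a Java 8 static interface method, as suggested above. The names `Table`, `TableImpl`, `Table.of`, and `select` are hypothetical stand-ins, not the actual flink-table API, and both types sit in one file only for brevity. In a real module split, the API module could not reference `TableImpl` directly; the static method would have to discover the implementation at runtime, for example via `java.util.ServiceLoader`.

```java
// Hypothetical names: Table, TableImpl, Table.of and select are illustrative
// stand-ins, not the real flink-table API.
public class InterfaceSplitSketch {

    // "API module": the only type user code needs to see.
    public interface Table {
        String name();

        Table select(String fields);

        // Java 8+ static interface method used as a factory. In a real module
        // split this could not name TableImpl directly; it would have to look
        // the implementation up at runtime (e.g. via java.util.ServiceLoader).
        static Table of(String name) {
            return new TableImpl(name, "*");
        }
    }

    // "Implementation module": package-private, never referenced by users.
    static final class TableImpl implements Table {
        private final String name;
        private final String projection;

        TableImpl(String name, String projection) {
            this.name = name;
            this.projection = projection;
        }

        @Override
        public String name() {
            return name;
        }

        @Override
        public Table select(String fields) {
            // Returns a new immutable instance with the given projection.
            return new TableImpl(name, fields);
        }

        @Override
        public String toString() {
            return "Table(" + name + ", select=" + projection + ")";
        }
    }

    public static void main(String[] args) {
        Table t = Table.of("orders").select("id, amount");
        System.out.println(t); // prints Table(orders, select=id, amount)
    }
}
```

User code only ever names `Table`, so the implementation class could change or move to another module without touching the user-facing API.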
>
> Best,
> Hequn
>
> [1] https://issues.apache.org/jira/browse/FLINK-11001
>
>
> On Fri, Nov 23, 2018 at 5:36 PM Timo Walther <twalthr@xxxxxxxxxx> wrote:
>
> > Hi everyone,
> >
> > thanks for the great feedback so far. I updated the document with the
> > input I received.
> >
> > @Fabian: I moved the porting of flink-table-runtime classes up in the
> > list.
> >
> > @Xiaowei: Could you elaborate on what "interface only" means to you? Do you
> > mean a module containing pure Java `interface`s? Or is the validation
> > logic also part of the API module? Are the 50+ expression classes part of
> > the API interface, or are they already too implementation-specific?
> >
> > @Xuefu: I extended the document by almost a page to clarify when we
> > should develop in Scala and when in Java. As Piotr said, every new Scala
> > line is instant technical debt.
> >
> > Thanks,
> > Timo
> >
> >
> > On 23.11.18 at 10:29, Piotr Nowojski wrote:
> > > Hi Timo,
> > >
> > > Thanks for writing this down, +1 from my side :)
> > >
> > >> I'm wondering whether we can have a rule, in the interim when Java
> > >> and Scala coexist, that dependencies can only be one-way. I found that
> > >> in the current code base there are cases where a Scala class extends a
> > >> Java class and vice versa. This is quite painful. I'm thinking that if we
> > >> could say extension can only go from Java to Scala, it would help the
> > >> situation. However, I'm not sure if this is practical.
> > > Xuefu: I’m also not sure what’s the best approach here; we will probably
> > > have to work it out as we go. One thing to consider is that from now on,
> > > every single new line of code written in Scala anywhere in flink-table
> > > (except for flink-table-api-scala) is instant technological debt. From
> > > this perspective, I would be in favour of tolerating quite big
> > > inconveniences just to avoid any new Scala code.
> > >
> > > Piotrek
> > >
> > >> On 23 Nov 2018, at 03:25, Zhang, Xuefu <xuefu.z@xxxxxxxxxxxxxxx> wrote:
> > >>
> > >> Hi Timo,
> > >>
> > >> Thanks for the effort and the Google write-up. During our external
> > >> catalog rework, we found much confusion between Java and Scala, and this
> > >> Scala-free roadmap should greatly mitigate that.
> > >>
> > >> I'm wondering whether we can have a rule, in the interim when Java
> > >> and Scala coexist, that dependencies can only be one-way. I found that
> > >> in the current code base there are cases where a Scala class extends a
> > >> Java class and vice versa. This is quite painful. I'm thinking that if we
> > >> could say extension can only go from Java to Scala, it would help the
> > >> situation. However, I'm not sure if this is practical.
> > >>
> > >> Thanks,
> > >> Xuefu
> > >>
> > >>
> > >> ------------------------------------------------------------------
> > >> Sender:jincheng sun <sunjincheng121@xxxxxxxxx>
> > >> Sent at:2018 Nov 23 (Fri) 09:49
> > >> Recipient:dev <dev@xxxxxxxxxxxxxxxx>
> > >> Subject:Re: [DISCUSS] Long-term goal of making flink-table Scala-free
> > >>
> > >> Hi Timo,
> > >> Thanks for initiating this great discussion.
> > >>
> > >> Currently, using SQL/Table API requires many dependencies, including
> > >> implementation-specific ones that users do not care about. So I am glad
> > >> to see your proposal, and I hope we consider splitting the API
> > >> interfaces into a separate module so that users can depend on a
> > >> minimal set of dependencies.
> > >>
> > >> So, +1 to [separation of interface and implementation; e.g. `Table` &
> > >> `TableImpl`] which you mentioned in the google doc.
> > >> Best,
> > >> Jincheng
> > >>
> > >> On Thu, Nov 22, 2018 at 10:50 PM Xiaowei Jiang <xiaoweij@xxxxxxxxx> wrote:
> > >>
> > >>> Hi Timo, thanks for driving this! I think that this is a nice thing to
> > >>> do. While we are doing this, can we also keep in mind that we
> > >>> eventually want an interface-only Table API module which users can
> > >>> depend on without pulling in any implementation details?
> > >>>
> > >>> Xiaowei
> > >>>
> > >>> On Thu, Nov 22, 2018 at 6:37 PM Fabian Hueske <fhueske@xxxxxxxxx> wrote:
> > >>>
> > >>>> Hi Timo,
> > >>>>
> > >>>> Thanks for writing up this document.
> > >>>> I like the new structure and agree to prioritize porting the
> > >>>> flink-table-common classes.
> > >>>> Since flink-table-runtime is (or should be) independent of the API and
> > >>>> planner modules, we could start porting these classes once the code is
> > >>>> split into the new module structure.
> > >>>> The benefit of a Scala-free flink-table-runtime would be a Scala-free
> > >>>> execution JAR.
> > >>>>
> > >>>> Best, Fabian
> > >>>>
> > >>>>
> > >>>> On Thu, Nov 22, 2018 at 10:54 AM Timo Walther <twalthr@xxxxxxxxxx> wrote:
> > >>>>> Hi everyone,
> > >>>>>
> > >>>>> I would like to continue this discussion thread and convert the
> > >>>>> outcome into a FLIP so that users and contributors know what to expect
> > >>>>> in the upcoming releases.
> > >>>>>
> > >>>>> I created a design document [1] that clarifies our motivation for
> > >>>>> doing this, what a Maven module structure could look like, and a
> > >>>>> suggestion for a migration plan.
> > >>>>>
> > >>>>> It would be great to start with these efforts in the 1.8 release so
> > >>>>> that new features can be developed in Java and major refactorings,
> > >>>>> such as improvements to the connectors and external catalog support,
> > >>>>> are not blocked.
> > >>>>>
> > >>>>> Please let me know what you think.
> > >>>>>
> > >>>>> Regards,
> > >>>>> Timo
> > >>>>>
> > >>>>> [1]
> > >>>>> https://docs.google.com/document/d/1PPo6goW7tOwxmpFuvLSjFnx7BF8IVz0w3dcmPPyqvoY/edit?usp=sharing
> > >>>>>
> > >>>>> On 02.07.18 at 17:08, Fabian Hueske wrote:
> > >>>>>> Hi Piotr,
> > >>>>>>
> > >>>>>> thanks for bumping this thread, and thanks to Xingcan for the
> > >>>>>> comments.
> > >>>>>> I think the first step would be to separate the flink-table module
> > >>>>>> into multiple submodules. These could be:
> > >>>>>>
> > >>>>>> - flink-table-api: all API-facing classes. Can later be divided
> > >>>>>>   further into Java/Scala Table API/SQL.
> > >>>>>> - flink-table-planning: all planning (basically everything we do
> > >>>>>>   with Calcite).
> > >>>>>> - flink-table-runtime: the runtime code.
> > >>>>>>
> > >>>>>> IMO, a realistic mid-term goal is to have the runtime module and
> > >>>>>> certain parts of the planning module ported to Java.
> > >>>>>> The api module will be much harder to port because of several
> > >>>>>> dependencies on Scala core classes (the parser framework, tree
> > >>>>>> iterations, etc.). I'm not saying we should not port this to Java,
> > >>>>>> but it is not clear to me (yet) how to do it.
> > >>>>>>
> > >>>>>> I think flink-table-runtime should not be too hard to port. The code
> > >>>>>> does not make use of many Scala features, i.e., it is written in a
> > >>>>>> very Java-like style.
> > >>>>>> Also, there are not many dependencies, and operators can be ported
> > >>>>>> individually, step by step.
> > >>>>>> For flink-table-planning, there are certain packages that we can port
> > >>>>>> to Java, like planning rules or plan nodes. The related classes mostly
> > >>>>>> extend Calcite's Java interfaces/classes and would be natural choices
> > >>>>>> for being ported. The code generation classes will require more effort
> > >>>>>> to port. There are also some dependencies in planning on the api
> > >>>>>> module that we would need to resolve somehow.
> > >>>>>>
> > >>>>>> For SQL, most of the work when adding new features is done in the
> > >>>>>> planning and runtime modules. So, this separation should already
> > >>>>>> reduce "technological debt" quite a lot.
> > >>>>>> The Table API depends much more on Scala than SQL does.
> > >>>>>>
> > >>>>>> Cheers, Fabian
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>> 2018-07-02 16:26 GMT+02:00 Xingcan Cui <xingcanc@xxxxxxxxx>:
> > >>>>>>
> > >>>>>>> Hi all,
> > >>>>>>>
> > >>>>>>> I have also been thinking about this problem recently, and here are
> > >>>>>>> my thoughts.
> > >>>>>>>
> > >>>>>>> 1) We must admit that making Java and Scala interoperate is a really
> > >>>>>>> tough task. E.g., they have different collection types (Scala
> > >>>>>>> collections vs. java.util.*), and in Java it's hard to implement a
> > >>>>>>> method that takes Scala functions as parameters. Considering that the
> > >>>>>>> major part of the code base is implemented in Java, +1 for this goal
> > >>>>>>> from a long-term view.
> > >>>>>>>
> > >>>>>>> 2) The ideal solution would be to just expose a Scala API and make
> > >>>>>>> all the other parts Scala-free. But I am not sure this could be
> > >>>>>>> achieved even in the long term. Thus, as Timo suggested, keeping the
> > >>>>>>> Scala code in "flink-table-core" would be a compromise solution.
> > >>>>>>>
> > >>>>>>> 3) Once the community makes the final decision, maybe all new
> > >>>>>>> features should be added in Java (regardless of the module), in order
> > >>>>>>> to prevent the Scala code from growing.
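To make point 1) concrete: a Java-first API can accept `java.util.List` and `java.util.function.Function` instead of Scala collections and `scala.Function1`, so Java callers can pass plain lists and lambdas. The helper below (`mapAll`) is a hypothetical illustration only, not an existing Flink method.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class JavaFriendlyApiSketch {

    // Hypothetical helper, not a Flink method. It accepts only java.util types,
    // so a Java caller can pass a plain List and a lambda or method reference
    // instead of a scala.collection.Seq and a scala.Function1.
    public static <T, R> List<R> mapAll(List<T> input, Function<? super T, ? extends R> fn) {
        List<R> out = new ArrayList<>(input.size());
        for (T item : input) {
            out.add(fn.apply(item));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> fields = Arrays.asList(" id ", " amount");
        List<String> trimmed = mapAll(fields, String::trim);
        System.out.println(trimmed); // prints [id, amount]
    }
}
```

The same signature written against Scala types would force Java callers to construct Scala function objects and convert collections on every call, which is the friction described in point 1).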
> > >>>>>>>
> > >>>>>>> Best,
> > >>>>>>> Xingcan
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>> On Jul 2, 2018, at 9:30 PM, Piotr Nowojski <piotr@xxxxxxxxxxxxxxxxx> wrote:
> > >>>>>>>> Bumping the topic.
> > >>>>>>>>
> > >>>>>>>> If we want to do this, the sooner we decide, the less code we will
> > >>>>>>>> have to rewrite. I have some objections/counter-proposals to
> > >>>>>>>> Fabian's proposal of doing it module-wise, one module at a time.
> > >>>>>>>>
> > >>>>>>>> First, I do not see a problem with having Java/Scala code even within
> > >>>>>>>> one module, especially not if there are clean boundaries. For
> > >>>>>>>> example, we could have the API in Scala and optimizer rules/logical
> > >>>>>>>> nodes written in Java in the same module. However, I haven't
> > >>>>>>>> maintained mixed Scala/Java code bases before, so I might be missing
> > >>>>>>>> something here.
> > >>>>>>>>
> > >>>>>>>> Secondly, this whole migration might, and most likely will, take
> > >>>>>>>> longer than expected, which creates a problem for the new code that
> > >>>>>>>> we will be writing in the meantime. After deciding to migrate to
> > >>>>>>>> Java, almost any new line of Scala code will immediately be
> > >>>>>>>> technological debt, and we will have to rewrite it in Java later.
> > >>>>>>>>
> > >>>>>>>> Thus I would propose to first state our end goal: the module
> > >>>>>>>> structure and which parts of the modules we eventually want to be
> > >>>>>>>> Scala-free. Secondly, we should take all the steps necessary to allow
> > >>>>>>>> us to write new code compliant with that end goal. Only after that
> > >>>>>>>> should we focus on incrementally rewriting the old code. Otherwise we
> > >>>>>>>> could be stuck/blocked for years writing new code in Scala (and
> > >>>>>>>> increasing technological debt), because nobody has found the time to
> > >>>>>>>> rewrite some unimportant and not actively developed part of some
> > >>>>>>>> module.
> > >>>>>>>> Piotrek
> > >>>>>>>>
> > >>>>>>>>> On 14 Jun 2018, at 15:34, Fabian Hueske <fhueske@xxxxxxxxx> wrote:
> > >>>>>>>>> Hi,
> > >>>>>>>>>
> > >>>>>>>>> In general, I think this is a good effort. However, it won't be
> > >>>>>>>>> easy, and I think we have to plan this well.
> > >>>>>>>>> I don't like the idea of having the whole code base fragmented into
> > >>>>>>>>> Java and Scala code for too long.
> > >>>>>>>>>
> > >>>>>>>>> I think we should do this one step at a time and focus on migrating
> > >>>>>>>>> one module at a time.
> > >>>>>>>>> IMO, the easiest start would be to port the runtime to Java.
> > >>>>>>>>> Extracting the API classes into a separate module, porting them to
> > >>>>>>>>> Java, and removing the Scala dependency won't be possible without
> > >>>>>>>>> breaking the API, since a few classes depend on the Scala Table API.
> > >>>>>>>>>
> > >>>>>>>>> Best, Fabian
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> 2018-06-14 10:33 GMT+02:00 Till Rohrmann <trohrmann@xxxxxxxxxx>:
> > >>>>>>>>>
> > >>>>>>>>>> I think that is a noble and honorable goal and we should strive for
> > >>>>>>>>>> it. This, however, must be an iterative process given the sheer
> > >>>>>>>>>> size of the code base. I like the approach of defining common Java
> > >>>>>>>>>> modules which are used by more specific Scala modules and slowly
> > >>>>>>>>>> moving classes from Scala to Java. Thus +1 for the proposal.
> > >>>>>>>>>>
> > >>>>>>>>>> Cheers,
> > >>>>>>>>>> Till
> > >>>>>>>>>>
> > >>>>>>>>>> On Wed, Jun 13, 2018 at 12:01 PM Piotr Nowojski <piotr@xxxxxxxxxxxxxxxxx> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>>> Hi,
> > >>>>>>>>>>>
> > >>>>>>>>>>> I do not have experience with how Scala and Java interact with
> > >>>>>>>>>>> each other, so I cannot fully validate your proposal, but
> > >>>>>>>>>>> generally speaking +1 from me.
> > >>>>>>>>>>>
> > >>>>>>>>>>> Does it also mean that we should slowly migrate `flink-table-core`
> > >>>>>>>>>>> to Java? How would you envision it? It would be nice to be able to
> > >>>>>>>>>>> add new classes/features written in Java so that they can coexist
> > >>>>>>>>>>> with old Scala code until we gradually switch from Scala to Java.
> > >>>>>>>>>>>
> > >>>>>>>>>>> Piotrek
> > >>>>>>>>>>>
> > >>>>>>>>>>>> On 13 Jun 2018, at 11:32, Timo Walther <twalthr@xxxxxxxxxx> wrote:
> > >>>>>>>>>>>> Hi everyone,
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> as you all know, the Table & SQL API is currently implemented in
> > >>>>>>>>>>>> Scala. This decision was made a long time ago, when the initial code
> > >>>>>>>>>>>> base was created as part of a master's thesis. The community kept
> > >>>>>>>>>>>> Scala because of the nice language features that enable a fluent
> > >>>>>>>>>>>> Table API like table.select('field.trim()) and because Scala allows
> > >>>>>>>>>>>> for quick prototyping (e.g. multi-line strings for code generation).
> > >>>>>>>>>>>> The committers enforced not splitting the code base into two
> > >>>>>>>>>>>> programming languages.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> However, nowadays the flink-table module is becoming an increasingly
> > >>>>>>>>>>>> important part of the Flink ecosystem. Connectors, formats, and the
> > >>>>>>>>>>>> SQL client are actually implemented in Java but need to interoperate
> > >>>>>>>>>>>> with flink-table, which makes these modules dependent on Scala. As
> > >>>>>>>>>>>> mentioned in an earlier mail thread, using Scala for API classes also
> > >>>>>>>>>>>> exposes member variables and methods in Java that should not be
> > >>>>>>>>>>>> exposed to users [1]. Java is still the most important API language,
> > >>>>>>>>>>>> and right now we treat it as a second-class citizen. I just noticed
> > >>>>>>>>>>>> that you even need to add Scala if you just want to implement a
> > >>>>>>>>>>>> ScalarFunction, because of method clashes between `public String
> > >>>>>>>>>>>> toString()` and `public scala.Predef.String toString()`.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Given the size of the current code base, reimplementing the entire
> > >>>>>>>>>>>> flink-table code in Java is a goal that we might never reach.
> > >>>>>>>>>>>> However, we should at least treat the symptoms and keep this as a
> > >>>>>>>>>>>> long-term goal in mind. My suggestion would be to convert user-facing
> > >>>>>>>>>>>> and runtime classes and split the code base into multiple modules:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> - flink-table-java {depends on flink-table-core}
> > >>>>>>>>>>>>   Implemented in Java. Java users can use this. This would require
> > >>>>>>>>>>>>   converting classes like TableEnvironment and Table.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> - flink-table-scala {depends on flink-table-core}
> > >>>>>>>>>>>>   Implemented in Scala. Scala users can use this.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> - flink-table-common
> > >>>>>>>>>>>>   Implemented in Java. Connectors, formats, and UDFs can use this.
> > >>>>>>>>>>>>   It contains interface classes such as descriptors, table sinks,
> > >>>>>>>>>>>>   and table sources.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> - flink-table-core {depends on flink-table-common and flink-table-runtime}
> > >>>>>>>>>>>>   Implemented in Scala. Contains the current main code base.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> - flink-table-runtime
> > >>>>>>>>>>>>   Implemented in Java. This would require converting classes in
> > >>>>>>>>>>>>   o.a.f.table.runtime but could potentially improve the runtime.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> What do you think?
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Regards,
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Timo
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> [1]
> > >>>>>>>>>>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Convert-main-Table-API-classes-into-traits-tp21335.html
> > >>>>>
> >
> >
>