Re: [DISCUSS] Long-term goal of making flink-table Scala-free


Hi Timo,

Thanks for the effort and writing up this document. I like the idea to make
flink-table scala free, so +1 for the proposal!

It's good to make Java a first-class citizen. For a long time we have
neglected Java, so many Table features are not covered by Java test
cases, such as this one [1] that I found recently. I think we also need
to migrate our test cases, i.e., add Java tests.

This is definitely a big change and will break API compatibility. To
reduce the impact on users, I think we should move fast when migrating
user-facing APIs. Ideally, the user-visible changes would all land within
a single release. That may not be easy, though. I can help contribute.

Separating interface and implementation is a good idea. The API module
could then bring in a minimum of dependencies, or even none. I saw your
reply in the Google doc. Java 8 already supports static methods on
interfaces; could we make use of that?
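
To make the static-method idea concrete, here is a minimal sketch of an
interface with a static factory that hides its implementation class. It is
only an illustration under assumptions: `Table` and `TableImpl` echo the names
from the Google doc, but the methods `fieldNames()`, `select(String...)` and
`of(String...)`, as well as `TableSketchDemo`, are hypothetical and do not
reflect Flink's actual API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical, heavily simplified API type; NOT Flink's real Table interface.
interface Table {

    List<String> fieldNames();

    Table select(String... fields);

    // Java 8 static interface method: callers obtain a Table without ever
    // naming the implementation class.
    static Table of(String... fields) {
        return new TableImpl(Arrays.asList(fields));
    }
}

// Hypothetical implementation class, kept package-private so that it is not
// part of the user-facing surface.
final class TableImpl implements Table {

    private final List<String> fields;

    TableImpl(List<String> fields) {
        // Defensive copy so that later changes to the caller's list are not visible.
        this.fields = Collections.unmodifiableList(new ArrayList<>(fields));
    }

    @Override
    public List<String> fieldNames() {
        return fields;
    }

    @Override
    public Table select(String... selected) {
        // Reject fields that the current table does not contain.
        for (String name : selected) {
            if (!fields.contains(name)) {
                throw new IllegalArgumentException("Unknown field: " + name);
            }
        }
        return new TableImpl(Arrays.asList(selected));
    }
}

class TableSketchDemo {
    public static void main(String[] args) {
        Table orders = Table.of("id", "user", "amount");
        // prints [user, amount]
        System.out.println(orders.select("user", "amount").fieldNames());
    }
}
```

Note that in this sketch the interface still references `TableImpl` directly,
so both types have to live in the same module. A truly interface-only module
would need some lookup mechanism instead (for example `java.util.ServiceLoader`),
which is a design choice this sketch does not try to settle.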

Best,
Hequn

[1] https://issues.apache.org/jira/browse/FLINK-11001


On Fri, Nov 23, 2018 at 5:36 PM Timo Walther <twalthr@xxxxxxxxxx> wrote:

> Hi everyone,
>
> thanks for the great feedback so far. I updated the document with the
> input I have received.
>
> @Fabian: I moved the porting of flink-table-runtime classes up in the list.
>
> @Xiaowei: Could you elaborate what "interface only" means to you? Do you
> mean a module containing pure Java `interface`s? Or is the validation
> logic also part of the API module? Are 50+ expression classes part of
> the API interface or already too implementation-specific?
>
> @Xuefu: I extended the document by almost a page to clarify when we
> should develop in Scala and when in Java. As Piotr said, every new Scala
> line is instant technical debt.
>
> Thanks,
> Timo
>
>
> On 23.11.18 at 10:29, Piotr Nowojski wrote:
> > Hi Timo,
> >
> > Thanks for writing this down +1 from my side :)
> >
> >> I'm wondering whether we can have a rule in the interim, while Java
> >> and Scala coexist, that dependencies can only be one-way. I found that
> >> in the current code base there are cases where a Scala class extends
> >> Java and vice versa. This is quite painful. I'm thinking we could say
> >> that extension can only be from Java to Scala, which would help the
> >> situation. However, I'm not sure if this is practical.
> > Xuefu: I’m also not sure what the best approach is here; we will
> > probably have to work it out as we go. One thing to consider is that
> > from now on, every single new line of code written in Scala anywhere in
> > flink-table (except flink-table-api-scala) is instant technical debt.
> > From this perspective I would be in favour of tolerating quite big
> > inconveniences just to avoid any new Scala code.
> >
> > Piotrek
> >
> >> On 23 Nov 2018, at 03:25, Zhang, Xuefu <xuefu.z@xxxxxxxxxxxxxxx> wrote:
> >>
> >> Hi Timo,
> >>
> >> Thanks for the effort and the Google writeup. During our external
> >> catalog rework, we found much confusion between Java and Scala, and
> >> this Scala-free roadmap should greatly mitigate that.
> >>
> >> I'm wondering whether we can have a rule in the interim, while Java
> >> and Scala coexist, that dependencies can only be one-way. I found that
> >> in the current code base there are cases where a Scala class extends
> >> Java and vice versa. This is quite painful. I'm thinking we could say
> >> that extension can only be from Java to Scala, which would help the
> >> situation. However, I'm not sure if this is practical.
> >>
> >> Thanks,
> >> Xuefu
> >>
> >>
> >> ------------------------------------------------------------------
> >> Sender:jincheng sun <sunjincheng121@xxxxxxxxx>
> >> Sent at:2018 Nov 23 (Fri) 09:49
> >> Recipient:dev <dev@xxxxxxxxxxxxxxxx>
> >> Subject:Re: [DISCUSS] Long-term goal of making flink-table Scala-free
> >>
> >> Hi Timo,
> >> Thanks for initiating this great discussion.
> >>
> >> Currently, using SQL/Table API requires including many dependencies. In
> >> particular, users should not have to pull in specific implementation
> >> dependencies that they do not care about. So I am glad to see your
> >> proposal, and I hope we consider splitting the API interface into a
> >> separate module so that users can bring in a minimum of dependencies.
> >>
> >> So, +1 to [separation of interface and implementation; e.g. `Table` &
> >> `TableImpl`] which you mentioned in the google doc.
> >> Best,
> >> Jincheng
> >>
> >> Xiaowei Jiang <xiaoweij@xxxxxxxxx> wrote on Thu, Nov 22, 2018 at 10:50 PM:
> >>
> >>> Hi Timo, thanks for driving this! I think that this is a nice thing to
> >>> do. While we are doing this, can we also keep in mind that we want to
> >>> eventually have a Table API interface-only module which users can take
> >>> a dependency on, but without including any implementation details?
> >>>
> >>> Xiaowei
> >>>
> >>> On Thu, Nov 22, 2018 at 6:37 PM Fabian Hueske <fhueske@xxxxxxxxx> wrote:
> >>>
> >>>> Hi Timo,
> >>>>
> >>>> Thanks for writing up this document.
> >>>> I like the new structure and agree to prioritize the porting of the
> >>>> flink-table-common classes.
> >>>> Since flink-table-runtime is (or should be) independent of the API and
> >>>> planner modules, we could start porting these classes once the code is
> >>>> split into the new module structure.
> >>>> The benefit of a Scala-free flink-table-runtime would be a Scala-free
> >>>> execution JAR.
> >>>>
> >>>> Best, Fabian
> >>>>
> >>>>
> >>>> On Thu, Nov 22, 2018 at 10:54 AM, Timo Walther <twalthr@xxxxxxxxxx> wrote:
> >>>>> Hi everyone,
> >>>>>
> >>>>> I would like to continue this discussion thread and convert the
> >>>>> outcome into a FLIP such that users and contributors know what to
> >>>>> expect in the upcoming releases.
> >>>>>
> >>>>> I created a design document [1] that clarifies our motivation for
> >>>>> doing this, what a Maven module structure could look like, and a
> >>>>> suggestion for a migration plan.
> >>>>>
> >>>>> It would be great to start with the efforts for the 1.8 release such
> >>>>> that new features can be developed in Java and major refactorings
> >>>>> such as improvements to the connectors and external catalog support
> >>>>> are not blocked.
> >>>>>
> >>>>> Please let me know what you think.
> >>>>>
> >>>>> Regards,
> >>>>> Timo
> >>>>>
> >>>>> [1]
> >>>>> https://docs.google.com/document/d/1PPo6goW7tOwxmpFuvLSjFnx7BF8IVz0w3dcmPPyqvoY/edit?usp=sharing
> >>>>>
> >>>>> On 02.07.18 at 17:08, Fabian Hueske wrote:
> >>>>>> Hi Piotr,
> >>>>>>
> >>>>>> thanks for bumping this thread and thanks to Xingcan for the
> >>>>>> comments.
> >>>>>> I think the first step would be to separate the flink-table module
> >>>>>> into multiple submodules. These could be:
> >>>>>>
> >>>>>> - flink-table-api: All API-facing classes. Can later be divided
> >>>>>>   further into Java/Scala Table API/SQL
> >>>>>> - flink-table-planning: involves all planning (basically everything
> >>>>>>   we do with Calcite)
> >>>>>> - flink-table-runtime: the runtime code
> >>>>>>
> >>>>>> IMO, a realistic mid-term goal is to have the runtime module and
> >>>>>> certain parts of the planning module ported to Java.
> >>>>>> The api module will be much harder to port because of several
> >>>>>> dependencies on Scala core classes (the parser framework, tree
> >>>>>> iterations, etc.). I'm not saying we should not port this to Java,
> >>>>>> but it is not clear to me (yet) how to do it.
> >>>>>>
> >>>>>> I think flink-table-runtime should not be too hard to port. The code
> >>>>>> does not make use of many Scala features, i.e., it is written in a
> >>>>>> very Java-like style.
> >>>>>> Also, there are not many dependencies, and operators can be ported
> >>>>>> individually, step by step.
> >>>>>> For flink-table-planning, we can have certain packages that we port
> >>>>>> to Java, like planning rules or plan nodes. The related classes
> >>>>>> mostly extend Calcite's Java interfaces/classes and would be natural
> >>>>>> choices for being ported. The code generation classes will require
> >>>>>> more effort to port. There are also some dependencies in planning on
> >>>>>> the api module that we would need to resolve somehow.
> >>>>>>
> >>>>>> For SQL, most of the work when adding new features is done in the
> >>>>>> planning and runtime modules. So, this separation should already
> >>>>>> reduce "technological debt" quite a lot.
> >>>>>> The Table API depends much more on Scala than SQL.
> >>>>>>
> >>>>>> Cheers, Fabian
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> 2018-07-02 16:26 GMT+02:00 Xingcan Cui <xingcanc@xxxxxxxxx>:
> >>>>>>
> >>>>>>> Hi all,
> >>>>>>>
> >>>>>>> I have also been thinking about this problem these days, and here
> >>>>>>> are my thoughts.
> >>>>>>>
> >>>>>>> 1) We must admit that it’s really a tough task to interoperate
> >>>>>>> between Java and Scala. E.g., they have different collection types
> >>>>>>> (Scala collections vs. java.util.*), and in Java it's hard to
> >>>>>>> implement a method which takes Scala functions as parameters.
> >>>>>>> Considering that the major part of the code base is implemented in
> >>>>>>> Java, +1 for this goal from a long-term view.
> >>>>>>>
> >>>>>>> 2) The ideal solution would be to just expose a Scala API and make
> >>>>>>> all the other parts Scala-free. But I am not sure if it could be
> >>>>>>> achieved even in the long term. Thus, as Timo suggested, keeping the
> >>>>>>> Scala code in "flink-table-core" would be a compromise solution.
> >>>>>>>
> >>>>>>> 3) If the community makes the final decision, maybe any new
> >>>>>>> features should be added in Java (regardless of the module), in
> >>>>>>> order to prevent the Scala code from growing.
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> Xingcan
> >>>>>>>
> >>>>>>>
> >>>>>>>> On Jul 2, 2018, at 9:30 PM, Piotr Nowojski <piotr@xxxxxxxxxxxxxxxxx> wrote:
> >>>>>>>> Bumping the topic.
> >>>>>>>>
> >>>>>>>> If we want to do this, the sooner we decide, the less code we will
> >>>>>>>> have to rewrite. I have some objections/counter-proposals to
> >>>>>>>> Fabian's proposal of doing it module-wise, one module at a time.
> >>>>>>>>
> >>>>>>>> First, I do not see a problem with having Java/Scala code even
> >>>>>>>> within one module, especially not if there are clean boundaries.
> >>>>>>>> For example, we could have the API in Scala and optimizer
> >>>>>>>> rules/logical nodes written in Java in the same module. However, I
> >>>>>>>> haven’t maintained mixed Scala/Java code bases before, so I might
> >>>>>>>> be missing something here.
> >>>>>>>>
> >>>>>>>> Secondly, this whole migration might, and most likely will, take
> >>>>>>>> longer than expected, which creates a problem for the new code that
> >>>>>>>> we will be writing. After making a decision to migrate to Java,
> >>>>>>>> almost any new Scala line of code will immediately be technological
> >>>>>>>> debt, and we will have to rewrite it in Java later.
> >>>>>>>>
> >>>>>>>> Thus I would propose first to state our end goal: the module
> >>>>>>>> structure and which parts of the modules we eventually want to be
> >>>>>>>> Scala-free. Secondly, we should take all steps necessary to allow
> >>>>>>>> us to write new code compliant with our end goal. Only after that
> >>>>>>>> should/could we focus on incrementally rewriting the old code.
> >>>>>>>> Otherwise we could be stuck/blocked for years writing new code in
> >>>>>>>> Scala (and increasing technological debt), because nobody has
> >>>>>>>> found the time to rewrite some unimportant and not actively
> >>>>>>>> developed part of some module.
> >>>>>>>> Piotrek
> >>>>>>>>
> >>>>>>>>> On 14 Jun 2018, at 15:34, Fabian Hueske <fhueske@xxxxxxxxx> wrote:
> >>>>>>>>> Hi,
> >>>>>>>>>
> >>>>>>>>> In general, I think this is a good effort. However, it won't be
> >>>>>>>>> easy and I think we have to plan this well.
> >>>>>>>>> I don't like the idea of having the whole code base fragmented
> >>>>>>>>> into Java and Scala code for too long.
> >>>>>>>>>
> >>>>>>>>> I think we should do this one step at a time and focus on
> >>>>>>>>> migrating one module at a time.
> >>>>>>>>> IMO, the easiest start would be to port the runtime to Java.
> >>>>>>>>> Extracting the API classes into their own module, porting them to
> >>>>>>>>> Java, and removing the Scala dependency won't be possible without
> >>>>>>>>> breaking the API since a few classes depend on the Scala Table API.
> >>>>>>>>>
> >>>>>>>>> Best, Fabian
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> 2018-06-14 10:33 GMT+02:00 Till Rohrmann <trohrmann@xxxxxxxxxx>:
> >>>>>>>>>
> >>>>>>>>>> I think that is a noble and honorable goal and we should strive
> >>>>>>>>>> for it. This, however, must be an iterative process given the
> >>>>>>>>>> sheer size of the code base. I like the approach of defining
> >>>>>>>>>> common Java modules which are used by more specific Scala modules
> >>>>>>>>>> and slowly moving classes from Scala to Java. Thus +1 for the
> >>>>>>>>>> proposal.
> >>>>>>>>>>
> >>>>>>>>>> Cheers,
> >>>>>>>>>> Till
> >>>>>>>>>>
> >>>>>>>>>> On Wed, Jun 13, 2018 at 12:01 PM Piotr Nowojski <piotr@xxxxxxxxxxxxxxxxx> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Hi,
> >>>>>>>>>>>
> >>>>>>>>>>> I do not have experience with how Scala and Java interact with
> >>>>>>>>>>> each other, so I cannot fully validate your proposal, but
> >>>>>>>>>>> generally speaking +1 from me.
> >>>>>>>>>>>
> >>>>>>>>>>> Does it also mean that we should slowly migrate
> >>>>>>>>>>> `flink-table-core` to Java? How would you envision it? It would
> >>>>>>>>>>> be nice to be able to add new classes/features written in Java
> >>>>>>>>>>> so that they can coexist with old Scala code until we gradually
> >>>>>>>>>>> switch from Scala to Java.
> >>>>>>>>>>>
> >>>>>>>>>>> Piotrek
> >>>>>>>>>>>
> >>>>>>>>>>>> On 13 Jun 2018, at 11:32, Timo Walther <twalthr@xxxxxxxxxx> wrote:
> >>>>>>>>>>>> Hi everyone,
> >>>>>>>>>>>>
> >>>>>>>>>>>> as you all know, currently the Table & SQL API is implemented
> >>>>>>>>>>>> in Scala. This decision was made a long time ago when the
> >>>>>>>>>>>> initial code base was created as part of a master's thesis. The
> >>>>>>>>>>>> community kept Scala because of the nice language features that
> >>>>>>>>>>>> enable a fluent Table API like table.select('field.trim()) and
> >>>>>>>>>>>> because Scala allows for quick prototyping (e.g. multi-line
> >>>>>>>>>>>> comments for code generation). The committers enforced not
> >>>>>>>>>>>> splitting the code base into two programming languages.
> >>>>>>>>>>>>
> >>>>>>>>>>>> However, nowadays the flink-table module is becoming a more and
> >>>>>>>>>>>> more important part of the Flink ecosystem. Connectors, formats,
> >>>>>>>>>>>> and the SQL client are actually implemented in Java but need to
> >>>>>>>>>>>> interoperate with flink-table, which makes these modules
> >>>>>>>>>>>> dependent on Scala. As mentioned in an earlier mail thread,
> >>>>>>>>>>>> using Scala for API classes also exposes member variables and
> >>>>>>>>>>>> methods in Java that should not be exposed to users [1]. Java is
> >>>>>>>>>>>> still the most important API language and right now we treat it
> >>>>>>>>>>>> as a second-class citizen. I just noticed that you even need to
> >>>>>>>>>>>> add Scala if you just want to implement a ScalarFunction because
> >>>>>>>>>>>> of method clashes between `public String toString()` and
> >>>>>>>>>>>> `public scala.Predef.String toString()`.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Given the size of the current code base, reimplementing the
> >>>>>>>>>>>> entire flink-table code in Java is a goal that we might never
> >>>>>>>>>>>> reach. However, we should at least treat the symptoms and have
> >>>>>>>>>>>> this as a long-term goal in mind. My suggestion would be to
> >>>>>>>>>>>> convert user-facing and runtime classes and split the code base
> >>>>>>>>>>>> into multiple modules:
> >>>>>>>>>>>>> flink-table-java {depends on flink-table-core}
> >>>>>>>>>>>> Implemented in Java. Java users can use this. This would
> >>>>>>>>>>>> require converting classes like TableEnvironment, Table.
> >>>>>>>>>>>>
> >>>>>>>>>>>>> flink-table-scala {depends on flink-table-core}
> >>>>>>>>>>>> Implemented in Scala. Scala users can use this.
> >>>>>>>>>>>>
> >>>>>>>>>>>>> flink-table-common
> >>>>>>>>>>>> Implemented in Java. Connectors, formats, and UDFs can use
> >>>>>>>>>>>> this. It contains interface classes such as descriptors, table
> >>>>>>>>>>>> sink, table source.
> >>>>>>>>>>>>
> >>>>>>>>>>>>> flink-table-core {depends on flink-table-common and
> >>>>>>>>>>>>> flink-table-runtime}
> >>>>>>>>>>>> Implemented in Scala. Contains the current main code base.
> >>>>>>>>>>>>
> >>>>>>>>>>>>> flink-table-runtime
> >>>>>>>>>>>> Implemented in Java. This would require converting classes in
> >>>>>>>>>>>> o.a.f.table.runtime but would potentially improve the runtime.
> >>>>>>>>>>>>
> >>>>>>>>>>>> What do you think?
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> Regards,
> >>>>>>>>>>>>
> >>>>>>>>>>>> Timo
> >>>>>>>>>>>>
> >>>>>>>>>>>> [1]
> >>>>>>>>>>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Convert-main-Table-API-classes-into-traits-tp21335.html
> >>>>>
>
>