Re: [DISCUSS] Long-term goal of making flink-table Scala-free


Hi Timo,
Thanks for initiating this great discussion.

Currently, using SQL/Table API requires pulling in many dependencies. In
particular, users should not have to introduce implementation-specific
dependencies that they do not care about. So I am glad to see your
proposal, and I hope we consider splitting the API interfaces into a
separate module so that users can introduce a minimal set of dependencies.

So, +1 to [separation of interface and implementation; e.g. `Table` &
`TableImpl`] which you mentioned in the google doc.
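
To make the separation concrete, here is a minimal sketch of what I have in
mind. It is only an illustration: the methods `select` and `fieldNames` and
the list-backed implementation are assumptions made for this example, not the
actual flink-table classes. The idea is that the interface could live in an
API-only module, while the implementation stays in a module that user code
never compiles against.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical API-only module: the only type user code would compile against.
interface Table {
    // Returns a new Table that keeps only the given fields (assumed method).
    Table select(String... fields);

    // Returns the field names of this table (assumed method).
    List<String> fieldNames();
}

// Hypothetical implementation module: needed on the classpath only at runtime.
final class TableImpl implements Table {
    private final List<String> fields;

    TableImpl(List<String> fields) {
        // Defensive copy so callers cannot mutate the internal state.
        this.fields = new ArrayList<>(fields);
    }

    @Override
    public Table select(String... selected) {
        List<String> kept = new ArrayList<>();
        for (String name : selected) {
            if (!fields.contains(name)) {
                throw new IllegalArgumentException("Unknown field: " + name);
            }
            kept.add(name);
        }
        return new TableImpl(kept);
    }

    @Override
    public List<String> fieldNames() {
        return new ArrayList<>(fields);
    }
}
```

With such a split, a user program would only reference `Table`, e.g.
`Table t = ...; t.select("a", "b")`, and would not see members of `TableImpl`.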
Best,
Jincheng

On Thu, Nov 22, 2018 at 10:50 PM Xiaowei Jiang <xiaoweij@xxxxxxxxx> wrote:

> Hi Timo, thanks for driving this! I think that this is a nice thing to do.
> While we are doing this, can we also keep in mind that we want to
> eventually have a Table API interface-only module that users can depend
> on, without pulling in any implementation details?
>
> Xiaowei
>
> On Thu, Nov 22, 2018 at 6:37 PM Fabian Hueske <fhueske@xxxxxxxxx> wrote:
>
> > Hi Timo,
> >
> > Thanks for writing up this document.
> > I like the new structure and agree to prioritize the porting of the
> > flink-table-common classes.
> > Since flink-table-runtime is (or should be) independent of the API and
> > planner modules, we could start porting these classes once the code is
> > split into the new module structure.
> > The benefit of a Scala-free flink-table-runtime would be a Scala-free
> > execution JAR.
> >
> > Best, Fabian
> >
> >
> > On Thu, Nov 22, 2018 at 10:54 AM Timo Walther <twalthr@xxxxxxxxxx> wrote:
> >
> > > Hi everyone,
> > >
> > > I would like to continue this discussion thread and convert the outcome
> > > into a FLIP such that users and contributors know what to expect in the
> > > upcoming releases.
> > >
> > > I created a design document [1] that clarifies our motivation for why we
> > > want to do this, what a Maven module structure could look like, and a
> > > suggestion for a migration plan.
> > >
> > > It would be great to start with the efforts for the 1.8 release such
> > > that new features can be developed in Java and major refactorings such
> > > as improvements to the connectors and external catalog support are not
> > > blocked.
> > >
> > > Please let me know what you think.
> > >
> > > Regards,
> > > Timo
> > >
> > > [1] https://docs.google.com/document/d/1PPo6goW7tOwxmpFuvLSjFnx7BF8IVz0w3dcmPPyqvoY/edit?usp=sharing
> > >
> > >
> > > On Jul 2, 2018, at 17:08, Fabian Hueske wrote:
> > > > Hi Piotr,
> > > >
> > > > thanks for bumping this thread and thanks to Xingcan for the comments.
> > > >
> > > > I think the first step would be to separate the flink-table module into
> > > > multiple sub modules. These could be:
> > > >
> > > > - flink-table-api: All API facing classes. Can be later divided further
> > > >   into Java/Scala Table API/SQL
> > > > - flink-table-planning: involves all planning (basically everything we do
> > > >   with Calcite)
> > > > - flink-table-runtime: the runtime code
> > > >
> > > > IMO, a realistic mid-term goal is to have the runtime module and certain
> > > > parts of the planning module ported to Java.
> > > > The api module will be much harder to port because of several dependencies
> > > > to Scala core classes (the parser framework, tree iterations, etc.). I'm
> > > > not saying we should not port this to Java, but it is not clear to me (yet)
> > > > how to do it.
> > > >
> > > > I think flink-table-runtime should not be too hard to port. The code does
> > > > not make use of many Scala features, i.e., it is written in a very
> > > > Java-like style.
> > > > Also, there are not many dependencies and operators can be individually
> > > > ported step-by-step.
> > > > For flink-table-planning, we can have certain packages that we port to Java
> > > > like planning rules or plan nodes. The related classes mostly extend
> > > > Calcite's Java interfaces/classes and would be natural choices for being
> > > > ported. The code generation classes will require more effort to port. There
> > > > are also some dependencies in planning on the api module that we would need
> > > > to resolve somehow.
> > > >
> > > > For SQL, most work when adding new features is done in the planning and
> > > > runtime modules. So, this separation should already reduce "technological
> > > > debt" quite a lot.
> > > > The Table API depends much more on Scala than SQL.
> > > >
> > > > Cheers, Fabian
> > > >
> > > >
> > > >
> > > > 2018-07-02 16:26 GMT+02:00 Xingcan Cui <xingcanc@xxxxxxxxx>:
> > > >
> > > >> Hi all,
> > > >>
> > > >> I have also been thinking about this problem recently and here are my
> > > >> thoughts.
> > > >>
> > > >> 1) We must admit that it’s really a tough task to interoperate between Java
> > > >> and Scala. E.g., they have different collection types (Scala collections
> > > >> vs. java.util.*) and in Java, it's hard to implement a method which takes
> > > >> Scala functions as parameters. Considering the major part of the code base
> > > >> is implemented in Java, +1 for this goal from a long-term view.
> > > >>
> > > >> 2) The ideal solution would be to just expose a Scala API and make all the
> > > >> other parts Scala-free. But I am not sure if it could be achieved even in
> > > >> the long term. Thus, as Timo suggested, keeping the Scala code in
> > > >> "flink-table-core" would be a compromise solution.
> > > >>
> > > >> 3) If the community makes the final decision, maybe any new features
> > > >> should be added in Java (regardless of the modules), in order to prevent
> > > >> the Scala code from growing.
> > > >>
> > > >> Best,
> > > >> Xingcan
> > > >>
> > > >>
> > > >>> On Jul 2, 2018, at 9:30 PM, Piotr Nowojski <piotr@xxxxxxxxxxxxxxxxx> wrote:
> > > >>> Bumping the topic.
> > > >>>
> > > >>> If we want to do this, the sooner we decide, the less code we will have
> > > >>> to rewrite. I have some objections/counter proposals to Fabian's proposal
> > > >>> of doing it module-wise and one module at a time.
> > > >>> First, I do not see a problem with having Java/Scala code even within one
> > > >>> module, especially not if there are clean boundaries. For example, we could
> > > >>> have the API in Scala and optimizer rules/logical nodes written in Java in
> > > >>> the same module. However, I haven’t maintained mixed Scala/Java code bases
> > > >>> before, so I might be missing something here.
> > > >>> Secondly, this whole migration might, and most likely will, take longer than
> > > >>> expected, which creates a problem for the new code that we will be
> > > >>> writing. After making a decision to migrate to Java, almost any new line
> > > >>> of Scala code will immediately be technological debt and we will have to
> > > >>> rewrite it in Java later.
> > > >>> Thus I would propose first to state our end goal: the module structure and
> > > >>> which parts of the modules we eventually want to be Scala-free. Secondly,
> > > >>> we should take all steps necessary to allow us to write new code compliant
> > > >>> with our end goal. Only after that should/could we focus on incrementally
> > > >>> rewriting the old code. Otherwise we could be stuck/blocked for years
> > > >>> writing new code in Scala (and increasing technological debt), because
> > > >>> nobody has found the time to rewrite some unimportant and not actively
> > > >>> developed part of some module.
> > > >>> Piotrek
> > > >>>
> > > >>>> On 14 Jun 2018, at 15:34, Fabian Hueske <fhueske@xxxxxxxxx> wrote:
> > > >>>>
> > > >>>> Hi,
> > > >>>>
> > > >>>> In general, I think this is a good effort. However, it won't be easy and I
> > > >>>> think we have to plan this well.
> > > >>>> I don't like the idea of having the whole code base fragmented into Java
> > > >>>> and Scala code for too long.
> > > >>>>
> > > >>>> I think we should do this one step at a time and focus on migrating one
> > > >>>> module at a time.
> > > >>>> IMO, the easiest start would be to port the runtime to Java.
> > > >>>> Extracting the API classes into their own module, porting them to Java, and
> > > >>>> removing the Scala dependency won't be possible without breaking the API
> > > >>>> since a few classes depend on the Scala Table API.
> > > >>>>
> > > >>>> Best, Fabian
> > > >>>>
> > > >>>>
> > > >>>> 2018-06-14 10:33 GMT+02:00 Till Rohrmann <trohrmann@xxxxxxxxxx>:
> > > >>>>
> > > >>>>> I think that is a noble and honorable goal and we should strive for it.
> > > >>>>> This, however, must be an iterative process given the sheer size of the
> > > >>>>> code base. I like the approach to define common Java modules which are used
> > > >>>>> by more specific Scala modules and slowly moving classes from Scala to
> > > >>>>> Java. Thus +1 for the proposal.
> > > >>>>>
> > > >>>>> Cheers,
> > > >>>>> Till
> > > >>>>>
> > > >>>>> On Wed, Jun 13, 2018 at 12:01 PM Piotr Nowojski <piotr@xxxxxxxxxxxxxxxxx> wrote:
> > > >>>>>
> > > >>>>>> Hi,
> > > >>>>>>
> > > >>>>>> I do not have experience with how Scala and Java interact with each
> > > >>>>>> other, so I cannot fully validate your proposal, but generally speaking
> > > >>>>>> +1 from me.
> > > >>>>>>
> > > >>>>>> Does it also mean that we should slowly migrate `flink-table-core` to
> > > >>>>>> Java? How would you envision it? It would be nice to be able to add new
> > > >>>>>> classes/features written in Java so that they can coexist with old
> > > >>>>>> Scala code until we gradually switch from Scala to Java.
> > > >>>>>>
> > > >>>>>> Piotrek
> > > >>>>>>
> > > >>>>>>> On 13 Jun 2018, at 11:32, Timo Walther <twalthr@xxxxxxxxxx> wrote:
> > > >>>>>>>
> > > >>>>>>> Hi everyone,
> > > >>>>>>>
> > > >>>>>>> as you all know, currently the Table & SQL API is implemented in Scala.
> > > >>>>>>> This decision was made a long time ago when the initial code base was
> > > >>>>>>> created as part of a master's thesis. The community kept Scala because of
> > > >>>>>>> the nice language features that enable a fluent Table API like
> > > >>>>>>> table.select('field.trim()) and because Scala allows for quick
> > > >>>>>>> prototyping (e.g. multi-line comments for code generation). The committers
> > > >>>>>>> enforced not splitting the code-base into two programming languages.
> > > >>>>>>> However, nowadays the flink-table module is becoming an increasingly
> > > >>>>>>> important part of the Flink ecosystem. Connectors, formats, and the SQL
> > > >>>>>>> client are actually implemented in Java but need to interoperate with
> > > >>>>>>> flink-table, which makes these modules dependent on Scala. As mentioned in
> > > >>>>>>> an earlier mail thread, using Scala for API classes also exposes member
> > > >>>>>>> variables and methods in Java that should not be exposed to users [1].
> > > >>>>>>> Java is still the most important API language and right now we treat it as
> > > >>>>>>> a second-class citizen. I just noticed that you even need to add Scala if
> > > >>>>>>> you just want to implement a ScalarFunction because of method clashes
> > > >>>>>>> between `public String toString()` and
> > > >>>>>>> `public scala.Predef.String toString()`.
> > > >>>>>>> Given the size of the current code base, reimplementing the entire
> > > >>>>>>> flink-table code in Java is a goal that we might never reach. However, we
> > > >>>>>>> should at least treat the symptoms and have this as a long-term goal in
> > > >>>>>>> mind. My suggestion would be to convert user-facing and runtime classes and
> > > >>>>>>> split the code base into multiple modules:
> > > >>>>>>>> flink-table-java {depends on flink-table-core}
> > > >>>>>>> Implemented in Java. Java users can use this. This would require
> > > >>>>>>> converting classes like TableEnvironment, Table.
> > > >>>>>>>> flink-table-scala {depends on flink-table-core}
> > > >>>>>>> Implemented in Scala. Scala users can use this.
> > > >>>>>>>
> > > >>>>>>>> flink-table-common
> > > >>>>>>> Implemented in Java. Connectors, formats, and UDFs can use this. It
> > > >>>>>>> contains interface classes such as descriptors, table sink, table source.
> > > >>>>>>>> flink-table-core {depends on flink-table-common and flink-table-runtime}
> > > >>>>>>> Implemented in Scala. Contains the current main code base.
> > > >>>>>>>
> > > >>>>>>>> flink-table-runtime
> > > >>>>>>> Implemented in Java. This would require converting classes in
> > > >>>>>>> o.a.f.table.runtime but would potentially improve the runtime.
> > > >>>>>>>
> > > >>>>>>> What do you think?
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>> Regards,
> > > >>>>>>>
> > > >>>>>>> Timo
> > > >>>>>>>
> > > >>>>>>> [1] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Convert-main-Table-API-classes-into-traits-tp21335.html
> > > >>>>>>
> > > >>
> > >
> > >
> >
>