Re: [DISCUSS] Long-term goal of making flink-table Scala-free


Hi Timo, thanks for driving this! I think that this is a nice thing to do.
While we are doing this, can we also keep in mind that we eventually want
a Table API interface-only module that users can depend on without
pulling in any implementation details?

Xiaowei
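
A minimal sketch of what such an interface-only module could look like,
assuming a plain interface/implementation split. All names (SketchTable,
SketchTableEnvironment, SketchPlannerEnvironment) are hypothetical and are
not Flink classes. The three sections would live in separate modules; they
share one file here only so that the example compiles on its own.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// --- would live in the interface-only module (hypothetical names) ---
interface SketchTable {
    SketchTable select(String... fields);
    List<String> plan();
}

interface SketchTableEnvironment {
    SketchTable scan(String tableName);
}

// --- would live in a separate implementation module ---
final class SketchPlannerEnvironment implements SketchTableEnvironment {
    @Override
    public SketchTable scan(String tableName) {
        return new PlannedTable(Collections.singletonList("scan(" + tableName + ")"));
    }

    // Implementation detail that user code never names.
    private static final class PlannedTable implements SketchTable {
        private final List<String> steps;

        PlannedTable(List<String> steps) {
            this.steps = steps;
        }

        @Override
        public SketchTable select(String... fields) {
            List<String> next = new ArrayList<>(steps);
            next.add("select(" + String.join(", ", fields) + ")");
            return new PlannedTable(next);
        }

        @Override
        public List<String> plan() {
            return Collections.unmodifiableList(steps);
        }
    }
}

// --- user code: compiles against the interfaces only ---
final class InterfaceOnlyDemo {
    static List<String> userProgram(SketchTableEnvironment env) {
        return env.scan("orders").select("id", "amount").plan();
    }

    public static void main(String[] args) {
        // The concrete environment is chosen in exactly one place.
        System.out.println(userProgram(new SketchPlannerEnvironment()));
    }
}
```

In this sketch, userProgram only references the two interfaces, so the
implementation module could be swapped or rewritten without touching user
code.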

On Thu, Nov 22, 2018 at 6:37 PM Fabian Hueske <fhueske@xxxxxxxxx> wrote:

> Hi Timo,
>
> Thanks for writing up this document.
> I like the new structure and agree to prioritize the porting of the
> flink-table-common classes.
> Since flink-table-runtime is (or should be) independent of the API and
> planner modules, we could start porting these classes once the code is
> split into the new module structure.
> The benefit of a Scala-free flink-table-runtime would be a Scala-free
> execution JAR.
>
> Best, Fabian
>
>
> On Thu, Nov 22, 2018 at 10:54 AM Timo Walther <twalthr@xxxxxxxxxx> wrote:
>
> > Hi everyone,
> >
> > I would like to continue this discussion thread and convert the outcome
> > into a FLIP such that users and contributors know what to expect in the
> > upcoming releases.
> >
> > I created a design document [1] that clarifies our motivation for
> > doing this, what a Maven module structure could look like, and a
> > suggestion for a migration plan.
> >
> > It would be great to start with the efforts for the 1.8 release such
> > that new features can be developed in Java and major refactorings such
> > as improvements to the connectors and external catalog support are not
> > blocked.
> >
> > Please let me know what you think.
> >
> > Regards,
> > Timo
> >
> > [1] https://docs.google.com/document/d/1PPo6goW7tOwxmpFuvLSjFnx7BF8IVz0w3dcmPPyqvoY/edit?usp=sharing
> >
> >
> > On Jul 2, 2018 at 5:08 PM, Fabian Hueske wrote:
> > > Hi Piotr,
> > >
> > > thanks for bumping this thread and thanks to Xingcan for the comments.
> > >
> > > I think the first step would be to separate the flink-table module into
> > > multiple sub modules. These could be:
> > >
> > > - flink-table-api: All API-facing classes. Can later be divided
> > >   further into Java/Scala Table API/SQL
> > > - flink-table-planning: all planning (basically everything we do
> > >   with Calcite)
> > > - flink-table-runtime: the runtime code
> > >
> > > IMO, a realistic mid-term goal is to have the runtime module and
> > > certain parts of the planning module ported to Java.
> > > The api module will be much harder to port because of several
> > > dependencies on Scala core classes (the parser framework, tree
> > > iterations, etc.). I'm not saying we should not port this to Java,
> > > but it is not clear to me (yet) how to do it.
> > >
> > > I think flink-table-runtime should not be too hard to port. The code
> > > does not make use of many Scala features, i.e., it is written in a
> > > very Java-like style. Also, there are not many dependencies, and
> > > operators can be ported individually, step by step.
> > > For flink-table-planning, we can have certain packages that we port
> > > to Java, like planning rules or plan nodes. The related classes mostly
> > > extend Calcite's Java interfaces/classes and would be natural choices
> > > for porting. The code generation classes will require more effort to
> > > port. There are also some dependencies in planning on the api module
> > > that we would need to resolve somehow.
> > >
> > > For SQL, most of the work when adding new features is done in the
> > > planning and runtime modules. So, this separation should already
> > > reduce "technical debt" quite a lot.
> > > The Table API depends much more on Scala than SQL does.
> > >
> > > Cheers, Fabian
> > >
> > >
> > >
> > > 2018-07-02 16:26 GMT+02:00 Xingcan Cui <xingcanc@xxxxxxxxx>:
> > >
> > >> Hi all,
> > >>
> > >> I have also been thinking about this problem lately, and here are my
> > >> thoughts.
> > >>
> > >> 1) We must admit that it’s really a tough task to interoperate between
> > >> Java and Scala. E.g., they have different collection types (Scala
> > >> collections vs. java.util.*), and in Java it's hard to implement a
> > >> method which takes Scala functions as parameters. Considering that the
> > >> major part of the code base is implemented in Java, +1 for this goal
> > >> from a long-term view.
> > >>
> > >> 2) The ideal solution would be to just expose a Scala API and make all
> > >> the other parts Scala-free. But I am not sure if that could be achieved
> > >> even in the long term. Thus, as Timo suggested, keeping the Scala code
> > >> in "flink-table-core" would be a compromise solution.
> > >>
> > >> 3) If the community makes the final decision, maybe any new features
> > >> should be added in Java (regardless of the module), in order to
> > >> prevent the Scala code from growing.
> > >>
> > >> Best,
> > >> Xingcan
> > >>
> > >>
> > >>> On Jul 2, 2018, at 9:30 PM, Piotr Nowojski <piotr@xxxxxxxxxxxxxxxxx> wrote:
> > >>>
> > >>> Bumping the topic.
> > >>>
> > >>> If we want to do this, the sooner we decide, the less code we will
> > >>> have to rewrite. I have some objections/counter-proposals to Fabian's
> > >>> proposal of doing it module-wise and one module at a time.
> > >>>
> > >>> First, I do not see a problem with having Java/Scala code even within
> > >>> one module, especially if there are clean boundaries. For example, we
> > >>> could have the API in Scala and optimizer rules/logical nodes written
> > >>> in Java in the same module. However, I haven’t maintained mixed
> > >>> Scala/Java code bases before, so I might be missing something here.
> > >>>
> > >>> Second, this whole migration might, and most likely will, take longer
> > >>> than expected, which creates a problem for the new code that we will
> > >>> be writing. After we decide to migrate to Java, almost any new line
> > >>> of Scala code will immediately be technical debt that we will have to
> > >>> rewrite in Java later.
> > >>>
> > >>> Thus I would propose first stating our end goal: the module structure
> > >>> and which parts of the modules we eventually want to be Scala-free.
> > >>> Second, taking all steps necessary to allow us to write new code
> > >>> compliant with our end goal. Only after that should/could we focus on
> > >>> incrementally rewriting the old code. Otherwise we could be
> > >>> stuck/blocked for years writing new code in Scala (and increasing
> > >>> technical debt), because nobody has found the time to rewrite some
> > >>> unimportant and not actively developed part of some module.
> > >>>
> > >>> Piotrek
> > >>>
> > >>>> On 14 Jun 2018, at 15:34, Fabian Hueske <fhueske@xxxxxxxxx> wrote:
> > >>>>
> > >>>> Hi,
> > >>>>
> > >>>> In general, I think this is a good effort. However, it won't be easy
> > >>>> and I think we have to plan this well.
> > >>>> I don't like the idea of having the whole code base fragmented into
> > >>>> Java and Scala code for too long.
> > >>>>
> > >>>> I think we should do this one step at a time and focus on migrating
> > >>>> one module at a time.
> > >>>> IMO, the easiest start would be to port the runtime to Java.
> > >>>> Extracting the API classes into their own module, porting them to
> > >>>> Java, and removing the Scala dependency won't be possible without
> > >>>> breaking the API since a few classes depend on the Scala Table API.
> > >>>>
> > >>>> Best, Fabian
> > >>>>
> > >>>>
> > >>>> 2018-06-14 10:33 GMT+02:00 Till Rohrmann <trohrmann@xxxxxxxxxx>:
> > >>>>
> > >>>>> I think that is a noble and honorable goal and we should strive for
> > >>>>> it. This, however, must be an iterative process given the sheer size
> > >>>>> of the code base. I like the approach of defining common Java modules
> > >>>>> which are used by more specific Scala modules and slowly moving
> > >>>>> classes from Scala to Java. Thus +1 for the proposal.
> > >>>>>
> > >>>>> Cheers,
> > >>>>> Till
> > >>>>>
> > >>>>> On Wed, Jun 13, 2018 at 12:01 PM Piotr Nowojski <piotr@xxxxxxxxxxxxxxxxx> wrote:
> > >>>>>
> > >>>>>> Hi,
> > >>>>>>
> > >>>>>> I do not have experience with how Scala and Java interact with
> > >>>>>> each other, so I cannot fully validate your proposal, but generally
> > >>>>>> speaking +1 from me.
> > >>>>>>
> > >>>>>> Does it also mean that we should slowly migrate `flink-table-core`
> > >>>>>> to Java? How would you envision it? It would be nice to be able to
> > >>>>>> add new classes/features written in Java so that they can coexist
> > >>>>>> with old Scala code until we gradually switch from Scala to Java.
> > >>>>>>
> > >>>>>> Piotrek
> > >>>>>>
> > >>>>>>> On 13 Jun 2018, at 11:32, Timo Walther <twalthr@xxxxxxxxxx> wrote:
> > >>>>>>>
> > >>>>>>> Hi everyone,
> > >>>>>>>
> > >>>>>>> as you all know, currently the Table & SQL API is implemented in
> > >>>>>>> Scala. This decision was made a long time ago when the initial code
> > >>>>>>> base was created as part of a master's thesis. The community kept
> > >>>>>>> Scala because of the nice language features that enable a fluent
> > >>>>>>> Table API like table.select('field.trim()) and because Scala allows
> > >>>>>>> for quick prototyping (e.g. multi-line comments for code
> > >>>>>>> generation). The committers enforced not splitting the code base
> > >>>>>>> into two programming languages.
> > >>>>>>>
> > >>>>>>> However, nowadays the flink-table module is becoming a more and more
> > >>>>>>> important part of the Flink ecosystem. Connectors, formats, and the
> > >>>>>>> SQL client are actually implemented in Java but need to interoperate
> > >>>>>>> with flink-table, which makes these modules dependent on Scala. As
> > >>>>>>> mentioned in an earlier mail thread, using Scala for API classes
> > >>>>>>> also exposes member variables and methods in Java that should not be
> > >>>>>>> exposed to users [1]. Java is still the most important API language
> > >>>>>>> and right now we treat it as a second-class citizen. I just noticed
> > >>>>>>> that you even need to add Scala if you just want to implement a
> > >>>>>>> ScalarFunction because of method clashes between `public String
> > >>>>>>> toString()` and `public scala.Predef.String toString()`.
> > >>>>>>>
> > >>>>>>> Given the size of the current code base, reimplementing the entire
> > >>>>>>> flink-table code in Java is a goal that we might never reach.
> > >>>>>>> However, we should at least treat the symptoms and have this as a
> > >>>>>>> long-term goal in mind. My suggestion would be to convert
> > >>>>>>> user-facing and runtime classes and split the code base into
> > >>>>>>> multiple modules:
> > >>>>>>>
> > >>>>>>>> flink-table-java {depends on flink-table-core}
> > >>>>>>> Implemented in Java. Java users can use this. This would require
> > >>>>>>> converting classes like TableEnvironment, Table.
> > >>>>>>>
> > >>>>>>>> flink-table-scala {depends on flink-table-core}
> > >>>>>>> Implemented in Scala. Scala users can use this.
> > >>>>>>>
> > >>>>>>>> flink-table-common
> > >>>>>>> Implemented in Java. Connectors, formats, and UDFs can use this. It
> > >>>>>>> contains interface classes such as descriptors, table sink, table
> > >>>>>>> source.
> > >>>>>>>
> > >>>>>>>> flink-table-core {depends on flink-table-common and flink-table-runtime}
> > >>>>>>> Implemented in Scala. Contains the current main code base.
> > >>>>>>>
> > >>>>>>>> flink-table-runtime
> > >>>>>>> Implemented in Java. This would require converting classes in
> > >>>>>>> o.a.f.table.runtime but would potentially improve the runtime.
> > >>>>>>>
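
The module list above can be read as a small dependency graph. The
following is a rough sketch, not part of the proposal itself: it encodes
only the module names, the "depends on" edges, and the "Implemented in
Java/Scala" labels from the list, and then checks which modules would
still have Scala code somewhere in their transitive dependencies. The
class and method names (ModuleGraphSketch, closure, needsScala) are
made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ModuleGraphSketch {
    // Direct dependencies, copied from the proposed module list.
    static final Map<String, List<String>> DEPENDS_ON = new LinkedHashMap<>();

    // Modules that the list labels "Implemented in Scala".
    static final Set<String> SCALA_MODULES =
            new LinkedHashSet<>(Arrays.asList("flink-table-scala", "flink-table-core"));

    static {
        DEPENDS_ON.put("flink-table-java", Arrays.asList("flink-table-core"));
        DEPENDS_ON.put("flink-table-scala", Arrays.asList("flink-table-core"));
        DEPENDS_ON.put("flink-table-common", Collections.<String>emptyList());
        DEPENDS_ON.put("flink-table-core",
                Arrays.asList("flink-table-common", "flink-table-runtime"));
        DEPENDS_ON.put("flink-table-runtime", Collections.<String>emptyList());
    }

    /** Returns the module itself plus all direct and transitive dependencies. */
    static Set<String> closure(String module) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> todo = new ArrayDeque<>();
        todo.push(module);
        while (!todo.isEmpty()) {
            String current = todo.pop();
            if (seen.add(current)) {
                for (String dep : DEPENDS_ON.getOrDefault(
                        current, Collections.<String>emptyList())) {
                    todo.push(dep);
                }
            }
        }
        return seen;
    }

    /** True if the module or anything it depends on is implemented in Scala. */
    static boolean needsScala(String module) {
        for (String m : closure(module)) {
            if (SCALA_MODULES.contains(m)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        for (String module : DEPENDS_ON.keySet()) {
            System.out.println(module + " -> needs Scala: " + needsScala(module));
        }
    }
}
```

Under this reading, flink-table-common and flink-table-runtime come out
Scala-free, while flink-table-java still reaches Scala code through
flink-table-core. The sketch models only module-level edges, so it says
nothing about individual classes.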
> > >>>>>>> What do you think?
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> Regards,
> > >>>>>>>
> > >>>>>>> Timo
> > >>>>>>>
> > >>>>>>> [1] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Convert-main-Table-API-classes-into-traits-tp21335.html