Re: [DISCUSS] Long-term goal of making flink-table Scala-free


Hi everyone,

thanks for the great feedback so far. I have updated the document with the input I received.

@Fabian: I moved the porting of flink-table-runtime classes up in the list.

@Xiaowei: Could you elaborate what "interface only" means to you? Do you mean a module containing pure Java `interface`s? Or is the validation logic also part of the API module? Are 50+ expression classes part of the API interface or already too implementation-specific?

@Xuefu: I extended the document by almost a page to clarify when we should develop in Scala and when in Java. As Piotr said, every new Scala line is instant technical debt.

Thanks,
Timo


On 23.11.18 at 10:29, Piotr Nowojski wrote:

Hi Timo,

Thanks for writing this down. +1 from my side :)

I'm wondering whether we can have a rule in the interim, while Java and Scala coexist, that dependencies can only be one-way. I found that in the current code base there are cases where a Scala class extends a Java class and vice versa. This is quite painful. I'm thinking we could say that extension can only be from Java to Scala, which would help the situation. However, I'm not sure if this is practical.

Xuefu: I'm also not sure what the best approach is here; we will probably have to work it out as we go. One thing to consider is that from now on, every single new line of code written in Scala anywhere in flink-table (except for flink-table-api-scala) is instant technological debt. From this perspective, I would be in favour of tolerating quite big inconveniences just to avoid any new Scala code.

Piotrek

On 23 Nov 2018, at 03:25, Zhang, Xuefu <xuefu.z@xxxxxxxxxxxxxxx> wrote:

Hi Timo,

Thanks for the effort and the Google writeup. During our external catalog rework, we found much confusion between Java and Scala, and this Scala-free roadmap should greatly mitigate that.

I'm wondering whether we can have a rule in the interim, while Java and Scala coexist, that dependencies can only be one-way. I found that in the current code base there are cases where a Scala class extends a Java class and vice versa. This is quite painful. I'm thinking we could say that extension can only be from Java to Scala, which would help the situation. However, I'm not sure if this is practical.

Thanks,
Xuefu


------------------------------------------------------------------
Sender:jincheng sun <sunjincheng121@xxxxxxxxx>
Sent at:2018 Nov 23 (Fri) 09:49
Recipient:dev <dev@xxxxxxxxxxxxxxxx>
Subject:Re: [DISCUSS] Long-term goal of making flink-table Scala-free

Hi Timo,
Thanks for initiating this great discussion.

Currently, using the SQL/Table API requires including many dependencies. In
particular, it should not be necessary to pull in specific implementation
dependencies that users do not care about. So I am glad to see your
proposal, and I hope we consider splitting the API interface into a
separate module so that users can include a minimum of dependencies.

So, +1 to [separation of interface and implementation; e.g. `Table` &
`TableImpl`] which you mentioned in the Google doc.
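
A minimal sketch of the interface/implementation split mentioned above, for illustration only. The names `Table` and `TableImpl` come from the design document as quoted in this mail; the methods `select` and `fieldNames`, their signatures, and the wrapper class are assumptions made for this example and are not the real Flink API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class InterfaceSplitSketch {

    // Would live in an API-only module: users compile against this type.
    // The two methods are hypothetical, not the real Flink signatures.
    interface Table {
        Table select(String... fields);

        List<String> fieldNames();
    }

    // Would live in an implementation module that users never reference directly.
    static final class TableImpl implements Table {
        private final List<String> fields;

        TableImpl(List<String> fields) {
            this.fields = Collections.unmodifiableList(new ArrayList<>(fields));
        }

        @Override
        public Table select(String... selected) {
            // Validation sits in the implementation, not in the interface.
            for (String field : selected) {
                if (!fields.contains(field)) {
                    throw new IllegalArgumentException("Unknown field: " + field);
                }
            }
            return new TableImpl(Arrays.asList(selected));
        }

        @Override
        public List<String> fieldNames() {
            return fields;
        }
    }

    public static void main(String[] args) {
        Table orders = new TableImpl(Arrays.asList("id", "amount", "customer"));
        Table projected = orders.select("id", "amount");
        System.out.println(projected.fieldNames()); // prints [id, amount]
    }
}
```

Whether validation logic belongs in the API module or the implementation module is exactly the open question raised at the top of this thread; the sketch simply places it in the implementation.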
Best,
Jincheng

On Thu, Nov 22, 2018 at 10:50 PM, Xiaowei Jiang <xiaoweij@xxxxxxxxx> wrote:

Hi Timo, thanks for driving this! I think that this is a nice thing to do.
While we are doing this, can we also keep in mind that we want to
eventually have a TableAPI interface only module which users can take
dependency on, but without including any implementation details?

Xiaowei

On Thu, Nov 22, 2018 at 6:37 PM Fabian Hueske <fhueske@xxxxxxxxx> wrote:

Hi Timo,

Thanks for writing up this document.
I like the new structure and agree to prioritize the porting of the
flink-table-common classes.
Since flink-table-runtime is (or should be) independent of the API and
planner modules, we could start porting these classes once the code is
split into the new module structure.
The benefits of a Scala-free flink-table-runtime would be a Scala-free
execution Jar.

Best, Fabian


On Thu, 22 Nov 2018 at 10:54, Timo Walther <twalthr@xxxxxxxxxx> wrote:

Hi everyone,

I would like to continue this discussion thread and convert the outcome
into a FLIP such that users and contributors know what to expect in the
upcoming releases.

I created a design document [1] that clarifies our motivation for
doing this, what a Maven module structure could look like, and a
suggestion for a migration plan.

It would be great to start with the efforts for the 1.8 release such
that new features can be developed in Java and major refactorings such
as improvements to the connectors and external catalog support are not
blocked.

Please let me know what you think.

Regards,
Timo

[1] https://docs.google.com/document/d/1PPo6goW7tOwxmpFuvLSjFnx7BF8IVz0w3dcmPPyqvoY/edit?usp=sharing

On 02.07.18 at 17:08, Fabian Hueske wrote:

Hi Piotr,

thanks for bumping this thread and thanks to Xingcan for the comments.
I think the first step would be to separate the flink-table module into multiple sub-modules. These could be:

- flink-table-api: All API-facing classes. Can later be divided further into Java/Scala Table API/SQL
- flink-table-planning: involves all planning (basically everything we do with Calcite)
- flink-table-runtime: the runtime code

IMO, a realistic mid-term goal is to have the runtime module and certain parts of the planning module ported to Java.
The api module will be much harder to port because of several dependencies on Scala core classes (the parser framework, tree iterations, etc.). I'm not saying we should not port this to Java, but it is not clear to me (yet) how to do it.

I think flink-table-runtime should not be too hard to port. The code does not make use of many Scala features, i.e., it is written in a very Java-like style. Also, there are not many dependencies, and operators can be ported individually, step by step.
For flink-table-planning, we can have certain packages that we port to Java, like planning rules or plan nodes. The related classes mostly extend Calcite's Java interfaces/classes and would be natural choices for being ported. The code generation classes will require more effort to port. There are also some dependencies in planning on the api module that we would need to resolve somehow.

For SQL, most of the work when adding new features is done in the planning and runtime modules. So, this separation should already reduce "technological debt" quite a lot.
The Table API depends much more on Scala than SQL does.

Cheers, Fabian



2018-07-02 16:26 GMT+02:00 Xingcan Cui <xingcanc@xxxxxxxxx>:

Hi all,

I have also been thinking about this problem lately, and here are my thoughts.

1) We must admit that it's really a tough task to interoperate between Java and Scala. E.g., they have different collection types (Scala collections vs. java.util.*) and in Java, it's hard to implement a method which takes Scala functions as parameters. Considering that the major part of the code base is implemented in Java, +1 for this goal from a long-term view.

2) The ideal solution would be to just expose a Scala API and make all the other parts Scala-free. But I am not sure if that could be achieved even in the long term. Thus, as Timo suggested, keeping the Scala code in "flink-table-core" would be a compromise solution.

3) If the community makes the final decision, maybe any new features should be added in Java (regardless of the module), in order to prevent the Scala code from growing.
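
To make point 1 concrete, here is a small sketch of a method whose signature uses only java.util types instead of Scala collections or Scala function types. The class and the `mapFields` helper are hypothetical and exist only for this illustration; they are not part of flink-table.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class JavaFriendlyApiSketch {

    // Hypothetical helper: applies a user-supplied function to every field name.
    // Only java.util types appear in the signature, so a Java caller does not
    // need any Scala type to call it.
    static List<String> mapFields(List<String> fields, Function<String, String> fn) {
        List<String> result = new ArrayList<>(fields.size());
        for (String field : fields) {
            result.add(fn.apply(field));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> trimmed = mapFields(Arrays.asList(" id ", "amount "), String::trim);
        System.out.println(trimmed); // prints [id, amount]
    }
}
```

A Java caller can pass a lambda or method reference here directly. Had the parameter been a Scala function type, the Java caller would have to construct that Scala type by hand, which is the friction described in point 1.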

Best,
Xingcan


On Jul 2, 2018, at 9:30 PM, Piotr Nowojski <piotr@xxxxxxxxxxxxxxxxx> wrote:

Bumping the topic.

If we want to do this, the sooner we decide, the less code we will have to rewrite. I have some objections/counter-proposals to Fabian's proposal of doing it module-wise and one module at a time.

First, I do not see a problem with having Java/Scala code even within one module, especially if there are clean boundaries. For example, we could have the API in Scala and optimizer rules/logical nodes written in Java in the same module. However, I haven't maintained mixed Scala/Java code bases before, so I might be missing something here.

Secondly, this whole migration might, and most likely will, take longer than expected, which creates a problem for the new code that we will be writing. After making a decision to migrate to Java, almost any new line of Scala code will immediately be technological debt, and we will have to rewrite it in Java later.

Thus I would propose first stating our end goal: the module structure and which parts of the modules we eventually want to be Scala-free. Secondly, taking all steps necessary to allow us to write new code compliant with our end goal. Only after that should/could we focus on incrementally rewriting the old code. Otherwise we could be stuck/blocked for years writing new code in Scala (and increasing technological debt), because nobody has found the time to rewrite some unimportant and not actively developed part of some module.

Piotrek

On 14 Jun 2018, at 15:34, Fabian Hueske <fhueske@xxxxxxxxx> wrote:

Hi,

In general, I think this is a good effort. However, it won't be easy and I think we have to plan this well.
I don't like the idea of having the whole code base fragmented into Java and Scala code for too long.

I think we should do this one step at a time and focus on migrating one module at a time.
IMO, the easiest start would be to port the runtime to Java.
Extracting the API classes into their own module, porting them to Java, and removing the Scala dependency won't be possible without breaking the API since a few classes depend on the Scala Table API.

Best, Fabian


2018-06-14 10:33 GMT+02:00 Till Rohrmann <trohrmann@xxxxxxxxxx>:

I think that is a noble and honorable goal and we should strive for it. This, however, must be an iterative process given the sheer size of the code base. I like the approach of defining common Java modules which are used by more specific Scala modules and slowly moving classes from Scala to Java. Thus +1 for the proposal.

Cheers,
Till

On Wed, Jun 13, 2018 at 12:01 PM Piotr Nowojski <piotr@xxxxxxxxxxxxxxxxx> wrote:

Hi,

I do not have experience with how Scala and Java interact with each other, so I cannot fully validate your proposal, but generally speaking +1 from me.

Does it also mean that we should slowly migrate `flink-table-core` to Java? How would you envision it? It would be nice to be able to add new classes/features written in Java so that they can coexist with old Scala code until we gradually switch from Scala to Java.

Piotrek

On 13 Jun 2018, at 11:32, Timo Walther <twalthr@xxxxxxxxxx> wrote:

Hi everyone,

as you all know, the Table & SQL API is currently implemented in Scala. This decision was made a long time ago when the initial code base was created as part of a master's thesis. The community kept Scala because of the nice language features that enable a fluent Table API like table.select('field.trim()) and because Scala allows for quick prototyping (e.g. multi-line comments for code generation). The committers enforced not splitting the code base into two programming languages.

However, nowadays the flink-table module is becoming a more and more important part of the Flink ecosystem. Connectors, formats, and the SQL client are actually implemented in Java but need to interoperate with flink-table, which makes these modules dependent on Scala. As mentioned in an earlier mail thread, using Scala for API classes also exposes member variables and methods in Java that should not be exposed to users [1]. Java is still the most important API language and right now we treat it as a second-class citizen. I just noticed that you even need to add Scala if you just want to implement a ScalarFunction because of method clashes between `public String toString()` and `public scala.Predef.String toString()`.

Given the size of the current code base, reimplementing the entire flink-table code in Java is a goal that we might never reach. However, we should at least treat the symptoms and have this as a long-term goal in mind. My suggestion would be to convert user-facing and runtime classes and split the code base into multiple modules:

flink-table-java {depends on flink-table-core}
    Implemented in Java. Java users can use this. This would require converting classes like TableEnvironment and Table.

flink-table-scala {depends on flink-table-core}
    Implemented in Scala. Scala users can use this.

flink-table-common
    Implemented in Java. Connectors, formats, and UDFs can use this. It contains interface classes such as descriptors, table sink, table source.

flink-table-core {depends on flink-table-common and flink-table-runtime}
    Implemented in Scala. Contains the current main code base.

flink-table-runtime
    Implemented in Java. This would require converting classes in o.a.f.table.runtime but would potentially improve the runtime.

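
The dependency structure listed above can be modelled in a few lines to check which modules would still pull Scala onto the classpath transitively. This is only a sketch: the module names, their dependencies, and their implementation languages are taken from the list above, while the class, the `pullsInScala` helper, and the assumption that flink-table-common and flink-table-runtime have no further flink-table dependencies are made up for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ModuleGraphSketch {

    // Direct dependencies between the proposed modules, as listed in the mail.
    static final Map<String, List<String>> DEPENDS_ON = new LinkedHashMap<>();

    // Modules that the proposal describes as implemented in Scala.
    static final Set<String> SCALA_MODULES =
            new HashSet<>(Arrays.asList("flink-table-scala", "flink-table-core"));

    static {
        DEPENDS_ON.put("flink-table-java", Arrays.asList("flink-table-core"));
        DEPENDS_ON.put("flink-table-scala", Arrays.asList("flink-table-core"));
        // Assumption: no further flink-table dependencies for these two modules.
        DEPENDS_ON.put("flink-table-common", Collections.<String>emptyList());
        DEPENDS_ON.put("flink-table-core",
                Arrays.asList("flink-table-common", "flink-table-runtime"));
        DEPENDS_ON.put("flink-table-runtime", Collections.<String>emptyList());
    }

    // True if the module itself, or anything it depends on transitively, is a Scala module.
    static boolean pullsInScala(String module) {
        return pullsInScala(module, new HashSet<String>());
    }

    private static boolean pullsInScala(String module, Set<String> visited) {
        if (!visited.add(module)) {
            return false; // already checked on this walk
        }
        if (SCALA_MODULES.contains(module)) {
            return true;
        }
        for (String dep : DEPENDS_ON.getOrDefault(module, Collections.<String>emptyList())) {
            if (pullsInScala(dep, visited)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        for (String module : DEPENDS_ON.keySet()) {
            System.out.println(module + " -> Scala on classpath: " + pullsInScala(module));
        }
    }
}
```

Under these assumptions only flink-table-common and flink-table-runtime come out Scala-free. flink-table-java is implemented in Java but still reaches Scala through flink-table-core.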
What do you think?


Regards,

Timo

[1] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Convert-main-Table-API-classes-into-traits-tp21335.html