Re: [DISCUSS] Long-term goal of making flink-table Scala-free


Bumping the topic.

If we want to do this, the sooner we decide, the less code we will have to rewrite. I have some objections and counter-proposals to Fabian's suggestion of doing it module-wise, one module at a time.

First, I do not see a problem with having Java and Scala code even within one module, especially if there are clean boundaries. For example, we could have the API in Scala and the optimizer rules and logical nodes written in Java in the same module. However, I haven't maintained mixed Scala/Java code bases before, so I might be missing something here.

Secondly, this whole migration might, and most likely will, take longer than expected, which creates a problem for the new code we will be writing in the meantime. Once we decide to migrate to Java, almost any new line of Scala code immediately becomes technical debt that we will have to rewrite in Java later.

Thus I would propose first stating our end goal: the module structure and which parts of which modules we eventually want to be Scala-free. Second, we should take all the steps necessary to let us write new code that is compliant with that end goal. Only after that should we focus on incrementally rewriting the old code. Otherwise we could be stuck for years writing new code in Scala (and increasing technical debt) because nobody has found the time to rewrite some unimportant, not actively developed part of some module.

Piotrek

> On 14 Jun 2018, at 15:34, Fabian Hueske <fhueske@xxxxxxxxx> wrote:
> 
> Hi,
> 
> In general, I think this is a good effort. However, it won't be easy and I
> think we have to plan this well.
> I don't like the idea of having the whole code base fragmented into Java
> and Scala code for too long.
> 
> I think we should do this one step at a time and focus on migrating one
> module at a time.
> IMO, the easiest start would be to port the runtime to Java.
> Extracting the API classes into their own module, porting them to Java, and
> removing the Scala dependency won't be possible without breaking the API
> since a few classes depend on the Scala Table API.
> 
> Best, Fabian
> 
> 
> 2018-06-14 10:33 GMT+02:00 Till Rohrmann <trohrmann@xxxxxxxxxx>:
> 
>> I think that is a noble and honorable goal and we should strive for it.
>> This, however, must be an iterative process given the sheer size of the
>> code base. I like the approach to define common Java modules which are used
>> by more specific Scala modules and slowly moving classes from Scala to
>> Java. Thus +1 for the proposal.
>> 
>> Cheers,
>> Till
>> 
>> On Wed, Jun 13, 2018 at 12:01 PM Piotr Nowojski <piotr@xxxxxxxxxxxxxxxxx>
>> wrote:
>> 
>>> Hi,
>>> 
>>> I do not have experience with how Scala and Java interact with each
>>> other, so I cannot fully validate your proposal, but generally speaking
>>> +1 from me.
>>> 
>>> Does it also mean that we should slowly migrate `flink-table-core` to
>>> Java? How would you envision it? It would be nice to be able to add new
>>> classes/features written in Java so that they can coexist with the old
>>> Scala code until we gradually switch from Scala to Java.
>>> 
>>> Piotrek
>>> 
>>>> On 13 Jun 2018, at 11:32, Timo Walther <twalthr@xxxxxxxxxx> wrote:
>>>> 
>>>> Hi everyone,
>>>> 
>>>> as you all know, the Table & SQL API is currently implemented in Scala.
>>>> This decision was made a long time ago when the initial code base was
>>>> created as part of a master's thesis. The community kept Scala because
>>>> of the nice language features that enable a fluent Table API like
>>>> table.select('field.trim()) and because Scala allows for quick
>>>> prototyping (e.g. multi-line comments for code generation). The
>>>> committers enforced not splitting the code base into two programming
>>>> languages.
>>>> 
>>>> However, nowadays the flink-table module is becoming an increasingly
>>>> important part of the Flink ecosystem. Connectors, formats, and the SQL
>>>> client are actually implemented in Java but need to interoperate with
>>>> flink-table, which makes these modules dependent on Scala. As mentioned
>>>> in an earlier mail thread, using Scala for API classes also exposes
>>>> member variables and methods in Java that should not be exposed to
>>>> users [1]. Java is still the most important API language and right now
>>>> we treat it as a second-class citizen. I just noticed that you even
>>>> need to add Scala if you just want to implement a ScalarFunction
>>>> because of method clashes between `public String toString()` and
>>>> `public scala.Predef.String toString()`.
>>>> 
>>>> Given the size of the current code base, reimplementing the entire
>>>> flink-table code in Java is a goal that we might never reach. However,
>>>> we should at least treat the symptoms and keep this in mind as a
>>>> long-term goal. My suggestion would be to convert user-facing and
>>>> runtime classes and split the code base into multiple modules:
>>>> 
>>>>> flink-table-java {depends on flink-table-core}
>>>> Implemented in Java. Java users can use this. This would require
>>>> converting classes like TableEnvironment and Table.
>>>> 
>>>>> flink-table-scala {depends on flink-table-core}
>>>> Implemented in Scala. Scala users can use this.
>>>> 
>>>>> flink-table-common
>>>> Implemented in Java. Connectors, formats, and UDFs can use this. It
>>>> contains interface classes such as descriptors, table sink, table source.
>>>> 
>>>>> flink-table-core {depends on flink-table-common and flink-table-runtime}
>>>> Implemented in Scala. Contains the current main code base.
>>>> 
>>>>> flink-table-runtime
>>>> Implemented in Java. This would require converting the classes in
>>>> o.a.f.table.runtime but would potentially improve the runtime.
>>>> 
>>>> 
>>>> What do you think?
>>>> 
>>>> 
>>>> Regards,
>>>> 
>>>> Timo
>>>> 
>>>> [1]
>>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Convert-main-Table-API-classes-into-traits-tp21335.html
>>>> 
>>> 
>>> 
>>
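
The module layout proposed in Timo's mail can be read as a small dependency graph. The sketch below is only an illustration of that reading, not Flink code or build configuration: the class `ModuleGraph` and the method `needsScala` are hypothetical names, and the module names, languages, and "depends on" edges are copied from the proposal as written. It checks which modules would still pull in Scala, either directly or through a dependency.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the proposed flink-table module split.
class ModuleGraph {
    enum Lang { JAVA, SCALA }

    // One module: its implementation language and its direct dependencies.
    static final class Module {
        final Lang lang;
        final List<String> deps;
        Module(Lang lang, List<String> deps) {
            this.lang = lang;
            this.deps = deps;
        }
    }

    static final Map<String, Module> MODULES = new LinkedHashMap<>();
    static {
        // Languages and edges as listed in the proposal.
        add("flink-table-common", Lang.JAVA);
        add("flink-table-runtime", Lang.JAVA);
        add("flink-table-core", Lang.SCALA, "flink-table-common", "flink-table-runtime");
        add("flink-table-java", Lang.JAVA, "flink-table-core");
        add("flink-table-scala", Lang.SCALA, "flink-table-core");
    }

    static void add(String name, Lang lang, String... deps) {
        MODULES.put(name, new Module(lang, List.of(deps)));
    }

    // True if the module is written in Scala or transitively depends on one that is.
    static boolean needsScala(String name) {
        Module m = MODULES.get(name);
        if (m.lang == Lang.SCALA) {
            return true;
        }
        for (String dep : m.deps) {
            if (needsScala(dep)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        for (String name : MODULES.keySet()) {
            System.out.println(name + ": " + (needsScala(name) ? "needs Scala" : "Scala-free"));
        }
    }
}
```

Under these assumptions only flink-table-common and flink-table-runtime come out Scala-free. flink-table-java is written in Java but still reaches Scala through flink-table-core, which is relevant to the question raised above about what the end state of the module structure should be.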