Re: [DISCUSS] Long-term goal of making flink-table Scala-free

Hi Timo,

Thanks for writing this down, +1 from my side :)

> I'm wondering whether we can have a rule in the interim, while Java and Scala coexist, that dependencies can only be one-way. I found that in the current code base there are cases where a Scala class extends a Java class and vice versa. This is quite painful. I'm thinking we could say that extension can only be from Java to Scala, which would help the situation. However, I'm not sure if this is practical.

Xuefu: I’m also not sure what the best approach is here; we will probably have to work it out as we go. One thing to consider is that from now on, every single new line of code written in Scala anywhere in flink-table (except for flink-table-api-scala) is instant technological debt. From this perspective I would be in favour of tolerating quite big inconveniences just to avoid any new Scala code.


> On 23 Nov 2018, at 03:25, Zhang, Xuefu <xuefu.z@xxxxxxxxxxxxxxx> wrote:
> Hi Timo,
> Thanks for the effort and the Google writeup. During our external catalog rework, we found much confusion between Java and Scala, and this Scala-free roadmap should greatly mitigate that.
> I'm wondering whether we can have a rule in the interim, while Java and Scala coexist, that dependencies can only be one-way. I found that in the current code base there are cases where a Scala class extends a Java class and vice versa. This is quite painful. I'm thinking we could say that extension can only be from Java to Scala, which would help the situation. However, I'm not sure if this is practical.
> Thanks,
> Xuefu
> ------------------------------------------------------------------
> Sender:jincheng sun <sunjincheng121@xxxxxxxxx>
> Sent at:2018 Nov 23 (Fri) 09:49
> Recipient:dev <dev@xxxxxxxxxxxxxxxx>
> Subject:Re: [DISCUSS] Long-term goal of making flink-table Scala-free
> Hi Timo,
> Thanks for initiating this great discussion.
> Currently, using SQL or the Table API requires pulling in many dependencies. In
> particular, users are forced to include implementation-specific
> dependencies that they do not care about. So I am glad to see your
> proposal, and I hope we consider splitting the API interfaces into a
> separate module, so that users can depend on a minimal set of dependencies.
> So, +1 to [separation of interface and implementation; e.g. `Table` &
> `TableImpl`] which you mentioned in the Google doc.
> Best,
> Jincheng
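
A minimal Java sketch of the interface/implementation separation mentioned above (`Table` and `TableImpl`) might look as follows. The method names `select`, `fieldNames`, and `tableOf` are hypothetical and heavily simplified; they are assumptions for illustration and not the actual Flink API. In a real module split, the interface would sit in the API-only module and the implementation in a separate one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class TableApiSketch {

    /** API-only contract; under the proposal this would sit in an interface-only module. */
    interface Table {
        // Hypothetical, simplified signature (not the real Flink method).
        Table select(String... fieldNames);

        List<String> fieldNames();
    }

    /** Implementation detail; would sit in a separate implementation module. */
    static final class TableImpl implements Table {
        private final List<String> fields;

        TableImpl(List<String> fields) {
            // Defensive copy so that the table is immutable.
            this.fields = Collections.unmodifiableList(new ArrayList<>(fields));
        }

        @Override
        public Table select(String... fieldNames) {
            List<String> selected = new ArrayList<>();
            for (String name : fieldNames) {
                if (!fields.contains(name)) {
                    throw new IllegalArgumentException("Unknown field: " + name);
                }
                selected.add(name);
            }
            return new TableImpl(selected);
        }

        @Override
        public List<String> fieldNames() {
            return fields;
        }
    }

    /** Hypothetical factory so that callers only ever see the Table interface. */
    static Table tableOf(String... fieldNames) {
        return new TableImpl(Arrays.asList(fieldNames));
    }

    public static void main(String[] args) {
        Table t = tableOf("a", "b", "c").select("c", "a");
        System.out.println(t.fieldNames()); // prints [c, a]
    }
}
```

User code written against `Table` alone would then compile without the implementation class on its classpath, which is the dependency reduction asked for above.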
> Xiaowei Jiang <xiaoweij@xxxxxxxxx> wrote on Thu, Nov 22, 2018 at 10:50 PM:
>> Hi Timo, thanks for driving this! I think that this is a nice thing to do.
>> While we are doing this, can we also keep in mind that we want to
>> eventually have an interface-only Table API module that users can
>> depend on without pulling in any implementation details?
>> Xiaowei
>> On Thu, Nov 22, 2018 at 6:37 PM Fabian Hueske <fhueske@xxxxxxxxx> wrote:
>>> Hi Timo,
>>> Thanks for writing up this document.
>>> I like the new structure and agree to prioritize the porting of the
>>> flink-table-common classes.
>>> Since flink-table-runtime is (or should be) independent of the API and
>>> planner modules, we could start porting these classes once the code is
>>> split into the new module structure.
>>> The benefits of a Scala-free flink-table-runtime would be a Scala-free
>>> execution Jar.
>>> Best, Fabian
>>> On Thu, 22 Nov 2018 at 10:54, Timo Walther <twalthr@xxxxxxxxxx> wrote:
>>>> Hi everyone,
>>>> I would like to continue this discussion thread and convert the outcome
>>>> into a FLIP such that users and contributors know what to expect in the
>>>> upcoming releases.
>>>> I created a design document [1] that clarifies our motivation for
>>>> doing this, what a Maven module structure could look like, and a
>>>> suggestion for a migration plan.
>>>> It would be great to start with the efforts for the 1.8 release such
>>>> that new features can be developed in Java and major refactorings such
>>>> as improvements to the connectors and external catalog support are not
>>>> blocked.
>>>> Please let me know what you think.
>>>> Regards,
>>>> Timo
>>>> [1] https://docs.google.com/document/d/1PPo6goW7tOwxmpFuvLSjFnx7BF8IVz0w3dcmPPyqvoY/edit?usp=sharing
>>>> On 02.07.18 at 17:08, Fabian Hueske wrote:
>>>>> Hi Piotr,
>>>>> thanks for bumping this thread and thanks to Xingcan for the comments.
>>>>> I think the first step would be to separate the flink-table module into
>>>>> multiple submodules. These could be:
>>>>> - flink-table-api: All API-facing classes. Can later be divided further
>>>>>   into Java/Scala Table API/SQL
>>>>> - flink-table-planning: involves all planning (basically everything we do
>>>>>   with Calcite)
>>>>> - flink-table-runtime: the runtime code
>>>>> IMO, a realistic mid-term goal is to have the runtime module and certain
>>>>> parts of the planning module ported to Java.
>>>>> The api module will be much harder to port because of several dependencies
>>>>> on Scala core classes (the parser framework, tree iterations, etc.). I'm
>>>>> not saying we should not port this to Java, but it is not clear to me (yet)
>>>>> how to do it.
>>>>> I think flink-table-runtime should not be too hard to port. The code does
>>>>> not make use of many Scala features, i.e., it is written in a very
>>>>> Java-like style. Also, there are not many dependencies, and operators can
>>>>> be ported individually, step by step.
>>>>> For flink-table-planning, we can have certain packages that we port to
>>>>> Java, like planning rules or plan nodes. The related classes mostly extend
>>>>> Calcite's Java interfaces/classes and would be natural choices for being
>>>>> ported. The code generation classes will require more effort to port. There
>>>>> are also some dependencies in planning on the api module that we would need
>>>>> to resolve somehow.
>>>>> For SQL, most work when adding new features is done in the planning and
>>>>> runtime modules. So, this separation should already reduce "technological
>>>>> debt" quite a lot.
>>>>> The Table API depends much more on Scala than SQL.
>>>>> Cheers, Fabian
>>>>> 2018-07-02 16:26 GMT+02:00 Xingcan Cui <xingcanc@xxxxxxxxx>:
>>>>>> Hi all,
>>>>>> I have also been thinking about this problem recently, and here are my
>>>>>> thoughts.
>>>>>> 1) We must admit that it’s really a tough task to make Java and Scala
>>>>>> interoperate. E.g., they have different collection types (Scala collections
>>>>>> vs. java.util.*), and in Java it's hard to implement a method that takes
>>>>>> Scala functions as parameters. Considering that the major part of the code
>>>>>> base is implemented in Java, +1 for this goal from a long-term view.
>>>>>> 2) The ideal solution would be to just expose a Scala API and make all the
>>>>>> other parts Scala-free. But I am not sure if that could be achieved even in
>>>>>> the long term. Thus, as Timo suggested, keeping the Scala code in
>>>>>> "flink-table-core" would be a compromise solution.
>>>>>> 3) If the community makes the final decision, maybe any new features
>>>>>> should be added in Java (regardless of the module), in order to prevent
>>>>>> the Scala code from growing.
>>>>>> Best,
>>>>>> Xingcan
>>>>>>> On Jul 2, 2018, at 9:30 PM, Piotr Nowojski <piotr@xxxxxxxxxxxxxxxxx> wrote:
>>>>>>> Bumping the topic.
>>>>>>> If we want to do this, the sooner we decide, the less code we will have
>>>>>>> to rewrite. I have some objections/counter-proposals to Fabian's proposal
>>>>>>> of doing it module-wise and one module at a time.
>>>>>>> First, I do not see a problem with having Java/Scala code even within one
>>>>>>> module, especially not if there are clean boundaries. For example, we could
>>>>>>> have the API in Scala and optimizer rules/logical nodes written in Java in
>>>>>>> the same module. However, I haven’t maintained mixed Scala/Java code bases
>>>>>>> before, so I might be missing something here.
>>>>>>> Secondly, this whole migration might, and most likely will, take longer
>>>>>>> than expected, which creates a problem for the new code that we will be
>>>>>>> writing. After making a decision to migrate to Java, almost any new line of
>>>>>>> Scala code will immediately be technological debt, and we will have to
>>>>>>> rewrite it in Java later.
>>>>>>> Thus I would propose first to state our end goal: the module structure and
>>>>>>> which parts of the modules we eventually want to be Scala-free. Secondly,
>>>>>>> to take all steps necessary to allow us to write new code compliant
>>>>>>> with our end goal. Only after that should/could we focus on incrementally
>>>>>>> rewriting the old code. Otherwise we could be stuck/blocked for years
>>>>>>> writing new code in Scala (and increasing technological debt), because
>>>>>>> nobody has found the time to rewrite some unimportant and not actively
>>>>>>> developed part of some module.
>>>>>>> Piotrek
>>>>>>>> On 14 Jun 2018, at 15:34, Fabian Hueske <fhueske@xxxxxxxxx> wrote:
>>>>>>>> Hi,
>>>>>>>> In general, I think this is a good effort. However, it won't be easy, and
>>>>>>>> I think we have to plan this well.
>>>>>>>> I don't like the idea of having the whole code base fragmented into Java
>>>>>>>> and Scala code for too long.
>>>>>>>> I think we should do this one step at a time and focus on migrating one
>>>>>>>> module at a time.
>>>>>>>> IMO, the easiest start would be to port the runtime to Java.
>>>>>>>> Extracting the API classes into their own module, porting them to Java,
>>>>>>>> and removing the Scala dependency won't be possible without breaking the
>>>>>>>> API, since a few classes depend on the Scala Table API.
>>>>>>>> Best, Fabian
>>>>>>>> 2018-06-14 10:33 GMT+02:00 Till Rohrmann <trohrmann@xxxxxxxxxx>:
>>>>>>>>> I think that is a noble and honorable goal and we should strive for it.
>>>>>>>>> This, however, must be an iterative process given the sheer size of the
>>>>>>>>> code base. I like the approach of defining common Java modules which are
>>>>>>>>> used by more specific Scala modules and slowly moving classes from Scala
>>>>>>>>> to Java. Thus +1 for the proposal.
>>>>>>>>> Cheers,
>>>>>>>>> Till
>>>>>>>>> On Wed, Jun 13, 2018 at 12:01 PM Piotr Nowojski <piotr@xxxxxxxxxxxxxxxxx> wrote:
>>>>>>>>>> Hi,
>>>>>>>>>> I do not have experience with how Scala and Java interact with each
>>>>>>>>>> other, so I cannot fully validate your proposal, but generally speaking
>>>>>>>>>> +1 from me.
>>>>>>>>>> Does it also mean that we should slowly migrate `flink-table-core` to
>>>>>>>>>> Java? How would you envision it? It would be nice to be able to add new
>>>>>>>>>> classes/features written in Java so that they can coexist with the old
>>>>>>>>>> Scala code until we gradually switch from Scala to Java.
>>>>>>>>>> Piotrek
>>>>>>>>>>> On 13 Jun 2018, at 11:32, Timo Walther <twalthr@xxxxxxxxxx> wrote:
>>>>>>>>>>> Hi everyone,
>>>>>>>>>>> as you all know, currently the Table & SQL API is implemented in Scala.
>>>>>>>>>>> This decision was made a long time ago when the initial code base was
>>>>>>>>>>> created as part of a master's thesis. The community kept Scala because of
>>>>>>>>>>> the nice language features that enable a fluent Table API like
>>>>>>>>>>> table.select('field.trim()) and because Scala allows for quick
>>>>>>>>>>> prototyping (e.g. multi-line comments for code generation). The
>>>>>>>>>>> committers enforced not splitting the code base into two programming
>>>>>>>>>>> languages.
>>>>>>>>>>> However, nowadays the flink-table module is becoming a more and more
>>>>>>>>>>> important part of the Flink ecosystem. Connectors, formats, and the SQL
>>>>>>>>>>> client are actually implemented in Java but need to interoperate with
>>>>>>>>>>> flink-table, which makes these modules dependent on Scala. As mentioned
>>>>>>>>>>> in an earlier mail thread, using Scala for API classes also exposes
>>>>>>>>>>> member variables and methods in Java that should not be exposed to users
>>>>>>>>>>> [1]. Java is still the most important API language, and right now we
>>>>>>>>>>> treat it as a second-class citizen. I just noticed that you even need to
>>>>>>>>>>> add Scala if you just want to implement a ScalarFunction, because of
>>>>>>>>>>> method clashes between `public String toString()` and
>>>>>>>>>>> `public scala.Predef.String toString()`.
>>>>>>>>>>> Given the size of the current code base, reimplementing the entire
>>>>>>>>>>> flink-table code in Java is a goal that we might never reach. However, we
>>>>>>>>>>> should at least treat the symptoms and have this as a long-term goal in
>>>>>>>>>>> mind. My suggestion would be to convert user-facing and runtime classes
>>>>>>>>>>> and split the code base into multiple modules:
>>>>>>>>>>>> flink-table-java {depends on flink-table-core}
>>>>>>>>>>> Implemented in Java. Java users can use this. This would require
>>>>>>>>>>> converting classes like TableEnvironment, Table.
>>>>>>>>>>>> flink-table-scala {depends on flink-table-core}
>>>>>>>>>>> Implemented in Scala. Scala users can use this.
>>>>>>>>>>>> flink-table-common
>>>>>>>>>>> Implemented in Java. Connectors, formats, and UDFs can use this. It
>>>>>>>>>>> contains interface classes such as descriptors, table sink, table source.
>>>>>>>>>>>> flink-table-core {depends on flink-table-common and flink-table-runtime}
>>>>>>>>>>> Implemented in Scala. Contains the current main code base.
>>>>>>>>>>>> flink-table-runtime
>>>>>>>>>>> Implemented in Java. This would require converting classes in
>>>>>>>>>>> o.a.f.table.runtime but would potentially improve the runtime.
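
The module layout listed above can be written down as a small dependency graph to see which modules would end up free of Scala, directly or transitively. The sketch below is plain Java with no Flink or Maven APIs; the class and method names (`ModuleLayoutSketch`, `TableModule`, `isScalaFree`) are hypothetical, and it assumes the graph is acyclic and that only the dependencies named in the list exist.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ModuleLayoutSketch {

    enum Lang { JAVA, SCALA }

    /** One module of the proposed layout: its implementation language and direct dependencies. */
    static final class TableModule {
        final Lang lang;
        final List<String> dependsOn;

        TableModule(Lang lang, String... dependsOn) {
            this.lang = lang;
            this.dependsOn = Arrays.asList(dependsOn);
        }
    }

    /** The five modules and dependencies exactly as listed in the proposal. */
    static Map<String, TableModule> proposedLayout() {
        Map<String, TableModule> modules = new LinkedHashMap<>();
        modules.put("flink-table-java", new TableModule(Lang.JAVA, "flink-table-core"));
        modules.put("flink-table-scala", new TableModule(Lang.SCALA, "flink-table-core"));
        modules.put("flink-table-common", new TableModule(Lang.JAVA));
        modules.put("flink-table-core",
                new TableModule(Lang.SCALA, "flink-table-common", "flink-table-runtime"));
        modules.put("flink-table-runtime", new TableModule(Lang.JAVA));
        return modules;
    }

    /** A module is Scala-free if it and all of its transitive dependencies are Java. */
    static boolean isScalaFree(Map<String, TableModule> modules, String name) {
        TableModule module = modules.get(name);
        if (module == null) {
            throw new IllegalArgumentException("Unknown module: " + name);
        }
        if (module.lang == Lang.SCALA) {
            return false;
        }
        // Recursion terminates because the layout is assumed to be acyclic.
        for (String dependency : module.dependsOn) {
            if (!isScalaFree(modules, dependency)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, TableModule> modules = proposedLayout();
        for (String name : modules.keySet()) {
            System.out.println(name + " scala-free: " + isScalaFree(modules, name));
        }
    }
}
```

Under these assumptions only flink-table-common and flink-table-runtime come out Scala-free. flink-table-java is itself written in Java but still pulls in Scala through flink-table-core.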
>>>>>>>>>>> What do you think?
>>>>>>>>>>> Regards,
>>>>>>>>>>> Timo
>>>>>>>>>>> [1] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Convert-main-Table-API-classes-into-traits-tp21335.html