OSDir


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [DISCUSS] Breaking the Scala API for Scala 2.12 Support


I have an open PR that does everything we can do for preparing the code base for Scala 2.12 without breaking the API: https://github.com/apache/flink/pull/6784

> On 8. Oct 2018, at 09:56, Chesnay Schepler <chesnay@xxxxxxxxxx> wrote:
> 
> I'd rather not maintain 2 master branches. Beyond the maintenance overhead I'm
> wondering about the benefit, as the API break still has to happen at some point.
> 
> @Aljoscha how much work for supporting scala 2.12 can be merged without breaking the API?
> If this is the only blocker I suggest to make the breaking change in 1.8.
> 
> On 05.10.2018 10:31, Till Rohrmann wrote:
>> Thanks Aljoscha for starting this discussion. The described problem brings
>> us indeed a bit into a pickle. Even with option 1) I think it is somewhat
>> API breaking because everyone who used lambdas without types needs to add
>> them now. Consequently, I only see two real options out of the ones you've
>> proposed:
>> 
>> 1) Disambiguate the API (either by removing
>> reduceGroup(GroupReduceFunction) or by renaming it to reduceGroupJ)
>> 2) Maintain a 2.11 and 2.12 master branch until we phase 2.11 completely out
>> 
>> Removing the reduceGroup(GroupReduceFunction) in option 1 is a bit
>> problematic because then all Scala API users who have implemented a
>> GroupReduceFunction need to convert it into a Scala lambda. Moreover, I
>> think it will be problematic with RichGroupReduceFunction which you need to
>> get access to the RuntimeContext.
>> 
>> Maintaining two master branches puts a lot of burden onto the developers to
>> always keep the two branches in sync. Ideally I would like to avoid this.
>> 
>> I also played a little bit around with implicit conversions to add the
>> lambda methods in Scala 2.11 on demand, but I was not able to get it work
>> smoothly.
>> 
>> I'm cross posting this thread to user as well to get some more user
>> feedback.
>> 
>> Cheers,
>> Till
>> 
>> On Thu, Oct 4, 2018 at 7:36 PM Elias Levy <fearsome.lucidity@xxxxxxxxx>
>> wrote:
>> 
>>> The second alternative, with the addition of methods that take functions
>>> with Scala types, seems the most sensible.  I wonder if there is a need
>>> then to maintain the *J Java parameter methods, or whether users could just
>>> access the functionality by converting the Scala DataStreams to Java via
>>> .javaStream and whatever the equivalent is for DataSets.
>>> 
>>> On Thu, Oct 4, 2018 at 8:10 AM Aljoscha Krettek <aljoscha@xxxxxxxxxx>
>>> wrote:
>>> 
>>>> Hi,
>>>> 
>>>> I'm currently working on
>>> https://issues.apache.org/jira/browse/FLINK-7811,
>>>> with the goal of adding support for Scala 2.12. There is a bit of a
>>> hurdle
>>>> and I have to explain some context first.
>>>> 
>>>> With Scala 2.12, lambdas are implemented using the lambda mechanism of
>>>> Java 8, i.e. Scala lambdas are now SAMs (Single Abstract Method). This
>>>> means that the following two method definitions can both take a lambda:
>>>> 
>>>> def map[R](mapper: MapFunction[T, R]): DataSet[R]
>>>> def map[R](fun: T => R): DataSet[R]
>>>> 
>>>> The Scala compiler gives precedence to the lambda version when you call
>>>> map() with a lambda in simple cases, so it works here. You could still
>>> call
>>>> map() with a lambda if the lambda version of the method weren't here
>>>> because they are now considered the same. For Scala 2.11 we need both
>>>> signatures, though, to allow calling with a lambda and with a
>>> MapFunction.
>>>> The problem is with more complicated method signatures, like:
>>>> 
>>>> def reduceGroup[R](fun: (scala.Iterator[T], Collector[R]) => Unit):
>>>> DataSet[R]
>>>> 
>>>> def reduceGroup[R](reducer: GroupReduceFunction[T, R]): DataSet[R]
>>>> 
>>>> (for reference, GroupReduceFunction is a SAM with void
>>>> reduce(java.lang.Iterable<T> values, Collector<O> out))
>>>> 
>>>> These two signatures are not the same but similar enough for the Scala
>>>> 2.12 compiler to "get confused". In Scala 2.11, I could call
>>> reduceGroup()
>>>> with a lambda that doesn't have parameter type definitions and things
>>> would
>>>> be fine. With Scala 2.12 I can't do that because the compiler can't
>>> figure
>>>> out which method to call and requires explicit type definitions on the
>>>> lambda parameters.
>>>> 
>>>> I see some solutions for this:
>>>> 
>>>> 1. Keep the methods as is, this would force people to always explicitly
>>>> specify parameter types on their lambdas.
>>>> 
>>>> 2. Rename the second method to reduceGroupJ() to signal that it takes a
>>>> user function that takes Java-style interfaces (the first parameter is
>>>> java.lang.Iterable while the Scala lambda takes a scala.Iterator). This
>>>> disambiguates the code, users can use lambdas without specifying explicit
>>>> parameter types but breaks the API.
>>>> 
>>>> One effect of 2. would be that we can add a reduceGroup() method that
>>>> takes a api.scala.GroupReduceFunction that takes proper Scala types, thus
>>>> it would allow people to implement user functions without having to cast
>>>> the various Iterator/Iterable parameters.
>>>> 
>>>> Either way, people would have to adapt their code when moving to Scala
>>>> 2.12 in some way, depending on what style of methods they use.
>>>> 
>>>> There is also solution 2.5:
>>>> 
>>>> 2.5 Rename the methods only in the Scala 2.12 build of Flink and keep the
>>>> old method names for Scala 2.11. This would require some infrastructure
>>> and
>>>> I don't yet know how it can be done in a sane way.
>>>> 
>>>> What do you think? I personally would be in favour of 2. but it breaks
>>> the
>>>> existing API.
>>>> 
>>>> Best,
>>>> Aljoscha
>>>> 
>>>> 
>>>> 
>>>> 
>