Re: [DISCUSS] Breaking the Scala API for Scala 2.12 Support
Thanks Aljoscha for starting this discussion. The described problem does
indeed put us in a bit of a pickle. Even option 1) is, I think, somewhat
API breaking, because everyone who used lambdas without types needs to add
them now. Consequently, I only see two real options out of the ones you've
mentioned:
1) Disambiguate the API (either by removing
reduceGroup(GroupReduceFunction) or by renaming it to reduceGroupJ)
2) Maintain a 2.11 and 2.12 master branch until we phase 2.11 completely out
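To make the renaming variant of option 1 concrete, here is a minimal sketch. It does not use Flink itself: Collector, GroupReduceFunction and DataSet below are hypothetical stand-ins that only mimic the shape of the real types, and RenameSketch, collecting and demo are names made up for illustration. With only one method called reduceGroup left, a lambda without parameter types should resolve, while the Java-style interface remains reachable through reduceGroupJ.

```scala
import java.lang.{Iterable => JIterable}
import java.util.{ArrayList => JArrayList}
import scala.collection.mutable.ListBuffer

object RenameSketch {
  // Hypothetical stand-ins that mimic the shape of Flink's types (assumption).
  trait Collector[R] { def collect(record: R): Unit }

  trait GroupReduceFunction[T, R] {
    def reduce(values: JIterable[T], out: Collector[R]): Unit
  }

  // Runs `body` against a collector and returns everything it emitted.
  private def collecting[R](body: Collector[R] => Unit): List[R] = {
    val buffer = ListBuffer.empty[R]
    body(new Collector[R] {
      def collect(record: R): Unit = { buffer += record; () }
    })
    buffer.toList
  }

  class DataSet[T](val elements: Seq[T]) {
    // Scala-style variant keeps the original name.
    def reduceGroup[R](fun: (Iterator[T], Collector[R]) => Unit): DataSet[R] =
      new DataSet(collecting[R](out => fun(elements.iterator, out)))

    // Java-style variant under the proposed new name, so it no longer
    // competes with the lambda variant during overload resolution.
    def reduceGroupJ[R](reducer: GroupReduceFunction[T, R]): DataSet[R] = {
      val javaValues = new JArrayList[T]()
      elements.foreach(e => javaValues.add(e))
      new DataSet(collecting[R](out => reducer.reduce(javaValues, out)))
    }
  }

  def demo(): (Seq[Int], Seq[Int]) = {
    val data = new DataSet(Seq(1, 2, 3))

    // No parameter types on the lambda: there is only one reduceGroup.
    val viaLambda =
      data.reduceGroup[Int]((values, out) => out.collect(values.sum))

    // Existing GroupReduceFunction implementations move to reduceGroupJ.
    val viaJavaStyle = data.reduceGroupJ[Int](new GroupReduceFunction[Int, Int] {
      def reduce(values: JIterable[Int], out: Collector[Int]): Unit = {
        var total = 0
        val it = values.iterator()
        while (it.hasNext) total += it.next()
        out.collect(total)
      }
    })

    (viaLambda.elements, viaJavaStyle.elements)
  }
}
```

The cost is visible in the second call: code that passes a GroupReduceFunction today has to be changed to call the renamed method.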
Removing the reduceGroup(GroupReduceFunction) in option 1 is a bit
problematic because then all Scala API users who have implemented a
GroupReduceFunction need to convert it into a Scala lambda. Moreover, I
think it will be problematic with RichGroupReduceFunction which you need to
get access to the RuntimeContext.
Maintaining two master branches puts a lot of burden onto the developers to
always keep the two branches in sync. Ideally I would like to avoid this.
I also played around a little with implicit conversions to add the
lambda methods in Scala 2.11 on demand, but I was not able to get it to work.
I'm cross-posting this thread to user as well to get some more user
feedback.
On Thu, Oct 4, 2018 at 7:36 PM Elias Levy <fearsome.lucidity@xxxxxxxxx> wrote:
> The second alternative, with the addition of methods that take functions
> with Scala types, seems the most sensible. I wonder if there is a need
> then to maintain the *J Java parameter methods, or whether users could just
> access the functionality by converting the Scala DataStreams to Java via
> .javaStream and whatever the equivalent is for DataSets.
> On Thu, Oct 4, 2018 at 8:10 AM Aljoscha Krettek <aljoscha@xxxxxxxxxx> wrote:
> > Hi,
> > I'm currently working on
> > with the goal of adding support for Scala 2.12. There is a bit of a
> > problem, and I have to explain some context first.
> > With Scala 2.12, lambdas are implemented using the lambda mechanism of
> > Java 8, i.e. Scala lambdas are now SAMs (Single Abstract Method). This
> > means that the following two method definitions can both take a lambda:
> > def map[R](mapper: MapFunction[T, R]): DataSet[R]
> > def map[R](fun: T => R): DataSet[R]
> > The Scala compiler gives precedence to the lambda version when you call
> > map() with a lambda in simple cases, so it works here. You could still call
> > map() with a lambda if the lambda version of the method weren't here
> > because they are now considered the same. For Scala 2.11 we need both
> > signatures, though, to allow calling with a lambda and with a
> > MapFunction.
> > The problem is with more complicated method signatures, like:
> > def reduceGroup[R](fun: (scala.Iterator[T], Collector[R]) => Unit):
> > DataSet[R]
> > def reduceGroup[R](reducer: GroupReduceFunction[T, R]): DataSet[R]
> > (for reference, GroupReduceFunction is a SAM with void
> > reduce(java.lang.Iterable<T> values, Collector<O> out))
> > These two signatures are not the same but similar enough for the Scala
> > 2.12 compiler to "get confused". In Scala 2.11, I could call reduceGroup()
> > with a lambda that doesn't have parameter type definitions and things would
> > be fine. With Scala 2.12 I can't do that because the compiler can't figure
> > out which method to call and requires explicit type definitions on the
> > lambda parameters.
> > I see some solutions for this:
> > 1. Keep the methods as is. This would force people to always explicitly
> > specify parameter types on their lambdas.
> > 2. Rename the second method to reduceGroupJ() to signal that it takes a
> > user function that takes Java-style interfaces (the first parameter is
> > java.lang.Iterable while the Scala lambda takes a scala.Iterator). This
> > disambiguates the code, so users can use lambdas without specifying explicit
> > parameter types, but it breaks the API.
> > One effect of 2. would be that we can add a reduceGroup() method that
> > takes a api.scala.GroupReduceFunction that takes proper Scala types, thus
> > it would allow people to implement user functions without having to cast
> > the various Iterator/Iterable parameters.
> > Either way, people would have to adapt their code when moving to Scala
> > 2.12 in some way, depending on what style of methods they use.
> > There is also solution 2.5:
> > 2.5 Rename the methods only in the Scala 2.12 build of Flink and keep the
> > old method names for Scala 2.11. This would require some infrastructure and
> > I don't yet know how it can be done in a sane way.
> > What do you think? I personally would be in favour of 2., but it breaks
> > the existing API.
> > Best,
> > Aljoscha
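
For readers who want to try the reduceGroup situation quoted above without a Flink checkout, the following is a minimal sketch of the two overloads and of the "explicit parameter types" workaround (solution 1 in the quoted mail). Collector, GroupReduceFunction and OverloadedDataSet are hypothetical stand-ins that only mimic the shape of the real types; OverloadSketch, collecting and demo are names made up for illustration.

```scala
import java.lang.{Iterable => JIterable}
import java.util.{ArrayList => JArrayList}
import scala.collection.mutable.ListBuffer

object OverloadSketch {
  // Hypothetical stand-ins that mimic the shape of Flink's types (assumption).
  trait Collector[R] { def collect(record: R): Unit }

  trait GroupReduceFunction[T, R] {
    def reduce(values: JIterable[T], out: Collector[R]): Unit
  }

  // Runs `body` against a collector and returns everything it emitted.
  private def collecting[R](body: Collector[R] => Unit): List[R] = {
    val buffer = ListBuffer.empty[R]
    body(new Collector[R] {
      def collect(record: R): Unit = { buffer += record; () }
    })
    buffer.toList
  }

  class OverloadedDataSet[T](val elements: Seq[T]) {
    // Overload 1: Scala function over scala.Iterator.
    def reduceGroup[R](fun: (Iterator[T], Collector[R]) => Unit): OverloadedDataSet[R] =
      new OverloadedDataSet(collecting[R](out => fun(elements.iterator, out)))

    // Overload 2: SAM interface over java.lang.Iterable.
    def reduceGroup[R](reducer: GroupReduceFunction[T, R]): OverloadedDataSet[R] = {
      val javaValues = new JArrayList[T]()
      elements.foreach(e => javaValues.add(e))
      new OverloadedDataSet(collecting[R](out => reducer.reduce(javaValues, out)))
    }
  }

  def demo(): (Seq[Int], Seq[Int]) = {
    val data = new OverloadedDataSet(Seq(1, 2, 3))

    // Explicit parameter types select overload 1. Per the quoted mail, the
    // same lambda without the type annotations is what the Scala 2.12
    // compiler can no longer resolve.
    val viaLambda = data.reduceGroup[Int](
      (values: Iterator[Int], out: Collector[Int]) => out.collect(values.sum))

    // An explicit GroupReduceFunction instance selects overload 2.
    val viaInterface = data.reduceGroup[Int](new GroupReduceFunction[Int, Int] {
      def reduce(values: JIterable[Int], out: Collector[Int]): Unit = {
        var total = 0
        val it = values.iterator()
        while (it.hasNext) total += it.next()
        out.collect(total)
      }
    })

    (viaLambda.elements, viaInterface.elements)
  }
}
```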