[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Can we allow SimpleFunction and SerializableFunction to throw Exception?

Reminder that I'm looking for review on https://github.com/apache/beam/pull/7160

On Thu, Nov 29, 2018, 11:48 AM Jeff Klukas <jklukas@xxxxxxxxxxx wrote:
I created a JIRA and a PR for this:

On naming, I'm proposing that SerializableFunction extend ProcessFunction (since this new superinterface is particularly appropriate for user code executed inside a ProcessElement method) and that SimpleFunction extend InferableFunction (since type information and coder inference are what distinguish this from ProcessFunction).

We originally discussed deprecating SerializableFunction and SimpleFunction in favor of the new types, but there appear to be two fairly separate use cases for SerializableFunction. It's either defining user code that will be executed in a DoFn, in which case I think we always want to prefer the new interface that allows declared exceptions. But it's also used where the code is to be executed as part of pipeline construction, in which case it may be reasonable to want to restrict checked exceptions. In any case, deprecating SerializableFunction and SimpleFunction can be discussed further in the future.

On Wed, Nov 28, 2018 at 9:53 PM Kenneth Knowles <kenn@xxxxxxxxxx> wrote:
Nice! A clean solution and an opportunity to bikeshed on names. This has everything I love.


On Wed, Nov 28, 2018 at 6:43 PM Jeff Klukas <jklukas@xxxxxxxxxxx> wrote:
It looks like we can add make the new interface a superinterface for the existing SerializableFunction while maintaining binary compatibility [0].

We'd have:

public interface NewSerializableFunction<InputT, OutputT> extends Serializable {
  OutputT apply(InputT input) throws Exception;

and then modify SerializableFunction to inherit from it:

public interface SerializableFunction<InputT, OutputT> extends NewSerializableFunction<InputT, OutputT>, Serializable {
  OutputT apply(InputT input);

IIUC, we can then more or less replace all references to SerializableFunction with NewSerializableFunction across the beam codebase without having to introduce any new overrides. I'm working on a proof of concept now.

It's not clear what the actual names for NewSerializableFunction and NewSimpleFunction should be.

On Mon, Nov 26, 2018 at 9:54 PM Thomas Weise <thw@xxxxxxxxxx> wrote:
+1 for introducing the new interface now and deprecating the old one. The major version change then provides the opportunity to remove deprecated code.

On Mon, Nov 26, 2018 at 10:09 AM Lukasz Cwik <lcwik@xxxxxxxxxx> wrote:
Before 3.0 we will still want to introduce this giving time for people to migrate, would it make sense to do that now and deprecate the alternatives that it replaces?

On Mon, Nov 26, 2018 at 5:59 AM Jeff Klukas <jklukas@xxxxxxxxxxx> wrote:
Picking up this thread again. Based on the feedback from Kenn, Reuven, and Romain, it sounds like there's no objection to the idea of SimpleFunction and SerializableFunction declaring that they throw Exception. So the discussion at this point is about whether there's an acceptable way to introduce that change.

IIUC correctly, Kenn was suggesting that we need to ensure backwards compatibility for existing user code both at runtime and recompile, which means we can't simply add the declaration to the existing interfaces, since that would cause errors at compile time for user code directly invoking SerializableFunction instances.

I don't see an obvious way that introducing a new functional interface would help without littering the API with more variants (it's already a bit confusing that i.e. MapElements has multiple via() methods to support three different function interfaces).

Perhaps this kind of cleanup is best left for Beam 3.0?

On Mon, Oct 15, 2018 at 6:51 PM Reuven Lax <relax@xxxxxxxxxx> wrote:
Compilation compatibility is a big part of what Beam aims to provide with its guarantees. Romain makes a good point that most users are not invoking SeralizableFunctions themselves (they are usually invoked inside of Beam classes such as MapElements), however I suspect some users do these things.

On Sun, Oct 14, 2018 at 2:38 PM Kenneth Knowles <kenn@xxxxxxxxxx> wrote:
Romain has brought up two good aspects of backwards compatibility:

 - runtime replacement vs recompile
 - consumer (covariant) vs producer (contravariant, such as interfaces a user implements)

In this case, I think the best shoice is covariant and contravariant (invariant) backwards compat including recompile compat. But we shouldn't assume there is one obvious definition of "backwards compatibility".

Does it help to introduce a new functional interface?


On Sun, Oct 14, 2018 at 6:25 AM Romain Manni-Bucau <rmannibucau@xxxxxxxxx> wrote:
Beam does not catch Exception for function usage so it will have to do it in some places.

A user does not have to execute the function so worse case it impacts tests and in any case the most important: it does not impact the user until it recompiles the code (= runtime is not impacted).

Romain Manni-Bucau
@rmannibucau |  Blog | Old BlogGithub | LinkedIn | Book

Le dim. 14 oct. 2018 à 15:19, Reuven Lax <relax@xxxxxxxxxx> a écrit :
What in Beam codebase is not ready, and how do we know that user code doesn't have the same issue?

On Sun, Oct 14, 2018 at 6:04 AM Romain Manni-Bucau <rmannibucau@xxxxxxxxx> wrote:
Hmm, tested also and it works until something in the codeflow does not respect that constraint - see com.sun.tools.javac.comp.Flow.FlowAnalyzer#errorUncaught. In other words beam codebase is not ready for that and will make it fail but it is ok cause we can fix it but user code does not rely on that so it is fine to update it normally.

Romain Manni-Bucau
@rmannibucau |  Blog | Old BlogGithub | LinkedIn | Book

Le dim. 14 oct. 2018 à 14:39, Reuven Lax <relax@xxxxxxxxxx> a écrit :
Just tried it, doesn't appear to work :(

error: unreported exception Exception; must be caught or declared to be thrown

On Sun, Oct 14, 2018 at 1:55 AM Romain Manni-Bucau <rmannibucau@xxxxxxxxx> wrote:
not with java>=8 AFAIK

Romain Manni-Bucau
@rmannibucau |  Blog | Old BlogGithub | LinkedIn | Book

Le dim. 14 oct. 2018 à 10:49, Reuven Lax <relax@xxxxxxxxxx> a écrit :
But it means that other functions that call SerializableFunctions must now declare exceptions, right? If yes, this is incompatible.

On Sun, Oct 14, 2018 at 1:37 AM Romain Manni-Bucau <rmannibucau@xxxxxxxxx> wrote:
No, only parameter types and return type is used to lookup methods.

Romain Manni-Bucau
@rmannibucau |  Blog | Old BlogGithub | LinkedIn | Book

Le dim. 14 oct. 2018 à 09:45, Reuven Lax <relax@xxxxxxxxxx> a écrit :
I've run into this problem before as well. Doesn't changing the signature involve a backwards-incompatible change though?

On Wed, Oct 3, 2018 at 5:11 PM Jeff Klukas <jklukas@xxxxxxxxxxx> wrote:
I'm working on https://issues.apache.org/jira/browse/BEAM-5638 to add exception handling options to single message transforms in the Java SDK.

MapElements' via() method is overloaded to accept either a SimpleFunction, a SerializableFunction, or a Contextful, all of which are ultimately stored as a Contextful where the mapping functionis expected to have signature:

OutputT apply(InputT element, Context c) throws Exception;

So Contextful.Fn allows throwing checked exceptions, but neither SerializableFunction nor SimpleFunction do. The user-provided function has to satisfy the more restrictive signature:

OutputT apply(InputT input);

Is there background about why we allow arbitrary checked exceptions to be thrown in one case but not the other two? Could we consider expanding SerializableFunction and SimpleFunction to the following?:

OutputT apply(InputT input) throws Exception;

This would, for example, simplify the implementation of ParseJsons and AsJsons, where we have to catch an IOException in MapElements#via only to rethrow as RuntimeException.