osdir.com


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [BEAM-5442] Store duplicate unknown (runner) options in a list argument




On Fri, Oct 12, 2018 at 10:11 AM, Charles Chen <ccy@xxxxxxxxxx> wrote:
For context, I made comments on https://github.com/apache/beam/pull/6600 noting that the changes being made were not good for Beam backwards-compatibility.  The change as is allows users to use pipeline options without explicitly defining them, which is not the type of usage we would like to encourage since we prefer to be explicit whenever possible.  If users write pipelines with this sort of pattern, they will potentially encounter pain when upgrading to a later version since this is an implementation detail and not an officially supported pattern.  I agree with the comments above that this is ultimately a scoping issue.  I would not have a problem with these changes if they were explicitly scoped under either a runner or unparsed options namespace.

As a second note, since the 2.8.0 release is being cut right now, because of these backwards-compatibility concerns, I would suggest reverting these changes, at least until 2.8.0 is cut, so we can have a discussion here before committing to and releasing any API-level changes.

+1 I would like to revert the changes in order not rush this into the release. Once this discussion results in an agreement changes can be brought back.
 

On Fri, Oct 12, 2018 at 9:26 AM Henning Rohde <herohde@xxxxxxxxxx> wrote:
Agree that pipeline options lack some mechanism for scoping. It is also not always possible distinguish options meant to be consumed at pipeline construction time, by the runner, by the SDK harness, by the user code or any combination -- and this causes confusion every now and then.

For Dataflow, we have been using "experiments" for arbitrary runner-specific options. It's simply a string list pipeline option that all SDKs support and, for Go at least, is sent to portable runners. Flink can do the same in the short term to move forward.

Henning


On Fri, Oct 12, 2018 at 8:50 AM Thomas Weise <thw@xxxxxxxxxx> wrote:
[moving to the list]

The requirement driving this part of the change was to allow a user to specify pipeline options that a runner supports without having to declare those in each language SDK.

In the specific scenario, we have options that the Flink runner supports (and can validate), that are not enumerated in the Python SDK.

I think we have a bigger problem scoping pipeline options. For example, the runner options are dumped into the SDK worker. There is also a possibility of name collisions. So I think this would benefit from broader feedback.

Thanks,
Thomas


---------- Forwarded message ---------
From: Charles Chen <notifications@xxxxxxxxxx>
Date: Fri, Oct 12, 2018 at 8:36 AM
Subject: Re: [apache/beam] [BEAM-5442] Store duplicate unknown options in a list argument (#6600)
To: apache/beam <beam@xxxxxxxxxxxxxxxxxx>
Cc: Thomas Weise <thomas.weise@xxxxxxxxx>, Mention <mention@xxxxxxxxxxxxxxxxxx>


CC: @tweise


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or mute the thread.