The current release branch
(https://github.com/apache/beam/commits/release-2.8.0) was cut after the
revert went in. Sent out https://github.com/apache/beam/pull/6683 as a
revert of the revert. Regarding your comment above, I can help out with
the design / PR reviews for common Python code as you suggest.
On Fri, Oct 12, 2018 at 4:48 PM Thomas Weise <thw@xxxxxxxxxx
Thanks, will tag you and looking forward to feedback so we can
ensure that changes work for everyone.
Looking at the PR, I see agreement from Max to revert the change on
the release branch, but not in master. Would you mind to restore it
On Fri, Oct 12, 2018 at 4:40 PM Ahmet Altay <altay@xxxxxxxxxx
On Fri, Oct 12, 2018 at 11:31 AM, Charles Chen <ccy@xxxxxxxxxx
What I mean is that a user may find that it works for them
to pass "--myarg blah" and access it as "options.myarg"
without explicitly defining a "my_arg" flag due to the added
logic. This is not the intended behavior and we may want to
change this implementation detail in the future. However,
having this logic in a released version makes it hard to
change this behavior since users may erroneously depend on
this undocumented behavior. Instead, we should namespace /
scope this so that it is obvious that this is meant for
runner (and not Beam user) consumption.
On Fri, Oct 12, 2018 at 10:48 AM Thomas Weise
<thw@xxxxxxxxxx <mailto:thw@xxxxxxxxxx>> wrote:
Can you please elaborate more what practical problems
this introduces for users?
I can see that this change allows a user to specify a
runner specific option, which in the future may change
because we decide to scope differently. If this only
affects users of the portable Flink runner (like us),
then no need to revert, because at this early stage we
prefer something that works over being blocked.
It would also be really great if some of the core Python
SDK developers could help out with the design aspects
and PR reviews of changes that affect common Python
code. Anyone who specifically wants to be tagged on
relevant JIRAs and PRs?
I would be happy to be tagged, and I can also help with
including other relevant folks whenever possible. In general I
think Robert, Charles, myself are good candidates.
On Fri, Oct 12, 2018 at 10:20 AM Ahmet Altay
<altay@xxxxxxxxxx <mailto:altay@xxxxxxxxxx>> wrote:
On Fri, Oct 12, 2018 at 10:11 AM, Charles Chen
<ccy@xxxxxxxxxx <mailto:ccy@xxxxxxxxxx>> wrote:
For context, I made comments on
that the changes being made were not good for
Beam backwards-compatibility. The change as is
allows users to use pipeline options without
explicitly defining them, which is not the type
of usage we would like to encourage since we
prefer to be explicit whenever possible. If
users write pipelines with this sort of pattern,
they will potentially encounter pain when
upgrading to a later version since this is an
implementation detail and not an officially
supported pattern. I agree with the comments
above that this is ultimately a scoping issue.
I would not have a problem with these changes if
they were explicitly scoped under either a
runner or unparsed options namespace.
As a second note, since the 2.8.0 release is
being cut right now, because of these
backwards-compatibility concerns, I would
suggest reverting these changes, at least until
2.8.0 is cut, so we can have a discussion here
before committing to and releasing any API-level
+1 I would like to revert the changes in order not
rush this into the release. Once this discussion
results in an agreement changes can be brought back.
On Fri, Oct 12, 2018 at 9:26 AM Henning Rohde
Agree that pipeline options lack some
mechanism for scoping. It is also not always
possible distinguish options meant to be
consumed at pipeline construction time, by
the runner, by the SDK harness, by the user
code or any combination -- and this causes
confusion every now and then.
For Dataflow, we have been using
"experiments" for arbitrary runner-specific
options. It's simply a string list pipeline
option that all SDKs support and, for Go at
least, is sent to portable runners. Flink
can do the same in the short term to move
On Fri, Oct 12, 2018 at 8:50 AM Thomas Weise
<thw@xxxxxxxxxx <mailto:thw@xxxxxxxxxx>> wrote:
[moving to the list]
The requirement driving this part of the
change was to allow a user to specify
pipeline options that a runner supports
without having to declare those in each
In the specific scenario, we have
options that the Flink runner supports
(and can validate), that are not
enumerated in the Python SDK.
I think we have a bigger problem scoping
pipeline options. For example, the
runner options are dumped into the SDK
worker. There is also a possibility of
name collisions. So I think this would
benefit from broader feedback.
---------- Forwarded message ---------
From: *Charles Chen*
Date: Fri, Oct 12, 2018 at 8:36 AM
Subject: Re: [apache/beam] [BEAM-5442]
Store duplicate unknown options in a
list argument (#6600)
To: apache/beam <beam@xxxxxxxxxxxxxxxxxx
Cc: Thomas Weise <thomas.weise@xxxxxxxxx
CC: @tweise <https://github.com/tweise>
You are receiving this because you were
Reply to this email directly, view it on
or mute the thread