[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Discuss] Upgrade story for Beam's execution engines

The main problem here is that users are forced to upgrade infrastructure to obtain new features in Beam, even when those features actually don't require such changes. As an example, another update to Flink 1.6.0 was proposed (without supporting new functionality in Beam) and we already know that it breaks compatibility (again).

I think that upgrading to a Flink X.Y.0 version isn't a good idea to start with. But besides that, if we want to grow adoption, then we need to focus on stability and delivering improvements to Beam without disrupting users.

In the specific case, ideally the surface of Flink would be backward compatible, allowing us to stick to a minimum version and be able to submit pipelines to Flink endpoints of higher versions. Some work in that direction is underway (like versioning the REST API). FYI, lowest common version is what most projects that depend on Hadoop 2.x follow.

Since Beam with Flink 1.5.x client won't talk to Flink 1.6 and there are code changes required to make it compile, we would need to come up with a more involved strategy to support multiple Flink versions. Till then, I would prefer we favor existing users over short lived experiments, which would mean stick with 1.5.x and not support 1.6.0.


On Wed, Sep 12, 2018 at 1:15 PM Lukasz Cwik <lcwik@xxxxxxxxxx> wrote:
As others have already suggested, I also believe LTS releases is the best we can do as a community right now until portability allows us to decouple what a user writes with and how it runs (the SDK and the SDK environment) from the runner (job service + shared common runner libs + Flink/Spark/Dataflow/Apex/Samza/...).

Dataflow would be highly invested in having the appropriate tooling within Apache Beam to support multiple SDK versions against a runner. This in turn would allow people to use any SDK with any runner and as Robert had mentioned, certain optimizations and features would be disabled depending on the capabilities of the runner and the capabilities of the SDK.

On Wed, Sep 12, 2018 at 6:38 AM Robert Bradshaw <robertwb@xxxxxxxxxx> wrote:
The target audience is people who want to use the latest Beam but do not want to use the latest version of the runner, right? 

I think this will be somewhat (though not entirely) addressed by Beam LTS releases, where those not wanting to upgrade the runner at least have a well-supported version of Beam. In the long term, we have the division

    Runner <-> BeamRunnerSpecificCode <-> CommonBeamRunnerLibs <-> SDK.

(which applies to the job submission as well as execution).

Insomuch as the BeamRunnerSpecificCode uses the public APIs of the runner, hopefully upgrading the runner for minor versions should be a no-op, and we can target the lowest version of the runner that makes sense, allowing the user to link against higher versions at his or her discretion. We should provide built targets that allow this. For major versions, it may make sense to have two distinct BeamRunnerSpecificCode libraries (which may or may not share some common code). I hope these wrappers are not too thick. 

There is a tight coupling at the BeamRunnerSpecificCode <-> CommonBeamRunnerLibs layer, but hopefully the bulk of the code lives on the right hand side and can be updated as needed independent of the runner. There may be code of the form "if the runner supports X, do this fast path, otherwise, do this slow path (or reject the pipeline). 

I hope the CommonBeamRunnerLibs <-> SDK coupling is fairly loose, to the point that one could use SDKs from different versions of Beam (or even developed outside of Beam) with an older/newer runner. We may need to add versioning to the Fn/Runner/Job API itself to support this. Right now of course we're still in a pre-1.0, rapid-development phase wrt this API. 

On Wed, Sep 12, 2018 at 2:10 PM Etienne Chauchot <echauchot@xxxxxxxxxx> wrote:
Hi Max,

I totally agree with your points especially the users priorities (stick to the already working version) , and the need to leverage important new features. It is indeed a difficult balance to find .

I can talk for a part I know: for the Spark runner, the aim was to support Dataset native spark API (in place of RDD). For that we needed to upgrade to spark 2.x (and we will probably leverage Beam Row as well).
But such an upgrade is a good amount of work which makes it difficult to commit on a schedule such as "if there is a major new feature on an execution engine that we want to leverage, then the upgrade in Beam will be done within x months".

Regarding your point on portability : decoupling SDK from runner with runner harness and SDK harness might make pipeline authors work easy regarding pipeline maintenance. But, still, if we upgrade runner libs, then the users might have their runner harness not work with their engine version.
If such SDK/runner decoupling is 100% functional, then we could imaging having multiple runner harnesses shipping different versions of the runner libs to solve this problem.
But we would need to support more than one version of the runner libs. We chose not to do this on spark runner.



Le mardi 11 septembre 2018 à 15:42 +0200, Maximilian Michels a écrit :
Hi Beamers,

In the light of the discussion about Beam LTS releases, I'd like to kick 
off a thread about how often we upgrade the execution engine of each 
Runner. By upgrade, I mean major/minor versions which typically break 
the binary compatibility of Beam pipelines.

For the Flink Runner, we try to track the latest stable version. Some 
users reported that this can be problematic, as it requires them to 
potentially upgrade their Flink cluster with a new version of Beam.

 From a developer's perspective, it makes sense to migrate as early as 
possible to the newest version of the execution engine, e.g. to leverage 
the newest features. From a user's perspective, you don't care about the 
latest features if your use case still works with Beam.

We have to please both parties. So I'd suggest to upgrade the execution 
engine whenever necessary (e.g. critical new features, end of life of 
current version). On the other hand, the upcoming Beam LTS releases will 
contain a longer-supported version.

Maybe we don't need to discuss much about this but I wanted to hear what 
the community has to say about it. Particularly, I'd be interested in 
how the other Runner authors intend to do it.

As far as I understand, with the portability being stable, we could 
theoretically upgrade the SDK without upgrading the runtime components. 
That would allow us to defer the upgrade for a longer time.