
Re: Tracking what works with portability

Thanks Henning! This spreadsheet is super helpful and long needed. 

I like how this also serves as a portability JIRA index, which I have found extremely hard to navigate until now. How about making a permalink and referencing it on https://beam.apache.org/contribute/portability/ ?

Let's hope that the deep red areas in the Flink runner start to change to friendlier colors soon, as pieces make their way back from the prototype branch.


On Fri, May 11, 2018 at 12:47 PM, Henning Rohde <herohde@xxxxxxxxxx> wrote:
For runners*SDK pairs that don't have a batch/streaming distinction how about collapsing the columns?

There is also often a difference in whether we've actually tried them or whether there are regression tests. Once we have a clearer (= greener and bluer) picture, I'm fine with collapsing some columns. But, for now, I'd like to see how it plays out.


On Fri, May 11, 2018 at 12:16 PM Henning Rohde <herohde@xxxxxxxxxx> wrote:
> Yea so I guess the column is more just "what works?" and not "what works with portability?" 

Yeah - the Direct runner column is just "what works". It's included because direct runners are still relevant in the portable world, and it's useful to see what is supported there in comparison with the portable runners. I clarified the caption.


On Fri, May 11, 2018 at 12:12 PM Kenneth Knowles <klk@xxxxxxxxxx> wrote:
On Fri, May 11, 2018 at 11:46 AM Lukasz Cwik <lcwik@xxxxxxxxxx> wrote:

On Fri, May 11, 2018 at 11:40 AM Kenneth Knowles <klk@xxxxxxxxxx> wrote:
This is great. "The Beam Vision in a spreadsheet" and/or what the capability matrix wishes it always had been.

 - I don't know how to interpret the DirectRunner column. Is it that it uses ye olde proto round trip? Another level is that it actually directly links in the SDK harness as a dep and uses the exact code paths (seems like overkill).

It's up to the direct runner here to decide what level of execution is actually done via portability APIs, but it is meant to be a single process to ease debugging for users.

Yea so I guess the column is more just "what works?" and not "what works with portability?" in this case. Just a clarification - either way is fine by me. I wasn't sure if the column was to track progress on making the direct runners respect the model or whatnot. Without a proto round trip, a DirectRunner can easily have non-model behaviors by using information that it shouldn't.

 - For runners*SDK pairs that don't have a batch/streaming distinction how about collapsing the columns?

Runners may not have a distinction but the portability framework may require more work from a runner to support a use case. A good example of this is side input readiness checking for streaming pipelines.

What do you mean by the portability framework? Do you mean an SDK harness? Or that the protos do not express enough information?


 - Anyone have spreadsheet-fu to do a permanent global automatic hyperlinking of BEAM-xxxx?
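One way to approximate the hyperlinking asked about above, outside the spreadsheet itself, is a small text pass over exported cell contents. This is only a minimal sketch: it assumes issue keys follow the BEAM-<number> pattern and that issues are browsable under https://issues.apache.org/jira/browse/. The function name `link_issue_keys` is hypothetical, and nothing here is a spreadsheet feature.

```python
import re

# Assumption: Beam issues are browsable under this JIRA base URL.
JIRA_BASE = "https://issues.apache.org/jira/browse/"

# Match keys such as BEAM-1234; the literal placeholder BEAM-xxxx is not matched.
ISSUE_KEY = re.compile(r"\bBEAM-\d+\b")


def link_issue_keys(text: str) -> str:
    """Append the JIRA URL in parentheses after each BEAM-<number> key."""
    return ISSUE_KEY.sub(
        lambda m: f"{m.group(0)} ({JIRA_BASE}{m.group(0)})", text
    )
```

Running it twice over the same text would link the keys inside the already inserted URLs again, so it is meant as a one-shot pass over unlinked text.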


On Fri, May 11, 2018 at 10:38 AM Henning Rohde <herohde@xxxxxxxxxx> wrote:
Hi everyone,

While the portability framework moves forward, it is often hard to figure out exactly what is supported at any given time. There are still many irregularities, TODOs, bugs, and small differences between batch and streaming, and between the portable SDK and runner implementations. For example, the answer to the question "Does Wordcount run portably?" depends on the SDK, the runner, and where the output is written.

To this end, I've started a spreadsheet to better track the "swiss cheese" of what works portably:

Note that it is a work in progress. The intended audience is everyone working on or interested in portability. I am hoping we can populate, expand, and maintain the information as a community until portability framework support is mature enough to allow SDKs and runners to be considered independently.

Comments and suggestions welcome!