Thanks for moving forward with this, Lukasz!
Unfortunately, I can't make it on Friday, but I'll sync with somebody on
the call (e.g. Ryan) about your discussion.
On 08.09.18 02:00, Lukasz Cwik wrote:
> Thanks to everyone who filled out the doodle poll. The most
> popular time was Friday Sept 14th from 11am-noon PST. I'll send out a
> calendar invite and meeting link early next week.
> I have received a lot of feedback on the document and have addressed
> some parts of it including:
> * clarifying terminology
> * processing skew due to some restrictions having their watermarks much
> further behind than others, affecting scheduling of bundles by runners
> * external throttling & I/O wait overhead reporting to make sure we
> don't overscale
> Areas that still need additional feedback and details are:
> * reporting progress around the work that is done and is active
> * more examples
> * unbounded restrictions being caused by an unbounded number of splits
> of existing unbounded restrictions (infinite work growth)
> * whether we should be reporting this information at the PTransform
> level or at the bundle level
> On Wed, Sep 5, 2018 at 1:53 PM Lukasz Cwik <lcwik@xxxxxxxxxx
> <mailto:lcwik@xxxxxxxxxx>> wrote:
> Thanks to all those who have shown interest in this topic through the
> questions they have already asked on the doc, and to those
> interested in having this discussion. I have set up this doodle to
> allow people to provide their availability:
> I'll send out the chosen time based upon people's availability and a
> Hangout link by end of day Friday so please mark your availability
> using the link above.
> The agenda of the meeting will be as follows:
> * Overview of the proposal
> * Enumerate and discuss/answer questions brought up in the meeting
> Note that all questions and any discussions/answers provided will be
> added to the doc for those who are unable to attend.
> On Fri, Aug 31, 2018 at 9:47 AM Jean-Baptiste Onofré
> <jb@xxxxxxxxxxxx <mailto:jb@xxxxxxxxxxxx>> wrote:
> On 31 Aug 2018, at 18:22, Lukasz Cwik <lcwik@xxxxxxxxxx
> <mailto:lcwik@xxxxxxxxxx>> wrote:
> That is possible, I'll take people's date/time suggestions
> and create a simple online poll with them.
> On Fri, Aug 31, 2018 at 2:22 AM Robert Bradshaw
> <robertwb@xxxxxxxxxx <mailto:robertwb@xxxxxxxxxx>> wrote:
> Thanks for taking this up. I added some comments to the
> doc. A European-friendly time for discussion would
> be great.
> On Fri, Aug 31, 2018 at 3:14 AM Lukasz Cwik
> <lcwik@xxxxxxxxxx <mailto:lcwik@xxxxxxxxxx>> wrote:
> I came up with a proposal for a progress model
> based solely on the backlog, in which splits
> are based upon the remaining backlog we want
> the SDK to split at. I also give recommendations to
> runner authors as to how an autoscaling system could
> work based upon the measured backlog. A lot of
> discussions around progress reporting and splitting
> in the past have always been around finding an
> optimal solution. After reading a lot of information
> about work stealing, I don't believe there is a
> general solution, and it really is up to
> SplittableDoFns to be well behaved. I did not do
> much work in classifying what a well-behaved
> SplittableDoFn is, though. Much of this work builds
> off ideas that Eugene had documented in the past.
> I could use the community's wide knowledge of
> different I/Os to see if computing the backlog is
> practical in the way that I'm suggesting and to
> gather people's feedback.
> If there is a lot of interest, I would like to hold
> a community video conference between Sept 10th and
> 14th about this topic. Please reply with your
> availability by Sept 6th if you're interested.
> 1: https://s.apache.org/beam-bundles-backlog-splitting
> 2: https://s.apache.org/beam-breaking-fusion
> On Mon, Aug 13, 2018 at 10:21 AM Jean-Baptiste
> Onofré <jb@xxxxxxxxxxxx <mailto:jb@xxxxxxxxxxxx>> wrote:
> Awesome !
> Thanks Luke !
> I plan to work with you and others on this one.
> On 13 Aug 2018, at 19:14, Lukasz Cwik
> <lcwik@xxxxxxxxxx <mailto:lcwik@xxxxxxxxxx>> wrote:
> I wanted to reach out to say that I will be
> continuing from where Eugene left off with
> SplittableDoFn. I know that many of you have
> done a bunch of work with IOs and/or runner
> integration for SplittableDoFn and would
> appreciate your help in advancing this
> awesome idea. If you have questions or
> things you want to get reviewed related to
> SplittableDoFn, feel free to send them my
> way or include me on anything SplittableDoFn
> I was part of several discussions with
> Eugene and I think the biggest outstanding
> design portion is to figure out how dynamic
> work rebalancing would play out with the
> portability APIs. This includes reporting of
> progress from within a bundle. I know that
> Eugene had shared some documents in this
> regard but the position / split models
> didn't work too cleanly in a unified sense
> for bounded and unbounded SplittableDoFns.
> It will likely take me a while to gather my
> thoughts, but I could use your expertise as to
> how compatible these ideas are with respect
> to IOs and runners
> Flink/Spark/Dataflow/Samza/Apex/... and
> obviously help during implementation.