
Re: Dataflow test cluster load grows infinitely due to never ending jobs (Warning big pictures)

I believe this affected the stability of other test suites that schedule jobs on Dataflow. I'll monitor those suites to see if things go back to normal.
Thanks Andrew and Mikhail for looking into this!


On Tue, Aug 7, 2018 at 3:29 PM Mikhail Gryzykhin <migryz@xxxxxxxxxx> wrote:
Cool. Thank you for taking care of this.



On Tue, Aug 7, 2018 at 2:21 PM Andrew Pilloud <apilloud@xxxxxxxxxx> wrote:
Sorry, this is me again. Above some threshold of work, Nexmark Query 7 never completes in streaming mode on Dataflow. No idea what the cause is, but I've tuned the test to prevent it from happening again. I also canceled all the leaked jobs. All the Dataflow Nexmark jobs are now completing in under an hour: https://builds.apache.org/job/beam_PostCommit_Java_Nexmark_Dataflow/
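
For anyone who wants to check for leaked jobs in the future, here is a minimal sketch of the idea: flag any job that is still running well past the time a healthy run should take. This is an illustration only, not the tooling actually used here. The record fields (`name`, `state`, `start_time`), the job names, and the one-hour cutoff are all assumptions; the real job list would have to come from the Dataflow console or API.

```python
from datetime import datetime, timedelta, timezone


def find_leaked_jobs(jobs, now, max_age=timedelta(hours=1)):
    """Return the names of jobs that are still running after max_age.

    `jobs` is a list of dicts with hypothetical keys: name, state, start_time.
    """
    return [
        job["name"]
        for job in jobs
        if job["state"] == "RUNNING" and now - job["start_time"] > max_age
    ]


# Made-up job records for illustration; not real data from the test cluster.
now = datetime(2018, 8, 7, 21, 0, tzinfo=timezone.utc)
jobs = [
    {"name": "example-streaming-job", "state": "RUNNING",
     "start_time": now - timedelta(days=2)},
    {"name": "example-batch-job", "state": "DONE",
     "start_time": now - timedelta(hours=3)},
    {"name": "example-fresh-job", "state": "RUNNING",
     "start_time": now - timedelta(minutes=10)},
]

leaked = find_leaked_jobs(jobs, now)
print(leaked)
```

Only the job that has been running for two days is flagged. The finished job and the recently started one are left alone, so the cutoff should sit comfortably above the normal run time.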


On Tue, Aug 7, 2018 at 2:15 PM Mikhail Gryzykhin <migryz@xxxxxxxxxx> wrote:
Hi everyone,

Pablo found that the load on our Dataflow test cluster started to grow a couple of days ago:

I've done some digging, and it seems that we schedule jobs that never end:

I didn't manage to find the code that schedules these jobs, but I suspect they might be Nexmark jobs, since we were fixing those recently.

Can someone help me confirm that this is the reason, and find the culprit and fix it?

Thank you,
