
Re: Gradle Races in beam-examples-java, beam-runners-apex

I had originally suggested using a Linux tool such as inotifywait[1], which is built on the kernel's inotify API, to watch what is happening to the files.

It is likely that some Gradle task is running in parallel with another Gradle task when it shouldn't, which means the jar file is being changed or corrupted while it is in use. I believe fixing our Gradle task dependency tree with respect to this would solve the problem. The crash has not reproduced on my desktop in 20 runs, which makes it hard for me to test for.

1: https://www.linuxjournal.com/content/linux-filesystem-events-inotify
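For anyone without inotify-tools installed, a rough stand-in is to poll the build tree and log when jar files appear, change, or vanish. The sketch below swaps inotify for plain polling, so it can miss rewrites that happen between two polls; the function names and the one-second interval are illustrative assumptions, not anything from the Beam build.

```python
import time
from pathlib import Path


def snapshot_jars(root):
    """Map each .jar path under root to its (mtime_ns, size) pair."""
    state = {}
    for path in Path(root).rglob("*.jar"):
        try:
            st = path.stat()
        except FileNotFoundError:
            # The file vanished between listing and stat; skip it for this poll.
            continue
        state[str(path)] = (st.st_mtime_ns, st.st_size)
    return state


def diff_snapshots(before, after):
    """Return sorted (event, path) pairs describing what changed."""
    events = []
    for path in after.keys() - before.keys():
        events.append(("created", path))
    for path in before.keys() - after.keys():
        events.append(("deleted", path))
    for path in before.keys() & after.keys():
        if before[path] != after[path]:
            events.append(("modified", path))
    return sorted(events)


def watch(root, interval=1.0, rounds=60):
    """Poll root every `interval` seconds and print timestamped jar events."""
    previous = snapshot_jars(root)
    for _ in range(rounds):
        time.sleep(interval)
        current = snapshot_jars(root)
        for event, path in diff_snapshots(previous, current):
            print(f"{time.strftime('%H:%M:%S')} {event} {path}")
        previous = current
```

Running `watch(".")` from the repository root in a second terminal while the build runs would show whether a jar is rewritten while another task could still be reading it. Matching the timestamps against the Gradle task log is left to the reader.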

On Mon, Sep 10, 2018 at 1:15 PM Ryan Williams <ryan@xxxxxxxxxxxxxxx> wrote:
this continues to be an issue locally (cf. some discussion in #beam slack)

commands like `./gradlew javaPreCommit` or `./gradlew build` reliably fail with a range of different JVM crashes in a few different tasks, with messages that suggest filing a bug against the Java compiler

what do we know about the actual race condition that is allowing one task to attempt to read from a JAR that is being overwritten by another task? presumably this is just a bug in our Gradle configs?

On Mon, Aug 27, 2018 at 2:28 PM Andrew Pilloud <apilloud@xxxxxxxxxx> wrote:
It appears that there is no one working on a fix for the flakes, so I've merged the change to disable parallel tasks on precommit.


On Fri, Aug 24, 2018 at 1:30 PM Andrew Pilloud <apilloud@xxxxxxxxxx> wrote:
I'm seeing failures due to this on 12 of the last 16 PostCommits. Precommits take about 22 minutes when run in parallel, so at a 25% pass rate the expected time to a good test run is about 88 minutes (four attempts on average), assuming you immediately restart on each failure. We are looking at 56 minutes for a precommit that isn't run in parallel: https://builds.apache.org/job/beam_PreCommit_Java_Phrase/266/ I'd rather have tests take a little longer than have to monitor them for an hour or more.
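The retry arithmetic can be checked with a short sketch. It assumes each run passes independently at the observed rate and that a failed run is restarted immediately, so the number of attempts follows a geometric distribution with mean 1 / pass_rate. It also assumes the serial run never hits the race, which the thread does not confirm. The function name is hypothetical.

```python
def expected_minutes(run_minutes, pass_rate):
    """Expected wall-clock minutes until the first passing run.

    Assumes independent runs and immediate restarts, so the expected
    number of attempts is 1 / pass_rate.
    """
    if not 0 < pass_rate <= 1:
        raise ValueError("pass_rate must be in (0, 1]")
    return run_minutes / pass_rate


# 12 of 16 runs failed, so 4 of 16 passed.
parallel = expected_minutes(22, 4 / 16)
# Assumption: the serial precommit always passes with respect to this race.
serial = expected_minutes(56, 1.0)
print(parallel, serial)  # prints: 88.0 56.0
```

Under that model the parallel precommit costs about 88 minutes on average against 56 minutes for the serial one, so disabling parallelism still comes out ahead.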


On Fri, Aug 24, 2018 at 10:47 AM Lukasz Cwik <lcwik@xxxxxxxxxx> wrote:
I believe it would mitigate the issue but also make the jobs take much longer to complete.

On Thu, Aug 23, 2018 at 2:44 PM Andrew Pilloud <apilloud@xxxxxxxxxx> wrote:
There seems to be a misconfiguration of Gradle that has been causing a high failure rate for the last several weeks when building beam-examples-java and beam-runners-apex. It appears to be some sort of race condition in building dependencies. Given that no one has made progress on fixing the root cause, is this something we could mitigate by running jobs with the `--no-parallel` flag?