
Re: Gradle Races in beam-examples-java, beam-runners-apex

Do we have inotifywait available on Travis, and could we set it up to log concurrent access to the relevant JAR files?
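To make the idea concrete, here is a rough sketch of how such a log could be scanned afterwards. It assumes the log was produced by something like `inotifywait -m -r -e open,modify,close --timefmt '%H:%M:%S' --format '%T %w%f %e'` on the build output directories, so each line is "time path events" with no spaces in the path. The function name, the sample paths, and the heuristic itself (flag a JAR that is modified while more than one handle is open) are my own assumptions, not something we have in place. inotify does not report which process touched a file, so a hit is only a hint about where to look.

```python
from collections import defaultdict


def find_suspect_jars(log_lines):
    """Return JAR paths that were modified while more than one handle was open.

    Hypothetical helper: expects lines of the form "<time> <path> <events>",
    where <events> is a comma-separated list such as "CLOSE_WRITE,CLOSE".
    """
    open_handles = defaultdict(int)
    suspects = []
    for line in log_lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip lines that do not match the assumed format
        _timestamp, path, event_field = parts
        if not path.endswith(".jar"):
            continue
        events = set(event_field.split(","))
        if "OPEN" in events:
            open_handles[path] += 1
        # A write while a second handle is open suggests overlapping access.
        if "MODIFY" in events and open_handles[path] > 1 and path not in suspects:
            suspects.append(path)
        if "CLOSE_WRITE" in events or "CLOSE_NOWRITE" in events:
            open_handles[path] = max(0, open_handles[path] - 1)
    return suspects


# Made-up sample log: a.jar is modified while two handles are open, b.jar is not.
sample_log = [
    "12:00:01 build/libs/a.jar OPEN",
    "12:00:02 build/libs/a.jar OPEN",
    "12:00:03 build/libs/a.jar MODIFY",
    "12:00:04 build/libs/a.jar CLOSE_WRITE,CLOSE",
    "12:00:05 build/libs/a.jar CLOSE_NOWRITE,CLOSE",
    "12:00:06 build/libs/b.jar OPEN",
    "12:00:07 build/libs/b.jar MODIFY",
    "12:00:08 build/libs/b.jar CLOSE_WRITE,CLOSE",
]
suspects = find_suspect_jars(sample_log)
print(suspects)  # ['build/libs/a.jar']
```

Correlating the flagged timestamps with the Gradle task log would then point at the pair of tasks that overlap.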

On 10.09.18 22:41, Lukasz Cwik wrote:
I had originally suggested to use some Linux kernel tooling such as inotifywait[1] to watch what is happening.

It is likely that some Gradle task is running in parallel with a different Gradle task when it shouldn't be, which means that the jar file is being changed or corrupted while it is in use. I believe fixing our Gradle task dependency tree with respect to this would solve the problem. The crash did not reproduce on my desktop in 20 runs, which makes it hard for me to test.

1: https://www.linuxjournal.com/content/linux-filesystem-events-inotify

On Mon, Sep 10, 2018 at 1:15 PM Ryan Williams <ryan@xxxxxxxxxxxxxxx <mailto:ryan@xxxxxxxxxxxxxxx>> wrote:

    this continues to be an issue locally (cf. some discussion in #beam)

    commands like `./gradlew javaPreCommit` or `./gradlew build`
    reliably fail with a range of different JVM crashes in a few
    different tasks, with messages that suggest filing a bug against
    the Java compiler

    what do we know about the actual race condition that is allowing one
    task to attempt to read from a JAR that is being overwritten by
    another task? presumably this is just a bug in our Gradle configs?

    On Mon, Aug 27, 2018 at 2:28 PM Andrew Pilloud <apilloud@xxxxxxxxxx
    <mailto:apilloud@xxxxxxxxxx>> wrote:

        It appears that there is no one working on a fix for the flakes,
        so I've merged the change to disable parallel tasks on precommit.


        On Fri, Aug 24, 2018 at 1:30 PM Andrew Pilloud
        <apilloud@xxxxxxxxxx <mailto:apilloud@xxxxxxxxxx>> wrote:

            I'm seeing failures due to this on 12 of the last 16
            PostCommits. Precommits take about 22 minutes when run in
            parallel, so at a 25% pass rate that puts the expected time
            to a good test run at 264 minutes, assuming you immediately
            restart on each failure. We are looking at 56 minutes for a
            precommit that isn't run in parallel:
            https://builds.apache.org/job/beam_PreCommit_Java_Phrase/266/
            I'd rather have tests take a little longer than have to
            monitor them for several hours.
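The retry arithmetic can be sketched under a simple model. This assumes every run passes independently with the same probability and is restarted immediately on failure, so the number of runs until the first pass is geometric with mean 1/p. The function name is hypothetical, and the 100% pass rate for the non-parallel run is an assumption for comparison only. Under this model, 22-minute runs at a 25% pass rate come to 88 minutes, which is lower than the 264 minutes quoted above (that figure matches 12 x 22) but still longer than the 56-minute non-parallel run.

```python
def expected_minutes_to_pass(run_minutes, pass_rate):
    """Expected wall-clock minutes until the first passing run.

    Assumes independent runs with a fixed pass rate and immediate restarts,
    so the expected number of runs is 1 / pass_rate.
    """
    if not 0 < pass_rate <= 1:
        raise ValueError("pass_rate must be in (0, 1]")
    expected_runs = 1 / pass_rate
    return run_minutes * expected_runs


# 12 of the last 16 runs failed, so 4 of 16 passed.
parallel = expected_minutes_to_pass(22, 4 / 16)
# Assumption: the non-parallel run does not hit the race at all.
serial = expected_minutes_to_pass(56, 1.0)

print(parallel)  # 88.0
print(serial)    # 56.0
```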

            I've opened a PR: https://github.com/apache/beam/pull/6274


            On Fri, Aug 24, 2018 at 10:47 AM Lukasz Cwik
            <lcwik@xxxxxxxxxx <mailto:lcwik@xxxxxxxxxx>> wrote:

                I believe it would mitigate the issue but also make the
                jobs take much longer to complete.

                On Thu, Aug 23, 2018 at 2:44 PM Andrew Pilloud
                <apilloud@xxxxxxxxxx <mailto:apilloud@xxxxxxxxxx>> wrote:

                    There seems to be a misconfiguration of Gradle that
                    has been causing a high rate of failure over the
                    last several weeks in building beam-examples-java
                    and beam-runners-apex. It appears to be some sort of
                    race condition in building dependencies. Given that
                    no one has made progress on fixing the root cause,
                    is this something we could mitigate by running jobs
                    with the `--no-parallel` flag?