
Re: org.apache.beam.runners.flink.PortableTimersExecutionTest is very flakey

Thanks, Maximilian. Let me know if you need any help. I usually debug this sort of thing by pausing the IntelliJ debugger and checking which threads are waiting on which conditions. If that gives you any insights, please post them here and we can try to work out where it gets stuck. Perhaps it is a concurrency issue leading to a deadlock?
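
When no debugger is attached, roughly the same information can be captured programmatically. Below is a minimal sketch using the standard java.lang.management API; the class name ThreadDumpSketch and its method names are made up for illustration and are not part of Beam or the test. Note that findDeadlockedThreads only reports lock cycles over monitors and ownable synchronizers, so a thread that is merely parked waiting for something that never arrives would appear in the dump but would not be reported as deadlocked.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper, not part of Beam: prints what every JVM thread is doing.
public class ThreadDumpSketch {

  /** Returns the IDs of threads stuck in a lock cycle, or an empty array if none. */
  public static long[] deadlockedThreadIds() {
    ThreadMXBean threads = ManagementFactory.getThreadMXBean();
    // findDeadlockedThreads also covers java.util.concurrent locks, but is optional.
    long[] ids = threads.isSynchronizerUsageSupported()
        ? threads.findDeadlockedThreads()
        : threads.findMonitorDeadlockedThreads();
    return ids == null ? new long[0] : ids;
  }

  /** Formats the name, state, awaited lock, and stack of every live thread. */
  public static String dumpAllThreads() {
    ThreadMXBean threads = ManagementFactory.getThreadMXBean();
    ThreadInfo[] infos = threads.dumpAllThreads(
        threads.isObjectMonitorUsageSupported(),
        threads.isSynchronizerUsageSupported());
    StringBuilder out = new StringBuilder();
    for (ThreadInfo info : infos) {
      out.append('"').append(info.getThreadName()).append("\" ")
          .append(info.getThreadState());
      if (info.getLockName() != null) {
        // The object this thread is blocked or waiting on, if any.
        out.append(" waiting on ").append(info.getLockName());
      }
      out.append('\n');
      for (StackTraceElement frame : info.getStackTrace()) {
        out.append("    at ").append(frame).append('\n');
      }
    }
    return out.toString();
  }

  public static void main(String[] args) {
    System.out.println(dumpAllThreads());
    System.out.println("Deadlocked threads: " + deadlockedThreadIds().length);
  }
}
```

Calling dumpAllThreads() from a watchdog thread shortly before the test timeout fires would be one way to see what is blocked; running jstack against the stuck JVM gives similar output.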

On Thu, Nov 22, 2018 at 12:57 PM Maximilian Michels <mxm@xxxxxxxxxx> wrote:
I haven't been able to fix it so far. The issue does not seem to be in the
Flink Runner but in the way the test uses the EMBEDDED environment to run
multiple portable jobs in a row.

When it gets stuck, it is in RemoteBundle#close, and this happens regardless
of the test type (batch and streaming have different implementations).

Will give it another look tomorrow.


On 22.11.18 13:07, Maximilian Michels wrote:
> Hi Alex,
> The test seems to have gotten flaky after we merged support for portable
> timers in Flink's batch mode.
> Looking into this now.
> Thanks,
> Max
> On 21.11.18 23:56, Alex Amato wrote:
>> Hello, I have noticed
>> that org.apache.beam.runners.flink.PortableTimersExecutionTest is very
>> flaky, and I reproduced this test timeout on the master branch in 40/50 runs.
>> I filed a JIRA issue: BEAM-6111
>> <https://issues.apache.org/jira/browse/BEAM-6111>. I was just
>> wondering if anyone knew why this may be occurring, and to check if
>> anyone else has been experiencing this.
>> Thanks,
>> Alex