
[jira] [Created] (FLINK-10761) MetricGroup#getAllVariables can deadlock

Chesnay Schepler created FLINK-10761:

             Summary: MetricGroup#getAllVariables can deadlock
                 Key: FLINK-10761
                 URL: https://issues.apache.org/jira/browse/FLINK-10761
             Project: Flink
          Issue Type: Bug
          Components: Metrics
    Affects Versions: 1.6.2, 1.5.5, 1.7.0
            Reporter: Chesnay Schepler
            Assignee: Chesnay Schepler
             Fix For: 1.5.6, 1.6.3, 1.7.0

{{AbstractMetricGroup#getAllVariables}} acquires the lock of the current group and of every parent group when assembling the variables map. This can lead to a deadlock if metrics are registered concurrently on a child and its parent, the child's registration reaches the registry first, and a reporter calls said method (which many do).

Assume we have a MetricGroup Mc(hild) and Mp(arent).

Two separate threads, Tc and Tp, each register a metric on their respective group, acquiring that group's lock.
Let's assume that Tc has a slight head start.
Tc will now call {{MetricRegistry#register}} first, acquiring the MR lock.
Tp will block on this lock.

Tc now iterates over all reporters, calling {{MetricReporter#notifyOfAddedMetric}}. Assume that in this method {{MetricGroup#getAllVariables}} is called on Mc by Tc.
Tc still holds the lock on Mc and attempts to acquire the lock on Mp.
However, the lock on Mp is still held by Tp, which is waiting for Tc to release the MR lock.

Thus a deadlock is created: neither thread can proceed. Depending on which threads are involved, this may hang anything, be it minor threads, tasks, or entire components.
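
The lock cycle described above can be reproduced in isolation. The following is a minimal sketch, not Flink code: the class name, the three {{ReentrantLock}} instances standing in for the Mc, Mp and MR locks, and the helper methods are all hypothetical. It also uses a timed {{tryLock}} so the program terminates, whereas the actual code presumably blocks without a timeout and would therefore wait forever.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

class MetricDeadlockSketch {

    /** Returns true if Tc and Tp each end up waiting on a lock held by the other. */
    static boolean bothThreadsBlocked() {
        // Hypothetical stand-ins for the three locks named in the report.
        ReentrantLock mc = new ReentrantLock(); // child group Mc
        ReentrantLock mp = new ReentrantLock(); // parent group Mp
        ReentrantLock mr = new ReentrantLock(); // MetricRegistry (MR)

        CountDownLatch groupLocksHeld = new CountDownLatch(2);
        CountDownLatch tcHoldsMr = new CountDownLatch(1);
        CountDownLatch attemptsDone = new CountDownLatch(2);
        boolean[] blocked = new boolean[2];

        Thread tc = new Thread(() -> {
            mc.lock(); // registering a metric on Mc
            try {
                groupLocksHeld.countDown();
                await(groupLocksHeld); // make sure Tp already holds Mp
                mr.lock(); // Tc reaches the registry first
                try {
                    tcHoldsMr.countDown();
                    // The reporter asks for all variables, which needs Mp.
                    blocked[0] = !tryBriefly(mp);
                    attemptsDone.countDown();
                    await(attemptsDone); // keep MR held until Tp has tried
                } finally {
                    mr.unlock();
                }
            } finally {
                mc.unlock();
            }
        });

        Thread tp = new Thread(() -> {
            mp.lock(); // registering a metric on Mp
            try {
                groupLocksHeld.countDown();
                await(tcHoldsMr); // Tc had the head start
                blocked[1] = !tryBriefly(mr); // Tp needs MR, held by Tc
                attemptsDone.countDown();
                await(attemptsDone); // keep Mp held until Tc has tried
            } finally {
                mp.unlock();
            }
        });

        tc.start();
        tp.start();
        try {
            tc.join();
            tp.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return blocked[0] && blocked[1];
    }

    /** Tries to take the lock for a short time; releases it again on success. */
    static boolean tryBriefly(ReentrantLock lock) {
        try {
            if (lock.tryLock(200, TimeUnit.MILLISECONDS)) {
                lock.unlock();
                return true;
            }
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    static void await(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("Tc and Tp both blocked: " + bothThreadsBlocked());
    }
}
```

The latches only force the interleaving from the scenario (both group locks held, then Tc takes the MR lock before Tp); in a real run this ordering depends on timing, which is consistent with the issue rarely surfacing.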

This has not surfaced so far because metrics are usually no longer added to a group once its children have been created (the component initialization is complete at that point).
