osdir.com

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[HELP] Bisect


Hi devs,

At tag release-1.5.1, I find LOG contains noise like below.

```
13:58:40,752 INFO
org.apache.flink.runtime.dispatcher.StandaloneDispatcher      - Job
a18a72a9c335b5cc964f43075418dedc reached globally terminal state FINISHED.
13:58:40,753 INFO  org.apache.flink.runtime.minicluster.MiniCluster
      - Shutting down Flink Mini Cluster
13:58:40,753 INFO
org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint    - Shutting
down rest endpoint.
13:58:40,754 INFO  org.apache.flink.runtime.jobmaster.JobMaster
      - Stopping the JobMaster for job Flink Java Job at Thu Aug 02
13:58:29 CST 2018(a18a72a9c335b5cc964f43075418dedc).
13:58:40,754 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor
      - Stopping TaskExecutor akka://flink/user/taskmanager_0.
13:58:40,755 INFO
org.apache.flink.runtime.state.TaskExecutorLocalStateStoresManager  -
Shutting down TaskExecutorLocalStateStoresManager.
13:58:40,755 INFO  org.apache.flink.runtime.jobmaster.JobMaster
      - Close ResourceManager connection 58178cf2968f728837df335f4a8a63a5:
JobManager is shutting down..
13:58:40,755 INFO
org.apache.flink.runtime.dispatcher.StandaloneDispatcher      - Stopping
dispatcher akka://flink/user/dispatcher65cb755a-d441-48b5-8942-52465b77bae2.
13:58:40,755 INFO
org.apache.flink.runtime.dispatcher.StandaloneDispatcher      - Stopping
all currently running jobs of dispatcher
akka://flink/user/dispatcher65cb755a-d441-48b5-8942-52465b77bae2.
13:58:40,756 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPool
      - Suspending SlotPool.
13:58:40,756 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPool
      - Stopping SlotPool.
13:58:40,758 INFO
org.apache.flink.runtime.resourcemanager.StandaloneResourceManager  -
Disconnect job manager
93c37c8311c692f1a5b0c7caaa074a3a@akka://flink/user/jobmanager_1
for job a18a72a9c335b5cc964f43075418dedc from the resource manager.
13:58:40,758 INFO
org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager  - Closing
the SlotManager.
13:58:40,758 INFO
org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager  -
Suspending the SlotManager.
13:58:40,759 INFO
org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager  -
Unregister TaskManager 8c28415e2960d2f7a49a12cf329e2444 from the
SlotManager.
13:58:40,759 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor
      - JobManager for job a18a72a9c335b5cc964f43075418dedc with leader id
93c37c8311c692f1a5b0c7caaa074a3a lost leadership.
13:58:40,759 INFO
org.apache.flink.runtime.resourcemanager.StandaloneResourceManager  - Slot
request with allocation id a3f91d383ba2e647c07894a0974af2fb for job
a18a72a9c335b5cc964f43075418dedc failed.
org.apache.flink.util.FlinkException: The assigned slot
2cd8d254-283d-472c-8ae2-6499e4ba04bf_7 was removed.
at
org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.removeSlot(SlotManager.java:895)
...
13:58:40,761 INFO
org.apache.flink.runtime.rest.handler.legacy.backpressure.StackTraceSampleCoordinator
- Shutting down stack trace sample coordinator.
13:58:40,763 INFO
org.apache.flink.runtime.resourcemanager.StandaloneResourceManager  - Slot
request with allocation id 25f5cd03e1bd5654c54c20143fd05a7e for job
a18a72a9c335b5cc964f43075418dedc failed.
org.apache.flink.util.FlinkException: The assigned slot
2cd8d254-283d-472c-8ae2-6499e4ba04bf_6 was removed.
at
org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.removeSlot(SlotManager.java:895)
...
13:58:40,766 INFO
org.apache.flink.runtime.dispatcher.StandaloneDispatcher      - Stopped
dispatcher akka://flink/user/dispatcher65cb755a-d441-48b5-8942-52465b77bae2.
13:58:40,766 INFO
org.apache.flink.runtime.resourcemanager.StandaloneResourceManager  - Slot
request with allocation id 32047761d6066628b196ac13b7b2993e for job
a18a72a9c335b5cc964f43075418dedc failed.
org.apache.flink.util.FlinkException: The assigned slot
2cd8d254-283d-472c-8ae2-6499e4ba04bf_5 was removed.
at
org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.removeSlot(SlotManager.java:895)
...

(repeat likely)
13:58:40,768 INFO
org.apache.flink.runtime.resourcemanager.StandaloneResourceManager  - Slot
request with allocation id befc46831f1f0ec5ef54cd660d12aa88 for job
a18a72a9c335b5cc964f43075418dedc failed.
org.apache.flink.util.FlinkException: The assigned slot
2cd8d254-283d-472c-8ae2-6499e4ba04bf_1 was removed.
at
org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.removeSlot(SlotManager.java:895)
...
13:58:40,784 INFO
org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint    - Removing
cache directory
/var/folders/ff/92vml6qd147dncsvvw28qlrc0000gn/T/flink-web-ui
13:58:40,785 INFO  org.apache.flink.runtime.io.disk.iomanager.IOManager
      - I/O manager removed spill file directory
/var/folders/ff/92vml6qd147dncsvvw28qlrc0000gn/T/flink-io-22be945e-daeb-49e9-ab3b-22521e9d2280
13:58:40,785 INFO  org.apache.flink.runtime.io.network.NetworkEnvironment
      - Shutting down the network environment and its components.
13:58:40,796 INFO  org.apache.flink.runtime.taskexecutor.JobLeaderService
      - Stop job leader service.
13:58:40,796 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor
      - Stopped TaskExecutor akka://flink/user/taskmanager_0.
13:58:40,804 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcService
      - Stopping Akka RPC service.
13:58:40,834 INFO  org.apache.flink.runtime.blob.PermanentBlobCache
      - Shutting down BLOB cache
13:58:40,834 INFO  org.apache.flink.runtime.blob.TransientBlobCache
      - Shutting down BLOB cache
13:58:40,836 INFO  org.apache.flink.runtime.blob.BlobServer
      - Stopped BLOB server at 0.0.0.0:50685
13:58:40,836 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcService
      - Stopped Akka RPC service.
```

Note that the job is FINISHED and RM, TM, JM and so on shutdown correctly.

To reproduce it, run
`flink-examples/flink-examples-batch/src/main/java/org/apache/flink/examples/java/ml/LinearRegression.java`

The code snip relevant is here
<https://github.com/apache/flink/blob/628b71dfabe72a99eb6d54994fc870abed1f0268/flink-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/slotmanager/SlotManager.java#L783>
.

But it could not be reproduced in master, which infers there is a commit
suppress it. So I come to here ask for help to accurately bisect it.

Thanks,
tison