
Re: runtime.resourcemanager


Please check the logs and standard output/error of the task manager that failed (the logs you showed are from the job manager). There is probably an obvious error or exception explaining why it failed. The most common reasons are:
- out of memory
- long GC pause
- segfault or other error from a native library
- task manager killed, for example via SIGKILL
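
As a rough first pass over a task manager log, something like the sketch below could flag the first and third causes. The pattern table and the function name `scan_log` are hypothetical; the strings are general JVM signatures (the `OutOfMemoryError` class name and the header of a JVM crash report), not anything specific to Flink. A long GC pause or a SIGKILL usually leaves no exception in the log at all, so an empty result does not rule those out.

```python
import re

# Hypothetical pattern table: general JVM signatures, not an exhaustive
# or Flink-specific list.
SIGNATURES = {
    "out of memory": re.compile(r"java\.lang\.OutOfMemoryError"),
    "native crash": re.compile(
        r"SIGSEGV|A fatal error has been detected by the Java Runtime Environment"
    ),
}


def scan_log(text):
    """Return the sorted labels of all signatures that occur in the log text."""
    return sorted(
        label for label, pattern in SIGNATURES.items() if pattern.search(text)
    )
```

If nothing matches, it is worth checking whether the log simply ends abruptly, which would be consistent with the process having been killed from outside.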


> On 6 Dec 2018, at 17:34, Alieh <saeedi@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
> Hello all,
> I have an algorithm x() which contains several joins and uses Gelly ConnectedComponents three times. The problem is that if I call x() inside a script more than three times, I receive the messages listed below in the log and the program somehow stops. It happens even if I run it on a toy graph with fewer than 10 vertices. Do you have any clue what the problem is?
> Cheers,
> Alieh
> 129149 [flink-akka.actor.default-dispatcher-20] DEBUG org.apache.flink.runtime.resourcemanager.StandaloneResourceManager - Trigger heartbeat request.
> 129149 [flink-akka.actor.default-dispatcher-20] DEBUG org.apache.flink.runtime.resourcemanager.StandaloneResourceManager - Trigger heartbeat request.
> 129150 [flink-akka.actor.default-dispatcher-20] DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor  - Received heartbeat request from e80ec35f3d0a04a68000ecbdc555f98b.
> 129150 [flink-akka.actor.default-dispatcher-22] DEBUG org.apache.flink.runtime.resourcemanager.StandaloneResourceManager - Received heartbeat from 78cdd7a4-0c00-4912-992f-a2990a5d46db.
> 129151 [flink-akka.actor.default-dispatcher-22] DEBUG org.apache.flink.runtime.resourcemanager.StandaloneResourceManager - Received new slot report from TaskManager 78cdd7a4-0c00-4912-992f-a2990a5d46db.
> 129151 [flink-akka.actor.default-dispatcher-22] DEBUG org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager - Received slot report from instance 4c3e3654c11b09fbbf8e993a08a4c2da.
> 129200 [flink-akka.actor.default-dispatcher-15] DEBUG org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager - Release TaskExecutor 4c3e3654c11b09fbbf8e993a08a4c2da because it exceeded the idle timeout.
> 129200 [flink-akka.actor.default-dispatcher-15] DEBUG org.apache.flink.runtime.resourcemanager.StandaloneResourceManager - Worker 78cdd7a4-0c00-4912-992f-a2990a5d46db could not be stopped.