
Jobs running on a YARN per-job cluster fail to restart when a task manager is lost


I am running a streaming job without checkpointing enabled. A failure rate restart strategy has been set with StreamExecutionEnvironment.setRestartStrategy.
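
For readers unfamiliar with the failure rate strategy: it keeps restarting the job until the number of failures inside a time window exceeds a configured limit, after which the job fails permanently. In Flink such a strategy is normally built with RestartStrategies.failureRateRestart(...). The sketch below is only a simplified, plain-Java model of that windowing idea, not Flink's implementation. The class name FailureRateModel, the method canRestartAfterFailure, and the values 3 and 60 seconds are hypothetical and are not taken from my job.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Simplified model of a failure-rate restart policy.
// Assumption: this is an illustration only, NOT Flink's implementation.
class FailureRateModel {
    private final int maxFailuresPerInterval;
    private final long intervalMillis;
    // Timestamps of recent failures, oldest first.
    private final Deque<Long> failureTimestamps = new ArrayDeque<>();

    FailureRateModel(int maxFailuresPerInterval, long intervalMillis) {
        this.maxFailuresPerInterval = maxFailuresPerInterval;
        this.intervalMillis = intervalMillis;
    }

    // Records a failure at nowMillis and reports whether a restart is still allowed.
    boolean canRestartAfterFailure(long nowMillis) {
        // Forget failures that have fallen out of the measuring window.
        while (!failureTimestamps.isEmpty()
                && nowMillis - failureTimestamps.peekFirst() > intervalMillis) {
            failureTimestamps.pollFirst();
        }
        failureTimestamps.addLast(nowMillis);
        // Restart only while the failure count in the window is within the limit.
        return failureTimestamps.size() <= maxFailuresPerInterval;
    }

    public static void main(String[] args) {
        // Hypothetical settings: at most 3 failures per 60 seconds.
        FailureRateModel model = new FailureRateModel(3, 60_000L);
        long[] failureTimes = {0L, 1_000L, 2_000L, 3_000L, 70_000L};
        for (long t : failureTimes) {
            System.out.println("failure at " + t + " ms, restart allowed: "
                    + model.canRestartAfterFailure(t));
        }
    }
}
```

Note that this model only decides whether a restart is attempted. It says nothing about whether enough slots exist to run the restarted job, which is the part that fails in my case.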

When a task manager is lost because of memory problems, the job manager tries to restart the job without launching a new task manager, and the restart fails with NoResourceAvailableException: Not enough slots available to run the job.

The job is running on Flink 1.4.2 and Hadoop 2.7.4.