[jira] [Created] (FLINK-9779) Remove SlotRequest timeout
陈梓立 created FLINK-9779:
Summary: Remove SlotRequest timeout
Issue Type: Improvement
Components: JobManager, ResourceManager, TaskManager
As discussed in FLINK-8643 and FLINK-8653, we use an external timeout to replace the internal timeout of slot requests. This raises a follow-up question: why not remove this timeout mechanism entirely? In our production use case, the timeout causes unnecessary failures and makes resource allocation inaccurate.
I propose getting rid of the slot request timeout. Instead, we handle TM failures in the RM, which cancels the affected pending requests properly. If a TM is alive but cannot offer its slot to the JM, we introduce a blacklist mechanism to nudge the RM to reallocate the pending request.
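To make the proposal concrete, here is a minimal sketch of the bookkeeping it implies, with no timeout involved. It is only an illustration under stated assumptions: the class PendingSlotTracker and all of its methods are hypothetical names, not existing Flink APIs, and "reallocate" is simplified to picking the first registered TM that is not blacklisted.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of RM-side tracking; none of these names are Flink APIs. */
class PendingSlotTracker {
    // requestId -> id of the TM the request is currently assigned to
    private final Map<String, String> assignedTm = new HashMap<>();
    // TMs that failed to offer a slot to the JM
    private final Set<String> blacklist = new HashSet<>();
    // registered TMs, in registration order
    private final Set<String> registeredTms = new LinkedHashSet<>();

    void registerTaskManager(String tmId) {
        registeredTms.add(tmId);
    }

    /** Assigns the request to the first non-blacklisted TM, or returns null if none is available. */
    String assign(String requestId) {
        for (String tm : registeredTms) {
            if (!blacklist.contains(tm)) {
                assignedTm.put(requestId, tm);
                return tm;
            }
        }
        // No usable TM: the request has no assignment (it stays pending, it does not time out).
        assignedTm.remove(requestId);
        return null;
    }

    /** TM failure: drop the TM, cancel its assignments, and retry them right away. */
    List<String> onTaskManagerFailed(String tmId) {
        registeredTms.remove(tmId);
        return reassignFrom(tmId);
    }

    /** TM is alive but could not offer its slot to the JM: blacklist it and reallocate. */
    List<String> onSlotOfferFailed(String tmId) {
        blacklist.add(tmId);
        return reassignFrom(tmId);
    }

    private List<String> reassignFrom(String tmId) {
        // Collect first, then reassign, so the map is not modified while iterating.
        List<String> affected = new ArrayList<>();
        for (Map.Entry<String, String> e : assignedTm.entrySet()) {
            if (e.getValue().equals(tmId)) {
                affected.add(e.getKey());
            }
        }
        for (String requestId : affected) {
            assign(requestId);
        }
        return affected;
    }

    String assignedTo(String requestId) {
        return assignedTm.get(requestId);
    }
}
```

In this sketch a request changes state only in response to an explicit event (TM failure or a failed slot offer), which is the behavior the proposal asks for in place of a timer. How blacklist entries would expire is left open here, as it is in the proposal.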