We are running into an issue where GC pauses cause TaskManagers to be incorrectly marked as dead.
The Flink documentation describes several Akka configuration options that can be tuned for this.
Focusing on “akka.watch.heartbeat.pause”, the documentation says: “Higher value increases the time to detect a dead TaskManager.”
Can someone please help me understand the downside of increasing the time it takes to detect a dead TaskManager?
Will this affect the fault tolerance guarantees, state management, or checkpointing?
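
For context, the change we are considering would look roughly like the fragment below in flink-conf.yaml. The values are illustrative assumptions only, not settings we have tested, and the interval line is included just to show the related option next to the pause.

```yaml
# Hypothetical values for illustration; not a recommendation.
# How often heartbeats are sent for Akka's DeathWatch mechanism.
akka.watch.heartbeat.interval: 10 s
# Acceptable heartbeat pause; raised here (assumed value) so that
# long GC pauses are less likely to be mistaken for a dead TaskManager.
akka.watch.heartbeat.pause: 120 s
```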