I was trying to cancel a job with a savepoint, but the CLI command failed with "akka.pattern.AskTimeoutException: Ask timed out".
The stack trace reveals that the ask timeout is 10 seconds:
Caused by: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/jobmanager_0#106635280]] after [10000 ms]. Sender[null] sent message of type "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
Indeed it's documented that the default value for akka.ask.timeout="10 s" in
Behind the scenes, the savepoint creation and the job cancellation both succeeded, which was more or less to be expected. So my only problem is getting a proper response back from the CLI call instead of having it time out so eagerly.
To be exact, what I ran was:
flink-1.5.2/bin/flink cancel b7c7d19d25e16a952d3afa32841024e5 -m yarn-cluster -yid application_1533676784032_0001 --withSavepoint
Should I increase akka.ask.timeout? If so, can I override it just for the CLI call somehow? I'm concerned that it might have undesired side effects if set globally, where the actual Flink jobs would use it too.
What about akka.client.timeout? Its default is also rather low: "60 s". Should it be increased as well if I want to allow savepoint creation to take longer than 60 s?
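For reference, this is roughly the global change I have in mind for flink-conf.yaml. It is only a sketch: the values are placeholders I picked for illustration, and I have not verified that raising them actually makes the CLI call wait for the savepoint to complete.

```yaml
# flink-conf.yaml (sketch; the values are illustrative assumptions, not recommendations)

# Documented default: 10 s
akka.ask.timeout: 120 s

# Documented default: 60 s
akka.client.timeout: 120 s
```

Setting these here would apply to everything that reads this configuration, which is exactly the side effect I'm unsure about.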
Finally, the default timeout is so low that I would expect this to be a common problem. I think the Flink CLI should have a higher default timeout for cancel and savepoint creation operations.