Re: Triggering Savepoint for the Flink Job


Hi Anil,

Glad to know that you have upgraded the system to 1.4. From our experience, quite a few changes are required to adapt to the new deployment model in 1.4, if I remember correctly.
The "run detach" deployment model in AthenaX does not support reattaching to the job; we use the REST API for all subsequent life-cycle management.

There are a couple of workarounds I can think of if upgrading to 1.5 is not an option:
- Use the CLI [1] instead of the REST API by replacing the life-cycle management component in WatchdogPolicy, so that you can trigger savepoints.
- Modify the deployment model of AthenaX so that it does not use "run detach" mode, by changing the "YarnClusterDescriptor".

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/cli.html#savepoints
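
As a rough sketch of the first option, the snippet below shells out to the Flink CLI's "savepoint" action as documented in [1], using the form "flink savepoint <jobId> [targetDirectory] -yid <yarnAppId>" for a job running on YARN. The helper names, the default "bin/flink" path, and the IDs in the usage note are assumptions for illustration only; this is not AthenaX code, and the life-cycle component in WatchdogPolicy would need its own (Java) equivalent.

```python
import subprocess
from typing import List, Optional


def build_savepoint_command(
    job_id: str,
    yarn_app_id: str,
    target_dir: Optional[str] = None,
    flink_bin: str = "bin/flink",  # assumed path to the Flink 1.4 CLI script
) -> List[str]:
    """Build argv for: flink savepoint <jobId> [targetDirectory] -yid <yarnAppId>."""
    cmd = [flink_bin, "savepoint", job_id]
    if target_dir is not None:
        # Optional; without it the cluster's configured default directory is used.
        cmd.append(target_dir)
    # -yid tells the CLI which YARN application hosts the job.
    cmd += ["-yid", yarn_app_id]
    return cmd


def trigger_savepoint(
    job_id: str,
    yarn_app_id: str,
    target_dir: Optional[str] = None,
    flink_bin: str = "bin/flink",
) -> str:
    """Run the CLI and return its combined output; raises if the CLI exits non-zero."""
    cmd = build_savepoint_command(job_id, yarn_app_id, target_dir, flink_bin)
    result = subprocess.run(
        cmd,
        check=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.stdout
```

For example, trigger_savepoint("<jobId>", "<yarnAppId>", "hdfs:///flink/savepoints") would run the CLI against that job; the savepoint path is reported in the returned CLI output, and all three values here are placeholders.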

Hope this helps with your use case.

Thanks,
Rong

On Thu, May 31, 2018 at 8:38 PM, Anil <anilsingh.jsr@xxxxxxxxx> wrote:
Thanks for the reply, Rong. We have updated AthenaX to version 1.4.

I checked Flink 1.4: its REST endpoint does not support creating a
savepoint on its own. It only has cancel-with-savepoint. I think creating a
savepoint by itself is supported in 1.5. Since we can't upgrade to 1.5 at the
moment, I would like to find a workaround for now.

Can you tell me how to reattach to a running job in the cluster?