
Re: Checkpointing on cluster shutdown

If a TaskManager (TM) goes down, any data generated after the last successful checkpoint cannot be guaranteed to be consistent across the cluster.
Hence, this data is discarded and the job goes back to the last known consistent state, i.e. the last checkpoint that was successfully created.
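
The rollback described above can be illustrated with a small toy model. This is only a sketch of the idea, not Flink's implementation or API: the class ToyJob, its fields, and its methods are hypothetical names made up for illustration. It assumes the input position is saved together with the state, so that records processed after the checkpoint can be read again after recovery.

```python
import copy


class ToyJob:
    """Toy model of checkpoint-and-restore; NOT Flink's actual implementation."""

    def __init__(self):
        self.state = {"count": 0}        # live operator state
        self.position = 0                # how many input records were consumed
        self.checkpoint = {"count": 0}   # last successfully completed checkpoint
        self.checkpoint_position = 0     # input position stored with that checkpoint

    def process(self, records):
        # Update the live state; nothing here is durable until a checkpoint completes.
        for value in records:
            self.state["count"] += value
            self.position += 1

    def complete_checkpoint(self):
        # Snapshot the state and the input position together.
        self.checkpoint = copy.deepcopy(self.state)
        self.checkpoint_position = self.position

    def fail_and_restore(self):
        # Discard everything done after the last checkpoint and roll back to it.
        self.state = copy.deepcopy(self.checkpoint)
        self.position = self.checkpoint_position


job = ToyJob()
job.process([1, 2, 3])
job.complete_checkpoint()   # consistent state: count == 6 at position 3
job.process([4, 5])         # progress made after the checkpoint
job.fail_and_restore()      # simulated TM loss: back to count == 6, position 3
```

After the restore, the records at positions 3 and 4 have to be processed again, which is why the work done between the last checkpoint and the failure is not lost for good as long as the source can be re-read.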

On 05.06.2018 13:06, Garvit Sharma wrote:
But the job should be terminated gracefully. Why is this behavior not there?

On Tue, Jun 5, 2018 at 4:19 PM, Chesnay Schepler <chesnay@xxxxxxxxxx> wrote:
No checkpoint will be triggered when the cluster is shut down. For this case you will have to manually trigger a savepoint.
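
A savepoint is usually triggered from the command line with "bin/flink savepoint <jobId> [targetDirectory]". As a rough sketch of doing the same from a script, the snippet below builds the HTTP request for the REST endpoint POST /jobs/:jobid/savepoints, assuming a Flink version whose REST API offers it. The helper name build_savepoint_request, the address localhost:8081, the all-zero job id, and the target directory are assumptions chosen for illustration; the request is only constructed here, not sent.

```python
import json
import urllib.request


def build_savepoint_request(base_url, job_id, target_directory, cancel_job=False):
    """Build (but do not send) a request that asks the JobManager for a savepoint.

    Hypothetical helper; it assumes the REST endpoint POST /jobs/:jobid/savepoints.
    """
    body = json.dumps({
        "target-directory": target_directory,  # where the savepoint is written
        "cancel-job": cancel_job,               # True would also stop the job
    }).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/jobs/{job_id}/savepoints",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_savepoint_request(
    "http://localhost:8081",             # assumed JobManager REST address
    "00000000000000000000000000000000",  # placeholder job id, not a real job
    "file:///tmp/savepoints",            # hypothetical target directory
)
# Passing `request` to urllib.request.urlopen would submit it to a running cluster.
```

The savepoint is created asynchronously, so a script should wait until it has completed before shutting the cluster down.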

If a TM goes down, it does not create a checkpoint. In these cases the job will be restarted from the last successful checkpoint.

On 05.06.2018 12:01, Data Engineer wrote:

Suppose I have a working Flink cluster with 1 TaskManager and 1 JobManager, and I have enabled checkpointing with an interval of, say, 1 minute.
Now, if I shut down the Flink cluster in between checkpoints (say, for some upgrade), will the JobManager automatically trigger a checkpoint before going down?

Or is it mandatory to manually trigger savepoints in these cases?
Also, am I correct in my understanding that if a TaskManager goes down first, there is no way it can trigger a checkpoint on its own?


Garvit Sharma

No Body is a Scholar by birth, its only hard work and strong determination that makes him master.