Re: Setting an allowable number of checkpoint failures
Your understanding of
"*CheckpointConfig#setFailOnCheckpointingErrors(false)*" is correct. If this
is set to false, the task will only decline the checkpoint and continue
running.
I also think it would be a good idea to allow a tolerable number of failures
to be configured. Currently, Flink only supports choosing whether or not the
task fails when a checkpoint fails; configuring a threshold is not supported.
You can create an issue in JIRA to request this feature.
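
To make the idea concrete, here is a minimal sketch of what such a threshold
could look like. This is not Flink API: the class CheckpointFailureTracker and
its methods are hypothetical names used only for illustration, and counting
consecutive failures (resetting on every successful checkpoint) is an
assumption about how "transient" would be defined.

```java
/**
 * Hypothetical helper, NOT part of Flink: tolerates up to a fixed number of
 * consecutive checkpoint failures before signalling that the job should fail.
 */
final class CheckpointFailureTracker {

    // Maximum number of consecutive failures that are still tolerated.
    private final int tolerableFailures;

    // Failures seen since the last successful checkpoint.
    private int consecutiveFailures;

    CheckpointFailureTracker(int tolerableFailures) {
        if (tolerableFailures < 0) {
            throw new IllegalArgumentException("tolerableFailures must be >= 0");
        }
        this.tolerableFailures = tolerableFailures;
    }

    /**
     * Records a failed checkpoint.
     *
     * @return true if the threshold has been exceeded and the job should fail
     */
    synchronized boolean onCheckpointFailed() {
        consecutiveFailures++;
        return consecutiveFailures > tolerableFailures;
    }

    /** Records a successful checkpoint and resets the failure count. */
    synchronized void onCheckpointCompleted() {
        consecutiveFailures = 0;
    }
}
```

With a limit of 2, the first two failures in a row would be ignored and the
third would fail the job, while any successful checkpoint in between would
start the count over. Counting total failures instead would also be possible,
but then a long-running job would eventually fail even if every failure was
transient.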
2018-08-04 4:28 GMT+08:00 Lakshmi Gururaja Rao <lrao@xxxxxxxx>:
> We are running into intermittent checkpoint failures while checkpointing to
> As described in this thread -
> we see that the job restarts when it encounters such a failure.
> As mentioned in the thread, I see that there is an option to not fail tasks
> on checkpoint errors -
> *CheckpointConfig#setFailOnCheckpointingErrors(false)*. However, this
> would mean that the job would continue running even in the case of
> persistent checkpoint failures. Is my understanding here correct?
> If the above is true, then is there a way to configure an allowable number of
> checkpoint failures? i.e. something along the lines of "Don't fail the job
> if there are <=X number of checkpoint failures", so that *only* transient
> failures can be ignored.