
Re: Are savepoints / checkpoints co-ordinated?

Hi Anand,

Regarding "Cancel with savepoint", Congxian is right.

As for the duplicates: you should use the EXACTLY_ONCE semantic provided by Kafka producer transactions (available since Kafka 0.11)[1].
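A minimal sketch of such a producer, assuming a String stream; the topic and broker names are placeholders, and the transaction.timeout.ms value is an illustrative choice (it must not exceed the broker's transaction.max.timeout.ms, which defaults to 15 minutes):

```java
import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer011;
import org.apache.flink.streaming.util.serialization.KeyedSerializationSchemaWrapper;

Properties props = new Properties();
props.setProperty("bootstrap.servers", "broker:9092"); // placeholder broker
// Must be <= the broker's transaction.max.timeout.ms (default 15 min)
// and comfortably larger than the checkpoint interval.
props.setProperty("transaction.timeout.ms", "900000");

FlinkKafkaProducer011<String> producer = new FlinkKafkaProducer011<>(
        "sink-topic",                                               // placeholder topic
        new KeyedSerializationSchemaWrapper<>(new SimpleStringSchema()),
        props,
        FlinkKafkaProducer011.Semantic.EXACTLY_ONCE);
```

Note that downstream consumers must read with isolation.level=read_committed, otherwise they will still see records from uncommitted (later aborted) transactions.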

Thanks, vino.

Congxian Qiu <qcx978132955@xxxxxxxxx> 于2018年10月12日周五 下午7:55写道:
AFAIK, "Cancel Job with Savepoint" will stop the checkpoint scheduler, trigger a savepoint, and then cancel your job, so there will be no more checkpoints.
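For reference, the CLI form of that operation (the target directory and job/savepoint identifiers below are placeholders):

```shell
# Trigger a savepoint, then cancel the job; the checkpoint scheduler is
# stopped first, so no further checkpoints are taken.
flink cancel -s hdfs:///flink/savepoints <jobId>

# Later, resume the job from the savepoint that was written:
flink run -s hdfs:///flink/savepoints/<savepointDir> my-job.jar
```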

<anand.gopinath@xxxxxxx> 于2018年10月12日周五 上午1:30写道:



I had a couple of questions about savepoints / checkpoints.


When I issue "Cancel Job with Savepoint", how is that instruction co-ordinated with checkpoints? Am I certain the savepoint will be the last operation (i.e. no more checkpoints)?


I have a Kafka source > operation > Kafka sink job in Flink, and it looks like on restart from the savepoint there are duplicates written to the sink topic in Kafka. The duplicates overlap with the last few events prior to the savepoint, and I am trying to work out what could have happened.

My FlinkKafkaProducer011 is set to Semantic.AT_LEAST_ONCE, but checkpointing is enabled with env.enableCheckpointing(parameters.getInt("checkpoint.interval"), CheckpointingMode.EXACTLY_ONCE).

I thought AT_LEAST_ONCE still implies that flushes to Kafka only occur on a checkpoint.
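One point worth noting: CheckpointingMode.EXACTLY_ONCE only governs the consistency of Flink's internal state snapshots, not what the sink does. A sketch of that combination (checkpoint interval is an illustrative value):

```java
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// EXACTLY_ONCE here refers only to Flink's internal state snapshots.
env.enableCheckpointing(60_000L, CheckpointingMode.EXACTLY_ONCE);

// With Semantic.AT_LEAST_ONCE, the Kafka producer flushes its pending
// records on each checkpoint but does not use Kafka transactions, so
// records replayed after a restore are written to the sink topic again.
```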


One theory is that a further checkpoint occurred during or after the savepoint, which would have flushed events to Kafka that are not covered by my savepoint.


Any pointers to schoolboy errors I may have made would be appreciated.



Also, am I right in thinking that if I have managed state with the RocksDB backend using 1G on disk, but substantially less keyed state in memory, a savepoint needs to write the full 1G to complete?
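A related detail, sketched below with a placeholder checkpoint path: RocksDB's incremental mode applies only to checkpoints, while savepoints are always full, self-contained snapshots of the state.

```java
import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// The second argument enables incremental *checkpoints* only; a
// savepoint still writes the complete state (the full 1G in this case).
env.setStateBackend(new RocksDBStateBackend("hdfs:///flink/checkpoints", true));
```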