[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Small-files source - partitioning based on prefix of file

Hi Averall,

As Vino said, checkpoints store the state of all operators of an application.
The state of a monitoring source function is the position in the currently read split and all splits that have been received and are currently pending.

In case of a recovery, the splits are recovered and the source is reset to the split that was read when the checkpoint was taken and set to the correct reading position.
Once, that is done, records that have been read before are read again. However, that does not affect the exactly-once guarantees of the operators state because all of them have been reset to the same position.

Best, Fabian

2018-08-08 9:26 GMT+02:00 vino yang <yanghua1127@xxxxxxxxx>:
Hi Averell,

You need to understand that Flink reflects the recovery of the state, not the recovery of the record. 
Of course, sometimes your record is state, but sometimes the intermediate result of your record is the state. 
It depends on your business logic and your operators.

Thanks, vino.

Averell <lvhuyen@xxxxxxxxx> 于2018年8月8日周三 下午1:17写道:
Thank you Fabian.
"/In either case, some record will be read twice but if reading position can
be reset, you can still have exactly-once state consistency because the
state is reset as well./"
I do not quite understand this statement. If I have read 30 lines from the
checkpoint and sent those 30 records to the next operator, then when the
streaming is recovered - resumed from the last checkpoint, the subsequent
operator would receive those 30 lines again, am I right?


Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/