
Re: Small-files source - partitioning based on prefix of file

Thank you Vino, Jorn, and Fabian.
Please forgive my ignorance: I still do not fully understand
state/checkpointing, nor the statement Fabian made earlier:
"In either case, some record will be read twice but if reading position can
be reset, you can still have exactly-once state consistency because the
state is reset as well."
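The quoted statement can be illustrated with a toy model. This is a hypothetical sketch, not Flink's API: the class and method names are made up. The source's read position and an operator's state (here a running sum) are snapshotted together. After a simulated failure, either both are reset or only the read position is.

```java
import java.util.List;

// Hypothetical toy model, NOT Flink's API: the read position and the operator
// state are stored together in one checkpoint and restored after a failure.
class ReplayDemo {

    // Reference result: sum of all records with no failure at all.
    static long failureFreeSum(List<Integer> input) {
        long sum = 0;
        for (int x : input) sum += x;
        return sum;
    }

    // Checkpoints after `checkpointAfter` records, "fails" after `failAfter`
    // records (checkpointAfter <= failAfter <= input.size()), then recovers.
    // If resetState is false, only the read position is restored.
    static long recoveredSum(List<Integer> input, int checkpointAfter,
                             int failAfter, boolean resetState) {
        int position = 0;      // source read position (e.g. offset in a file)
        long sum = 0;          // operator state
        int ckptPosition = 0;  // snapshot of the read position
        long ckptSum = 0;      // snapshot of the operator state

        // First attempt: process records until the simulated failure.
        while (position < failAfter) {
            sum += input.get(position);
            position++;
            if (position == checkpointAfter) {
                // Consistent snapshot: position and state are taken together.
                ckptPosition = position;
                ckptSum = sum;
            }
        }

        // Recovery: rewind the source; optionally restore the state as well.
        position = ckptPosition;
        if (resetState) {
            sum = ckptSum;
        }

        // Records between the checkpoint and the failure are read a second time.
        while (position < input.size()) {
            sum += input.get(position);
            position++;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> input = List.of(1, 2, 3, 4, 5, 6);
        System.out.println(failureFreeSum(input));              // 21
        System.out.println(recoveredSum(input, 3, 5, true));    // 21
        System.out.println(recoveredSum(input, 3, 5, false));   // 30
    }
}
```

Records 4 and 5 are read twice in both recovery runs. When the state is reset together with the read position, the result matches the failure-free run (21). When only the position is rewound, those records are counted twice (30).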

My current understanding is that checkpointing is managed at the
execution-environment level and happens at the same moment in all the
operators of the pipeline. Is this true?
If so, my concern is how that synchronization is managed. Checkpoints at
different operators could quite possibly happen a few milliseconds apart,
which would lead to duplicated or missed records, wouldn't it?

I tried reading Flink's documentation on managing state here. However, I
have not been able to find the information I am looking for. Could you
please point me to the right place?

Thanks and best regards,

Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/