
Re: Message guarantees with S3 Sink

Hi Amit,

Can you elaborate on how you write using the "S3 sink" and which version of Flink you are using?

If you are using BucketingSink[1], you can check out the API doc and configure it to flush before closing your sink.
That way your sink is "integrated with the checkpointing mechanism to provide exactly once semantics"[2].
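As a rough illustration, a minimal sketch of wiring up a BucketingSink could look like the following (the bucket path, stream name, bucketing format, and the 128 MB batch size are assumptions based on your description, not a tested configuration):

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.connectors.fs.StringWriter;
import org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink;
import org.apache.flink.streaming.connectors.fs.bucketing.DateTimeBucketer;

// Assuming clickStream is your DataStream<String> of click events read from Kafka.
BucketingSink<String> sink = new BucketingSink<>("s3a://your-bucket/clicks");
sink.setBucketer(new DateTimeBucketer<>("yyyy-MM-dd--HH")); // one bucket per hour
sink.setWriter(new StringWriter<>());                       // write records as plain text
sink.setBatchSize(128L * 1024 * 1024);                      // roll part files at ~128 MB
clickStream.addSink(sink);

On each checkpoint the in-progress part files are flushed and tracked as pending, and they are only finalized once the checkpoint completes, which is where the exactly-once behavior described in [2] comes from.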


[1] https://ci.apache.org/projects/flink/flink-docs-master/dev/connectors/filesystem_sink.html
[2] https://ci.apache.org/projects/flink/flink-docs-master/api/java/org/apache/flink/streaming/connectors/fs/bucketing/BucketingSink.html

On Thu, May 17, 2018 at 2:57 AM, Amit Jain <aj2011it@xxxxxxxxx> wrote:

We are using Flink to process clickstream data from Kafka and write it out as 128 MB files to S3.

What are the message processing guarantees with the S3 sink? In my
understanding, the S3A client buffers the data in memory/on disk. In a failure
scenario on a particular node, the TM would not trigger Writer#close, so the
buffered data could be lost entirely, assuming this buffer contains data from
the last successful checkpoint.