
Re: Message guarantees with S3 Sink

Hi Amit,

The BucketingSink doesn't have well-defined semantics when used with S3. Data
loss is possible, but I am not sure whether it is the only problem. There are
plans to rewrite the BucketingSink in Flink 1.6 to support eventually consistent
file systems [1][2].


[1] http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/sink-with-BucketingSink-to-S3-files-override-td18433.html
[2] https://issues.apache.org/jira/browse/FLINK-6306

On Thu, May 17, 2018 at 11:57 AM, Amit Jain <aj2011it@xxxxxxxxx> wrote:

We are using Flink to process click-stream data from Kafka and push it
to S3 in 128 MB files.

What are the message processing guarantees with the S3 sink? In my
understanding, the S3A client buffers the data in memory/on disk. In a
failure scenario on a particular node, the TM would not trigger
Writer#close, hence the buffered data could be lost entirely, assuming
this buffer contains data from the last successful checkpoint.
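
For context, the kind of setup described above might look like the following configuration sketch (a sketch only: the class name, bucket path, and bucketing interval are hypothetical, not taken from the original mail; it assumes Flink 1.x with the flink-connector-filesystem dependency on the classpath):

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.connectors.fs.StringWriter;
import org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink;
import org.apache.flink.streaming.connectors.fs.bucketing.DateTimeBucketer;

public class ClickStreamJob {

    // Attach a BucketingSink that writes to S3 via the s3a:// scheme.
    static void attachS3Sink(DataStream<String> clicks) {
        // Hypothetical bucket/path for illustration.
        BucketingSink<String> sink = new BucketingSink<>("s3a://my-bucket/clicks");
        // Bucket output into hourly directories.
        sink.setBucketer(new DateTimeBucketer<>("yyyy-MM-dd--HH"));
        sink.setWriter(new StringWriter<>());
        // Roll part files at roughly 128 MB, as described above.
        sink.setBatchSize(128L * 1024 * 1024);
        clicks.addSink(sink);
    }
}
```

The data-loss concern applies to whatever the S3A client has buffered locally between checkpoints: until the part file is closed and fully uploaded, those bytes exist only on the TM's node.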