Re: Message guarantees with S3 Sink
Sure, there are issues with updates in S3. You may want to look over the
guarantees of the EMRFS consistent view. I'm not sure whether the same
guarantees are possible outside of EMR on AWS.
I'm creating a JIRA issue regarding the possibility of data loss in S3. IMHO,
the Flink docs should mention the possible data loss in S3.
On Fri, May 18, 2018 at 2:48 AM, Gary Yao <gary@xxxxxxxxxxxxxxxxx> wrote:
> Hi Amit,
> The BucketingSink doesn't have well defined semantics when used with S3. Data
> loss is possible but I am not sure whether it is the only problem. There are
> plans to rewrite the BucketingSink in Flink 1.6 to enable eventually
> consistent file systems.
>  https://issues.apache.org/jira/browse/FLINK-6306
> On Thu, May 17, 2018 at 11:57 AM, Amit Jain <aj2011it@xxxxxxxxx> wrote:
>> We are using Flink to process clickstream data from Kafka and push it to
>> S3 in 128 MB files.
>> What are the message processing guarantees with the S3 sink? In my
>> understanding, the S3A client buffers the data in memory/on disk. In a
>> failure scenario on a particular node, the TM would not trigger
>> Writer#close, hence the buffered data can be lost entirely, assuming this
>> buffer contains data from the last successful checkpoint.