
Re: Message guarantees with S3 Sink

Thanks Gary!

Sure, there are issues with updates in S3. You may want to look at the
guarantees provided by the EMRFS consistent view [1]. I'm not sure
whether a comparable feature is available outside of EMR on AWS.
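
For reference, on EMR the consistent view is enabled through an
emrfs-site configuration classification; a minimal sketch, based on the
linked docs, would look like this (the surrounding cluster configuration
is omitted):

```json
[
  {
    "Classification": "emrfs-site",
    "Properties": {
      "fs.s3.consistent": "true"
    }
  }
]
```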

I'm creating a JIRA issue regarding the possibility of data loss with
S3. IMHO, the Flink docs should mention that data loss is possible with
S3.

[1] https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-consistent-view.html
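
To make the failure mode from the quoted thread below concrete, here is a
minimal sketch using plain java.io (not the actual S3A client; the class
name, buffer size, and payload are illustrative). Bytes written to a
buffered stream sit in memory until flush()/close(), so a crash before
close() loses them, even if a checkpoint has already covered that data:

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class BufferedLossSketch {
    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("sink-part", ".out");

        // Buffers bytes in memory, analogous to the S3A client buffering
        // part data on memory/disk before it is uploaded.
        BufferedOutputStream out =
                new BufferedOutputStream(new FileOutputStream(file.toFile()), 8192);
        out.write("click-event".getBytes());

        // Nothing has reached the file yet: a crash at this point (no
        // close()/flush()) would lose the buffered bytes entirely.
        long before = Files.size(file);

        out.close(); // close() flushes the buffer to the file
        long after = Files.size(file);

        System.out.println("bytes on disk before close: " + before); // 0
        System.out.println("bytes on disk after close:  " + after);  // 11
    }
}
```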


On Fri, May 18, 2018 at 2:48 AM, Gary Yao <gary@xxxxxxxxxxxxxxxxx> wrote:
> Hi Amit,
>
> The BucketingSink doesn't have well-defined semantics when used with S3. Data
> loss is possible, but I am not sure whether it is the only problem. There are
> plans to rewrite the BucketingSink in Flink 1.6 to support eventually
> consistent file systems [1][2].
>
> Best,
> Gary
>
> [1] http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/sink-with-BucketingSink-to-S3-files-override-td18433.html
> [2] https://issues.apache.org/jira/browse/FLINK-6306
> On Thu, May 17, 2018 at 11:57 AM, Amit Jain <aj2011it@xxxxxxxxx> wrote:
>> Hi,
>>
>> We are using Flink to process click-stream data from Kafka and to push
>> it to S3 in 128 MB files.
>>
>> What are the message processing guarantees with the S3 sink? In my
>> understanding, the S3A client buffers data in memory or on disk. If a
>> particular node fails, the TM would not trigger Writer#close, so the
>> buffered data can be lost entirely, even though that buffer may contain
>> data from the last successful checkpoint.
>> --
>> Thanks,
>> Amit