[jira] [Created] (FLINK-10841) Reduce the number of ListObjects calls when checkpointing to S3


Pawel Bartoszek created FLINK-10841:
---------------------------------------

             Summary: Reduce the number of ListObjects calls when checkpointing to S3
                 Key: FLINK-10841
                 URL: https://issues.apache.org/jira/browse/FLINK-10841
             Project: Flink
          Issue Type: Improvement
          Components: FileSystem
    Affects Versions: 1.6.2, 1.5.5
            Reporter: Pawel Bartoszek


With S3 configured as the checkpoint store using the S3 AWS Hadoop filesystem, we see a very large number of ListObjects calls. For instance, a job with ~1600 tasks requires around 23000 ListObjects calls for every checkpoint, including Flink cleaning it up afterwards. With the checkpoint interval set to 5 minutes, this adds up to hundreds of dollars per month just for ListObjects calls. I am aware that the implementation details might be hidden in the Hadoop jar and may be difficult to change, but could at least some workaround be suggested?
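
To make the cost estimate concrete, here is a rough back-of-the-envelope sketch. The call count and checkpoint interval are the figures reported above; the price of 0.005 USD per 1000 LIST requests and the 30-day month are assumptions for illustration only (actual S3 pricing depends on region and storage class), and the function name is hypothetical.

```python
def monthly_list_cost(calls_per_checkpoint, interval_minutes,
                      price_per_1000_usd, days_per_month=30):
    """Estimate monthly ListObjects call volume and cost.

    price_per_1000_usd and days_per_month are assumed inputs,
    not values taken from the issue report.
    """
    # Number of checkpoints triggered in one month at a fixed interval.
    checkpoints_per_month = days_per_month * 24 * 60 / interval_minutes
    # Total ListObjects calls, including those made during checkpoint cleanup.
    calls_per_month = calls_per_checkpoint * checkpoints_per_month
    # S3 bills LIST requests per 1000 calls.
    cost_usd = calls_per_month / 1000 * price_per_1000_usd
    return calls_per_month, cost_usd


# Figures from the report: ~23000 calls per checkpoint, 5-minute interval.
# ASSUMPTION: 0.005 USD per 1000 LIST requests.
calls, cost = monthly_list_cost(23_000, 5, 0.005)
print(f"{calls:,.0f} ListObjects calls per month, about {cost:,.2f} USD")
```

Under these assumptions the job issues roughly 200 million ListObjects calls per month, which is in line with the "hundreds of dollars" figure above. The estimate scales linearly with the number of calls per checkpoint and inversely with the checkpoint interval.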

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)