Support Hadoop 2.6 for StreamingFileSink

Hi guys!
I have a question about the new StreamingFileSink (introduced in version
1.6). We use this sink to write data in Parquet format, but I ran into an
issue when running the job on a YARN cluster and saving the result to HDFS.
We use the latest Cloudera distribution (CDH 5.15), which ships HDFS 2.6.0.
This version does not support the truncate method. I would like to create a
pull request, but first I want to ask your advice on how best to design
this fix and what the reasoning behind the current design is. I saw a
similar PR for BucketingSink: https://github.com/apache/flink/pull/6108.
Maybe I could also add support for valid-length files for older Hadoop
versions?
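
To make the valid-length idea concrete, here is a minimal sketch using only
the local file system and the Java standard library. It is not Flink or
Hadoop code: the class name, the method names and the marker file naming
scheme are all hypothetical and chosen only for illustration. Instead of
truncating a part file, the writer records how many bytes are valid in a
companion file, and readers ignore everything past that length.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class ValidLengthSketch {

    // Hypothetical naming scheme for the marker file (an assumption, not Flink's API).
    static Path markerFor(Path dataFile) {
        return dataFile.resolveSibling("_" + dataFile.getFileName() + ".valid-length");
    }

    // Instead of truncating, record how many bytes of the data file are valid.
    static void markValidLength(Path dataFile, long validLength) throws IOException {
        Files.write(markerFor(dataFile),
                Long.toString(validLength).getBytes(StandardCharsets.UTF_8));
    }

    // Readers must honour the marker and ignore any bytes past the valid length.
    static byte[] readValid(Path dataFile) throws IOException {
        byte[] all = Files.readAllBytes(dataFile);
        Path marker = markerFor(dataFile);
        if (!Files.exists(marker)) {
            return all; // no marker: the whole file is considered valid
        }
        String text = new String(Files.readAllBytes(marker), StandardCharsets.UTF_8).trim();
        long valid = Long.parseLong(text);
        return Arrays.copyOf(all, (int) Math.min(valid, all.length));
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("valid-length-sketch");
        Path part = dir.resolve("part-0-0");
        // Simulate a part file that has extra bytes written after the last checkpoint.
        Files.write(part, "committed|uncommitted".getBytes(StandardCharsets.UTF_8));
        markValidLength(part, "committed".length());
        System.out.println(new String(readValid(part), StandardCharsets.UTF_8));
    }
}
```

The obvious cost of this approach is that every downstream reader has to
know about the marker file, which is not needed when truncate is available.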

P.S. Unfortunately, CDH 5.15 (with Hadoop 2.6) is the latest version of the
Cloudera distribution, and we can't upgrade to Hadoop 2.7.

Best regards,