
Re: Support Hadoop 2.6 for StreamingFileSink

I have an idea: create a new version of the HadoopRecoverableFsDataOutputStream
class (for example, named LegacyHadoopRecoverableFsDataOutputStream :) )
that works with valid-length files instead of invoking truncate, and
modify the check in HadoopRecoverableWriter to use
LegacyHadoopRecoverableFsDataOutputStream when the Hadoop version is
lower than 2.7. I will try to provide a PR soon if there are no objections.
I hope I am on the right track.
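
To make the proposed check concrete, here is a minimal sketch of the version
test only. It is not the actual Flink code: the class TruncateSupport and its
methods are hypothetical names for illustration, and the version string is
assumed to look like "2.6.0" or "2.6.0-cdh5.15.0". In a real patch the string
would presumably come from Hadoop's own version information rather than being
passed in by hand.

```java
/**
 * Hypothetical helper (not part of Flink): decides whether the running
 * Hadoop version is new enough to offer truncate, i.e. 2.7 or later.
 */
public final class TruncateSupport {

    private TruncateSupport() {}

    /** Returns true if the given version string denotes Hadoop 2.7 or newer. */
    public static boolean supportsTruncate(String hadoopVersion) {
        String[] parts = hadoopVersion.trim().split("\\.");
        if (parts.length < 2) {
            throw new IllegalArgumentException("Unexpected Hadoop version: " + hadoopVersion);
        }
        int major = Integer.parseInt(leadingDigits(parts[0]));
        int minor = Integer.parseInt(leadingDigits(parts[1]));
        return major > 2 || (major == 2 && minor >= 7);
    }

    /** Picks the stream implementation name proposed in this thread. */
    public static String streamClassFor(String hadoopVersion) {
        return supportsTruncate(hadoopVersion)
                ? "HadoopRecoverableFsDataOutputStream"
                : "LegacyHadoopRecoverableFsDataOutputStream";
    }

    /** Keeps only the leading digits, so a component like "7-beta" parses as 7. */
    private static String leadingDigits(String s) {
        int end = 0;
        while (end < s.length() && Character.isDigit(s.charAt(end))) {
            end++;
        }
        return s.substring(0, end);
    }
}
```

With this sketch, streamClassFor("2.6.0-cdh5.15.0") selects the legacy
valid-length stream, while streamClassFor("2.7.3") keeps the existing
truncate-based one.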

On Mon, 20 Aug 2018 at 14:40, Artsem Semianenka <artfulonline@xxxxxxxxx> wrote:

> Hi guys!
> I have a question regarding the new StreamingFileSink (introduced in
> version 1.6). We use this sink to write data in Parquet format, but I ran
> into an issue when trying to run the job on a YARN cluster and save the
> result to HDFS. In our case we use the latest Cloudera distribution
> (CDH 5.15), which contains HDFS 2.6.0. This version does not support the
> truncate method. I would like to create a pull request, but first I want
> to ask your advice on how best to design this fix and what the reasoning
> behind the current design was. I saw a similar PR for
> BucketingSink https://github.com/apache/flink/pull/6108 . Maybe I could
> also add support for valid-length files for older Hadoop versions?
> P.S. Unfortunately, CDH 5.15 (with Hadoop 2.6) is the latest version of
> the Cloudera distribution, and we can't upgrade to Hadoop 2.7.
> Best regards,
> Artsem


Best regards,
Artsem Semianenka