Re: Support Hadoop 2.6 for StreamingFileSink


Hi Artsem,

if I recall correctly, we explicitly decided not to support valid-length
files with the new StreamingFileSink because they are really hard for users
to handle. I've pulled in Klou, who is more knowledgeable about this and can
give you a bit more advice.

Cheers,
Till

On Mon, Aug 20, 2018 at 2:53 PM Artsem Semianenka <artfulonline@xxxxxxxxx>
wrote:

> I have an idea: create a new version of the
> HadoopRecoverableFsDataOutputStream class (for example, named
> LegacyHadoopRecoverableFsDataOutputStream :) ) that works with
> valid-length files without invoking truncate, and modify the check in
> HadoopRecoverableWriter to use LegacyHadoopRecoverableFsDataOutputStream
> if the Hadoop version is lower than 2.7. I will try to provide a PR soon
> if there are no objections. I hope I am on the right track.
>
> On Mon, 20 Aug 2018 at 14:40, Artsem Semianenka <artfulonline@xxxxxxxxx>
> wrote:
>
> > Hi guys!
> > I have a question regarding the new StreamingFileSink (introduced in
> > version 1.6). We use this sink to write data in Parquet format, but I
> > ran into an issue when trying to run the job on a YARN cluster and save
> > the result to HDFS. In our case we use the latest Cloudera distribution
> > (CDH 5.15), which contains HDFS 2.6.0. This version does not support the
> > truncate method. I would like to create a pull request, but first I want
> > to ask your advice on how best to design this fix and what the reasoning
> > behind this decision was. I saw a similar PR for BucketingSink:
> > https://github.com/apache/flink/pull/6108 . Maybe I could also add
> > support for valid-length files for older Hadoop versions?
> >
> > P.S. Unfortunately, CDH 5.15 (with Hadoop 2.6) is the latest version of
> > the Cloudera distribution, and we can't upgrade Hadoop to 2.7.
> >
> > Best regards,
> > Artsem
> >
>
>
> --
>
> Best regards,
> Artsem Semianenka
>
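
For reference, the version check proposed in the quoted message could be
sketched roughly as below. This is only an illustration under assumptions,
not how HadoopRecoverableWriter is actually implemented: the class
HadoopVersionCheck and its methods are hypothetical, the stream classes are
represented by their names only, and the version string format
("2.6.0-cdh5.15.0") is an assumed example. In real code the string would
typically come from org.apache.hadoop.util.VersionInfo.getVersion().

```java
// Hypothetical helper illustrating the proposed "Hadoop < 2.7" check.
// It has no Hadoop or Flink dependencies so it can be run on its own.
final class HadoopVersionCheck {

    /** Returns true if the given version string is at least major.minor. */
    static boolean isAtLeast(String version, int major, int minor) {
        String[] parts = version.split("\\.");
        if (parts.length < 2) {
            throw new IllegalArgumentException("Unexpected version string: " + version);
        }
        int actualMajor = Integer.parseInt(parts[0]);
        // Drop any vendor or snapshot suffix, e.g. "7-SNAPSHOT" becomes "7".
        int actualMinor = Integer.parseInt(parts[1].replaceAll("\\D.*", ""));
        return actualMajor > major || (actualMajor == major && actualMinor >= minor);
    }

    /**
     * Picks the stream implementation by name: the truncate-based one for
     * Hadoop 2.7 and later, the proposed valid-length one for older versions.
     */
    static String chooseStreamClass(String hadoopVersion) {
        return isAtLeast(hadoopVersion, 2, 7)
                ? "HadoopRecoverableFsDataOutputStream"
                : "LegacyHadoopRecoverableFsDataOutputStream";
    }

    public static void main(String[] args) {
        // Assumed example version strings, not taken from a real cluster.
        System.out.println(chooseStreamClass("2.6.0-cdh5.15.0"));
        System.out.println(chooseStreamClass("2.7.3"));
    }
}
```

Only the major and minor components are compared, so vendor suffixes such as
the CDH part of the string do not affect the result.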