osdir.com


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Support Hadoop 2.6 for StreamingFileSink


Thanks for reply, Till !

Buy the way, If Flink going to support compatibility with Hadoop 2.6 I
don't see another way how to achieve it.
As I mention before one of popular distributive Cloudera still based on
Hadoop 2.6 and it very sad if Flink unsupport it.
I really want to help Flink comunity to support this legacy. But currently
I see only one way to acheve it by emulate 'truncate' logic and recreate
new file with needed lenght and replace old .

Cheers,
Artsem

On Tue, 21 Aug 2018 at 14:41, Till Rohrmann <trohrmann@xxxxxxxxxx> wrote:

> Hi Artsem,
>
> if I recall correctly, then we explicitly decided to not support the valid
> file length files with the new StreamingFileSink because they are really
> hard to handle for the user. I've pulled Klou into this conversation who is
> more knowledgeable and can give you a bit more advice.
>
> Cheers,
> Till
>
> On Mon, Aug 20, 2018 at 2:53 PM Artsem Semianenka <artfulonline@xxxxxxxxx>
> wrote:
>
> > I have an idea to create new version of
> HadoopRecoverableFsDataOutputStream
> > class (for example with name LegacyHadoopRecoverableFsDataOutputStream
> :) )
> > which will works with valid-length files without invoking truncate. And
> > modify check in HadoopRecoverableWriter to use
> > LegacyHadoopRecoverableFsDataOutputStream in case if Hadoop version is
> > lower then 2.7 . I will try to provide PR soon if no objections. I hope I
> > am on the right way.
> >
> > On Mon, 20 Aug 2018 at 14:40, Artsem Semianenka <artfulonline@xxxxxxxxx>
> > wrote:
> >
> > > Hi guys !
> > > I have a question regarding new StreamingFileSink (introduced in 1.6
> > > version) . We use this sink to write data into Parquet format. But I
> > faced
> > > with issue when trying to run job on Yarn cluster and save result to
> > HDFS.
> > > In our case we use latest Cloudera distributive (CHD 5.15) and it
> > contains
> > > HDFS 2.6.0  . This version is not support truncate method . I would
> like
> > to
> > > create Pull request but I want to ask your advice how better design
> this
> > > fix and which ideas are behind this decision . I saw similiar PR for
> > > BucketingSink https://github.com/apache/flink/pull/6108 . Maybe I
> could
> > > also add support of valid-length files for older Hadoop versions ?
> > >
> > > P.S.Unfortently CHD 5.15 (with Hadoop 2.6) is the latest version of
> > > Cloudera distributive and we can't upgrade hadoop to 2.7 Hadoop .
> > >
> > > Best regards,
> > > Artsem
> > >
> >
> >
> > --
> >
> > С уважением,
> > Артем Семененко
> >
>


-- 

С уважением,
Артем Семененко