OSDir


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Question about datasource replication


Flink 1.3.1 (I'm waiting 1.5 before upgrading..)

On Fri, May 4, 2018 at 2:50 PM, Amit Jain <aj2011it@xxxxxxxxx> wrote:
Hi Flavio,

Which version of Flink are you using?

--
Thanks,
Amit

On Fri, May 4, 2018 at 6:14 PM, Flavio Pompermaier <pompermaier@xxxxxxxx> wrote:
> Hi all,
> I've a Flink batch job that reads a parquet dataset and then applies 2
> flatMap to it (see pseudocode below).
> The problem is that this dataset is quite big and Flink duplicates it before
> sending the data to these 2 operators (I've guessed this from the doubling
> amount of sent bytes) .
> Is there a way to avoid this behaviour?
>
> -------------------------------------------------------
> Here's the pseudo code of my job:
>
> DataSet X = readParquetDir();
> X1 = X.flatMap(...);
> X2 = X.flatMap(...);
>
> Best,
> Flavio