
Question about datasource replication

Hi all,
I have a Flink batch job that reads a Parquet dataset and then applies two flatMap operations to it (see the pseudocode below).
The problem is that this dataset is quite big, and Flink appears to duplicate it before sending the data to these two operators (I inferred this from the number of bytes sent, which doubles).
Is there a way to avoid this behaviour?

Here's the pseudocode of my job:

DataSet X = readParquetDir();
// both flatMap operators consume the same source dataset X
DataSet X1 = X.flatMap(...);
DataSet X2 = X.flatMap(...);