Latency with cross operation on Datasets

Hello Flink community,

I am trying to understand the latency involved in the cross operation. Below
are my tests.

In plain Java:
1. Create 2D array 1, populated with 1 million rows and 3 columns of
randomly generated double values
2. Create 2D array 2, populated with 100 rows and 3 columns of randomly
generated double values
3. Run a nested for loop over 1 million x 100 pairs (10^8 iterations) and
perform a EuclideanDistance calculation inside the nested loop
4. Collect the output in a List of doubles and print the size of the list at
the end

The above steps complete in about 15 seconds in plain Java on my laptop.
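
For reference, the plain Java test looks roughly like the sketch below. This
is a minimal reconstruction, not my exact code: the class and method names
(CrossBaseline, randomMatrix, euclideanDistance, crossDistances) are made up
for illustration, and main defaults to a smaller row count so it runs with a
default heap. Pass 1000000 100 as arguments (with a large enough -Xmx) to
match the sizes described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class CrossBaseline {

    // Fill a rows x cols array with random doubles in [0, 1).
    static double[][] randomMatrix(int rows, int cols, Random rnd) {
        double[][] m = new double[rows][cols];
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                m[i][j] = rnd.nextDouble();
            }
        }
        return m;
    }

    // Euclidean distance between two vectors of equal length.
    static double euclideanDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Nested loop over every (large row, small row) pair,
    // i.e. the Cartesian product of the two arrays.
    static List<Double> crossDistances(double[][] large, double[][] small) {
        List<Double> out = new ArrayList<>(large.length * small.length);
        for (double[] l : large) {
            for (double[] s : small) {
                out.add(euclideanDistance(l, s));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Row counts are assumptions for a quick run; the test described
        // in this post used 1,000,000 and 100.
        int largeRows = args.length > 0 ? Integer.parseInt(args[0]) : 10_000;
        int smallRows = args.length > 1 ? Integer.parseInt(args[1]) : 100;

        Random rnd = new Random();
        double[][] large = randomMatrix(largeRows, 3, rnd);
        double[][] small = randomMatrix(smallRows, 3, rnd);

        long start = System.nanoTime();
        List<Double> distances = crossDistances(large, small);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        System.out.println("pairs: " + distances.size()
                + ", elapsed ms: " + elapsedMs);
    }
}
```

Note that the list holds boxed Double values, so at the full 10^8 pairs a
good part of the run time and memory goes to boxing rather than to the
distance calculation itself.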

In Flink batch:
1. Read Avro files with 1 million and 100 rows, in the same format as above
2. Perform a cross operation from the 100-row dataset with the 1-million-row
dataset, using the crossWithHuge hint because the broadcast 1-million-row
dataset is the bigger one in this case
3. Apply a map function that performs the distance calculation
4. After the cross, do a count at the end as a closing step

When I package the job and submit the jar to the Flink cluster, it takes
about 2 minutes and 10 seconds to complete. I can see that the 1-million-row
dataset finishes loading from the Avro file in a minute, and it is indicated
as broadcast, which makes sense. Since I am running on a single slot, I
believe no data is shipped across the network. I am wondering why it still
takes another 70 seconds to run the cross operation. I understand that a
Cartesian product can be expensive, but I would expect it to be close to the
nested loop in Java for this case. Please advise.

Thanks for your help in advance!