I am using only a time-windowed join. Here is a sample query:
    SELECT a1.order_id, a2.`order`.restaurant_id
    FROM awz_s3_stream1 a1
    INNER JOIN awz_s3_stream2 a2
      ON CAST(a1.order_id AS VARCHAR) = a2.order_id
      AND a1.to_state = 'PLACED'
      AND a1.proctime BETWEEN a2.proctime - INTERVAL '2' HOUR
                          AND a2.proctime + INTERVAL '2' HOUR
    GROUP BY
      HOP(a2.proctime, INTERVAL '2' MINUTE, INTERVAL '1' HOUR),
      a2.`order`.restaurant_id
To simplify my question:
Suppose I have a TM with 4 slots and I deploy a Flink job with parallelism=4
in 2 containers: 1 JM and 1 TM. Each parallel instance will be deployed in
its own task slot in the TM (the entire job pipeline running in each slot). My
job does a join (a SQL time-windowed join on a non-keyed stream) and it buffers
the last few hours of data. My question is: will the threads running in
different task slots share the data buffered for the join? More generally,
what data is shared across these threads?
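
For concreteness, the setup described above would correspond roughly to the
following flink-conf.yaml entries. This is only a sketch of the scenario, not
my exact configuration: I am assuming the standard keys
taskmanager.numberOfTaskSlots and parallelism.default, with everything else
left at its default.

```yaml
# Sketch of the scenario in the question (values are illustrative).
# One TaskManager container offering 4 task slots:
taskmanager.numberOfTaskSlots: 4
# Job runs with 4 parallel instances, one per slot:
parallelism.default: 4
```

The parallelism could equally be set at submission time (for example with
`flink run -p 4`) instead of in the config file.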
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/