[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Flink Rebalance


Elias and Paul have good points.
I think the performance degradation is mostly to the lack of function chaining in the rebalance case.

If all steps are just map functions, they can be chained in the no-rebalance case.
That means, records are passed via function calls.
If you add rebalancing, records will be passed between map functions via serialization, network transfer, and deserialization.
This is of course much more expensive than calling a method.

Best, Fabian

2018-08-10 4:25 GMT+02:00 Paul Lam <paullin3280@xxxxxxxxx>:
Hi Antonio, 

AFAIK, there are two reasons for this: 

1. Rebalancing itself brings latency because it takes time to redistribute the elements. 
2. Rebalancing also messes up the order in the Kafka topic partitions, and often makes a event-time window wait longer to trigger in case you’re using event time characteristic. 

Best Regards,
Paul Lam

在 2018年8月10日,05:49,antonio saldivar <ansale10@xxxxxxxxx> 写道:


Sending ~450 elements per second ( the values are in milliseconds start to end)
I went from:
with Rebalance
32131.0853   |

to this without rebalance

| 70.2077    |

El jue., 9 ago. 2018 a las 17:42, Elias Levy (<fearsome.lucidity@xxxxxxxxx>) escribió:
What do you consider a lot of latency?  The rebalance will require serializing / deserializing the data as it gets distributed.  Depending on the complexity of your records and the efficiency of your serializers, that could have a significant impact on your performance.

On Thu, Aug 9, 2018 at 2:14 PM antonio saldivar <ansale10@xxxxxxxxx> wrote:

Does anyone know why when I add "rebalance()" to my .map steps is adding a lot of latency rather than not having rebalance.

I have kafka partitions in my topic 44 and 44 flink task manager

execution plan looks like this when I add rebalance but it is adding a lot of latency

kafka-src -> rebalance -> step1 -> rebalance ->step2->rebalance -> kafka-sink

Thank you