
Tuning checkpoint

Hi all,

I have several questions about checkpointing. For background, I'm using a ProcessFunction keyed by user_id, which looks roughly like this:

.keyBy(x => getUserKey(x))
The job runs on YARN with 40 TMs and 2 slots each. When I look at the checkpoint metrics, only a small number of subtasks show a large "alignment buffered/duration", and it looks like the 2 slots on the same TM are either both high or both low. What could cause this?
  1. Maybe data skew, but the amount of data per subtask looks almost the same.
  2. Maybe the network?
  3. The system is under back pressure, but I don't understand why only about 4 out of 80 subtasks behave like this.
My second question is about the alignment buffer. I thought it was only used when an operator has multiple input streams. For a keyed process function, what is actually being aligned?
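
As a point of reference for this question, here is a toy model in plain Scala of what barrier alignment buffers. It is only a sketch and not Flink code: it assumes that after a keyBy each subtask reads from one input channel per upstream subtask, and every name in it (Event, Record, Barrier, AlignmentSketch) is hypothetical.

```scala
// Toy model of checkpoint barrier alignment. Not Flink code; all names are hypothetical.
sealed trait Event
final case class Record(value: String) extends Event
case object Barrier extends Event

object AlignmentSketch {
  /** Counts the records that have to be buffered while one subtask waits for
    * the checkpoint barrier on all of its input channels.
    *
    * @param numChannels number of input channels feeding this subtask
    * @param arrivals    (channel index, event) pairs in arrival order
    */
  def bufferedDuringAlignment(numChannels: Int, arrivals: Seq[(Int, Event)]): Int = {
    var blocked = Set.empty[Int] // channels whose barrier has already arrived
    var buffered = 0             // records held back from those channels
    var aligned = false          // true once every channel delivered its barrier

    arrivals.foreach { case (channel, event) =>
      if (!aligned) {
        event match {
          case Barrier =>
            blocked += channel
            if (blocked.size == numChannels) aligned = true
          case Record(_) =>
            // Records behind an already received barrier must wait;
            // records on channels without a barrier yet are processed normally.
            if (blocked.contains(channel)) buffered += 1
        }
      }
    }
    buffered
  }
}
```

In this model, if channel 0 delivers its barrier early and channel 1 is slow, everything that arrives on channel 0 in the meantime counts as buffered. For example, with two channels and the arrival order record, barrier, record, record on channel 0 followed by record, barrier on channel 1, the function returns 2.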
My last question is about tuning RocksDB. I am trying to assign some memory to the write buffer and block cache. The docs say this is done "typically by decreasing the JVM heap size of the TaskManagers by the same amount", but for the TaskManager heap size they say "On YARN setups, this value is automatically configured to the size of the TaskManager's YARN container, minus a certain tolerance value." So it seems I should decrease the TaskManager heap, yet that value is set automatically by YARN. What should I do?
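
To put a number on "the same amount", a back-of-the-envelope estimate of the native memory RocksDB would need per TaskManager can be sketched as below. This is arithmetic only and rests on assumptions not stated above: each registered state gets its own write buffers and its own block cache, index and filter blocks are ignored, and the sizes used in the example (64 MiB write buffer, 2 write buffers, 256 MiB block cache, 1 state per slot) are hypothetical.

```scala
// Rough RocksDB native memory estimate. Arithmetic only; all figures are hypothetical.
object RocksDbMemorySketch {
  val MiB: Long = 1024L * 1024L

  /** Approximate native memory for one state: all of its write buffers plus its block cache. */
  def perState(writeBufferBytes: Long, maxWriteBuffers: Int, blockCacheBytes: Long): Long =
    writeBufferBytes * maxWriteBuffers + blockCacheBytes

  /** Approximate native memory for one TaskManager across all slots and states. */
  def perTaskManager(slots: Int, statesPerSlot: Int, perStateBytes: Long): Long =
    slots.toLong * statesPerSlot * perStateBytes
}

object RocksDbMemoryExample extends App {
  import RocksDbMemorySketch._

  val state = perState(64 * MiB, 2, 256 * MiB) // 384 MiB per state
  val tm = perTaskManager(2, 1, state)         // 768 MiB for a TM with 2 slots

  println(s"per state: ${state / MiB} MiB, per TaskManager: ${tm / MiB} MiB")
}
```

Under these assumptions the TaskManager would need roughly 768 MiB outside the JVM heap, which is the amount the quoted documentation suggests taking away from the heap.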


Attachment: WX20180813-001831.png