Guidelines around how to scale with worker nodes?

Hi everyone,

I occasionally see tasks stuck in "queued" for a long time, even after
trying various parameter values in airflow.cfg. My guess is that adding
worker nodes would help (right now I have only one).

Are there any guidelines for:
1. How to determine whether the number of worker nodes is indeed the
bottleneck causing tasks to be stuck in "queued"? Memory and CPU usage on
the worker node do not seem to be anywhere near 100%.
2. How to determine the optimal number and CPU/memory specs of the worker
nodes if I want to handle X simultaneous tasks without them getting stuck
in "queued"?

I'm using CeleryExecutor + RabbitMQ on EC2.
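
To frame question 2, here is a rough back-of-the-envelope sketch in Python.
It assumes the only limits in play are `parallelism` under [core] and the
per-worker Celery concurrency under [celery] (`worker_concurrency`, named
`celeryd_concurrency` in some older releases). Pools, per-DAG limits, and
scheduler latency are ignored, and the function names and sample values are
made up for illustration.

```python
import math


def max_running_tasks(parallelism: int, n_workers: int, worker_concurrency: int) -> int:
    """Upper bound on simultaneously running tasks (illustrative only).

    parallelism        -- assumed value of [core] parallelism
    n_workers          -- number of Celery worker nodes
    worker_concurrency -- assumed task slots per worker node
    """
    # Whichever limit is smaller caps the cluster: the global cap or the
    # total number of worker slots.
    return min(parallelism, n_workers * worker_concurrency)


def workers_needed(target_tasks: int, worker_concurrency: int) -> int:
    """Minimum worker nodes so worker slots alone can hold target_tasks."""
    return math.ceil(target_tasks / worker_concurrency)


if __name__ == "__main__":
    # Hypothetical numbers, not taken from any real airflow.cfg.
    print(max_running_tasks(parallelism=32, n_workers=1, worker_concurrency=16))  # 16
    print(workers_needed(target_tasks=40, worker_concurrency=16))  # 3
```

Under these assumptions, if the number of running tasks sits well below
this bound while others wait in "queued", the worker count is probably not
the limiting factor. Note that `parallelism` would also have to be at least
X for X tasks to run at once.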

Jerry Chi ジェリー・チー | Data Science Manager | +81-70-2668-5491 | LINE/Skype:
peacej | 카톡: peacej2 | WeChat: jerrychijerry