
Re: Guidelines around how to scale with worker nodes?

Hi Jerry,
You may need to profile the tasks running on your machines yourself to get
an idea of how much compute your tasks are consuming. Generally, the number
of worker nodes is the root cause of tasks getting stuck in the QUEUED
state. You can verify that by comparing the number of running tasks with
the number of worker nodes you have, to see whether all workers are busy.
You can also check out CgroupTaskRunner, which runs tasks in cgroups and
thus makes it possible to run multiple tasks on one machine.
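
The comparison above can be sketched roughly in Python. This is only an
illustration under assumptions, not Airflow API: the function names and
the slots_per_worker parameter (how many tasks one worker runs at once)
are hypothetical, and the task counts are typed in by hand, e.g. as read
off the Airflow UI.

```python
def free_slots(running_tasks, worker_nodes, slots_per_worker=1):
    """Return how many more tasks the workers could pick up right now.

    slots_per_worker is a hypothetical parameter: the number of tasks a
    single worker node is configured to run at the same time.
    """
    capacity = worker_nodes * slots_per_worker
    return capacity - running_tasks


def workers_are_bottleneck(queued_tasks, running_tasks, worker_nodes,
                           slots_per_worker=1):
    """True when tasks are waiting and every worker slot is occupied."""
    no_room = free_slots(running_tasks, worker_nodes, slots_per_worker) <= 0
    return queued_tasks > 0 and no_room


if __name__ == "__main__":
    # Made-up numbers: one worker node with 16 slots.
    print(workers_are_bottleneck(5, 16, 1, 16))  # all slots busy
    print(workers_are_bottleneck(5, 4, 1, 16))   # slots still free
```

If tasks stay queued while free slots remain, the worker count is
probably not the limiting factor and the cause lies elsewhere.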

Kevin Y

On Sun, Oct 28, 2018 at 5:41 PM Jerry Chi <jerry.chi@xxxxxxxxxxxxx> wrote:

> Sorry, any tips or hints related to the questions below? Thank you.
> Jerry
> On Wed, Oct 24, 2018 at 3:37, Jerry Chi <jerry.chi@xxxxxxxxxxxxx> wrote:
> > Hi everyone,
> >
> > I'm occasionally observing tasks stuck in "queued" for a long time
> > despite trying various edits of parameter values in airflow.cfg and I'm
> > guessing it would help to increase the number of worker nodes (right now
> > I have one worker node).
> >
> > Are there any guidelines for:
> > 1. How to determine if the # of worker nodes is indeed the bottleneck
> > causing tasks to be stuck in "queued" ? It doesn't seem the memory/CPU
> > usage on the worker node is close to 100%.
> > 2. How to determine the optimal number and CPU/memory specs of the worker
> > nodes if I want to be able to handle X simultaneous tasks without them
> > getting stuck in "queued" ?
> > I'm using CeleryExecutor + RabbitMQ on EC2.
> >
> > Thanks~
> > Jerry Chi ジェリー・チー | Data Science Manager | +81-70-2668-5491 | LINE/Skype:
> > peacej | KakaoTalk: peacej2 | WeChat: jerrychijerry
> >