Re: Benchmarking of Airflow Scheduler with Celery Executor


If you're concerned about scheduler scalability I'd go with a bigger box.
The scheduler uses multiprocessing so more CPU power means more throughput.

Also you may want to provision a beefy MySQL box to make sure that doesn't
become the bottleneck. 10k tasks heartbeating to the DB every 30 seconds is
significant load.
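
As a rough back-of-the-envelope check on that load, here is a minimal sketch. It assumes each running task issues exactly one heartbeat write per interval and that the writes are spread evenly over time. The function name is made up for illustration, and the scheduler's own queries are not counted.

```python
def heartbeat_writes_per_second(concurrent_tasks: int, heartbeat_interval_s: float) -> float:
    """Average heartbeat writes per second hitting the metadata DB.

    Assumption: one write per running task per heartbeat interval,
    evenly distributed (no bursts, no retries).
    """
    if heartbeat_interval_s <= 0:
        raise ValueError("heartbeat_interval_s must be positive")
    return concurrent_tasks / heartbeat_interval_s


# Figures from this thread: 10k tasks, 30-second heartbeat.
rate = heartbeat_writes_per_second(10_000, 30)
print(f"~{rate:.0f} heartbeat writes/sec")
```

So the DB would see a few hundred small writes per second from heartbeats alone, on top of whatever the scheduler and webserver do.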

Perhaps the Airbnb folks can chime in about their scale and hardware setup?

Max

On Fri, Apr 13, 2018 at 9:14 AM, ramandumcs@xxxxxxxxx <ramandumcs@xxxxxxxxx>
wrote:

> Thanks Ry,
> Just wondering if there is an approximate number for how many concurrent
> tasks a scheduler can run on, say, a machine with 16 GB RAM and 8 cores.
> If that has already been measured, it would be useful to know.
> We did some benchmarking with the local executor and observed that each
> TaskInstance was taking ~100 MB of memory, so we could only run ~130
> concurrent tasks on a machine with 16 GB RAM and 8 cores.
>
> -Raman Gupta
>
>
>
> On 2018/04/12 16:32:37, Ry Walker <ry@xxxxxxxxxxxxx> wrote:
> > Hi Raman -
> >
> > First, we’d be happy to help you test this out with Airflow. Or you could
> > do it yourself by using http://open.astronomer.io/airflow/ (w/ Docker
> > Engine + Docker Compose) to quickly spin up a test environment.
> > Everything is hooked to Prometheus/Grafana to monitor how the system
> > reacts to your workload.
> >
> > -Ry
> > CEO, Astronomer
> >
> > On April 12, 2018 at 12:23:46 PM, ramandumcs@xxxxxxxxx
> > (ramandumcs@xxxxxxxxx) wrote:
> >
> > Hi All,
> > We have a requirement to run tens of thousands of concurrent tasks and
> > are exploring Airflow's Celery Executor for this. Horizontal scaling of
> > worker nodes seems possible, but there can be only one active scheduler.
> > Will the Airflow scheduler be able to handle that many concurrent tasks?
> > Are there any benchmark numbers on the Airflow scheduler's scalability?
> > Thanks,
> > Raman
> >
>