
Re: Best Practice of Airflow Setting-Up & Usage

Thanks for sharing, Raman.

Based on what you shared, I think there are two points worth further
discussion.

*Scaling up (given thousands of DAGs):*
If you have thousands of DAGs, you may encounter longer scheduling latency
(actual start time minus planned start time).
For workers, we can scale horizontally by adding more worker nodes, which
is relatively straightforward.
But the *Scheduler* may become another bottleneck. The scheduler can only run
on one node (please correct me if I'm wrong). Even if we use multiple
threads for it, there is a limit. High availability (HA) is another concern.
This is also what our team is looking into at the moment, since the scheduler
is the biggest bottleneck we have identified so far (does anyone have
experience tuning scheduler performance?).
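To make the latency point concrete, here is a minimal sketch of computing scheduling latency (actual start time minus planned start time) per task instance and flagging slow ones. It assumes you have already pulled the planned/actual start pairs from somewhere (e.g. the metadata DB); the `TaskRun` record, the helper names, and the 60-second threshold are hypothetical, not part of Airflow.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import List, Tuple


@dataclass
class TaskRun:
    """Hypothetical record: one task instance with its planned and actual start."""
    task_id: str
    planned_start: datetime
    actual_start: datetime


def scheduling_latencies(runs: List[TaskRun]) -> List[float]:
    """Latency in seconds = actual start time minus planned start time."""
    return [(r.actual_start - r.planned_start).total_seconds() for r in runs]


def summarize(runs: List[TaskRun], threshold_s: float = 60.0) -> Tuple[float, float, List[str]]:
    """Return (median latency, max latency, task_ids whose latency exceeds threshold_s)."""
    lat = scheduling_latencies(runs)
    slow = [r.task_id for r, l in zip(runs, lat) if l > threshold_s]
    return median(lat), max(lat), slow


if __name__ == "__main__":
    base = datetime(2018, 9, 6, 0, 0)
    runs = [
        TaskRun("extract", base, base + timedelta(seconds=5)),
        TaskRun("transform", base, base + timedelta(seconds=30)),
        TaskRun("load", base, base + timedelta(seconds=120)),
    ]
    print(summarize(runs))  # (30.0, 120.0, ['load'])
```

Tracking the median and the tail separately helps tell a generally overloaded scheduler apart from a few starved DAGs.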

*Broker for Celery Executor*:
Have you considered RabbitMQ rather than Redis/SQL as the broker? In fact, the
Celery community once proposed deprecating Redis as a broker (the proposal
was eventually rejected) [
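Switching brokers is mostly a configuration change. The sketch below uses Python's standard `configparser` to show the relevant `airflow.cfg` entries; it assumes Airflow 1.10 key names (`[core] executor`, `[celery] broker_url`, `[celery] result_backend`), and the host names and credentials are placeholders.

```python
import configparser
import io
from urllib.parse import urlparse


def build_celery_config(broker_url: str, result_backend: str) -> configparser.ConfigParser:
    """Build the executor/broker part of an airflow.cfg (assumed Airflow 1.10 key names)."""
    # interpolation=None keeps '%' in URL-encoded passwords literal
    cfg = configparser.ConfigParser(interpolation=None)
    cfg["core"] = {"executor": "CeleryExecutor"}
    cfg["celery"] = {"broker_url": broker_url, "result_backend": result_backend}
    return cfg


def broker_transport(cfg: configparser.ConfigParser) -> str:
    """Return the transport scheme of the configured broker, e.g. 'amqp' or 'redis'."""
    return urlparse(cfg["celery"]["broker_url"]).scheme


if __name__ == "__main__":
    cfg = build_celery_config(
        broker_url="amqp://airflow:airflow@rabbitmq-host:5672/airflow",  # placeholder RabbitMQ
        result_backend="db+mysql://airflow:airflow@mysql-host:3306/airflow",  # placeholder
    )
    buf = io.StringIO()
    cfg.write(buf)
    print(buf.getvalue())
    print(broker_transport(cfg))  # amqp
```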


On Thu, Sep 6, 2018 at 6:10 PM ramandumcs@xxxxxxxxx <ramandumcs@xxxxxxxxx> wrote:

> Hi,
> We need to scale to running thousands of concurrent DAGs. With the Celery
> executor, we observed that the Airflow worker sometimes gets stuck if the
> connection to Redis/MySQL breaks
> (https://github.com/celery/celery/issues/3932,
> https://github.com/celery/celery/issues/4457).
> We are currently using Airflow 1.9 with the LocalExecutor but are planning
> to switch to Airflow 1.10 with the Kubernetes Executor.
> Thanks,
> Raman Gupta
> On 2018/09/05 12:56:38, Deng Xiaodong <xd.deng.r@xxxxxxxxx> wrote:
> > Hi folks,
> >
> > Could you kindly share how your organization sets up and uses Airflow,
> > especially in terms of architecture? For example,
> >
> > - *Setting-Up*: Do you install Airflow in a "one-time" fashion, or in a
> > containerized fashion?
> > - *Executor:* Which executor are you using (*LocalExecutor*,
> > *CeleryExecutor*, etc)? I believe most production environments are using
> > *CeleryExecutor*?
> > - *Scale*: If using Celery, how many worker nodes do you normally add?
> > (Of course this depends on the workloads and the performance of your
> > worker nodes.)
> > - *Queue*: Is the Queue feature
> > <https://airflow.apache.org/concepts.html#queues> used in your
> > architecture? For what advantage? (For example, explicitly assigning
> > network-bound tasks to a worker node whose parallelism can be much higher
> > than its number of cores.)
> > - *SLA*: Do you have any SLA for your scheduling? (This is inspired by
> > @yrqls21's PR 3830 <https://github.com/apache/incubator-airflow/pull/3830>.)
> > - etc.
> >
> > Airflow's setup can be quite flexible, but I believe there are some
> > best practices, especially in organisations where scalability is
> > essential.
> >
> > Thanks for sharing in advance!
> >
> >
> > Best regards,
> > XD
> >
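As an illustration of the Queue idea in the original question, a hypothetical routing policy could send network-bound tasks to a dedicated queue whose workers run with a concurrency well above the core count. In Airflow itself you would set the operator's `queue` argument and start workers listening on that queue (e.g. `airflow worker -q <queue>`); the sketch below only models the routing and sizing decision in plain Python, and the queue names and the 8x oversubscription factor are assumptions.

```python
import os
from typing import Dict, List

# Hypothetical queue names for illustration only
CPU_QUEUE = "default"
NETWORK_QUEUE = "network_io"


def route(task_kind: str) -> str:
    """Send network-bound tasks (HTTP calls, DB extracts, transfers) to their own queue."""
    return NETWORK_QUEUE if task_kind == "network" else CPU_QUEUE


def worker_concurrency(queue: str, cores: int, oversubscribe: int = 8) -> int:
    """CPU-bound workers get one slot per core; network-bound workers mostly wait on I/O,
    so they can run many more slots than cores (oversubscribe is an assumed tuning knob)."""
    return cores * oversubscribe if queue == NETWORK_QUEUE else cores


def plan(tasks: Dict[str, str]) -> Dict[str, List[str]]:
    """Group task ids by the queue they would be routed to."""
    queues: Dict[str, List[str]] = {CPU_QUEUE: [], NETWORK_QUEUE: []}
    for task_id, kind in tasks.items():
        queues[route(kind)].append(task_id)
    return queues


if __name__ == "__main__":
    cores = os.cpu_count() or 1
    tasks = {"fetch_api": "network", "train_model": "cpu", "upload_s3": "network"}
    for q, ids in plan(tasks).items():
        print(q, worker_concurrency(q, cores), ids)
```

Keeping CPU-bound work on a queue sized to the core count avoids oversubscribing those nodes, while the I/O queue can be scaled independently.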