Best Practice of Airflow Setting-Up & Usage


Hi folks,

Could you kindly share how your organization sets up and uses Airflow,
especially in terms of architecture? For example:

- *Setting-Up*: Do you install Airflow in a "one-time" fashion, or in a
containerized fashion?
- *Executor*: Which executor are you using (*LocalExecutor*,
*CeleryExecutor*, etc.)? I believe most production environments use
*CeleryExecutor*; is that right?
- *Scale*: If using Celery, how many worker nodes do you normally add? (Of
course, this depends on the workload and the performance of your worker nodes.)
- *Queue*: Is the Queue feature
<https://airflow.apache.org/concepts.html#queues> used in your
architecture? If so, for what advantage? (For example, explicitly assigning
network-bound tasks to a worker node whose parallelism can be much higher
than its number of cores.)
- *SLA*: Do you have any SLA for your scheduling? (This is inspired by
@yrqls21's PR 3830 <https://github.com/apache/incubator-airflow/pull/3830>.)
- etc.
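
To make the *Queue* question more concrete, here is a rough sketch of the
kind of routing I have in mind. It is plain Python rather than a real DAG,
and everything in it is an assumption for illustration: the queue names, the
task kinds, and the 4x oversubscription factor are hypothetical, not Airflow
defaults or recommendations.

```python
import os

# Hypothetical queue names. They only need to match between a task's
# `queue` argument and the queues a worker is started to listen on.
NETWORK_QUEUE = "network_bound"
CPU_QUEUE = "cpu_bound"

# Hypothetical task kinds that mostly wait on the network.
NETWORK_KINDS = {"http_call", "db_query"}


def queue_for(task_kind):
    """Map a (hypothetical) task kind to the queue its operator would get."""
    return NETWORK_QUEUE if task_kind in NETWORK_KINDS else CPU_QUEUE


def suggested_concurrency(queue, cores):
    """Rough per-worker concurrency for a queue.

    The factor of 4 is an arbitrary assumption, not an Airflow rule.
    """
    if queue == NETWORK_QUEUE:
        # Tasks mostly wait on I/O, so the worker can run more tasks than cores.
        return cores * 4
    # CPU-bound tasks: roughly one task per core.
    return cores


if __name__ == "__main__":
    cores = os.cpu_count() or 1
    for kind in ("http_call", "transform"):
        queue = queue_for(kind)
        print(kind, "->", queue, "concurrency", suggested_concurrency(queue, cores))
```

In a real DAG, the value returned by `queue_for` would be passed as the
operator's `queue` argument, and the worker meant for those tasks would be
started so that it listens on that queue with the higher concurrency.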

Airflow's setup can be quite flexible, but I believe there are some
best practices, especially in organisations where scalability is
essential.

Thanks in advance for sharing!


Best regards,
XD