Re: Making Airflow Fault-Tolerant when running Airflow on Kubernetes


Hi Kevin,

Have you looked into the KubernetesExecutor? We achieve fault tolerance
by using the Kubernetes resourceVersion to ensure that all state is
reproducible.
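
To illustrate the general idea, here is a minimal sketch of resuming a
watch from a persisted resourceVersion. It is not the executor's actual
code: fake_watch, PodWatcher and the dict-based store are hypothetical
stand-ins for the Kubernetes API server, the watcher and durable storage.
Real resourceVersions are opaque strings that only the server interprets,
so the integer comparison lives in the fake server only.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Event = Tuple[str, str, str]  # (resourceVersion, pod name, pod phase)


def fake_watch(log: List[Event], resource_version: str) -> Iterator[Event]:
    """Hypothetical API server: yield only events newer than resource_version."""
    for event in log:
        if int(event[0]) > int(resource_version):
            yield event


class PodWatcher:
    """Tracks pod phases and checkpoints the last resourceVersion it handled."""

    def __init__(self, store: Dict[str, str]) -> None:
        self.store = store  # assumption: stand-in for durable storage
        self.phases: Dict[str, str] = {}

    def run(self, log: List[Event], limit: Optional[int] = None) -> None:
        # Resume from the checkpoint, or from the beginning if none exists.
        last_seen = self.store.get("resource_version", "0")
        events = fake_watch(log, last_seen)
        for count, (version, pod, phase) in enumerate(events, start=1):
            self.phases[pod] = phase
            # The client stores the version as-is and hands it back on restart.
            self.store["resource_version"] = version
            if limit is not None and count >= limit:
                return  # simulate the watcher dying mid-stream


log = [("1", "task-a", "Pending"), ("2", "task-a", "Running"),
       ("3", "task-a", "Succeeded")]
store: Dict[str, str] = {}

first = PodWatcher(store)
first.run(log, limit=2)   # handles two events, then "crashes"

second = PodWatcher(store)
second.run(log)           # resumes after the checkpoint, sees only the third event
```

Because the checkpoint outlives the watcher, a restarted process picks up
where the previous one stopped instead of replaying or missing events.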

On Wed, Sep 12, 2018 at 1:08 PM Kevin Lam <kevin@xxxxxxxxxxxxxxx> wrote:

> Hi all,
>
> We currently run Airflow as a Deployment in a Kubernetes cluster. We also
> use a variant of KubernetesOperator to run our DAGs.
>
> We are investigating how best to make Airflow fault-tolerant, in part
> because we are looking into using preemptible VMs [1]. *Has there been much
> discussion about how to deploy Airflow in a fault-tolerant way? Are
> there any best practices? Ideally we'd like our Kubernetes-hosted Airflow
> to support rolling updates for Docker image updates and also recover from
> components (worker, scheduler, web) going down temporarily, including when
> DAGs are in flight.*
>
> Any advice, ideas and/or feedback appreciated!
>
> [1] https://cloud.google.com/kubernetes-engine/docs/how-to/preemptible-vms
>