Re: Making Airflow Fault-Tolerant when running Airflow on Kubernetes
Thanks for the reply!
No, we haven't looked too deeply into it. Can you elaborate a bit on how
that works? With the KubernetesExecutor, if a DAG is in flight and part of
Airflow goes down, will it be able to recover? How do Airflow workers
reconnect to Pods that were in flight?
On Wed, Sep 12, 2018 at 4:59 PM Daniel Imberman <daniel.imberman@xxxxxxxxx> wrote:
> Hi Kevin,
> Have you looked into the KubernetesExecutor? We achieve fault tolerance
> using the Kubernetes resourceVersion to ensure that all state is
> On Wed, Sep 12, 2018 at 1:08 PM Kevin Lam <kevin@xxxxxxxxxxxxxxx> wrote:
> > Hi all,
> > We currently run Airflow as a Deployment in a Kubernetes cluster. We also
> > use a variant of KubernetesOperator to run our DAGs.
> > We are investigating how to best make Airflow fault-tolerant, in part
> > due to investigating the use of preemptible VMs. *Has there been much
> > discussion about how to deploy Airflow in a fault-tolerant way? Are
> > there any best practices? Ideally we'd like our Kubernetes-hosted Airflow
> > to support rolling updates for Docker image updates and also recover from
> > components (worker, scheduler, web) going down temporarily, including
> > while DAGs are in flight.*
> > Any advice, ideas and/or feedback appreciated!
> >