Re: Airflow - High Availability and Scale Up vs Scale Out
Once you solve DAG deployments and container orchestration, the
CeleryExecutor becomes more interesting. We solve DAG deployments by
putting the DAG code into the container at build time and trigger image
updates on our Kubernetes cluster via webhooks with a private Docker
registry. We are currently using the CeleryExecutor to scale out rather than up, but
have begun to explore the KubernetesExecutor to further simplify our stack.
It seems like you would want to go the route of separating concerns as well
if you want to move towards HA. More generally, though, I don't think
HA can be achieved with the current scheduler architecture, which requires
that only one scheduler run at a time. The next best thing is to put the
scheduler under container orchestration that will restart it immediately
on failure, at which point it will continue to schedule work where it
left off.
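
As a rough illustration of the restart-on-failure idea, here is a minimal Python sketch of a supervisor loop. It is not how Kubernetes or any other orchestrator implements restarts; `run_with_restarts`, `max_restarts` and `delay_seconds` are hypothetical names chosen for this example.

```python
import subprocess
import sys
import time


def run_with_restarts(cmd, max_restarts=3, delay_seconds=0.0):
    """Run cmd and restart it whenever it exits with a non-zero status.

    Returns the number of restarts that were needed before a clean exit.
    Raises RuntimeError once the process has been restarted max_restarts
    times and has failed again (hypothetical policy for this sketch).
    """
    restarts = 0
    while True:
        result = subprocess.run(cmd)
        if result.returncode == 0:
            return restarts  # clean exit, nothing left to supervise
        if restarts >= max_restarts:
            raise RuntimeError(
                "giving up after %d restarts (last exit code %d)"
                % (restarts, result.returncode)
            )
        restarts += 1
        time.sleep(delay_seconds)  # brief pause before restarting


if __name__ == "__main__":
    # Stand-in command that exits cleanly; a long-running service such as
    # the scheduler would be passed here instead.
    run_with_restarts([sys.executable, "-c", "print('ok')"])
```

In practice the command would be something like `["airflow", "scheduler"]`, and a real orchestrator adds health checks, backoff and rescheduling onto other hosts, which this sketch leaves out.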
- Andy Cooper
On Fri, Jun 8, 2018 at 7:24 AM Sam Sen <sxs@xxxxxxxxxxxxxxxx> wrote:
> We are facing this now. We have tried the CeleryExecutor and it adds more
> moving parts. While we have not thrown out this idea, we are going to give
> one big beefy box a try.
> To handle the HA side of things, we are putting the server in an
> auto-scaling group (we use AWS) with a min and max of 1 server. We deploy
> from an AMI that has airflow baked in and we point the DB config to an RDS
> using service discovery (consul).
> As for the dag code, we can either bake it into the AMI as well or install
> it on bootup. We haven't decided what to do for this but either way, we
> realize it could take a few minutes to fully recover in the event of a
> The other option is to have a standby server if using celery isn't ideal.
> With that, I have tried using HashiCorp Nomad to handle the services. In my
> limited trial, it did what we wanted but we need more time to test.
> On Fri, Jun 8, 2018, 4:23 AM Naik Kaxil <k.naik@xxxxxxxxx> wrote:
> > Hi guys,
> > I have 2 specific questions for the guys using Airflow in production:
> > 1. How have you achieved high availability? What does the architecture
> > look like? Do you replicate the master node as well?
> > 2. Scale Up vs Scale Out?
> > 1. What is the preferred approach you take? 1 beefy Airflow VM with
> > Worker, Scheduler and Webserver using the LocalExecutor, or a cluster of
> > multiple workers using the CeleryExecutor?
> > I think this thread should help others with similar questions as well.
> > Regards,
> > Kaxil
> > Kaxil Naik
> > Data Reply
> > 2nd Floor, Nova South
> > 160 Victoria Street, Westminster
> > London SW1E 5LB - UK
> > phone: +44 (0)20 7730 6000
> > k.naik@xxxxxxxxx
> > www.reply.com