Re: Airflow - High Availability and Scale Up vs Scale Out


Wouldn't you want immutable containers? In that case, baking the code into
the container image would be the better approach.

On Sun, Jun 10, 2018, 9:53 AM Arash Soheili <tonyarash@xxxxxxxxx> wrote:

> We are just starting out, but our setup is two EC2 instances: one runs the
> web server and scheduler, and the other runs multiple workers. Both connect
> to an RDS database and to Redis on AWS ElastiCache for the Celery
> connection.
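
As a rough illustration of the layout described above (not the poster's
actual configuration), the sketch below builds the environment-variable
overrides that the web server, scheduler, and worker containers would share.
It relies on Airflow's AIRFLOW__{SECTION}__{KEY} override convention. The
hostnames, credentials, the choice of PostgreSQL on RDS, and the function
name are all assumptions, and the exact option names vary by Airflow version
(for example, older releases call the result backend option
celery_result_backend).

```python
# Sketch only: every host, credential, and database name here is a placeholder.
# Assumes PostgreSQL on RDS; the thread does not say which engine is used.

def celery_executor_env(db_host, redis_host, db_user="airflow",
                        db_password="changeme", db_name="airflow"):
    """Return env vars shared by the webserver, scheduler, and workers."""
    db_url = f"postgresql://{db_user}:{db_password}@{db_host}:5432/{db_name}"
    return {
        # Distribute tasks to Celery workers instead of running them locally.
        "AIRFLOW__CORE__EXECUTOR": "CeleryExecutor",
        # Metadata database (the RDS instance both EC2 hosts connect to).
        "AIRFLOW__CORE__SQL_ALCHEMY_CONN": db_url,
        # Redis (ElastiCache) as the Celery message broker.
        "AIRFLOW__CELERY__BROKER_URL": f"redis://{redis_host}:6379/0",
        # Task state stored back in the same database via Celery's db+ scheme.
        "AIRFLOW__CELERY__RESULT_BACKEND": f"db+{db_url}",
    }


if __name__ == "__main__":
    env = celery_executor_env("db.example.internal", "redis.example.internal")
    for key, value in sorted(env.items()):
        print(f"{key}={value}")
```

Passing the same variables to every container keeps the two hosts pointed at
one database and one broker, which is what lets the workers on the second
instance pick up tasks queued by the scheduler on the first.
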
>
> All four services run in containers under systemd. We deploy with
> CodeDeploy and sync the code by mapping volumes from the local filesystem
> into the container. We are not yet heavy users of Airflow, so I can't speak
> to performance and scaling up just yet.
>
> In general, I think an AMI with baked-in code can be brittle and hard to
> maintain and update. Containers are the way to go, since you can still bake
> the code into the image if you want. We have chosen not to do that and
> instead rely on volume mapping to get the latest code into the container.
> This is easier because you don't need to keep creating new images.
>
> Arash
>
> On Sat, Jun 9, 2018 at 9:47 AM Naik Kaxil <k.naik@xxxxxxxxx> wrote:
>
> > Let us know your findings after trying the beefy box approach.
> >
> > On 08/06/2018, 12:24, "Sam Sen" <sxs@xxxxxxxxxxxxxxxx> wrote:
> >
> >     We are facing this now. We have tried the CeleryExecutor, and it adds
> >     more moving parts. While we have not thrown out this idea, we are
> >     going to give one big beefy box a try.
> >
> >     To handle the HA side of things, we are putting the server in an
> >     auto-scaling group (we use AWS) with a min and max of 1 server. We
> >     deploy from an AMI that has Airflow baked in, and we point the DB
> >     config to an RDS instance using service discovery (Consul).
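
A minimal sketch of the "min and max of 1" idea above: the helper below only
assembles the parameters for a single-instance auto-scaling group, so a failed
server gets replaced from the AMI rather than scaled out. The group name,
launch configuration name, subnet IDs, and the helper itself are hypothetical;
the keys follow the boto3 create_auto_scaling_group call, which is not invoked
here.

```python
# Sketch only: names and subnet IDs are placeholders, not the poster's setup.

def single_instance_asg_params(name, launch_config, subnet_ids):
    """Build kwargs for a self-healing, one-server auto-scaling group."""
    return {
        "AutoScalingGroupName": name,
        # Launch configuration that references the AMI with Airflow baked in.
        "LaunchConfigurationName": launch_config,
        # Min == max == 1: never scale out, but always replace a dead server.
        "MinSize": 1,
        "MaxSize": 1,
        "DesiredCapacity": 1,
        # Several subnets let the replacement start in another availability zone.
        "VPCZoneIdentifier": ",".join(subnet_ids),
        "HealthCheckType": "EC2",
        # Seconds to wait after boot before health checks count (assumed value).
        "HealthCheckGracePeriod": 300,
    }


if __name__ == "__main__":
    params = single_instance_asg_params(
        "airflow-single", "airflow-ami-lc", ["subnet-aaaa", "subnet-bbbb"]
    )
    # With AWS credentials configured, this could be passed along as:
    #   boto3.client("autoscaling").create_auto_scaling_group(**params)
    print(params)
```

This gives recovery rather than true high availability: there is downtime
while the replacement instance boots and fetches its configuration.
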
> >
> >     As for the DAG code, we can either bake it into the AMI as well or
> >     install it on bootup. We haven't decided which to do, but either way
> >     we realize it could take a few minutes to fully recover in the event
> >     of a catastrophe.
> >
> >     The other option is to have a standby server if using Celery isn't
> >     ideal. With that, I have tried using HashiCorp Nomad to handle the
> >     services. In my limited trial, it did what we wanted, but we need more
> >     time to test.
> >
> >     On Fri, Jun 8, 2018, 4:23 AM Naik Kaxil <k.naik@xxxxxxxxx> wrote:
> >
> >     > Hi guys,
> >     >
> >     >
> >     >
> >     > I have two specific questions for the guys using Airflow in
> >     > production:
> >     >
> >     >
> >     >
> >     >    1. How have you achieved high availability? What does the
> >     >    architecture look like? Do you replicate the master node as well?
> >     >    2. Scale up vs. scale out?
> >     >       1. What is your preferred approach: one beefy Airflow VM with
> >     >       worker, scheduler, and webserver using the LocalExecutor, or a
> >     >       cluster with multiple workers using the CeleryExecutor?
> >     >
> >     >
> >     >
> >     > I think this thread should help others with similar questions as
> >     > well.
> >     >
> >     >
> >     >
> >     >
> >     >
> >     > Regards,
> >     >
> >     > Kaxil
> >     >
> >     >
> >     >
> >     >
> >     > Kaxil Naik
> >     >
> >     > Data Reply
> >     > 2nd Floor, Nova South
> >     > 160 Victoria Street, Westminster
> >     > London SW1E 5LB - UK
> >     > phone: +44 (0)20 7730 6000
> >     > k.naik@xxxxxxxxx
> >     > www.reply.com
> >     >
>