
Re: Will redeploying webserver and scheduler in Kubernetes cluster kill running tasks


Interesting, Greg. Do you know if using pgbouncer would allow you to have
more than 100 running k8s executor tasks at one time if, e.g., there is a
100-connection limit on the GCP instance?
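(For context, the reason pooling can help here: in transaction pooling mode, pgbouncer multiplexes many client connections over a much smaller number of server connections, so the number of worker pods is no longer bounded by the Postgres connection limit. A minimal sketch of such a config; all hosts, names, and sizes below are illustrative, not taken from any particular chart:

```ini
; pgbouncer.ini (illustrative values only)
[databases]
airflow = host=10.0.0.5 port=5432 dbname=airflow

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
pool_mode = transaction      ; release server conn after each transaction
max_client_conn = 1000       ; many worker pods may connect at once
default_pool_size = 20       ; but only ~20 real connections to postgres
```

With a setup like this, 1000 worker pods could connect to pgbouncer while Postgres itself only ever sees ~20 connections, staying well under a 100-connection instance limit.)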

On Thu, Aug 30, 2018 at 6:39 PM Greg Neiheisel <greg@xxxxxxxxxxxxx> wrote:

> Good point Eamon, maxing connections out is definitely something to look
> out for. We recently added pgbouncer to our helm charts to pool connections
> to the database for all the different airflow processes. Here's our chart
> for reference -
>
> https://github.com/astronomerio/helm.astronomer.io/tree/master/charts/airflow
>
> On Thu, Aug 30, 2018 at 1:17 PM Kyle Hamlin <hamlin.kn@xxxxxxxxx> wrote:
>
> > Thanks for your responses! Glad to hear that tasks can run independently
> > if something happens.
> >
> > On Thu, Aug 30, 2018 at 1:13 PM Eamon Keane <eamon.keane1@xxxxxxxxx>
> > wrote:
> >
> > > Adding to Greg's point, if you're using the k8s executor and for some
> > > reason the k8s executor worker pod fails to launch within 120 seconds
> > > (e.g. pending due to scaling up a new node), this counts as a task
> > > failure. Also, if the k8s executor pod has already launched a pod
> > > operator but is killed (e.g. manually or due to a node upgrade), the
> > > pod operator it launched is not killed and runs to completion, so if
> > > you are using retries, you need to ensure idempotency. Per my
> > > understanding, the worker pods update the db, with each requiring a
> > > separate connection to the db; this can tax your connection budget
> > > (100-300 for small Postgres instances on GCP or AWS).
> > >
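(On the idempotency point above: since a killed executor pod can leave its pod operator running to completion while a retry starts the same work again, the retried task must produce the same result as the first attempt. A minimal sketch of the pattern, not Airflow API; all names and paths here are made up for illustration: key every write by a deterministic identifier such as the execution date, and overwrite rather than append.

```python
# Sketch: making a retried task idempotent (illustrative, not Airflow code).
# Idea: key each write by a deterministic id (the execution date) and fully
# overwrite, so a retry after a killed pod yields the same output instead
# of a duplicate or a partial append.
import json
from pathlib import Path

def write_partition(base_dir: str, execution_date: str, rows: list) -> Path:
    """Write rows to a partition named by execution_date, overwriting any
    partial output left behind by a previous (killed) attempt."""
    out = Path(base_dir) / f"date={execution_date}" / "part-0000.json"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(rows))  # full overwrite, never append
    return out

# Running the "task" twice (i.e. a retry) leaves exactly one copy of the data.
p1 = write_partition("/tmp/demo", "2018-08-30", [{"id": 1}])
p2 = write_partition("/tmp/demo", "2018-08-30", [{"id": 1}])
assert p1 == p2
```

Appending to a shared file or issuing non-keyed INSERTs would break under the double-execution scenario described above.)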
> > > On Thu, Aug 30, 2018 at 6:04 PM Greg Neiheisel <greg@xxxxxxxxxxxxx>
> > wrote:
> > >
> > > > Hey Kyle, the task pods will continue to run even if you reboot the
> > > > scheduler and webserver, and the status does get updated in the
> > > > airflow db, which is great.
> > > >
> > > > I know the scheduler subscribes to the Kubernetes watch API to get
> > > > an event stream of pods completing, and it keeps a checkpoint so it
> > > > can resubscribe when it comes back up.
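(The watch-plus-checkpoint pattern described above can be sketched without any cluster at all. This is a toy simulation, not the Kubernetes client API: the consumer persists the last resource version it saw, so after a restart it resubscribes from that point and only receives the events it missed.

```python
# Toy simulation of the watch/checkpoint pattern (not the k8s client API).
# Each event carries a monotonically increasing resource version; the
# consumer records the last version seen so it can resume after a crash.
EVENTS = [(1, "pod-a ADDED"), (2, "pod-a MODIFIED"), (3, "pod-a DELETED")]

def watch(since_version=0):
    """Yield (resource_version, event) pairs newer than the checkpoint."""
    for version, event in EVENTS:
        if version > since_version:
            yield version, event

# First subscription: consume two events, then "crash".
checkpoint = 0
seen = []
for version, event in watch(checkpoint):
    seen.append(event)
    checkpoint = version          # persist the checkpoint as we go
    if len(seen) == 2:
        break                     # scheduler goes down here

# After restart: resubscribe from the checkpoint; only the missed
# event is delivered — nothing is replayed and nothing is lost.
missed = [event for _, event in watch(checkpoint)]
assert missed == ["pod-a DELETED"]
```

This is why rebooting the scheduler doesn't lose pod completions: the events that fired while it was down are still delivered once it resubscribes from its checkpoint.)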
> > > >
> > > > I forget if the worker pods update the db or if the scheduler is
> > > > doing that, but it should work out.
> > > >
> > > > On Thu, Aug 30, 2018, 9:54 AM Kyle Hamlin <hamlin.kn@xxxxxxxxx>
> wrote:
> > > >
> > > > > gentle bump
> > > > >
> > > > > On Wed, Aug 22, 2018 at 5:12 PM Kyle Hamlin <hamlin.kn@xxxxxxxxx>
> > > wrote:
> > > > >
> > > > > > I'm about to make the switch to Kubernetes with Airflow, but am
> > > > > > wondering what happens when my CI/CD pipeline redeploys the
> > > > > > webserver and scheduler and there are still long-running tasks
> > > > > > (pods). My intuition is that since the database holds all state,
> > > > > > the tasks are in charge of updating their own state, and the UI
> > > > > > only renders what it sees in the database, this is not so much
> > > > > > of a problem. To be sure, however, here are my questions:
> > > > > >
> > > > > > Will task pods continue to run?
> > > > > > Can task pods continue to poll the external system they are
> > > > > > running tasks on while being "headless"?
> > > > > > Can the task pods change/update state in the database while
> > > > > > being "headless"?
> > > > > > Will the UI/Scheduler still be aware of the tasks (pods) once
> > > > > > they are live again?
> > > > > >
> > > > > > Is there anything else that might cause issues when deploying
> > > > > > while tasks (pods) are running that I'm not thinking of here?
> > > > > >
> > > > > > Kyle Hamlin
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Kyle Hamlin
> > > > >
> > > >
> > >
> >
> >
> > --
> > Kyle Hamlin
> >
>
>
> --
> *Greg Neiheisel* / CTO Astronomer.io
>