OSDir


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: The need for LocalTaskJob


Let me confirm I'm understanding this right, we're talking specifically
about the CeleryExecutor not starting and `airflow run` (not --raw)
command, and fire up a LocalTaskJob instead? Then we'd still have the
worker fire up the `airflow run --raw` command?

Seems reasonable. One thing to keep in mind is the fact that shelling out
guarantees no `sys.module` caching, which is a real issue for slowly
changing DAG definitions. That's the reason why we'd have to reboot the
scheduler periodically before it used sub-processes to evaluate DAGs. Any
code that needs to evaluate a DAG should probably be done in a subprocess.

Shelling out also allows for doing things like unix impersonation and
applying CGROUPS. This currently happens between `airflow run` and `airflow
run --raw`. The parent process also does heartbeat and listen for external
kill signal (kill pills).

I think what we want is smarter executors and only one level of bash
command: the `airflow run --raw`, and ideally the system that fires this up
is not Airflow itself, and cannot be DAG-aware (or it will need to get
restarted to flush the cache).

To me that really brings up the whole question of what should be handled by
the Executor, and what belongs in core Airflow. The Executor needs to do
more, and Airflow core less.

When you think about how this should all work on Kubernetes, it looks
something like this:
* the scheduler, through KubeExecutor, calls the k8s API, tells it to fire
up and Airflow task
* container boots up and starts an `airflow run --raw` command
* k8s handles heartbeats, monitors tasks, knows how to kill a running task
* the scheduler process (call it supervisor), talks with k8s through
KubeExecutor
and handles zombie cleanup and sending kill pills

Now because Celery doesn't offer as many guarantees it gets a bit more
tricky. Is there even a way to send a kill pill through Celery? Are there
other ways than using a parent process to accomplish this?

At a higher level, it seems like we need to move more logic from core
Airflow into the executors. For instance, the heartbeat construct should
probably be 100% handled by the executor, and not an assumption in the core
code base.

I think I drifted a bit, hopefully that's still helpful.

Max

On Sat, Aug 4, 2018 at 6:42 AM Dan Davydov <ddavydov@xxxxxxxxxxx.invalid>
wrote:

> Alex (cc'd) brought this up to me about this a while ago too, and I agreed
> with him. It is definitely something we should do, I remember there were
> some things that were a bit tricky about removing the intermediate process
> and would be a bit of work to fix (something about the tasks needing to
> heartbeat the parent process maybe?).
>
> TLDR: No blockers from me, just might be a bit of work to implement.
>
>
> On Sat, Aug 4, 2018 at 9:15 AM Bolke de Bruin <bdbruin@xxxxxxxxx> wrote:
>
> > Hi Max, Dan et al,
> >
> > Currently, when a scheduled task runs this happens in three steps:
> >
> > 1. Worker
> > 2. LocalTaskJob
> > 3. Raw task instance
> >
> > It uses (by default) 5 (!) different processes:
> >
> > 1. Worker
> > 2. Bash + Airflow
> > 3. Bash + Airflow
> >
> > I think we can merge worker and LocalTaskJob as the latter seems exist
> > only to track a particular task. This can be done within the worker
> without
> > side effects. Next to thatI think we can limit the amount of (airflow)
> > processes to 2 if we remove the bash dependency. I don’t see any reason
> to
> > depend on bash.
> >
> > Can you guys shed some light on what the thoughts were around those
> > choices? Am I missing anything on why they should exist?
> >
> > Cheers
> > Bolke
> >
> > Verstuurd vanaf mijn iPad
>