Re: airflow.exceptions.AirflowException dag_id not found
One more thing: this can also happen if one of your workers is missing a
dependency required by a specific DAG. For example, you read configuration
from Zookeeper in the DAG file, but one worker is missing the Zookeeper
client Python lib while the scheduler has it. You can imagine the scheduler
sending the job over to that worker, and the worker being unable to
interpret the DAG.
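To make that scenario concrete, here is an illustrative sketch (not from the thread): a DAG file that reads pipeline configuration from Zookeeper at parse time. `kazoo` is assumed here as the Zookeeper client library, and `load_pipeline_config` is a hypothetical helper; the point is that guarding the import makes a missing dependency fail loudly instead of leaving the worker unable to interpret the DAG.

```python
# A DAG file like this imports fine on the scheduler but fails on any
# worker that lacks the client lib. Guarding the import surfaces the
# mismatch as a clear error message.
try:
    from kazoo.client import KazooClient  # may be absent on some workers
    HAVE_ZK = True
except ImportError:
    HAVE_ZK = False

def load_pipeline_config(hosts="zk:2181"):
    """Fetch pipeline config, raising a descriptive error when the
    Zookeeper client library is not installed on this host."""
    if not HAVE_ZK:
        raise RuntimeError(
            "kazoo is not installed on this host; install the same "
            "dependencies on the scheduler and every worker"
        )
    return KazooClient(hosts=hosts)
```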
On Mon, Jun 11, 2018 at 3:22 PM Stephane Bonneaud <stephane@xxxxxxxxxxxxxxx>
> Thank you for the quick response, that is very helpful and great material
> for my investigations!
> Thanks again,
> > On Jun 11, 2018, at 3:11 PM, Maxime Beauchemin <
> maximebeauchemin@xxxxxxxxx> wrote:
> > DagBag import timeouts happen when people do more than just "configuration
> > as code" in their module scope (say, doing actual compute in module scope,
> > which is a no-no). They may also happen if you read things from flimsy
> > external systems that may introduce delays. Say you read pipeline
> > configuration from Zookeeper, a database, or a network drive, and
> > somehow that operation is timing out.
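The pattern Max describes can be sketched as follows; the names (`read_config_from_zookeeper`, `run_pipeline`) are illustrative, not a real API. The idea is that module scope must stay cheap so the DagBag import finishes within `dagbag_import_timeout`:

```python
import time

# BAD (shown commented out): work like this at module scope runs on every
# DagBag parse, on the scheduler and every worker, and can blow past
# dagbag_import_timeout:
#
#   config = read_config_from_zookeeper()   # network call at import time
#   results = crunch_numbers()              # actual compute at import time

# GOOD: defer slow or flaky work into the callable a task executes, so the
# file itself imports in milliseconds.
def run_pipeline(**context):
    time.sleep(0)  # placeholder for the real work, done at run time
    return {"status": "ok"}
```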
> > Also, with Airflow (at the moment) you are responsible for synchronizing
> > the pipeline definitions (DAGS_FOLDER) on all machines across the cluster.
> > If they are not in sync, you'll have problems with symptoms that may look
> > like "dag_id not found". That happens when the scheduler is aware of DAGs
> > that workers may not be aware of.
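One way to check for the out-of-sync DAGS_FOLDER situation, sketched here as an assumption rather than anything Airflow ships, is to fingerprint the folder's contents on each machine and compare a single digest per host:

```python
import hashlib
import os

def dags_folder_digest(dags_folder):
    """Stable fingerprint of a DAGS_FOLDER: hash relative paths and file
    contents so two machines can compare one hex digest to detect drift."""
    h = hashlib.sha256()
    for root, _dirs, files in sorted(os.walk(dags_folder)):
        for name in sorted(files):
            path = os.path.join(root, name)
            # Use paths relative to the folder so identical trees match
            # even when mounted at different absolute locations.
            h.update(os.path.relpath(path, dags_folder).encode())
            with open(path, "rb") as f:
                h.update(f.read())
    return h.hexdigest()
```

Running this on the scheduler and each worker and comparing the output would reveal whether "dag_id not found" lines up with a stale copy of the folder.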
> > Max
> > On Mon, Jun 11, 2018 at 12:42 PM Stephane Bonneaud <
> > wrote:
> >> Hi there,
> >> We’re using Airflow in our startup and it’s been great in many ways,
> >> thanks for the work you guys are doing!
> >> Unfortunately, we’re hitting a bunch of issues with ops timing out and
> >> DAGs failing for unclear reasons, with no logs or with the following error:
> >> "airflow.exceptions.AirflowException: dag_id could not be found”. This
> >> seems to happen when enough DAGs are running at the same time, though it
> >> can also happen more rarely here and there. The best way to reproduce
> >> the error with our setup is to run enough DAGs at once. Most of the time,
> >> clearing the DAG run or the ops that have failed and letting the DAG
> >> re-run is enough to fix the problem.
> >> I found resources pointing to the dagbag_import_timeout setting.
> >> I did play with that parameter, and with other parameters as well. It
> >> does seem that they help, i.e., I can run more DAGs at once, but:
> >> (1) if I run enough DAGs at once, I still see ops and DAGs
> >> failing, so the problem is not fixed;
> >> (2) more importantly, I don’t fully understand the problem. I have
> >> some ideas on what is happening, but maybe I’m totally wrong?
> >> Any recommendations on how I should investigate that?
> >> Thank you very much!
> >> Have a nice rest of the day,
> >> Stéphane
> >> http://stephanebonneaud.com