Re: Re: How to know the DAG is starting to run
A dedicated task at the front could handle this pipeline-level environment setup.
# About the "pipeline environment setup"
In my case, I am trying to expose some pipeline variables through XCom so that tasks can read them while running, and I want this "expose some pipeline variables" step to be done only once, at the pipeline level.
A task could do it, but something like this would be better built into the pipeline.
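Below is a minimal plain-Python model of the root setup task described above. It runs without Airflow installed. The dict passed around stands in for Airflow's XCom storage. The names `setup_pipeline`, `make_worker` and `run_dag` are hypothetical, and so are the `run_env` key and its values. In a real DAG, the setup task would call `ti.xcom_push(key=..., value=...)` and downstream tasks would call `ti.xcom_pull(task_ids=..., key=...)`.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for XCom storage: (task_id, key) -> value.
XComStore = Dict[Tuple[str, str], object]


def setup_pipeline(xcom: XComStore) -> None:
    # Root task: compute the pipeline-level variables once and publish them.
    xcom[("setup_pipeline", "run_env")] = {"data_dir": "/tmp/run", "batch": 42}


def make_worker(name: str, log: List[str]) -> Callable[[XComStore], None]:
    def worker(xcom: XComStore) -> None:
        # Downstream task: read what the setup task published instead of recomputing it.
        env = xcom[("setup_pipeline", "run_env")]
        log.append(f"{name} uses {env['data_dir']}")
    return worker


def run_dag(setup: Callable[[XComStore], None],
            workers: List[Callable[[XComStore], None]],
            xcom: XComStore) -> None:
    # Every worker depends on the setup task, so setup runs first and only once per run.
    setup(xcom)
    for worker in workers:
        worker(xcom)


log: List[str] = []
store: XComStore = {}
run_dag(setup_pipeline, [make_worker("extract", log), make_worker("load", log)], store)
print(log)  # ['extract uses /tmp/run', 'load uses /tmp/run']
```

The variables are computed in exactly one place, and every consumer reads the same published values.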
From: Jiening Wen <jieningwen@xxxxxxxxxxx>
Sent: 14 May 2018 11:13
Subject: RE: Re: How to know the DAG is starting to run
I would question whether hooking into DAG.run is "more graceful" than having a root task node that does the pipeline environment setup.
IMO it'd be easier and cleaner to catch setup errors when it's done in a separate task.
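To illustrate the error-handling point, here is a small plain-Python sketch in which the setup step is its own task. All names are hypothetical, not Airflow APIs. If setup raises, the failure is recorded against that one task. Every downstream task is then marked `upstream_failed`, loosely mirroring the Airflow task state of the same name.

```python
from typing import Callable, Dict


def run_with_setup(setup: Callable[[], None],
                   downstream: Dict[str, Callable[[], None]]) -> Dict[str, str]:
    """Run `setup` first; downstream tasks run only if it succeeds.

    Returns a task_id -> state mapping (state names loosely modeled on Airflow's).
    """
    states: Dict[str, str] = {}
    try:
        setup()
        states["setup"] = "success"
    except Exception as exc:
        # The setup error is attributed to a single, clearly named task.
        states["setup"] = f"failed: {exc}"
        for task_id in downstream:
            states[task_id] = "upstream_failed"
        return states
    for task_id, fn in downstream.items():
        try:
            fn()
            states[task_id] = "success"
        except Exception as exc:
            states[task_id] = f"failed: {exc}"
    return states


def broken_setup() -> None:
    # Hypothetical setup failure used for the demonstration.
    raise RuntimeError("config bucket not reachable")


print(run_with_setup(broken_setup, {"extract": lambda: None, "load": lambda: None}))
# {'setup': 'failed: config bucket not reachable', 'extract': 'upstream_failed', 'load': 'upstream_failed'}
```

If the same setup were buried inside a `DAG.run` override, the failure would surface outside any task. That makes it harder to see in the per-task views.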
From: Song Liu [mailto:songliu@xxxxxxxxxxx]
Sent: Saturday 12 May 2018 9:06 AM
Subject: Re: Re: How to know the DAG is starting to run [External]
Yes, I want to know when a DagRun is created.
From: crispy16@xxxxxxxxx <crispy16@xxxxxxxxx> on behalf of Chris Palmer <chris@xxxxxxxxxxxx>
Sent: 11 May 2018 15:46
Subject: Re: Re: How to know the DAG is starting to run
It's not even clear to me what it means for a DAG to start running. The creation of a DagRun for a specific execution date is completely independent of the scheduling of any TaskInstances for that DagRun. There could be a significant delay between those two events, either deliberately encoded into the DAG or due to resource constraints.
What event are you actually interested in knowing about? The creation of a DagRun? The starting of any task for a DagRun? Something else?
Maybe if you provided more details on exactly what "pipeline environment setup" you are trying to do, it would help others understand the problem you are trying to solve.
On Fri, May 11, 2018 at 10:59 AM, Song Liu <songliu@xxxxxxxxxxx> wrote:
> Overriding "DAG.run" sounds like a workaround: if it is running the
> first operation of the DAG, then do some setup, etc.
> From: Victor Noagbodji <vnoagbodji@xxxxxxxxxxxxxxxxxx>
> Sent: 11 May 2018 12:50
> To: dev@xxxxxxxxxxxxxxxxxxxxxxxxxxxx
> Subject: Re: How to know the DAG is starting to run
> I don't know if Airflow has a concept of DAG-level events or callbacks
> (operators do have callbacks, though). You might get away with
> subclassing the DAG class or using a class decorator.
> The source suggests that ".run()" is the method you want to override.
> You may want to call the original "super().run()" and then do what you
> need to do afterwards.
> Let's see if that works for you.
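Here is a minimal sketch of the subclassing idea above. A stand-in `BaseDAG` class lets the example run without Airflow installed; in a real deployment the base would be `airflow.DAG`. The hook names `before_run` and `after_run` are hypothetical. One caveat is worth checking before relying on this: the override only fires when something actually calls `run()` on the DAG object. Confirm that the code path you care about goes through it, for example scheduler-created runs versus backfills.

```python
from typing import Any, List


class BaseDAG:
    """Stand-in for airflow.DAG so this sketch runs without Airflow."""

    def __init__(self, dag_id: str) -> None:
        self.dag_id = dag_id
        self.events: List[str] = []

    def run(self, *args: Any, **kwargs: Any) -> None:
        self.events.append("tasks executed")


class SetupDAG(BaseDAG):
    """DAG subclass that wraps run() with pipeline-level hooks (hypothetical)."""

    def run(self, *args: Any, **kwargs: Any) -> None:
        self.before_run()
        try:
            super().run(*args, **kwargs)  # the original run(), unchanged
        finally:
            # after_run() executes even if the wrapped run() raises.
            self.after_run()

    def before_run(self) -> None:
        self.events.append("pipeline setup")

    def after_run(self) -> None:
        self.events.append("pipeline teardown")


dag = SetupDAG("example")
dag.run()
print(dag.events)  # ['pipeline setup', 'tasks executed', 'pipeline teardown']
```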
> > On May 11, 2018, at 8:26 AM, Song Liu <songliu@xxxxxxxxxxx> wrote:
> > Yes, I have thought about this approach, but a more elegant way would be
> > to do it in the DAG, since we don't want to add this "pipeline environment
> > setup" as a standalone operator; it should be done in the DAG more gracefully.
> > ________________________________
> > From: James Meickle <jmeickle@xxxxxxxxxxxxxx>
> > Sent: 11 May 2018 12:09
> > To: dev@xxxxxxxxxxxxxxxxxxxxxxxxxxxx
> > Subject: Re: How to know the DAG is starting to run
> > Song:
> > You can put an operator as the very first node in the DAG, and have
> > everything else in the DAG depend on it. For example, this is the
> > approach we use to only execute DAG tasks on stock market trading days.
> > -James M.
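Here is a plain-Python sketch of the gate-task pattern described above, under simplifying assumptions. "Trading day" is approximated as Monday to Friday; real exchange calendars also exclude holidays, which this ignores. The names `is_trading_day` and `run_gated` are hypothetical. In Airflow itself, `ShortCircuitOperator` can play this role: it skips downstream tasks when its callable returns a falsy value. The message does not say which operator James's team actually uses.

```python
import datetime
from typing import Callable, Dict, List


def is_trading_day(day: datetime.date) -> bool:
    # Simplification: weekdays only; exchange holidays are NOT handled here.
    return day.weekday() < 5  # Monday=0 ... Friday=4


def run_gated(day: datetime.date,
              tasks: Dict[str, Callable[[], None]]) -> Dict[str, str]:
    """Run the gate first; every other task depends on it."""
    if not is_trading_day(day):
        # Mirrors a short-circuit: downstream tasks are skipped, not failed.
        return {task_id: "skipped" for task_id in tasks}
    states: Dict[str, str] = {}
    for task_id, fn in tasks.items():
        fn()
        states[task_id] = "success"
    return states


ran: List[str] = []
tasks = {"fetch_prices": lambda: ran.append("fetch_prices")}
print(run_gated(datetime.date(2018, 5, 11), tasks))  # Friday: {'fetch_prices': 'success'}
print(run_gated(datetime.date(2018, 5, 12), tasks))  # Saturday: {'fetch_prices': 'skipped'}
```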
> > On Fri, May 11, 2018 at 3:57 AM, Song Liu <songliu@xxxxxxxxxxx> wrote:
> >> Hi,
> >> I have something that I want to be done only once, when the DAG is
> >> constructed, but it seems that the DAG is instantiated every time
> >> each of its operators runs.
> >> So is there a function in the DAG that tells us it is starting to run
> >> now?
> >> Thanks,
> >> Song
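Song's observation is consistent with how Airflow loads DAG files. The scheduler re-parses the file, and so does the process that runs each task, so module-level code executes many times rather than once per run. The sketch below models that in plain Python. `DAG_FILE_SOURCE` is a hypothetical DAG file whose top-level code has a side effect, and `parse_dag_file` (also hypothetical) simulates one parse by executing that source in a fresh namespace.

```python
from typing import Dict, List

# Hypothetical DAG file: the append stands in for any top-level side effect
# (an API call, a file write, a "setup" step) placed outside the tasks.
DAG_FILE_SOURCE = """
SIDE_EFFECTS.append("top-level setup ran")
dag_id = "example_dag"
"""


def parse_dag_file(source: str, side_effects: List[str]) -> Dict[str, object]:
    """Simulate one parse of a DAG file by executing it in a fresh namespace."""
    namespace: Dict[str, object] = {"SIDE_EFFECTS": side_effects}
    exec(source, namespace)
    return namespace


side_effects: List[str] = []
# Illustrative count only: one scheduler parse plus one per task process for a 3-task DAG.
for _ in range(4):
    parse_dag_file(DAG_FILE_SOURCE, side_effects)
print(len(side_effects))  # 4
```

This is why the replies above steer per-run setup into a task, such as a root setup task. Task code runs when the task executes, whereas top-level code runs whenever the file is parsed.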