Re: Convert Dag Run from Backfill to Scheduled?


Well I’ve gone ahead and run the UPDATE query now, so the scheduler is picking up tasks.

When I cleared the tasks, every DAG run that had a cleared task in it was set to running. Because I’d backfilled them all, they were all `backfill_` dag runs. Inspecting various tasks via `task_failed_deps` indicated the tasks had all their dependencies met. After running the update query, they’re all `scheduled__` dag runs.
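
For reference, the string rewrite that the UPDATE query applies to each `run_id` can be sketched in Python. This is only an illustration, assuming the run IDs look like `backfill_` followed by a 19-character ISO timestamp such as `2017-09-01T00:00:00`; the function name `to_scheduled_run_id` and the constants are hypothetical and not part of Airflow.

```python
# Hypothetical helper, not an Airflow API: mirrors
#   concat('scheduled__', substring(run_id, 10, 19))
# for a single run_id string.
BACKFILL_PREFIX = "backfill_"     # 9 characters, the prefix the query's LIKE clause targets
SCHEDULED_PREFIX = "scheduled__"
TIMESTAMP_LEN = 19                # assumed format, e.g. "2017-09-01T00:00:00"


def to_scheduled_run_id(run_id: str) -> str:
    """Return the scheduled-style run_id for a backfill run_id."""
    if not run_id.startswith(BACKFILL_PREFIX):
        # Rows that are not backfill runs are left alone, as in the WHERE clause.
        return run_id
    # SQL substring() is 1-indexed, so position 10 corresponds to Python index 9.
    start = len(BACKFILL_PREFIX)
    return SCHEDULED_PREFIX + run_id[start:start + TIMESTAMP_LEN]


if __name__ == "__main__":
    print(to_scheduled_run_id("backfill_2017-09-01T00:00:00"))
```

Running a check like this against a handful of real `run_id` values before touching the metadata database is a cheap way to confirm the offsets match the IDs actually stored there.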

On May 29, 2018, 5:02 PM -0700, Maxime Beauchemin <maximebeauchemin@xxxxxxxxx>, wrote:
> While this may work it's clearly not the prescribed way to do this.
> Clearing should just work.
>
> I'm trying to understand why the scheduler is not picking up the cleared
> task. Clearing should remove the task instance state and set the state of
> the related DAG Run to running so that the scheduler picks those up.
> Perhaps there's a conflict between the backfill and scheduler-related DAG
> Runs? Which DAG runs are set to running? The backfill or scheduler-related
> ones?
>
> Originally when I introduced DAG runs, backfill was operating without any
> consideration related to DAG runs (DAG runs were a scheduler-specific
> construct), later on Bolke added backfill-specific DAG runs and I'm not
> 100% sure how that works.
>
> Let's get to the bottom of this.
>
> Max
>
> On Fri, May 25, 2018 at 7:48 PM Ruiqin Yang <yrqls21@xxxxxxxxx> wrote:
>
> > If you are sure the update query targets the desired rows, the behavior
> > should be the same.
> >
> > Scott Halgrim <scott.halgrim@xxxxxxxxxx.invalid> wrote on Fri, May 25, 2018 at 4:23 PM:
> >
> > > So far no ill effects from:
> > >
> > > update dag_run
> > > set run_id = concat('scheduled__', substring(run_id, 10, 19))
> > > where dag_id = 'daily'
> > > and execution_date > '2017-08-31' and execution_date < '2018-01-11'
> > > and run_id like 'backfill_%'
> > > order by execution_date;
> > >
> > > > On May 25, 2018, 4:03 PM -0700, Scott Halgrim <scott.halgrim@xxxxxxxxxx>, wrote:
> > > > > Oh wow, that will work? Thanks! Is there any reason for me not to just
> > > > > run a mass UPDATE on those dag runs directly in the metadata database?
> > > >
> > > > > On May 25, 2018, 4:01 PM -0700, Ruiqin Yang <yrqls21@xxxxxxxxx>, wrote:
> > > > > > Airflow is not going to schedule backfill DAG runs; it identifies them by
> > > > > > the DAG run ID (which will start with 'backfill_'). If you want the
> > > > > > scheduler to schedule those tasks, you can click the DAG run and edit its
> > > > > > name back to 'scheduled__<something>'.
> > > > >
> > > > > Cheers,
> > > > > Kevin Y
> > > > >
> > > > > On Fri, May 25, 2018 at 3:53 PM, Scott Halgrim <
> > > > > scott.halgrim@xxxxxxxxxx.invalid> wrote:
> > > > >
> > > > > > > I’ve got four months of dag runs that were scheduled dag runs, then I
> > > > > > > backfilled them. And now when I clear a task from one of those, the dag
> > > > > > > run goes to “running,” but none of the tasks get scheduled (unless I
> > > > > > > manually backfill each of them).
> > > > > >
> > > > > > > What I really should have done here was just clear a mid-dag task as
> > > > > > > well as all downstream tasks for these dag runs, but, well, now I’m
> > > > > > > here and I’m wondering what the best way to fix this is.
> > > > > >
> > > > > > Thanks!
> > > > > >
> > > > > >