Re: Is `airflow backfill` dysfunctional?


@Jeremiah Lowin <jlowin@xxxxxxxxx> & @Bolke de Bruin <bdbruin@xxxxxxxxx> I
think you may have some context on why this may have changed at some point.
I'm assuming that when DagRun handling was added to the backfill logic, the
behavior just happened to change to what it is now.

Any opposition to moving back towards re-running failed tasks when starting
a backfill? I think it's the better behavior, though it's a change in
behavior that we should mention in UPDATING.md.

One of our goals is to make sure that a failed or killed backfill can be
restarted and just seamlessly pick up where it left off.
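
To make the intended restart behavior concrete, here is a minimal sketch in
plain Python. It does not use Airflow's actual API: the function name
`tasks_to_run`, the state strings, and the `recorded_states` mapping are all
hypothetical and for illustration only. The idea is that a restarted backfill
skips whatever already succeeded and re-runs everything else, including
failed tasks.

```python
from datetime import date

# Hypothetical state labels: plain strings, not Airflow objects.
SUCCESS = "success"
FAILED = "failed"


def tasks_to_run(task_ids, execution_dates, recorded_states):
    """Return the (task_id, execution_date) pairs a restarted backfill should run.

    recorded_states maps (task_id, execution_date) to a state string.
    A missing key means the task never ran for that date.
    """
    todo = []
    for execution_date in execution_dates:
        for task_id in task_ids:
            # Only a recorded success is skipped; failed or never-run
            # task instances are picked up again.
            if recorded_states.get((task_id, execution_date)) != SUCCESS:
                todo.append((task_id, execution_date))
    return todo


# Example: the first attempt was killed partway through day one.
day1, day2 = date(2018, 1, 1), date(2018, 1, 2)
recorded = {("extract", day1): SUCCESS, ("load", day1): FAILED}
remaining = tasks_to_run(["extract", "load"], [day1, day2], recorded)
```

Under these assumptions, `remaining` holds the failed `load` for day one plus
both tasks for day two, so the restart never repeats completed work.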

Max

On Tue, Jun 5, 2018 at 3:25 PM Tao Feng <fengtao04@xxxxxxxxx> wrote:

> After discussing with Max, we think it would be great if `airflow backfill`
> could automatically pick up and rerun failed tasks. Currently, it throws an
> exception
> (https://github.com/apache/incubator-airflow/blob/master/airflow/jobs.py#L2489)
> without rerunning the failed tasks.
>
> But since this breaks some of the previous assumptions about backfill, we
> would like to get some feedback and see whether anyone has concerns (the PR
> can be found at https://github.com/apache/incubator-airflow/pull/3464/files).
>
> Thanks,
> -Tao
>
> On Thu, May 24, 2018 at 10:26 AM, Maxime Beauchemin <
> maximebeauchemin@xxxxxxxxx> wrote:
>
> > So I'm running a backfill for what feels like the first time in years,
> > using a simple `airflow backfill --local` command.
> >
> > First I start getting a ton of `logging.info` lines, one for each task
> > that cannot be started just yet, at every tick, flooding my terminal with
> > the keyword `FAILED`. It looks like a million lines like this one:
> >
> > [2018-05-24 14:33:07,852] {models.py:1123} INFO - Dependencies not met for
> > <TaskInstance: some_dag.some_task_id 2018-01-28 00:00:00 [scheduled]>,
> > dependency 'Trigger Rule' FAILED: Task's trigger rule 'all_success'
> > requires all upstream tasks to have succeeded, but found 1 non-success(es).
> > upstream_tasks_state={'successes': 0L, 'failed': 0L, 'upstream_failed': 0L,
> > 'skipped': 0L, 'done': 0L}, upstream_task_ids=['some_other_task_id']
> >
> > Good thing I triggered 1 month and not 2 years like I actually need; the
> > logs alone would be "big data". Now I'm unclear whether there's anything
> > actually running or if I did something wrong, so I decide to kill the
> > process so I can set a smaller date range and get a better picture of
> > what's up.
> >
> > I check my logging level, am I in DEBUG? Nope. Just INFO. So I take a note
> > that I'll need to find that log-flooding line and demote it to DEBUG in a
> > quick PR, no biggie.
> >
> > Now I restart with just a single schedule, and get an error `Dag {some_dag}
> > has reached maximum amount of 3 dag runs`. Hmmm, I wish backfill could just
> > pick up where it left off. Maybe I need to run an `airflow clear` command
> > and restart? Ok, ran my clear command, same error is showing up. Dead end.
> >
> > Maybe there is some new `airflow clear --reset-dagruns` option? Doesn't
> > look like it... Maybe `airflow backfill` has some new switches to pick up
> > where it left off? Can't find it. Am I supposed to clear the DAG Runs
> > manually in the UI? This is a pre-production, in-development DAG, so it's
> > not on the production web server. Am I supposed to fire up my own web
> > server to go and manually handle the backfill-related DAG Runs? Connect to
> > my staging MySQL and manually clear some DAG runs?
> >
> > So. Fire up a web server, navigate to my dag_id, delete the DAG runs, it
> > appears I can finally start over.
> >
> > Next thought was: "Alright looks like I need to go Linus on the mailing
> > list".
> >
> > What am I missing? I'm really hoping these issues are specific to 1.8.2!
> >
> > Backfilling is core to Airflow and should work very well. I want to restate
> > some requirements for Airflow backfill:
> > * when failing / interrupted, it should seamlessly be able to pick up where
> > it left off
> > * terminal logging at the INFO level should be a clear, human-consumable
> > indicator of progress
> > * backfill-related operations (including restarts) should be doable through
> > CLI interactions, and not require web server interactions, as the typical
> > sandbox (dev environment) shouldn't assume the existence of a web server
> >
> > Let's fix this.
> >
> > Max
> >
>