
Re: Using large numbers of sensors, resource consumption


I also have that requirement, and I'm working on a proposal for
rescheduling tasks. My current PoC can be found at [1]; it uses the
up_for_retry state, which has some problems. I've started to make some
changes and hope to send a first proposal this week.

The basic idea is:
* A new "reschedule" flag for sensors, if set to True it will raise an
AirflowRescheduleException (with the new schedule date) that causes a
reschedule
* Reschedule requests are recorded in new `task_reschedule` table and
visualized in the Gantt view.
* A new TI dependency that checks if a task is ready to be re-scheduled
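
To make that concrete, here is a rough sketch of how a custom sensor could
request a reschedule under this proposal. The exception's import path and
constructor argument, and the BaseSensorOperator import, are assumptions for
illustration, not a final API:

    from datetime import timedelta

    # Assumed import paths; the proposal may place these elsewhere.
    from airflow.exceptions import AirflowRescheduleException
    from airflow.operators.sensors import BaseSensorOperator
    from airflow.utils import timezone


    class MyDataSensor(BaseSensorOperator):
        """Illustrative sensor that asks to be rescheduled instead of sleeping."""

        def poke(self, context):
            # Placeholder check for whether the upstream data has arrived.
            return False

        def execute(self, context):
            if not self.poke(context):
                # Instead of blocking a worker slot between pokes, record a
                # reschedule request for poke_interval seconds from now.
                # A custom sensor could also compute its own "next sensible" date.
                next_poke = timezone.utcnow() + timedelta(seconds=self.poke_interval)
                raise AirflowRescheduleException(next_poke)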

Advantages:
* This change is backward compatible: existing sensors behave as before,
but can opt in by setting the "reschedule" flag
* The timeout and poke_interval are still respected and are used to
calculate the next schedule time (see the usage sketch after this list)
* Custom sensor implementations can even define the next sensible
schedule date.
* This mechanism can also be used by non-sensor operators
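
As a usage sketch (assuming the proposed flag is exposed as a `reschedule`
argument on existing sensors; the DAG, task id, and intervals are made up for
illustration), opting in could look like this:

    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.sensors import TimeDeltaSensor  # Airflow 1.x import path

    dag = DAG(
        dag_id="revenue_repull",
        start_date=datetime(2018, 7, 1),
        schedule_interval="@daily",
    )

    wait_for_data = TimeDeltaSensor(
        task_id="wait_6_days",
        delta=timedelta(days=6),
        poke_interval=60 * 60,     # check hourly; also drives the next schedule time
        timeout=7 * 24 * 60 * 60,  # the overall deadline is still enforced
        reschedule=True,           # proposed opt-in flag; default False keeps today's behavior
        dag=dag,
    )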

Kind Regards,
Stefan

[1] https://github.com/seelmann/incubator-airflow/tree/reschedule-sensor-3

On 07/10/2018 04:05 PM, Pedro Machado wrote:
> I have a few DAGs that use time sensors to wait until data is ready, which
> can be several days.
> 
> I have one daily DAG where, for each execution date, I have to repull the
> data for the next 7 days to capture changes (late arriving revenue data).
> This DAG currently starts 7 TimeDeltaSensors for each execution date with
> delays that range from 0 to 6 days.
> 
> I was wondering what the recommendation is for cases like this where a
> large number of sensors is needed.
> 
> Are there ways to reduce the footprint of these sensors so that they use
> less CPU and memory?
> 
> I noticed that in one of the DAGs Germain Tanguy showed in the presentation
> he shared today, a sensor was set to time out after 30 seconds but had a
> large retry count, so instead of running constantly it runs for 30 seconds
> every 15 minutes and then dies.
> 
> Are other people using this pattern? Do you have other suggestions?
> 
> Thanks,
> 
> Pedro
>
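
For reference, a minimal sketch of the timeout-plus-retries pattern described
above (the 30-second timeout and 15-minute interval come from the thread; the
DAG, task id, and retry count are made up, and the exact behavior on sensor
timeout can vary between Airflow versions):

    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.sensors import TimeDeltaSensor  # Airflow 1.x import path

    dag = DAG(
        dag_id="late_revenue_repull",
        start_date=datetime(2018, 7, 1),
        schedule_interval="@daily",
    )

    wait_for_revenue = TimeDeltaSensor(
        task_id="wait_for_revenue",
        delta=timedelta(days=6),
        poke_interval=10,                   # poke a few times within each short run
        timeout=30,                         # fail the attempt after ~30 seconds...
        retries=7 * 24 * 4,                 # ...but keep retrying for up to ~7 days
        retry_delay=timedelta(minutes=15),  # sleeping 15 minutes between attempts
        dag=dag,
    )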