Capturing data changes that happen after the initial data pull

I am working with an API that provides daily data the day after the period
completes. For example, 2018-06-01 data is available on 2018-06-02 at 12 PM.

I have a daily DAG that pulls this data and loads it into Redshift.

The issue is that the data provider says the data may be revised and
won't be finalized until the Tuesday after the end of the week.

For example, for the week of 2018-05-27 through 2018-06-02, the data will
be "final" on Tuesday 2018-06-05.
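
To make the date arithmetic concrete, here is a rough sketch in plain
Python. It assumes weeks run Sunday through Saturday, as in the example
above; the function name finalization_date is made up for illustration
and is not part of Airflow or the provider's API.

```python
from datetime import date, timedelta


def finalization_date(day: date) -> date:
    """Return the Tuesday on which data for `day` becomes final.

    Assumption: reporting weeks run Sunday through Saturday.
    """
    # date.weekday(): Monday == 0 ... Saturday == 5, Sunday == 6.
    days_until_saturday = (5 - day.weekday()) % 7
    week_end = day + timedelta(days=days_until_saturday)
    # The Tuesday after a Saturday is three days later.
    return week_end + timedelta(days=3)


if __name__ == "__main__":
    print(finalization_date(date(2018, 6, 1)))
```

Every day from 2018-05-27 through 2018-06-02 maps to 2018-06-05 under
this assumption.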

I'd like to add another DAG that re-pulls the previous week's data every
Tuesday, and I am wondering about the best way to implement this.

Should I just develop another DAG that pulls one week at a time using the
appropriate dates?
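
For the separate weekly DAG option, the "appropriate dates" could be
derived roughly like this. It is only a sketch: it assumes
Sunday-through-Saturday weeks, and previous_week_dates is a hypothetical
helper name. How the run date reaches the function depends on the
Airflow version and schedule, so that wiring is left out.

```python
from datetime import date, timedelta
from typing import List


def previous_week_dates(run_date: date) -> List[date]:
    """Return the seven dates of the last completed Sunday-Saturday week.

    Intended to be called with the Tuesday on which the weekly re-pull runs.
    """
    # Days back to the most recent Saturday strictly before run_date
    # (weekday(): Monday == 0 ... Saturday == 5, Sunday == 6).
    days_back = (run_date.weekday() - 5) % 7 or 7
    week_end = run_date - timedelta(days=days_back)
    week_start = week_end - timedelta(days=6)
    return [week_start + timedelta(days=i) for i in range(7)]


if __name__ == "__main__":
    # Tuesday 2018-06-05 covers 2018-05-27 through 2018-06-02.
    for day in previous_week_dates(date(2018, 6, 5)):
        print(day.isoformat())
```

Each returned date could then be passed to the same pull-and-load logic
the daily DAG uses, so the weekly run overwrites the earlier, unrevised
rows for that week.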

Is there a way to leverage the existing daily DAG and have another DAG
trigger it with the appropriate execution date? If so, I suppose it would
create new DAG runs. How would I be able to tell these new DAG runs apart
from the daily ones if they have the same execution dates?