Re: Dealing with data latency
On Mon, Jun 4, 2018 at 12:47 PM Maxime Beauchemin <
> The common standard is to have the execution_date aligned with the
> partition date in the database (say 2018-08-08) and contain data from
> 2018-08-08T00:00:00 to 2018-08-08T23:59:59.999.
> The partition date and execution_date match and correspond to the left
> bound of the time interval processed.
> Then you'd use sensors to make sure the task cannot run until the desired
> time or conditions are met.
> On Mon, Jun 4, 2018 at 5:46 AM Pedro Machado <pedro@xxxxxxxxxxxxxx> wrote:
> > Hi. What is the recommended way to deal with data latency? For example, I
> > have a feed that is not considered final until 72 hours have passed after
> > the end of the daily period.
> > For example, Monday's data would be ready by Thursday at 23:59.
> > Should I pull data based on the execution date minus a 72-hour offset or
> > use the execution date and somehow delay the data pull for 72 hours?
> > The latter would be more intuitive (data pull date = execution date) but
> > I am not sure if it's a good pattern.
> > Thanks,
> > Pedro
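
For what it's worth, here is a rough sketch of the arithmetic behind the pattern described above: keep execution_date as the left bound of the daily period, and gate the pull on a "ready" time instead of shifting the date. This is plain Python, not Airflow code. The names DATA_LATENCY, SCHEDULE_INTERVAL, data_interval, ready_at and is_ready are made up for illustration, and the 72-hour figure is taken from Pedro's example.

```python
from datetime import datetime, timedelta

# Assumption: the feed is final 72 hours after the daily period ends.
DATA_LATENCY = timedelta(hours=72)
# Assumption: the DAG runs on a daily schedule.
SCHEDULE_INTERVAL = timedelta(days=1)


def data_interval(execution_date):
    """Return the half-open period [start, end) covered by a run.

    execution_date is the left bound, matching the partition date.
    """
    return execution_date, execution_date + SCHEDULE_INTERVAL


def ready_at(execution_date):
    """Earliest time at which the period's data is considered final."""
    _, end = data_interval(execution_date)
    return end + DATA_LATENCY


def is_ready(execution_date, now):
    """Condition a sensor could poll before letting the pull run."""
    return now >= ready_at(execution_date)


if __name__ == "__main__":
    monday = datetime(2018, 6, 4)
    print(data_interval(monday))
    print(ready_at(monday))
```

With Monday 2018-06-04 as the execution_date, the period runs to Tuesday 00:00 and the data becomes final at Friday 00:00, that is, at the end of Thursday, which lines up with the example in the original question. In a DAG, a check like is_ready would presumably sit inside a sensor placed ahead of the pull task, so the pull itself can still query by execution_date.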