Re: Dealing with data latency


Thanks, Max!

On Mon, Jun 4, 2018 at 12:47 PM Maxime Beauchemin <
maximebeauchemin@xxxxxxxxx> wrote:

> The common standard is to have the execution_date aligned with the
> partition date in the database (say 2018-08-08), with the partition
> containing data from 2018-08-08T00:00:00.000 to 2018-08-08T23:59:59.999.
>
> The partition date and execution_date match and correspond to the left
> bound of the time interval processed.
>
> Then you'd use some sensors to make sure this cannot run until the desired
> time or conditions are met.
>
> Max
>
> On Mon, Jun 4, 2018 at 5:46 AM Pedro Machado <pedro@xxxxxxxxxxxxxx> wrote:
>
> > Hi. What is the recommended way to deal with data latency? For example, I
> > have a feed that is not considered final until 72 hours have passed after
> > the end of the daily period.
> >
> > For example, Monday's data would be ready by Thursday at 23:59.
> >
> > Should I pull data based on the execution date minus a 72 hour offset or
> > use the execution date and somehow delay the data pull for 72 hours?
> >
> > The latter would be more intuitive (data pull date = execution date) but
> > I am not sure if it's a good pattern.
> >
> > Thanks,
> >
> > Pedro
> >
>
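
For anyone reading this thread later, here is a minimal sketch of the arithmetic behind the pattern Max describes: the execution_date stays on the left bound of the daily interval, and the pull is held back until the feed is final. It uses only the Python standard library. The names SCHEDULE_INTERVAL, DATA_LATENCY, data_interval, earliest_pull_time and is_ready are made up for illustration and are not Airflow APIs. In a real DAG the waiting would be done by a sensor, as suggested above; Airflow's TimeDeltaSensor may be a reasonable candidate, but check its semantics against your Airflow version.

```python
from datetime import datetime, timedelta

# Assumptions for illustration: a daily schedule and the 72-hour
# finalization delay from Pedro's example.
SCHEDULE_INTERVAL = timedelta(days=1)
DATA_LATENCY = timedelta(hours=72)


def data_interval(execution_date):
    """Return the half-open interval [start, end) covered by one run.

    execution_date is the left bound, matching the partition date.
    """
    return execution_date, execution_date + SCHEDULE_INTERVAL


def earliest_pull_time(execution_date):
    """Earliest moment the data for this execution_date is considered final."""
    _, end = data_interval(execution_date)
    return end + DATA_LATENCY


def is_ready(execution_date, now):
    """The check a sensor would poll: has the latency window elapsed?"""
    return now >= earliest_pull_time(execution_date)


monday = datetime(2018, 6, 4)  # a Monday
print(earliest_pull_time(monday))  # 2018-06-08 00:00:00
```

This lines up with the example in the question: Monday's run becomes eligible right after Thursday 23:59, while its execution_date and partition date both remain Monday.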