osdir.com


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Modeling rate limited api calls in airflow


Have you looked into pools?  Pools allow you to specify how many tasks at
any given time should use a common resource.
That way you could limit this to 1, 2, or 3 for example. Pools are not
dynamic however, so it only allows you to upper limit how many
number of clients are going to hit the API at any moment, not determine how
many when the rate limit is in effect
(unless.... you use code to reconfigure the pool on demand, but I'm not
sure if I should recommend that, i.e. reconfigure the # of clients
on the basis of hitting the rate limit.)  It sounds as if this logic is
best introduced at the hook level, where it determines that it passes
out an API interface only when the rate limit is not in place, where
operators specify how many retries should occur.

The Adwords API does allow increasing the rate limit threshold though and
you're probably better off negotiating
with Google to up that threshold, explaining your business case etc.?

Gerard



On Thu, Aug 9, 2018 at 10:43 AM rob@xxxxxxxxxxx <rob@xxxxxxxxxxx> wrote:

> Hello,
>
> I am in the process of migrating a bespoke data pipe line built around
> celery into airflow.
>
> We have a number of different tasks which interact with the Adwords API
> which has a rate limiting policy. The policy isn't a fixed number of
> requests its variable.
>
> In our celery code we have handled this by capturing a rate limit error
> response and setting a key in redis to make sure that no tasks execute
> against the API until it's expired. Any task that does get executed checks
> for the presence of the key and if the key exists issues a retry for when
> the rate limit is due to expire.
>
> Moving over to Airflow I can't find a way to go about scheduling a task to
> retry in a specific amount of time. Doing some reading it seems a Sensor
> could work to prevent other dags from executing whilst the rate limit is
> present.
>
> I also can't seem to find an example of handling different exceptions from
> a python task and adapting the retry logic accordingly.
>
> Any pointers would be much appreciated,
>
> Rob
>