
Re: SubdagOperator and Pools


To clarify, I created a Gist with instructions for reproducing this issue:


thanks, regards

On 08/09/2018 07:41 AM, Andreas Koeltringer wrote:
Hi Tao,

thanks for your response.

That's just the thing: I am talking about ONE SubDagOperator, and the tasks within it execute in parallel. That's what confuses me.

Kind regards,

On 08/08/2018 06:41 PM, Tao Feng wrote:
Hi Andreas,

The default executor for SubDagOperator is SequentialExecutor, which makes
sure all the tasks within the subdag are executed in sequential order. But if
you have too many subdags within a single DAG and want to control them with
pooling (https://airflow.apache.org/concepts.html#pools), SubDagOperator
unfortunately doesn't respect pooling
(https://issues.apache.org/jira/browse/AIRFLOW-2371) at this moment. My
understanding is that Airflow uses the backfill scheduler to schedule
SubDagOperator tasks instead of the normal scheduler, and the backfill
scheduler has certain discrepancies with the normal scheduler in its
pooling support.


On Wed, Aug 8, 2018 at 9:14 AM, Andreas Koeltringer <
andreas.koeltringer@xxxxxxxxx> wrote:


We have a SubDagOperator with lots of tasks in it. We want to limit the
parallelism with which these tasks execute. Therefore we created a pool
and added the tasks within the SubDagOperator to this pool.

However, this setting is not respected (see image attached).

Now we are wondering why that is. In 'subdag_operator.py' on the master
branch there is a comment that

     "Airflow pool is not honored by SubDagOperator."

This comment is not in the file in v1.9.0 (which I am using).

So does this mean that pools are not respected for subdags?

On the other hand, it states that subdags use the SequentialExecutor,
which *should* execute tasks sequentially?

Can anyone clarify this, please?
And if pools do not work, what options do we have to limit parallelism in
a Subdag?

Thanks in advance,