
Re: Single Airflow Instance Vs Multiple Airflow Instance


We didn't specify scheduler_zombie_task_threshold in our config and use
512 as max_tis_per_query (we're using the most powerful RDS instance). One
thing you might want to pay attention to besides core/PARALLELISM
is scheduler/MAX_THREADS. If your DAG files parse fast (as most DAG files
should) but the scheduler loop runs slowly, the scheduler may not
harvest the DAG parsing results frequently enough, so making each harvest
bigger can be an option (at Airbnb we use 64 and may bump it up soon). At
the same time, I'm working with Dan Davydov on some fixes/improvements that
will remove some bottlenecks in the scheduler; the goal is to make Airflow
able to handle 100k+ tasks in a common/Airbnb DAG distribution pattern.
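
All of the settings mentioned above live in airflow.cfg. A minimal sketch
for reference -- the values are illustrative only, not a recommendation:

    [core]
    # 0 means unlimited; otherwise this caps running task instances cluster-wide
    parallelism = 0

    [scheduler]
    # how many task instances the scheduler picks up per query ("harvest" size)
    max_tis_per_query = 512
    # number of DAG-parsing processes feeding the scheduler loop
    # (2 is the stock default; raise it if parsing lags behind)
    max_threads = 2
    # left unset in our config, so the built-in default applies
    # scheduler_zombie_task_threshold = ...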

Cheers,
Kevin Y

On Fri, Jun 8, 2018 at 3:28 AM ramandumcs@xxxxxxxxx <ramandumcs@xxxxxxxxx>
wrote:

> Thanks Kevin,
> I am specifically interested in scheduler settings
> like scheduler_zombie_task_threshold and max_tis_per_query.
> We are expecting load on the order of 1000(s) of concurrent DAGs, so any
> Airflow setting which might help us in achieving this target would be
> useful.
> There will be an increase of 1000(s) of local DAG files, each with schedule set to @once.
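
For illustration, each of those generated DAG files with an @once schedule
might look roughly like the sketch below; the dag_id, dates and task are
placeholders, not anything from this thread:

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator

    # one of many similar generated files; '@once' schedules a single run
    dag = DAG(
        dag_id='team_x_job_0001',
        schedule_interval='@once',
        start_date=datetime(2018, 6, 1),
        catchup=False,
    )

    run_job = BashOperator(
        task_id='run_job',
        bash_command='echo "run once"',
        dag=dag,
    )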
>
>
>
> On 2018/06/08 05:13:39, Ruiqin Yang <yrqls21@xxxxxxxxx> wrote:
> > Not sure about 1.9, but parallelism seems to be supported on master
> > <https://github.com/apache/incubator-airflow/blob/272952a9dce932cb2c648f82c9f9f2cafd572ff1/airflow/executors/base_executor.py#L113>.
> > We are using 1.8 with some bug-fix cherry-picks. The machines are just
> > out-of-the-box AWS EC2 instances. We've been using I3 for the scheduler and
> > R3 for the workers, but I urge you to check out the newer generations, which
> > are more powerful and cheaper. As always, you can pick the best series by
> > profiling your machine usage (I/O, RAM, CPU, etc.). I don't think we've
> > tuned the default Airflow settings much, and the best settings for you will
> > be different from the ones best for us (that being said, I can provide some
> > more details when I'm back at the office if you are curious about some
> > particular settings).
> >
> > Cheers,
> > Kevin Y
> >
> > On Thu, Jun 7, 2018 at 9:02 PM ramandumcs@xxxxxxxxx <ramandumcs@xxxxxxxxx> wrote:
> >
> > > We have a similar use case where we need to support multiple teams and
> > > the expected load is 1000(s) of active TIs. We are exploring setting up a
> > > separate Airflow cluster for each team and scaling that cluster
> > > horizontally through the celery executor.
> > > @Ruiqin could you please share some details on your Airflow setup, like
> > > Airflow version, machine configuration, airflow.cfg settings, etc.?
> > > How can we configure infinity (0) for the cluster-wide setting? (We are
> > > using Airflow v1.9, and it seems that
> > > airflow.cfg's parallelism = 0 is not supported in v1.9.)
> > >
> > > On 2018/06/07 22:27:20, Ruiqin Yang <yrqls21@xxxxxxxxx> wrote:
> > > > Here to provide a datapoint from Airbnb--all users share the same
> > > > cluster (~8k active DAGs and ~15k running tasks at peak).
> > > >
> > > > For the cluster-wide concurrency setting, we put infinity (0) there and
> > > > scale up the # of workers if we need more worker slots.
> > > >
> > > > For the scheduler & Airflow UI coupling, I believe the Airflow UI is
> > > > not coupled with the scheduler. Actually, at Airbnb we couple the
> > > > airflow worker and airflow webserver together on the same EC2
> > > > instance--but you can always have a set of instances hosting only
> > > > webservers.
> > > >
> > > > If you have some critical users who don't want their DAGs affected by
> > > > changes from other users (ad-hoc new DAGs/tasks), you can probably set
> > > > up a dedicated celery queue for them (assuming you are using the celery
> > > > executor; the local executor is in theory not for production), or you
> > > > can enforce DAG-level concurrency (maybe via CI or through a policy
> > > > <https://github.com/apache/incubator-airflow/blob/master/airflow/settings.py#L109>--which
> > > > I'm not sure is a good practice since it is more for task-level
> > > > attributes).
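
As a sketch of the dedicated-queue idea above -- the policy hook is the one
linked in settings.py, but the DAG prefix, queue name and worker command are
assumptions for illustration, not the Airbnb setup:

    # airflow_local_settings.py (picked up automatically from PYTHONPATH)
    def policy(task):
        # route every task belonging to the critical team's DAGs to its own
        # Celery queue so other users' changes can't starve its worker slots
        if task.dag_id.startswith('team_a_'):
            task.queue = 'team_a'

A worker dedicated to that team then consumes only that queue, e.g.
"airflow worker -q team_a".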
> > > >
> > > > With the awesome RBAC change in place, I think it makes sense to share
> > > > the same cluster: easier maintenance, less user confusion, etc.
> > > >
> > > > Cheers,
> > > > Kevin Y
> > > >
> > > > On Thu, Jun 7, 2018 at 1:59 PM Ananth Durai <vananth22@xxxxxxxxx> wrote:
> > > >
> > > > > At Slack, we follow a similar pattern of deploying multiple Airflow
> > > > > instances. Since the Airflow UI & the scheduler are coupled, it
> > > > > introduces friction, as the user needs to know the underlying
> > > > > deployment strategy (like which Airflow URL I should visit to see my
> > > > > DAGs, multiple teams collaborating on the same DAG, pipeline
> > > > > operations, etc.).
> > > > >
> > > > > In one of the forum questions, Max mentioned renaming the scheduler
> > > > > to supervisor, as the scheduler does more than just scheduling.
> > > > > It would be super cool if we could make multiple supervisors share
> > > > > the same Airflow metadata storage and the Airflow UI (maybe
> > > > > introducing a unique config param `supervisor.id` for each instance).
> > > > >
> > > > > This approach would help us scale the Airflow scheduler horizontally
> > > > > while keeping the simplicity from the user's perspective.
> > > > >
> > > > >
> > > > > Regards,
> > > > > Ananth.P,
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On 7 June 2018 at 04:08, Arturo Michel <Arturo.Michel@xxxxxxxxxxxxxx> wrote:
> > > > >
> > > > > > We have had up to 50 DAGs with multiple tasks each. Many of them
> > > > > > run in parallel. We've had some issues with compute, as it was
> > > > > > meant to be a temporary deployment but somehow it's now the
> > > > > > permanent production one, and resources are not great.
> > > > > > Organisationally it is very similar to what Gerard described: more
> > > > > > than one group working with different engineering practices and
> > > > > > standards, which is probably one of the sources of problems.
> > > > > >
> > > > > > -----Original Message-----
> > > > > > From: Gerard Toonstra <gtoonstra@xxxxxxxxx>
> > > > > > Sent: Wednesday, June 6, 2018 5:02 PM
> > > > > > To: dev@xxxxxxxxxxxxxxxxxxxxxxxxxxxx
> > > > > > Subject: Re: Single Airflow Instance Vs Multiple Airflow Instance
> > > > > >
> > > > > > We are using two cluster instances. One cluster is for the
> > > > > > engineering teams that are in the "tech" wing and rigorously follow
> > > > > > the tech principles; the other instance is for business analysts
> > > > > > and more ad-hoc, experimental work by people who do not necessarily
> > > > > > follow those principles. We have a nomad engineer helping out with
> > > > > > the ad-hoc cluster: setting it up, connecting it to all systems and
> > > > > > resolving programming questions. All clusters are fully puppetized,
> > > > > > so we reuse configs and the way things are configured, plus we have
> > > > > > a common "platform code" package that is reused across both
> > > > > > clusters.
> > > > > >
> > > > > > G>
> > > > > >
> > > > > >
> > > > > > On Wed, Jun 6, 2018 at 5:50 PM, James Meickle <jmeickle@xxxxxxxxxxxxxx> wrote:
> > > > > >
> > > > > > > An important consideration here is that there are several
> > > > > > > settings that are cluster-wide. In particular, cluster-wide
> > > > > > > concurrency settings could result in Team B's DAG refusing to
> > > > > > > schedule based on an error in Team A's DAG.
> > > > > > >
> > > > > > > Do your teams follow similar practices in how eagerly they ship
> > > > > > > code, or have similar SLAs for resolving issues? If so, you are
> > > > > > > probably fine using co-tenancy. If not, you should probably talk
> > > > > > > about it first to make sure the teams are okay with co-tenancy.
> > > > > > >
> > > > > > > On Wed, Jun 6, 2018 at 11:24 AM, gauthiermartin86@xxxxxxxxx <gauthiermartin86@xxxxxxxxx> wrote:
> > > > > > >
> > > > > > > > Hi Everyone,
> > > > > > > >
> > > > > > > > We have been experimenting with Airflow for about 6 months now.
> > > > > > > > We are planning to have multiple departments use it. Since we
> > > > > > > > don't have any internal experience with Airflow, we are
> > > > > > > > wondering whether a single instance per department is more
> > > > > > > > suitable than a single instance with multi-tenancy. We are
> > > > > > > > aware of the upcoming release of Airflow 1.10 and the changes
> > > > > > > > that will be made to RBAC, which will be better suited for
> > > > > > > > multi-tenancy.
> > > > > > > >
> > > > > > > > Any advice on this? Any tips would be helpful to us.
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>