
Re: About the project support in Airflow


Hi Song,

Just noting that we are also working on DAG-level access on top of
RBAC (AIRFLOW-2267), which should provide DAG-level ACL functionality. The
WIP PR can be found at
https://github.com/apache/incubator-airflow/pull/3197

On Wed, Apr 25, 2018 at 7:42 PM, 刘松 (Cycle++ development team) <liusong02@xxxxxxxxxx>
wrote:

> Hi Taylor,
>
> Yes, I know that this RBAC feature will be released in 1.10.
>
> # About multi-user support
>
> But why not deploy one instance of Airflow per user? With this feature,
> don’t you think Airflow becomes more of a platform that serves many
> different users? Also, the multi-user case would exhaust Airflow’s
> resources more easily, if we are talking about Airflow’s scalability.
>
> # About multi-project support
>
> You could see the “project” concept as a logical grouping of DAGs that
> lets them be organized in a more structured way. I can’t see how it would
> hurt Airflow’s “scalability”; as I see it, it just makes the user
> experience friendlier.
>
> That is why I used the “multi-user support” case to question the
> suggestion of using multiple instances for “multi-project”: I think
> “multi-user” support is kind of pushing Airflow toward being “more
> scalable”, whereas “multi-project” just makes it more intuitive and
> user-friendly.
>
> Thanks,
> Song
>
> On 26/04/2018, 4:50 AM, "Taylor Edmiston" <tedmiston@xxxxxxxxx> wrote:
>
>     Something else that might be relevant for your multi-user use case is the
>     new RBAC support that Joy Gao added.
>
>     https://github.com/apache/incubator-airflow/pull/3015
>
>     *Taylor Edmiston*
>     Blog <http://blog.tedmiston.com> | Stack Overflow CV
>     <https://stackoverflow.com/story/taylor> | LinkedIn
>     <https://www.linkedin.com/in/tedmiston/> | AngelList
>     <https://angel.co/taylor>
>
>
>     On Wed, Apr 25, 2018 at 3:04 PM, James Meickle <jmeickle@xxxxxxxxxxxxxx>
>     wrote:
>
>     > Another reason you would want separate infrastructure is that there are
>     > a lot of ways to exhaust Airflow resources or otherwise cause
>     > contention, like having too many sensors or sub-DAGs using up all
>     > available task slots.
>     >
>     > It doesn't seem like a great idea to push for having different teams
>     > with co-tenancy until there is also per-team control over resource
>     > use...
>     >
>     > On Tue, Apr 24, 2018 at 8:27 PM, 刘松 (Cycle++ development team) <liusong02@xxxxxxxxxx>
>     > wrote:
>     >
>     > > It seems that all the current approaches point to multiple instances
>     > > of Airflow, but the project concept is very natural, since one user
>     > > might need to handle different types of tasks.
>     > >
>     > > As for multi-user support, one option would also be to deploy
>     > > multiple instances, but it seems that Airflow provides multi-user
>     > > functionality built in.
>     > >
>     > > So I am not convinced that we should use multiple instances for
>     > > multiple projects.
>     > >
>     > > Thanks,
>     > > Song
>     > >
>     > > On Wed, Apr 25, 2018 at 4:25 AM +0800, "Ace Haidrey" <acehaidrey@xxxxxxxxx>
>     > > wrote:
>     > >
>     > >
>     > > Looks neat Taylor!
>     > >
>     > > And regarding the original question, going off of what Maxime and
>     > > Bolke said: at Pandora, it made more sense for us to have an instance
>     > > per team, since each team has its own system user for prod and the
>     > > instance can run all processes as that user. Alternatively, you could
>     > > have a superuser that can sudo as those other system users, and have
>     > > many teams on a single instance, but that is a security concern (what
>     > > if one team uses sudo to act as another team and accidentally
>     > > overwrites its data? There is nothing stopping them from doing it). It
>     > > depends on how your org is set up, but let me know if there are any
>     > > questions I can help with.
>     > >
>     > > Ace
>     > >
>     > >
>     > > > On Apr 24, 2018, at 1:16 PM, Taylor Edmiston wrote:
>     > > >
>     > > > We use a similar approach to the one Bolke mentioned, running
>     > > > multiple Airflow instances.
>     > > >
>     > > > I haven't read the Pandora article yet, but we have Astronomer Open
>     > > > Edition (fully open source), which bundles similar tools, such as
>     > > > Prometheus, Grafana, and Celery, with Airflow, along with a Docker
>     > > > Compose file if you're looking to get a setup like that up and
>     > > > running quickly.
>     > > >
>     > > > https://github.com/astronomerio/astronomer/blob/master/examples/airflow-enterprise/docker-compose.yml
>     > > > https://github.com/astronomerio/astronomer
>     > > >
>     > > > *Taylor Edmiston*
>     > > >
>     > > > On Tue, Apr 24, 2018 at 3:30 PM, Maxime Beauchemin <maximebeauchemin@xxxxxxxxx>
>     > > > wrote:
>     > > >
>     > > >> Related blog post about multi-tenant Airflow deployment out of Pandora:
>     > > >> https://engineering.pandora.com/apache-airflow-at-pandora-1d7a844d68ee
>     > > >>
>     > > >> On Tue, Apr 24, 2018 at 10:20 AM, Bolke de Bruin
>     > > >> wrote:
>     > > >>
>     > > >>> My suggestion would be to deploy one Airflow instance per project.
>     > > >>> You could even use Airflow to manage your CI/CD pipeline.
>     > > >>>
>     > > >>> B.
>     > > >>>
>     > > >>>> On 24 Apr 2018, at 18:33, Maxime Beauchemin <maximebeauchemin@xxxxxxxxx>
>     > > >>>> wrote:
>     > > >>>>
>     > > >>>> People have been talking about namespacing DAGs in the past. I'd
>     > > >>>> recommend using tags (many-to-many) instead of categories/projects
>     > > >>>> (one-to-many).
>     > > >>>>
>     > > >>>> It should be fairly easy to add this feature. One question is
>     > > >>>> whether tags are defined as code or in the UI/DB only.
>     > > >>>>
>     > > >>>> Max
>     > > >>>>
>     > > >>>>> On Tue, Apr 24, 2018 at 1:48 AM, Song Liu wrote:
>     > > >>>>>
>     > > >>>>> Hi,
>     > > >>>>>
>     > > >>>>> Basically, DAGs are created for a specific project, so if I have
>     > > >>>>> many different projects, will Airflow support a Project concept
>     > > >>>>> and organize them separately?
>     > > >>>>>
>     > > >>>>> Is this a known requirement, or is there any plan for this already?
>     > > >>>>>
>     > > >>>>> Thanks,
>     > > >>>>> Song
>     > > >>>>>
>     > > >>>
>     > > >>
>     > >
>     > >
>     > >
>     >
>
>
>
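Max's suggestion in the thread, grouping DAGs with tags (many-to-many) rather than projects (one-to-many), can be illustrated with a small data-model sketch. This is not Airflow's API: the `DagInfo` class and the `group_by_project` / `group_by_tag` helpers are hypothetical, and the sketch assumes tags are defined as code alongside each DAG (one of the two options Max raises). It only shows the structural difference: each DAG sits under exactly one project, but can be listed under any number of tags.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class DagInfo:
    """Hypothetical stand-in for a DAG's metadata (not Airflow's DAG class)."""
    dag_id: str
    project: str                            # one-to-many: exactly one project per DAG
    tags: set = field(default_factory=set)  # many-to-many: any number of tags per DAG


def group_by_project(dags):
    """Each DAG appears under exactly one project key."""
    groups = defaultdict(list)
    for dag in dags:
        groups[dag.project].append(dag.dag_id)
    return dict(groups)


def group_by_tag(dags):
    """A DAG appears under every tag it carries."""
    groups = defaultdict(list)
    for dag in dags:
        for tag in sorted(dag.tags):
            groups[tag].append(dag.dag_id)
    return dict(groups)


# Hypothetical example DAGs; names are illustrative only.
dags = [
    DagInfo("ingest_clicks", project="marketing", tags={"marketing", "ingest"}),
    DagInfo("ingest_orders", project="sales", tags={"sales", "ingest"}),
    DagInfo("weekly_report", project="sales", tags={"sales", "marketing", "reporting"}),
]

print(group_by_project(dags))
# {'marketing': ['ingest_clicks'], 'sales': ['ingest_orders', 'weekly_report']}

# With tags, weekly_report is listed under 'marketing', 'reporting', and 'sales'.
print(group_by_tag(dags))
```

With tags, a report shared by two teams shows up in both teams' views, whereas a single project field forces a choice between them.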