
Re: PSA: Make sure your Airflow instance isn't public and isn't Google indexed


+1 to being able to disable--we have authentication in place, but use a
separate solution that (probably?) Airflow won't realize is enabled, so
having a continuous giant warning banner would be rather unfortunate.

On Tue, Jun 5, 2018 at 2:05 PM, Alek Storm <alek.storm@xxxxxxxxx> wrote:

> This is a great idea, but we'd appreciate a setting that disables the
> banner even if those conditions aren't met - our instance is deployed
> without authentication, but is only accessible via our intranet.
>
> Alek
>
>
> On Tue, Jun 5, 2018, 3:35 PM James Meickle <jmeickle@xxxxxxxxxxxxxx>
> wrote:
>
> > I think that a banner notification would be a fair penalty if you access
> > Airflow without authentication, or have API authentication turned off, or
> > are accessing via http:// with a non-localhost `Host:`. (Are there any
> > other circumstances to think of?)
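
A minimal sketch of the banner conditions listed above, combined with the opt-out setting requested elsewhere in this thread. Everything here is hypothetical: `RequestInfo`, `banner_reasons`, and the `banner_disabled` flag are illustrative names, not existing Airflow APIs or config keys.

```python
from dataclasses import dataclass

# Hosts treated as "local" for the plain-http check (an assumption).
LOCAL_HOSTS = {"localhost", "127.0.0.1", "[::1]"}


@dataclass
class RequestInfo:
    """Hypothetical summary of one incoming UI request."""
    ui_authenticated: bool   # did the request pass web UI authentication?
    api_auth_enabled: bool   # is API authentication turned on?
    scheme: str              # "http" or "https"
    host_header: str         # raw value of the Host: header


def _hostname(host_header: str) -> str:
    """Strip an optional port from a Host: header value."""
    host = host_header.strip().lower()
    if host.startswith("["):              # bracketed IPv6 literal, e.g. [::1]:8080
        return host[: host.find("]") + 1]
    return host.split(":", 1)[0]


def banner_reasons(req: RequestInfo, banner_disabled: bool = False) -> list:
    """Return the reasons a warning banner should be shown (empty if none)."""
    if banner_disabled:                   # explicit operator opt-out
        return []
    reasons = []
    if not req.ui_authenticated:
        reasons.append("UI accessed without authentication")
    if not req.api_auth_enabled:
        reasons.append("API authentication is turned off")
    if req.scheme == "http" and _hostname(req.host_header) not in LOCAL_HOSTS:
        reasons.append("plain http:// with a non-localhost Host header")
    return reasons
```

With this shape, a deployment that authenticates through an external proxy, or one that is only reachable on an intranet, would set the opt-out once instead of living with a permanent banner.
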
> >
> > I would also suggest serving a default robots.txt to mitigate accidental
> > indexing of public instances (as most public instances will be
> > accidentally public, statistically speaking). If you truly want your
> > Airflow instance public and indexed, you should have to go out of your
> > way to permit that.
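
One way the default robots.txt could look, sketched as a generic WSGI wrapper using only the standard library. The names `with_default_robots` and `allow_indexing` are made up for illustration and are not part of Airflow. A robots.txt is only advisory for well-behaved crawlers, so this is a mitigation against accidental indexing, not a substitute for authentication.

```python
from urllib.robotparser import RobotFileParser

# Deny every crawler access to every path.
DENY_ALL_ROBOTS = "User-agent: *\nDisallow: /\n"


def with_default_robots(app, allow_indexing=False):
    """Wrap a WSGI app so /robots.txt denies all crawlers unless opted in."""
    def wrapped(environ, start_response):
        if environ.get("PATH_INFO") == "/robots.txt" and not allow_indexing:
            body = DENY_ALL_ROBOTS.encode("utf-8")
            start_response("200 OK", [
                ("Content-Type", "text/plain; charset=utf-8"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Anything else (or an explicit opt-in) falls through to the real app.
        return app(environ, start_response)
    return wrapped


def crawler_may_fetch(robots_txt, url, agent="*"):
    """Check a robots.txt body the way a compliant crawler would."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)
```

Making indexing the opt-in path matches the suggestion above: an operator who really wants a public, indexed instance has to set `allow_indexing=True` deliberately.
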
> >
> > On Tue, Jun 5, 2018 at 1:51 PM, Maxime Beauchemin <
> > maximebeauchemin@xxxxxxxxx> wrote:
> >
> > > What about a clear alert on the UI showing when auth is off? Perhaps a
> > > large red triangle-exclamation icon on the navbar with a tooltip
> > > "Authentication is off, this Airflow instance is not secure." and
> > > clicking it takes you to the docs' security page.
> > >
> > > Well, and then of course people should make sure their infra isn't open
> > > to the Internet. We really shouldn't have to tell people to keep their
> > > infrastructure behind a firewall. In most environments you have to do
> > > quite a bit of work to open any resource up to the Internet (SSL certs,
> > > special security groups for load balancers/proxies, ...). Now I'm
> > > curious to understand how UMG managed to do this by mistake...
> > >
> > > Also a quick reminder to use the Connection abstraction to store
> > > secrets, ideally using the environment variable feature.
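
To illustrate the environment variable feature mentioned above: Airflow can read a connection from a variable named `AIRFLOW_CONN_<CONN_ID>` whose value is a connection URI, which keeps the secret out of DAG code. The sketch below only mimics that idea with the standard library so it runs without Airflow installed. It is not Airflow's own parsing code, and `connection_from_env`, the returned field names, and the example credentials are all illustrative assumptions.

```python
import os
from urllib.parse import unquote, urlsplit


def connection_from_env(conn_id, environ=None):
    """Look up AIRFLOW_CONN_<CONN_ID> and split its URI into fields.

    Returns None when the variable is not set.
    """
    environ = os.environ if environ is None else environ
    uri = environ.get("AIRFLOW_CONN_" + conn_id.upper())
    if uri is None:
        return None
    parts = urlsplit(uri)
    return {
        "conn_type": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        # Credentials may be percent-encoded inside the URI.
        "login": unquote(parts.username) if parts.username else None,
        "password": unquote(parts.password) if parts.password else None,
        "schema": parts.path.lstrip("/") or None,
    }


# Example with a made-up connection; in practice the variable would be set by
# the deployment environment, never committed alongside DAG code.
example_env = {
    "AIRFLOW_CONN_WAREHOUSE": "postgres://etl_user:s3cr%40t@db.internal:5432/analytics"
}
conn = connection_from_env("warehouse", example_env)
```
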
> > >
> > > Max
> > >
> > > On Tue, Jun 5, 2018 at 10:02 AM Taylor Edmiston <tedmiston@xxxxxxxxx>
> > > wrote:
> > >
> > > > One of our engineers wrote a blog post about the UMG mistakes as
> > > > well.
> > > >
> > > > https://www.astronomer.io/blog/universal-music-group-airflow-leak/
> > > >
> > > > I know that best practices are well known here, but I second James'
> > > > suggestion that we add some docs, code, or config so that the
> > > > framework optimizes for being (nearly) production-ready by default and
> > > > not just easy to start with for local dev.  Admittedly this takes some
> > > > work to not add friction to the local onboarding experience.
> > > >
> > > > Do most people keep separate airflow.cfg files per environment, like
> > > > what's considered the best practice in the Django world?  e.g.
> > > > https://stackoverflow.com/q/10664244/149428
> > > >
> > > > Taylor
> > > >
> > > > *Taylor Edmiston*
> > > > Blog <https://blog.tedmiston.com/> | CV
> > > > <https://stackoverflow.com/cv/taylor> | LinkedIn
> > > > <https://www.linkedin.com/in/tedmiston/> | AngelList
> > > > <https://angel.co/taylor> | Stack Overflow
> > > > <https://stackoverflow.com/users/149428/taylor-edmiston>
> > > >
> > > >
> > > > On Tue, Jun 5, 2018 at 9:57 AM, James Meickle <jmeickle@xxxxxxxxxxxxxx>
> > > > wrote:
> > > >
> > > > > Bumping this one because now Airflow is in the news over it...
> > > > >
> > > > > https://www.bleepingcomputer.com/news/security/contractor-exposes-credentials-for-universal-music-groups-it-infrastructure/?utm_campaign=Security%2BNewsletter&utm_medium=email&utm_source=Security_Newsletter_co_79
> > > > >
> > > > > On Fri, Mar 23, 2018 at 9:33 AM, James Meickle <jmeickle@xxxxxxxxxxxxxx>
> > > > > wrote:
> > > > >
> > > > > > While Googling something Airflow-related a few weeks ago, I noticed
> > > > > > that someone's Airflow dashboard had been indexed by Google and was
> > > > > > accessible to the outside world without authentication. A little
> > > > > > more Googling revealed a handful of other indexed instances in
> > > > > > various states of security. I did my best to contact the operators,
> > > > > > and waited for responses before posting this.
> > > > > >
> > > > > > Airflow is not a secure project by default
> > > > > > (https://issues.apache.org/jira/browse/AIRFLOW-2047), and you can do
> > > > > > all sorts of mean things to an instance that hasn't been
> > > > > > intentionally locked down. (And even then, you shouldn't rely
> > > > > > exclusively on your app's authentication for providing security.)
> > > > > >
> > > > > > Having "internal" dashboards/data sources/executors exposed to the
> > > > > > web is dangerous, since old versions can stick around for a very
> > > > > > long time, help compromise unrelated deployments, and generally just
> > > > > > create very bad press for the overall project if there's ever a mass
> > > > > > compromise (see: Redis and MongoDB).
> > > > > >
> > > > > > Shipping secure defaults is hard, but perhaps we could add best
> > > > > > practices like instructions for deploying a robots.txt with Airflow?
> > > > > > Or an impact statement about what someone could do if they access
> > > > > > your Airflow instance? I think that many people deploying Airflow
> > > > > > for the first time might not realize that it can get indexed, or how
> > > > > > much damage someone can cause via accessing it.
> > > > > >
> > > > >
> > > >
> > >
> >
>