Re: PSA: Make sure your Airflow instance isn't public and isn't Google indexed
One of our engineers wrote a blog post about the UMG mistakes as well.
I know that best practices are well known here, but I second James'
suggestion that we add some docs, code, or config so that the framework
optimizes for being (nearly) production-ready by default and not just easy
to start with for local dev. Admittedly, it takes some work to do this
without adding friction to the local onboarding experience.
Do most people keep separate airflow.cfg files per environment, as is
considered best practice in the Django world? e.g.
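
A rough sketch of what per-environment config selection could look like, in the spirit of Django's per-environment settings modules. The directory layout, file names, and the `airflow_env_for` helper are all hypothetical; the only Airflow features assumed are the `AIRFLOW_CONFIG` environment variable (path to the config file) and the `AIRFLOW__SECTION__KEY` override convention. The `authenticate` option shown is from the 1.x series and may not apply to other versions.

```python
from pathlib import Path

# Hypothetical layout: one cfg file per environment in a shared directory,
# e.g. /etc/airflow/airflow.local.cfg, /etc/airflow/airflow.production.cfg
CONFIG_DIR = Path("/etc/airflow")
KNOWN_ENVS = ("local", "staging", "production")


def airflow_env_for(env_name, config_dir=CONFIG_DIR):
    """Return environment variables that point Airflow at one environment's config."""
    if env_name not in KNOWN_ENVS:
        raise ValueError(f"unknown environment: {env_name!r}")

    env = {
        # AIRFLOW_CONFIG tells Airflow which config file to load.
        "AIRFLOW_CONFIG": str(Path(config_dir) / f"airflow.{env_name}.cfg"),
    }
    if env_name != "local":
        # Assumption: the 1.x-era [webserver] authenticate option; check the
        # docs for your version before relying on this key.
        env["AIRFLOW__WEBSERVER__AUTHENTICATE"] = "True"
    return env
```

For example, a deploy script could merge `airflow_env_for("production")` into `os.environ` before starting the webserver, so that only local dev runs without authentication turned on.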
Blog <https://blog.tedmiston.com/> | CV
<https://stackoverflow.com/cv/taylor> | LinkedIn
<https://www.linkedin.com/in/tedmiston/> | AngelList
<https://angel.co/taylor> | Stack Overflow
On Tue, Jun 5, 2018 at 9:57 AM, James Meickle <jmeickle@xxxxxxxxxxxxxx> wrote:
> Bumping this one because now Airflow is in the news over it...
> On Fri, Mar 23, 2018 at 9:33 AM, James Meickle <jmeickle@xxxxxxxxxxxxxx> wrote:
> > While Googling something Airflow-related a few weeks ago, I noticed that
> > someone's Airflow dashboard had been indexed by Google and was accessible
> > to the outside world without authentication. A little more Googling
> > revealed a handful of other indexed instances in various states of
> > security. I did my best to contact the operators, and waited for
> > before posting this.
> > Airflow is not a secure project by default
> > (https://issues.apache.org/jira/browse/AIRFLOW-2047), and you can do
> > all sorts of mean things to an
> > instance that hasn't been intentionally locked down. (And even then, you
> > shouldn't rely exclusively on your app's authentication for providing
> > security.)
> > Having "internal" dashboards/data sources/executors exposed to the web is
> > dangerous, since old versions can stick around for a very long time, help
> > compromise unrelated deployments, and generally just create very bad
> > for the overall project if there's ever a mass compromise (see: Redis and
> > MongoDB).
> > Shipping secure defaults is hard, but perhaps we could add best practices
> > like instructions for deploying a robots.txt with Airflow? Or an impact
> > statement about what someone could do if they access your Airflow instance?
> > I think that many people deploying Airflow for the first time might not
> > realize that it can get indexed, or how much damage someone can cause via
> > accessing it.
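
On the robots.txt suggestion above: since the Airflow webserver is a WSGI application, one possible approach is a small WSGI middleware that answers `/robots.txt` itself and passes everything else through. This is only a sketch; `RobotsTxtMiddleware` is a hypothetical name, not something Airflow ships, and how you would wrap the webserver with it depends on your deployment. A robots.txt only asks well-behaved crawlers not to index the site. It does not restrict access, so it complements authentication and network-level controls rather than replacing them.

```python
# Ask all crawlers to stay away from every path on this host.
ROBOTS_TXT = b"User-agent: *\nDisallow: /\n"


class RobotsTxtMiddleware:
    """Hypothetical WSGI middleware that serves a deny-all robots.txt."""

    def __init__(self, app):
        self.app = app  # the wrapped WSGI application

    def __call__(self, environ, start_response):
        if environ.get("PATH_INFO") == "/robots.txt":
            headers = [
                ("Content-Type", "text/plain; charset=utf-8"),
                ("Content-Length", str(len(ROBOTS_TXT))),
            ]
            start_response("200 OK", headers)
            return [ROBOTS_TXT]
        # Any other path is handled by the wrapped application unchanged.
        return self.app(environ, start_response)
```

The same deny-all file could just as well be served by a reverse proxy in front of Airflow, which avoids touching the application at all.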