Re: Pinning dependencies for Apache Airflow


I suggest not adopting pipenv. It has a nice "first five minutes" demo, but
it's simply not baked enough to depend on as a drop-in pip replacement. We
are in the process of removing it after finding several serious bugs in our
proof of concept (POC).

On Thu, Oct 4, 2018, 20:30 Alex Guziel <alex.guziel@xxxxxxxxxx.invalid> wrote:

> FWIW, there's some value in using virtualenv with Docker to isolate
> yourself from your system's Python.
>
> It's worth noting that requirements files can include other requirements
> files, so that would make groups easier. But note that pip, in a single
> run, gives no guarantee that transitive dependencies won't conflict with
> or override each other. You need `pip check` for that, or use `--no-deps`.
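A minimal sketch of how nested requirements files combine into the single list pip reads in one run. It assumes only the `-r <file>` / `--requirement <file>` include forms plus comments and blank lines; `flatten_requirements` is a hypothetical helper for illustration, not part of pip.

```python
from pathlib import Path
from typing import List, Optional, Set


def flatten_requirements(path: Path, seen: Optional[Set[Path]] = None) -> List[str]:
    """Return the requirement lines of `path`, with `-r` includes expanded inline.

    Hypothetical helper: only '-r <file>' and '--requirement <file>' includes,
    full-line comments, ' #' inline comments and blank lines are understood.
    Any other line (pins, '-c', '-e', ...) is passed through unchanged.
    """
    seen = set() if seen is None else seen
    path = path.resolve()
    if path in seen:  # already expanded (include cycle or repeated include): skip it
        return []
    seen.add(path)

    lines: List[str] = []
    for raw in path.read_text().splitlines():
        line = raw.split(" #", 1)[0].strip()  # drop inline comments
        if not line or line.startswith("#"):
            continue
        for prefix in ("-r ", "--requirement "):
            if line.startswith(prefix):
                # Included paths are taken relative to the including file.
                included = path.parent / line[len(prefix):].strip()
                lines.extend(flatten_requirements(included, seen))
                break
        else:
            lines.append(line)
    return lines
```

For example, a `dev.txt` that contains `-r base.txt` plus a test-only pin flattens to the base pins followed by the test pin. Conflicts between the groups are still only detected by pip itself or by `pip check`.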
>
> On Thu, Oct 4, 2018 at 5:19 PM Driesprong, Fokko <fokko@xxxxxxxxxxxxxx> wrote:
>
> > Hi Jarek,
> >
> > Thanks for bringing this up. I missed the discussion on Slack since I'm on
> > holiday, but I saw the thread and it was way too interesting, and therefore
> > this email :)
> >
> > This is actually something that we need to address asap. As you mention,
> > we have seen before that specific transitive dependencies are not
> > compatible, and then we end up with a broken CI, or even worse, a broken
> > release. Earlier we had the fixed versions (==) in setup.py and the CI
> > requirements in a separate requirements.txt. This was also far from
> > optimal, since we had two versions of the requirements.
> >
> > I like the idea that you are proposing. Maybe we can do an experiment with
> > it. Because of the nature of Airflow (orchestrating different systems), we
> > have a huge list of dependencies. To avoid installing everything, we've
> > created groups, for example specific libraries for when you're using
> > Google Cloud, Elastic, Druid, etc. So I'm curious how it will work with
> > the `extras_require` of Airflow.
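One way pinning might coexist with `extras_require` is a single lock of exact versions from which each group's pins are selected. This is only a sketch: the extra names, package lists and version numbers below are hypothetical placeholders, not Airflow's actual setup.py, and a real lock (for example one produced by pip-compile) would also pin transitive dependencies.

```python
from typing import Dict, Iterable, List

# Hypothetical layout: names and groups are placeholders, not Airflow's setup.py.
INSTALL_REQUIRES: List[str] = ["flask", "sqlalchemy"]
EXTRAS_REQUIRE: Dict[str, List[str]] = {
    "gcp": ["google-api-python-client"],
    "elasticsearch": ["elasticsearch"],
    "druid": ["pydruid"],
}
# Convenience group that pulls in every other extra.
EXTRAS_REQUIRE["all"] = sorted({pkg for pkgs in EXTRAS_REQUIRE.values() for pkg in pkgs})

# Hypothetical lock file contents: package name -> exact version (placeholders).
LOCK: Dict[str, str] = {
    "flask": "1.0.0",
    "sqlalchemy": "1.1.0",
    "google-api-python-client": "1.6.0",
    "elasticsearch": "5.5.0",
    "pydruid": "0.4.0",
}


def pinned_requirements(extras: Iterable[str], lock: Dict[str, str] = LOCK) -> List[str]:
    """Return sorted 'name==version' lines for the core deps plus the chosen extras."""
    names = set(INSTALL_REQUIRES)
    for extra in extras:
        names.update(EXTRAS_REQUIRE[extra])  # KeyError for an unknown extra
    missing = sorted(name for name in names if name not in lock)
    if missing:
        raise KeyError(f"not pinned in the lock: {missing}")
    return [f"{name}=={lock[name]}" for name in sorted(names)]
```

The design choice here is that setup.py keeps only names (or loose bounds), while exact versions live in one lock that every group draws from, so two extras can never be pinned to different versions of the same package.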
> >
> > Regarding pipenv: I don't use pipenv/virtualenv anymore. For me, Docker is
> > much easier to work with. I'm also working on a PR to get rid of tox for
> > the testing and move to a more Docker-idiomatic test pipeline. Curious
> > what your thoughts are on that.
> >
> > Cheers, Fokko
> >
> > On Thu, Oct 4, 2018 at 15:39, Arthur Wiedmer <arthur.wiedmer@xxxxxxxxx> wrote:
> >
> > > Thanks Jakob!
> > >
> > > I think that this is a huge risk of Slack.
> > > I am not against Slack as a support channel, but it is a slippery slope
> > > to have more and more decisions/conversations happening there, contrary
> > > to what we hope to achieve with the ASF.
> > >
> > > When we start discussing issues of development, extensions and
> > > improvements, it is important for the discussion to happen on the
> > > mailing list.
> > >
> > > Jarek, I wouldn't worry too much; we are still in the process of
> > > learning as a community. Welcome and thank you for your contribution!
> > >
> > > Best,
> > > Arthur.
> > >
> > > On Thu, Oct 4, 2018 at 1:42 PM Jarek Potiuk <Jarek.Potiuk@xxxxxxxxxxx> wrote:
> > >
> > > > Thanks for pointing it out, Jakob.
> > > >
> > > > I am still very fresh in the ASF community and learning the ropes, the
> > > > etiquette and the code of conduct. Apologies for my ignorance.
> > > > I have now re-read the code of conduct and FAQ, with more understanding,
> > > > and will pay more attention to wording in the future. As you mentioned,
> > > > it's more the wording than the intentions, but since it was in the
> > > > TL;DR, it has stronger consequences.
> > > >
> > > > BTW, thanks for actually following the code of conduct and pointing it
> > > > out in a respectful manner. I really appreciate it.
> > > >
> > > > J.
> > > >
> > > > Principal Software Engineer
> > > > Phone: +48660796129
> > > >
> > > > On Thu, 4 Oct 2018, 20:41 Jakob Homan, <jghoman@xxxxxxxxx> wrote:
> > > >
> > > > > > TL;DR; A change is coming in the way how dependencies/requirements are
> > > > > > specified for Apache Airflow - they will be fixed rather than flexible
> > > > > > (== rather than >=).
> > > > >
> > > > > > This is follow up after Slack discussion we had with Ash and Kaxil -
> > > > > > summarising what we propose we'll do.
> > > > >
> > > > > Hey all.  It's great that we're moving this discussion back from Slack
> > > > > to the mailing list.  But I've gotta point out that the wording needs
> > > > > a small but critical fix-up:
> > > > >
> > > > > "A change *is* coming... they *will* be fixed"
> > > > >
> > > > > needs to be
> > > > >
> > > > > "We'd like to propose a change... We would like to make them fixed."
> > > > >
> > > > > The first says that this decision has been made and the result of the
> > > > > decision, which was made on Slack, is being reported back to the
> > > > > mailing list.  The second is more accurate to the rest of the
> > > > > discussion ('what we propose...').  And again, since it's axiomatic in
> > > > > the ASF that if it didn't happen on a list, it didn't happen[1], we
> > > > > gotta make sure there's no confusion about where the community is in
> > > > > the decision-making process.
> > > > >
> > > > > Thanks,
> > > > > Jakob
> > > > >
> > > > > [1] https://community.apache.org/newbiefaq.html#NewbieFAQ-IsthereaCodeofConductforApacheprojects?
> > > > >
> > > > > On Thu, Oct 4, 2018 at 9:56 AM Alex Guziel <alex.guziel@xxxxxxxxxx.invalid> wrote:
> > > > > >
> > > > > > You should run `pip check` to ensure there are no conflicts. Pip does
> > > > > > not do this on its own.
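To make concrete what `pip check` verifies, here is a deliberately simplified sketch: it compares each installed version against what other installed packages declare they need. Assumptions: versions are dotted integers, only `==` and `>=` are understood, and the package data is passed in by hand rather than read from site-packages; `check` and `satisfies` are hypothetical names. In practice, just run `python -m pip check`.

```python
from typing import Dict, List, Tuple

Requirement = Tuple[str, str, str]  # (dependency name, operator, version)


def parse_version(version: str) -> Tuple[int, ...]:
    """'1.10.0' -> (1, 10); trailing zeros are dropped so '1.0' == '1.0.0'."""
    parts = [int(p) for p in version.split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def satisfies(installed: str, op: str, wanted: str) -> bool:
    have, want = parse_version(installed), parse_version(wanted)
    if op == "==":
        return have == want
    if op == ">=":
        return have >= want
    raise ValueError(f"unsupported operator in this sketch: {op}")


def check(installed: Dict[str, str], requires: Dict[str, List[Requirement]]) -> List[str]:
    """Return one message per unmet requirement (an empty list means consistent)."""
    problems = []
    for pkg, deps in sorted(requires.items()):
        for dep, op, wanted in deps:
            if dep not in installed:
                problems.append(f"{pkg} requires {dep}, which is not installed.")
            elif not satisfies(installed[dep], op, wanted):
                problems.append(
                    f"{pkg} requires {dep}{op}{wanted}, but {installed[dep]} is installed."
                )
    return problems
```

The messages are only loosely modelled on pip's output; the point is that the check runs after installation, over the whole environment, which is exactly what a single `pip install` run does not do for you.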
> > > > > >
> > > > > > On Thu, Oct 4, 2018 at 9:20 AM Jarek Potiuk <Jarek.Potiuk@xxxxxxxxxxx> wrote:
> > > > > >
> > > > > > > Great that this discussion already happened :). Lots of useful things
> > > > > > > in it. And yes, it means pinning in requirements.txt; this is how
> > > > > > > pip-tools works.
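Pinning in requirements.txt means recording an exact `name==version` for everything installed, which is roughly what `pip freeze` (or the output of pip-compile) gives you to commit. A minimal sketch of that idea using the standard library's `importlib.metadata` (Python 3.8+); `pinned_lines` is a hypothetical name, and unlike `pip freeze` it ignores editable and VCS installs and only lower-cases names rather than fully normalising them.

```python
from importlib.metadata import distributions
from typing import Dict, List


def pinned_lines() -> List[str]:
    """Return 'name==version' for every distribution visible on sys.path."""
    pins: Dict[str, str] = {}
    for dist in distributions():
        name = dist.metadata.get("Name")
        if not name:  # skip broken installs that have no Name metadata
            continue
        # setdefault keeps the first copy found if a package appears twice on sys.path
        pins.setdefault(name.lower(), dist.version)
    return [f"{name}=={version}" for name, version in sorted(pins.items())]


if __name__ == "__main__":
    print("\n".join(pinned_lines()))
```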
> > > > > > >
> > > > > > > J.
> > > > > > >
> > > > > > > On Thu, 4 Oct 2018, 18:14 Arthur Wiedmer, <arthur.wiedmer@xxxxxxxxx> wrote:
> > > > > > >
> > > > > > > > Hi Jarek,
> > > > > > > >
> > > > > > > > I will +1 the discussion Dan is referring to and George's advice.
> > > > > > > >
> > > > > > > > I just want to double-check that we are talking about pinning in
> > > > > > > > requirements.txt only.
> > > > > > > >
> > > > > > > > This offers the ability to run
> > > > > > > >
> > > > > > > >     pip install -r requirements.txt
> > > > > > > >     pip install --no-deps airflow
> > > > > > > >
> > > > > > > > for a guaranteed install that works.
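The two-step install above can be scripted. A minimal sketch, assuming you want to drive pip from Python via `python -m pip` so the same interpreter is used throughout; `install_commands` and `run` are hypothetical names, and the final `pip check` step is an optional addition following Alex's suggestion in this thread. Calling `run()` really invokes pip.

```python
import subprocess
import sys
from typing import List


def install_commands(requirements: str, package: str) -> List[List[str]]:
    """Build the pip invocations for a pinned, reproducible install."""
    pip = [sys.executable, "-m", "pip"]
    return [
        pip + ["install", "-r", requirements],    # 1. every dependency at its pinned version
        pip + ["install", "--no-deps", package],  # 2. the package itself, without re-resolving deps
        pip + ["check"],                          # 3. optional: report any conflicts left over
    ]


def run(requirements: str = "requirements.txt", package: str = "airflow") -> None:
    for cmd in install_commands(requirements, package):
        subprocess.run(cmd, check=True)  # raises CalledProcessError at the first failing step
```

Keeping command construction separate from execution makes the sequence easy to inspect or reuse in a CI script without running pip.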
> > > > > > > >
> > > > > > > > Several different requirements files can be provided for specific use
> > > > > > > > cases, for instance a stable dev one for people who want to work on
> > > > > > > > operators and non-core functions.
> > > > > > > >
> > > > > > > > However, I think we should proactively test in CI against unpinned
> > > > > > > > dependencies (though it might be a separate case in the matrix), so
> > > > > > > > that we get advance warning, if possible, that things will break.
> > > > > > > > CI downtime is not a bad thing here; it actually caught a problem :)
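For such a separate CI matrix entry, an unpinned variant could be derived mechanically from the pinned file. A sketch under stated assumptions: `unpin` is a hypothetical helper; `mode="floor"` turns `pkg==X` into `pkg>=X` (newer releases allowed, never older than the tested pin) and `mode="latest"` drops the version entirely; lines that are not simple `name==version` pins (options, URLs, environment markers, inline comments) are left untouched.

```python
import re
from typing import List

# name, optional [extras], '==', version; nothing else on the line
PIN = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)(\[[^\]]*\])?==([^\s;#]+)$")


def unpin(lines: List[str], mode: str = "floor") -> List[str]:
    """Relax exact pins for an 'unpinned' CI run; other lines pass through."""
    if mode not in ("floor", "latest"):
        raise ValueError(f"unknown mode: {mode}")
    out = []
    for line in lines:
        match = PIN.match(line.strip())
        if not match:
            out.append(line)  # not a simple pin: keep as-is
            continue
        name, extras, version = match.group(1), match.group(2) or "", match.group(3)
        out.append(f"{name}{extras}>={version}" if mode == "floor" else f"{name}{extras}")
    return out
```

The floor mode is the more conservative choice for a CI job: it still lets upstream releases in, but a failure cannot be caused by a downgrade below what the pinned build already tested.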
> > > > > > > >
> > > > > > > > We should unpin as much as possible in setup.py and only maintain the
> > > > > > > > minimum required versions for compatibility. Pinning in setup.py is
> > > > > > > > extremely detrimental when you have a large number of Python libraries
> > > > > > > > installed with different pinned versions.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Arthur
> > > > > > > >
> > > > > > > > On Thu, Oct 4, 2018 at 8:36 AM Dan Davydov <ddavydov@xxxxxxxxxxx.invalid> wrote:
> > > > > > > >
> > > > > > > > > Relevant discussion about this:
> > > > > > > > > https://github.com/apache/incubator-airflow/pull/1809#issuecomment-257502174
> > > > > > > > >
> > > > > > > > > On Thu, Oct 4, 2018 at 11:25 AM Jarek Potiuk <Jarek.Potiuk@xxxxxxxxxxx> wrote:
> > > > > > > > >
> > > > > > > > > > TL;DR; A change is coming in the way how dependencies/requirements are
> > > > > > > > > > specified for Apache Airflow - they will be fixed rather than flexible
> > > > > > > > > > (== rather than >=).
> > > > > > > > > >
> > > > > > > > > > This is follow up after Slack discussion we had with Ash and Kaxil -
> > > > > > > > > > summarising what we propose we'll do.
> > > > > > > > > >
> > > > > > > > > > *Problem:*
> > > > > > > > > > During the last few weeks we experienced quite a few TravisCI build
> > > > > > > > > > outages (for all PRs/branches, including master) because some of the
> > > > > > > > > > transitive dependencies were automatically upgraded. This happens
> > > > > > > > > > because a number of our dependencies are specified with >= rather
> > > > > > > > > > than ==.
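A toy model of why `>=` lets builds drift while `==` does not. The hypothetical `pick` chooses the highest available version allowed by one specifier, the way an unconstrained install picks the newest match. Only `==` and `>=` with dotted-integer versions are handled, and the version numbers in the example are made up.

```python
from typing import List, Optional, Tuple


def key(version: str) -> Tuple[int, ...]:
    """Sort key for dotted-integer versions; '1.0' and '1.0.0' compare equal."""
    parts = [int(p) for p in version.split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def pick(spec: str, available: List[str]) -> Optional[str]:
    """Highest version in `available` allowed by spec, e.g. '>=1.1.0' or '==1.1.0'."""
    op, wanted = spec[:2], spec[2:]
    if op == "==":
        allowed = [v for v in available if key(v) == key(wanted)]
    elif op == ">=":
        allowed = [v for v in available if key(v) >= key(wanted)]
    else:
        raise ValueError(f"unsupported specifier in this sketch: {spec}")
    return max(allowed, key=key) if allowed else None


# Yesterday's index vs today's, after an upstream release (made-up numbers).
before = ["1.0.0", "1.1.0"]
after = before + ["1.2.0"]
print(pick(">=1.1.0", before), pick(">=1.1.0", after))  # 1.1.0 1.2.0 (drifted)
print(pick("==1.1.0", before), pick("==1.1.0", after))  # 1.1.0 1.1.0 (stable)
```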
> > > > > > > > > >
> > > > > > > > > > Whenever there is a new release of such a dependency, it might cause a
> > > > > > > > > > chain reaction of upgrades of transitive dependencies, which might then
> > > > > > > > > > conflict with each other.
> > > > > > > > > >
> > > > > > > > > > An example was the transitive dependency on click shared by
> > > > > > > > > > Flask-AppBuilder and flask-login. They started to conflict once
> > > > > > > > > > Flask-AppBuilder released version 1.12.0.
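The click example boils down to two packages requiring ranges of the same library that do not overlap. A minimal sketch of that check, treating each requirement as a half-open range `[low, high)`; `intersect` is a hypothetical helper, and the bounds in the usage are invented for illustration, not the real constraints of Flask-AppBuilder or flask-login.

```python
from typing import Optional, Tuple

# Version tuples: write all bounds with the same number of components, e.g. (7, 0).
Version = Tuple[int, ...]
# (inclusive lower bound, exclusive upper bound or None for "no upper bound")
Range = Tuple[Version, Optional[Version]]


def intersect(a: Range, b: Range) -> Optional[Range]:
    """Versions allowed by both ranges, or None if no version satisfies both."""
    low = max(a[0], b[0])
    uppers = [high for high in (a[1], b[1]) if high is not None]
    high = min(uppers) if uppers else None
    if high is not None and low >= high:
        return None  # empty range: the two requirements conflict
    return (low, high)


# Invented bounds: package X wants >=6.0,<7.0 and package Y wants >=7.0.
x_needs: Range = ((6, 0), (7, 0))
y_needs: Range = ((7, 0), None)
print(intersect(x_needs, y_needs))  # None: no version satisfies both
```

With flexible specifiers, one upstream release can move either range, so an intersection that was non-empty yesterday may be empty today; pinning every version removes that moving part.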
> > > > > > > > > >
> > > > > > > > > > *Diagnosis:*
> > > > > > > > > > Transitive dependencies with "flexible" versions (where >= is used
> > > > > > > > > > instead of ==) are a cause of "dependency hell". Sooner or later we
> > > > > > > > > > will hit other cases where unpinned dependencies cause similar problems
> > > > > > > > > > with other transitive dependencies, so we need to pin them to fixed
> > > > > > > > > > versions. Flexible versions cause problems both for released versions
> > > > > > > > > > (because they stop working!) and for development (because they break
> > > > > > > > > > master builds in TravisCI and prevent people from installing a
> > > > > > > > > > development environment from scratch).
> > > > > > > > > >
> > > > > > > > > > *Solution:*
> > > > > > > > > >
> > > > > > > > > >    - Following the old-but-good post
> > > > > > > > > >    https://nvie.com/posts/pin-your-packages/ we are going to pin the
> > > > > > > > > >    dependencies to specific versions (so basically all dependencies are
> > > > > > > > > >    "fixed").
> > > > > > > > > >    - We will introduce a mechanism to upgrade dependencies with pip-tools
> > > > > > > > > >    (https://github.com/jazzband/pip-tools). We might also take a look at
> > > > > > > > > >    pipenv: https://pipenv.readthedocs.io/en/latest/
> > > > > > > > > >    - People who would like to upgrade some dependencies for their PRs
> > > > > > > > > >    will still be able to do so, but such upgrades will be part of their
> > > > > > > > > >    PR, so they will go through TravisCI tests, and they will also have to
> > > > > > > > > >    be specified with pinned, fixed versions (==). This should be part of
> > > > > > > > > >    the review process, to make sure new/changed requirements are pinned.
> > > > > > > > > >    - In the release process there will be a point where an upgrade will
> > > > > > > > > >    be attempted for all requirements (using pip-tools) so that we are
> > > > > > > > > >    not stuck with older releases. This will happen in a controlled PR
> > > > > > > > > >    environment, where there will be time to fix all dependencies without
> > > > > > > > > >    impacting others and likely enough time to "vet" such changes (this
> > > > > > > > > >    can be done for alpha/beta releases, for example).
> > > > > > > > > >    - As a side effect, the dependency specification will become far
> > > > > > > > > >    simpler and more straightforward.
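The review step in the list above could be backed by an automated check. A minimal sketch, assuming the pinned requirements live in plain requirements files: `unpinned_lines` (a hypothetical helper) reports every line that is not an exact `name==version` pin, ignoring blank lines, comments and `-r`/`-c` includes. A CI job could fail whenever the returned list is non-empty.

```python
import re
from typing import List

# Exact pin only: name, optional [extras], '==', a version with no further specifiers.
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]*\])?==[^\s=<>!~;#,]+$")


def unpinned_lines(text: str) -> List[str]:
    """Return the requirement lines in `text` that are not exact '==' pins."""
    bad = []
    for raw in text.splitlines():
        line = raw.split(" #", 1)[0].strip()  # drop inline comments
        if not line or line.startswith(("#", "-r ", "-c ")):
            continue  # blank, comment, or include: nothing to pin here
        if not PINNED.match(line):
            bad.append(line)
    return bad
```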
> > > > > > > > > >
> > > > > > > > > > Happy to hear community comments on the proposal. I am happy to take
> > > > > > > > > > the lead on this, open a JIRA issue and implement it if this is
> > > > > > > > > > something the community is happy with.
> > > > > > > > > >
> > > > > > > > > > J.
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > >
> > > > > > > > > > *Jarek Potiuk, Principal Software Engineer*
> > > > > > > > > > Mobile: +48 660 796 129
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
> >
>