[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Python-Dev] Python startup time

On Wed, May 2, 2018, 4:53 AM Gregory Szorc <gregory.szorc at gmail.com> wrote:

> On 7/19/2017 12:15 PM, Larry Hastings wrote:
> >
> >
> > On 07/19/2017 05:59 AM, Victor Stinner wrote:
> >> Mercurial startup time is already 45.8x slower than Git whereas tested
> >> Mercurial runs on Python 2.7.12. Now try to sell Python 3 to Mercurial
> >> developers, with a startup time 2x - 3x slower...
> >
> > When Matt Mackall spoke at the Python Language Summit some years back, I
> > recall that he specifically complained about Python startup time.  He
> > said Python 3 "didn't solve any problems for [them]"--they'd already
> > solved their Unicode hygiene problems--and that Python's slow startup
> > time was already a big problem for them.  Python 3 being /even slower/
> > to start was absolutely one of the reasons why they didn't want to
> upgrade.
> >
> > You might think "what's a few milliseconds matter".  But if you run
> > hundreds of commands in a shell script it adds up.  git's speed is one
> > of the few bright spots in its UX, and hg's comparative slowness here is
> > a palpable disadvantage.
> >
> >
> >> So please continue efforts for make Python startup even faster to beat
> >> all other programming languages, and finally convince Mercurial to
> >> upgrade ;-)
> >
> > I believe Mercurial is, finally, slowly porting to Python 3.
> >
> >     https://www.mercurial-scm.org/wiki/Python3
> >
> > Nevertheless, I can't really be annoyed or upset at them moving slowly
> > to adopt Python 3, as Matt's objections were entirely legitimate.
> I just now found found this thread when searching the archive for
> threads about startup time. And I was searching for threads about
> startup time because Mercurial's startup time has been getting slower
> over the past few months and this is causing substantial pain.
> As I posted back in 2014 [1], CPython's startup overhead was >10% of the
> total CPU time in Mercurial's test suite. And when you factor in the
> time to import modules that get Mercurial to a point where it can run
> commands, it was more like 30%!
> Mercurial's full test suite currently runs `hg` ~25,000 times. Using
> Victor's startup time numbers of 6.4ms for 2.7 and 14.5ms for
> 3.7/master, Python startup overhead contributes ~160s on 2.7 and ~360s
> on 3.7/master. Even if you divide this by the number of available CPU
> cores, we're talking dozens of seconds of wall time just waiting for
> CPython to get to a place where Mercurial's first bytecode can execute.
> And the problem is worse when you factor in the time it takes to import
> Mercurial's own modules.
> As a concrete example, I recently landed a Mercurial patch [2] that
> stubs out zope.interface to prevent the import of 9 modules on every
> `hg` invocation. This "only" saved ~6.94ms for a typical `hg`
> invocation. But this decreased the CPU time required to run the test
> suite on my i7-6700K from ~4450s to ~3980s (~89.5% of original) - a
> reduction of almost 8 minutes of CPU time (and over 1 minute of wall time)!
> By the time CPython gets Mercurial to a point where we can run useful
> code, we've already blown most of or past the time budget where humans
> perceive an action/command as instantaneous. If you ignore startup
> overhead, Mercurial's performance compares quite well to Git's for many
> operations. But the reality is that CPython startup overhead makes it
> look like Mercurial is non-instantaneous before Mercurial even has the
> opportunity to execute meaningful code!
> Mercurial provides a `chg` program that essentially spins up a daemon
> `hg` process running a "command server" so the `chg` program [written in
> C - no startup overhead] can dispatch commands to an already-running
> Python/`hg` process and avoid paying the startup overhead cost. When you
> run Mercurial's test suite using `chg`, it completes *minutes* faster.
> `chg` exists mainly as a workaround for slow startup overhead.
> Changing gears, my day job is maintaining Firefox's build system. We use
> Python heavily in the build system. And again, Python startup overhead
> is problematic. I don't have numbers offhand, but we invoke likely a few
> hundred Python processes as part of building Firefox. It should be
> several thousand. But, we've had to "hack" parts of the build system to
> "batch" certain build actions in single process invocations in order to
> avoid Python startup overhead. This undermines the ability of some build
> tools to formulate a reasonable understanding of the DAG and it causes a
> bit of pain for build system developers and makes it difficult to
> achieve "no-op" and fast incremental builds because we're always
> invoking certain Python processes because we've had to move DAG
> awareness out of the build backend and into Python. At some point, we'll
> likely replace Python code with Rust so the build system is more "pure"
> and easier to maintain and reason about.
> I've seen posts in this thread and elsewhere in the CPython development
> universe that challenge whether milliseconds in startup time matter.
> Speaking as a Mercurial and Firefox build system developer,
> *milliseconds absolutely matter*. Going further, *fractions of
> milliseconds matter*. For Mercurial's test suite with its ~25,000 Python
> process invocations, 1ms translates to ~25s of CPU time. With 2.7,
> Mercurial can dispatch commands in ~50ms. When you load common
> extensions, it isn't uncommon to see process startup overhead of
> 100-150ms! A millisecond here. A millisecond there. Before you know it,
> we're talking *minutes* of CPU (and potentially wall) time in order to
> run Mercurial's test suite (or build Firefox, or ...).
> From my perspective, Python process startup and module import overhead
> is a severe problem for Python. I don't say this lightly, but in my mind
> the problem causes me to question the viability of Python for popular
> use cases, such as CLI applications. When choosing a programming
> language, I want one that will scale as a project grows. Vanilla process
> overhead has Python starting off significantly slower than compiled code
> (or even Perl) and adding module import overhead into the mix makes
> Python slower and slower as projects grow. As someone who has to deal
> with this slowness on a daily basis, I can tell you that it is extremely
> frustrating and it does matter. I hope that the importance of the
> problem will be acknowledged (milliseconds *do* matter) and that
> creative minds will band together to address it. Since I am
> disproportionately impacted by this issue, if there's anything I can do
> to help, let me know

Is your Python interpreter statically linked? The Python 3 ones from the
anaconda distribution (use Miniconda!) are for Linux and macOS and that
roughly halved our startup times.

> Gregory
> [1] https://mail.python.org/pipermail/python-dev/2014-May/134528.html
> [2] https://www.mercurial-scm.org/repo/hg/rev/856f381ad74b
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> https://mail.python.org/mailman/options/python-dev/mingw.android%40gmail.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-dev/attachments/20180502/3fded41b/attachment.html>