[Python-Dev] Have a big machine and spare time? Here's a possible Python bug.
[Larry Hastings <larry at hastings.org>]
> I have a computer with two Xeon CPUs and 256GB of RAM. So, even
> though it's NUMA, I still have 128GB of memory per CPU. It's running a
> "spin" of Ubuntu 18.10.
> I compiled a fresh Python 3.7.3 --with-optimizations. I copied the sample
> program straight off the StackOverflow page. The program ran for about
> five and a half hours then exited normally.
> During the run it printed:
> This gets printed!
> This doesn't get printed
> Statistics reported by "time":
> 19811.05s user 123.56s system 99% cpu 5:32:15.04 total
> Checking in on it now and then, peak observed memory usage (as reported
> by "top") was just under 80GB.
All of that is consistent with what others reported, although you've
given more quantified details, and that's helpful.
> I take it that the interesting part was confirming that "This doesn't get printed"
> gets printed when you have enough RAM for the program to run to completion.
That was the primary point at first, yes,
> So I guess there's no bug here? Just an observation about CPython's
> garbage collector being kinda slow? Or maybe CPython gc + swap =
> super bombad slow?
No need to confirm this: if you ran it again with a bit more output,
you'd find that the vast bulk of the time was spent between the two
output lines. That is, it takes hours from the time the train()
function starts to return and completes returning. That's all
consumed as a side effect of `tree` losing its last reference.
We know why now, and it's hard to say whether it's "a bug". It's
functioning as designed, but the design sucks ;-) So I'd call it a
The heuristics that were introduced to try to make it likely we could
return more empty arenas to C were designed when machines had far less
RAM, and can burn time quadratic in the number of arenas. That really
didn't matter in a world with a few hundred arenas, but can be a
timing disaster in a world with hundreds of thousands (or more) of
The Stackoverflow test case was whittled way down from the OP's real
program, which ran "for days" without completing.
I've sketched two (albeit closely related, both to each other and to
what we currently do) ways the worst-case overall time could be cut
from quadratic to linear in the number of arenas, which is
More ambitious would be to dream up an entirely different heuristic.
Keeping the list of usable arenas fully sorted in order of number of
free pools seems a very strong requirement for a poke-and-hope
But, best I can tell, the person who came up with this heuristic to
begin with didn't check in any motivating test programs. So we'd be
starting over from scratch. For this specific Stackoverflow case,
I've confirmed that it doesn't make a real difference if we didn't
bother to sort the list of arenas _at all_ (then it still returns
about the same number of arenas to C, both at the point the tree
finishes building, and at the time it completes tearing down the
So I have a program showing the current scheme can be a disaster, but
none where it shows it actually helps anything ;-)