[Python-Dev] Have a big machine and spare time? Here's a possible Python bug.
[Inada Naoki <songofacandy at gmail.com>]
> For the record, result for 10M nodes, Ubuntu 18.04 on AWS r5a.4xlarge:
I'm unclear on what "nodes" means. If you mean you changed 27M to 10M
in this line:
for token in random_strings(27_000_000):
that's fine, but there are about 40 times more than that `Node`
objects created by the program. So if you did change that to
10_000_000, you'd have about 400M `Node` objects, consuming about 80x
that many bytes of RAM (or about 32GB).
> $ local/bin/python3 t1.py # default
> 1138.1123778309993 -- end train, start del
> 688.7927911250008 -- end
> $ arena-1m/bin/python3 t1.py # Changed ARENA_SIZE to 1MiB
> 1085.3363994170013 -- end train, start del
> 84.57135540099989 -- end
Nice! I assume these are seconds. On Stackoverflow, the OP reported
that boosting ARENA_SIZE the same way cut deallocation time in his
original program to about 13 minutes.
> $ PYTHONMALLOC=malloc local/bin/python3 t1.py
> 1157.4882792789995 -- end train, start del
> 27.919834706000074 -- end
While the OP reported, for the original program:
With PYTHONMALLOC=malloc, insertion takes nearly twice as long, but
deallocation only takes 2 minutes!
The "nearly twice as long" for building the tree is in sharp contrast
to what you saw, but there's about 3x as much of everything in the
original program, and, e.;g., it's certainly possible malloc is
tripping over fragmentation then that obmalloc avoids.
> $ LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.1
> PYTHONMALLOC=malloc local/bin/python3 t1.py
> 1098.4383037820007 -- end train, start del
> 117.93938426599925 -- end
> In this case, glibc malloc is the fastest.
> glibc is know to weak about fragmentation.
> But algorithm to avoid fragmentation is just an overhead in this script.
I can't say.
I'll note that the approach I very briefly sketched before
(restructure the list of arenas to partition it into multiple lists
partitioned by number of free pools) "should make" obmalloc
competitive with malloc here (it would eliminate all searches, except
perhaps (depending on how gonzo the code is) a rare need to search for
"the first" non-empty list).