[Python-Dev] Have a big machine and spare time? Here's a possible Python bug.
It seems pretty clear now that the primary cause is keeping arenas
sorted by number of free pools. As deallocation goes on, the number of
distinct "# of free pools" values decreases, leaving large numbers of
arenas sharing the same value. Then, e.g., if there are 10,000 arenas
with 30 free pools remaining, and another pool is freed in one of
them, its free pool count jumps to 31, and on average we have to crawl
over 5,000 of its former peers to locate its new position in the list.
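That crawl cost can be illustrated with a toy model (plain Python for illustration, not CPython's actual C code; `crawl_distance` is a made-up name). It counts how many equal-count peers an arena must move past when its free-pool count bumps by one in a list kept sorted ascending by count:

```python
def crawl_distance(counts, i):
    """Peers to step over when arena i's free-pool count bumps by one
    in a list kept sorted ascending by count."""
    new = counts[i] + 1
    steps, j = 0, i
    # Walk toward the back past every arena whose count is below `new`.
    while j + 1 < len(counts) and counts[j + 1] < new:
        steps += 1
        j += 1
    return steps

# 10,000 arenas, all with 30 free pools: from the middle of the run,
# about 5,000 former peers must be crawled over.
counts = [30] * 10_000
print(crawl_distance(counts, 5_000))  # 4999
```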
Which is also consistent with what the OP saw, and what I saw too in a
much smaller case: the longer deallocation goes on, the slower it
gets (fewer and fewer Nodes freed per unit time).
Which suggests a different - albeit related - approach: it's not
really the order of arenas that matters, but the partitioning of
arenas by number of free pools. So instead of a single doubly linked
list of arenas, have a vector of doubly linked lists, one list per
possible number of free pools. That's a fixed and relatively small
number. For example, even if all arena bytes could be used for pools
(they can't be - there's bookkeeping info at the start of an arena),
the maximum number of free pools in an arena would be 2**18 / 2**12 ==
2**6 == 64.
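As a quick sanity check on that arithmetic (sizes as stated above):

```python
ARENA_SIZE = 2**18  # 256 KiB arena
POOL_SIZE = 2**12   # 4 KiB pool
# Upper bound on free pools per arena, ignoring bookkeeping overhead.
print(ARENA_SIZE // POOL_SIZE)  # 64
```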
When a pool is freed, no big deal: unlink the arena from its current
list, and stick it in the list for the next higher number of free
pools. This, of course, is constant-time.
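A sketch of the whole idea, in Python for illustration (CPython's allocator is C; names like `ArenaLists` are invented here). Arenas are partitioned into doubly linked lists indexed by free-pool count, 0 through 64 per the arithmetic above, and freeing a pool is an O(1) unlink/bump/relink:

```python
MAX_FREE_POOLS = 64  # 2**18 / 2**12, from the arithmetic above

class Arena:
    def __init__(self, nfreepools=0):
        self.nfreepools = nfreepools
        self.prev = self.next = None

class ArenaLists:
    def __init__(self):
        # heads[k] heads the list of arenas with exactly k free pools.
        self.heads = [None] * (MAX_FREE_POOLS + 1)

    def link(self, arena):
        # Push onto the front of the list for its current count.
        head = self.heads[arena.nfreepools]
        arena.prev, arena.next = None, head
        if head is not None:
            head.prev = arena
        self.heads[arena.nfreepools] = arena

    def unlink(self, arena):
        if arena.prev is None:
            self.heads[arena.nfreepools] = arena.next
        else:
            arena.prev.next = arena.next
        if arena.next is not None:
            arena.next.prev = arena.prev
        arena.prev = arena.next = None

    def pool_freed(self, arena):
        # The constant-time move: unlink, bump the count, relink.
        self.unlink(arena)
        arena.nfreepools += 1
        self.link(arena)
```

No search, no crawl: the arena's new home is found by direct index into the vector.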
Of course there are edge cases (e.g., if the arena is entirely free, it
should be given back to C instead), but they seem pretty obvious, and
I haven't thought of a real problem.
Tedious, though ;-)