[Python-Dev] PEP 556 threaded garbage collection & linear recursion in gc
[Gregory P. Smith <greg at krypto.org>]
> Good point, I hadn't considered that it was regular common ref
> count 0 dealloc chaining.
It pretty much has to be whenever you see a chain of XXX_dealloc
routines in a stack trace. gcmodule.c never even looks at a
tp_dealloc slot directly, let alone directly invoke a deallocation
method. That all happens indirectly, as a result of what Py_DECREF
does. Then once you're first inside one tp_dealloc method, gc is
completely irrelevant - it's that tp_dealloc for the top-level
container does its own Py_DECREF on a contained container, which in
turn can do _its_ own Py_DECREF on one of its contained containers
... etc. You can get an arbitrarily deep stack of XXX_dealloc calls
then, and there's really no other way to get that.
BTW, "container" here is used in a very broad C-level sense, not a
high-level Python sense: any PyObject that contains a pointer to a
PyObject is "a container" in the intended sense.
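To make the chaining concrete, here's a toy model in Python. It is not CPython's code, just a sketch of the shape described above: decref() stands in for Py_DECREF, dealloc() for a tp_dealloc slot, and Obj for "a container" in the broad sense. All of those names, plus make_chain() and measure(), are made up for illustration.

```python
class Obj:
    """Toy stand-in for a PyObject that holds pointers to other objects."""

    def __init__(self, children=()):
        self.refcnt = 1                  # exactly one owner in this model
        self.children = list(children)


depth = 0        # number of dealloc calls currently on the stack
max_depth = 0    # deepest nesting seen so far


def decref(obj):
    """Model of Py_DECREF: drop one reference, deallocate at zero."""
    obj.refcnt -= 1
    if obj.refcnt == 0:
        dealloc(obj)


def dealloc(obj):
    """Model of a tp_dealloc: release every contained object."""
    global depth, max_depth
    depth += 1
    max_depth = max(max_depth, depth)
    for child in obj.children:
        decref(child)                    # may re-enter dealloc for the child
    obj.children.clear()
    depth -= 1


def make_chain(n):
    """Build n objects, each holding the only reference to the next."""
    head = Obj()
    for _ in range(n - 1):
        head = Obj([head])
    return head


def measure(n):
    """Drop the last reference to an n-deep chain; report the stack depth."""
    global depth, max_depth
    depth = max_depth = 0
    decref(make_chain(n))
    return max_depth
```

measure(200) returns 200: one dealloc frame per level of nesting, and no gc involved anywhere.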
> The processes unfortunately didn't have faulthandler enabled so there wasn't
> much info from where in the python code it happened (now fixed).
It's quite possible that the top-level container was Py_DECREF'ed by
code in gcmodule.c. But gc gets blamed at first for a lot of stuff
that's not actually its fault ;-)
> I'll see if anything looks particularly unusual next time I hear of such a report.
The trashcan mechanism is the one and only hack in the code intended
to stop unbounded XXX_dealloc stacks, so that's what needs looking at.
Alas, it's hard to work with because it's so very low-level, and
there's nothing portable that can be relied on about stack sizes or
requirements across platforms or compilers. Some possibilities:
- The trashcan code is buggy.
- The maximum container dealloc stack depth trashcan intends to allow
(PyTrash_UNWIND_LEVEL = 50) is too large for the C stack a thread gets
under this app on this platform using this compiler.
- One or more of the specific container types involved in this app's
dealloc chain doesn't use the trashcan gimmick at all, so is invisible
to trashcan's count of how deep the call stack has gotten.
For example, cell_dealloc was in your stack trace, but I see no use of
trashcan code in that function (Py_TRASHCAN_SAFE_BEGIN /
Py_TRASHCAN_SAFE_END). So the trashcan hack has no idea that
cell_dealloc calls are on the stack.
And likewise for func_dealloc - looks like calls to that are also
invisible to the trashcan.
tupledealloc is cool, though.
IIRC, Christian Tismer introduced the trashcan because code only he
wrote ;-) was blowing the stack when very deeply nested lists and/or
tuples became trash.
From a quick scan of the current code, looks like it was later added
to only a few types that aren't container types in the Python sense.
Which may or may not matter here. Your stack trace showed a
tupledealloc in one of every three slots, so even if two of every
three slots were invisible to the trashcan, the call stack "should
have been" limited to a maximum of about PyTrash_UNWIND_LEVEL * 3 =
150 XXX_dealloc functions. But you saw a stack 1000+ levels deep. So
something else that isn't immediately apparent is also going on.
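The "should have been" bound can be checked with a toy simulation in Python. This is a sketch under assumptions, not the real macros: Sim, Obj, and max_depth_for() are hypothetical names, each object holds the only reference to one child, and the nesting counter is bumped only by types flagged as using the trashcan, the way tupledealloc does and cell_dealloc doesn't. Only UNWIND_LEVEL = 50 is taken from the actual code.

```python
UNWIND_LEVEL = 50  # same value as PyTrash_UNWIND_LEVEL


class Obj:
    """Toy object holding the only reference to at most one other object."""

    def __init__(self, child=None, uses_trashcan=False):
        self.child = child
        self.uses_trashcan = uses_trashcan


class Sim:
    def __init__(self):
        self.depth = 0          # dealloc calls really on the stack
        self.max_depth = 0
        self.nesting = 0        # the trashcan's idea of the depth
        self.delete_later = []  # objects whose teardown was postponed
        self.freed = 0

    def dealloc(self, obj):
        self.depth += 1
        self.max_depth = max(self.max_depth, self.depth)
        if not obj.uses_trashcan:
            self.release(obj)              # invisible to the trashcan
        elif self.nesting < UNWIND_LEVEL:
            self.nesting += 1
            self.release(obj)
            self.nesting -= 1
            if self.nesting == 0:
                self.destroy_chain()       # outermost level: drain the list
        else:
            self.delete_later.append(obj)  # too deep: postpone and return
        self.depth -= 1

    def release(self, obj):
        """Free obj and drop its reference to the child (refcount 1 -> 0)."""
        self.freed += 1
        child, obj.child = obj.child, None
        if child is not None:
            self.dealloc(child)

    def destroy_chain(self):
        """Tear down postponed objects from a shallow stack."""
        while self.delete_later:
            obj = self.delete_later.pop()
            self.nesting += 1
            self.dealloc(obj)
            self.nesting -= 1


def max_depth_for(pattern, n):
    """Tear down an n-object chain; pattern[i % len(pattern)] says whether
    the object at position i uses the trashcan."""
    head = None
    for i in reversed(range(n)):
        head = Obj(head, pattern[i % len(pattern)])
    sim = Sim()
    sim.dealloc(head)
    assert sim.freed == n   # nothing was left behind
    return sim.max_depth
```

With one trashcan user in every three slots, max_depth_for((True, False, False), 3000) stays within a few frames of UNWIND_LEVEL * 3 no matter how long the chain is, while a chain with no trashcan users at all, max_depth_for((False,), 200), goes the full 200 deep. In this model, at least, invisible types alone can't account for a 1000+ level stack when every third frame is a tupledealloc.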