
[Python-Dev] PEP 556 threaded garbage collection & linear recursion in gc


This PEP is currently Deferred as nobody is actively working on a test
implementation.

A situation came up the other day where I *believe* this could've helped.

Scenario (admittedly not one most environments run into): a Python process
with a C++ extension module implementing a threaded server (threads spawned
by C++) that could call back into CPython from within server request
handlers (i.e., how threaded servers tend to work regardless of the core
loop's implementation language).

Python code in the application had done something (unknown to me; I didn't
dive into their code) that built up data structures large enough, and
presumably nested or recursive enough, that the garbage collector, when
triggered, wound up in very deep recursion.  This caused a stack overflow,
as the C++-spawned threads were only being given a 256k stack (to conserve
virtual address space - there can potentially be a _ton_ of threads in this
code).

The crash had a C++ stack trace 1000+ levels deep, repeating the pattern:

    @     0x564d59bd21de         32  func_dealloc
    @     0x564d59bce0c1         32  cell_dealloc
    @     0x564d5839db41         48  tupledealloc
    @     0x564d59bd21de         32  func_dealloc
    @     0x564d59bce0c1         32  cell_dealloc
    @     0x564d5839db41         48  tupledealloc

If our gc ran on a thread of its own, spawned by Python with a typical,
larger default stack size (8 MB), this would've been less likely to trigger
a crash (though obviously still possible if the recursion depth grows
linearly with the size of the data).
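
For illustration only, here is a rough approximation of that idea using
nothing but the stdlib.  This is not PEP 556's proposed API; the names
start_gc_thread, interval and stack_bytes are made up for this sketch.  It
turns off automatic collection and runs gc.collect() periodically from a
Python-spawned thread that was given a larger stack.

```python
import gc
import threading


def start_gc_thread(interval=1.0, stack_bytes=8 * 1024 * 1024):
    """Run cyclic gc only from a dedicated thread (hypothetical helper)."""
    gc.disable()                      # no more automatic gc in arbitrary threads
    stop = threading.Event()

    def run():
        # Event.wait() returns False on timeout, True once stop is set.
        while not stop.wait(interval):
            gc.collect()

    # stack_size() applies to threads created afterwards and returns the
    # previous setting (0 means the platform default).
    previous = threading.stack_size(stack_bytes)
    try:
        worker = threading.Thread(target=run, name="gc-worker", daemon=True)
        worker.start()
    finally:
        threading.stack_size(previous)
    return worker, stop
```

Calling stop.set() ends the loop, and gc.enable() restores the default
behaviour.  This only moves collection of reference cycles; plain refcount
deallocations still happen in whichever thread drops the last reference,
and threading.stack_size() can raise on platforms that don't allow changing
the stack size.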

I know, there are obvious workarounds of all sorts for this situation.  My
point is more that synchronously triggering gc _within_ whichever thread
happens to invoke the periodic "hey, let's do another gc run" logic in the
eval loop is undesirable.

I'm not making any plans to work on an implementation for this PEP,
deferred seems accurate. Just dropping a note that I'd still be interested
in seeing something other than synchronous gc in arbitrary threads happen
when a process has multiple threads.

*Another* takeaway from this is that it *appears* possible to cause our gc
to go into linear recursion.  Yuck!  I'd file an issue on that, but doing
so requires making some example code to construct such a scenario first...
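
A possible starting point for such example code, going only by the shape of
the stack trace above (function -> closure tuple -> cell -> function): a
chain of nested closures closed into a cycle so that only the gc can free
it.  The names wrap and collect_closure_chain are made up here, and I have
not confirmed that this reproduces the deep recursion; that likely depends
on the CPython version and on how its dealloc nesting limits apply.

```python
import gc


def wrap(inner):
    # The returned closure has one cell referring to `inner`, giving the
    # reference chain: function -> closure tuple -> cell -> inner.
    def link():
        return inner
    return link


def collect_closure_chain(depth):
    """Build a cyclic chain of `depth` closures, drop it, and return the
    number of unreachable objects reported by gc.collect()."""
    anchor = []                  # closes the cycle; refcounting can't free it
    head = wrap(anchor)
    for _ in range(depth - 1):
        head = wrap(head)
    anchor.append(head)          # innermost cell -> anchor -> outermost function
    del head, anchor             # the whole chain is now cyclic garbage
    return gc.collect()
```

A modest depth such as collect_closure_chain(1000) is harmless on a normal
stack.  To probe for the overflow one would presumably raise the depth and
run it in a thread with a small threading.stack_size(), keeping in mind
that a successful reproduction crashes the interpreter.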

Food for thought in case these kinds of things are something anyone else
has encountered.
