[Python-Dev] Usage of the multiprocessing API and object lifetime


Hi,

tzickel reported a reference cycle bug in multiprocessing which keeps
threads and processes alive:

   https://bugs.python.org/issue34172

He wrote a fix which has been merged into the 3.6, 3.7 and master branches.
But Pablo Galindo noticed that the fix breaks the following code (he
added "I found the weird code in the example in several projects."):

    import multiprocessing

    def the_test():
        print("Begin")
        for x in multiprocessing.Pool().imap(int,
                ["4", "3"]):
            print(x)
        print("End")

    the_test()

Pablo proposed to add a strong reference to the Pool from
multiprocessing iterators:
https://bugs.python.org/issue35378

I blocked his pull request because I see this change as a risk of new
reference cycles. Since we are close to the 3.6 and 3.7 releases, I
decided to revert the multiprocessing fix instead.

Pablo's issue35378 evolved to add a weak reference in iterators to try
to detect when the Pool is destroyed: raise an exception from the
iterator, if possible.
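
To illustrate the weak-reference idea, here is a minimal sketch. It is
not the code from issue35378: TinyPool and TinyIterator are
hypothetical stand-ins, and only the weakref module is real. The
iterator keeps a weak reference to its pool and raises as soon as the
pool is gone.

```python
import gc
import weakref


class TinyPool:
    """Hypothetical stand-in for a pool; not multiprocessing.Pool."""

    def imap(self, func, iterable):
        return TinyIterator(self, func, iterable)


class TinyIterator:
    """Iterator that holds only a weak reference to its pool."""

    def __init__(self, pool, func, iterable):
        # A weak reference does not keep the pool alive.
        self._pool_ref = weakref.ref(pool)
        self._func = func
        self._items = iter(iterable)

    def __iter__(self):
        return self

    def __next__(self):
        # The weak reference returns None once the pool is destroyed.
        if self._pool_ref() is None:
            raise RuntimeError("the pool was destroyed while iterating")
        return self._func(next(self._items))


pool = TinyPool()
it = pool.imap(int, ["4", "3"])
first = next(it)   # the pool is still alive here
del pool           # drop the last strong reference
gc.collect()       # for collectors without reference counting
try:
    next(it)
    detected = False
except RuntimeError:
    detected = True
```

With this design the iterator cannot create a reference cycle with the
pool, but code that drops the pool too early gets an error instead of
hanging.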

Then a discussion started on how the multiprocessing API is supposed
to be used and about the lifetime of multiprocessing objects.

I would prefer to make the multiprocessing API stricter: Python
shouldn't try to guess how long an object is going to live. The API
user has to *explicitly* release resources.

tzickel noted that the documentation says:

   "When the pool object is garbage collected terminate() will be
called immediately."

And that multiprocessing relies on the garbage collector to release
resources, especially through the multiprocessing.util.Finalize tool:

    class Finalize(object):
        '''
        Class which supports object finalization using weakrefs
        '''
        def __init__(self, obj, callback, ...):
            ...
            if obj is not None:
                self._weakref = weakref.ref(obj, self)
            else:
                assert exitpriority is not None
            ...
            _finalizer_registry[self._key] = self

I propose to start emitting ResourceWarning in Python 3.8 when objects
are not released explicitly. I wrote a first change to emit
ResourceWarning in the Pool object:

   https://bugs.python.org/issue35424
   https://github.com/python/cpython/pull/10974
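
The general pattern looks roughly like the sketch below. It is not the
code from the pull request: TrackedResource and warnings_from are
hypothetical names used for illustration. The object warns from
__del__ only if it was never closed explicitly.

```python
import gc
import warnings


class TrackedResource:
    """Hypothetical resource that warns if it is never closed."""

    def __init__(self):
        self._closed = False

    def close(self):
        self._closed = True

    def __del__(self):
        if not self._closed:
            # source= lets tracemalloc report where the object
            # was allocated.
            warnings.warn(f"unclosed resource {self!r}",
                          ResourceWarning, source=self)
            self.close()


def warnings_from(action):
    """Run action() and return the categories of warnings it caused."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        action()
        gc.collect()
    return [w.category for w in caught]


leaked = warnings_from(lambda: TrackedResource())
closed = warnings_from(lambda: TrackedResource().close())
```

ResourceWarning is ignored by default, so such a warning is only
visible with options like "python -W default" or "python -X dev".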

By the way, I'm surprised that "with pool:" doesn't release all
resources. An additional "pool.join()" is needed to ensure that all
resources are released. It's a little bit surprising to have to emit a
ResourceWarning if join() has not been called, even if the code uses
"with pool:".
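
For comparison, here is a sketch of the earlier example with an
explicit lifetime. The convert() function and its pool_factory
parameter are hypothetical; the parameter only exists so that another
class with the same API, such as multiprocessing.pool.ThreadPool, can
be substituted.

```python
import multiprocessing
import multiprocessing.pool


def convert(values, pool_factory=multiprocessing.Pool):
    """Convert strings to ints in a pool with an explicit lifetime."""
    with pool_factory() as pool:
        # Consume the iterator while the pool is still alive.
        results = list(pool.imap(int, values))
    # Leaving the "with" block calls pool.terminate(), which does not
    # wait for the workers. join() waits until they have exited.
    pool.join()
    return results
```

Calling convert(["4", "3"]) from a script, under an
'if __name__ == "__main__":' guard, should return [4, 3] without
relying on the garbage collector to clean up the pool.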

I don't know the multiprocessing API well, so I'm not sure in which
direction we should go: best effort to support strange legacy code
with "implicit object lifetime", or become stricter in Python 3.8?

From a technical point of view, I would prefer to become stricter.
Relying on the garbage collector means that the code is going to
behave badly on PyPy which uses a different garbage collector
implementation :-(

Victor
-- 
Night gathers, and now my watch begins. It shall not end until my death.