[Python-Dev] Usage of the multiprocessing API and object lifetime


On Tue, 11 Dec 2018 11:48:24 -0800
Nathaniel Smith <njs at pobox.com> wrote:
> 
> I know this question is rhetorical, but it does actually have a principled
> answer. Memory is a special resource, because the GC has a complete picture
> of memory use in the program, and if there's a danger of running out of
> memory then the GC will detect that and quickly run a collection before
> anyone has a chance to notice. But it doesn't know about other resources
> like descriptors, threads, processes, etc., so it can't detect or prevent
> unbounded leaks of these resources.
> 
> Therefore, in a classic GC-ed language, bytes() doesn't need to be
> explicitly released, but all other kinds of resources do.

I would disagree here.  You may /like/ to release other kinds of
resources explicitly, but you don't /need/ to.  This is obvious for
things such as mutexes, which are small system resources even though
the GC doesn't know about them.  And nobody's asking Python to add a
method to deterministically destroy the mutex that's inside a Lock
object (*).
(also, calling Lock.release() already does something else :-))
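
To illustrate the aside about Lock.release(): a minimal sketch, assuming
threading.Lock as a stand-in for the Lock objects under discussion.  The
function name is hypothetical.  release() only unlocks; the Lock object
(and whatever OS primitive backs it) stays alive and reusable until the
object itself is garbage-collected.

```python
import threading

# threading.Lock is used here as an assumed stand-in; the point is only
# that release() unlocks the lock and does not destroy it.
lock = threading.Lock()


def reuse_after_release():
    lock.acquire()
    lock.release()              # unlocks; the lock object is still usable
    with lock:                  # acquire it again via the context manager
        held_inside = lock.locked()
    return held_inside, lock.locked()
```

There is no call here that frees the underlying primitive; that is left
to the object's normal lifetime.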

Arguably, things are more complicated for things like threads and
processes.  But here we are talking not about threads and processes
themselves, but about an abstraction (the Pool object) that
is /designed/ to hide threads and processes in favour of higher level
semantics organized around the idea of task submission.  One important
characteristic here is that, when the pool is idle, those threads and
processes aren't holding important resources (user-allocated resources)
alive (**).  The idle pool just has a bookkeeping overhead.

Usually, people don't really care how exactly a Pool manages its helper
threads and worker processes (it has both at the same time), and they
are fine with the internal bookkeeping overhead.  For the people who
care (***), the Pool.join() method is there to be called.
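
For those who do care, a minimal sketch of an explicit shutdown follows.
It assumes multiprocessing.pool.ThreadPool, which exposes the same
close()/join() methods as multiprocessing.Pool and keeps the example
runnable without a __main__ guard; the helper names are hypothetical.
Note that join() must be preceded by close() or terminate().

```python
from multiprocessing.pool import ThreadPool  # same API as multiprocessing.Pool


def square(x):
    # Hypothetical task, used only for illustration.
    return x * x


def run_and_shutdown(values, workers=2):
    pool = ThreadPool(workers)
    try:
        results = pool.map(square, values)
    finally:
        pool.close()   # no further tasks may be submitted
        pool.join()    # wait for the workers to exit
    return results
```

With a process-based Pool the same two calls apply; the pool creation
would then normally sit under an `if __name__ == "__main__":` guard.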

(*) actually, it's not a mutex, it's a semaphore

(**) unless the user sets a global variable from an executing
task, which I think of as an anti-pattern :-)

(***) for example because they used the anti-pattern above :-)

Regards

Antoine.