[Python-Dev] bpo-34837: Multiprocessing.Pool API Extension - Pass Data to Workers w/o Globals


Hi Sean,

On Fri, 28 Sep 2018 19:23:06 -0400
Sean Harrington <seanharr11 at gmail.com> wrote:
> My simple argument is that the
> developer should not be constrained to make the objects passed globally
> available in the process, as this MAY break encapsulation for large
> projects.

IMHO, global variables don't break encapsulation if they remain private
to the module where they actually play a role.

Of course, there are also global-like alternatives to globals, such as
class attributes...  The multiprocessing module itself uses globals (or
quasi-globals) internally for various implementation details.

> > 3. If you don't like globals, you could probably do something like
> > lazily-initialize the resource when a function needing it is executed;
> > this also avoids creating the resource if the child doesn't use it at
> > all.  Would that work for you?
>
> I have nothing against globals, my gripe is with being enforced to use
> them for every Pool use case. Further, if initializing the resource is
> expensive, we only want to do this ONE time per worker process.

That's what I meant by lazy initialization: initialize it if not
already done, otherwise just use the already-initialized resource.
It's a common pattern.

(you can view it as a 1-element cache if you prefer)
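
A minimal sketch of that pattern, for illustration only.  The names
_resource, _build_resource, get_resource and work are hypothetical, and
_build_resource merely stands in for whatever expensive setup the worker
really needs:

```python
import multiprocessing
import os

# Module-private slot; each worker process ends up with its own copy.
_resource = None


def _build_resource():
    # Hypothetical stand-in for an expensive setup step
    # (opening a connection, loading a model, ...).
    return {"pid": os.getpid()}


def get_resource():
    """Return the per-process resource, building it on first use only."""
    global _resource
    if _resource is None:
        _resource = _build_resource()
    return _resource


def work(x):
    # Task function handed to Pool.map; the resource is built at most
    # once per worker, and never in a worker that runs no task.
    res = get_resource()
    return x * 2, res["pid"]


def main():
    with multiprocessing.Pool(processes=2) as pool:
        return pool.map(work, range(4))
```

Calling main() under the usual `if __name__ == "__main__":` guard would
run the tasks; every result produced by the same worker should carry the
same pid, since that worker built its resource only once.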

> > As a more general remark, I understand the desire to make the Pool
> > object more flexible, but we can also not pile up features until it
> > satisfies all use cases.
>
> I understand that this is a legitimate concern, but this is about API
> approachability.  Python end-users of Pool are forced to declare a global
> from a lexical scope. Most Python end-users probably don't even know this
> is possible.

Hmm...  We might have a disagreement on the target audience of the
multiprocessing module.  multiprocessing isn't very high-level; I would
expect it to be used by experienced programmers who know how to mutate
a global variable from a lexical scope.

For non-programmer end-users, such as data scientists, there are
higher-level libraries such as Celery (http://www.celeryproject.org/)
and Dask distributed (https://distributed.readthedocs.io/en/latest/).
Perhaps it would be worth mentioning them in the documentation.

Regards

Antoine.