
Thread-safe way to add a key to a dict only if it isn't already there?

On Sat, Jul 7, 2018 at 8:03 AM Stefan Behnel <stefan_ml at behnel.de> wrote:
> Marko Rauhamaa wrote on 07.07.2018 at 15:41:
> > Steven D'Aprano <steve+comp.lang.python at pearwood.info>:
> >> On Sat, 07 Jul 2018 02:51:41 +0900, INADA Naoki wrote:
> >>> D.setdefault('c', None)
> >>
> >> Oh that's clever!
> >
> > Is that guaranteed to be thread-safe? The documentation
> > (<URL: https://docs.python.org/3/library/stdtypes.html#dict.setdefault>)
> > makes no such promise.
> It's implemented in C and it's at least designed to avoid multiple lookups
> and hash value calculations, which suggests that it's also thread-safe by
> design (or by a side-effect of the design). Whether that's guaranteed, I
> cannot say, but a change that makes it non-thread-safe would probably be
> very controversial.
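For reference, here are the semantics of the idiom under discussion. One call both looks the key up and, only if it is missing, stores the default. Whether that single call is also atomic is exactly what the thread is debating, so this sketch shows behavior only and makes no claim about thread-safety.

```python
d = {'a': 1}

# Key absent: setdefault stores the default and returns it.
print(d.setdefault('c', None))   # None
# Key present: the existing value is returned and left untouched.
print(d.setdefault('a', 99))     # 1
print(d)                         # {'a': 1, 'c': None}
```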

It's only implemented in C if you're using CPython (and if it's the
builtin dict type and not a subclass). If there's any chance that your
code might run under any other interpreter than CPython, then you
can't rely on the GIL for thread-safety. I would also point out
https://bugs.python.org/issue25343. While some operations are known to
be atomic (and therefore thread-safe), the devs seem inclined not to
document that and instead to document that *no* operations are
guaranteed to be atomic.
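If no dict operation is guaranteed atomic, the portable option is to make the check-then-insert atomic yourself with an explicit lock. The sketch below is illustrative only: `LockedDict` and `add_if_absent` are hypothetical names, not a stdlib API. It relies only on `threading.Lock`, so it does not depend on the GIL or on any particular interpreter.

```python
import threading


class LockedDict:
    """Hypothetical wrapper: every check-then-insert runs under one mutex."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def add_if_absent(self, key, value):
        """Store key -> value only if key is missing; return the stored value."""
        with self._lock:
            if key not in self._data:
                self._data[key] = value
            return self._data[key]

    def get(self, key, default=None):
        with self._lock:
            return self._data.get(key, default)


shared = LockedDict()
results = []


def worker(n):
    # Each thread tries to claim 'c' with its own value; only the first wins.
    results.append(shared.add_if_absent('c', n))


threads = [threading.Thread(target=worker, args=(i,)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(shared.get('c'), results)
```

Every thread should see the same winning value, whichever thread happened to run first.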

> > At least _collections_abc.py
> > contains this method definition for MutableMapping:
> >
> >     def setdefault(self, key, default=None):
> >         'D.setdefault(k[,d]) -> D.get(k,d), also set D[k]=d if k not in D'
> >         try:
> >             return self[key]
> >         except KeyError:
> >             self[key] = default
> >         return default
> >
> > There are more such non-thread-safe definitions.
> That's a different beast, because Python code can always be interrupted by
> thread switches (between each byte code execution). C code cannot, unless
> it starts executing byte code (e.g. for calculating a key's hash value) or
> explicitly allows a thread switch at a given point.

dict.setdefault does potentially call __hash__ and __eq__ on the key.
Since this is part of the lookup I don't know whether it affects
thread-safety as long as the key is properly hashable, but it does
make it more difficult to reason about. I don't *think* that
setdefault calls Py_DECREF, but if it did then that is another
potential point of thread interruption.
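A small sketch of why user-defined keys complicate the picture: with a key type whose `__hash__` and `__eq__` are written in Python (the hypothetical `NoisyKey` below), `dict.setdefault` runs that Python code as part of its lookup. Under CPython, executing such bytecode is a point where the interpreter could switch threads. This shows only that the methods are called, not that a switch actually happens there.

```python
calls = []


class NoisyKey:
    """Hypothetical key type whose __hash__ and __eq__ are Python code."""

    def __init__(self, name):
        self.name = name

    def __hash__(self):
        calls.append(("hash", self.name))
        return hash(self.name)

    def __eq__(self, other):
        calls.append(("eq", self.name))
        return isinstance(other, NoisyKey) and self.name == other.name


d = {}
d.setdefault(NoisyKey("c"), "first")            # runs __hash__
value = d.setdefault(NoisyKey("c"), "second")   # runs __hash__, then __eq__ against the stored key
print(value)   # first
print(calls)
```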

By contrast, using a mutex to guard accesses is definitely safe, and
it's also self-documenting of the fact that thread-safety is a