
[Python-Dev] Have a big machine and spare time? Here's a possible Python bug.

On Fri, 24 May 2019 14:23:21 +0200
Thomas Wouters <thomas at python.org> wrote:
> On Thu, May 23, 2019 at 5:15 PM Steve Dower <steve.dower at python.org> wrote:
> > On 23May2019 0542, Inada Naoki wrote:  
> > > 1. perf shows 95% of CPU time is eaten by _PyObject_Free, not kernel space.
> > > 2. This loop is clearly hot:
> > >  
> > https://github.com/python/cpython/blob/51aa35e9e17eef60d04add9619fe2a7eb938358c/Objects/obmalloc.c#L1816-L1819  
> > >
> > > I can attach to the process with gdb, and I confirmed many arenas have
> > > the same nfreepools.
> >
> > It's relatively easy to test replacing our custom allocators with the
> > system ones, yes? Can we try those to see whether they have the same
> > characteristic?
> >
> > Given the relative amount of investment over the last 19 years [1], I
> > wouldn't be surprised if most system ones are at least as good for our
> > needs now. Certainly Windows HeapAlloc has had serious improvements in
> > that time to help with fragmentation and small allocations.
> >  
> FYI, and I've mentioned this at PyCon to a few people (might've been you,
> Steve, I don't remember) -- but at Google we've experimented with disabling
> obmalloc when using TCMalloc (a faster and thread-aware malloc, which makes
> a huge difference within Google when dealing with multi-threaded C++
> libraries), both using the usual Python benchmarks and real-world code with
> real-world workloads (a core part of YouTube, for example), all on Linux.
> There's still a significant benefit to using obmalloc when using glibc's
> malloc, and also a noticeable benefit when using TCMalloc. There are
> certainly cases where it doesn't matter much, and there may even be cases
> where the overhead of obmalloc isn't worth it, but I do believe it's still
> a firm net benefit.

Interesting that a 20-year-old simple allocator (obmalloc) is able to do
better than the sophisticated TCMalloc.

(well, of course, obmalloc doesn't have to worry about concurrent
scenarios, which explains some of the simplicity)
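Why the loop Inada points to gets hot when many arenas share the same nfreepools can be shown with a simplified model. It assumes that obmalloc keeps its usable arenas in a doubly linked list sorted by nfreepools (ascending), and that the hot lines are the linear scan that moves an arena forward after a free raises its count. `struct arena`, `reinsert_sorted` and `equal_arenas_demo` are hypothetical names for this sketch, not CPython's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for obmalloc's arena bookkeeping: only the
 * fields needed to model the sorted list of usable arenas. */
struct arena {
    unsigned int nfreepools;      /* number of free pools in this arena */
    struct arena *prevarena;
    struct arena *nextarena;
};

/* Called after a free has just bumped ao->nfreepools. Restores the
 * ascending-by-nfreepools order of the list at *head and returns how
 * many arenas the scan had to step over. */
static size_t reinsert_sorted(struct arena **head, struct arena *ao)
{
    unsigned int nf = ao->nfreepools;
    size_t steps = 0;

    /* Still in order relative to its successor: nothing to move. */
    if (ao->nextarena == NULL || nf <= ao->nextarena->nfreepools)
        return 0;

    /* Unlink ao; its old neighbours now point at each other. */
    if (ao->prevarena != NULL)
        ao->prevarena->nextarena = ao->nextarena;
    else
        *head = ao->nextarena;
    ao->nextarena->prevarena = ao->prevarena;

    /* Linear scan for the new position: one step per arena with a
     * smaller nfreepools, so a long run of equal values is slow. */
    while (ao->nextarena != NULL && nf > ao->nextarena->nfreepools) {
        ao->prevarena = ao->nextarena;
        ao->nextarena = ao->nextarena->nextarena;
        steps++;
    }

    /* The scan ran at least once, so there is a predecessor. */
    assert(ao->prevarena != NULL);
    ao->prevarena->nextarena = ao;
    if (ao->nextarena != NULL)
        ao->nextarena->prevarena = ao;
    return steps;
}

/* Build n arenas that all have nfreepools == 1, free one pool in the
 * first arena, and report how far it had to travel. */
static size_t equal_arenas_demo(size_t n)
{
    struct arena *a = calloc(n, sizeof *a);
    struct arena *head = a;
    size_t steps;

    if (a == NULL || n == 0) {
        free(a);
        return 0;
    }
    for (size_t i = 0; i < n; i++) {
        a[i].nfreepools = 1;
        a[i].prevarena = i > 0 ? &a[i - 1] : NULL;
        a[i].nextarena = i + 1 < n ? &a[i + 1] : NULL;
    }
    a[0].nfreepools = 2;          /* a pool in arena 0 just became free */
    steps = reinsert_sorted(&head, &a[0]);
    free(a);
    return steps;
}
```

In this model, `equal_arenas_demo(1000)` steps over the other 999 arenas. If that happens on many frees, the cost grows with the number of arenas times the number of frees, which would be consistent with a profile dominated by _PyObject_Free.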