Counting Python threads vs C/C++ threads


On Tue, Jul 16, 2019 at 11:13 AM Barry Scott <barry at barrys-emacs.org> wrote:

> I'm going to assume you are on linux.
>
Yes, I am.  Ubuntu 16.04.6 LTS sometimes, Mint 19.1 other times.

On 16 Jul 2019, at 18:35, Dan Stromberg <drsalists at gmail.com> wrote:
> >
> > I'm looking at a performance problem in a large CPython 2.x/3.x codebase
> > with quite a few dependencies.
> >
> > I'm not sure what's causing the slowness yet.  The CPU isn't getting hit
> > hard, and I/O on the system appears to be low - but throughput is poor.
> > I'm wondering if it could be CPU-bound Python threads causing the problem
> > (because of the threading+GIL thing).
>
> Does top show the process using 100% CPU?
>
Nope.  CPU utilization and disk use are both low.

We've been going into top, and then hitting '1' to see things broken down
by CPU core (there are 32 of them, probably counting hyperthreads as
different cores), but per-core CPU use is only in the teens, percentage-wise.

I've also tried dstat and csysdig.  The hardware isn't breaking a sweat,
but throughput is poor.

> > The non-dependency Python portions don't appear to have much in the way of
> > threading going on based on a quick grep, but csysdig says a process
> > running the code has around 32 threads running - the actual thread count
> > varies, but that's the ballpark.
> >
> > I'm wondering if there's a good way to find two counts of those threads -
> > how many are from CPython code that could run afoul of the GIL, and how
> > many of them are from C/C++ extension modules that wouldn't be
> > responsible for a GIL issue.
>
> From the docs on threading:
>
> threading.active_count()
>
>  <file:///Library/Frameworks/Python.framework/Versions/3.7/Resources/English.lproj/Documentation/library/threading.html?highlight=threading#threading.active_count>
> Return the number of Thread
> <file:///Library/Frameworks/Python.framework/Versions/3.7/Resources/English.lproj/Documentation/library/threading.html?highlight=threading#threading.Thread>
> objects currently alive. The returned count is equal to the length of the
> list returned by enumerate()
> <file:///Library/Frameworks/Python.framework/Versions/3.7/Resources/English.lproj/Documentation/library/threading.html?highlight=threading#threading.enumerate>.
>

Are you on a Mac?

https://docs.python.org/2/library/threading.html appears to have some good
info. I'll probably try logging threading.active_count()

A question arises though: Does threading.active_count() only show Python
threads created with the threading module?  What about threads created with
the thread module?
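
As far as I can tell from the docs, threading.enumerate() only lists
threading.Thread objects (plus the main thread and any "dummy" objects
created when a foreign thread calls threading.current_thread()), so a thread
started through the low-level module should not be counted. A minimal sketch
to check that, assuming Python 3, where the thread module is named _thread;
the worker function and the event names are just for illustration:

```python
import _thread
import sys
import threading

started = threading.Event()
release = threading.Event()
finished = threading.Event()


def worker():
    # Started via the low-level module, so no threading.Thread object
    # is registered for this thread.
    started.set()
    release.wait()
    finished.set()


before = threading.active_count()
_thread.start_new_thread(worker, ())
started.wait()  # the low-level thread is now running and parked on release

during = threading.active_count()
# sys._current_frames() has one entry per thread currently running Python code.
frames_during = len(sys._current_frames())

release.set()
finished.wait()

print("active_count before starting the thread:", before)
print("active_count while the low-level thread runs:", during)
print("entries in sys._current_frames() while it runs:", frames_during)
```

If the two active_count values match while sys._current_frames() reports one
more entry, then logging active_count() alone would undercount.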

> Try running strace on the process to see what system calls it's making.
>
I've tried it, but thank you.  It's a good suggestion.

I often find that when strace'ing a program, there's a bunch of
mostly-irrelevant stuff at Initial Program Load (IPL), but then the main
loop falls into a small cycle of system calls.

Not with this program.  Its main loop is busy and large.

> You could also connect gdb to the process and find out what code the
> threads are running.
>

I used to use gdb, and wrappers for gdb, when I was doing C code, but I
don't have much experience using it on a CPython interpreter.

Would I be doing a "thread apply all bt" or what?  I'm guessing those
backtraces could facilitate identifying the origin of a thread.
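
An in-process alternative I may try as well, sketched under a few
assumptions: Linux (for /proc/self/task), and the ability to call a helper
from inside the slow process. The function names here are hypothetical. The
idea is to compare the kernel's thread count with the threads the interpreter
reports as running Python code, and dump a Python stack for each of those.
The difference is only a rough estimate of the purely native threads: the two
counts are not taken atomically, and a native thread that only occasionally
calls into Python may or may not show up.

```python
import os
import sys
import threading
import traceback


def os_thread_count():
    """Count kernel threads of this process (Linux only, else None)."""
    task_dir = "/proc/self/task"
    if not os.path.isdir(task_dir):
        return None
    return len(os.listdir(task_dir))


def python_thread_stacks():
    """Map (thread id, name) to the formatted Python stack of that thread."""
    names = {t.ident: t.name for t in threading.enumerate()}
    stacks = {}
    for ident, frame in sys._current_frames().items():
        # Threads not started through the threading module have no name here.
        name = names.get(ident, "<no threading.Thread object>")
        stacks[(ident, name)] = "".join(traceback.format_stack(frame))
    return stacks


def report(out=sys.stderr):
    """Write both counts and one Python stack per interpreter-level thread."""
    stacks = python_thread_stacks()
    total = os_thread_count()
    out.write("threads executing Python code: %d\n" % len(stacks))
    if total is not None:
        out.write("kernel threads in this process: %d\n" % total)
    for (ident, name), stack in sorted(stacks.items()):
        out.write("\n--- thread %s (%s) ---\n%s" % (ident, name, stack))


if __name__ == "__main__":
    report()
```

The stacks should show which module started each Python-level thread, which
is most of what I was hoping to get out of the gdb backtraces.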

Thanks a bunch.