osdir.com

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

subprocess.Popen(['/sbin/ldconfig', '-p'], stdin=PIPE) itself hangs/deadlocks (Linux)


I ran into an interesting problem where calling
`subprocess.Popen(['/sbin/ldconfig', '-p'], stdin=PIPE)` hangs and
never returns.

$ python
Python 2.7.9 (default, Apr 23 2015, 22:07:47)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-11)] on linux2
Type "help", "copyright", "credits" or "license" for more information.

>>> import subprocess
>>> p = subprocess.Popen(['/sbin/ldconfig', '-p'], stdout=subprocess.PIPE)
^CTraceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/Python/Python-2.7.9/lib/python2.7/subprocess.py", line
710, in __init__
    errread, errwrite)
  File "/opt/Python/Python-2.7.9/lib/python2.7/subprocess.py", line
1316, in _execute_child
    data = _eintr_retry_call(os.read, errpipe_read, 1048576)
  File "/opt/Python/Python-2.7.9/lib/python2.7/subprocess.py", line
476, in _eintr_retry_call
    return func(*args)
KeyboardInterrupt
>>>

Note how I have to send a user interrupt to break out of `subprocess.Popen()`.


TROUBLESHOOTING:

First, it's interesting to note that the following works:

>>> import subprocess
>>> p = subprocess.Popen(['/sbin/ldconfig -p'], stdout=subprocess.PIPE, shell=True)
>>> out,err = p.communicate()
>>> len(out)
102460
>>>

which I believe is the same as:

>>> import subprocess
>>> p = subprocess.Popen(['sh', '-c', '/sbin/ldconfig -p'], stdout=subprocess.PIPE)
>>> out,err = p.communicate()
>>> len(out)
102460
>>>

which also works.



Second, calling:

>>> import subprocess
>>> p = subprocess.Popen(['/sbin/ldconfig', '-p'])
1562 libs found in cache `/etc/ld.so.cache'
        libzmq.so.1 (libc6,x86-64) => /usr/lib64/libzmq.so.1
        libz.so.1 (libc6,x86-64) => /lib64/libz.so.1
        [ ... all 102,460 bytes of ldconfig -p output ...]
        ld-linux-x86-64.so.2 (libc6,x86-64) => /lib64/ld-linux-x86-64.so.2
>>>

also works, so the PIPE is my main suspect.


Finally, if I do:

>>> import subprocess
>>> p = subprocess.Popen(['/sbin/ldconfig', '-p'], stdout=subprocess.PIPE)

   [ manually pkill -INT ldconfig' ]

>>> out,err = p.communicate()
>>> len(out)
65536
>>>

then I notice that it reads exactly 65,536=2^16 bytes (out of 102,460
bytes).  I suspect this is related to the default buffer-size limit of
pipes set by the Linux kernel.  Using `strace` on the latter Python
process, reveals:

[...]
open("/opt/Python/Python-2.7.9/lib/python2.7/lib-dynload/cStringIO.so",
O_RDONLY) = 6
read(6, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0@\32\0\0\0\0\0\0"...,
832) = 832
fstat(6, {st_mode=S_IFREG|0755, st_size=49556, ...}) = 0
mmap(NULL, 2115000, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 6,
0) = 0x2ad3ca6e7000
mprotect(0x2ad3ca6eb000, 2093056, PROT_NONE) = 0
mmap(0x2ad3ca8ea000, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 6, 0x3000) = 0x2ad3ca8ea000
close(6)                                = 0
close(5)                                = 0
close(4)                                = 0
getrlimit(RLIMIT_NOFILE, {rlim_cur=64*1024, rlim_max=64*1024}) = 0
close(3)                                = 0
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
0) = 0x2ad3ca8ec000
write(1, "1\n", 21
)                      = 2
pipe([3, 4])                            = 0
fcntl(3, F_GETFD)                       = 0
fcntl(3, F_SETFD, FD_CLOEXEC)           = 0
fcntl(4, F_GETFD)                       = 0
fcntl(4, F_SETFD, FD_CLOEXEC)           = 0
pipe([5, 6])                            = 0
fcntl(5, F_GETFD)                       = 0
fcntl(5, F_SETFD, FD_CLOEXEC)           = 0
fcntl(6, F_GETFD)                       = 0
fcntl(6, F_SETFD, FD_CLOEXEC)           = 0
clone(child_stack=0,
flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
child_tidptr=0x2ad3c972adf0) = 239074
close(6)                                = 0
mmap(NULL, 1052672, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS,
-1, 0) = 0x2ad3ca8ed000
read(5,

and `strace` on the stalled `ldconfig` process reveals:

  $ strace -p $(pgrep ldconfig)
Process 239074 attached - interrupt to quit
write(1, "ibgconfmm-2.6.so.1 (libc6,x86-64"..., 4096
 RH  6.6 0:-  1:-* 2:--

That latter `write()` contains the bytes after position 65,536, i.e.
bytes 65,537 and beyond (not shown, but verified after careful
inspection).


MY CONCLUSION:

To me, this looks like a deadlock in Popen() itself - is that correct?


SESSION INFORMATION:

All of the above is with Python 2.7.9 (installed from EPEL), but I can
also reproduce it with Python 2.7.15 installed from source.

What is also useful to know, is that I'm observing this on a legacy
RHEL 6 system *with a customized kernel* part of the Scyld ClusterWare
(https://www.penguincomputing.com/products/software/scyld-clusterware/)
that *cannot* be updated:

$ uname -a
Linux n6 2.6.32-504.12.2.el6.664g0000.x86_64 #1 SMP Wed Mar 11
14:20:51 EDT 2015 x86_64 x86_64 x86_64 GNU/Linux


I appreciate any suggestions to further troubleshoot this and ideally
resolve it.  The reason for this being an important issue is that
`find_library()` of ctypes.util performs the above stalling
`Popen(['/sbin/ldconfig', '-p'])` call that was introduced in Python
(>= 2.7.13).  This happens for instance whenever we try to create a
new virtual environment using `virtualenv`.  In other words, the
solution is *not* really to change the code to use, say, the
shell=True approach.

Thanks,

Henrik

PS. This is my first post to this list - please let me know if I
should send to another forum instead.