
ZEO communication deadlock?: msg#00059


Subject: ZEO communication deadlock?

We occasionally see the following behaviour (Zope 2.7 + ZEO + psycopg):

Zope suddenly stops processing requests (usually under high load)
and apparently hangs.

An external liveness checker detects the unresponsiveness
and sends a SIGHUP to the Zope process. Zope does not react
to this SIGHUP (we should see a log message when
"Signals.Signals.restartHandler" is activated).

I interpret the failure to react to SIGHUP as meaning that Zope is not
running any Python code at all, as signals are only noticed at Python
bytecode boundaries.
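To illustrate this reasoning, here is a minimal Python 3 sketch (POSIX only) of a SIGHUP handler that merely logs and records which thread ran it. It is a hypothetical stand-in, not Zope's "Signals.Signals.restartHandler". In CPython the low-level C handler only sets a flag; the Python handler function runs later, in the main thread, once that thread gets back to executing bytecode. If the main thread never gets there, the log line never appears.

```python
import logging
import os
import signal
import threading
import time

log = logging.getLogger("liveness-demo")
handled_in = []  # names of the threads that actually ran the handler

def on_sighup(signum, frame):
    # Hypothetical stand-in for a restart handler: it only records and logs.
    # CPython runs this Python function in the main thread, and only when
    # that thread gets to execute bytecode again.
    handled_in.append(threading.current_thread().name)
    log.warning("received signal %d, a restart would start here", signum)

signal.signal(signal.SIGHUP, on_sighup)

# Play the external liveness checker: send SIGHUP to this very process.
os.kill(os.getpid(), signal.SIGHUP)

# Keep executing bytecode briefly so the pending handler gets its chance.
deadline = time.monotonic() + 1.0
while not handled_in and time.monotonic() < deadline:
    time.sleep(0.01)

print(handled_in)
```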

In principle, this could be caused by a C extension calling a blocking
function without releasing the GIL. Indeed, older versions of "psycopg"
behaved this way in their "connect". However, we log all interactions
with Postgres and would see such blocking in those logs.

That leaves the possibility of a deadlock.
The asyncore thread would have to be part of this deadlock,
as it executes Python code whenever a new request arrives (and
in any case after 30 s have elapsed). Medusa itself goes to great
lengths to decouple itself from the worker threads, so that a deadlock
among the worker threads should not affect Medusa itself.
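The kind of decoupling described for Medusa can be sketched with a plain hand-off queue. This is a minimal illustration of the general pattern under our own assumptions, not Medusa's or ZServer's actual code; `requests`, `app_lock`, `accept_requests` and `stuck_worker` are hypothetical names. Because the I/O side only enqueues work and never takes an application lock, a worker that hangs while holding such a lock does not stop it from accepting requests.

```python
import queue
import threading

requests = queue.Queue()     # hypothetical hand-off between I/O and worker threads
app_lock = threading.Lock()  # hypothetical application-level lock

def accept_requests(n):
    # The I/O side only enqueues work; it never touches app_lock.
    for i in range(n):
        requests.put(f"request-{i}")
    return n

def stuck_worker(holding, release):
    # Simulate a worker that hangs while holding app_lock.
    with app_lock:
        holding.set()
        release.wait()

holding, release = threading.Event(), threading.Event()
worker = threading.Thread(target=stuck_worker, args=(holding, release))
worker.start()
holding.wait()                 # the worker now holds app_lock
accepted = accept_requests(3)  # the I/O side still makes progress
release.set()                  # end the simulated hang so the demo exits
worker.join()
print(accepted, requests.qsize())
```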

That leaves the ZEO communication. If I am right, it does not decouple
communication from work, but calls methods directly from the "asyncore"
thread. Some of these calls acquire locks, e.g. the calls that process
invalidations. Could we get a deadlock this way?
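The suspected scenario can be modelled as a wait cycle. In this minimal sketch, which uses hypothetical names rather than ZEO's real classes or locks, a worker thread holds a lock while waiting for a server reply that only the asyncore thread can deliver, while the asyncore thread first has to process an invalidation that needs that same lock. The timeouts exist only so the demo terminates; with plain blocking calls both threads would wait forever, and since one of them is the asyncore thread, no new requests would be served either.

```python
import threading

def run_scenario(lock_timeout=0.5, reply_timeout=1.0):
    """Simulate a worker thread and the asyncore thread waiting on each other.

    All names are hypothetical stand-ins, not ZEO's real objects.
    Requires lock_timeout < reply_timeout. Returns which side made progress.
    """
    cache_lock = threading.Lock()        # lock needed to apply invalidations
    worker_has_lock = threading.Event()  # only used to fix the interleaving
    reply_arrived = threading.Event()    # set when the I/O thread delivers a reply
    result = {}

    def worker():
        # Holds the lock while waiting for a reply only the I/O thread can deliver.
        with cache_lock:
            worker_has_lock.set()
            result["worker_got_reply"] = reply_arrived.wait(timeout=reply_timeout)

    def asyncore_thread():
        worker_has_lock.wait()
        # An invalidation arrives first and processing it needs the lock.
        got_lock = cache_lock.acquire(timeout=lock_timeout)
        result["io_got_lock"] = got_lock
        if got_lock:
            cache_lock.release()
            reply_arrived.set()  # the reply is only read after the invalidation

    threads = [threading.Thread(target=worker),
               threading.Thread(target=asyncore_thread)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result

print(run_scenario())
```

Whether ZEO's client actually contains such a cycle is exactly the open question; the sketch only shows the shape one would look for.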


Maybe a ZEO communication expert can comment on this...


--
Dieter

_______________________________________________
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list - ZODB-Dev@xxxxxxxx
http://mail.zope.org/mailman/listinfo/zodb-dev


