[oslo][oslo-messaging][nova] Stein nova-api AMQP issue running under uWSGI


On Mon, Apr 22, 2019 at 01:21:03PM -0500, Ben Nemec wrote:
> 
> 
> On 4/22/19 12:53 PM, Alex Schultz wrote:
> > On Mon, Apr 22, 2019 at 11:28 AM Ben Nemec <openstack at nemebean.com> wrote:
> > > 
> > > 
> > > 
> > > On 4/20/19 1:38 AM, Michele Baldessari wrote:
> > > > On Fri, Apr 19, 2019 at 03:20:44PM -0700, iain.macdonnell at oracle.com wrote:
> > > > > 
> > > > > Today I discovered that this problem appears to be caused by eventlet
> > > > > monkey-patching. I've created a bug for it:
> > > > > 
> > > > > https://bugs.launchpad.net/nova/+bug/1825584
> > > > 
> > > > Hi,
> > > > 
> > > > just for completeness we see this very same issue also with
> > > > mistral (actually it was the first service where we noticed the missed
> > > > heartbeats). iirc Alex Schultz mentioned seeing it in ironic as well,
> > > > although I have not personally observed it there yet.
> > > 
> > > Is Mistral also mixing eventlet monkeypatching and WSGI?
> > > 
> > 
> > Looks like there is monkey patching, however we noticed it with the
> > engine/executor. So it's likely not just wsgi.  I think I also saw it
> > in the ironic-conductor, though I'd have to try it out again.  I'll
> > spin up an undercloud today and see if I can get a more complete list
> > of affected services. It was pretty easy to reproduce.
> 
> Okay, I asked because if there's no WSGI/Eventlet combination then this may
> be different from the Nova issue that prompted this thread. It sounds like
> that was being caused by a bad interaction between WSGI and some Eventlet
> timers. If there's no WSGI involved then I wouldn't expect that to happen.
> 
> I guess we'll see what further investigation turns up, but based on the
> preliminary information there may be two bugs here.

To get some closure on the error we have seen around the mistral
executor and tripleo with python3: it was caused by the ansible action,
which calls subprocess. The subprocess implementation differs in
python3, so the eventlet monkeypatching needs to be adapted for it.

The review that fixes it for us is here: https://review.opendev.org/#/c/656901/
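
For anyone hitting something similar, here is a rough sketch of the
general idea. It is not the code from the review. It only illustrates
choosing a subprocess implementation that cooperates with eventlet when
the process has been monkey-patched. The helper names
pick_subprocess_module and run_quietly are made up for this example,
and the choice of which patched modules to check is an assumption.

```python
import sys


def pick_subprocess_module():
    """Return a subprocess module suited to the current process.

    Hypothetical helper: prefers eventlet's green subprocess when eventlet
    has monkey-patched the process, otherwise the standard library module.
    """
    try:
        from eventlet import patcher
    except ImportError:
        # eventlet is not installed, so nothing can have been patched.
        import subprocess
        return subprocess

    # Assumption: treat any of these patched modules as a sign that the
    # process is running under eventlet monkey-patching.
    if any(patcher.is_monkey_patched(name) for name in ("os", "select", "thread")):
        # Use the cooperative implementation so that waiting on a child
        # process is meant to yield to other greenthreads.
        from eventlet.green import subprocess as green_subprocess
        return green_subprocess

    import subprocess
    return subprocess


def run_quietly(cmd):
    """Run cmd (a list of arguments) and return its stripped stdout as text."""
    sp = pick_subprocess_module()
    proc = sp.Popen(cmd, stdout=sp.PIPE, stderr=sp.PIPE)
    out, _err = proc.communicate()
    if proc.returncode != 0:
        raise RuntimeError("command failed with exit code %d" % proc.returncode)
    return out.decode("utf-8").strip()


if __name__ == "__main__":
    print(run_quietly([sys.executable, "-c", "print('ok')"]))
```

Without eventlet installed, or without monkey-patching, this falls back
to the plain standard library module, so the same call site works in
both setups.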

Damien and I think the nova_api/eventlet/mod_wsgi issue has a separate
root cause (although we have not spent much time on that one yet).

cheers.
Michele
-- 
Michele Baldessari            <michele at acksyn.org>
C2A5 9DA3 9961 4FFB E01B  D0BC DDD4 DCCB 7515 5C6D