[nova] critical bug around reload/upgrades



On 4/3/19 1:20 PM, Matt Riedemann wrote:
> On 3/28/2019 7:42 PM, Mohammed Naser wrote:
>> Looks like some progress has been made, but we're pretty confident that
>> this is more and more an oslo.service bug:
>>
>> Matt & Dan have both left ideas around this with possible solutions on
>> how to make a change like this backportable.
>>
>> https://review.openstack.org/#/c/641907/
> 
> Another update on this: I was trying to recreate the originally
> reported issue in the nova bug:
> 
> https://bugs.launchpad.net/nova/+bug/1715374
> 
> And I didn't even get to the point of the libvirt driver waiting for the
> network-vif-plugged event because privsep blows up much earlier during
> server create after SIGHUP'ing the service. Details start at comment 34
> in that bug, but the tl;dr is that the privsep-helper child processes are
> gone after the SIGHUP, so anything that relies on privsep (which is, I
> think, now anything using root in the libvirt driver and os-vif utils
> code) won't work until you restart the service.
> 
> I don't yet know if this is a regression in Stein but I'm going to
> create a stable/rocky devstack and try to find out.

With that oslo.service patch [1] in place, I recreated Matt's result as
described above. Then I hacked on oslo.privsep a bit [2] and was able to
resolve the issue (create instances smoothly after SIGHUPping
n-cpu.service). That fix is going to need unit tests, but also more
thread-, socket- and security-savvy eyeballs to make sure it has legs.
But hopefully we can finally put this one to bed.
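For anyone less familiar with the failure mode, here is a small, self-contained Python sketch of the general pattern: a long-lived process depends on a helper child, the child disappears (as the privsep-helper children do across a SIGHUP reload), and the client checks liveness before each use and respawns the helper if it is gone. This is NOT oslo.privsep code and NOT what [2] does; `RespawningHelperClient`, `HELPER_SRC` and the "double the number" helper are hypothetical stand-ins, and a plain stdin/stdout pipe is used in place of privsep's real channel.

```python
import subprocess
import sys

# Hypothetical helper program standing in for a privileged helper daemon
# (NOT oslo.privsep): it reads one integer per line and replies with 2x it.
HELPER_SRC = """
import sys
while True:
    line = sys.stdin.readline()
    if not line:
        break
    sys.stdout.write(str(int(line) * 2) + "\\n")
    sys.stdout.flush()
"""


class RespawningHelperClient:
    """Hypothetical client that respawns its helper child if it has died."""

    def __init__(self):
        self._proc = None
        self.spawn_count = 0

    def _discard(self, proc):
        # Make sure the old helper is really gone and release its pipes.
        proc.kill()
        proc.wait()
        for pipe in (proc.stdin, proc.stdout):
            try:
                pipe.close()
            except OSError:
                pass  # flushing a broken pipe on close can fail; ignore it

    def _ensure_helper(self):
        # poll() returns None while the child runs; anything else means it
        # exited (e.g. it was lost across a reload), so clean it up.
        if self._proc is not None and self._proc.poll() is not None:
            self._discard(self._proc)
            self._proc = None
        if self._proc is None:
            self._proc = subprocess.Popen(
                [sys.executable, "-c", HELPER_SRC],
                stdin=subprocess.PIPE,
                stdout=subprocess.PIPE,
                text=True,
            )
            self.spawn_count += 1
        return self._proc

    def call(self, value):
        # Try once; if the helper dies mid-call, respawn and retry once.
        for _ in range(2):
            proc = self._ensure_helper()
            try:
                proc.stdin.write(f"{value}\n")
                proc.stdin.flush()
                reply = proc.stdout.readline()
            except OSError:  # includes BrokenPipeError
                reply = ""
            if reply:
                return int(reply)
            self._discard(proc)
            self._proc = None
        raise RuntimeError("helper unavailable after respawn")

    def close(self):
        if self._proc is not None:
            self._discard(self._proc)
            self._proc = None
```

With this sketch, `client.call(21)` returns 42, and if the helper process is killed between calls, the next `call()` starts a fresh one instead of failing until a restart. The design choice is lazy liveness checking on each call rather than reacting to the signal itself; note that the retry is only safe for idempotent requests, and whether this approach suits privsep's socket-based channel is exactly the kind of question that needs the extra eyeballs mentioned above.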

efried

[1] https://review.opendev.org/#/c/641907/
[2] https://review.opendev.org/#/c/678323/