[neutron][oslo] CI issue related to pyroute2 and latest oslo.privsep


We discussed this in the Oslo meeting yesterday and the conclusion we 
came to was that we would try the easiest option first. We're going to 
provide a way to specify that certain calls need to run in the main 
thread rather than being scheduled to the thread pool. If this proves 
insufficient to fix the problem we can revisit the more complicated options.
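
A minimal sketch of that option, assuming a daemon whose dispatch loop runs on the main thread. The `run_in_main_thread` decorator and `Dispatcher` class are hypothetical illustrations, not oslo.privsep's actual API: unmarked calls are submitted to a thread pool, while marked calls run inline on the dispatching thread.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor

# Hypothetical marker attribute; not part of oslo.privsep.
_MAIN_THREAD_ATTR = "_run_in_main_thread"


def run_in_main_thread(func):
    """Mark func so the dispatcher runs it inline instead of in the pool."""
    setattr(func, _MAIN_THREAD_ATTR, True)
    return func


class Dispatcher:
    """Toy dispatch loop: thread pool by default, inline for marked calls."""

    def __init__(self, workers=4):
        self._pool = ThreadPoolExecutor(max_workers=workers)

    def dispatch(self, func, *args, **kwargs):
        if getattr(func, _MAIN_THREAD_ATTR, False):
            # Run synchronously on the dispatching thread and wrap the
            # outcome in an already-completed Future for a uniform API.
            fut = Future()
            try:
                result = func(*args, **kwargs)
            except Exception as exc:
                fut.set_exception(exc)
            else:
                fut.set_result(result)
            return fut
        # Default path: schedule the call on a pool worker thread.
        return self._pool.submit(func, *args, **kwargs)

    def shutdown(self):
        self._pool.shutdown(wait=True)


@run_in_main_thread
def forking_call():
    # Stand-in for a privileged call that (indirectly) forks.
    return threading.current_thread() is threading.main_thread()


def pooled_call():
    return threading.current_thread() is threading.main_thread()
```

Pool worker threads still exist while a marked call runs, so a fork() inside it still happens in a multi-threaded process; that is the caveat raised in the quoted message about this option possibly not fixing every problem.
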

On 1/17/19 2:37 PM, Ben Nemec wrote:
> I think it's worth noting that this has actually demonstrated a rather 
> significant issue with threaded privsep, which is that forking from a 
> Python thread is really not a safe thing to do.[1][2]
> 
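A minimal, POSIX-only sketch of one well-known way this goes wrong: a lock held by another thread at fork() time is copied into the child in its locked state, but the thread that would release it does not exist in the child. The function name and the timeout are illustrative; recent Python versions (3.12+) may also emit a DeprecationWarning when fork() is called from a multi-threaded process.

```python
import os
import threading


def child_can_take_lock(timeout=0.5):
    """Fork while another thread holds a lock; report whether the child
    process was able to acquire its copy of that lock."""
    lock = threading.Lock()
    held = threading.Event()
    release = threading.Event()

    def holder():
        with lock:
            held.set()       # tell the main thread the lock is now held
            release.wait()   # keep holding it until told to stop

    t = threading.Thread(target=holder)
    t.start()
    held.wait()

    pid = os.fork()
    if pid == 0:
        # Child: only the forking thread was copied, so nothing will ever
        # release the still-locked copy of `lock`.
        acquired = lock.acquire(timeout=timeout)
        os._exit(0 if acquired else 1)

    # Parent: let the holder thread finish, then collect the child.
    release.set()
    t.join()
    _, status = os.waitpid(pid, 0)
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0
```

In a real program the "lock" is often internal to a library (malloc, logging, an I/O buffer), which is why the fork being hidden inside a dependency makes it hard to spot.
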
> Sure, we could just say "don't fork in privileged code", but in this 
> case the fork wasn't even in our code, it was in a library we were 
> using. There are a few options, none of which I'm crazy about at this 
> point:
> 
> * Provide a way for callers to specify that a call needs to run 
> in-process rather than in the thread pool. Two problems with this: 1) It 
> requires the callers to know that forking is happening and 2) I'm not 
> sure it actually fixes all of the potential problems. You might need to 
> have a completely separate privsep daemon to avoid the potential bad 
> fork/thread interactions.
> 
> * Switch to multiprocessing so calls execute in their own process. I may 
> be wrong, but I think this requires all of the parameters passed in to 
> be picklable, which I bet is not remotely the case right now.
> 
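As a sketch of the pickling constraint: pool-based APIs such as `multiprocessing.Pool` and `concurrent.futures.ProcessPoolExecutor` pickle each call's function and arguments to send them to a worker process. The helpers below are hypothetical and simply probe that with `pickle.dumps`; objects such as locks and lambdas fail.

```python
import pickle
import threading


def is_picklable(obj):
    """Return True if obj can be serialized with pickle.dumps, which
    pool-based multiprocessing APIs need in order to ship arguments."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, TypeError, AttributeError):
        return False
    return True


def unpicklable_args(*args, **kwargs):
    """Hypothetical pre-flight check: list arguments that could not be sent
    to a worker process (reprs for positional, names for keyword args)."""
    bad = [repr(a) for a in args if not is_picklable(a)]
    bad += [name for name, value in kwargs.items() if not is_picklable(value)]
    return bad
```

A check along these lines could flag privileged entrypoints whose arguments would need to change before such a switch.
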
> I'm open to suggestions that are better than playing whack-a-mole with 
> these bugs using a threaded and un-threaded daemon.
> 
> -Ben
> 
> 1: https://rachelbythebay.com/w/2011/06/07/forked/
> 2: https://rachelbythebay.com/w/2014/08/16/forkenv/
> 
> On 1/17/19 2:12 PM, Slawomir Kaplonski wrote:
>> Hi,
>>
>> Recently we had one more issue related to oslo.privsep and pyroute2. 
>> It caused many failures in Neutron CI; see [1] for details. A fix (more 
>> of a workaround) for this issue has now been merged [2]. So if your patch 
>> had failing tempest/scenario jobs, and the failed tests showed problems 
>> with SSH to an instance through its floating IP, please rebase your patch 
>> now. It should be better :)
>>
>> [1] https://bugs.launchpad.net/neutron/+bug/1811515
>> [2] https://review.openstack.org/#/c/631275/
>>
>> --
>> Slawek Kaplonski
>> Senior software engineer
>> Red Hat
>>
>>
>