osdir.com


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[qa][openstackclient] Debugging devstack slowness


On Wed, 2019-08-07 at 08:33 -0500, Ben Nemec wrote:
> 
> On 8/6/19 11:34 AM, Ben Nemec wrote:
> > 
> > 
> > On 8/6/19 10:49 AM, Clark Boylan wrote:
> > > On Tue, Aug 6, 2019, at 8:26 AM, Ben Nemec wrote:
> > > > Just a reminder that there is also
> > > > http://lists.openstack.org/pipermail/openstack-dev/2016-April/092546.html 
> > > > 
> > > > which was intended to address this same issue.
> > > > 
> > > > I toyed around with it a bit for TripleO installs back then and it did
> > > > seem to speed things up, but at the time there was a bug in our client
> > > > plugin where it was triggering a prompt for input that was problematic
> > > > with the server running in the background. I never really got back to it
> > > > once that was fixed. :-/
> > > 
> > > I'm not tied to any particular implementation. Mostly I wanted to show 
> > > that we can take this ~5 minute portion of devstack and turn it into a 
> > > 15 second portion of devstack by improving our use of the service APIs 
> > > (and possibly even further if we apply it to all of the api 
> > > interaction). Any idea how difficult it would be to get your client as 
> > > a service stuff running in devstack again?
> > 
> > I wish I could take credit, but this is actually Dan Berrange's work. :-)
> > 
> > > 
> > > I do not think we should make a one off change like I've done in my 
> > > POC. That will just end up being harder to understand and debug in the 
> > > future since it will be different than all of the other API 
> > > interaction. I like the idea of a manifest or feeding a longer lived 
> > > process api update commands as we can then avoid requesting new tokens 
> > > as well as pkg_resource startup time. Such a system could be used by 
> > > all of devstack as well (avoiding the "this bit is special" problem).
> > > 
> > > Is there any interest from the QA team in committing to an approach 
> > > and working to do a conversion? I don't want to commit any more time 
> > > to this myself unless there is strong interest in getting changes 
> > > merged (as I expect it will be a slow process weeding out places where 
> > > we've made bad assumptions particularly around plugins).
> > > 
> > > One of the things I found was that using names with osc results in 
> > > name to id lookups as well. We can avoid these entirely if we remember 
> > > name to id mappings instead (which my POC does). Any idea if your osc 
> > > as a service tool does or can do that? Probably have to be more 
> > > careful for scoping things in a tool like that as it may be reused by 
> > > people with name collisions across projects/users/groups/domains.
> > 
> > I don't believe this would handle name to id mapping. It's a very thin 
> > wrapper around the regular client code that just makes it persistent so 
> > we don't pay the startup costs every call. On the plus side that means 
> > it basically works like the vanilla client, on the minus side that means 
> > it may not provide as much improvement as a more targeted solution.
> > 
> > IIRC it's pretty easy to use, so I can try it out again and make sure it 
> > still works and still provides a performance benefit.
> 
> It still works and it still helps. Using the osc service cut about 3 
> minutes off my 21 minute devstack run. Subjectively I would say that 
> most of the time was being spent cloning and installing services and 
> their deps.
> 
> I guess the downside is that working around the OSC slowness in CI will 
> reduce developer motivation to fix the problem, which affects all users 
> too. Then again, this has been a problem for years and no one has fixed 
> it, so apparently that isn't a big enough lever to get things moving 
> anyway. :-/
using osc diretly i dont think the slowness is really perceptable from a human
stand point but it adds up in a ci run. there are large problems to kill with gate
slowness then fixing osc will solve be every little helps. i do agree however
that the gage is not a big enough motivater for people to fix osc slowness as
we can wait hours in some cases for jobs to start so 3 minutes is not really a consern
form a latency perspective but if we saved 3 mins on every run that might
in aggreaget reduce the latency problems we have.
> 
> > 
> > > 
> > > > 
> > > > On 7/26/19 6:53 PM, Clark Boylan wrote:
> > > > > Today I have been digging into devstack runtime costs to help Donny 
> > > > > Davis understand why tempest jobs sometimes timeout on the 
> > > > > FortNebula cloud. One thing I discovered was that the keystone user, 
> > > > > group, project, role, and domain setup [0] can take many minutes 
> > > > > [1][2] (in the examples here almost 5).
> > > > > 
> > > > > I've rewritten create_keystone_accounts to be a python tool [3] and 
> > > > > get the runtime for that subset of setup from ~100s to ~9s [4].  I 
> > > > > imagine that if we applied this to the other create_X_accounts 
> > > > > functions we would see similar results.
> > > > > 
> > > > > I think this is so much faster because we avoid repeated costs in 
> > > > > openstack client including: python process startup, pkg_resource 
> > > > > disk scanning to find entrypoints, and needing to convert names to 
> > > > > IDs via the API every time osc is run. Given my change shows this 
> > > > > can be so much quicker is there any interest in modifying devstack 
> > > > > to be faster here? And if so what do we think an appropriate 
> > > > > approach would be?
> > > > > 
> > > > > [0] 
> > > > > 
https://opendev.org/openstack/devstack/src/commit/6aeaceb0c4ef078d028fb6605cac2a37444097d8/stack.sh#L1146-L1161
> > > > >  
> > > > > 
> > > > > [1] 
> > > > > http://logs.openstack.org/05/672805/4/check/tempest-full/14f3211/job-output.txt.gz#_2019-07-26_12_31_04_488228
> > > > >  
> > > > > 
> > > > > [2] 
> > > > > http://logs.openstack.org/05/672805/4/check/tempest-full/14f3211/job-output.txt.gz#_2019-07-26_12_35_53_445059
> > > > >  
> > > > > 
> > > > > [3] https://review.opendev.org/#/c/673108/
> > > > > [4] 
> > > > > 
http://logs.openstack.org/08/673108/6/check/devstack-xenial/a4107d0/job-output.txt.gz#_2019-07-26_23_18_37_211013
> > > > >  
> > > > > 
> > > > > 
> > > > > Note the jobs compared above all ran on rax-dfw.
> > > > > 
> > > > > Clark
> > > > > 
> > > > 
> > > > 
> 
>