osdir.com


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[qa][openstackclient] Debugging devstack slowness


Just for reference FortNebula does 73-80 jobs an hour, so that's 1700(ish)
jobs a day -3 minutes per job.

That is 5200(ish) cycle minutes a day.  Or about 4 days worth of computing
time.

If there can be a fix that saves minutes, its surely worth it.



On Wed, Aug 7, 2019 at 1:18 PM Sean Mooney <smooney at redhat.com> wrote:

> On Wed, 2019-08-07 at 10:11 -0500, Ben Nemec wrote:
> >
> > On 8/7/19 9:37 AM, Sean Mooney wrote:
> > > On Wed, 2019-08-07 at 08:33 -0500, Ben Nemec wrote:
> > > >
> > > > On 8/6/19 11:34 AM, Ben Nemec wrote:
> > > > >
> > > > >
> > > > > On 8/6/19 10:49 AM, Clark Boylan wrote:
> > > > > > On Tue, Aug 6, 2019, at 8:26 AM, Ben Nemec wrote:
> > > > > > > Just a reminder that there is also
> > > > > > >
> http://lists.openstack.org/pipermail/openstack-dev/2016-April/092546.html
> > > > > > >
> > > > > > > which was intended to address this same issue.
> > > > > > >
> > > > > > > I toyed around with it a bit for TripleO installs back then
> and it did
> > > > > > > seem to speed things up, but at the time there was a bug in
> our client
> > > > > > > plugin where it was triggering a prompt for input that was
> problematic
> > > > > > > with the server running in the background. I never really got
> back to it
> > > > > > > once that was fixed. :-/
> > > > > >
> > > > > > I'm not tied to any particular implementation. Mostly I wanted
> to show
> > > > > > that we can take this ~5 minute portion of devstack and turn it
> into a
> > > > > > 15 second portion of devstack by improving our use of the
> service APIs
> > > > > > (and possibly even further if we apply it to all of the api
> > > > > > interaction). Any idea how difficult it would be to get your
> client as
> > > > > > a service stuff running in devstack again?
> > > > >
> > > > > I wish I could take credit, but this is actually Dan Berrange's
> work. :-)
> > > > >
> > > > > >
> > > > > > I do not think we should make a one off change like I've done in
> my
> > > > > > POC. That will just end up being harder to understand and debug
> in the
> > > > > > future since it will be different than all of the other API
> > > > > > interaction. I like the idea of a manifest or feeding a longer
> lived
> > > > > > process api update commands as we can then avoid requesting new
> tokens
> > > > > > as well as pkg_resource startup time. Such a system could be
> used by
> > > > > > all of devstack as well (avoiding the "this bit is special"
> problem).
> > > > > >
> > > > > > Is there any interest from the QA team in committing to an
> approach
> > > > > > and working to do a conversion? I don't want to commit any more
> time
> > > > > > to this myself unless there is strong interest in getting changes
> > > > > > merged (as I expect it will be a slow process weeding out places
> where
> > > > > > we've made bad assumptions particularly around plugins).
> > > > > >
> > > > > > One of the things I found was that using names with osc results
> in
> > > > > > name to id lookups as well. We can avoid these entirely if we
> remember
> > > > > > name to id mappings instead (which my POC does). Any idea if
> your osc
> > > > > > as a service tool does or can do that? Probably have to be more
> > > > > > careful for scoping things in a tool like that as it may be
> reused by
> > > > > > people with name collisions across projects/users/groups/domains.
> > > > >
> > > > > I don't believe this would handle name to id mapping. It's a very
> thin
> > > > > wrapper around the regular client code that just makes it
> persistent so
> > > > > we don't pay the startup costs every call. On the plus side that
> means
> > > > > it basically works like the vanilla client, on the minus side that
> means
> > > > > it may not provide as much improvement as a more targeted solution.
> > > > >
> > > > > IIRC it's pretty easy to use, so I can try it out again and make
> sure it
> > > > > still works and still provides a performance benefit.
> > > >
> > > > It still works and it still helps. Using the osc service cut about 3
> > > > minutes off my 21 minute devstack run. Subjectively I would say that
> > > > most of the time was being spent cloning and installing services and
> > > > their deps.
> > > >
> > > > I guess the downside is that working around the OSC slowness in CI
> will
> > > > reduce developer motivation to fix the problem, which affects all
> users
> > > > too. Then again, this has been a problem for years and no one has
> fixed
> > > > it, so apparently that isn't a big enough lever to get things moving
> > > > anyway. :-/
> > >
> > > using osc diretly i dont think the slowness is really perceptable from
> a human
> > > stand point but it adds up in a ci run. there are large problems to
> kill with gate
> > > slowness then fixing osc will solve be every little helps. i do agree
> however
> > > that the gage is not a big enough motivater for people to fix osc
> slowness as
> > > we can wait hours in some cases for jobs to start so 3 minutes is not
> really a consern
> > > form a latency perspective but if we saved 3 mins on every run that
> might
> > > in aggreaget reduce the latency problems we have.
> >
> > I find the slowness very noticeable in interactive use. It adds
> > something like 2 seconds to a basic call like image list that returns
> > almost instantly in the OSC interactive shell where there is no startup
> > overhead. From my performance days, any latency over 1 second was
> > considered unacceptable for an interactive call. The interactive shell
> > does help with that if I'm doing a bunch of calls in a row though.
> well that was kind of my point when we write sripts we invoke it over and
> over again.
> if i need to use osc to do lots of commands for some reason i generaly
> enter the interactive
> mode. the interactive mode already masks the pain so anytime it has bother
> me in the past
> i have just ended up using it instead.
>
> its been a long time since i looked at this but it think there were two
> reasons it is slow on startup.
> one is the need to get the token for each request and the other was
> related to the way we scan
> for plugins. i honestly dont know if either have imporved but the
> interactive shell elimiates both
> as issues.
> >
> > That said, you're right that 3 minutes multiplied by the number of jobs
> > we run per day is significant. Picking 1000 as a round number (and I'm
> > pretty sure we run a _lot_ more than that per day), a 3 minute decrease
> > in runtime per job would save about 50 hours of CI time in total. Small
> > things add up at scale. :-)
> yep it defintly does.
> >
> > > >
> > > > >
> > > > > >
> > > > > > >
> > > > > > > On 7/26/19 6:53 PM, Clark Boylan wrote:
> > > > > > > > Today I have been digging into devstack runtime costs to
> help Donny
> > > > > > > > Davis understand why tempest jobs sometimes timeout on the
> > > > > > > > FortNebula cloud. One thing I discovered was that the
> keystone user,
> > > > > > > > group, project, role, and domain setup [0] can take many
> minutes
> > > > > > > > [1][2] (in the examples here almost 5).
> > > > > > > >
> > > > > > > > I've rewritten create_keystone_accounts to be a python tool
> [3] and
> > > > > > > > get the runtime for that subset of setup from ~100s to ~9s
> [4].  I
> > > > > > > > imagine that if we applied this to the other
> create_X_accounts
> > > > > > > > functions we would see similar results.
> > > > > > > >
> > > > > > > > I think this is so much faster because we avoid repeated
> costs in
> > > > > > > > openstack client including: python process startup,
> pkg_resource
> > > > > > > > disk scanning to find entrypoints, and needing to convert
> names to
> > > > > > > > IDs via the API every time osc is run. Given my change shows
> this
> > > > > > > > can be so much quicker is there any interest in modifying
> devstack
> > > > > > > > to be faster here? And if so what do we think an appropriate
> > > > > > > > approach would be?
> > > > > > > >
> > > > > > > > [0]
> > > > > > > >
> > >
> > >
> https://opendev.org/openstack/devstack/src/commit/6aeaceb0c4ef078d028fb6605cac2a37444097d8/stack.sh#L1146-L1161
> > > > > > > >
> > > > > > > >
> > > > > > > > [1]
> > > > > > > >
>
> http://logs.openstack.org/05/672805/4/check/tempest-full/14f3211/job-output.txt.gz#_2019-07-26_12_31_04_488228
> > > > > > > >
> > > > > > > >
> > > > > > > > [2]
> > > > > > > >
>
> http://logs.openstack.org/05/672805/4/check/tempest-full/14f3211/job-output.txt.gz#_2019-07-26_12_35_53_445059
> > > > > > > >
> > > > > > > >
> > > > > > > > [3] https://review.opendev.org/#/c/673108/
> > > > > > > > [4]
> > > > > > > >
> > >
> > >
> http://logs.openstack.org/08/673108/6/check/devstack-xenial/a4107d0/job-output.txt.gz#_2019-07-26_23_18_37_211013
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > Note the jobs compared above all ran on rax-dfw.
> > > > > > > >
> > > > > > > > Clark
> > > > > > > >
> > > > > > >
> > > > > > >
> > > >
> > > >
> > >
> > >
> >
> >
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openstack.org/pipermail/openstack-discuss/attachments/20190807/61f082e8/attachment-0001.html>