[qa][openstackclient] Debugging devstack slowness


These jobs seem to time out regularly on every provider [1], but the
issue is surely more apparent with tempest on FN. The result is quite a bit
of lost time: 361 jobs that each run for several hours add up to a little
over 1,000 hours of lost cycles.

[1]
http://logstash.openstack.org/#/dashboard/file/logstash.json?query=filename:%5C%22job-output.txt%5C%22%20AND%20message:%5C%22RUN%20END%20RESULT_TIMED_OUT%5C%22&from=7d

On Thu, Aug 1, 2019 at 5:01 AM Ian Wienand <iwienand at redhat.com> wrote:

> On Fri, Jul 26, 2019 at 04:53:28PM -0700, Clark Boylan wrote:
> > Given my change shows this can be so much quicker is there any
> > interest in modifying devstack to be faster here? And if so what do
> > we think an appropriate approach would be?
>
> My first concern was if anyone considered openstack-client setting
> these things up as actually part of the testing.  I'd say not,
> comments in [1] suggest similar views.
>
> My second concern is that we do keep sufficient track of complexity vs.
> speed; obviously doing things in a sequential manner via a script is
> pretty simple to follow and as we start putting things into scripts we
> make it harder to debug when a monoscript dies and you have to start
> pulling apart where it was.  With just a little json fiddling we can
> currently pull good stats from logstash ([2]) so I think as we go it
> would be good to make sure we account for the time using appropriate
> wrappers, etc.
>
> Then the third concern is not to break anything for plugins --
> devstack has a very very loose API which basically relies on plugin
> authors using a combination of good taste and copying other code to
> decide what's internal or not.
>
> Which made me start thinking: if we look at this closely, might we make
> inroads even without replacing things?
>
> For example [3]; it seems like SERVICE_DOMAIN_NAME is never not
> default, so the get_or_create_domain call is always just overhead (the
> result is never used).
>
> Then it seems that in the gate, basically all of the "get_or_create"
> calls will really just be "create" calls?  Because we're always
> starting fresh.  So we could cut out about half of the calls there
> pre-checking if we know we're under zuul (proof-of-concept [4]).
>
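A minimal sketch of the "skip the lookup when starting fresh" idea, not the change proposed in [4]. Everything here is hypothetical: ASSUME_FRESH is an invented switch (how a Zuul run is detected is left open), and fake_show / fake_create are stand-ins for the real client calls so the sketch runs without a cloud. Skipping the lookup is only safe when nothing exists yet; on a reused environment the create would presumably fail on a duplicate.

```shell
# Stand-ins for the real CLI so the sketch runs without a cloud.
EXISTING=" "     # space-separated names the fake cloud already holds
API_CALLS=0      # number of simulated client round trips

fake_show() {
    API_CALLS=$((API_CALLS + 1))
    case "$EXISTING" in
        *" $1 "*) echo "id-$1" ;;   # found: print its id
        *) return 1 ;;              # not found
    esac
}

fake_create() {
    API_CALLS=$((API_CALLS + 1))
    EXISTING="$EXISTING$1 "
    echo "id-$1"
}

# Hypothetical helper: look up first unless we are told the cloud is fresh.
get_or_create_sketch() {
    if [ "${ASSUME_FRESH:-0}" != "1" ]; then
        fake_show "$1" && return 0
    fi
    fake_create "$1"
}
```

With ASSUME_FRESH unset, creating a new name costs two simulated calls (a failed lookup, then the create); with ASSUME_FRESH=1 it costs one.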
> Then we have blocks like:
>
>   get_or_add_user_project_role $member_role $demo_user $demo_project
>   get_or_add_user_project_role $admin_role $admin_user $demo_project
>   get_or_add_user_project_role $another_role $demo_user $demo_project
>   get_or_add_user_project_role $member_role $demo_user $invis_project
>
> If we wrapped that in something like
>
>  start_osc_session
>  ...
>  end_osc_session
>
> which sets a variable so that, instead of calling the client directly,
> those functions write their arguments to a tmp file.  Then, at the end,
> end_osc_session does
>
>  $ osc "$(< tmpfile)"
>
> and uses the inbuilt batching?  If that halved the number of calls by
> skipping the "get_or" bit, and shared authentication across the batch,
> would that help?
>
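The session wrapper described above could be sketched roughly as follows. This is an illustration under assumptions, not devstack code: run_osc is a hypothetical single entry point that helpers would call instead of invoking the client directly, and osc_exec / osc_batch are stand-ins that only count invocations. How the real client would consume a batch of commands is deliberately left out.

```shell
OSC_SESSION_FILE=""   # non-empty while a session is open
OSC_INVOCATIONS=0     # how many times the (fake) client was started

# Stand-ins for the real client.
osc_exec() {
    OSC_INVOCATIONS=$((OSC_INVOCATIONS + 1))
    echo "run: $*"
}
osc_batch() {
    # One client start handles every queued line read from stdin.
    OSC_INVOCATIONS=$((OSC_INVOCATIONS + 1))
    while IFS= read -r line; do
        echo "batched: $line"
    done
}

# Hypothetical entry point: queue the command during a session, else run it.
run_osc() {
    if [ -n "$OSC_SESSION_FILE" ]; then
        # Note: joining with "$*" loses quoting of arguments containing spaces.
        printf '%s\n' "$*" >> "$OSC_SESSION_FILE"
    else
        osc_exec "$@"
    fi
}

start_osc_session() {
    OSC_SESSION_FILE=$(mktemp)
}

end_osc_session() {
    local batch=$OSC_SESSION_FILE
    OSC_SESSION_FILE=""          # later calls run directly again
    osc_batch < "$batch"
    rm -f "$batch"
}
```

One design caveat: a queued call produces no output until the session ends, so this only suits calls whose result (for example a returned ID) is not consumed by the next line of the script.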
> And then I don't know if all the projects and groups are required for
> every devstack run?  Maybe someone skilled in the art could do a bit
> of an audit and we could cut more of that out too?
>
> So I guess my point is that maybe we could tweak what we have a bit to
> make some immediate wins, before anyone has to rewrite too much?
>
> -i
>
> [1] https://review.opendev.org/673018
> [2] https://ethercalc.openstack.org/rzuhevxz7793
> [3] https://review.opendev.org/673941
> [4] https://review.opendev.org/673936
>
>