[goals][upgrade-checkers] Retrospective

On Thu, 25 Apr 2019 at 23:50, Matt Riedemann <mriedemos at gmail.com> wrote:

> On 4/24/2019 8:21 AM, Mark Goddard wrote:
> > I put together a patch for kolla-ansible with support for upgrade checks
> > for some projects: https://review.opendev.org/644528. It's on the
> > backburner at the moment but I plan to return to it during the Train
> > cycle. Perhaps you could clarify a few things about expected usage.
> Cool. I'd probably try to pick one service (nova?) to start with before
> trying to bite off all of these in a single change (that review is kind
> of daunting).
> Also, as part of the community wide goal I wrote up reference docs in
> the nova tree [1] which might answer your questions with links for more
> details.
> >
> > 1. Should the tool be run using the new code? I would assume so.
> Depends on what you mean by "new code". When nova introduced this in
> Ocata it was meant to be run in a venv or container after upgrading the
> newton schema and data migrations to ocata, but before restarting the
> services with the ocata code and that's how grenade uses it. But the
> checks should also be idempotent and can be run as a
> post-install/upgrade verify step, which is how OSA uses it (and is
> described in the nova install docs [2]).
In kolla terms, I meant: should the nova-status command be executed using
the container image for the current release or for the target release? It
sounds like it's the latter, which also implies we're using the target
version of kolla/kolla-ansible. I hadn't realised that we'd need to perform
the schema upgrade and online migrations first.

> > 2. How would you expect this to be run with multiple projects? I was
> > thinking of adding a new command that performs upgrade checks for all
> > projects that would be read-only, then also performing the check again
> > as part of the upgrade procedure.
> Hmm, good question. This probably depends on each deployment tool and
> how they roll through services to do the upgrade. Obviously you'd want
> to run each project's checks as part of upgrading that service, but I
> guess you're looking for some kind of "should we even start this whole
> damn upgrade if we can detect early that there are going to be issues?".
> If the early run is read-only though - and I'm assuming by read-only you
> mean they won't cause a failure - how are you going to expose that there
> is a problem without failing? Would you make that configurable?
> Otherwise the checks themselves are supposed to be read-only and not
> change your data (they aren't the same thing as an online data migration
> routine for example).
If we need to have run the schema upgrade and migrations before the upgrade
check, I think that reduces the usefulness of a separate check operation. I
was thinking you might be able to run the checks against the system prior
to making any upgrade changes, but it seems not. I guess a separate check
after the upgrade might still be useful for diagnosing upgrade issues from

> > 3. For the warnings, would you recommend a -Werror style argument that
> > optionally flags up warnings as errors? Reporting non-fatal errors is
> > quite difficult in Ansible.
> OSA fails on any return codes that aren't 0 (success) or 1 (warning).
> It's hard to say when warning should be considered an error really. When
> writing these checks I think of warning as a case where you might be OK
> but we don't really know for sure, so it can aid in debugging
> upgrade-related issues after the fact but might not necessarily mean you
> shouldn't upgrade. mnaser has brought up the idea in the past of making
> the output more machine readable so tooling could pick and choose which
> things it considers to be a failure (assuming the return code was 1).
> That's an interesting idea but one I haven't put a lot of thought into.
> It might be as simple as outputting a unique code per check per project,
> sort of like the error code concept in the API guidelines [3] which the
> placement project is using [4].
Machine-readable output would be nice. Perhaps there's something we could
do to generate a report of the combined results across projects.
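
As a rough illustration of the two ideas above (a -Werror style switch and a
combined report), here is a minimal Python sketch. It relies only on the
return codes mentioned in this thread: 0 for success, 1 for warning, and
anything else treated as a failure, as OSA does. The function names, the
report layout, and the "warnings_as_errors" option are hypothetical and are
not part of nova-status or kolla-ansible. The return codes are assumed to
have been collected already by running each project's upgrade check.

```python
import json

# Return codes as described in this thread: 0 is success, 1 is warning.
# Any other value is treated as a failure, mirroring what OSA does.
SUCCESS = 0
WARNING = 1


def classify(return_code, warnings_as_errors=False):
    """Map an upgrade check return code to "ok", "warning" or "failed".

    warnings_as_errors is a hypothetical -Werror style switch.
    """
    if return_code == SUCCESS:
        return "ok"
    if return_code == WARNING:
        return "failed" if warnings_as_errors else "warning"
    return "failed"


def combined_report(return_codes, warnings_as_errors=False):
    """Build one report from a {project: return_code} mapping.

    The layout of the returned dict is an assumption for illustration.
    """
    statuses = {
        project: classify(code, warnings_as_errors)
        for project, code in return_codes.items()
    }
    proceed = all(status != "failed" for status in statuses.values())
    return {"projects": statuses, "proceed": proceed}


if __name__ == "__main__":
    # Example return codes; in practice these would come from each
    # project's upgrade check command.
    codes = {"nova": 0, "cinder": 1}
    print(json.dumps(combined_report(codes), indent=2))
    print(json.dumps(combined_report(codes, warnings_as_errors=True), indent=2))
```

With the switch off, a warning is recorded but does not block the upgrade;
with it on, the same return code marks the project as failed. Keeping the
per-project status in the report means a deployment tool could still show
warnings without having to fail the run.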

> [1] https://docs.openstack.org/nova/latest/reference/upgrade-checks.html
> [2] https://docs.openstack.org/nova/latest/install/verify.html
> [3] https://specs.openstack.org/openstack/api-wg/guidelines/errors.html
> [4] https://opendev.org/openstack/placement/src/branch/master/placement/errors.py
> --
> Thanks,
> Matt