More upgrade issues with PCPUs - input wanted


On Thu, 2019-08-15 at 13:21 +0100, Stephen Finucane wrote:
> tl;dr: Is breaking booting of pinned instances on Stein compute nodes
> in a Train deployment an acceptable thing to do, and if not, how do we
> best handle the VCPU->PCPU migration in Train?
> 
> I've been working through the cpu-resources spec [1] and have run into
> a tricky issue I'd like some input on. In short, this spec means that
> pinned instances (i.e. 'hw:cpu_policy=dedicated') will now start
> consuming a new resource type, PCPU, instead of VCPU. Many things need
> to change to make this happen but the key changes are:
> 
>    1. The scheduler needs to start modifying requests for pinned instances
>       to request PCPU resources instead of VCPU resources
>    2. The libvirt driver needs to start reporting PCPU resources
>    3. The libvirt driver needs to do a reshape, moving all existing
>       allocations of VCPUs to PCPUs, if the instance holding that
>       allocation is pinned
> 
> The first two of these steps each present an issue for which we have a
> solution, but the solutions we've chosen are now resulting in a new
> issue.
> 
>  * For (1), the translation of VCPU to PCPU in the scheduler means
>    compute nodes must now report PCPU in order for a pinned instance to
>    land on that host. Since controllers are upgraded before compute
>    nodes and all compute nodes aren't necessarily upgraded in one go
>    (particularly for edge or other large or multi-cell deployments),
>    this can mean there will be a period of time where there are very
>    few or no hosts available on which to schedule pinned instances.
> 
>  * For (2), we're hampered by the fact that there is no clear way to
>    determine if a host is used for pinned instances or not. Because of
>    this, we can't determine if a host should be reporting PCPU or VCPU
>    inventory.
> 
> The solution we have for the issues with (1) is to add a workaround
> option that would disable this translation, allowing operators time to
> upgrade all their compute nodes to report PCPU resources before
> anything starts using them. For (2), we've decided to temporarily (i.e.
> for one release or until configuration is updated) report both, in the
> expectation that everyone using pinned instances has followed the long-
> standing advice to separate hosts intended for pinned instances from
> those intended for unpinned instances using host aggregates (e.g. even
> if we started reporting PCPUs on a host, nothing would consume that due
> to 'pinned=False' aggregate metadata or similar). These actually
> benefit each other, since if instances are still consuming VCPUs then
> the hosts need to continue reporting VCPUs. However, both interfere
> with our ability to do the reshape.
> 
> Normally, a reshape is a one time thing. The way we'd planned to
> determine if a reshape was necessary was to check if PCPU inventory was
> registered against the host and, if not, whether there were any pinned
> instances on the host. If PCPU inventory was not available and there
> were pinned instances, we would update the allocations for these
> instances so that they would be consuming PCPUs instead of VCPUs and
> then update the inventory. This is problematic though, because our
> solution for the issue with (1) means pinned instances can continue to
> request VCPU resources, which in turn means we could end up with some
> pinned instances on a host consuming PCPU and others consuming VCPU.
> That obviously can't happen, so we need to change tack slightly. The
> two obvious solutions would be to either (a) remove the workaround
> option so the scheduler would immediately start requesting PCPUs and
> just advise operators to upgrade their hosts for pinned instances asap
> or (b) add a different option, defaulting to True, that would apply to
> both the scheduler and compute nodes and prevent not only translation
> of flavors in the scheduler but also the reporting of PCPUs and reshaping
> of allocations until disabled.
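For reference, the reshape check described above boils down to roughly the sketch below. This is only an illustration under assumptions: the function names and the plain-dict allocation format are hypothetical and are not nova's or placement's actual data structures.

```python
def needs_reshape(has_pcpu_inventory, pinned_instances):
    """Reshape only if the host has no PCPU inventory yet but hosts pinned instances."""
    return not has_pcpu_inventory and len(pinned_instances) > 0


def reshape_allocations(allocations, pinned_instances):
    """Return a copy of the allocations with VCPU moved to PCPU for pinned instances.

    `allocations` maps an instance ID to a dict of resource class -> amount
    (a hypothetical, simplified format). Unpinned instances are left untouched.
    """
    reshaped = {}
    for instance_id, resources in allocations.items():
        resources = dict(resources)  # copy so the input is not mutated
        if instance_id in pinned_instances and "VCPU" in resources:
            resources["PCPU"] = resources.pop("VCPU")
        reshaped[instance_id] = resources
    return reshaped
```

The problem with the workaround option is that a pinned instance scheduled after this one-time reshape would still be allocated VCPU, which is exactly the mixed state this check cannot detect.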
> 
> I'm currently leaning towards (a) because it's a *lot* simpler, far
> more robust (IMO) and lets us finish this effort in a single cycle, but
> I imagine this could make upgrades very painful for operators if they
> can't fast track their compute node upgrades. (b) is more complex and
> would have some constraints, chief among them being that the option
> would have to be disabled at some point post-release and would have to
> be disabled on the scheduler first (to prevent the mishmash of VCPU and
> PCPU resource allocations described above). It also means this becomes a three
> cycle effort at minimum, since this new option will default to True in
> Train, before defaulting to False and being deprecated in U and finally
> being removed in V. As such, I'd like some input, particularly from
> operators using pinned instances in larger deployments. What are your
> thoughts, and are there any potential solutions that I'm missing here?

If we go with (b), I would move the config option out of the workarounds section
to the default section, call it pcpus_in_placement, and have it default to
False in Train, i.e. we don't enable the feature in Train by default. We would
update the installer tools to set the config value to True so that new installs use this feature.
In U we would change the default to True and deprecate it, as you said, and finally remove it in V.
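A minimal sketch of how such a flag could gate the scheduler-side translation. pcpus_in_placement is the option name proposed above; the function name and the plain-dict request format are hypothetical and not nova's actual code.

```python
def requested_resources(vcpus, cpu_policy, pcpus_in_placement=False):
    """Return the CPU resource class to request for a flavor.

    `pcpus_in_placement` mirrors the proposed option: False (the suggested
    Train default) keeps pinned instances on VCPU, True requests PCPU instead.
    """
    pinned = cpu_policy == "dedicated"
    if pinned and pcpus_in_placement:
        return {"PCPU": vcpus}
    # Unpinned instances, or pinned instances with the feature disabled
    return {"VCPU": vcpus}
```

With the flag left at False, nothing changes for existing deployments; flipping it to True only affects flavors with hw:cpu_policy=dedicated.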

We should also add a nova-status check for the U upgrade so that operators can set the correct config
values before the upgrade.
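A rough sketch of what such a check could look for, assuming the new option for pinned CPUs is named cpu_dedicated_set as in the cpu-resources spec. The host dict format and the function name are hypothetical, and this does not use the real nova-status check framework.

```python
def hosts_needing_config(hosts):
    """Return names of hosts that run pinned instances but lack cpu_dedicated_set.

    Each host is a dict with hypothetical keys: 'name', 'pinned_instances'
    (a count) and 'cpu_dedicated_set' (a string such as '2-7', or None).
    """
    missing = []
    for host in hosts:
        has_pinned = host.get("pinned_instances", 0) > 0
        configured = bool(host.get("cpu_dedicated_set"))
        if has_pinned and not configured:
            missing.append(host["name"])
    return missing
```

An empty result would mean every host with pinned instances has been configured explicitly, so the upgrade check could pass.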

If we go with (a), then I think we would want to add that check for Train.
Operators would need to add the new config options to all hosts before they upgrade.
This could be problematic in some cases, as the meaning of cpu_shared_set changes between Stein and Train:
in Stein it is used for emulator threads only, while in Train it will be used for the vCPUs of all floating VMs.
(a) would also require you to upgrade all hosts more or less in one go.

For fast-forward upgrades this is required anyway, since we can't have the control plane managing agents
that are older than N-1, but not all tools support FFU or recommend it.


> 
> Cheers,
> Stephen
> 
> [1] https://specs.openstack.org/openstack/nova-specs/specs/train/approved/cpu-resources.html
> 
>