[cyborg] [nova] [placement] Shelve/unshelve behavior
On 8/19/2019 2:28 AM, Nadathur, Sundar wrote:
> Many of them worked as expected: pause/unpause, lock/unlock,
> rescue/unrescue, etc. That is, the application in the VM can
> successfully offload to the accelerator device before and after the
I just wanted to point out that lock/unlock has nothing to do with the
guest and is control-plane only in the compute API.
> But, shelve/shelve-offloaded/unshelve sequence shows two discrepancies:
> * After shelve, the instance is shut off in Libvirt but is shown as
> ACTIVE in "openstack server list".
After a successful shelve/shelve offload, the server status should be
SHELVED or SHELVED_OFFLOADED, not ACTIVE. Did something fail during the
shelve and the instance was left in ACTIVE state rather than ERROR state?
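Just to spell out what I'd expect: the status should walk ACTIVE -> SHELVED -> SHELVED_OFFLOADED and only return to ACTIVE on unshelve. A toy sketch of that expectation (illustrative transitions only, not nova's actual state machine or API):

```python
# Toy sketch of the expected server status transitions around shelve.
# These are illustrative only; the real logic lives in nova's compute API.
SHELVE_TRANSITIONS = {
    ("ACTIVE", "shelve"): "SHELVED",
    ("SHELVED", "shelve_offload"): "SHELVED_OFFLOADED",
    ("SHELVED_OFFLOADED", "unshelve"): "ACTIVE",
}

def apply_action(status, action):
    """Return the next status, or ERROR for a failed/invalid action."""
    return SHELVE_TRANSITIONS.get((status, action), "ERROR")

status = "ACTIVE"
status = apply_action(status, "shelve")          # -> SHELVED
status = apply_action(status, "shelve_offload")  # -> SHELVED_OFFLOADED
print(status)
```

If you're seeing ACTIVE after a supposedly successful shelve, something outside these transitions happened, which is why I'm asking whether the shelve actually failed.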
> * After unshelve, the PCI VF gets re-attached on VM startup and the
> application inside the VM can access the accelerator device. However,
> "openstack resource provider usage show <rp-uuid>" shows the RC usage as
> 0, i.e., there seems to be no claim in Placement for the resource in use.
What is the resource class? Something reported by cyborg on a nested
resource provider under the compute node provider? Note that unshelve
will go through the scheduler to pick a destination host (like the
initial create) and call placement. If you're not persisting information
about the resources to "claim" during scheduling on the RequestSpec,
then that would need to be re-calculated and set on the RequestSpec
prior to calling select_destinations during the unshelve flow in
conductor. gibi's series to add move support for bandwidth-aware QoS
ports needs to do something similar. This patch is for resize/cold
migration but you get the idea:
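To make the failure mode concrete, here is a minimal, self-contained sketch (toy stand-in classes and a hypothetical CUSTOM_ACCELERATOR_FPGA resource class, not nova's real RequestSpec or scheduler interface) of why the resource request has to be persisted or re-calculated on the spec before select_destinations runs during unshelve:

```python
# Toy illustration: if the accelerator resource request is not on the
# request spec, the scheduler claims nothing for it in placement on
# unshelve. These classes are stand-ins, not nova's actual objects.

class RequestSpec:
    def __init__(self):
        # e.g. {"CUSTOM_ACCELERATOR_FPGA": 1} as reported by cyborg
        self.requested_resources = {}

class Scheduler:
    def __init__(self):
        self.claimed = {}  # resource class -> amount claimed in placement

    def select_destinations(self, spec):
        # The scheduler only claims what the spec actually asks for.
        for rc, amount in spec.requested_resources.items():
            self.claimed[rc] = self.claimed.get(rc, 0) + amount
        return "compute-1"

def unshelve(spec, scheduler, recompute):
    if recompute:
        # In nova this re-calculation would happen in conductor before
        # calling the scheduler; the value here is a made-up example.
        spec.requested_resources = {"CUSTOM_ACCELERATOR_FPGA": 1}
    return scheduler.select_destinations(spec)

sched = Scheduler()
unshelve(RequestSpec(), sched, recompute=False)
print(sched.claimed)  # {} -> the symptom: usage shows 0 in placement

sched = Scheduler()
unshelve(RequestSpec(), sched, recompute=True)
print(sched.claimed)  # {'CUSTOM_ACCELERATOR_FPGA': 1}
```

The first call reproduces the symptom you describe: the VF gets attached on the host anyway, but placement never records the claim.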
> After shelve, the instance transitions to "shelve-offloaded"
> automatically after the configured time interval. The resource class
> usage is 0. This part is good. But, after the unshelve, one would think
> the usage would be bumped up automatically.