[cyborg] [nova] [placement] Shelve/unshelve behavior


On 8/19/2019 2:28 AM, Nadathur, Sundar wrote:
> Many of them worked as expected: pause/unpause, lock/unlock, 
> rescue/unrescue, etc. That is, the application in the VM can 
> successfully offload to the accelerator device before and after the 
> sequence.

I just wanted to point out that lock/unlock has nothing to do with the 
guest and is control-plane only in the compute API.

> 
> But, shelve/shelve-offloaded/unshelve sequence shows two discrepancies:
> 
> * After shelve, the instance is shut off in Libvirt but is shown as 
> ACTIVE in "openstack server list".

After a successful shelve/shelve offload, the server status should be 
SHELVED or SHELVED_OFFLOADED, not ACTIVE. Did something fail during the 
shelve, leaving the instance in ACTIVE state rather than ERROR state?

> 
> * After unshelve, the PCI VF gets re-attached on VM startup and the 
> application inside the VM can access the accelerator device. However, 
> "openstack resource provider usage show <rp-uuid>" shows the RC usage as 
> 0, i.e., there seems to be no claim in Placement for the resource in use.

What is the resource class? Something reported by cyborg on a nested 
resource provider under the compute node provider? Note that unshelve 
will go through the scheduler to pick a destination host (like the 
initial create) and call placement. If you're not persisting information 
about the resources to "claim" during scheduling on the RequestSpec, 
then that would need to be re-calculated and set on the RequestSpec 
prior to calling select_destinations during the unshelve flow in 
conductor. gibi's series to add move support for bandwidth-aware QoS 
ports needs to do something similar. This patch is for resize/cold 
migration but you get the idea:

https://review.opendev.org/#/c/655112/
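
To make the point concrete, here is a toy sketch of why the usage stays 
at 0 when the accelerator request is not put back on the request spec 
before scheduling. Nothing here is actual nova or placement code: 
RequestSpec, FakePlacement, select_destinations and unshelve are 
simplified stand-ins, and CUSTOM_ACCELERATOR_FPGA is a made-up resource 
class name used only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class RequestSpec:
    """Hypothetical stand-in for the request spec, not nova's object."""
    flavor_resources: dict
    # Extra resource groups (e.g. for an accelerator); empty if not persisted.
    requested_resources: list = field(default_factory=list)


class FakePlacement:
    """Toy in-memory model of allocations keyed by consumer (server) id."""

    def __init__(self):
        self.allocations = {}

    def claim(self, consumer, resources):
        self.allocations[consumer] = dict(resources)

    def release(self, consumer):
        self.allocations.pop(consumer, None)

    def usage(self, resource_class):
        return sum(a.get(resource_class, 0) for a in self.allocations.values())


def select_destinations(placement, consumer, spec):
    """Claim only what the request spec asks for at scheduling time."""
    wanted = dict(spec.flavor_resources)
    for group in spec.requested_resources:
        for resource_class, amount in group.items():
            wanted[resource_class] = wanted.get(resource_class, 0) + amount
    placement.claim(consumer, wanted)


def unshelve(placement, consumer, spec, accel_request=None):
    """Re-schedule a shelved-offloaded server.

    If accel_request is given, it is re-calculated onto the spec before
    scheduling; otherwise the scheduler never sees the accelerator.
    """
    if accel_request is not None:
        spec.requested_resources = [accel_request]
    select_destinations(placement, consumer, spec)


if __name__ == "__main__":
    placement = FakePlacement()
    spec = RequestSpec(flavor_resources={"VCPU": 2})

    # Unshelve without restoring the accelerator request: nothing is claimed.
    unshelve(placement, "vm-1", spec)
    print(placement.usage("CUSTOM_ACCELERATOR_FPGA"))

    # Shelve offload drops the allocation; unshelve with the request restored.
    placement.release("vm-1")
    unshelve(placement, "vm-1", spec,
             accel_request={"CUSTOM_ACCELERATOR_FPGA": 1})
    print(placement.usage("CUSTOM_ACCELERATOR_FPGA"))
```

In this model the device can still be attached on the compute side while 
the claim is missing, because attachment and the placement claim are 
separate steps. That matches the symptom described above.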

> 
> After shelve, the instance transitions to "shelve-offloaded" 
> automatically after the configured time interval. The resource class 
> usage is 0. This part is good. But, after the unshelve, one would think 
> the usage would be bumped up automatically.
> 

-- 

Thanks,

Matt