[nova] The pros/cons for libvirt persistent assignment and DB persistent assignment.
On 8/23/2019 3:43 AM, Alex Xu wrote:
> 2. Supporting same host cold migration
> This is the point I said below.
> For the same host resize, instance_uuid + flavor_id is working very
> well. But it can't support the same host cold migration. And yes,
> migration allocation can fix it. But also as you said, do we need to
> support the same host cold migration?
I see no reason to try and bend over backward to support same host cold
migration since, as I said, the only virt driver that supports that
today (and has been the only one for a long time - maybe forever?) is
the vmware driver which isn't supporting any of these more advanced
flows (VGPU, VPMEM, PCPU).
> If the answer is no, then we needn't bother it. instance_uuid +
> flavor_id is much simple. If the answer is yes, right, we can put it
> into the RT. But it will be complex, maybe we need a data model like the
> DB way proposal to pass the virt driver/platform specific info between
> RT and virt driver. Also, think about the case, we need to check if
> there is any incomplete live-migration, we need to do a cleanup for all
> free vpmems, since we lost the allocation info for live-migration. Then
> we need a virt dirver interface to trigger that cleanup, pretty sure I
> don't want to call it as driver.cleanup_vpmems(). We also need to change
> the existing driver.spawn method, to pass the assigned resource into
> virt driver. Also thinking about the case of interrupted migration, I
> guess there is no way to switch the
> I also remember Dan said, it isn't good to not support same host cold
Again, the libvirt driver, as far as I know, has never supported same
host cold migration, nor is anyone working on that, so I don't see where
the need to make that support happen now is coming from. I think it
should be ignored for the sake of these conversations.
> I think you are right, we can use RT.tracked_instances and
> RT.tracked_migrations. Then it isn't on the fly anymore.
> There are two existing bugs should be fixed.
> 1. The orphaned instance isn't in RT.tracked_instance. Although there is
> resource consuming for orphaned instance
> the virt driver interface
> https://github.com/openstack/nova/blob/62f6a0a1bc6c4b24621e1c2e927177f99501bef3/nova/compute/resource_tracker.py#L1447Â doesn't
> implement by most of virt driver.
For the latter, get_per_instance_usage, that's only implemented by the
xenapi driver which is on the path to being deprecated by the end of
Train at this point anyway:
so I wouldn't worry too much about that one.
In summary I'm not going to block attempts at proposal #2. As you said,
there are existing bugs which should be handled, though some likely
won't ever be completely fixed (automatic cleanup and recovery from live
migration failures - the live migration methods are huge and have a lot
of points of failure, so properly rolling back from all of those is
going to be a big undertaking in test and review time, and I don't see
either happening at this stage).
I think one of the motivations to keep VPMEM resource tracking isolated
to the hypervisor was just to get something quick and dirty working with
a minimal amount of impact to other parts of nova, like the data model,
ResourceTracker, etc. If proposal #2 also solves issues for VGPUs and
PCPUs then there is more justification for doing it.
Either way I'm not opposed to the #2 proposal so if that's what the
people that are working on this want, go ahead. I personally don't plan
on investing much review time in this series either way though, so
that's kind of why I'm apathetic about this.