Forcing restart of a worker node with running guest


On Thu, Jul 25, 2019 at 3:44 PM Matt Riedemann <mriedemos at gmail.com> wrote:
>
> On 7/25/2019 11:04 AM, Mauricio Tavares wrote:
> > I found out when it was taking 30 min to delete a guest. So, what can I
> > do in a forceful way?
> >
> > 1. How to kill the guest? Can I kill it through virsh, or will the
> > openstack compute service get sad?
>
> I would try to avoid this if possible, but you might need to kill the
> guest in the hypervisor if doing it through nova won't get the job done.
> What happens in nova-compute is undefined, but you'd probably see some
> errors as expected if you're doing anything with that server at the
> hypervisor layer, like trying to get the guest power state.
>
> What nova is tracking and what is in the hypervisor are different
> things, and if you delete the guest out of band from nova, you'll need
> to delete the server to sync the nova database. If the delete is stuck
> in the compute API, thinking it's already deleting (I think we have an
> old bug for that and force delete, and I hit something similar today),
> you could try resetting the server status to ERROR [1] and then try
> deleting it in the API again.
>
> > 2. What would happen if I stop the compute service?
>
> This won't really do anything to the guest in the hypervisor unless [2]
> tries to change the guest state on restart. In my experience that option
> has not been very reliable / predictable.
>
> > 3. What would happen if I get really annoyed and tell the worker node to reboot?
>
> Pretty much the same as #2 from a nova perspective, I think. Depending on
> how libvirt and/or the guest domain is configured, the libvirt-guests
> service might try to resume the guest.
>
      Does that mean it is using the standard libvirt config files?

> [1] openstack server set --state error <server>
> [2] https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.resume_guests_state_on_host_boot
>
      Thanks for the info. It turned out the issue is hardware
related, so shutting the worker node down has moved past the realm of
possibility and into the realm of "it will happen today".

> --
>
> Thanks,
>
> Matt
>
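
For reference, the recovery sequence Matt describes (reset the stuck server
to ERROR as in [1], then retry the delete through the API) could be sketched
as a small shell helper. This is only a sketch: the function name
force_delete_stuck_server and the OPENSTACK override variable are made up
here for illustration and are not part of the openstack client. The two
subcommands are the one given in [1] and the ordinary server delete.

```shell
# Hypothetical helper: not part of the openstack client.
# OPENSTACK can be overridden (e.g. OPENSTACK=echo) to dry-run the sequence.
OPENSTACK="${OPENSTACK:-openstack}"

force_delete_stuck_server() {
    server="${1:-}"
    if [ -z "$server" ]; then
        echo "usage: force_delete_stuck_server <server>" >&2
        return 2
    fi
    # [1] from the thread: reset the status so the API no longer
    # considers the server to be mid-delete.
    $OPENSTACK server set --state error "$server" || return 1
    # Retry the delete through the API so the nova database gets synced.
    $OPENSTACK server delete "$server"
}
```

Running it with OPENSTACK=echo first prints the two commands without
touching the cloud, which seems a sensible check before pointing it at a
real server. Note that resetting the state typically needs admin
credentials.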