
Forcing restart of a worker node with running guest


On Thu, Jul 25, 2019 at 4:24 PM Mauricio Tavares <raubvogel at gmail.com> wrote:
>
> On Thu, Jul 25, 2019 at 3:44 PM Matt Riedemann <mriedemos at gmail.com> wrote:
> >
> > On 7/25/2019 11:04 AM, Mauricio Tavares wrote:
> > > I found out when it was taking 30 min to delete a guest. So, what can I
> > > do in a forceful way?
> > >
> > > 1. How to kill the guest? Can I kill it through virsh, or will the
> > > openstack compute service get sad?
> >
> > I would try to avoid this if possible, but you might need to kill the
> > guest in the hypervisor if doing it through nova won't get the job done.
> > What happens in nova-compute is undefined, but you'd probably see some
> > errors as expected if you're doing anything with that server at the
> > hypervisor layer, like trying to get the guest power state.
> >
> > What nova is tracking and what is in the hypervisor are different
> > things, and if you delete the guest out of band from nova, you'll need
> > to delete the server to sync the nova database. If the delete is stuck
> > in the compute API, thinking it's already deleting (I think we have an
> > old bug for that and force delete, and I hit something similar today),
> > you could try resetting the server status to ERROR [1] and then try
> > deleting it in the API again.
> >
> > > 2. What would happen if I stop the compute service?
> >
> > This won't really do anything to the guest in the hypervisor unless [2]
> > tries to change the guest state on restart. In my experience that option
> > has not been very reliable / predictable.
> >
> > > 3. What would happen if I get really annoyed and tell the worker node to reboot?
> >
> > Pretty much the same as #2 from a nova perspective I think. Depending on
> > how libvirt and/or the guest domain is configured, the libvirt-guests
> > service might try to resume the guest.
> >
>       Does that mean it is using the standard libvirt config files?
>
> > [1] openstack server set --state error <server>
> > [2]
> > https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.resume_guests_state_on_host_boot
> >
>       Thanks for the info. It turned out the issue is hardware
> related, so shutting the worker node down is way past the realm of
> possibility into the realm of it will happen today.
>
      Update: after I dealt with the hardware, I now was able to tell
the instance to go to silicon heaven:

[raub@openstack-hn ~(keystone_admin)]$ openstack server list
+--------------------------------------+----------+--------+--------------------------------------------+--------+-------------+
| ID                                   | Name     | Status | Networks                                   | Image  | Flavor      |
+--------------------------------------+----------+--------+--------------------------------------------+--------+-------------+
| 1f76ca35-9d7f-4403-ae72-bcbfa1cc9b99 | desktop1 | ERROR  | physnet1=10.20.20.66, 192.168.20.66        | centos | netro.small |
+--------------------------------------+----------+--------+--------------------------------------------+--------+-------------+
[raub@openstack-hn ~(keystone_admin)]$ openstack server delete desktop1
[raub@openstack-hn ~(keystone_admin)]$ openstack server list

[raub@openstack-hn ~(keystone_admin)]$

Thank you for providing all the different options to account for the
possible increasing degrees of things going bad! I will save this
message for next time...
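
For next time, the reset-then-delete sequence from footnote [1] could be wrapped in a small shell function. This is only a rough sketch: the function name force_delete_server and the OPENSTACK override variable are made up for illustration and are not part of the openstack client. Only the two openstack commands themselves come from this thread.

```shell
#!/bin/sh
# Sketch: reset a stuck server to ERROR, then retry the delete via the API.
# OPENSTACK is a hypothetical override (not an openstack client feature) so
# the commands can be previewed, e.g. with OPENSTACK=echo.
OPENSTACK="${OPENSTACK:-openstack}"

force_delete_server() {
    server="$1"
    if [ -z "$server" ]; then
        echo "usage: force_delete_server <server>" >&2
        return 1
    fi
    # Footnote [1]: mark the server as ERROR so the compute API stops
    # treating it as already being deleted.
    $OPENSTACK server set --state error "$server" || return 1
    # Then retry the normal delete through the API.
    $OPENSTACK server delete "$server"
}
```

Setting OPENSTACK=echo before calling force_delete_server desktop1 prints the two commands instead of running them, which is a cheap way to check what would be sent. Resetting the state normally needs admin credentials, as in the keystone_admin session above.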

> > --
> >
> > Thanks,
> >
> > Matt
> >