
[queens][nova] nova host-evacuate error


Ignazio,

One instance is stuck in the error state and I am not able to recover it.
All other instances are running now.

root@h004:~$ nova reset-state --all-tenants my-instance-1-2
Reset state for server my-instance-1-2 succeeded; new state is error
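
(From nova help reset-state it looks like the state is set to error by
default; presumably nova reset-state --active my-instance-1-2 would set
it back to active instead, but I have not been able to recover it that
way so far.)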

I have several compute nodes (14).
Live migration is not working: I tried it and it did not throw any
errors, but nothing seems to happen.
I am not completely sure whether we are using shared Cinder storage.
(This setup was deployed by someone else.)
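
(For what it is worth, the status of the attempted migrations can be
inspected with the admin-only command:

    nova migration-list

assuming the client on this release exposes it.)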

~Jay.

On Fri, Jul 12, 2019 at 6:12 AM Ignazio Cassano <ignaziocassano@gmail.com>
wrote:

> Jay, for recovering the VM state use the command nova reset-state....
>
> Run nova help reset-state to check the command's required parameters.
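>
> For example, to mark an instance as active again (instance name assumed):
>
>     nova reset-state --active my-instance-1-2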
>
> As far as evacuation is concerned, how many compute nodes do you have?
> Does instance live migration work?
> Are you using shared Cinder storage?
> Ignazio
>
> On Thu, 11 Jul 2019 at 20:51, Jay See <jayachander.it@gmail.com> wrote:
>
>> Thanks for the explanation, Ignazio.
>>
>> I have tried the same thing by forcing a failure on the compute node
>> (echo 'c' > /proc/sysrq-trigger). The compute node was stuck and I was
>> not able to connect to it.
>> All the VMs are now in Error state.
>>
>> Running host-evacuate from the controller node was successful, but now
>> I am not able to use the VMs, because they are all in error state.
>>
>> root@h004:~$ nova host-evacuate h017
>>
>> +--------------------------------------+-------------------+---------------+
>> | Server UUID                          | Evacuate Accepted | Error Message |
>> +--------------------------------------+-------------------+---------------+
>> | f3545f7d-b85e-49ee-b407-333a4c5b5ab9 | True              |               |
>> | 9094494b-cfa3-459b-8d51-d9aae0ea9636 | True              |               |
>> | abe7075b-ac22-4168-bf3d-d302ba37d80e | True              |               |
>> | c9919371-5f2e-4155-a01a-5f41d9c8b0e7 | True              |               |
>> | ffd983bb-851e-4314-9d1d-375303c278f3 | True              |               |
>> +--------------------------------------+-------------------+---------------+
>>
>> Now I have restarted the compute node manually; I am able to connect to
>> the compute node, but the VMs are still in Error state.
>> 1. Any ideas on how to recover the VMs?
>> 2. Are there any other methods to evacuate, as this method does not seem
>> to work on the Mitaka version?
>>
>> ~Jay.
>>
>> On Thu, Jul 11, 2019 at 1:33 PM Ignazio Cassano <ignaziocassano@gmail.com>
>> wrote:
>>
>>> Ok Jay,
>>> let me describe my environment.
>>> I have an OpenStack deployment made up of 3 controller nodes and
>>> several compute nodes.
>>> The controller node services are managed by pacemaker and the compute
>>> node services by pacemaker remote.
>>> My hardware is Dell, so I am using an IPMI fencing device.
>>> I wrote a service controlled by pacemaker: it checks whether a compute
>>> node has failed and, to avoid split brain, if the compute node does not
>>> respond on either the management network or the storage network,
>>> STONITH powers off the node and a nova host-evacuate is then executed.
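>>>
>>> In outline it behaves like the following sketch (a hypothetical
>>> simplification, not the real script; the node variables and the
>>> fencing setup are assumptions):
>>>
>>>   # runs under pacemaker on a controller; checks one compute node
>>>   if ! ping -c 3 "$NODE_MGMT_IP" >/dev/null 2>&1 && \
>>>      ! ping -c 3 "$NODE_STORAGE_IP" >/dev/null 2>&1; then
>>>       stonith_admin --fence "$NODE_NAME"   # power off via the IPMI fencing device
>>>       nova host-evacuate "$NODE_NAME"      # restart its instances elsewhere
>>>   fi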
>>>
>>> In any case, to run a simulation before writing the service described
>>> above, you can do the following (a worked example follows the steps):
>>>
>>> connect to one compute node where some virtual machines are running;
>>> run the command: echo 'c' > /proc/sysrq-trigger (it immediately stops
>>> the node, as in a real failure);
>>> on a controller node run: nova host-evacuate "name of failed compute
>>> node".
>>> Instances running on the failed compute node should be restarted on
>>> another compute node.
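>>>
>>> For example (host name assumed, as elsewhere in this thread):
>>>
>>>   [compute]    echo 'c' > /proc/sysrq-trigger
>>>   [controller] nova host-evacuate h017
>>>   [controller] nova show <instance-uuid>   # OS-EXT-SRV-ATTR:host shows the new node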
>>>
>>>
>>> Ignazio
>>>
>>> On Thu, 11 Jul 2019 at 11:57, Jay See <jayachander.it@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have tried it on a failed compute node, which is now in a powered-off
>>>> state.
>>>> I have also tried it on a running compute node: no errors, but nothing
>>>> happens.
>>>> On the running compute node I also disabled the compute service and
>>>> tried migration.
>>>>
>>>> Maybe I have not followed the proper steps; I just wanted to know the
>>>> steps you followed. Otherwise, I was planning to try manual migration
>>>> as well, if possible.
>>>> ~Jay.
>>>>
>>>> On Thu, Jul 11, 2019 at 11:52 AM Ignazio Cassano
>>>> <ignaziocassano@gmail.com> wrote:
>>>>
>>>>> Hi Jay,
>>>>> would you like to evacuate a failed compute node or a running
>>>>> compute node?
>>>>>
>>>>> Ignazio
>>>>>
>>>>> On Thu, 11 Jul 2019 at 11:48, Jay See <jayachander.it@gmail.com> wrote:
>>>>>
>>>>>> Hi Ignazio,
>>>>>>
>>>>>> I am trying to evacuate the compute host on an older version (Mitaka).
>>>>>> Could you please share the process you followed? I am not able to
>>>>>> succeed: openstack live-migration fails with an error message (this
>>>>>> is a known issue in older versions), and with nova live-migration
>>>>>> nothing happens even after initiating the VM migration. It has been
>>>>>> almost 4 days.
>>>>>>
>>>>>> ~Jay.
>>>>>>
>>>>>> On Thu, Jul 11, 2019 at 11:31 AM Ignazio Cassano
>>>>>> <ignaziocassano@gmail.com> wrote:
>>>>>>
>>>>>>> I am sorry;
>>>>>>> for simulating a host crash I had used the wrong procedure.
>>>>>>> Using "echo 'c' > /proc/sysrq-trigger", everything works fine.
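>>>>>>>
>>>>>>> (The 'c' sysrq triggers an immediate kernel crash, which is why it
>>>>>>> simulates a sudden node failure well; it requires sysrq to be
>>>>>>> enabled, e.g.:
>>>>>>>
>>>>>>>     echo 1 > /proc/sys/kernel/sysrq
>>>>>>> )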
>>>>>>>
>>>>>>> On Thu, 11 Jul 2019 at 11:01, Ignazio Cassano
>>>>>>> <ignaziocassano@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hello All,
>>>>>>>> on Ocata, when I power off a node with active instances, a nova
>>>>>>>> host-evacuate works fine
>>>>>>>> and the instances are restarted on an active node.
>>>>>>>> On Queens it does not evacuate the instances, but nova-api reports
>>>>>>>> the following for each instance:
>>>>>>>>
>>>>>>>> 2019-07-11 10:19:54.745 13811 INFO nova.api.openstack.wsgi
>>>>>>>> [req-daad0a7d-87ce-41bf-b096-a70fc306db5c 0c7a2d6006614fe2b3e81e47377dd2a9
>>>>>>>> c26f8d35f85547c4add392a221af1aab - default default] HTTP exception thrown:
>>>>>>>> Cannot 'evacuate' instance e8485a5e-3623-4184-bcce-cafd56fa60b3 while it is
>>>>>>>> in task_state powering-off
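>>>>>>>>
>>>>>>>> (The task_state of such an instance can be checked with, for
>>>>>>>> example:
>>>>>>>>
>>>>>>>>     nova show e8485a5e-3623-4184-bcce-cafd56fa60b3 | grep task_state
>>>>>>>>
>>>>>>>> which prints the OS-EXT-STS:task_state field.)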
>>>>>>>>
>>>>>>>> So it powers off all instances on the failed node but does not
>>>>>>>> start them on active nodes.
>>>>>>>>
>>>>>>>> What has changed?
>>>>>>>> Ignazio
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>>
>

-- 
P  *SAVE PAPER - Please do not print this e-mail unless absolutely
necessary.*