[queens][nova] nova host-evacuate error


Yes, cinder is running.

root@h017:~$ service --status-all | grep cinder
[ + ]  cinder-volume

On Fri, Jul 12, 2019 at 11:53 AM Ignazio Cassano <ignaziocassano at gmail.com>
wrote:

> Sorry, the question was "how many compute nodes do you have?",
> not "how many compute nodes do gli have?".
>
> In any case, did you configure cinder?
>
> On Fri, Jul 12, 2019 at 11:26 AM Jay See <jayachander.it at gmail.com>
> wrote:
>
>> Ignazio,
>>
>> One instance is stuck in the error state and I am not able to recover it.
>> All other instances are running now.
>>
>> root@h004:~$ nova reset-state --all-tenants my-instance-1-2
>> Reset state for server my-instance-1-2 succeeded; new state is error
>>
>> I have several compute nodes (14). I am not sure what "gli" is.
>> Live migration is not working: I have tried it and it did not throw any
>> errors, but nothing seems to happen.
>> I am not completely sure; I haven't heard about gli before. (This setup
>> was deployed by someone else.)
>>
>> ~Jay.
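
The "new state is error" result above is what nova reset-state does by default; the client's --active flag requests the active state instead. A minimal sketch, assuming admin credentials are already loaded and reusing the server name from the message above. The run helper, the DRY_RUN variable, and the function name are illustrative and not part of nova.

```shell
# DRY_RUN=1 (the default in this sketch) only prints the command;
# set DRY_RUN=0 on a node with admin credentials to really run it.
DRY_RUN="${DRY_RUN:-1}"

# run CMD...: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# reset_to_active SERVER: ask nova to record the server as active.
# Without --active, reset-state sets the state to error.
reset_to_active() {
    run nova reset-state --all-tenants --active "$1"
}

# Example: reset_to_active my-instance-1-2
```

This only changes the state that nova records for the server. It does not start the instance or move it to another host, so the instance still has to be checked afterwards.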
>>
>> On Fri, Jul 12, 2019 at 6:12 AM Ignazio Cassano <ignaziocassano at gmail.com>
>> wrote:
>>
>>> Jay, to recover the VM state use the command nova reset-state.
>>>
>>> Run nova help reset-state to check which parameters the command requires.
>>>
>>> As far as evacuation is concerned, how many compute nodes do gli have?
>>> Does instance live migration work?
>>> Are gli using shared cinder storage?
>>> Ignazio
>>>
>>> On Thu, Jul 11, 2019 at 20:51 Jay See <jayachander.it at gmail.com> wrote:
>>>
>>>> Thanks for the explanation, Ignazio.
>>>>
>>>> I tried the same thing by forcing the compute node to fail
>>>> (echo 'c' > /proc/sysrq-trigger). The compute node hung and I was not
>>>> able to connect to it.
>>>> All the VMs are now in the Error state.
>>>>
>>>> Running host-evacuate on the controller node was successful, but now I
>>>> am not able to use the VMs because they are all in the error state.
>>>>
>>>> root@h004:~$ nova host-evacuate h017
>>>>
>>>> +--------------------------------------+-------------------+---------------+
>>>> | Server UUID                          | Evacuate Accepted | Error Message |
>>>> +--------------------------------------+-------------------+---------------+
>>>> | f3545f7d-b85e-49ee-b407-333a4c5b5ab9 | True              |               |
>>>> | 9094494b-cfa3-459b-8d51-d9aae0ea9636 | True              |               |
>>>> | abe7075b-ac22-4168-bf3d-d302ba37d80e | True              |               |
>>>> | c9919371-5f2e-4155-a01a-5f41d9c8b0e7 | True              |               |
>>>> | ffd983bb-851e-4314-9d1d-375303c278f3 | True              |               |
>>>> +--------------------------------------+-------------------+---------------+
>>>>
>>>> I have now restarted the compute node manually and I am able to
>>>> connect to it, but the VMs are still in the Error state.
>>>> 1. Any ideas on how to recover the VMs?
>>>> 2. Are there any other methods to evacuate, as this method does not
>>>> seem to work in the mitaka version?
>>>>
>>>> ~Jay.
>>>>
>>>> On Thu, Jul 11, 2019 at 1:33 PM Ignazio Cassano <
>>>> ignaziocassano at gmail.com> wrote:
>>>>
>>>>> Ok Jay,
>>>>> let me describe my environment.
>>>>> I have an openstack deployment made up of 3 controller nodes and several
>>>>> compute nodes.
>>>>> The controller node services are controlled by pacemaker and the
>>>>> compute node services are controlled by pacemaker remote.
>>>>> My hardware is Dell, so I am using an ipmi fencing device.
>>>>> I wrote a service controlled by pacemaker:
>>>>> this service checks whether a compute node has failed. To avoid split
>>>>> brain, if a compute node does not respond on the management network and on
>>>>> the storage network, stonith powers off the node and the service then
>>>>> executes a nova host-evacuate.
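
The decision logic described above (act only when the node is silent on both networks, fence first, evacuate afterwards) might be sketched as follows. This is not the service from the thread: the function names, the ping-based probe, and the use of pcs stonith fence with --off are all assumptions made for illustration, and the ping options are those of Linux iputils.

```shell
# DRY_RUN=1 (the default in this sketch) only prints the commands.
DRY_RUN="${DRY_RUN:-1}"

# run CMD...: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# reachable ADDRESS: succeed if a single ping is answered within 2 seconds.
reachable() {
    ping -c 1 -W 2 "$1" >/dev/null 2>&1
}

# check_and_evacuate NODE MGMT_ADDR STORAGE_ADDR
# Do nothing while the node still answers on either network, which is the
# guard against split brain. Otherwise power the node off through stonith
# and evacuate only if fencing succeeded.
check_and_evacuate() {
    if reachable "$2" || reachable "$3"; then
        echo "$1 still answers on at least one network; doing nothing"
        return 0
    fi
    run pcs stonith fence "$1" --off || return 1
    run nova host-evacuate "$1"
}
```

Fencing before evacuation matters because an unreachable node may still be running its instances and writing to shared volumes; powering it off first ensures the instances are not started twice.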
>>>>>
>>>>> In any case, to simulate this before writing a service like the one I
>>>>> described above, you can do as follows:
>>>>>
>>>>> 1. Connect to a compute node where some virtual machines are running.
>>>>> 2. Run the command: echo 'c' > /proc/sysrq-trigger (it stops the node
>>>>>    immediately, as in a real failure).
>>>>> 3. On a controller node run: nova host-evacuate "name of failed compute
>>>>>    node"
>>>>>
>>>>> Instances running on the failed compute node should be restarted on
>>>>> another compute node.
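
The controller-side part of this procedure could be wrapped roughly as shown here. It is a sketch under assumptions: the function name, the run helper, and the DRY_RUN variable are illustrative, and the host name in the example is the one used elsewhere in this thread. The crash itself (echo 'c' > /proc/sysrq-trigger, as root on the compute node) is deliberately left out of the script.

```shell
# DRY_RUN=1 (the default in this sketch) only prints the commands.
DRY_RUN="${DRY_RUN:-1}"

# run CMD...: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# evacuate_failed_host HOST: run on a controller after the node has crashed.
evacuate_failed_host() {
    if [ -z "$1" ]; then
        echo "usage: evacuate_failed_host HOST" >&2
        return 2
    fi
    # Confirm that nova already reports nova-compute on the host as down.
    run nova service-list --host "$1" --binary nova-compute
    # Request evacuation of every server on the failed host.
    run nova host-evacuate "$1"
    # List what nova still places on the host, across all tenants.
    run nova list --all-tenants --host "$1"
}

# Example: evacuate_failed_host h017
```

An "Evacuate Accepted" value of True only means the request was accepted, so the final listing is there to check where the instances actually ended up and in which state.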
>>>>>
>>>>>
>>>>> Ignazio
>>>>>
>>>>> On Thu, Jul 11, 2019 at 11:57 AM Jay See <
>>>>> jayachander.it at gmail.com> wrote:
>>>>>
>>>>>> Hi ,
>>>>>>
>>>>>> I have tried it on a failed compute node, which is now powered off.
>>>>>> I have also tried it on a running compute node: no errors, but
>>>>>> nothing happens.
>>>>>> On the running compute node I also disabled the compute service and
>>>>>> tried migration.
>>>>>>
>>>>>> Maybe I have not followed the proper steps. I just wanted to know
>>>>>> the steps you followed. Otherwise, I was planning to do a manual
>>>>>> migration, if possible.
>>>>>> ~Jay.
>>>>>>
>>>>>> On Thu, Jul 11, 2019 at 11:52 AM Ignazio Cassano <
>>>>>> ignaziocassano at gmail.com> wrote:
>>>>>>
>>>>>>> Hi Jay,
>>>>>>> would you like to evacuate a failed compute node or a
>>>>>>> running compute node?
>>>>>>>
>>>>>>> Ignazio
>>>>>>>
>>>>>>> On Thu, Jul 11, 2019 at 11:48 AM Jay See <
>>>>>>> jayachander.it at gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Ignazio,
>>>>>>>>
>>>>>>>> I am trying to evacuate a compute host on an older version (mitaka).
>>>>>>>> Could you please share the process you followed? I have not been able
>>>>>>>> to succeed: openstack live-migration fails with an error message (this is
>>>>>>>> a known issue in older versions), and with nova live-migration nothing
>>>>>>>> happens even after initiating the VM migration. It has been almost 4 days.
>>>>>>>>
>>>>>>>> ~Jay.
>>>>>>>>
>>>>>>>> On Thu, Jul 11, 2019 at 11:31 AM Ignazio Cassano <
>>>>>>>> ignaziocassano at gmail.com> wrote:
>>>>>>>>
>>>>>>>>> I am sorry.
>>>>>>>>> I used the wrong procedure to simulate a host crash.
>>>>>>>>> Using "echo 'c' > /proc/sysrq-trigger", everything works fine.
>>>>>>>>>
>>>>>>>>> On Thu, Jul 11, 2019 at 11:01 AM Ignazio Cassano <
>>>>>>>>> ignaziocassano at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hello All,
>>>>>>>>>> on ocata, when I power off a node with active instances, running
>>>>>>>>>> nova host-evacuate works fine
>>>>>>>>>> and the instances are restarted on an active node.
>>>>>>>>>> On queens it does not evacuate the instances; instead nova-api reports
>>>>>>>>>> the following for each instance:
>>>>>>>>>>
>>>>>>>>>> 2019-07-11 10:19:54.745 13811 INFO nova.api.openstack.wsgi
>>>>>>>>>> [req-daad0a7d-87ce-41bf-b096-a70fc306db5c 0c7a2d6006614fe2b3e81e47377dd2a9
>>>>>>>>>> c26f8d35f85547c4add392a221af1aab - default default] HTTP exception thrown:
>>>>>>>>>> Cannot 'evacuate' instance e8485a5e-3623-4184-bcce-cafd56fa60b3 while it is
>>>>>>>>>> in task_state powering-off
>>>>>>>>>>
>>>>>>>>>> So it powers off all instances on the failed node but does not start
>>>>>>>>>> them on active nodes.
>>>>>>>>>>
>>>>>>>>>> What has changed?
>>>>>>>>>> Ignazio
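
The nova-api message quoted above says the instance was still in task_state powering-off when the evacuate request arrived; the follow-up in this thread attributes that to the way the crash was simulated. To see which instances are affected, the UUID and task state can be pulled out of such log lines. A minimal sketch, assuming each log record sits on a single line in the real log file (it is only wrapped here by the mail client); the function name is hypothetical.

```shell
# parse_evacuate_conflict: read nova-api log text on stdin and print
# "<instance-uuid> <task_state>" for every rejected evacuate request.
parse_evacuate_conflict() {
    sed -n "s/.*Cannot 'evacuate' instance \([0-9a-f-]\{36\}\) while it is in task_state \([a-z_-]*\).*/\1 \2/p"
}

# Example (the log path is an assumption and depends on the deployment):
#   parse_evacuate_conflict < /var/log/nova/nova-api.log | sort -u
```

Lines that do not contain the message are skipped, so the whole log file can be fed through the function unfiltered.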
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> *SAVE PAPER - Please do not print this e-mail unless absolutely
>>>>>>>> necessary.*
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>>
>
