
Re: KVM HA BUG 4.11.1.0 centos 7


Hi,

I've added more hosts and enabled HA on all of them. Then I shut down
node cs-hv-06, which was running r-199. Here
are the logs I'm getting:

--
2018-08-13 12:12:51,402 DEBUG [c.c.h.HighAvailabilityManagerImpl]
(pool-5-thread-1:null) (logid:b71a09c7) Notifying HA Mgr of to restart
vm 199-r-199-VM
2018-08-13 12:12:51,410 INFO  [c.c.h.HighAvailabilityManagerImpl]
(pool-5-thread-1:null) (logid:b71a09c7) Schedule vm for HA: 
VM[DomainRouter|r-199-VM]
2018-08-13 12:12:51,418 INFO  [c.c.h.HighAvailabilityManagerImpl]
(HA-Worker-3:ctx-bd6bd4a1 work-48) (logid:e99fa8f3) Processing work
HAWork[48-HA-199-Running-Investigating]
2018-08-13 12:12:51,421 INFO  [c.c.h.HighAvailabilityManagerImpl]
(HA-Worker-3:ctx-bd6bd4a1 work-48) (logid:e99fa8f3) HA on
VM[DomainRouter|r-199-VM]
2018-08-13 12:12:51,424 DEBUG [c.c.h.CheckOnAgentInvestigator]
(HA-Worker-3:ctx-bd6bd4a1 work-48) (logid:e99fa8f3) Unable to reach the
agent for VM[DomainRouter|r-199-VM]: Resource [Host:32] is unreachable:
Host 32: Host with specified id is not in the right state: Down
2018-08-13 12:12:51,424 INFO  [c.c.h.HighAvailabilityManagerImpl]
(HA-Worker-3:ctx-bd6bd4a1 work-48) (logid:e99fa8f3) SimpleInvestigator
could not find VM[DomainRouter|r-199-VM]
2018-08-13 12:12:51,424 INFO  [c.c.h.HighAvailabilityManagerImpl]
(HA-Worker-3:ctx-bd6bd4a1 work-48) (logid:e99fa8f3)
XenServerInvestigator could not find VM[DomainRouter|r-199-VM]
2018-08-13 12:12:51,426 INFO  [c.c.h.HighAvailabilityManagerImpl]
(HA-Worker-3:ctx-bd6bd4a1 work-48) (logid:e99fa8f3) KVMInvestigator
found VM[DomainRouter|r-199-VM] to be alive? true
2018-08-13 12:12:51,426 DEBUG [c.c.h.HighAvailabilityManagerImpl]
(HA-Worker-3:ctx-bd6bd4a1 work-48) (logid:e99fa8f3) VM r-199-VM is found
to be alive by KVMInvestigator
2018-08-13 12:12:51,426 INFO  [c.c.h.HighAvailabilityManagerImpl]
(HA-Worker-3:ctx-bd6bd4a1 work-48) (logid:e99fa8f3) Rescheduling work
HAWork[48-HA-199-Running-Investigating] to try again at Mon Aug 13
12:13:52 CEST 2018
2018-08-13 12:14:51,431 INFO  [c.c.h.HighAvailabilityManagerImpl]
(HA-Worker-1:ctx-631d0fad work-48) (logid:85d7428e) Processing work
HAWork[48-HA-199-Running-Investigating]
2018-08-13 12:14:51,433 INFO  [c.c.h.HighAvailabilityManagerImpl]
(HA-Worker-1:ctx-631d0fad work-48) (logid:85d7428e) HA on
VM[DomainRouter|r-199-VM]
2018-08-13 12:14:51,436 DEBUG [c.c.h.CheckOnAgentInvestigator]
(HA-Worker-1:ctx-631d0fad work-48) (logid:85d7428e) Unable to reach the
agent for VM[DomainRouter|r-199-VM]: Resource [Host:32] is unreachable:
Host 32: Host with specified id is not in the right state: Down
2018-08-13 12:14:51,436 INFO  [c.c.h.HighAvailabilityManagerImpl]
(HA-Worker-1:ctx-631d0fad work-48) (logid:85d7428e) SimpleInvestigator
could not find VM[DomainRouter|r-199-VM]
2018-08-13 12:14:51,436 INFO  [c.c.h.HighAvailabilityManagerImpl]
(HA-Worker-1:ctx-631d0fad work-48) (logid:85d7428e)
XenServerInvestigator could not find VM[DomainRouter|r-199-VM]
2018-08-13 12:14:51,438 INFO  [c.c.h.HighAvailabilityManagerImpl]
(HA-Worker-1:ctx-631d0fad work-48) (logid:85d7428e) KVMInvestigator
found VM[DomainRouter|r-199-VM] to be alive? true
2018-08-13 12:14:51,438 DEBUG [c.c.h.HighAvailabilityManagerImpl]
(HA-Worker-1:ctx-631d0fad work-48) (logid:85d7428e) VM r-199-VM is found
to be alive by KVMInvestigator
2018-08-13 12:14:51,438 INFO  [c.c.h.HighAvailabilityManagerImpl]
(HA-Worker-1:ctx-631d0fad work-48) (logid:85d7428e) Rescheduling work
HAWork[48-HA-199-Running-Investigating] to try again at Mon Aug 13
12:15:52 CEST 2018
--

The router is dead, but it is still detected as alive. It also does not
seem to be bug https://issues.apache.org/jira/browse/CLOUDSTACK-3535.
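The log shows the HA worker consulting the investigators in a fixed order and rescheduling as soon as KVMInvestigator answers "alive". A minimal sketch of that loop in Python follows. It assumes that each investigator answers alive, dead or undecided and that the first definitive answer decides; this is inferred from the log order, not taken from the CloudStack source. `Verdict`, `investigate` and the lambdas are hypothetical names for illustration only.

```python
from enum import Enum
from typing import Callable, List, Optional, Tuple

# Hypothetical model of the investigator chain as it appears in the log above.
# It is NOT CloudStack's HighAvailabilityManagerImpl; names mirror the log only.


class Verdict(Enum):
    RESCHEDULE = "reschedule"  # VM reported alive: retry the investigation later
    RESTART = "restart"        # VM reported dead: restart it on another host
    UNDECIDED = "undecided"    # no investigator gave a definitive answer


# Assumption: an investigator returns True (alive), False (dead) or None (cannot tell).
Investigator = Tuple[str, Callable[[str], Optional[bool]]]


def investigate(vm: str, chain: List[Investigator]) -> Tuple[str, Verdict]:
    """Ask each investigator in order; the first definitive answer wins."""
    for name, is_alive in chain:
        answer = is_alive(vm)
        if answer is None:
            continue  # e.g. "could not find VM": fall through to the next one
        return name, Verdict.RESCHEDULE if answer else Verdict.RESTART
    return "none", Verdict.UNDECIDED


# Replaying the answers seen in the log for r-199-VM on the powered-off host.
observed_chain: List[Investigator] = [
    ("CheckOnAgentInvestigator", lambda vm: None),  # agent on Host 32 is Down
    ("SimpleInvestigator", lambda vm: None),        # could not find VM
    ("XenServerInvestigator", lambda vm: None),     # could not find VM
    ("KVMInvestigator", lambda vm: True),           # "found ... to be alive? true"
]

print(investigate("r-199-VM", observed_chain))
```

With the answers replayed from the log, `investigate` returns `("KVMInvestigator", Verdict.RESCHEDULE)`: under this model, a single false "alive" from the last investigator is enough to keep HAWork 48 in the Investigating state on every retry, which matches the repeating entries above.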

Kind regards,
Thomas


On 02.08.2018 16:46, Thomas Heil wrote:
> Hi,
>
> I have a setup with one advanced zone, one cluster and two hosts. The
> hosts are KVM and use a single NFS storage for primary and one for
> secondary.
>
> Everything runs smoothly until I remove power from one host.
>
> In my opinion, CloudStack should then declare the faulty host
> dead, declare the VMs etc. that were running on it dead, and
> start them on the remaining KVM host.
>
> But nothing happens. The VMs remain in state Running, the system VMs
> become 'Agent State: Disconnected', and that's all.
>
> The only way I found to fix this was to set the state of all
> VMs that were running on the faulty node to 'Stopped'.
>
> Can anybody confirm that this is a reproducible problem?
>
> Kind regards,
> Thomas