[Openstack] HA Compute & Instance Evacuation
Take this with a grain of salt because we're using the original version
before the project moved under the Big Tent and I'm not sure how much it's
evolved since then. I assume the basic functions are the same though.
You're correct; Corosync and Pacemaker are used to determine if a compute
node goes down. The masakari-host-monitor process runs on each compute node,
checks the cluster status, and sends a notification to
masakari-controller when a node goes down. The controller process keeps a
list of reserved hosts in its database and calls nova host-evacuate to
move the instances to one of the reserved hosts.
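To make that flow a bit more concrete, here is a rough Python sketch of what
the controller does when it gets a "host down" notification. This is not
Masakari's actual code; the function names, the in-memory dict standing in for
the database, and the host names are all made up for illustration. It only
builds the nova host-evacuate command line rather than running it, and the
--target_host flag is what I'd expect from the nova client, so check it
against your installed version.

```python
from typing import Dict, List, Optional


def pick_reserved_host(reserved: Dict[str, bool], failed_host: str) -> Optional[str]:
    """Return the first reserved host that is not yet in use, or None.

    `reserved` maps host name -> True if the host has already been used
    as an evacuation target (a stand-in for the controller's database).
    """
    for host, in_use in reserved.items():
        if not in_use and host != failed_host:
            return host
    return None


def build_evacuate_command(failed_host: str, target_host: str) -> List[str]:
    """Build (but do not run) the nova host-evacuate command line.

    Assumption: the nova client accepts --target_host; verify locally.
    """
    return ["nova", "host-evacuate", "--target_host", target_host, failed_host]


def handle_host_down(reserved: Dict[str, bool], failed_host: str) -> Optional[List[str]]:
    """Hypothetical handler for a 'host down' notification."""
    target = pick_reserved_host(reserved, failed_host)
    if target is None:
        # No spare capacity left; nothing to evacuate to.
        return None
    # Mark the reserved host as consumed so it is not picked twice.
    reserved[target] = True
    return build_evacuate_command(failed_host, target)


if __name__ == "__main__":
    reserved_hosts = {"reserve-01": False, "reserve-02": False}
    print(handle_host_down(reserved_hosts, "compute-07"))
```

In a real deployment the command would be executed (or the Nova API called
directly) only after the failed node has been fenced.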
In our environment I also configured STONITH and I'd highly recommend it.
With STONITH, Pacemaker sends a shutdown command to the out-of-band
management card of the unreachable node to make sure that it can't come
back and cause a conflict.
There are two other components, masakari-process-monitor and
masakari-instance-monitor. These also run on your compute nodes. The former
watches the nova-compute service and the latter monitors running instances
and restarts them if necessary.
Looking here it seems they've split Masakari into three different repos:
masakari - The controller service and API
masakari-monitors - Compute node monitoring services
python-masakari-client - The CLI tools