
Brooklyn highAvailabilityMode: default as AUTO?

Hi all,

I'd like to change the default value of highAvailabilityMode from DISABLED to AUTO.

Currently, if you start two Brooklyn servers pointing at the same persisted state (a file-system directory or an object-store bucket), they run independently of each other (because HA is 'disabled' by default). However, they both write to that same persisted state, which will lead to surprising behaviour, particularly when a Brooklyn server is next restarted.

Changing to 'AUTO' would give (almost entirely) the same behaviour as we have currently for a single Brooklyn server. In the case of two servers pointing at the same persisted state, the second would come up as 'standby', and would be automatically promoted to 'master' if the first stops or fails.
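
To make the AUTO behaviour concrete, here is a minimal sketch of the role decision described above. It is not Brooklyn's actual code: the class `AutoModeSketch`, its methods and the in-memory `masterNodeId` field are all hypothetical stand-ins for whatever is recorded in the shared persisted state.

```java
import java.util.Optional;

final class AutoModeSketch {
    enum Role { MASTER, STANDBY }

    // Hypothetical stand-in for the master record kept in the shared persisted state.
    private Optional<String> masterNodeId = Optional.empty();

    /** A starting server becomes master only if no master is recorded; otherwise standby. */
    Role start(String nodeId) {
        if (!masterNodeId.isPresent()) {
            masterNodeId = Optional.of(nodeId);
            return Role.MASTER;
        }
        return Role.STANDBY;
    }

    /** Called when the recorded master stops or fails: the named standby is promoted. */
    Role promote(String standbyNodeId) {
        masterNodeId = Optional.of(standbyNodeId);
        return Role.MASTER;
    }
}
```

With a single server, `start` always returns `MASTER`, which is why the single-server case should look much the same as it does today.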

I say "almost entirely":
1. If you run Brooklyn and then kill it (e.g. `kill -9` or turn off the VM), then when you start Brooklyn again it will wait to confirm that the previous server is really dead. By default, it waits until 30 seconds after that server's last heartbeat.
2. The HA status shows all previous runs of the Brooklyn server (it gets a new node-id each time it restarts). This list will get longer and longer if you keep restarting Brooklyn, pointing at the same persisted state, until you clear out terminated instances from the list (via the UI or the REST API).
3. The logging at startup will be quite different (e.g. "Brooklyn initialisation (part two) complete" now means that the server has finished becoming the 'standby'). If anyone has tools/scripts that search/parse these logs, then they may be affected.
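
The wait after an unclean shutdown can be sketched as follows. This is only an illustration of the timing rule (30 seconds after the last heartbeat), not Brooklyn's implementation; the class `HeartbeatCheck`, the constant and both method names are hypothetical.

```java
import java.time.Duration;
import java.time.Instant;

final class HeartbeatCheck {
    // 30 seconds is the default mentioned above; the constant name is hypothetical.
    static final Duration HEARTBEAT_TIMEOUT = Duration.ofSeconds(30);

    /** True once the timeout has elapsed since the previous server's last heartbeat. */
    static boolean previousServerPresumedDead(Instant lastHeartbeat, Instant now) {
        return !now.isBefore(lastHeartbeat.plus(HEARTBEAT_TIMEOUT));
    }

    /** How much longer a restarting server would wait before taking over (never negative). */
    static Duration remainingWait(Instant lastHeartbeat, Instant now) {
        Duration remaining = Duration.between(now, lastHeartbeat.plus(HEARTBEAT_TIMEOUT));
        return remaining.isNegative() ? Duration.ZERO : remaining;
    }
}
```

So if the old server's last heartbeat was 10 seconds before the restart, the new server would wait roughly another 20 seconds; after a clean shutdown, or a restart more than 30 seconds later, there would be no wait from this rule.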


Note the current behaviour contradicts the docs [1], which say: "Brooklyn will automatically run in HA mode if multiple Brooklyn instances are started pointing at the same persistence store."



P.S. Another option would be to try to fail fast when highAvailabilityMode is disabled but there is another Brooklyn using the same persisted state. However, distinguishing that from (1) above is tricky.

[1] https://github.com/apache/brooklyn-docs/blob/master/guide/ops/high-availability/index.md