Re: Brooklyn highAvailabilityMode: default as AUTO?


Aled,

+1 sounds like a sensible plan

Duncan

On Tue, 22 May 2018 at 13:59 Aled Sage <aled.sage@xxxxxxxxx> wrote:

> Hi all,
>
> I'd like to change the default value of highAvailabilityMode from
> DISABLED to AUTO.
>
> Currently, if you start two Brooklyn servers pointing at the same
> persisted state (file-system directory or object store's bucket), then
> they are independent (because HA is 'disabled' by default). However,
> they both write to that same persisted state, which will lead to
> surprising behaviour, particularly when a Brooklyn server is next
> restarted.
>
> Changing to 'AUTO' would (almost entirely) have the same behaviour as we
> have currently for a single Brooklyn server. In the case of two servers
> pointing at the same persisted state, the second would come up as
> 'standby', and would be automatically promoted to 'master' if the first
> stops or fails.
>
> I say "almost entirely":
> 1. If you run Brooklyn and then kill it (e.g. `kill -9` or turn off the
> VM), when you start Brooklyn again it will wait to confirm the previous
> server is really dead. It waits for 30 seconds after the server's last
> heartbeat, by default.
> 2. The HA status shows all previous runs of the Brooklyn server (it gets
> a new node-id each time it restarts). This list will get longer and
> longer if you keep restarting Brooklyn, pointing at the same persisted
> state, until you clear out terminated instances from the list (via the
> UI or the REST API).
> 3. The logging at startup will be quite different (e.g. "Brooklyn
> initialisation (part two) complete" now means that the server has
> finished becoming the 'standby'). If anyone has tools/scripts that
> search/parse these logs, then they may be affected.
>
> ---
>
> Note the current behaviour contradicts the docs [1], which say:
> "Brooklyn will automatically run in HA mode if multiple Brooklyn
> instances are started pointing at the same persistence store."
>
> Thoughts?
>
> Aled
>
> p.s. another option would be to try to fail-fast when
> highAvailabilityMode is disabled but there is another Brooklyn using the
> same persisted state. However, distinguishing that from (1) above is
> tricky.
>
> [1]
>
> https://github.com/apache/brooklyn-docs/blob/master/guide/ops/high-availability/index.md
>
>
>
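For anyone following the thread, here is a rough sketch of the startup behaviour described above: under AUTO, a node becomes 'master' if no master is recorded or if the recorded master's last heartbeat is older than the timeout (30 seconds by default, per point 1), and otherwise comes up as 'standby'. This is an illustration only, not Brooklyn's actual code. The names MasterRecord, choose_role and the "INDEPENDENT" label are hypothetical, and the real election logic may differ.

```python
from dataclasses import dataclass
from typing import Optional

# Default wait after the previous server's last heartbeat, as stated in the thread.
HEARTBEAT_TIMEOUT_SECS = 30.0


@dataclass
class MasterRecord:
    """Hypothetical view of the master entry found in the persisted state."""
    node_id: str
    last_heartbeat: float  # seconds since the epoch


def choose_role(mode: str,
                master: Optional[MasterRecord],
                now: float,
                timeout: float = HEARTBEAT_TIMEOUT_SECS) -> str:
    """Return the role a starting node would take (illustrative only)."""
    if mode == "DISABLED":
        # HA is off: the node ignores any other node using the same state.
        # "INDEPENDENT" is a label made up for this sketch.
        return "INDEPENDENT"
    if mode != "AUTO":
        raise ValueError(f"mode not covered by this sketch: {mode}")
    if master is None:
        # Nobody holds the master role yet.
        return "MASTER"
    if now - master.last_heartbeat > timeout:
        # The previous master has been silent for too long; treat it as dead.
        return "MASTER"
    # A live master exists (or may still be alive): wait as standby.
    return "STANDBY"
```

A node restarted straight after a `kill -9` would see a recent heartbeat from its previous incarnation, so in this sketch it stays 'standby' until the timeout has elapsed, which matches the delay described in point 1.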