
Re: Brooklyn highAvailabilityMode: default as AUTO?


+1 sounds like a change that's needed to match the intended behaviour
anyway, as described in the docs.  We should update the docs as part of
this to include your explanation above, Aled, of the details of the
behaviour.

regards
Geoff

On Tue, 22 May 2018 at 15:27 Thomas Bouron <thomas.bouron@xxxxxxxxxxxxxxxxx>
wrote:

> +1, sounds sensible to me
>
> Best.
>
> On Tue, 22 May 2018 at 14:51 Duncan Grant <duncan.grant@xxxxxxxxxxxx>
> wrote:
>
> > Aled,
> >
> > +1 sounds like a sensible plan
> >
> > Duncan
> >
> > On Tue, 22 May 2018 at 13:59 Aled Sage <aled.sage@xxxxxxxxx> wrote:
> >
> > > Hi all,
> > >
> > > I'd like to change the default value of highAvailabilityMode from
> > > DISABLED to AUTO.
> > >
> > > Currently, if you start two Brooklyn servers pointing at the same
> > > persisted state (file-system directory or object store's bucket), then
> > > they are independent (because HA is 'disabled' by default). However,
> > > they both write to that same persisted state, which will lead to
> > > surprising behaviour, particularly when a Brooklyn server is next
> > > restarted.
> > >
> > > Changing to 'AUTO' would (almost entirely) have the same behaviour as we
> > > have currently for a single Brooklyn server. In the case of two servers
> > > pointing at the same persisted state, the second would come up as
> > > 'standby', and would be automatically promoted to 'master' if the first
> > > stops or fails.
> > >
> > > I say "almost entirely":
> > > 1. If you run Brooklyn and then kill it (e.g. `kill -9` or turn off the
> > > VM), when you start Brooklyn again it will wait to confirm the previous
> > > server is really dead. It waits for 30 seconds after the server's last
> > > heartbeat, by default.
> > > 2. The HA status shows all previous runs of the Brooklyn server (it gets
> > > a new node-id each time it restarts). This list will get longer and
> > > longer if you keep restarting Brooklyn, pointing at the same persisted
> > > state, until you clear out terminated instances from the list (via the
> > > UI or the REST API).
> > > 3. The logging at startup will be quite different (e.g. "Brooklyn
> > > initialisation (part two) complete" now means that the server has
> > > finished becoming the 'standby'). If anyone has tools/scripts that
> > > search/parse these logs, then they may be affected.
> > >
> > > ---
> > >
> > > Note the current behaviour contradicts the docs [1], which say:
> > > "Brooklyn will automatically run in HA mode if multiple Brooklyn
> > > instances are started pointing at the same persistence store."
> > >
> > > Thoughts?
> > >
> > > Aled
> > >
> > > p.s. another option would be to try to fail-fast when
> > > highAvailabilityMode is disabled but there is another Brooklyn using the
> > > same persisted state. However, distinguishing that from (1) above is
> > > tricky.
> > >
> > > [1]
> > > https://github.com/apache/brooklyn-docs/blob/master/guide/ops/high-availability/index.md
> > >
> > >
> > >
> >
> --
>
> Thomas Bouron • Senior Software Engineer @ Cloudsoft Corporation •
> https://cloudsoft.io/
> Github: https://github.com/tbouron
> Twitter: https://twitter.com/eltibouron
>
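The failover rule Aled describes (a second server sharing the persisted state comes up as 'standby' and is promoted once the master stops heartbeating, with a 30-second default window) can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not Brooklyn's implementation: `NodeState`, `PeerRecord`, `resolve_role` and the in-memory list of peer records are hypothetical stand-ins for whatever Brooklyn actually reads from the persisted state.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple

# Hypothetical sketch: these names are illustrative, not Brooklyn's API.
class NodeState(Enum):
    MASTER = "master"
    STANDBY = "standby"

# Default heartbeat timeout mentioned in the thread (30 seconds).
HEARTBEAT_TIMEOUT_SECONDS = 30.0


@dataclass
class PeerRecord:
    """One server's entry as read from the shared persisted state (assumed shape)."""
    node_id: str
    state: NodeState
    last_heartbeat: float  # timestamp in seconds


def resolve_role(peers: List[PeerRecord], now: float,
                 timeout: float = HEARTBEAT_TIMEOUT_SECONDS) -> Tuple[NodeState, float]:
    """Decide how a starting server should come up in AUTO mode.

    Returns the role to take now, plus how many seconds to wait before
    re-checking (0.0 if no re-check is needed).
    """
    # A master counts as possibly alive while its last heartbeat is recent.
    live_masters = [p for p in peers
                    if p.state is NodeState.MASTER
                    and now - p.last_heartbeat < timeout]
    if not live_masters:
        # Nobody else is (or may still be) master: take over immediately.
        return NodeState.MASTER, 0.0
    # A recent heartbeat cannot be told apart from a server killed moments
    # ago, so come up as standby and re-check once that heartbeat would expire.
    wait = max(p.last_heartbeat + timeout - now for p in live_masters)
    return NodeState.STANDBY, wait


if __name__ == "__main__":
    old_master = PeerRecord("node-a", NodeState.MASTER, last_heartbeat=100.0)
    print(resolve_role([old_master], now=105.0))  # standby, re-check in 25 s
    print(resolve_role([old_master], now=130.0))  # heartbeat expired: master
```

With an old master whose last heartbeat was at t=100, a server starting at t=105 comes up as standby and re-checks 25 seconds later; at t=130 the heartbeat has expired and it takes over as master. That wait corresponds to the delay described in point 1 after a `kill -9`, and it is also why a fail-fast check (the p.s.) struggles to tell a live peer from a recently killed one.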