osdir.com
mailing list archive

Subject: Re: Automating failover with WAL - msg#00089

List: db.postgresql.skytools.user

Hi. Yesterday I wrote requesting advice for automated failover to a hot
backup. I am hoping to get some comments on the process described below
as a starting point. I most likely do not have this correct but want
feedback to iron out a strategy for using walmgr.

First, I am assuming a backup and restore must be done prior to running
syncdaemon. Second, boot will need to be run on the slave, and I want to
make sure this is being done properly. Third, on a failover the slave
needs to become the master, so its configuration has to change for that
to occur.

This assumes master and slave monitor each other with a heartbeat. A
monitoring server will be used to provide oversight of the servers
generally and health of syncdaemon, postgres, and heartbeat.
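
For illustration, here is a minimal sketch of the kind of UDP heartbeat
described in the quoted message below: one side echoes datagrams, the
other sends a probe and waits for the echo. It is not the actual client
and server; the function names, the 64-byte buffer, the "ping" payload
and the default timeout are all assumptions made for the example.

```python
import socket
import threading


def serve_heartbeat(sock: socket.socket, stop: threading.Event) -> None:
    """Echo every datagram back to its sender until `stop` is set."""
    sock.settimeout(0.2)  # wake up regularly to check the stop flag
    while not stop.is_set():
        try:
            data, addr = sock.recvfrom(64)
        except socket.timeout:
            continue
        except OSError:
            break  # socket was closed underneath us
        sock.sendto(data, addr)


def ping(host: str, port: int, timeout: float = 1.5) -> bool:
    """Send one heartbeat and report whether the echo came back in time."""
    payload = b"ping"  # hypothetical payload, any short marker would do
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        try:
            s.sendto(payload, (host, port))
            data, _ = s.recvfrom(64)
        except OSError:  # covers timeouts and "port unreachable" errors
            return False
    return data == payload
```

Each server would run serve_heartbeat() on a bound UDP socket and call
ping() against its peer once per interval. A single False result says
little on its own, since UDP gives no delivery guarantee.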

I'd also like some idea on using periodic. Generally, how and where are
you using this? For master/slave replication, are you using it for
backup on the master?

Sorry for the long post, but I realize it is easier to comment on things
that are written down than to just ask for feedback. Many thanks.

Regards,
David


Master setup
------------
1. Initialize and startup db
2. walmgr wal.master.ini setup
3. walmgr wal.master.ini backup
4. walmgr wal.slave.ini restore
5. walmgr wal.master.ini syncdaemon
6. start heartbeat


Slave setup
-----------
1. walmgr wal.slave.ini backup (to run backups from
2. start heartbeat


Heartbeat (1.5 - 2 sec frequency)
---------
1 heartbeat missed - ignore
2 heartbeats missed - WARNING logged
3 heartbeats missed - CRITICAL logged (Initiates failover)


Master Failover
---------------
1. walmgr wal.slave.ini boot (from slave machine)
2. Reload configuration so slave is now master
3. Start up a machine to become the new slave
4. walmgr wal.master.ini backup
5. walmgr wal.slave.ini restore
6. walmgr wal.master.ini syncdaemon
7. start heartbeat
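
As a rough sketch of how the list above could be played back once the
heartbeat declares the master dead: the walmgr command lines are copied
from the list, and everything else (the names, the "#" notes for manual
steps, stopping at the first failure) is an assumption. It does not say
which host each command runs on, and a long-running command such as
syncdaemon may need to be started differently from the one-shot steps.

```python
import shlex
import subprocess
from typing import Callable, List

# Commands taken from the "Master Failover" list. Steps that are not
# walmgr commands are kept as "#" notes and skipped by the runner.
MASTER_FAILOVER = [
    "walmgr wal.slave.ini boot",
    "# reload configuration so the old slave is now master",
    "# start up a machine to become the new slave",
    "walmgr wal.master.ini backup",
    "walmgr wal.slave.ini restore",
    "walmgr wal.master.ini syncdaemon",
    "# start heartbeat",
]


def run_plan(steps: List[str], execute: Callable[[List[str]], int]) -> List[str]:
    """Run each command in order and stop at the first non-zero exit code.

    Returns the commands that completed successfully.
    """
    done = []
    for step in steps:
        if step.startswith("#"):
            continue  # manual step, not a command
        if execute(shlex.split(step)) != 0:
            break  # do not continue past a failed step
        done.append(step)
    return done


def real_execute(argv: List[str]) -> int:
    """Run a command for real and return its exit code."""
    return subprocess.call(argv)
```

Passing a fake execute function that only records its arguments gives a
dry run, which is a cheap way to rehearse the sequence before wiring it
to the heartbeat; run_plan(MASTER_FAILOVER, real_execute) would run it
for real.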


Slave Failover
--------------
1. From master: walmgr wal.master.ini stop
2. Start up new slave machine
3. From master: walmgr wal.master.ini backup
4. From master: walmgr wal.slave.ini restore
5. From master: walmgr wal.master.ini syncdaemon
6. start heartbeat


Conducting backups during a scheduled shutdown
----------------------------------------------
walmgr wal.master.ini backup
walmgr wal.slave.ini restore



David Pratt wrote:
> Hi. I am hoping that someone can help me list steps for basic automated
> failover using WAL. At this point I have a working WAL mgr that I am
> testing with and I have written a heartbeat client and server that I
> have working. I plan on having the slave monitor the server and vice
> versa so actions would be executed from slave if master dies and from
> master if slave dies. I am currently studying the commands but want to
> ensure I get it right.
>
> Since UDP packets can be less reliable I am looking at emitting a
> WARNING in a log file but not initiating the failover until three
> consecutive packets have not returned. I can regulate the interval of
> the heartbeat and considering an interval of 1.5 - 2 sec. I want to get
> the mechanics of failover correct so I can play the actions after the
> heartbeat has confirmed the server is dead. Many thanks.
>
> Regards,
> David

