Re: Replication sometimes hangs: msg#00008
Wanja Schonecke wrote:
Hi again,
I continued testing anything that came to mind ;)
As I wrote before, my Postgres cluster hangs for 2-3 minutes about every
2 hours (more often when there is a lot of traffic on my website).
I run auto-vacuum every 60 seconds... and here is the strange thing: if
I run ANALYZE, there is no hang for about 30 hours.
After that it is the same as before. I am as new to Postgres as to
pgcluster, so is it necessary to run ANALYZE periodically? And if it
is, why isn't it done together with auto-vacuum?
It seems that everything works great if I run ANALYZE every night... could
this be my whole problem?
Thanks in advance,
Wanja
Hi,
The missing "got it" appears not only on PGR_CLOSE_CONNECTION. It appears on any query, for about 20 seconds (during this time there are many hundreds of lines in my debug log), and after that it hangs for 2 minutes. It happens more often under higher load.
My /etc/hosts is the same on both servers:
127.0.0.1 localhost.localdomain localhost
192.168.10.31 db1.suckerprod.de db1
192.168.10.32 db2.suckerprod.de db2
What else do you need to know?
Could it be a problem that neither server has a connection to the internet? They don't see any DNS server etc... These 2 minutes of doing nothing feel like a timeout.
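One way to test the timeout suspicion is to time name lookups on each server. The sketch below is only an illustration in Python, not part of pgcluster: `timed_lookup` and `check_hosts` are hypothetical helpers, and the sketch assumes the hang might come from forward or reverse resolution of the names and addresses in /etc/hosts, which nothing in this thread has confirmed.

```python
import socket
import time


def timed_lookup(resolver, target):
    """Call resolver(target); return (elapsed_seconds, result or None on failure)."""
    start = time.monotonic()
    try:
        result = resolver(target)
    except OSError:
        # socket.gaierror and socket.herror are both subclasses of OSError
        result = None
    return time.monotonic() - start, result


def check_hosts(names, addresses):
    """Time forward lookups for names and reverse lookups for addresses."""
    report = {}
    for name in names:
        report[name] = timed_lookup(socket.gethostbyname, name)
    for address in addresses:
        report[address] = timed_lookup(socket.gethostbyaddr, address)
    return report
```

Running `check_hosts(["db1", "db2"], ["192.168.10.31", "192.168.10.32"])` on both machines would show whether any lookup takes seconds rather than milliseconds.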
Thank you for answering so quickly,
Wanja
Hi,
It seems that there is no response from the cluster DBs when the replication
server sends the connection close notice.
Would you let us know your servers' environment (/etc/hosts, etc.)?
Regards,
-------------
At.Mitani
I'm running pgcluster 1.5 (rc7) in a production environment with the
following setup:
hosts:
db1: clusterdb + replicator
<Cluster_Server_Info>
<Host_Name> db1 </Host_Name>
<Port> 5432 </Port>
<Recovery_Port> 7101 </Recovery_Port>
</Cluster_Server_Info>
<Cluster_Server_Info>
<Host_Name> db2 </Host_Name>
<Port> 5432 </Port>
<Recovery_Port> 7101 </Recovery_Port>
</Cluster_Server_Info>
<Status_Log_File> /srv/pgcluster/pgreplicate.sts </Status_Log_File>
<Error_Log_File> /srv/pgcluster/pgreplicate.log </Error_Log_File>
<Replication_Port> 8001 </Replication_Port>
<Recovery_Port> 8101 </Recovery_Port>
<RLOG_Port> 8301 </RLOG_Port>
<Response_Mode> normal </Response_Mode>
<Use_Replication_Log> yes </Use_Replication_Log>
___________________________________________________________________
<Replicate_Server_Info>
<Host_Name> db1 </Host_Name>
<Port> 8001 </Port>
<Recovery_Port> 8101 </Recovery_Port>
</Replicate_Server_Info>
<Replicate_Server_Info>
<Host_Name> db2 </Host_Name>
<Port> 8001 </Port>
<Recovery_Port> 8101 </Recovery_Port>
</Replicate_Server_Info>
<Recovery_Port> 7101 </Recovery_Port>
<Rsync_Path> /usr/bin/rsync </Rsync_Path>
<Rsync_Option> ssh </Rsync_Option>
<Rsync_Compress> yes </Rsync_Compress>
<When_Stand_Alone> read_write </When_Stand_Alone>
<Status_Log_File> /srv/pgcluster/cluster.sts </Status_Log_File>
<Error_Log_File> /srv/pgcluster/cluster.err </Error_Log_File>
db2: clusterdb + replicator
<Cluster_Server_Info>
<Host_Name> db1 </Host_Name>
<Port> 5432 </Port>
<Recovery_Port> 7101 </Recovery_Port>
</Cluster_Server_Info>
<Cluster_Server_Info>
<Host_Name> db2 </Host_Name>
<Port> 5432 </Port>
<Recovery_Port> 7101 </Recovery_Port>
</Cluster_Server_Info>
<Replicate_Server_Info>
<Host_Name> db1 </Host_Name>
<Port> 8001 </Port>
<Recovery_Port> 8101 </Recovery_Port>
<LifeCheck_Port> 8201 </LifeCheck_Port>
</Replicate_Server_Info>
<Status_Log_File> /srv/pgcluster/pgreplicate.sts </Status_Log_File>
<Error_Log_File> /srv/pgcluster/pgreplicate.log </Error_Log_File>
<Replication_Port> 8001 </Replication_Port>
<Recovery_Port> 8101 </Recovery_Port>
<RLOG_Port> 8301 </RLOG_Port>
<Response_Mode> normal </Response_Mode>
<Use_Replication_Log> yes </Use_Replication_Log>
___________________________________________________________
<Replicate_Server_Info>
<Host_Name> db1 </Host_Name>
<Port> 8001 </Port>
<Recovery_Port> 8101 </Recovery_Port>
</Replicate_Server_Info>
<Replicate_Server_Info>
<Host_Name> db2 </Host_Name>
<Port> 8001 </Port>
<Recovery_Port> 8101 </Recovery_Port>
</Replicate_Server_Info>
<Recovery_Port> 7101 </Recovery_Port>
<Rsync_Path> /usr/bin/rsync </Rsync_Path>
<Rsync_Option> ssh </Rsync_Option>
<Rsync_Compress> yes </Rsync_Compress>
<When_Stand_Alone> read_write </When_Stand_Alone>
<Status_Log_File> /srv/pgcluster/cluster.sts </Status_Log_File>
<Error_Log_File> /srv/pgcluster/cluster.err </Error_Log_File>
I don't use pglb; I use LVS for that. I also tried it without
load balancing, and with persistent and non-persistent connections; it
makes no difference.
About every 30 minutes my application using the database hangs for about 2
minutes.
I ran the replicator in debug mode to find details:
2006-11-02 20:36:14 [18542] DEBUG:query=PGR_CLOSE_CONNECTION
2006-11-02 20:36:14 [18542] DEBUG:sem_lock [1] req
2006-11-02 20:36:14 [18543] DEBUG:cmdSts=O
2006-11-02 20:36:14 [18543] DEBUG:cmdType=x
2006-11-02 20:36:14 [18543] DEBUG:rlog=64
2006-11-02 20:36:14 [18543] DEBUG:port=5432
2006-11-02 20:36:14 [18543] DEBUG:pid=8457
2006-11-02 20:36:14 [18543] DEBUG:from_host=192.168.10.32
2006-11-02 20:36:14 [18543] DEBUG:dbName=suckerprod
2006-11-02 20:36:14 [18543] DEBUG:userName=pgman
2006-11-02 20:36:14 [18543] DEBUG:recieve sec=1162496174
2006-11-02 20:36:14 [18543] DEBUG:recieve usec=461004
2006-11-02 20:36:14 [18543] DEBUG:query_size=21
2006-11-02 20:36:14 [18543] DEBUG:request_id=0
2006-11-02 20:36:14 [18543] DEBUG:replicate_id=0
2006-11-02 20:36:14 [18543] DEBUG:query=PGR_CLOSE_CONNECTION
2006-11-02 20:36:14 [18543] DEBUG:sem_lock [1] req
2006-11-02 20:36:15 [18544] DEBUG:cmdSts=O
2006-11-02 20:36:15 [18544] DEBUG:cmdType=x
2006-11-02 20:36:15 [18544] DEBUG:rlog=64
2006-11-02 20:36:15 [18544] DEBUG:port=5432
2006-11-02 20:36:15 [18544] DEBUG:pid=8460
2006-11-02 20:36:15 [18544] DEBUG:from_host=192.168.10.32
2006-11-02 20:36:15 [18544] DEBUG:dbName=suckerprod
2006-11-02 20:36:15 [18544] DEBUG:userName=pgman
2006-11-02 20:36:15 [18544] DEBUG:recieve sec=1162496175
2006-11-02 20:36:15 [18544] DEBUG:recieve usec=218061
2006-11-02 20:36:15 [18544] DEBUG:query_size=21
2006-11-02 20:36:15 [18544] DEBUG:request_id=0
2006-11-02 20:36:15 [18544] DEBUG:replicate_id=0
2006-11-02 20:36:15 [18544] DEBUG:query=PGR_CLOSE_CONNECTION
2006-11-02 20:36:15 [18544] DEBUG:sem_lock [1] req
2006-11-02 20:36:15 [18545] DEBUG:cmdSts=Q
2006-11-02 20:36:15 [18545] DEBUG:cmdType=U
2006-11-02 20:36:15 [18545] DEBUG:rlog=0
2006-11-02 20:36:15 [18545] DEBUG:port=5432
2006-11-02 20:36:15 [18545] DEBUG:pid=8462
2006-11-02 20:36:15 [18545] DEBUG:from_host=192.168.10.32
2006-11-02 20:36:15 [18545] DEBUG:dbName=suckerprod
2006-11-02 20:36:15 [18545] DEBUG:userName=pgman
2006-11-02 20:36:15 [18545] DEBUG:recieve sec=1162496175
2006-11-02 20:36:15 [18545] DEBUG:recieve usec=550293
2006-11-02 20:36:15 [18545] DEBUG:query_size=217
2006-11-02 20:36:15 [18545] DEBUG:request_id=1
2006-11-02 20:36:15 [18545] DEBUG:replicate_id=0
2006-11-02 20:36:15 [18545] DEBUG:query=UPDATE com_gbvisit SET visit1 =
'3730', visit2 = '5475', visit3 = '5609', visit4 = '9341', visit5 =
'20242', visit6 = '1', visit7 = '17420', visit8 = '2903',
visit9 = '45', visit10 = '15213' WHERE gbowner_nr = '8453'
2006-11-02 20:36:15 [18545] DEBUG:sem_lock [1] req
2006-11-02 20:36:15 [18546] DEBUG:cmdSts=Q
2006-11-02 20:36:15 [18546] DEBUG:cmdType=U
2006-11-02 20:36:15 [18546] DEBUG:rlog=0
2006-11-02 20:36:15 [18546] DEBUG:port=5432
2006-11-02 20:36:15 [18546] DEBUG:pid=8461
2006-11-02 20:36:15 [18546] DEBUG:from_host=192.168.10.32
2006-11-02 20:36:15 [18546] DEBUG:dbName=suckerprod
2006-11-02 20:36:15 [18546] DEBUG:userName=pgman
2006-11-02 20:36:15 [18546] DEBUG:recieve sec=1162496175
2006-11-02 20:36:15 [18546] DEBUG:recieve usec=552288
2006-11-02 20:36:15 [18546] DEBUG:query_size=113
2006-11-02 20:36:15 [18546] DEBUG:request_id=1
2006-11-02 20:36:15 [18546] DEBUG:replicate_id=0
2006-11-02 20:36:15 [18546] DEBUG:query=UPDATE com_user SET email =
'blabla', sex = 'w', geburtsdatum = '1993-09-18' WHERE user_nr = '7536'
2006-11-02 20:36:15 [18546] DEBUG:sem_lock [1] req
2006-11-02 20:38:19 [18187] DEBUG:deleteTransactionTbl():
2006-11-02 20:38:19 [18184] DEBUG:sem_unlock[1]
2006-11-02 20:38:19 [18184] DEBUG:PGRdo_replicate():PGRreplicate_packet_send returns 0
2006-11-02 20:38:19 [18184] DEBUG:replicate_loop():session closed
This is the interesting moment. It doesn't obtain any semaphore for many
seconds; after some time it calls deleteTransactionTbl(), and after
that it runs again for about 30 minutes...
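The stall shows up in the log as a jump between consecutive timestamps (20:36:15 to 20:38:19 in the excerpt). A small Python sketch for finding such jumps in a longer debug file follows; `find_gaps` and the 30-second threshold are hypothetical choices, and the line format is assumed to match the excerpt.

```python
import re
from datetime import datetime, timedelta

# Matches the "YYYY-MM-DD HH:MM:SS [pid]" prefix used in the debug log excerpt.
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[(\d+)\]")


def find_gaps(lines, threshold=timedelta(seconds=30)):
    """Return (before, after, seconds) for each pause of at least `threshold`."""
    gaps = []
    previous = None
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # wrapped continuation lines carry no timestamp
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        if previous is not None and stamp - previous >= threshold:
            gaps.append((previous, stamp, (stamp - previous).total_seconds()))
        previous = stamp
    return gaps
```

Applied to the excerpt, the jump from 20:36:15 to 20:38:19 is reported as a 124-second gap. Listing every gap over a full day would show whether the hangs really recur at a fixed interval.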
I can also send more lines of my debug file if needed.
What are the semaphore and shared memory requirements for pgcluster?
I hope someone can help me.
Thank you in advance,
Wanja
_______________________________________________
Pgcluster-general mailing list
Pgcluster-general@xxxxxxxxxxxxx
http://pgfoundry.org/mailman/listinfo/pgcluster-general
Hi Wanja,
It is quite usual to run ANALYZE every night; we do that on all our
PostgreSQL databases.
Regards,
Christian
--
Christian Denning
Development Manager
DDI: +44 (0)161 241 8602
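A nightly ANALYZE of the kind described in this thread could be driven by a small script run from cron. The Python sketch below is a minimal illustration under stated assumptions: `build_analyze_command` and `run_analyze` are hypothetical helpers, `psql` is assumed to be on the PATH, and the database and user names are taken from the debug log in the thread. Whether pgcluster replicates ANALYZE to the other node is not stated here, so running it against each node is an assumption.

```python
import subprocess


def build_analyze_command(dbname, user, host="localhost", port=5432):
    """Return the psql argument list that runs a database-wide ANALYZE."""
    return [
        "psql",
        "-h", host,
        "-p", str(port),
        "-U", user,
        "-d", dbname,
        "-c", "ANALYZE;",
    ]


def run_analyze(dbname, user, host="localhost", port=5432):
    """Run ANALYZE through psql and return the process exit code."""
    command = build_analyze_command(dbname, user, host, port)
    return subprocess.run(command, check=False).returncode
```

For the setup in this thread, a nightly job might call `run_analyze("suckerprod", "pgman", host="db1")` and the same for `db2`, and log any non-zero exit code.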