
Replication sometimes hangs: msg#00001

Subject: Replication sometimes hangs
Hi,

I'm running PGCluster 1.5 (rc7) in a production environment with the
following setup:

hosts:

db1: clusterdb + replicator

<Cluster_Server_Info>
    <Host_Name>           db1                   </Host_Name>
    <Port>                5432                  </Port>
    <Recovery_Port>       7101                  </Recovery_Port>
</Cluster_Server_Info>
<Cluster_Server_Info>
    <Host_Name>           db2                   </Host_Name>
    <Port>                5432                  </Port>
    <Recovery_Port>       7101                  </Recovery_Port>
</Cluster_Server_Info>

<Status_Log_File>       /srv/pgcluster/pgreplicate.sts  </Status_Log_File>
<Error_Log_File>        /srv/pgcluster/pgreplicate.log  </Error_Log_File>
<Replication_Port>      8001                    </Replication_Port>
<Recovery_Port>         8101                    </Recovery_Port>
<RLOG_Port>             8301                    </RLOG_Port>
<Response_Mode>         normal                  </Response_Mode>
<Use_Replication_Log>   yes                     </Use_Replication_Log>
___________________________________________________________________

<Replicate_Server_Info>
        <Host_Name> db1 </Host_Name>
        <Port> 8001 </Port>
        <Recovery_Port> 8101 </Recovery_Port>
</Replicate_Server_Info>

<Replicate_Server_Info>
        <Host_Name> db2 </Host_Name>
        <Port> 8001 </Port>
        <Recovery_Port> 8101 </Recovery_Port>
</Replicate_Server_Info>


<Recovery_Port> 7101 </Recovery_Port>
<Rsync_Path> /usr/bin/rsync </Rsync_Path>
<Rsync_Option> ssh </Rsync_Option>
<Rsync_Compress> yes </Rsync_Compress>
<When_Stand_Alone> read_write  </When_Stand_Alone>
<Status_Log_File> /srv/pgcluster/cluster.sts </Status_Log_File>
<Error_Log_File> /srv/pgcluster/cluster.err </Error_Log_File>


db2: clusterdb + replicator

<Cluster_Server_Info>
    <Host_Name>           db1                   </Host_Name>
    <Port>                5432                  </Port>
    <Recovery_Port>       7101                  </Recovery_Port>
</Cluster_Server_Info>
<Cluster_Server_Info>
    <Host_Name>           db2                   </Host_Name>
    <Port>                5432                  </Port>
    <Recovery_Port>       7101                  </Recovery_Port>
</Cluster_Server_Info>

<Replicate_Server_Info>
        <Host_Name> db1 </Host_Name>
        <Port> 8001 </Port>
        <Recovery_Port> 8101 </Recovery_Port>
        <LifeCheck_Port> 8201 </LifeCheck_Port>
</Replicate_Server_Info>

<Status_Log_File>       /srv/pgcluster/pgreplicate.sts  </Status_Log_File>
<Error_Log_File>        /srv/pgcluster/pgreplicate.log  </Error_Log_File>
<Replication_Port>      8001                    </Replication_Port>
<Recovery_Port>         8101                    </Recovery_Port>
<RLOG_Port>             8301                    </RLOG_Port>
<Response_Mode>         normal                  </Response_Mode>
<Use_Replication_Log>   yes                     </Use_Replication_Log>
___________________________________________________________

<Replicate_Server_Info>
        <Host_Name> db1 </Host_Name>
        <Port> 8001 </Port>
        <Recovery_Port> 8101 </Recovery_Port>
</Replicate_Server_Info>

<Replicate_Server_Info>
        <Host_Name> db2 </Host_Name>
        <Port> 8001 </Port>
        <Recovery_Port> 8101 </Recovery_Port>
</Replicate_Server_Info>


<Recovery_Port> 7101 </Recovery_Port>
<Rsync_Path> /usr/bin/rsync </Rsync_Path>
<Rsync_Option> ssh </Rsync_Option>
<Rsync_Compress> yes </Rsync_Compress>
<When_Stand_Alone> read_write  </When_Stand_Alone>
<Status_Log_File> /srv/pgcluster/cluster.sts </Status_Log_File>
<Error_Log_File> /srv/pgcluster/cluster.err </Error_Log_File>


I don't use pglb; I use LVS for load balancing instead. I also tried it
without load balancing, and with both persistent and non-persistent
connections, but none of that makes a difference.


About every 30 minutes, my application using the database hangs for about 2
minutes.
I ran the replicator in debug mode to find details:



2006-11-02 20:36:14 [18542] DEBUG:query=PGR_CLOSE_CONNECTION
2006-11-02 20:36:14 [18542] DEBUG:sem_lock [1] req
2006-11-02 20:36:14 [18543] DEBUG:cmdSts=O
2006-11-02 20:36:14 [18543] DEBUG:cmdType=x
2006-11-02 20:36:14 [18543] DEBUG:rlog=64
2006-11-02 20:36:14 [18543] DEBUG:port=5432
2006-11-02 20:36:14 [18543] DEBUG:pid=8457
2006-11-02 20:36:14 [18543] DEBUG:from_host=192.168.10.32
2006-11-02 20:36:14 [18543] DEBUG:dbName=suckerprod
2006-11-02 20:36:14 [18543] DEBUG:userName=pgman
2006-11-02 20:36:14 [18543] DEBUG:recieve sec=1162496174
2006-11-02 20:36:14 [18543] DEBUG:recieve usec=461004
2006-11-02 20:36:14 [18543] DEBUG:query_size=21
2006-11-02 20:36:14 [18543] DEBUG:request_id=0
2006-11-02 20:36:14 [18543] DEBUG:replicate_id=0
2006-11-02 20:36:14 [18543] DEBUG:query=PGR_CLOSE_CONNECTION
2006-11-02 20:36:14 [18543] DEBUG:sem_lock [1] req
2006-11-02 20:36:15 [18544] DEBUG:cmdSts=O
2006-11-02 20:36:15 [18544] DEBUG:cmdType=x
2006-11-02 20:36:15 [18544] DEBUG:rlog=64
2006-11-02 20:36:15 [18544] DEBUG:port=5432
2006-11-02 20:36:15 [18544] DEBUG:pid=8460
2006-11-02 20:36:15 [18544] DEBUG:from_host=192.168.10.32
2006-11-02 20:36:15 [18544] DEBUG:dbName=suckerprod
2006-11-02 20:36:15 [18544] DEBUG:userName=pgman
2006-11-02 20:36:15 [18544] DEBUG:recieve sec=1162496175
2006-11-02 20:36:15 [18544] DEBUG:recieve usec=218061
2006-11-02 20:36:15 [18544] DEBUG:query_size=21
2006-11-02 20:36:15 [18544] DEBUG:request_id=0
2006-11-02 20:36:15 [18544] DEBUG:replicate_id=0
2006-11-02 20:36:15 [18544] DEBUG:query=PGR_CLOSE_CONNECTION
2006-11-02 20:36:15 [18544] DEBUG:sem_lock [1] req
2006-11-02 20:36:15 [18545] DEBUG:cmdSts=Q
2006-11-02 20:36:15 [18545] DEBUG:cmdType=U
2006-11-02 20:36:15 [18545] DEBUG:rlog=0
2006-11-02 20:36:15 [18545] DEBUG:port=5432
2006-11-02 20:36:15 [18545] DEBUG:pid=8462
2006-11-02 20:36:15 [18545] DEBUG:from_host=192.168.10.32
2006-11-02 20:36:15 [18545] DEBUG:dbName=suckerprod
2006-11-02 20:36:15 [18545] DEBUG:userName=pgman
2006-11-02 20:36:15 [18545] DEBUG:recieve sec=1162496175
2006-11-02 20:36:15 [18545] DEBUG:recieve usec=550293
2006-11-02 20:36:15 [18545] DEBUG:query_size=217
2006-11-02 20:36:15 [18545] DEBUG:request_id=1
2006-11-02 20:36:15 [18545] DEBUG:replicate_id=0
2006-11-02 20:36:15 [18545] DEBUG:query=UPDATE com_gbvisit SET visit1 = '3730', 
visit2 = '5475', visit3 = '5609', visit4 = '9341', visit5 = '20242', visit6 = 
'1', visit7 = '17420', visit8 = '2903',
visit9 = '45', visit10 = '15213' WHERE gbowner_nr = '8453'
2006-11-02 20:36:15 [18545] DEBUG:sem_lock [1] req
2006-11-02 20:36:15 [18546] DEBUG:cmdSts=Q
2006-11-02 20:36:15 [18546] DEBUG:cmdType=U
2006-11-02 20:36:15 [18546] DEBUG:rlog=0
2006-11-02 20:36:15 [18546] DEBUG:port=5432
2006-11-02 20:36:15 [18546] DEBUG:pid=8461
2006-11-02 20:36:15 [18546] DEBUG:from_host=192.168.10.32
2006-11-02 20:36:15 [18546] DEBUG:dbName=suckerprod
2006-11-02 20:36:15 [18546] DEBUG:userName=pgman
2006-11-02 20:36:15 [18546] DEBUG:recieve sec=1162496175
2006-11-02 20:36:15 [18546] DEBUG:recieve usec=552288
2006-11-02 20:36:15 [18546] DEBUG:query_size=113
2006-11-02 20:36:15 [18546] DEBUG:request_id=1
2006-11-02 20:36:15 [18546] DEBUG:replicate_id=0
2006-11-02 20:36:15 [18546] DEBUG:query=UPDATE com_user SET email = 'blabla', 
sex = 'w', geburtsdatum = '1993-09-18' WHERE user_nr = '7536'
2006-11-02 20:36:15 [18546] DEBUG:sem_lock [1] req
2006-11-02 20:38:19 [18187] DEBUG:deleteTransactionTbl():
2006-11-02 20:38:19 [18184] DEBUG:sem_unlock[1]
2006-11-02 20:38:19 [18184] DEBUG:PGRdo_replicate():PGRreplicate_packet_send 
returns 0
2006-11-02 20:38:19 [18184] DEBUG:replicate_loop():session closed


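This is the interesting moment. The replicator processes request the
semaphore (sem_lock [1] req) but none of them gets it for about two minutes
(20:36:15 to 20:38:19 in the log above). Then deleteTransactionTbl() is
called, the semaphore is released, and after that everything runs again for
about 30 minutes.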
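
A minimal sketch for locating such stalls in a longer debug log, assuming every entry starts with "YYYY-MM-DD HH:MM:SS [pid] DEBUG:" as in the excerpt above. The function name find_gaps and the 30-second threshold are illustrative choices, not part of PGCluster. Wrapped continuation lines without a timestamp (such as long queries) are skipped.

```python
import re
from datetime import datetime

# Matches entries such as: 2006-11-02 20:36:15 [18546] DEBUG:sem_lock [1] req
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[(\d+)\] DEBUG:(.*)$"
)


def find_gaps(lines, threshold_seconds=30):
    """Return (seconds, entry_before, entry_after) for each silent period.

    Each entry is a (timestamp, pid, message) tuple. Lines that do not
    start with a timestamp are ignored.
    """
    gaps = []
    prev = None
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # wrapped continuation of the previous entry
        ts = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        entry = (ts, int(match.group(2)), match.group(3))
        if prev is not None:
            delta = (ts - prev[0]).total_seconds()
            if delta >= threshold_seconds:
                gaps.append((delta, prev, entry))
        prev = entry
    return gaps


# A few lines copied from the excerpt above.
SAMPLE = [
    "2006-11-02 20:36:15 [18546] DEBUG:query=UPDATE com_user SET email = 'blabla', ",
    "sex = 'w', geburtsdatum = '1993-09-18' WHERE user_nr = '7536'",
    "2006-11-02 20:36:15 [18546] DEBUG:sem_lock [1] req",
    "2006-11-02 20:38:19 [18187] DEBUG:deleteTransactionTbl():",
    "2006-11-02 20:38:19 [18184] DEBUG:sem_unlock[1]",
]

for seconds, before, after in find_gaps(SAMPLE):
    print(f"{seconds:.0f}s of silence between pid {before[1]} "
          f"({before[2]}) and pid {after[1]} ({after[2]})")
```

To scan a whole file, pass the open file object instead of SAMPLE, for example find_gaps(open("/srv/pgcluster/pgreplicate.log")).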

I can also send more lines of my debug log if needed.

What are PGCluster's requirements for semaphores and shared memory?
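
For comparison, here is a small sketch that reports the System V IPC limits currently configured on a host. It assumes Linux, where /proc/sys/kernel/sem holds the four values SEMMSL, SEMMNS, SEMOPM and SEMMNI in that order. The helper names are illustrative; it only shows what the kernel allows and says nothing about what PGCluster actually needs.

```python
from pathlib import Path

# Order of the four numbers in /proc/sys/kernel/sem on Linux.
SEM_FIELDS = ("semmsl", "semmns", "semopm", "semmni")


def parse_sem_limits(text):
    """Parse the content of /proc/sys/kernel/sem into a dict."""
    values = [int(v) for v in text.split()]
    if len(values) != len(SEM_FIELDS):
        raise ValueError(f"expected {len(SEM_FIELDS)} values, got {len(values)}")
    return dict(zip(SEM_FIELDS, values))


def read_ipc_limits(proc_root="/proc/sys/kernel"):
    """Collect semaphore and shared memory limits; empty dict if unavailable."""
    root = Path(proc_root)
    limits = {}
    sem_file = root / "sem"
    if sem_file.exists():
        limits.update(parse_sem_limits(sem_file.read_text()))
    for name in ("shmmax", "shmall", "shmmni"):
        limit_file = root / name
        if limit_file.exists():
            limits[name] = int(limit_file.read_text())
    return limits


print(read_ipc_limits())
```

Running this on both db1 and db2 makes it easy to check whether the two hosts are configured identically.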

I hope someone can help me.

Thank you in advance,
Wanja



