Hi,
In order to guarantee serialize accessing to the same table, I recomend
set 'true' to 'auto_lock_table' (therefore, default setting is 'true').
By the way, would you let me know why you have satarted cluster db with
recovery mode.
Please try to start without recovery mode at initial start.
Regards,
---------------------
At.
> thanks for the answer. Setting 'auto_lock_table = false'
> solved the problem for me. Does it serialize accessing
> the same table? Here are my settings attached from the
> 'master' db. The slave (second master) is the same,
> except it is running on ports 54324 (db), 7104 (recovery)
> and 7204 (lifecheck). The postmasters and the replication
> are started in this order with the options below:
>
> PGDATA=/home1/zozo/pgd3 PGPORT=54321 PATH=/home1/zozo/pgc2/bin:$PATH
> pg_ctl start
> PGDATA=/home1/zozo/pgd3 PGPORT=54321 PATH=/home1/zozo/pgc2/bin:$PATH
> pgreplicate -n -v
> PGDATA=/home1/zozo/pgd4 PGPORT=54324 PATH=/home1/zozo/pgc2/bin:$PATH
> pg_ctl start \
> -l /home1/zozo/pgd4/recover.log -o '-i -R'
>
> The two psql sessions are started this way:
>
> PGDATA=/home1/zozo/pgd3 PGPORT=54321 PATH=/home1/zozo/pgc2/bin:$PATH
> psql -p 54321 dbx
> PGDATA=/home1/zozo/pgd4 PGPORT=54324 PATH=/home1/zozo/pgc2/bin:$PATH
> psql -p 54324 dbx
>
> Best regards,
> Zolt扤n B扤sz扤rm扤nyi
>
> a.mitani@xxxxxxxxxxxxxx 扤rta:
>> Hi,
>>
>> In order to make reliable, default setting of 'auto_lock_table' is true
>> in
>> postgresql.conf.
>>
>> When you set 'false' to 'auto_lock_table' in the configuration file, it
>> not use auto_lock_table.
>>
>> However, I can not find any semaphor problem in my replication server.
>> Would you please version of replication server and start-up option.
>>
>> Regards,
>> -----------------
>> At.Mitani
>>
>>
>>
>>> I run into a problem testing the latest pgcluster version.
>>> The setup looks like this: two pgcluster nodes are started
>>> on the same machine listening on different ports and
>>> pgreplicate is used between them.
>>>
>>> I connected two psql sessions to the two nodes or
>>> the same node (indifferent) and I was able to lock up
>>> pgreplicate up with the following procedure.
>>>
>>> This is the table I used for testing (no indexes,
>>> no unique checks):
>>>
>>> create table tbl3 (id integer, t text);
>>>
>>> And the scenario:
>>>
>>> On the first psql:
>>>
>>> dbx=# begin; insert into tbl3 (id, t) values (1, 'v');
>>> BEGIN
>>> INSERT 0 1
>>> dbx=#
>>>
>>> Now on the second psql:
>>>
>>> dbx=# begin; insert into tbl3 (id, t) values (1, 'x');
>>> BEGIN
>>> <- here it stalls
>>>
>>> Now on the first psql:
>>>
>>> dbx=# commit;
>>> <- it stalls too
>>>
>>> And here are the debug messages from "pgreplicate -n -v":
>>>
>>> ...
>>> 2006-10-18 12:45:25 [6804] DEBUG:pgrecovery_loop():last master
>>> host-81-17-177-202.dunaweb.hu - 54321
>>> 2006-10-18 12:45:25 [6804] DEBUG:pgrecovery_loop():last target
>>> host-81-17-177-202.dunaweb.hu - 54324
>>> 2006-10-18 12:45:25 [6804] DEBUG:PGRsend_queue():master
>>> host-81-17-177-202.dunaweb.hu - 54321
>>> 2006-10-18 12:45:25 [6804] DEBUG:PGRsend_queue():target
>>> host-81-17-177-202.dunaweb.hu - 54324
>>> 2006-10-18 12:45:25 [6804] DEBUG:PGRsend_queue():send_queue return
>>> status
>>> 0
>>> 2006-10-18 12:45:25 [6804] DEBUG:pgrecovery_loop():recovery accept port
>>> 8101
>>> 2006-10-18 12:45:25 [6804] DEBUG:read_packet():receive packet
>>> 2006-10-18 12:45:25 [6804] DEBUG:no = 11
>>> 2006-10-18 12:45:25 [6804] DEBUG:max_connect = 100
>>> 2006-10-18 12:45:25 [6804] DEBUG:port = 54324
>>> 2006-10-18 12:45:25 [6804] DEBUG:recoveryPort = 7104
>>> 2006-10-18 12:45:25 [6804] DEBUG:hostName =
>>> host-81-17-177-202.dunaweb.hu
>>> 2006-10-18 12:45:25 [6804] DEBUG:pg_data = /home1/zozo/pgd4
>>> 2006-10-18 12:45:25 [6804] DEBUG:pgrecovery_loop():receive packet no:11
>>> 2006-10-18 12:45:36 [6849] DEBUG:PGRread_packet():PG_read_query returns
>>> begin.
>>> 2006-10-18 12:45:36 [6849] DEBUG:PGRreturn_result():PGRreturn_result[1]
>>> 2006-10-18 12:45:36 [6849] DEBUG:PGRreturn_result():128 send
>>> 2006-10-18 12:45:36 [6849] DEBUG:PGRread_packet():PG_read_query returns
>>> begin.
>>> 2006-10-18 12:45:36 [6849] DEBUG:PGRdo_replicate():query :: begin
>>> 2006-10-18 12:45:36 [6849] DEBUG:cmdSts=Q
>>> 2006-10-18 12:45:36 [6849] DEBUG:cmdType=B
>>> 2006-10-18 12:45:36 [6849] DEBUG:rlog=0
>>> 2006-10-18 12:45:36 [6849] DEBUG:port=54321
>>> 2006-10-18 12:45:36 [6849] DEBUG:pid=6846
>>> 2006-10-18 12:45:36 [6849] DEBUG:from_host=81.17.177.202
>>> 2006-10-18 12:45:36 [6849] DEBUG:dbName=dbx
>>> 2006-10-18 12:45:36 [6849] DEBUG:userName=zozo
>>> 2006-10-18 12:45:36 [6849] DEBUG:recieve sec=1161168336
>>> 2006-10-18 12:45:36 [6849] DEBUG:recieve usec=220377
>>> 2006-10-18 12:45:36 [6849] DEBUG:query_size=5
>>> 2006-10-18 12:45:36 [6849] DEBUG:request_id=1
>>> 2006-10-18 12:45:36 [6849] DEBUG:replicate_id=0
>>> 2006-10-18 12:45:36 [6849] DEBUG:recovery_status=0
>>> 2006-10-18 12:45:36 [6849] DEBUG:query=begin
>>> 2006-10-18 12:45:36 [6849] DEBUG:sem_lock [1] req
>>> 2006-10-18 12:45:36 [6849] DEBUG:sem_lock [1] got it
>>> 2006-10-18 12:45:36 [6849] DEBUG:same host
>>> 2006-10-18 12:45:36 [6849] DEBUG:PGRis_same_host():not same host
>>> 2006-10-18 12:45:36 [6849] DEBUG:pgr_createConn():PQsetdbLogin
>>> host[81.17.177.202] port[54324] db[dbx] user[zozo]
>>> 2006-10-18 12:45:36 [6849] DEBUG:pgr_createConn():PQsetdbLogin ok!!
>>> 2006-10-18 12:45:36 [6849] DEBUG:start thread_send_cluster()
>>> 2006-10-18 12:45:36 [6849]
>>> DEBUG:send_replicate_packet_to_server():sync_command(SELECT
>>> PGR_SYSTEM_COMMAND_FUNCTION(3,1161168336,220377,126,1,2) )
>>> 2006-10-18 12:45:36 [6849]
>>> DEBUG:send_replicate_packet_to_server():sync_command(SELECT
>>> PGR_SYSTEM_COMMAND_FUNCTION(8,2,0,1) )
>>> 2006-10-18 12:45:36 [6849]
>>> DEBUG:send_replicate_packet_to_server():sync_command returns
>>> SYSTEM_COMMAND
>>> 2006-10-18 12:45:36 [6849]
>>> DEBUG:send_replicate_packet_to_server():PQexec send :begin
>>> 2006-10-18 12:45:36 [6849]
>>> DEBUG:send_replicate_packet_to_server():PQexec returns :BEGIN
>>> 2006-10-18 12:45:36 [6849] DEBUG:thread_send_cluster():return value
>>> from
>>> send_replicate_packet_to_server() is 0
>>> 2006-10-18 12:45:36 [6849] DEBUG:thread_send_cluster():pthread_exit[1]
>>> 2006-10-18 12:45:36 [6849]
>>> DEBUG:PGRreturn_result():PGRreturn_result[3,1161168336,220377,126,1,2]
>>> 2006-10-18 12:45:36 [6849] DEBUG:PGRreturn_result():128 send
>>> 2006-10-18 12:45:36 [6849] DEBUG:end thread_send_source()
>>> 2006-10-18 12:45:36 [6849] DEBUG:sem_unlock[1]
>>> 2006-10-18 12:45:36 [6849] DEBUG:PGRread_packet():PG_read_query returns
>>> PGR_QUERY_DONE_NOTICE_CMD.
>>> 2006-10-18 12:45:36 [6849] DEBUG:PGRread_packet():PG_read_query returns
>>> insert into tbl3 (id, t) values (1, 'v').
>>> 2006-10-18 12:45:36 [6849] DEBUG:PGRreturn_result():PGRreturn_result[1]
>>> 2006-10-18 12:45:36 [6849] DEBUG:PGRreturn_result():128 send
>>> 2006-10-18 12:45:36 [6849] DEBUG:PGRread_packet():PG_read_query returns
>>> insert into tbl3 (id, t) values (1, 'v').
>>> 2006-10-18 12:45:36 [6849] DEBUG:PGRdo_replicate():query :: insert into
>>> tbl3 (id, t) values (1, 'v')
>>> 2006-10-18 12:45:36 [6849] DEBUG:cmdSts=Q
>>> 2006-10-18 12:45:36 [6849] DEBUG:cmdType=I
>>> 2006-10-18 12:45:36 [6849] DEBUG:rlog=0
>>> 2006-10-18 12:45:36 [6849] DEBUG:port=54321
>>> 2006-10-18 12:45:36 [6849] DEBUG:pid=6846
>>> 2006-10-18 12:45:36 [6849] DEBUG:from_host=81.17.177.202
>>> 2006-10-18 12:45:36 [6849] DEBUG:dbName=dbx
>>> 2006-10-18 12:45:36 [6849] DEBUG:userName=zozo
>>> 2006-10-18 12:45:36 [6849] DEBUG:recieve sec=1161168336
>>> 2006-10-18 12:45:36 [6849] DEBUG:recieve usec=226765
>>> 2006-10-18 12:45:36 [6849] DEBUG:query_size=40
>>> 2006-10-18 12:45:36 [6849] DEBUG:request_id=2
>>> 2006-10-18 12:45:36 [6849] DEBUG:replicate_id=0
>>> 2006-10-18 12:45:36 [6849] DEBUG:recovery_status=0
>>> 2006-10-18 12:45:36 [6849] DEBUG:query=insert into tbl3 (id, t) values
>>> (1, 'v')
>>> 2006-10-18 12:45:36 [6849] DEBUG:sem_lock [1] req
>>> 2006-10-18 12:45:36 [6849] DEBUG:sem_lock [1] got it
>>> 2006-10-18 12:45:36 [6849] DEBUG:same host
>>> 2006-10-18 12:45:36 [6849] DEBUG:PGRis_same_host():not same host
>>> 2006-10-18 12:45:36 [6849] DEBUG:start thread_send_cluster()
>>> 2006-10-18 12:45:36 [6849]
>>> DEBUG:send_replicate_packet_to_server():sync_command(SELECT
>>> PGR_SYSTEM_COMMAND_FUNCTION(3,1161168336,226765,127,1,3) )
>>> 2006-10-18 12:45:36 [6849]
>>> DEBUG:send_replicate_packet_to_server():sync_command(SELECT
>>> PGR_SYSTEM_COMMAND_FUNCTION(8,3,0,1) )
>>> 2006-10-18 12:45:36 [6849]
>>> DEBUG:send_replicate_packet_to_server():sync_command returns
>>> SYSTEM_COMMAND
>>> 2006-10-18 12:45:36 [6849]
>>> DEBUG:send_replicate_packet_to_server():PQexec send :insert into tbl3
>>> (id, t) values (1, 'v')
>>> 2006-10-18 12:45:36 [6849]
>>> DEBUG:send_replicate_packet_to_server():PQexec returns :INSERT 0 1
>>> 2006-10-18 12:45:36 [6849] DEBUG:thread_send_cluster():return value
>>> from
>>> send_replicate_packet_to_server() is 0
>>> 2006-10-18 12:45:36 [6849] DEBUG:thread_send_cluster():pthread_exit[1]
>>> 2006-10-18 12:45:36 [6849]
>>> DEBUG:PGRreturn_result():PGRreturn_result[3,1161168336,226765,127,1,3]
>>> 2006-10-18 12:45:36 [6849] DEBUG:PGRreturn_result():128 send
>>> 2006-10-18 12:45:36 [6849] DEBUG:PGRread_packet():PG_read_query returns
>>> PGR_QUERY_DONE_NOTICE_CMD.
>>> 2006-10-18 12:45:36 [6849]
>>> DEBUG:read_answer():answer[PGR_QUERY_DONE_NOTICE_CMD]
>>> 2006-10-18 12:45:36 [6849] DEBUG:read_answer():QUERY DONE
>>> 2006-10-18 12:45:36 [6849] DEBUG:end thread_send_source()
>>> 2006-10-18 12:45:36 [6849] DEBUG:sem_unlock[1]
>>> 2006-10-18 12:45:39 [6853] DEBUG:PGRread_packet():PG_read_query returns
>>> begin.
>>> 2006-10-18 12:45:39 [6853] DEBUG:PGRreturn_result():PGRreturn_result[1]
>>> 2006-10-18 12:45:39 [6853] DEBUG:PGRreturn_result():128 send
>>> 2006-10-18 12:45:39 [6853] DEBUG:PGRread_packet():PG_read_query returns
>>> begin.
>>> 2006-10-18 12:45:39 [6853] DEBUG:PGRdo_replicate():query :: begin
>>> 2006-10-18 12:45:39 [6853] DEBUG:cmdSts=Q
>>> 2006-10-18 12:45:39 [6853] DEBUG:cmdType=B
>>> 2006-10-18 12:45:39 [6853] DEBUG:rlog=0
>>> 2006-10-18 12:45:39 [6853] DEBUG:port=54324
>>> 2006-10-18 12:45:39 [6853] DEBUG:pid=6848
>>> 2006-10-18 12:45:39 [6853] DEBUG:from_host=81.17.177.202
>>> 2006-10-18 12:45:39 [6853] DEBUG:dbName=dbx
>>> 2006-10-18 12:45:39 [6853] DEBUG:userName=zozo
>>> 2006-10-18 12:45:39 [6853] DEBUG:recieve sec=1161168339
>>> 2006-10-18 12:45:39 [6853] DEBUG:recieve usec=390299
>>> 2006-10-18 12:45:39 [6853] DEBUG:query_size=5
>>> 2006-10-18 12:45:39 [6853] DEBUG:request_id=1
>>> 2006-10-18 12:45:39 [6853] DEBUG:replicate_id=0
>>> 2006-10-18 12:45:39 [6853] DEBUG:recovery_status=0
>>> 2006-10-18 12:45:39 [6853] DEBUG:query=begin
>>> 2006-10-18 12:45:39 [6853] DEBUG:sem_lock [1] req
>>> 2006-10-18 12:45:39 [6853] DEBUG:sem_lock [1] got it
>>> 2006-10-18 12:45:39 [6853] DEBUG:PGRis_same_host():not same host
>>> 2006-10-18 12:45:39 [6853] DEBUG:pgr_createConn():PQsetdbLogin
>>> host[81.17.177.202] port[54321] db[dbx] user[zozo]
>>> 2006-10-18 12:45:39 [6853] DEBUG:pgr_createConn():PQsetdbLogin ok!!
>>> 2006-10-18 12:45:39 [6853] DEBUG:same host
>>> 2006-10-18 12:45:39 [6853] DEBUG:start thread_send_cluster()
>>> 2006-10-18 12:45:39 [6853]
>>> DEBUG:send_replicate_packet_to_server():sync_command(SELECT
>>> PGR_SYSTEM_COMMAND_FUNCTION(3,1161168339,390299,128,1,4) )
>>> 2006-10-18 12:45:39 [6853]
>>> DEBUG:send_replicate_packet_to_server():sync_command(SELECT
>>> PGR_SYSTEM_COMMAND_FUNCTION(8,4,0,1) )
>>> 2006-10-18 12:45:39 [6853]
>>> DEBUG:send_replicate_packet_to_server():sync_command returns
>>> SYSTEM_COMMAND
>>> 2006-10-18 12:45:39 [6853]
>>> DEBUG:send_replicate_packet_to_server():PQexec send :begin
>>> 2006-10-18 12:45:39 [6853]
>>> DEBUG:send_replicate_packet_to_server():PQexec returns :BEGIN
>>> 2006-10-18 12:45:39 [6853] DEBUG:thread_send_cluster():return value
>>> from
>>> send_replicate_packet_to_server() is 0
>>> 2006-10-18 12:45:39 [6853] DEBUG:thread_send_cluster():pthread_exit[0]
>>> 2006-10-18 12:45:39 [6853]
>>> DEBUG:PGRreturn_result():PGRreturn_result[3,1161168339,390299,128,1,4]
>>> 2006-10-18 12:45:39 [6853] DEBUG:PGRreturn_result():128 send
>>> 2006-10-18 12:45:39 [6853] DEBUG:end thread_send_source()
>>> 2006-10-18 12:45:39 [6853] DEBUG:sem_unlock[1]
>>> 2006-10-18 12:45:39 [6853] DEBUG:PGRread_packet():PG_read_query returns
>>> PGR_QUERY_DONE_NOTICE_CMD.
>>> 2006-10-18 12:45:39 [6853] DEBUG:PGRread_packet():PG_read_query returns
>>> insert into tbl3 (id, t) values (1, 'x').
>>> 2006-10-18 12:45:39 [6853] DEBUG:PGRreturn_result():PGRreturn_result[1]
>>> 2006-10-18 12:45:39 [6853] DEBUG:PGRreturn_result():128 send
>>> 2006-10-18 12:45:39 [6853] DEBUG:PGRread_packet():PG_read_query returns
>>> insert into tbl3 (id, t) values (1, 'x').
>>> 2006-10-18 12:45:39 [6853] DEBUG:PGRdo_replicate():query :: insert into
>>> tbl3 (id, t) values (1, 'x')
>>> 2006-10-18 12:45:39 [6853] DEBUG:cmdSts=Q
>>> 2006-10-18 12:45:39 [6853] DEBUG:cmdType=I
>>> 2006-10-18 12:45:39 [6853] DEBUG:rlog=0
>>> 2006-10-18 12:45:39 [6853] DEBUG:port=54324
>>> 2006-10-18 12:45:39 [6853] DEBUG:pid=6848
>>> 2006-10-18 12:45:39 [6853] DEBUG:from_host=81.17.177.202
>>> 2006-10-18 12:45:39 [6853] DEBUG:dbName=dbx
>>> 2006-10-18 12:45:39 [6853] DEBUG:userName=zozo
>>> 2006-10-18 12:45:39 [6853] DEBUG:recieve sec=1161168339
>>> 2006-10-18 12:45:39 [6853] DEBUG:recieve usec=396922
>>> 2006-10-18 12:45:39 [6853] DEBUG:query_size=40
>>> 2006-10-18 12:45:39 [6853] DEBUG:request_id=2
>>> 2006-10-18 12:45:39 [6853] DEBUG:replicate_id=0
>>> 2006-10-18 12:45:39 [6853] DEBUG:recovery_status=0
>>> 2006-10-18 12:45:39 [6853] DEBUG:query=insert into tbl3 (id, t) values
>>> (1, 'x')
>>> 2006-10-18 12:45:39 [6853] DEBUG:sem_lock [1] req
>>> 2006-10-18 12:45:39 [6853] DEBUG:sem_lock [1] got it
>>> 2006-10-18 12:45:39 [6853] DEBUG:PGRis_same_host():not same host
>>> 2006-10-18 12:45:39 [6853] DEBUG:same host
>>> 2006-10-18 12:45:39 [6853] DEBUG:start thread_send_cluster()
>>> 2006-10-18 12:45:39 [6853]
>>> DEBUG:send_replicate_packet_to_server():sync_command(SELECT
>>> PGR_SYSTEM_COMMAND_FUNCTION(3,1161168339,396922,129,1,5) )
>>> 2006-10-18 12:45:39 [6853]
>>> DEBUG:send_replicate_packet_to_server():sync_command(SELECT
>>> PGR_SYSTEM_COMMAND_FUNCTION(8,5,0,1) )
>>> 2006-10-18 12:45:39 [6853]
>>> DEBUG:send_replicate_packet_to_server():sync_command returns
>>> SYSTEM_COMMAND
>>> 2006-10-18 12:45:44 [6849] DEBUG:PGRread_packet():PG_read_query returns
>>> commit.
>>> 2006-10-18 12:45:44 [6849] DEBUG:PGRreturn_result():PGRreturn_result[1]
>>> 2006-10-18 12:45:44 [6849] DEBUG:PGRreturn_result():128 send
>>> 2006-10-18 12:45:44 [6849] DEBUG:PGRread_packet():PG_read_query returns
>>> commit.
>>> 2006-10-18 12:45:44 [6849] DEBUG:PGRdo_replicate():query :: commit
>>> 2006-10-18 12:45:44 [6849] DEBUG:cmdSts=T
>>> 2006-10-18 12:45:44 [6849] DEBUG:cmdType=E
>>> 2006-10-18 12:45:44 [6849] DEBUG:rlog=0
>>> 2006-10-18 12:45:44 [6849] DEBUG:port=54321
>>> 2006-10-18 12:45:44 [6849] DEBUG:pid=6846
>>> 2006-10-18 12:45:44 [6849] DEBUG:from_host=81.17.177.202
>>> 2006-10-18 12:45:44 [6849] DEBUG:dbName=dbx
>>> 2006-10-18 12:45:44 [6849] DEBUG:userName=zozo
>>> 2006-10-18 12:45:44 [6849] DEBUG:recieve sec=1161168344
>>> 2006-10-18 12:45:44 [6849] DEBUG:recieve usec=124490
>>> 2006-10-18 12:45:44 [6849] DEBUG:query_size=6
>>> 2006-10-18 12:45:44 [6849] DEBUG:request_id=3
>>> 2006-10-18 12:45:44 [6849] DEBUG:replicate_id=0
>>> 2006-10-18 12:45:44 [6849] DEBUG:recovery_status=0
>>> 2006-10-18 12:45:44 [6849] DEBUG:query=commit
>>> 2006-10-18 12:45:44 [6849] DEBUG:sem_lock [1] req
>>> <- here it locks up
>>>
>>> As you can see from the last few lines,
>>> one of the pgreplicate threads was able to
>>> get the semaphore but doesn't release it
>>> for some reason and meanwhile the other
>>> thread is locked at locking the semaphore.
>>>
>>> Looking at the code, it seems that pgreplicate
>>> uses only one semaphore for replicating
>>> across nodes and also the same semaphore
>>> for different client connections that are
>>> connected to different nodes as sem_cnt = 1
>>> is hardcoded. So it seems a bit error prone.
>>>
>>> When I use single statements or non-overlapping
>>> transactions, the above error doesn't happen.
>>>
>>> Best regards,
>>> Zolt疣 B?z?m駭yi
>>>
>>>
>>>
>>> _______________________________________________
>>> Pgcluster-general mailing list
>>> Pgcluster-general@xxxxxxxxxxxxx
>>> http://pgfoundry.org/mailman/listinfo/pgcluster-general
>>>
>>>
>>
>> _______________________________________________
>> Pgcluster-general mailing list
>> Pgcluster-general@xxxxxxxxxxxxx
>> http://pgfoundry.org/mailman/listinfo/pgcluster-general
>>
>>
>
> _______________________________________________
> Pgcluster-general mailing list
> Pgcluster-general@xxxxxxxxxxxxx
> http://pgfoundry.org/mailman/listinfo/pgcluster-general
>
|