logo       

Re: Group merge problem after network disconnect: msg#00026

java.javagroups.general

Subject: Re: Group merge problem after network disconnect

1. Set TCP.loopback to true
2. Set shun="false" in FD and GMS
3. Try this with 2.4 latest (I made some changes regarding TCP as of late)

Shan Ponnusamy wrote:
>
> Hi,
>
>
>
> I have a three machine Jboss cache cluster. I see normal operation
> and state merge when one of the jboss cache instances is shutdown and
> restarted. However, when I simulate a network partitioning by
> disabling network connection and enabling it a little later, the
> partitioned node is not able to rejoin the group.
>
>
>
> When I started all the machines (172.16.15.130, 172.16.15.136,
> 172.16.15.137), they are able to find each other and jboss cache
> inserts are replicated across the cluster as expected. When I disable
> the network connection for 172.16.15.137 and enabling it a little
> later, 172.16.15.137 is not able to join the rest of the group
> members. 172.16.15.137 keeps sending heartbeat messages to
> 172.16.15.130. It does not get any response from 172.16.15.130 and
> 172.16.15.137 ends up transmitting SUSPECT message again and again.
>
>
>
> I think that I am making a mistake in configuration. Can you guys
> point me out? I have tested using jgroups version 2.2.9.2 as well as
> jgroups version 2.4. 172.16.15.130 is a windows XP machine while the
> rest of them are Vmware workstation instances running windows 2000
> professional.
>
>
>
> Thanks,
>
> Shan
>
>
>
> ---------------------- Configuration for machine (172.16.15.137)
> -----------------------------------------------------------------------
>
>
>
> <attribute name="ClusterConfig">
>
> <config>
>
> <TCP start_port="17910" bind_addr="172.16.15.137"/>
>
> <TCPPING
> initial_hosts="172.16.15.130[17910],172.16.15.136[17910]"
> port_range="5" timeout="13000" num_initial_members="2"
> up_thread="true" down_thread="true"/>
>
> <MERGE2 max_interval="20000" min_interval="15000"/>
>
> <FD shun="true" up_thread="true" down_thread="true"
> timeout="5500" max_tries="5" />
>
> <VERIFY_SUSPECT timeout="1500" up_thread="false"
> down_thread="false" />
>
> <pbcast.NAKACK gc_lag="100" retransmit_timeout="13000"
> up_thread="true" down_thread="true" />
>
> <pbcast.STABLE desired_avg_gossip="20000"
> up_thread="false" down_thread="false" />
>
> <pbcast.GMS join_timeout="15000" join_retry_timeout="5000"
> shun="true" print_local_addr="false" down_thread="true"
> up_thread="true" />
>
> <pbcast.STATE_TRANSFER up_thread="true" down_thread="true"/>
>
> </config>
>
> </attribute>
>
>
>
> ---------------------- Configuration for machine (172.16.15.136)
> -----------------------------------------------------------------------
>
>
>
> <attribute name="ClusterConfig">
>
> <config>
>
> <TCP start_port="17910" bind_addr="172.16.15.136"/>
>
> <TCPPING
> initial_hosts="172.16.15.130[17910],172.16.15.137[17910]"
> port_range="5" timeout="13000" num_initial_members="2"
> up_thread="true" down_thread="true"/>
>
> <MERGE2 max_interval="20000" min_interval="15000"/>
>
> <FD shun="true" up_thread="true" down_thread="true"
> timeout="5500" max_tries="5" />
>
> <VERIFY_SUSPECT timeout="1500" up_thread="false"
> down_thread="false" />
>
> <pbcast.NAKACK gc_lag="100" retransmit_timeout="13000"
> up_thread="true" down_thread="true" />
>
> <pbcast.STABLE desired_avg_gossip="20000"
> up_thread="false" down_thread="false" />
>
> <pbcast.GMS join_timeout="15000" join_retry_timeout="5000"
> shun="true" print_local_addr="false" down_thread="true"
> up_thread="true" />
>
> <pbcast.STATE_TRANSFER up_thread="true" down_thread="true"/>
>
> </config>
>
> </attribute>
>
>
>
> ---------------------- Configuration for machine (172.16.15.130)
> -----------------------------------------------------------------------
>
>
>
> <attribute name="ClusterConfig">
>
> <config>
>
> <TCP start_port="17910" bind_addr="172.16.15.130"/>
>
> <TCPPING
> initial_hosts="172.16.15.136[17910],172.16.15.137[17910]"
> port_range="5" timeout="13000" num_initial_members="2"
> up_thread="true" down_thread="true"/>
>
> <MERGE2 max_interval="20000" min_interval="15000"/>
>
> <FD shun="true" up_thread="true" down_thread="true"
> timeout="5500" max_tries="5" />
>
> <VERIFY_SUSPECT timeout="1500" up_thread="false"
> down_thread="false" />
>
> <pbcast.NAKACK gc_lag="100" retransmit_timeout="13000"
> up_thread="true" down_thread="true" />
>
> <pbcast.STABLE desired_avg_gossip="20000"
> up_thread="false" down_thread="false" />
>
> <pbcast.GMS join_timeout="15000" join_retry_timeout="5000"
> shun="true" print_local_addr="false" down_thread="true"
> up_thread="true" />
>
> <pbcast.STATE_TRANSFER up_thread="true" down_thread="true"/>
>
> </config>
>
> </attribute>
>
>
>
>
> The information contained in this electronic mail transmission may be
> privileged and confidential, and therefore, protected from disclosure.
> If you have received this communication in error, please notify us
> immediately by replying to this message and deleting it from your
> computer without copying or disclosing it.
> ------------------------------------------------------------------------
>
> -------------------------------------------------------------------------
> Using Tomcat but need to do more? Need to support web services, security?
> Get stuff done quickly with pre-integrated technology to make your job easier
> Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
> http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
> ------------------------------------------------------------------------
>
> _______________________________________________
> javagroups-users mailing list
> javagroups-users@xxxxxxxxxxxxxxxxxxxxx
> https://lists.sourceforge.net/lists/listinfo/javagroups-users
>

--
Bela Ban
Lead JGroups / Manager JBoss Clustering Group
JBoss - a division of Red Hat


-------------------------------------------------------------------------
Using Tomcat but need to do more? Need to support web services, security?
Get stuff done quickly with pre-integrated technology to make your job easier
Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642


<Prev in Thread] Current Thread [Next in Thread>
Google Custom Search

News | FAQ | advertise