Re: [ERROR] NAKACK.handleXmitReq(): msg#00063

java.javagroups.general

Subject: Re: [ERROR] NAKACK.handleXmitReq()

On Monday 25 September 2006 11:03, Bela Ban wrote:
> It is almost impossible to see from those snippets what caused your
> OOMs. Did you run *exactly* the same configuration I emailed you ? You
> probably need to connect to one of the instances with a profiler, to see
> what's causing the leak.
> Can you post the config you used, plus the VM startup options again ?
>

Hi Bela,

The protocol stack template is as follows (BIND_ADDRESS and
MCAST_ADDRESS are variables):

<config>
    <TCP_NIO bind_addr=BIND_ADDRESS
             tcp_nodelay="true"
             recv_buf_size="20000000"
             send_buf_size="640000"
             loopback="true"
             discard_incompatible_packets="true"
             max_bundle_size="64000"
             max_bundle_timeout="30"
             use_incoming_packet_handler="true"
             use_outgoing_packet_handler="false"
             down_thread="false" up_thread="false"
             enable_bundling="true"
             start_port="7800"
             use_send_queues="false"
             sock_conn_timeout="300"
             skip_suspected_members="true"
             reader_threads="3"
             writer_threads="3"
             processor_threads="3"
             processor_minThreads="3"
             processor_maxThreads="3"
             processor_queueSize="100"
             processor_keepAliveTime="-1"/>
    <MPING timeout="2000"
           num_initial_members="2"
           mcast_addr=MCAST_ADDRESS
           bind_addr=BIND_ADDRESS
           down_thread="false" up_thread="false"/>
    <MERGE2 max_interval="100000"
            down_thread="false"
            up_thread="false"
            min_interval="20000"/>
    <FD_SOCK down_thread="false"
             up_thread="false"/>
    <FD timeout="10000"
        max_tries="5"
        down_thread="false"
        up_thread="false"
        shun="true"/>
    <VERIFY_SUSPECT timeout="1500"
                    down_thread="false"
                    up_thread="false"/>
    <pbcast.NAKACK max_xmit_size="60000"
                   use_mcast_xmit="false"
                   gc_lag="0"
                   retransmit_timeout="300,600,1200,2400,4800"
                   down_thread="false"
                   up_thread="false"
                   discard_delivered_msgs="true"/>
    <pbcast.STABLE stability_delay="1000"
                   desired_avg_gossip="50000"
                   down_thread="false"
                   up_thread="false"
                   max_bytes="2000000"/>
    <pbcast.GMS print_local_addr="true"
                join_timeout="3000"
                down_thread="false" up_thread="false"
                join_retry_timeout="2000" shun="true"
                view_bundling="true"/>
    <FC max_credits="2000000"
        down_thread="false"
        up_thread="false"
        min_threshold="0.10"/>
    <FRAG2 frag_size="60000"
           down_thread="false"
           up_thread="false"/>
    <pbcast.STREAMING_STATE_TRANSFER down_thread="false"
                                     up_thread="false"
                                     use_flush="true"
                                     use_reading_thread="true"/>
    <!-- pbcast.STATE_TRANSFER down_thread="false"
         up_thread="false"
         use_flush="false"/ -->
    <pbcast.FLUSH down_thread="false"
                  up_thread="false"/>
</config>
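
For readers who want to reproduce the stack: the template above is not valid XML until BIND_ADDRESS and MCAST_ADDRESS are replaced with quoted values. How the substitution was actually done is not stated in this message. The following is only a minimal sketch of one way to do it; the class name, method name, and addresses are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class StackTemplate {
    /** Replaces each bare placeholder (attr=TOKEN) with a quoted attribute value. */
    static String fill(String template, Map<String, String> values) {
        String result = template;
        for (Map.Entry<String, String> e : values.entrySet()) {
            // The template writes bind_addr=BIND_ADDRESS without quotes,
            // so the quotes that XML requires are added here.
            result = result.replace("=" + e.getKey(), "=\"" + e.getValue() + "\"");
        }
        return result;
    }

    public static void main(String[] args) {
        String template = "<MPING mcast_addr=MCAST_ADDRESS bind_addr=BIND_ADDRESS/>";
        Map<String, String> values = new LinkedHashMap<>();
        values.put("BIND_ADDRESS", "192.168.0.10"); // hypothetical address
        values.put("MCAST_ADDRESS", "228.8.8.8");   // hypothetical address
        System.out.println(fill(template, values));
    }
}
```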

VM options used:
-Xms512M
-Xmx512M
-XX:+PrintClassHistogram
-XX:MaxPermSize=364M
-XX:PermSize=364M
-XX:MaxNewSize=348m
-XX:NewSize=348m
-XX:SurvivorRatio=128
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:MaxTenuringThreshold=0
-XX:CMSInitiatingOccupancyFraction=60
-XX:+DisableExplicitGC
-verbose:gc
-Xloggc:/project/comfw/perf_test_output/oks_1/gc.log
-XX:+PrintGCTimeStamps
-XX:+PrintGCDetails
-XX:+PrintTenuringDistribution
-XX:+PrintGCApplicationConcurrentTime
-XX:+PrintGCApplicationStoppedTime
-server

-Dlog4j.configuration=file:///home/comfw/public/perf_tests/config/log4j.properties

-Dresolve.dns=false
-Dbind.address=BIND_ADDRESS
-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=11138
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false
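
As a rough reading aid for the options above: with a 512 MB heap and a fixed 348 MB young generation, about 164 MB is left for the old generation, and -XX:MaxTenuringThreshold=0 means objects that survive a young collection are promoted straight into it. The sketch below only redoes that arithmetic. The class and method names are hypothetical, the real JVM aligns these sizes so actual values differ slightly, and CMSInitiatingOccupancyFraction acts only as a hint unless -XX:+UseCMSInitiatingOccupancyOnly is also set.

```java
class HeapLayout {
    static final long MB = 1024L * 1024L;

    /** Old generation size: total heap minus the fixed young generation. */
    static long oldGenBytes(long heapBytes, long youngBytes) {
        return heapBytes - youngBytes;
    }

    /** One survivor space: young = eden + 2 * survivor, with eden = ratio * survivor. */
    static long survivorBytes(long youngBytes, int survivorRatio) {
        return youngBytes / (survivorRatio + 2);
    }

    /** Old-gen occupancy at which CMS is asked to start a concurrent cycle. */
    static long cmsTriggerBytes(long oldGenBytes, int occupancyPercent) {
        return oldGenBytes * occupancyPercent / 100;
    }

    public static void main(String[] args) {
        long heap = 512 * MB;  // -Xms512M -Xmx512M
        long young = 348 * MB; // -XX:NewSize=348m -XX:MaxNewSize=348m
        long old = oldGenBytes(heap, young);
        System.out.printf("old gen:     %d MB%n", old / MB);
        System.out.printf("survivor:    %d KB each%n", survivorBytes(young, 128) / 1024); // -XX:SurvivorRatio=128
        System.out.printf("CMS trigger: %d MB%n", cmsTriggerBytes(old, 60) / MB);         // -XX:CMSInitiatingOccupancyFraction=60
    }
}
```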

Perf test configuration file:
transport=org.jgroups.tests.perf.transports.JGroupsTransport
num_msgs=2000000000
msg_size=256
num_members=3
num_senders=3
log_interval=1000000
processing_delay=0
dump_transport_stats=false
jmx=false
start_port=7800

Here are the class histograms obtained by running kill -QUIT on all members;
they may help to find where the leak is.

oks: 3645.109
num #instances #bytes class name
--------------------------------------
2: 52861 2960216 EDU.oswego.cs.dl.util.concurrent.ConcurrentReaderHashMap
3: 52861 2548240 [LEDU.oswego.cs.dl.util.concurrent.ConcurrentReaderHashMap$Entry;
4: 105766 2538384 EDU.oswego.cs.dl.util.concurrent.ConcurrentReaderHashMap$Entry
6: 52837 1690784 org.jgroups.Message
9: 45048 1441536 org.jgroups.protocols.pbcast.NakAckHeader
17: 30157 482512 EDU.oswego.cs.dl.util.concurrent.ConcurrentReaderHashMap$EntrySet
18: 52861 422888 EDU.oswego.cs.dl.util.concurrent.ConcurrentReaderHashMap$BarrierLock
19: 22316 357056 org.jgroups.util.Queue$Element
20: 14909 238544 org.jgroups.protocols.TpHeader

oks: 22128.299
num #instances #bytes class name
--------------------------------------
3: 30619 1714664 EDU.oswego.cs.dl.util.concurrent.ConcurrentReaderHashMap
4: 61275 1470600 EDU.oswego.cs.dl.util.concurrent.ConcurrentReaderHashMap$Entry
6: 30619 1374544 [LEDU.oswego.cs.dl.util.concurrent.ConcurrentReaderHashMap$Entry;
9: 30597 979104 org.jgroups.Message
10: 30549 977568 org.jgroups.protocols.pbcast.NakAckHeader
17: 24510 392160 EDU.oswego.cs.dl.util.concurrent.ConcurrentReaderHashMap$EntrySet
18: 30619 244952 EDU.oswego.cs.dl.util.concurrent.ConcurrentReaderHashMap$BarrierLock
24: 6074 97184 org.jgroups.util.Queue$Element
25: 6058 96928 org.jgroups.protocols.TpHeader

oks: 138294.544:
num #instances #bytes class name
--------------------------------------
6: 12830 718480 EDU.oswego.cs.dl.util.concurrent.ConcurrentReaderHashMap
9: 25677 616248 EDU.oswego.cs.dl.util.concurrent.ConcurrentReaderHashMap$Entry
10: 12830 616096 [LEDU.oswego.cs.dl.util.concurrent.ConcurrentReaderHashMap$Entry;
12: 12870 411840 java.util.TreeMap$Entry
13: 12818 410176 org.jgroups.Message
14: 12817 410144 org.jgroups.protocols.pbcast.NakAckHeader
17: 12824 205184 EDU.oswego.cs.dl.util.concurrent.ConcurrentReaderHashMap$EntrySet
23: 12830 102640 EDU.oswego.cs.dl.util.concurrent.ConcurrentReaderHashMap$BarrierLock


vezer (coordinator) 3603.415
num #instances #bytes class name
--------------------------------------
4: 19099 1069544 EDU.oswego.cs.dl.util.concurrent.ConcurrentReaderHashMap
6: 38225 917400 EDU.oswego.cs.dl.util.concurrent.ConcurrentReaderHashMap$Entry
7: 19099 916976 [LEDU.oswego.cs.dl.util.concurrent.ConcurrentReaderHashMap$Entry;
11: 19132 612224 java.util.TreeMap$Entry
12: 19082 610624 org.jgroups.Message
13: 19081 610592 org.jgroups.protocols.pbcast.NakAckHeader
16: 19095 305520 EDU.oswego.cs.dl.util.concurrent.ConcurrentReaderHashMap$EntrySet
22: 19099 152792 EDU.oswego.cs.dl.util.concurrent.ConcurrentReaderHashMap$BarrierLock

vezer: 22118.898
num #instances #bytes class name
--------------------------------------
3: 30924 1731744 EDU.oswego.cs.dl.util.concurrent.ConcurrentReaderHashMap
4: 61879 1485096 EDU.oswego.cs.dl.util.concurrent.ConcurrentReaderHashMap$Entry
5: 30924 1480560 [LEDU.oswego.cs.dl.util.concurrent.ConcurrentReaderHashMap$Entry;
10: 30906 988992 org.jgroups.Message
11: 27871 891872 org.jgroups.protocols.pbcast.NakAckHeader
12: 21605 691360 java.util.TreeMap$Entry
17: 21566 345056 EDU.oswego.cs.dl.util.concurrent.ConcurrentReaderHashMap$EntrySet
18: 30924 247392 EDU.oswego.cs.dl.util.concurrent.ConcurrentReaderHashMap$BarrierLock
23: 9265 148240 org.jgroups.util.Queue$Element
25: 6319 101104 org.jgroups.protocols.TpHeader
29: 701 37320 [Ljava.lang.Object;

oster: 3672.489
num #instances #bytes class name
--------------------------------------
4: 24071 1347976 EDU.oswego.cs.dl.util.concurrent.ConcurrentReaderHashMap
6: 24071 1160944 [LEDU.oswego.cs.dl.util.concurrent.ConcurrentReaderHashMap$Entry;
7: 48166 1155984 EDU.oswego.cs.dl.util.concurrent.ConcurrentReaderHashMap$Entry
9: 24056 769792 org.jgroups.Message
10: 23199 742368 org.jgroups.protocols.pbcast.NakAckHeader
11: 21871 699872 java.util.TreeMap$Entry
16: 21831 349296 EDU.oswego.cs.dl.util.concurrent.ConcurrentReaderHashMap$EntrySet
19: 24071 192568 EDU.oswego.cs.dl.util.concurrent.ConcurrentReaderHashMap$BarrierLock
27: 2124 33984 org.jgroups.util.Queue$Element
29: 370 29600 java.lang.reflect.Method
30: 1381 22096 org.jgroups.protocols.TpHeader


oster: 22129.201
num #instances #bytes class name
--------------------------------------
2: 35135 1967560 EDU.oswego.cs.dl.util.concurrent.ConcurrentReaderHashMap
3: 70303 1687272 EDU.oswego.cs.dl.util.concurrent.ConcurrentReaderHashMap$Entry
4: 35135 1686704 [LEDU.oswego.cs.dl.util.concurrent.ConcurrentReaderHashMap$Entry;
8: 35115 1123680 org.jgroups.Message
9: 35114 1123648 org.jgroups.protocols.pbcast.NakAckHeader
15: 35131 562096 EDU.oswego.cs.dl.util.concurrent.ConcurrentReaderHashMap$EntrySet
17: 35135 281080 EDU.oswego.cs.dl.util.concurrent.ConcurrentReaderHashMap$BarrierLock

oster: 138499.082:
num #instances #bytes class name
--------------------------------------
1: 20093 2344256 <no name>
2: 20093 1451128 <methodKlass>
3: 32338 1223552 <symbolKlass>
4: 1826 1001344 <constantPoolKlass>
5: 1826 666120 <instanceKlassKlass>
6: 1701 663264 <constantPoolCacheKlass>
7: 2129 661616 [B
8: 7457 497712 [C
9: 950 237072 <methodDataKlass>
10: 7893 189432 java.lang.String
11: 2008 176704 java.lang.Class
12: 2391 172504 [I
13: 2649 172320 [S
14: 2887 133400 [[I
15: 474 50000 [Ljava.util.HashMap$Entry;
16: 1881 45144 java.util.HashMap$Entry
17: 153 44064 <objArrayKlassKlass>
18: 607 32192 [Ljava.lang.Object;
19: 370 29600 java.lang.reflect.Method
20: 323 20672 java.lang.reflect.Constructor
21: 784 18816 java.util.Hashtable$Entry
22: 458 18320 java.util.HashMap
23: 521 16672 java.lang.ref.SoftReference
24: 412 13184 java.util.LinkedHashMap$Entry
25: 151 12344 [Ljava.util.Hashtable$Entry;
26: 580 9280 java.lang.Integer
27: 369 6720 [Ljava.lang.Class;
28: 335 5360 java.lang.Long
29: 123 4920 java.util.Hashtable
30: 49 4704 java.io.ObjectStreamClass
31: 182 4368 java.util.Vector
32: 119 4064 [Ljava.lang.String;
33: 100 4000 sun.misc.SoftCache$ValueCell
34: 159 3816 java.util.ArrayList
35: 153 3672 java.security.Provider$ServiceKey
36: 149 3576 javax.management.MBeanAttributeInfo
37: 142 3408 sun.management.MXBeanSupport$AttributeMethod
38: 58 3248 sun.reflect.DelegatingClassLoader
39: 77 3080 java.math.BigInteger
40: 55 3080 java.net.URL
41: 32 3072 java.lang.Thread
42: 122 2928 sun.security.util.ObjectIdentifier
43: 36 2880 [Ljava.lang.ThreadLocal$ThreadLocalMap$Entry;
44: 169 2704 java.util.jar.Attributes$Name
45: 20 2576 [J
46: 52 2496 java.security.Provider$Service
47: 73 2336 java.lang.ref.Finalizer
48: 8 2304 <typeArrayKlassKlass>
49: 94 2256 org.jgroups.util.List$Element
50: 92 2208 sun.reflect.NativeConstructorAccessorImpl
51: 129 2064 java.util.jar.Attributes
52: 31 1984 sun.security.pkcs11.SunPKCS11$P11Service
53: 49 1960 java.util.WeakHashMap$Entry
54: 241 1928 java.lang.Object
55: 59 1888 java.io.ObjectStreamField
56: 116 1856 <compiledICHolderKlass>
57: 57 1824 java.util.TreeMap$Entry
58: 31 1736 java.nio.DirectByteBuffer
59: 24 1728 java.lang.reflect.Field
60: 70 1680 java.lang.ref.WeakReference
61: 104 1664 sun.reflect.DelegatingConstructorAccessorImpl
62: 17 1632 java.util.jar.JarFile$JarFileEntry
63: 10 1592 [Z
64: 49 1568 java.lang.ThreadLocal$ThreadLocalMap$Entry
65: 13 1560 java.net.SocksSocketImpl
66: 24 1536 java.util.jar.JarFile
67: 48 1536 java.util.concurrent.ConcurrentHashMap$Segment
68: 60 1440 sun.reflect.generics.tree.SimpleClassTypeSignature
69: 90 1440 java.util.HashSet
70: 35 1400 org.apache.log4j.Logger
71: 57 1368 EDU.oswego.cs.dl.util.concurrent.ConcurrentReaderHashMap$Entry
72: 42 1344 sun.security.pkcs11.SunPKCS11$Descriptor

best regards
gianluca

--
Gianluca Puggelli
skype:pugg1138

