Re: Artemis 2.5.0 - Colocated scaledown cluster issues


On Wed, May 2, 2018 at 3:01 AM, Ilkka Virolainen
<Ilkka.Virolainen@xxxxxxxxxx> wrote:
> Hello,
>
> In addition to some previous issues [1], I am having problems with my Artemis cluster. My setup [2] is a symmetric two-node cluster of colocated instances with scaledown. Besides a node restart causing a problematic state in replication [1], there are other issues, namely:
>
> 1) After running for approximately two weeks, one of the nodes crashed due to heap space exhaustion. Heap dump analysis indicates that the cluster connection failed and millions of messages ended up in the internal store-and-forward queue, eventually causing an OOM exception. I guess the internal messages are not paged?

You can configure it to page.
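
A minimal sketch of what that could look like in broker.xml. The
catch-all match "#" and the size values are illustrative assumptions,
not taken from the setup in [2], and whether a catch-all setting also
covers the internal store-and-forward address is worth verifying
against your version:

```xml
<address-settings>
   <!-- "#" matches every address; narrow the match to page only some addresses -->
   <address-setting match="#">
      <!-- illustrative threshold: start paging once an address holds about 100 MiB in memory -->
      <max-size-bytes>104857600</max-size-bytes>
      <!-- illustrative size of each page file on disk (10 MiB) -->
      <page-size-bytes>10485760</page-size-bytes>
      <!-- page to disk when the threshold is reached, rather than BLOCK, DROP or FAIL -->
      <address-full-policy>PAGE</address-full-policy>
   </address-setting>
</address-settings>
```

Note that with max-size-bytes left at -1 there is no limit, so the
PAGE policy never kicks in and messages stay on the heap.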

Also, on the cluster connection you can configure the maximum number
of retries.
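
The retry limit mentioned here presumably corresponds to the
reconnect-attempts element of the cluster connection. A sketch, where
the names my-cluster, netty-connector and node2-connector are
hypothetical and the values are only illustrative:

```xml
<cluster-connections>
   <!-- "my-cluster" and both connector names are placeholders for your own -->
   <cluster-connection name="my-cluster">
      <connector-ref>netty-connector</connector-ref>
      <!-- wait 500 ms before the first retry -->
      <retry-interval>500</retry-interval>
      <!-- grow the wait by 1.5x after each failed attempt -->
      <retry-interval-multiplier>1.5</retry-interval-multiplier>
      <!-- never wait longer than 10 s between attempts -->
      <max-retry-interval>10000</max-retry-interval>
      <!-- -1 retries forever; a positive value gives up after that many attempts -->
      <reconnect-attempts>-1</reconnect-attempts>
      <message-load-balancing>ON_DEMAND</message-load-balancing>
      <max-hops>1</max-hops>
      <static-connectors>
         <connector-ref>node2-connector</connector-ref>
      </static-connectors>
   </cluster-connection>
</cluster-connections>
```

Whether unlimited or bounded retries is the better choice depends on
what should happen to messages queued for a node that stays
unreachable.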

I'm not talking about replication here. This is probably about
another node that is still connected.

>
> 2) I have now run the cluster for ~2 weeks and it has ended up in a state where messages are being redistributed from node 1 to node 2 BUT not the other way around. This may be the same issue as 1) but I cannot tell for sure. I tried setting the core server logging level to DEBUG on node 2 and sending messages to a test topic, but I get no references to the address name in the Artemis logs.

Check what I said above about reconnects on the cluster connection.



If you were using master, there is a way to consume messages from
the internal queue and resend them manually using a producer /
consumer. You would need to get a snapshot build from master.


>
> I realize that it's difficult to address these problems given the information at hand and the circumstances in which they occur: they (excluding the issue described in [1]) start to appear only after running a cluster for a long time, and there is no apparent cause or easy way to reproduce them. I would however appreciate it if anyone has tips for debugging this further or advice on where to look for a probable cause.
>
> - Ilkka
>
> [1] Backup voting issue: http://activemq.2283324.n4.nabble.com/Artemis-2-5-0-Problems-with-colocated-scaledown-td4737583.html#a4737808
> [2] Sample brokers: https://github.com/ilkkavi/activemq-artemis/tree/scaledown-issue/issues/IssueExample/src/main/resources/activemq



-- 
Clebert Suconic