[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Artemis 2.5.0 - Replication Failure During Compacting


Hello,

I have a replicating master-slave -pair of Artemis 2.5.0. It seems to mainly work as it should, but it started logging replication failures after having run for a week. I have included my broker configuration in [1]. There are also exceptions on the client side after the replication problems surfaced [2].  Why would replication suddenly start failing without an apparent reason? So far I have encountered issues with every HA strategy I have tried.

I started both nodes at 2018-05-08 and yesterday, 2018-05-14 node a, the master, logs (multiple times):

09:08:47,318 WARN  [org.apache.activemq.artemis.core.server] Error during message dispatch: java.lang.RuntimeException: ActiveMQIllegalStateException[errorType=ILLEGAL_STATE message=File 126743097.msg has a null channel]
                             at org.apache.activemq.artemis.core.persistence.impl.journal.LargeServerMessageImpl.getReadOnlyBodyBuffer(LargeServerMessageImpl.java:216) [artemis-server-2.5.0.jar:2.5.0]
                             at org.apache.activemq.artemis.core.protocol.openwire.OpenWireMessageConverter.toAMQMessage(OpenWireMessageConverter.java:516) [artemis-openwire-protocol-2.5.0.jar:]
                             at org.apache.activemq.artemis.core.protocol.openwire.OpenWireMessageConverter.createMessageDispatch(OpenWireMessageConverter.java:492) [artemis-openwire-protocol-2.5.0.jar:]
                             at org.apache.activemq.artemis.core.protocol.openwire.amq.AMQConsumer.handleDeliver(AMQConsumer.java:227) [artemis-openwire-protocol-2.5.0.jar:]
                             at org.apache.activemq.artemis.core.protocol.openwire.amq.AMQSession.sendMessage(AMQSession.java:312) [artemis-openwire-protocol-2.5.0.jar:]
                             at org.apache.activemq.artemis.core.server.impl.ServerConsumerImpl.deliverStandardMessage(ServerConsumerImpl.java:1099) [artemis-server-2.5.0.jar:2.5.0]
                             at org.apache.activemq.artemis.core.server.impl.ServerConsumerImpl.proceedDeliver(ServerConsumerImpl.java:462) [artemis-server-2.5.0.jar:2.5.0]
                             at org.apache.activemq.artemis.core.server.impl.QueueImpl.proceedDeliver(QueueImpl.java:2900) [artemis-server-2.5.0.jar:2.5.0]
                             at org.apache.activemq.artemis.core.server.impl.QueueImpl.deliver(QueueImpl.java:2374) [artemis-server-2.5.0.jar:2.5.0]
                             at org.apache.activemq.artemis.core.server.impl.QueueImpl.access$2000(QueueImpl.java:105) [artemis-server-2.5.0.jar:2.5.0]
                             at org.apache.activemq.artemis.core.server.impl.QueueImpl$DeliverRunner.run(QueueImpl.java:3173) [artemis-server-2.5.0.jar:2.5.0]
                             at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42) [artemis-commons-2.5.0.jar:2.5.0]
                             at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31) [artemis-commons-2.5.0.jar:2.5.0]
                             at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:66) [artemis-commons-2.5.0.jar:2.5.0]
                             at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [rt.jar:1.8.0_162]
                             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [rt.jar:1.8.0_162]
                             at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118) [artemis-commons-2.5.0.jar:2.5.0]
Caused by: ActiveMQIllegalStateException[errorType=ILLEGAL_STATE message=File 126743097.msg has a null channel]
                             at org.apache.activemq.artemis.core.io.nio.NIOSequentialFile.read(NIOSequentialFile.java:163) [artemis-journal-2.5.0.jar:2.5.0]
                             at org.apache.activemq.artemis.core.io.nio.NIOSequentialFile.read(NIOSequentialFile.java:155) [artemis-journal-2.5.0.jar:2.5.0]
                             at org.apache.activemq.artemis.core.persistence.impl.journal.LargeServerMessageImpl.getReadOnlyBodyBuffer(LargeServerMessageImpl.java:213) [artemis-server-2.5.0.jar:2.5.0]
                             ... 16 more


And node b. the slave, logs (multiple times):

14:07:43,549 WARN  [org.apache.activemq.artemis.journal] AMQ142011: Error on reading compacting for JournalFileImpl: (activemq-data-115.amq id = 141, recordID = 141)
14:07:43,572 ERROR [org.apache.activemq.artemis.journal] AMQ144003: Error compacting: java.lang.Exception: Error on reading compacting for JournalFileImpl: (activemq-data-115.amq id = 141, recordID = 141)
                             at org.apache.activemq.artemis.core.journal.impl.JournalImpl.compact(JournalImpl.java:1569) [artemis-journal-2.5.0.jar:2.5.0]
                             at org.apache.activemq.artemis.core.journal.impl.JournalImpl$14.run(JournalImpl.java:2111) [artemis-journal-2.5.0.jar:2.5.0]
                             at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42) [artemis-commons-2.5.0.jar:2.5.0]
                             at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31) [artemis-commons-2.5.0.jar:2.5.0]
                             at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:66) [artemis-commons-2.5.0.jar:2.5.0]
                             at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [rt.jar:1.8.0_152]
                             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [rt.jar:1.8.0_152]
                             at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118) [artemis-commons-2.5.0.jar:2.5.0]
Caused by: java.lang.Exception: Inconsistency during compacting: RollbackRecord ID = 126646250 for an already rolled back transaction during compacting
                             at org.apache.activemq.artemis.core.journal.impl.JournalImpl.readJournalFile(JournalImpl.java:761) [artemis-journal-2.5.0.jar:2.5.0]
                             at org.apache.activemq.artemis.core.journal.impl.JournalImpl.compact(JournalImpl.java:1566) [artemis-journal-2.5.0.jar:2.5.0]
                             ... 7 more
Caused by: java.lang.IllegalStateException: Inconsistency during compacting: RollbackRecord ID = 126646250 for an already rolled back transaction during compacting
                             at org.apache.activemq.artemis.core.journal.impl.JournalCompactor.onReadRollbackRecord(JournalCompactor.java:349) [artemis-journal-2.5.0.jar:2.5.0]
                             at org.apache.activemq.artemis.core.journal.impl.JournalImpl.readJournalFile(JournalImpl.java:733) [artemis-journal-2.5.0.jar:2.5.0]
                             ... 8 more

Then when I restart node b, the master is unable to start replication at first:

08:58:28,580 WARN  [org.apache.activemq.artemis.journal] AMQ142011: Error on reading compacting for JournalFileImpl: (activemq-data-155.amq id = 155, recordID = 155)
08:58:28,607 ERROR [org.apache.activemq.artemis.journal] AMQ144003: Error compacting: java.lang.Exception: Error on reading compacting for JournalFileImpl: (activemq-data-155.amq id = 155, recordID = 155)
                             at org.apache.activemq.artemis.core.journal.impl.JournalImpl.compact(JournalImpl.java:1569) [artemis-journal-2.5.0.jar:2.5.0]
                             at org.apache.activemq.artemis.core.journal.impl.JournalImpl$12.run(JournalImpl.java:1466) [artemis-journal-2.5.0.jar:2.5.0]
                             at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42) [artemis-commons-2.5.0.jar:2.5.0]
                             at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31) [artemis-commons-2.5.0.jar:2.5.0]
                             at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:66) [artemis-commons-2.5.0.jar:2.5.0]
                             at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [rt.jar:1.8.0_162]
                             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [rt.jar:1.8.0_162]
                             at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118) [artemis-commons-2.5.0.jar:2.5.0]
Caused by: java.lang.Exception: Inconsistency during compacting: RollbackRecord ID = 133826263 for an already rolled back transaction during compacting
                             at org.apache.activemq.artemis.core.journal.impl.JournalImpl.readJournalFile(JournalImpl.java:761) [artemis-journal-2.5.0.jar:2.5.0]
                             at org.apache.activemq.artemis.core.journal.impl.JournalImpl.compact(JournalImpl.java:1566) [artemis-journal-2.5.0.jar:2.5.0]
                             ... 7 more
Caused by: java.lang.IllegalStateException: Inconsistency during compacting: RollbackRecord ID = 133826263 for an already rolled back transaction during compacting
                             at org.apache.activemq.artemis.core.journal.impl.JournalCompactor.onReadRollbackRecord(JournalCompactor.java:349) [artemis-journal-2.5.0.jar:2.5.0]
                             at org.apache.activemq.artemis.core.journal.impl.JournalImpl.readJournalFile(JournalImpl.java:733) [artemis-journal-2.5.0.jar:2.5.0]
                             ... 8 more

08:58:28,608 WARN  [org.apache.activemq.artemis.core.server] AMQ222013: Error when trying to start replication: java.lang.RuntimeException: Error during compact, look at the logs
                             at org.apache.activemq.artemis.core.journal.impl.JournalImpl.scheduleCompactAndBlock(JournalImpl.java:1481) [artemis-journal-2.5.0.jar:2.5.0]
                             at org.apache.activemq.artemis.core.persistence.impl.journal.JournalStorageManager.startReplication(JournalStorageManager.java:553) [artemis-server-2.5.0.jar:2.5.0]
                             at org.apache.activemq.artemis.core.server.impl.SharedNothingLiveActivation$2.run(SharedNothingLiveActivation.java:178) [artemis-server-2.5.0.jar:2.5.0]
                             at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_162]

The replication continues normally after I restart node b a second time.

[1] Sample broker configurations https://github.com/ilkkavi/activemq-artemis/blob/scaledown-issue/issues/IssueExample/src/main/resources/replication/
[2] Sample of client side errors (some details omitted):

23:59:23.939 [container-245] WARN  com.atomikos.jms.AbstractJmsProxy - Error delegating call to createQueue on JMS driver
javax.jms.JMSException: AMQ119014: Timed out after waiting 30,000 ms for response when sending packet 45
                             at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:399)
                             at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:318)
                             at org.apache.activemq.artemis.core.protocol.core.impl.ActiveMQSessionContext.queueQuery(ActiveMQSessionContext.java:254)
                             at org.apache.activemq.artemis.core.client.impl.ClientSessionImpl.queueQuery(ClientSessionImpl.java:344)
                             at org.apache.activemq.artemis.jms.client.ActiveMQSession.lookupQueue(ActiveMQSession.java:1064)
                             at org.apache.activemq.artemis.jms.client.ActiveMQSession.createQueue(ActiveMQSession.java:358)
...
Caused by: org.apache.activemq.artemis.api.core.ActiveMQConnectionTimedOutException: AMQ119014: Timed out after waiting 30,000 ms for response when sending packet 45
                             ... 78 common frames omitted
23:59:23.939 [container-245] ERROR o.s.j.l.a.MessageListenerAdapter - Listener execution failed
org.springframework.jms.listener.adapter.ListenerExecutionFailedException: Listener method 'handleMessage' threw exception; nested exception is org.springframework.jms.UncategorizedJmsException: Uncategorized exception occurred during JMS processing; nested exception is javax.jms.JMSException: AMQ119014: Timed out after waiting 30,000 ms for response when sending packet 45
                             at org.springframework.jms.listener.adapter.MessageListenerAdapter.invokeListenerMethod(MessageListenerAdapter.java:322)
                             at org.springframework.jms.listener.adapter.MessageListenerAdapter.onMessage(MessageListenerAdapter.java:243)
                             at java.lang.Thread.run(Thread.java:748)
Caused by: org.springframework.jms.UncategorizedJmsException: Uncategorized exception occurred during JMS processing; nested exception is javax.jms.JMSException: AMQ119014: Timed out after waiting 30,000 ms for response when sending packet 45

                             at org.springframework.jms.core.JmsTemplate.convertAndSend(JmsTemplate.java:658)
                             ...
                             ... 12 common frames omitted
Caused by: javax.jms.JMSException: AMQ119014: Timed out after waiting 30,000 ms for response when sending packet 45
                             at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:399)
                             at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:318)
                             at org.apache.activemq.artemis.core.protocol.core.impl.ActiveMQSessionContext.queueQuery(ActiveMQSessionContext.java:254)
                             at org.apache.activemq.artemis.core.client.impl.ClientSessionImpl.queueQuery(ClientSessionImpl.java:344)
                             at org.apache.activemq.artemis.jms.client.ActiveMQSession.lookupQueue(ActiveMQSession.java:1064)
                             at org.apache.activemq.artemis.jms.client.ActiveMQSession.createQueue(ActiveMQSession.java:358)
                             at sun.reflect.GeneratedMethodAccessor114.invoke(Unknown Source)
                             at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
                             at java.lang.reflect.Method.invoke(Method.java:498)
                             at com.atomikos.jms.AtomikosJmsXaSessionProxy.invoke(AtomikosJmsXaSessionProxy.java:174)
                             at com.sun.proxy.$Proxy185.createQueue(Unknown Source)
                             at org.springframework.jms.support.destination.DynamicDestinationResolver.resolveQueue(DynamicDestinationResolver.java:84)
                             at org.springframework.jms.support.destination.DynamicDestinationResolver.resolveDestinationName(DynamicDestinationResolver.java:58)
                             at org.springframework.jms.support.destination.JmsDestinationAccessor.resolveDestinationName(JmsDestinationAccessor.java:114)
                             at org.springframework.jms.core.JmsTemplate.access$200(JmsTemplate.java:90)
                             at org.springframework.jms.core.JmsTemplate$4.doInJms(JmsTemplate.java:573)
                             at org.springframework.jms.core.JmsTemplate.execute(JmsTemplate.java:484)
                             ... 61 common frames omitted
Caused by: org.apache.activemq.artemis.api.core.ActiveMQConnectionTimedOutException: AMQ119014: Timed out after waiting 30,000 ms for response when sending packet 45
                             ... 78 common frames omitted
23:59:58.166 [Atomikos:2596]   WARN  o.a.activemq.artemis.core.client - AMQ212052: Packet PACKET(SessionQueueQueryResponseMessage_V2)[type=-7, channelID=12, packetObject=SessionQueueQueryResponseMessage_V2, address=jms.queue.queue.name, name=queue.name, consumerCount=1, filterString=null, durable=true, exists=true, temporary=false, messageCount=0, autoCreationEnabled=true] was answered out of sequence due to a previous server timeout and it's being ignored
java.lang.Exception: trace
                             at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:384)
                             at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:318)
                             at org.apache.activemq.artemis.core.protocol.core.impl.ActiveMQSessionContext.xaEnd(ActiveMQSessionContext.java:383)
                             at org.apache.activemq.artemis.core.client.impl.ClientSessionImpl.end(ClientSessionImpl.java:1173)
                             at com.atomikos.datasource.xa.XAResourceTransaction.suspend(XAResourceTransaction.java:253)
                             at com.atomikos.datasource.xa.XAResourceTransaction.rollback(XAResourceTransaction.java:448)
                             at com.atomikos.icatch.imp.RollbackMessage.send(RollbackMessage.java:47)
                             at com.atomikos.icatch.imp.RollbackMessage.send(RollbackMessage.java:20)
                             at com.atomikos.icatch.imp.PropagationMessage.submit(PropagationMessage.java:67)
                             at com.atomikos.icatch.imp.Propagator$PropagatorThread.run(Propagator.java:63)
                             at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
                             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
                             at java.lang.Thread.run(Thread.java:748)