Fwd: Re: How to gracefully decommission a highly loaded node?


After a few hours I just removed the node. I then decommissioned another node, which finished successfully (the writer app was down, so there was no pressure on the cluster).
I started a third decommission. Since I didn't have time to wait for it to finish, I started the writer application once most of the decommissioning node's streaming was done and only a few GB remained to be streamed to two other nodes.
After 12 hours I checked the decommissioning node, and netstats says: LEAVING, Restore Replica Count....!
So I just ran removenode on this one too.
Is there something wrong with decommissioning while someone is writing to the cluster?
Using Apache Cassandra 3.11.2

Sent using Zoho Mail



============ Forwarded message ============
From : onmstester onmstester <onmstester@xxxxxxxx.INVALID>
To : "user"<user@xxxxxxxxxxxxxxxxxxxx>
Date : Wed, 05 Dec 2018 09:00:34 +0330
Subject : Fwd: Re: How to gracefully decommission a highly loaded node?
============ Forwarded message ============

After a long time stuck in LEAVING and "not doing any streams", I killed the Cassandra process and restarted it, then ran nodetool decommission again (the DataStax recipe for a stuck decommission).
Now it says: LEAVING, "unbootstrap $(the node id)"

What's going on? Should I forget about decommission and just remove the node?

There is an issue to make decommission resumable:

but I couldn't figure out how this is supposed to work. I was expecting that after restarting the Cassandra node with the stuck decommission, it would resume the decommissioning process, but the node became UN after the restart.




============ Forwarded message ============
From : Simon Fontana Oscarsson <simon.fontana.oscarsson@xxxxxxxxxxxx>
Date : Tue, 04 Dec 2018 15:20:15 +0330
Subject : Re: How to gracefully decommission a highly loaded node?
============ Forwarded message ============





---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@xxxxxxxxxxxxxxxxxxxx
For additional commands, e-mail: user-help@xxxxxxxxxxxxxxxxxxxx
Hi,

If it is already using 100% CPU, I have a hard time seeing it being able to decommission while serving requests. If you have a lot of free space, I would first try nodetool disableautocompaction. If you still don't see any progress in nodetool netstats, you can also run disablebinary, disablethrift and disablehandoff to stop serving client requests.
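
The sequence above could be scripted roughly as follows. This is only a sketch: the NODETOOL and DRY_RUN variables and the run/quiesce_node helpers are made up for the example, and it defaults to printing the commands rather than running them. Only the nodetool subcommands themselves come from the suggestion above.

```shell
#!/usr/bin/env bash
# Sketch: take load off a node so decommission streaming can make progress.
# NODETOOL and DRY_RUN are hypothetical knobs introduced for this example.
NODETOOL="${NODETOOL:-nodetool}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = actually run them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

quiesce_node() {
    run "$NODETOOL" disableautocompaction   # stop automatic compactions
    run "$NODETOOL" disablebinary           # stop the native (CQL) transport
    run "$NODETOOL" disablethrift           # stop the Thrift server
    run "$NODETOOL" disablehandoff          # stop storing new hints on this node
}

quiesce_node
```

With DRY_RUN=0 the commands run against the local node; afterwards nodetool netstats can be checked again to see whether streaming is moving.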

--
SIMON FONTANA OSCARSSON
Software Developer

Ericsson
Ölandsgatan 1
37133 Karlskrona, Sweden
simon.fontana.oscarsson@xxxxxxxxxxxx
www.ericsson.com

On Tue, 2018-12-04 at 14:21 +0330, onmstester onmstester wrote:

One node suddenly started using 100% CPU. I suspect a hardware problem and do not have time to trace it, so I decided to just remove the node from the cluster. Although the node's state changed to UL, there is no sign of it leaving: the node is still compacting, flushing memtables and writing mutations, and CPU has been at 100% for hours.
Is there any way to force a Cassandra node to just decommission and stop doing its normal work?
Because writes use CL=ONE, I cannot use removenode and shut down the node.

Best Regards




