Since our organisation started, Cassandra has been a single point of failure (SPOF). Following a recent change in priorities, we changed our code base so that Cassandra is no longer a SPOF. As part of that work we built a kill switch into the application code (PHP). When activated, the kill switch ensures that no connection is made to Cassandra for any query.
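For readers unfamiliar with the pattern, here is a minimal sketch of a kill switch of this kind. It is written in Python purely for illustration (our production code is PHP), and every name in it (`CASSANDRA_KILL_SWITCH`, `run_query`, `connect`) is hypothetical rather than taken from our code base. The point it shows is that the connection factory is never invoked while the switch is on, so no connection is opened at all.

```python
import os


class CassandraDisabled(Exception):
    """Raised when the kill switch blocks a Cassandra call."""


def kill_switch_active(env=os.environ):
    # Hypothetical flag name; a real switch might live in a config
    # file or a feature-flag store instead of the environment.
    return env.get("CASSANDRA_KILL_SWITCH", "0") == "1"


def run_query(query, connect, env=os.environ):
    """Run `query` through `connect()` unless the kill switch is on.

    `connect` is a zero-argument factory returning an object with an
    `execute(query)` method. It is only called when the switch is off,
    so no connection is opened while the switch is active.
    """
    if kill_switch_active(env):
        raise CassandraDisabled("kill switch active; skipping Cassandra")
    session = connect()
    return session.execute(query)
```

Callers would catch `CassandraDisabled` and fall back to whatever path makes Cassandra optional for that request.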
While testing the kill switch we noticed a strange behaviour: CPU usage and load average drop from 400% CPU and a load of 14-20 (on a 16-core machine) to 20% CPU and a load of 2-3.
Even if the kill switch is activated for only 30 seconds, CPU usage drops from 400% to 20% and stays at 20% for at least 24 hours before it starts climbing back to 400%, where it then remains. This happens on all the nodes, not just a few.
Cassandra Version: 2.2.4
Number of Nodes: 8
AWS Instance Type: c4.4xlarge
Number of Open Files: 30k to 50k (depending on the number of auto-scaled PHP nodes)
I would be grateful for any explanation of this strange behaviour.
Thanks & Regards
SRE/SDE at Zomato