Seeing high kswapd usage means there's a lot of churn in the page cache. It doesn't mean you're using swap; it means the box is spending time evicting pages from the page cache to make room for what you're reading now.
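One quick way to confirm it's cache reclaim rather than swapping is to watch the kernel's reclaim counters (a sketch, assuming a reasonably modern Linux kernel; counter names can vary between versions):

```shell
# pgscan_kswapd/pgsteal_kswapd climbing while pswpin/pswpout stay flat
# means kswapd is evicting page cache, not pushing pages to swap
grep -E '^(pgscan_kswapd|pgscan_direct|pgsteal_kswapd|pswpin|pswpout)' /proc/vmstat
```

Run it twice a few seconds apart and diff the numbers; only the deltas matter, the absolute values are cumulative since boot.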
Thanks for your thoughts!
The machines don't have enough memory - they are way undersized for a production workload.
Well, they had been doing fine since around February this year. The issue appeared out of the blue.
Things that make it worse:
* high readahead (use 8 KB on SSDs)
* a large compression chunk length when reading small rows/partitions. Nobody specifies this, and the 64 KB default is awful. I almost always switch to 4-16 KB here, but on these boxes you're kind of screwed since you're already basically out of memory.
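For reference, both tweaks look roughly like this (a sketch: the device path and keyspace/table names are placeholders, and `chunk_length_in_kb` is the Cassandra 3.x spelling of the option):

```shell
# Readahead is specified in 512-byte sectors, so 16 sectors = 8 KB.
# /dev/nvme0n1 is a placeholder for your data device.
blockdev --setra 16 /dev/nvme0n1

# Shrink the compression chunk from the 64 KB default.
# ks.tbl is a hypothetical table name.
cqlsh -e "ALTER TABLE ks.tbl WITH compression = {'class': 'LZ4Compressor', 'chunk_length_in_kb': 4};"
```

Note that the ALTER only affects newly written SSTables; existing ones keep their chunk size until compacted or rewritten.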
That's interesting, even though from my understanding Cassandra mostly does sequential IO. But I'm not sure this is really relevant to the issue at hand, since the reading is happening on the root device.
What could it be reading from there? After the JVM has started up and the config file is parsed, I really don't see why it should read anything further. Or am I missing something?
To make it clear: normally the root EBS volume on these nodes does at most 10 reads per second. When the issue starts, reads per second jump to hundreds within a few minutes (sometimes there's a preceding period of slow build-up, but in the end it's really exponential).
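To pin down which process is hitting the root device, one option is to rank processes by cumulative disk reads from `/proc` (a sketch; `/proc/<pid>/io` for other users' processes is only readable as root, and `read_bytes` counts all block devices, not just the root volume):

```shell
# Rank processes by cumulative bytes read from disk.
for p in /proc/[0-9]*; do
  rb=$(awk '/^read_bytes/ {print $2}' "$p/io" 2>/dev/null)
  [ -n "$rb" ] && echo "$rb $(cat "$p/comm" 2>/dev/null)"
done | sort -rn | head
```

Taking two samples a minute apart and comparing the deltas shows who is actually generating the read spike; `iotop` or `pidstat -d` (from sysstat) give the same answer interactively.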
I'd never put Cassandra in production with less than 30 GB of RAM and 8 cores per box.
We had to tweak the heap size once we started running repairs, because the default heuristic aimed too low for us. Otherwise, as I've said, we've seen zero problems with our workload.
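For context, cassandra-env.sh auto-sizes the heap from system memory unless you pin it; overriding it explicitly looks roughly like this (a config sketch: the 16G figure is purely illustrative, not what we actually used, and the file location varies by install):

```shell
# conf/jvm.options (Cassandra 3.x) -- set min and max heap to the same
# value to avoid resizing pauses; 16G is an illustrative value
-Xms16G
-Xmx16G
```

Setting -Xms and -Xmx equal is the usual practice so the JVM never has to grow the heap under load.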