I just saw this message still has no response.
Do we have any inbuilt features to log slow\resource heavy queries?
I think we don't have this in Cassandra yet. I also believe it's in DSE, if it's something you'd really need :).
I tried to check data in system_traces.sessions to check current running sessions but even that does not have any data.
This is expected. Those tables ('sessions' and 'events') are empty unless you run: `nodetool settraceprobability 0.001` or similar.
This tells the node to trace any given query with a probability of 0.1%.
Tracing is off by default and can do a lot of damage if set to a high value, as each traced query triggers additional writes in Cassandra. Increase the probability incrementally until you have enough data to be representative of the problems you're facing.
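A minimal sketch of that incremental approach, assuming `nodetool` and `cqlsh` are available on the node. The helper name `slow_sessions`, the 5 minute wait and the 100 ms threshold are hypothetical choices for illustration; `duration` in `system_traces.sessions` is expressed in microseconds.

```shell
#!/bin/sh
# Hypothetical helper: reads cqlsh's pipe-separated table output on stdin
# (columns: session_id | duration | request) and prints the sessions whose
# duration, in microseconds, exceeds the threshold passed as $1.
slow_sessions() {
    awk -F'|' -v min_us="$1" '
        # Keep only data rows: three columns, the second one a plain number.
        NF >= 3 && $2 ~ /^ *[0-9]+ *$/ && $2 + 0 > min_us + 0 {
            gsub(/^ +| +$/, "", $1)   # trim the session id
            gsub(/^ +| +$/, "", $3)   # trim the request text
            printf "%s %d %s\n", $1, $2, $3
        }'
}

# Usage on a node (commented out because it changes the node's behaviour):
#   nodetool settraceprobability 0.001     # trace roughly 0.1% of queries
#   sleep 300                              # let some traces accumulate
#   cqlsh -e "SELECT session_id, duration, request FROM system_traces.sessions LIMIT 500;" | slow_sessions 100000
#   nodetool settraceprobability 0         # switch tracing off again
```

Once a slow session stands out, its `session_id` can be looked up in `system_traces.events` to see where the time was spent.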
That being said, I very rarely need to trace queries to understand what's wrong with a cluster. Generally, I find a lot of useful information in one of the following when I'm facing issues:
- `nodetool tpstats` - Any pending / dropped requests?
- `grep -e "WARN" -e "ERROR" /var/log/cassandra/system.log` - Any error or warning that explains the issues you're seeing? Check here whether:
-- tombstones might be an issue
-- Large partitions are being compacted (how big?)
- `nodetool proxyhistograms` and `nodetool tablehistograms <keyspace> <table>` (`cfhistograms` on older versions) - Latencies, number of SSTables touched per read, and more.
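The first two checks above can be roughly scripted as follows. This is only a sketch: the helper name `tpstats_problems` is hypothetical, and it assumes the usual `nodetool tpstats` layout (a "Pool Name" section with Pending in the third column, then a "Message type" section with Dropped in the second), which may vary between versions.

```shell
#!/bin/sh
# Hypothetical helper: reads `nodetool tpstats` output on stdin and prints
# only the thread pools with pending tasks and the message types with drops.
tpstats_problems() {
    awk '
        /^Pool Name/    { section = "pools";   next }
        /^Message type/ { section = "dropped"; next }
        # Pool rows: name, active, pending, completed, blocked, all time blocked.
        section == "pools"   && NF >= 6 && $3 + 0 > 0 { print "pending " $1 " " $3 }
        # Dropped rows: message type, dropped count (extra columns ignored).
        section == "dropped" && NF >= 2 && $2 + 0 > 0 { print "dropped " $1 " " $2 }
    '
}

# Usage on a node:
#   nodetool tpstats | tpstats_problems
#   grep -e "WARN" -e "ERROR" /var/log/cassandra/system.log \
#       | grep -i -e "tombstone" -e "large partition"
```

An empty result from `tpstats_problems` means nothing is currently pending or dropped; the second pipeline narrows the log warnings down to the tombstone and large partition messages mentioned in the list.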
I'm asking this because I can see 2 nodes in my cluster going Out Of Memory multiple times but we're not able to find the reason for it. I'm suspecting that it is a heavy read query but can't find any logs for it.
Ah. Is the Out Of Memory happening inside the JVM heap or in native (off-heap) memory?
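One rough way to tell the two apart is to look at what was logged when the node died. The sketch below assumes that a heap exhaustion leaves a `java.lang.OutOfMemoryError: Java heap space` line in the Cassandra log, and that the Linux kernel OOM killer (the machine as a whole running out of memory) leaves an "Out of memory: Kill..." or "oom-killer" line in the kernel log. The helper name `classify_oom` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads log lines on stdin and reports which kind of
# out-of-memory evidence they contain.
classify_oom() {
    awk '
        /java\.lang\.OutOfMemoryError: Java heap space/ { heap = 1 }
        /Out of memory: Kill|oom-killer/                { kernel = 1 }
        END {
            if (heap && kernel) print "both"
            else if (heap)      print "heap"
            else if (kernel)    print "kernel-oom-killer"
            else                print "unknown"
        }'
}

# Usage on a node:
#   classify_oom < /var/log/cassandra/system.log
#   dmesg | classify_oom
```

"unknown" only means neither pattern was found in the input; other `OutOfMemoryError` variants (for example direct buffer memory) are deliberately not classified here.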
If it's a heap issue, tuning GC is probably the way to go and this information might help:
(all come with distinct consequences/tradeoffs of course)
- Reduce the heap size?
- Reduce the sizes of indexes by increasing the `min/max_index_interval` of the biggest tables?
- Use more memory?
- Increase `bloom_filter_fp_chance` (a higher false-positive chance means smaller bloom filters)?
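For the table-level options in this list, a change would look roughly like the sketch below. The helper name `alter_memory_options`, the table name and the values are all hypothetical; as far as I know the defaults are 128 / 2048 for `min_index_interval` / `max_index_interval`, and 0.01 for `bloom_filter_fp_chance` with size-tiered compaction. Higher index intervals give smaller index summaries, and a higher false-positive chance gives smaller bloom filters, in both cases at some cost in read efficiency.

```shell
#!/bin/sh
# Hypothetical helper: prints the CQL statement only, so it can be reviewed
# before being run.
#   $1 = keyspace.table          $2 = min_index_interval
#   $3 = max_index_interval      $4 = bloom_filter_fp_chance
alter_memory_options() {
    printf 'ALTER TABLE %s WITH min_index_interval = %s AND max_index_interval = %s AND bloom_filter_fp_chance = %s;\n' \
        "$1" "$2" "$3" "$4"
}

# Usage (my_ks.big_table and the values are placeholders):
#   alter_memory_options my_ks.big_table 256 4096 0.1
#   cqlsh -e "$(alter_memory_options my_ks.big_table 256 4096 0.1)"
```

Existing SSTables keep their current bloom filters until they are rewritten (by compaction or `nodetool upgradesstables -a`), so the memory gain is not immediate.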
Finally, I'm a big fan of monitoring. With a proper monitoring system AND dashboards in place, you would probably see what's wrong at first sight. Understanding or fixing it can still take some time, but monitoring makes it really easy to see that 'something' is wrong, such as a spike in some chart, and to start digging from there. Many providers now offer out-of-the-box dashboards for Cassandra (I worked on the Datadog ones, but other tools such as Sematext SPM have them too). On the open source side, Criteo released Cassandra monitoring tooling that works on top of Prometheus. People also use Grafana/Graphite and other standard tools. You might find nice dashboards there too.
I hope this will help (if it's not too late :)).
France / Spain
The Last Pickle - Apache Cassandra Consulting