
Re: Compacting more than the actual used space

Hi Alexander,

Thanks. Using the compression ratio, the sizes check out.

Regarding the new values for compaction throughput, that explains it then. We are using 2.1. :-)

Pedro Gordo

On Mon, 5 Nov 2018 at 19:53, Alexander Dejanovski <alex@xxxxxxxxxxxxxxxxx> wrote:
You can check cfstats to see what the compression ratio is.
The values you're reporting are entirely possible: a compression ratio of 0.2 is quite common, depending on the data you're storing (the compressed size is then 20% of the original data).
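As a quick sanity check, the ratio implied by the two figures in this thread can be computed directly (a minimal sketch; the figures come from the messages below, and "Compression ratio" is the field reported by nodetool cfstats):

```python
# Figures from this thread: ~2.5 TB being compacted vs. a 1.09 TB
# load reported by "nodetool status" (which shows compressed size).
uncompressed_tb = 2.5
reported_load_tb = 1.09

# If the whole difference were due to compression alone, the implied
# ratio would be compressed / uncompressed.
implied_ratio = reported_load_tb / uncompressed_tb
print(f"implied compression ratio: {implied_ratio:.2f}")  # prints 0.44
```

A ratio around 0.44 is well within the range commonly reported by cfstats, so compression alone can plausibly account for the gap.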

Compaction throughput changes are taken into account for already-running compactions starting with Cassandra 2.2, if I'm correct. In that case, your compaction could be bound by CPU rather than I/O.
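For reference, the throttling change discussed here is made with nodetool (a sketch, assuming nodetool is on the PATH and pointed at the node in question; getcompactionthroughput is only available in newer versions):

```shell
# Remove compaction throttling entirely (0 = unlimited, in MB/s).
# On versions before 2.2 this only affects compactions that start
# afterwards, not ones already in progress.
nodetool setcompactionthroughput 0

# Verify the current setting.
nodetool getcompactionthroughput
```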


On Mon, 5 Nov 2018 at 20:41, Pedro Gordo <pedro.gordo1986@xxxxxxxxx> wrote:

We have an ongoing compaction for roughly 2.5 TB, but "nodetool status" reports a load of 1.09 TB. Even taking into account that the load shown by "nodetool status" is the compressed size, I very much doubt that compression would reduce 2.5 TB to 1.09 TB.
Note also that even though this is the biggest table, there are other tables in the system, so the 1.09 TB reported is not just for the table being compacted.

What could lead to results like this? We have 4 attached volumes for data directories. Could this be a likely cause of such a discrepancy?

Bonus question: changing the compaction throughput to 0 (removing the throttling) had no impact on the current compaction. Do new compaction throughput values only come into effect when a new compaction kicks in?


Pedro Gordo
Alexander Dejanovski

Apache Cassandra Consulting