
CASSANDRA-13241 lower default chunk_length_in_kb


This is regarding https://issues.apache.org/jira/browse/CASSANDRA-13241

This ticket has languished for a while. IMO it's too late in the 4.0 cycle to implement a more memory-efficient representation for compressed chunk offsets. However, I don't think we should put out another release with the current 64k default, as it's pretty unreasonable: every read has to fetch and decompress a whole chunk, so even small reads pay for 64k.

I propose that we lower the value to 16kb. 4k might never be the right default anyway, since smaller chunks compress less effectively, and 16k will still be a large improvement.

Benedict and Jon Haddad are both +1 on making this change for 4.0. In the past there has been some consensus about reducing this value, although possibly only alongside a more memory-efficient offset representation.

The napkin math for the memory cost of the chunk offsets is:
"If you have 1TB of uncompressed data, with 64k chunks that's 16M chunks at 8 bytes each (128MB).
With 16k chunks, that's 512MB.
With 4k chunks, it's 2G.
Per terabyte of data (pre-compression)."
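A minimal sketch reproducing the napkin math above, assuming exactly one 8-byte offset per chunk and binary units (1 TB = 2**40 bytes). It ignores any other per-chunk or per-SSTable overhead, and `offset_memory_bytes` is a hypothetical helper, not a Cassandra API.

```python
# Memory for compressed-chunk offsets, assuming one 8-byte offset per chunk.
OFFSET_BYTES = 8
TB = 2 ** 40  # binary terabyte

def offset_memory_bytes(uncompressed_bytes: int, chunk_kb: int) -> int:
    """Bytes of offset entries needed for the given uncompressed data size."""
    chunk_bytes = chunk_kb * 1024
    # Ceiling division: a partial final chunk still needs an offset.
    chunks = -(-uncompressed_bytes // chunk_bytes)
    return chunks * OFFSET_BYTES

for kb in (64, 16, 4):
    chunks = TB // (kb * 1024)
    mib = offset_memory_bytes(TB, kb) // 2 ** 20
    print(f"{kb:>2}k chunks: {chunks // 2 ** 20}M chunks, {mib} MiB of offsets per TB")
```

This prints 16M chunks / 128 MiB for 64k, 64M / 512 MiB for 16k, and 256M / 2048 MiB for 4k: the cost scales inversely with chunk size, so each 4x reduction in chunk length quadruples the offset memory.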

By way of comparison, memory-mapping the files has a similar cost: 8 bytes per 4k page. Multiple mappings make this more expensive. With a 16kb default, the chunk offsets would be 4x cheaper than memory-mapping the same file. I only mention this to give a sense of the costs we are already paying; I am not saying they are directly related.
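A rough sketch of the comparison above, assuming 8 bytes of page-table entry per 4 KiB page, a file that is fully mapped and resident, and the same byte count for both sides (in practice mmap covers the compressed on-disk file while chunk offsets scale with uncompressed data, so this is a simplification). Both helper names are hypothetical.

```python
# Per-TB cost of mmap page-table entries vs. 16k chunk offsets.
ENTRY_BYTES = 8           # assumed: 8-byte page-table entry / 8-byte chunk offset
PAGE_BYTES = 4 * 1024     # assumed: 4 KiB pages
TB = 2 ** 40

def mmap_pte_bytes(file_bytes: int, mappings: int = 1) -> int:
    """Page-table bytes to map file_bytes fully, paid once per mapping."""
    pages = -(-file_bytes // PAGE_BYTES)
    return pages * ENTRY_BYTES * mappings

def chunk_offset_bytes(file_bytes: int, chunk_kb: int) -> int:
    """Offset bytes for file_bytes split into chunk_kb-sized chunks."""
    chunks = -(-file_bytes // (chunk_kb * 1024))
    return chunks * ENTRY_BYTES

ratio = mmap_pte_bytes(TB) / chunk_offset_bytes(TB, 16)
print(f"mmap page tables vs 16k chunk offsets: {ratio:.0f}x")
```

Under these assumptions a 16k chunk covers four 4 KiB pages with a single 8-byte entry, which is where the 4x figure comes from; passing `mappings=2` doubles the mmap side.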

I'll wait a week for discussion and, if there is consensus, make the change.

