RangeAwareCompaction for manual token management

I don't want to comment on the 10540 ticket itself, since it is tightly
focused on vnode-aligned sstable partitioning and compaction. I'm pretty
excited about that ticket. RACS (RangeAwareCompactionStrategy) should enable:

- smaller scale LCS, more constrained I/O consumption
- fewer sstables to hit in the read path
- multithreaded/multiprocessor compactions and even serving of data based
on individual vnode or pools of vnodes
- better alignment of tombstones with data they should be
nullifying/eventually removing
- more efficient repair streaming
- more granular backups that can skip uploading sstables for ranges that
have not changed since the last backup snapshot

There are ongoing discussions where I am about using Priam for cluster
management, and as I understand it (superficially) Priam does not use vnodes;
it uses manual tokens and expands by node multiples. I believe this has
certain advantages over vnodes: the cluster can expand by multiple machines
at once, and data backups could possibly cover only (nodecount / RF) nodes,
rather than the mess of vnodes where you basically have to back up all of
them.
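
To make the (nodecount / RF) backup idea concrete, here is a rough sketch,
not anything Priam actually does. It assumes SimpleStrategy-like placement
(each range lives on its primary owner plus the next RF-1 nodes clockwise)
and evenly spaced manual tokens; the function names are made up for
illustration. Rack-aware placement would need a different check.

```python
def replicas_for_range(primary, node_count, rf):
    """Nodes holding the range whose primary owner is `primary`.

    Assumption: the owner plus the next rf-1 nodes clockwise hold the data.
    """
    return [(primary + k) % node_count for k in range(rf)]


def backup_set(node_count, rf):
    """Pick every rf-th node in ring order as a backup candidate."""
    return list(range(0, node_count, rf))


def covers_all_ranges(nodes, node_count, rf):
    """True if every primary range has at least one replica in `nodes`."""
    chosen = set(nodes)
    return all(
        chosen & set(replicas_for_range(primary, node_count, rf))
        for primary in range(node_count)
    )


if __name__ == "__main__":
    picked = backup_set(6, 3)
    print(picked, covers_all_ranges(picked, 6, 3))
```

With 6 nodes and RF=3 this picks 2 nodes, and between them they hold a
replica of every range. With vnodes the ranges are scattered, so no such
small subset is guaranteed to exist.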

But we could still do some divisor split of the manual range and apply RACS
to that. I guess this would be vnode-lite. We could have some number like
100 subranges on a node, and expansion might just mean a temporarily lower
subrange count on each node until the sstables can be reprocessed back up to
the typical subrange count.
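
Roughly what I mean by the divisor split, as a sketch only. It assumes
Murmur3Partitioner-style signed 64-bit tokens and (left, right] ownership
ranges; split_range and subrange_index are hypothetical helpers, not
anything in Cassandra or the 10540 patch.

```python
from bisect import bisect_left

RING_SIZE = 2**64        # number of tokens on the ring (assumed Murmur3)
MIN_TOKEN = -(2**63)     # smallest signed 64-bit token


def _wrap(token):
    """Fold an arbitrary integer back into the signed 64-bit token space."""
    return (token - MIN_TOKEN) % RING_SIZE + MIN_TOKEN


def _width(left, right):
    """Number of tokens in (left, right]; left == right means the full ring."""
    width = (right - left) % RING_SIZE
    return width if width else RING_SIZE


def split_range(left, right, parts):
    """Split (left, right] into `parts` contiguous (lo, hi] subranges."""
    width = _width(left, right)
    if not 1 <= parts <= width:
        raise ValueError("parts must be between 1 and the range width")
    bounds = [left + (width * i) // parts for i in range(parts + 1)]
    return [(_wrap(bounds[i]), _wrap(bounds[i + 1])) for i in range(parts)]


def subrange_index(token, left, right, parts):
    """Index of the subrange of (left, right] that contains `token`."""
    width = _width(left, right)
    offset = (token - left) % RING_SIZE
    if offset == 0 or offset > width:
        raise ValueError("token is outside (left, right]")
    upper_offsets = [(width * i) // parts for i in range(1, parts + 1)]
    return bisect_left(upper_offsets, offset)


if __name__ == "__main__":
    print(split_range(0, 1000, 4))
    print(subrange_index(251, 0, 1000, 4))
```

If an expansion hands half of the node's range to a new machine and the old
subrange boundaries are kept, the node is left with about half of its
subranges until a later pass calls something like split_range on the new,
smaller range.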

Is this theoretically correct, or are there glaring things I might have
missed with respect to RACS-style compaction and manual tokens?