
Re: Compaction strategy for update heavy workload


I wouldn't use TWCS if there are updates; you're going to risk having
data that's never deleted and really small SSTables sticking around
forever. 
How do you risk having data sticking around forever when everything is TTL'd? 

If you use really large buckets, what's the point of TWCS?
No one said anything about really large buckets. I'd also note that if the data is that small per partition, it would be entirely reasonable not to bucket by partition key (and window), and thus updates would become irrelevant.
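To make the bucketing point concrete, here's a minimal CQL sketch (the keyspace, table, and column names are hypothetical, not from this thread; the window and TTL values are examples, not recommendations) of a TWCS table where the partition key carries no time bucket:

```cql
-- Hypothetical table; names and values are illustrative only.
-- All writes get a table-level TTL, and TWCS groups SSTables into
-- 7-day windows so whole SSTables can be dropped once fully expired.
CREATE TABLE metrics.events (
    sensor_id  text,
    event_time timestamp,
    value      double,
    PRIMARY KEY (sensor_id, event_time)
) WITH default_time_to_live = 2592000   -- 30 days, in seconds
  AND compaction = {
    'class': 'TimeWindowCompactionStrategy',
    'compaction_window_unit': 'DAYS',
    'compaction_window_size': '7'
  };
```

Because the partition key here has no time component, updates to old rows don't force the client to track which bucket a row lives in; the trade-off is that a partition's data can span multiple windows.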

Honestly this is such a small workload you could easily use STCS or
LCS and you'd likely never, ever see a problem.

While the numbers sound small, there must be some logical reason to have so many nodes. In my experience, STCS and LCS both have their own drawbacks with regard to updates, more so when you have high data density, which sounds like it might be the case here. It's not hard to test these things, and it's important to get them right at the start to save yourself some serious pain down the track.
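Testing a strategy is a single schema change per table, which makes side-by-side comparison cheap. A hedged sketch (the table name is hypothetical and the `sstable_size_in_mb` value is just the common default, not tuning advice):

```cql
-- Hypothetical table name; option values are illustrative only.
-- Try LCS on a test table or test cluster:
ALTER TABLE metrics.events
  WITH compaction = {'class': 'LeveledCompactionStrategy',
                     'sstable_size_in_mb': '160'};

-- Revert to STCS if the compaction overhead is too high:
ALTER TABLE metrics.events
  WITH compaction = {'class': 'SizeTieredCompactionStrategy'};
```

Note that changing the strategy triggers recompaction of existing SSTables, so measure steady-state behavior after that settles rather than during it.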

On 13 June 2018 at 22:41, Jonathan Haddad <jon@xxxxxxxxxxxxx> wrote:
I wouldn't use TWCS if there are updates; you're going to risk having
data that's never deleted and really small SSTables sticking around
forever.  If you use really large buckets, what's the point of TWCS?

Honestly this is such a small workload you could easily use STCS or
LCS and you'd likely never, ever see a problem.
On Wed, Jun 13, 2018 at 3:34 PM kurt greaves <kurt@xxxxxxxxxxxxxxx> wrote:
>
> TWCS is probably still worth trying. If you mean updating old rows, in TWCS "out of order" updates will only really mean you'll hit more SSTables on read. This might add a bit of complexity in your client if you're bucketing partitions (not strictly necessary), but that's about it. As long as you're not specifying "USING TIMESTAMP", you still get the main benefit of efficient dropping of SSTables - C* only cares about the write timestamp of the data with regard to TTLs, not timestamps stored in your partition/clustering key.
> Also keep in mind that you can specify the window size in TWCS, so if you can increase it enough to cover the "out of order" updates, that will also solve the problem w.r.t. old buckets.
>
> As for LCS, the only way to really know whether it'll be too much compaction overhead is to test it, but for the most part you should consider your read/write ratio rather than the total number of reads/writes (unless it's so small that it's irrelevant, which it may well be).
>
> On 13 June 2018 at 19:25, manuj singh <s.manuj545@xxxxxxxxx> wrote:
>>
>> Hi all,
>> I am trying to determine the compaction strategy for our use case.
>> In our use case we will have updates on a row a few times, and we also have a TTL defined at the table level.
>> Our typical workload is less than 1000 writes + reads per second. At the max it could go up to 2500 per second.
>> We use SSDs and have around 64 GB of RAM on each node. Our cluster size is around 70 nodes.
>>
>> I looked at time window compaction, but we can't guarantee that the updates will happen within a given time window. And if we have out-of-order updates, it might impact when we remove that data from the disk.
>>
>> So I was looking at leveled compaction, which is supposedly good when you have updates. However, it's I/O bound and will affect the writes; everywhere I read, it says it's not good for a write-heavy workload.
>> But looking at our write velocity, is it really write heavy?
>>
>> I guess what I am trying to find out is whether leveled compaction will impact the writes in our use case, or whether it will be fine given that our write rate is not that high.
>> Also, is there anything else I should keep in mind while deciding on the compaction strategy?
>>
>> Thanks!!
>
>


--
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@cassandra.apache.org
For additional commands, e-mail: user-help@xxxxxxxxxxxxxxxxxxxx