
Fwd: Re: A quick question on unlogged batch

Unlogged batches meaningfully outperform parallel execution of individual statements, especially at scale, and create lower memory pressure on both the clients and the cluster.

They do outperform parallel individual statements, but at the cost of higher pressure on the coordinators, which leads to more blocked native transport requests and dropped mutations.
Actually, I think the 10-20% better write performance and 20-30% lower CPU usage on client machines (we care much less about client machines than about cluster machines) that come from batch statements spanning multiple partitions are not worth it, because the cluster nodes need spare capacity to serve read queries, compactions, repairs, etc.
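The alternative argued for here, individual statements executed asynchronously, is usually paired with a cap on in-flight requests so the client cannot flood the coordinators. Below is a minimal sketch of that pattern. It assumes a hypothetical `writeAsync` function standing in for the driver's asynchronous execute call; the real DataStax driver API differs between versions. `ThrottledAsyncWriter` and `maxInFlight` are also illustrative names, not part of any library.

```java
import java.util.*;
import java.util.concurrent.*;
import java.util.function.Function;

/** Sketch: runs individual async writes while capping how many are in flight at once. */
final class ThrottledAsyncWriter {
    // Hypothetical stand-in for a driver's asynchronous execute call.
    private final Function<String, CompletableFuture<Void>> writeAsync;
    private final Semaphore permits;

    ThrottledAsyncWriter(Function<String, CompletableFuture<Void>> writeAsync, int maxInFlight) {
        if (maxInFlight < 1) throw new IllegalArgumentException("maxInFlight must be >= 1");
        this.writeAsync = writeAsync;
        this.permits = new Semaphore(maxInFlight);
    }

    /** Submits every statement individually; the returned future completes when all writes finish. */
    CompletableFuture<Void> writeAll(List<String> statements) {
        List<CompletableFuture<Void>> futures = new ArrayList<>(statements.size());
        for (String stmt : statements) {
            // Block the caller once maxInFlight writes are outstanding.
            permits.acquireUninterruptibly();
            CompletableFuture<Void> f;
            try {
                f = writeAsync.apply(stmt);
            } catch (RuntimeException e) {
                // Treat a synchronous failure like an async one so the permit is still released below.
                f = CompletableFuture.failedFuture(e);
            }
            // Release the permit whether the write succeeded or failed.
            futures.add(f.whenComplete((ok, err) -> permits.release()));
        }
        return CompletableFuture.allOf(futures.toArray(new CompletableFuture[0]));
    }
}
```

The semaphore moves the back-pressure to the client: once `maxInFlight` writes are outstanding, the submitting thread waits instead of piling more requests onto the coordinators.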

The biggest downside of unlogged batches is that the unit of retry during a failure is the entire batch. So if you use a retry policy, write timeouts will tip over your cluster a lot faster than with individual statements. Bounding your batch sizes helps mitigate this risk.
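One way to bound batch sizes, sketched under assumptions: group pending rows by partition key, so each unlogged batch touches a single partition (the case where unlogged batches are generally considered safe, and which avoids the multi-partition batches criticized earlier in the thread). Then cap each batch at `maxBatchSize` rows, so a retried timeout re-sends at most that many mutations. `BoundedBatcher`, `Row`, `plan`, and `maxBatchSize` are hypothetical names. Turning each planned group into an actual `BATCH` statement is left to the driver.

```java
import java.util.*;

/** Sketch: plans single-partition batches of bounded size. */
final class BoundedBatcher {
    /** Hypothetical row: a partition key plus an opaque CQL statement string. */
    record Row(String partitionKey, String cql) {}

    /**
     * Groups rows by partition key, keeping first-seen order, and splits each group
     * into batches of at most maxBatchSize rows.
     */
    static List<List<Row>> plan(List<Row> rows, int maxBatchSize) {
        if (maxBatchSize < 1) throw new IllegalArgumentException("maxBatchSize must be >= 1");
        Map<String, List<Row>> byPartition = new LinkedHashMap<>();
        for (Row r : rows) {
            byPartition.computeIfAbsent(r.partitionKey(), k -> new ArrayList<>()).add(r);
        }
        List<List<Row>> batches = new ArrayList<>();
        for (List<Row> group : byPartition.values()) {
            // Slice the partition's rows into chunks; the last chunk may be smaller.
            for (int i = 0; i < group.size(); i += maxBatchSize) {
                batches.add(new ArrayList<>(group.subList(i, Math.min(i + maxBatchSize, group.size()))));
            }
        }
        return batches;
    }
}
```

For example, 7 rows for partition `p1` and 1 row for `p2` with `maxBatchSize = 3` yield batches of 3, 3, and 1 rows for `p1` and one batch for `p2`. The cap is a tuning knob: smaller batches retry less work per timeout but give up some of the client-side savings discussed above.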

I assume that in most scenarios the client machines are on the same network as the Cassandra cluster, so is it still faster?

Thank you all. Now I understand that whether to use batch or asynchronous writes really depends on the use case. So far, batch writes have worked for me in an 8-node cluster with over 500 million requests per day.

Did you compare the cluster performance, including blocked native transport requests, dropped mutations, 95th-percentile latencies, cluster CPU usage, etc., in the two scenarios (batch vs. single)?
Although 500M requests per day is not much for an 8-node cluster (if the node specs comply with DataStax recommendations), and async single statements could handle it (they just demand more CPU on the client machines), the impact of such things (non-compliant batch statements straining the cluster) would show up after some weeks, when a lot of cluster tasks suddenly need to run simultaneously: one or two big compactions running on most of the nodes, some hinted handoffs, and the cluster can no longer keep up and becomes slower and slower.
The way to catch this early is to keep the error counters as low as possible: blocked native transport requests, dropped mutations, errors, hinted handoffs, latencies, etc.
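A rough back-of-the-envelope check of the "500M per day is not much for 8 nodes" claim. It assumes, hypothetically, that all requests are writes spread evenly over 24 hours and evenly across the 8 nodes, with a replication factor of 3 (the thread does not state the replication factor):

```latex
\begin{aligned}
\frac{5 \times 10^{8}\ \text{requests/day}}{86\,400\ \text{s/day}} &\approx 5\,787\ \text{requests/s (cluster-wide)} \\
\frac{5\,787\ \text{requests/s}}{8\ \text{nodes}} &\approx 723\ \text{requests/s per node (as coordinator)} \\
\frac{3 \times 5\,787\ \text{replica writes/s}}{8\ \text{nodes}} &\approx 2\,170\ \text{replica writes/s per node (RF = 3)}
\end{aligned}
```

Real traffic is rarely flat, so peak-hour rates can be well above this average. The point is only that the sustained per-node average is modest, which is why the background load from compactions, repairs, and hinted handoffs matters more than the raw request count.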