That’s a very good point from Sylvain that I forgot/missed. That said, we’ve seen plenty of scenarios where overall system throughput is improved through unlogged batches. One of my colleagues did quite a bit of benchmarking on this topic for his talk at last year’s C* summit: http://www.slideshare.net/DataStax/microbatching-highperformance-writes-adam-zegelin-instaclustr-cassandra-summit-2016

On Thu, 9 Feb 2017 at 20:52 Benjamin Roth <benjamin.roth@xxxxxxxxx> wrote:

Ok got it.

But it's interesting that this is supported:

DELETE/SELECT FROM ks.cf WHERE (pk1) IN ((1), (2), (3));

This is technically mostly the same (token awareness, coordination/routing, read performance, ...), right?

2017-02-09 10:43 GMT+01:00 Sylvain Lebresne <sylvain@xxxxxxxxxxxx>:

This is a statement on multiple partitions and there is really no optimization the code internally does on that. In fact, I strongly advise you to not use a batch but rather simply do a for loop client side and send the statements individually. That way, your driver will be able to use proper token-awareness for each request (while if you send a batch, one coordinator will be picked and will have to forward most statements, doing more network hops at the end of the day). The only case where using a batch is indeed legit is if you care about all the statements being atomic, but in that case it's a logged batch you want.

That's btw more or less why we never bothered implementing that: it's totally doable technically, but it's not really such a good idea performance wise in practice most of the time, and you can easily work around it with a batch if you need atomicity.

Which is not saying it will never be and shouldn't be supported btw, there is something to be said for the consistency of the CQL language in general. But it's why no-one took time to do it so far.

On Thu, Feb 9, 2017 at 10:36 AM, Benjamin Roth <benjamin.roth@xxxxxxxxx> wrote:

Yes, that's the workaround - I'll try that.

Would you agree it would be better for internal optimizations to process this within a single statement?

2017-02-09 10:32 GMT+01:00 Ben Slater <ben.slater@xxxxxxxxxxxxxxx>:

Yep, that makes it clear. I think an unlogged batch of prepared statements with one statement per PK tuple would be roughly equivalent? And probably no more complex to generate in the client?

On Thu, 9 Feb 2017 at 20:22 Benjamin Roth <benjamin.roth@xxxxxxxxx> wrote:

Maybe that makes it clear:

DELETE FROM ks.cf WHERE (partitionkey1, partitionkey2) IN ((1, 2), (1, 3), (2, 3), (3, 4));

I want to delete or select a bunch of records identified by their multi-partitionkey tuples.

2017-02-09 10:18 GMT+01:00 Ben Slater <ben.slater@xxxxxxxxxxxxxxx>:

Are you looking for this to be equivalent to (PK1=1 AND PK2=2) or are you looking for (PK1 IN (1,2) AND PK2 IN (1,2)) or something else?

Cheers
Ben

On Thu, 9 Feb 2017 at 20:09 Benjamin Roth <benjamin.roth@xxxxxxxxx> wrote:

Hi Guys,

CQL says this is not allowed:

DELETE FROM ks.cf WHERE (pk1, pk2) IN ((1, 2));

1. Is there a reason for it? There shouldn't be a performance penalty, it is a PK lookup, and the same thing works with a single pk column.
2. Is there a known workaround for it?

It would be a great help to have it for daily business. IMHO it's a waste of resources to run multiple queries just to fetch a bunch of records by a PK.

Thanks in advance for any reply
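
For readers following the thread: the client-side loop Sylvain recommends can be sketched roughly as below. This is only an illustration in Python and was not posted by anyone in the thread. The helper name `per_partition_statements` is hypothetical, and the table and column names are borrowed from Benjamin's example. The sketch only builds one parameterized statement plus one set of bind values per partition key tuple; it does not connect to a cluster.

```python
from typing import Iterable, List, Sequence, Tuple


def per_partition_statements(
    table: str,
    key_columns: Sequence[str],
    key_tuples: Iterable[Sequence[object]],
) -> Tuple[str, List[Tuple[object, ...]]]:
    """Expand a multi-column "IN" over partition keys into one
    parameterized DELETE and one bind-value tuple per partition.

    Hypothetical helper for illustration only.
    """
    if not key_columns:
        raise ValueError("at least one partition key column is required")

    # One equality restriction per partition key column, e.g.
    # "partitionkey1 = ? AND partitionkey2 = ?".
    where = " AND ".join(f"{col} = ?" for col in key_columns)
    cql = f"DELETE FROM {table} WHERE {where}"

    params: List[Tuple[object, ...]] = []
    for key in key_tuples:
        if len(key) != len(key_columns):
            raise ValueError(
                f"expected {len(key_columns)} key values, got {len(key)}"
            )
        params.append(tuple(key))
    return cql, params


# The tuples from the example in the thread.
cql, params = per_partition_statements(
    "ks.cf",
    ["partitionkey1", "partitionkey2"],
    [(1, 2), (1, 3), (2, 3), (3, 4)],
)

for bound in params:
    # Assumption: with a real driver, the statement would be prepared once
    # and each `bound` tuple sent as its own request (for example
    # session.prepare(cql) and session.execute(prepared, bound) in the
    # DataStax Python driver). Nothing is sent here; we only print.
    print(cql, bound)
```

Sending each bound statement as a separate request is what lets a token-aware driver route it directly to a replica for that partition, which is the point Sylvain makes. Wrapping the same statements in an unlogged batch is the alternative Ben describes, and which one performs better likely depends on the workload.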