Re: Optimizing queries for partition keys

Thanks (and apologies for the delayed response); that was the kind of
feedback we were looking for.

We backported the fix for CASSANDRA-10657 to 3.0.16, and it partially
addresses our problem in the sense that it does limit the data sent on
the wire.  Performance is still extremely poor, however, because
Cassandra continues to read large volumes of data from disk.
(We've also confirmed this behavior in 3.11.2.)

With a bit more investigation, we now believe the problem (after
CASSANDRA-10657 is applied) is in RebufferingInputStream.skipBytes(),
which appears to read bytes in order to skip them.  The subclass used in
our case, RandomAccessReader, exposes a seek(), so we overrode
skipBytes() in it to make use of seek(), and that seems to resolve the

This change is intuitively much safer than the one we'd originally
identified, but we'd still like to confirm with you folks whether it's
likely safe and, if so, whether it's also potentially worth contributing.
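
To make the idea concrete, here is a minimal, self-contained sketch of
the kind of override described above.  BufferedInput and SeekableInput
are hypothetical stand-ins, not the actual RebufferingInputStream and
RandomAccessReader, and the read-to-skip default is our understanding of
the behavior rather than a copy of the Cassandra code:

```java
import java.io.IOException;

// Hypothetical stand-in for a buffered stream; NOT the Cassandra class.
abstract class BufferedInput {
    long bytesRead = 0; // counts bytes actually pulled from the source

    abstract int read() throws IOException;

    // Assumed default: skip by reading and discarding bytes one at a time.
    int skipBytes(int n) throws IOException {
        int skipped = 0;
        while (skipped < n && read() >= 0)
            skipped++;
        return skipped;
    }
}

// Hypothetical stand-in for a reader that supports random access.
class SeekableInput extends BufferedInput {
    private final byte[] data;
    private long position = 0;

    SeekableInput(byte[] data) { this.data = data; }

    @Override
    int read() {
        if (position >= data.length) return -1;
        bytesRead++;
        return data[(int) position++] & 0xFF;
    }

    long length() { return data.length; }

    long getFilePointer() { return position; }

    void seek(long newPosition) {
        if (newPosition < 0 || newPosition > data.length)
            throw new IllegalArgumentException("position out of range: " + newPosition);
        position = newPosition;
    }

    // Skip by repositioning instead of reading; clamp at the end of the input.
    @Override
    int skipBytes(int n) {
        if (n <= 0) return 0;
        long target = Math.min(position + n, length());
        int skipped = (int) (target - position);
        seek(target);
        return skipped;
    }
}
```

In this sketch the override returns the number of bytes actually
skipped and never moves past the end of the input, so callers see the
same contract as the read-based version without touching the skipped
bytes.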


On 2018-03-22 18:16, Benjamin Lerer wrote:
You should check the 3.x release. CASSANDRA-10657 could have fixed your

On Thu, Mar 22, 2018 at 9:15 PM, Benjamin Lerer <benjamin.lerer@xxxxxxxxxxxx

Sylvain explained the problem in CASSANDRA-4536:
" Let me note that in CQL3 a row that have no live column don't exist, so
we can't really implement this with a range slice having an empty columns
list. Instead we should do a range slice with a full-row slice predicate
with a count of 1, to make sure we do have a live column before including
the partition key. "

By using ColumnFilter.selectionBuilder(), you do not select all the
columns. As a consequence, some partitions might be returned when they
should not be.
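
As a rough illustration of the check Sylvain describes, here is a toy
model (not Cassandra code): a partition is a map from column name to a
"live" flag, and a key is returned only once one live column has been
found. The class and method names are made up for the example:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy model only; names and data layout are hypothetical.
class DistinctKeys {
    // Stops at the first live column, mirroring "a count of 1".
    static boolean hasLiveColumn(Map<String, Boolean> columns) {
        for (boolean live : columns.values())
            if (live) return true;
        return false;
    }

    // A partition key is included only if the partition has a live column.
    static List<String> distinctKeys(Map<String, Map<String, Boolean>> partitions) {
        List<String> keys = new ArrayList<>();
        for (Map.Entry<String, Map<String, Boolean>> p : partitions.entrySet())
            if (hasLiveColumn(p.getValue()))
                keys.add(p.getKey());
        return keys;
    }
}
```

The point of the model is that the liveness check needs to see the
columns; if none are selected, there is nothing to check.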

On Thu, Mar 22, 2018 at 6:24 PM, Sam Klock <sklock@xxxxxxxxxx> wrote:

Cassandra devs,

We use workflows in some of our clusters (running 3.0.15) that involve
"SELECT DISTINCT key FROM..."-style queries.  For some tables, we
observed extremely poor performance under light load (i.e., a small
number of rows per second and frequent timeouts), which we eventually
traced to replicas shipping entire rows (which in some cases could store
on the order of MBs of data) to service the query.  That surprised us
(partly because 2.1 doesn't seem to behave this way), so we did some
digging, and we eventually came up with a patch that modifies
SelectStatement.java in the following way: if the selection in the query
only includes the partition key, then when building a ColumnFilter for
the query, use:

     builder = ColumnFilter.selectionBuilder();

instead of:

     builder = ColumnFilter.allColumnsBuilder();

to initialize the ColumnFilter.Builder in gatherQueriedColumns().  That
seems to repair the performance regression, and it doesn't appear to
break any functionality (based on the unit tests and some smoke tests we
ran involving insertions and deletions).

We'd like to contribute this patch back to the project, but we're not
convinced that there aren't subtle correctness issues we're missing,
judging both from comments in the code and the existence of
CASSANDRA-5912, which suggests optimizing this kind of query is

So: does this change sound safe to make, or are there corner cases we
need to account for?  If there are corner cases, are there plausibly
ways of addressing them at the SelectStatement level, or will we need to
look deeper?


To unsubscribe, e-mail: dev-unsubscribe@xxxxxxxxxxxxxxxxxxxx
For additional commands, e-mail: dev-help@xxxxxxxxxxxxxxxxxxxx
