Re: Optimizing queries for partition keys
Your finding is interesting. Effectively, if the number of bytes to skip is
larger than the bytes remaining in the buffer plus the buffer size, it could
be faster to use seek.
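A minimal Java sketch of that threshold. The class, enum, and method names are hypothetical and not part of Cassandra. "remaining" means the unread bytes already sitting in the buffer, and the sketch assumes each rebuffer reads one full buffer.

```java
/** Hypothetical helper illustrating when seeking beats reading to skip. */
public final class SkipStrategy {

    enum Strategy { IN_BUFFER, READ_THROUGH, SEEK }

    /**
     * @param n          bytes the caller wants to skip
     * @param remaining  unread bytes already in the buffer
     * @param bufferSize capacity of the rebuffering buffer
     */
    static Strategy choose(long n, int remaining, int bufferSize) {
        if (n <= remaining) {
            // Target is already buffered: just advance the buffer position.
            return Strategy.IN_BUFFER;
        }
        if (n > (long) remaining + bufferSize) {
            // Reading through would fill at least one whole buffer only to
            // discard it, so repositioning the file pointer is cheaper.
            return Strategy.SEEK;
        }
        // Target lands in the next buffer-full, which is likely read next anyway.
        return Strategy.READ_THROUGH;
    }

    public static void main(String[] args) {
        int bufferSize = 4096;
        System.out.println(choose(100, 1000, bufferSize));       // IN_BUFFER
        System.out.println(choose(3000, 1000, bufferSize));      // READ_THROUGH
        System.out.println(choose(2_000_000, 1000, bufferSize)); // SEEK
    }
}
```

The middle case is kept as a read because refilling once lands the stream on the target offset, and the refilled data is what the caller reads next.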
Feel free to open a JIRA ticket and attach your patch. It would be great if
you could also add your table schema to the ticket, as well as some
information on your environment (e.g., disk type).
On Tue, Apr 17, 2018 at 8:53 PM, Sam Klock <sklock@xxxxxxxxxx> wrote:
> Thanks (and apologies for the delayed response); that was the kind of
> feedback we were looking for.
> We backported the fix for CASSANDRA-10657 to 3.0.16, and it partially
> addresses our problem in the sense that it does limit the data sent on
> the wire. The performance is still extremely poor, however, due to the
> fact that Cassandra continues to read large volumes of data from disk.
> (We've also confirmed this behavior in 3.11.2.)
> With a bit more investigation, we now believe the problem (after
> CASSANDRA-10657 is applied) is in RebufferingInputStream.skipBytes(),
> which appears to read bytes in order to skip them. The subclass used in
> our case, RandomAccessReader, exposes a seek(), so we overrode
> skipBytes() in it to make use of seek(), and that seems to resolve the
> problem.
> This change is intuitively much safer than the one we'd originally
> identified, but we'd still like to confirm with you folks whether it's
> likely safe and, if so, whether it's also potentially worth contributing.
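To illustrate the difference described above, here is a toy Java model. All names are hypothetical; this is not Cassandra's RebufferingInputStream or RandomAccessReader. An in-memory byte array stands in for the file, and bytesFetched counts what would have been read from disk. The base class skips by consuming buffer-fulls; the subclass overrides skipBytes() to seek past anything not already buffered.

```java
/** Toy model of read-to-skip versus seek-to-skip. Hypothetical names throughout. */
public class SkipBytesDemo {

    static class ToyRebufferingReader {
        protected final byte[] file;   // stands in for the on-disk data
        protected final byte[] buffer;
        protected long filePos = 0;    // file offset of the next refill
        protected int bufPos = 0;      // read position inside the buffer
        protected int bufLimit = 0;    // number of valid bytes in the buffer
        long bytesFetched = 0;         // bytes pulled from the "disk"

        ToyRebufferingReader(byte[] file, int bufferSize) {
            this.file = file;
            this.buffer = new byte[bufferSize];
        }

        protected void rebuffer() {
            int len = (int) Math.min(buffer.length, file.length - filePos);
            System.arraycopy(file, (int) filePos, buffer, 0, len);
            filePos += len;
            bufPos = 0;
            bufLimit = len;
            bytesFetched += len;
        }

        int read() {
            if (bufPos == bufLimit) {
                if (filePos >= file.length) return -1;
                rebuffer();
            }
            return buffer[bufPos++] & 0xFF;
        }

        /** Skips by reading: every skipped byte is fetched into the buffer first. */
        int skipBytes(int n) {
            int skipped = 0;
            while (skipped < n) {
                if (bufPos == bufLimit) {
                    if (filePos >= file.length) break;
                    rebuffer();
                }
                int step = Math.min(n - skipped, bufLimit - bufPos);
                bufPos += step;
                skipped += step;
            }
            return skipped;
        }
    }

    static class ToySeekingReader extends ToyRebufferingReader {
        ToySeekingReader(byte[] file, int bufferSize) { super(file, bufferSize); }

        void seek(long pos) {
            filePos = pos;
            bufPos = bufLimit = 0; // drop the buffer; the next read refills at pos
        }

        /** Skips within the buffer if possible, otherwise seeks without reading. */
        @Override
        int skipBytes(int n) {
            if (n <= 0) return 0;
            int remaining = bufLimit - bufPos;
            if (n <= remaining) {
                bufPos += n;
                return n;
            }
            long current = filePos - remaining; // logical offset of next unread byte
            long target = Math.min(current + n, file.length);
            seek(target);
            return (int) (target - current);
        }
    }

    public static void main(String[] args) {
        byte[] file = new byte[1 << 20]; // 1 MiB of zeros
        ToyRebufferingReader reading = new ToyRebufferingReader(file, 4096);
        ToySeekingReader seeking = new ToySeekingReader(file, 4096);
        reading.skipBytes(1_000_000);
        seeking.skipBytes(1_000_000);
        System.out.println(reading.bytesFetched); // 1003520 (245 refills of 4096 bytes)
        System.out.println(seeking.bytesFetched); // 0
    }
}
```

Both readers end up at the same logical offset; only the read-to-skip version pays for pulling the skipped bytes off disk.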
> On 2018-03-22 18:16, Benjamin Lerer wrote:
>> You should check the 3.x release. CASSANDRA-10657 could have fixed your
>> problem.
>> On Thu, Mar 22, 2018 at 9:15 PM, Benjamin Lerer <
>>> Sylvain explained the problem in CASSANDRA-4536:
>>> " Let me note that in CQL3 a row that has no live column doesn't exist, so
>>> we can't really implement this with a range slice having an empty columns
>>> list. Instead we should do a range slice with a full-row slice predicate
>>> with a count of 1, to make sure we do have a live column before including
>>> the partition key. "
>>> By using ColumnFilter.selectionBuilder(), you do not select all the
>>> columns. As a consequence, some partitions might be returned when they
>>> should not be.
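A simplified Java model of the rule quoted above, assuming a partition "exists" only if at least one of its rows has a live cell. The record types and method names are hypothetical; they collapse deletions and expiry into a single live flag and ignore partition tombstones and static columns. The point is only the contrast between returning every stored key and scanning each partition until its first live row (the "count of 1").

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (hypothetical types) of why SELECT DISTINCT must look for a live cell. */
public class DistinctKeysDemo {

    record Cell(String column, boolean live) {}

    record Row(List<Cell> cells) {
        boolean hasLiveCell() {
            return cells.stream().anyMatch(Cell::live);
        }
    }

    record Partition(String key, List<Row> rows) {}

    /** Unsafe: emits every stored partition key, even if all of its data is deleted. */
    static List<String> keysOnly(List<Partition> partitions) {
        List<String> keys = new ArrayList<>();
        for (Partition p : partitions) keys.add(p.key());
        return keys;
    }

    /** Full-row scan per partition, stopping at the first row with a live cell. */
    static List<String> keysWithLiveData(List<Partition> partitions) {
        List<String> keys = new ArrayList<>();
        for (Partition p : partitions) {
            for (Row r : p.rows()) {
                if (r.hasLiveCell()) {
                    keys.add(p.key());
                    break; // one live row is enough; skip the rest of the partition
                }
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        List<Partition> data = List.of(
            new Partition("a", List.of(new Row(List.of(new Cell("v", true))))),
            // every cell in "b" is deleted, so in CQL3 the partition does not exist
            new Partition("b", List.of(new Row(List.of(new Cell("v", false))))));
        System.out.println(keysOnly(data));         // [a, b]
        System.out.println(keysWithLiveData(data)); // [a]
    }
}
```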
>>> On Thu, Mar 22, 2018 at 6:24 PM, Sam Klock <sklock@xxxxxxxxxx> wrote:
>>>> Cassandra devs,
>>>> We use workflows in some of our clusters (running 3.0.15) that involve
>>>> "SELECT DISTINCT key FROM..."-style queries. For some tables, we
>>>> observed extremely poor performance (a small number of rows per second
>>>> and frequent timeouts) even under light load, which we eventually
>>>> traced to replicas shipping entire rows (which in some cases could store
>>>> on the order of MBs of data) to service the query. That surprised us
>>>> (partly because 2.1 doesn't seem to behave this way), so we did some
>>>> digging, and we eventually came up with a patch that modifies
>>>> SelectStatement.java in the following way: if the selection in the query
>>>> only includes the partition key, then when building a ColumnFilter for
>>>> the query, use:
>>>> builder = ColumnFilter.selectionBuilder();
>>>> instead of:
>>>> builder = ColumnFilter.allColumnsBuilder();
>>>> to initialize the ColumnFilter.Builder in gatherQueriedColumns(). That
>>>> seems to repair the performance regression, and it doesn't appear to
>>>> break any functionality (based on the unit tests and some smoke tests we
>>>> ran involving insertions and deletions).
>>>> We'd like to contribute this patch back to the project, but we're not
>>>> convinced that there aren't subtle correctness issues we're missing,
>>>> judging both from comments in the code and the existence of
>>>> CASSANDRA-5912, which suggests optimizing this kind of query is
>>>> So: does this change sound safe to make, or are there corner cases we
>>>> need to account for? If there are corner cases, are there plausibly
>>>> ways of addressing them at the SelectStatement level, or will we need to
>>>> look deeper?
>>>> To unsubscribe, e-mail: dev-unsubscribe@xxxxxxxxxxxxxxxxxxxx
>>>> For additional commands, e-mail: dev-help@xxxxxxxxxxxxxxxxxxxx