Hi, we built a simple system to migrate live Cassandra data to other databases, mainly using these two queries:
1. SELECT DISTINCT TOKEN(partition_key) FROM table WHERE TOKEN(partition_key) > current_offset AND TOKEN(partition_key) <= upper_bound LIMIT token_fetch_size
2. Any CQL query that retrieves all rows for a given set of tokens
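
A minimal sketch of how a paging loop around query 1 might look, in Python. This is an illustration, not our exact code: `execute` is a hypothetical callable standing in for whatever driver call runs the CQL and returns the tokens, the `%s` placeholders are an assumption about the driver's parameter style, and `build_token_query` and `iter_tokens` are names made up for this example. It also assumes tokens come back in ascending order.

```python
from typing import Callable, Iterator, List, Sequence

# Hypothetical stand-in for a driver call: takes a CQL string and bound
# values, returns the matching tokens (assumed ascending).
Execute = Callable[[str, Sequence[int]], List[int]]


def build_token_query(table: str, partition_key: str) -> str:
    """Build query 1 with placeholders for offset, upper bound and limit."""
    token = f"TOKEN({partition_key})"
    return (
        f"SELECT DISTINCT {token} FROM {table} "
        f"WHERE {token} > %s AND {token} <= %s LIMIT %s"
    )


def iter_tokens(
    execute: Execute,
    table: str,
    partition_key: str,
    lower_bound: int,
    upper_bound: int,
    token_fetch_size: int,
) -> Iterator[int]:
    """Yield every distinct token in (lower_bound, upper_bound], one page at a time."""
    query = build_token_query(table, partition_key)
    current_offset = lower_bound
    while current_offset < upper_bound:
        batch = execute(query, (current_offset, upper_bound, token_fetch_size))
        if not batch:
            break  # no partitions left in the range
        yield from batch
        # Resume after the largest token seen so far.
        current_offset = max(batch)
```

Each token batch would then be passed to query 2 to fetch the rows themselves.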
We observed that the "SELECT DISTINCT TOKEN" query takes far longer when the table has wide partitions (about 200+ rows per partition on average); the cost of the underlying operation does not appear to be linear.
Does the query scan every row of every partition it encounters until token_fetch_size is met? Or is the slowdown due to some low-level operations that are inherently more time-consuming on wide partitions?
Any advice on this, or pointers to the relevant code, would be appreciated.