Re: secondary index table - tombstones surviving compactions

The answer to Question 3 is "yes."  One of the more subtle points about
tombstones is that Cassandra won't remove one during compaction if the
bloom filter of any SSTable on that replica outside the compaction indicates
that it contains the same partition (not primary) key.  This holds even if
the tombstone is older than gc_grace_seconds and would otherwise be a
candidate for cleanup.

If you're recycling partition keys, your tombstones may never be cleaned
up, because in this scenario there is a high probability that an SSTable
not involved in that compaction also contains the same partition key, and
so compaction cannot be confident that it's safe to remove the tombstone
(to be sure, it would have to fully materialize every record in the
compaction, which is too expensive).
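
As a rough illustration of the rule described above, here is a minimal
Python sketch. It is a toy model, not Cassandra's actual code: the SSTable
class, might_contain() and can_purge_tombstone() are hypothetical names, a
plain set stands in for the bloom filter, and any further checks the real
implementation makes are left out.

```python
from dataclasses import dataclass, field


@dataclass
class SSTable:
    """Hypothetical stand-in for an SSTable; a set plays the bloom filter."""
    name: str
    partition_keys: set = field(default_factory=set)

    def might_contain(self, key):
        # A real bloom filter can also give false positives; a set cannot.
        return key in self.partition_keys


def can_purge_tombstone(key, deleted_at, now, gc_grace_seconds,
                        compacting, all_sstables):
    """Return True if this compaction may drop the tombstone for `key`."""
    # Rule 1: the tombstone must be older than gc_grace_seconds.
    if now - deleted_at <= gc_grace_seconds:
        return False
    # Rule 2: no SSTable outside this compaction may contain the same
    # partition key, otherwise the tombstone might still shadow data there.
    compacting_names = {s.name for s in compacting}
    for sstable in all_sstables:
        if sstable.name not in compacting_names and sstable.might_contain(key):
            return False
    return True


a = SSTable("a", {"pool-key-1", "pool-key-2"})
b = SSTable("b", {"pool-key-1"})  # the recycled key also lives here
c = SSTable("c", {"other-key"})

# Tombstones written at t=0, checked at t=1000 with gc_grace_seconds=600,
# while only SSTable "a" is being compacted.
recycled = can_purge_tombstone("pool-key-1", 0, 1000, 600, [a], [a, b, c])
unique = can_purge_tombstone("pool-key-2", 0, 1000, 600, [a], [a, b, c])
print(recycled, unique)  # False True
```

In this model the recycled key is held back solely because SSTable "b",
which is not part of the compaction, also reports the key; the key that
appears in only one SSTable is purged as soon as it is past gc_grace_seconds.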

In general it is an antipattern in Cassandra to write to a given partition
indefinitely for this and other reasons.

On Fri, May 18, 2018 at 2:37 AM Roman Bielik <
roman.bielik@xxxxxxxxxxxxxxxxxxxx> wrote:

> Hi,
> I have a Cassandra 3.11 table (with compact storage) and am using secondary
> indices with rather unique data stored in the indexed columns. There are
> many inserts and deletes, so in order to avoid tombstones piling up I'm
> re-using primary keys from a pool (which works fine).
> I'm aware that this design pattern is not ideal, but for now I cannot
> change it easily.
> The problem is, the size of 2nd index tables keeps growing (filled with
> tombstones) no matter what.
> I tried some aggressive configuration (just for testing) in order to
> expedite the tombstone removal but with little-to-zero effect:
> COMPACTION = { 'class': 'LeveledCompactionStrategy',
>   'unchecked_tombstone_compaction': 'true',
>   'tombstone_compaction_interval': 600 }
> gc_grace_seconds = 600
> I'm aware that perhaps Materialized views could provide a solution to this,
> but I'm bound to the Thrift interface, so I cannot use them.
> Questions:
> 1. Is there something I'm missing? How come compaction does not remove the
> obsolete indices/tombstones from 2nd index tables? Can I trigger the
> cleanup manually somehow?
> I have tried nodetool flush, compact, rebuild_index on both data table and
> internal Index table, but with no result.
> 2. When deleting a record I'm deleting the whole row at once, which would
> create one tombstone for the whole record if I'm correct. Would it help to
> delete the indexed columns separately, creating an extra tombstone for each
> cell?
> As I understand the underlying mechanism, the indexed column value must be
> read in order for a proper tombstone to be created for it in the index.
> 3. Could the fact that I'm reusing the primary key of a deleted record
> shortly afterwards for a new insert interfere with the secondary index
> tombstone removal?
> Will be grateful for any advice.
> Regards,
> Roman