
Saving distinct data in Cassandra results in many tombstones


I need to save a distinct value for each key in each hour. The problem with saving everything and computing the distincts in memory is that
there is too much repeated data.
Table schema:
CREATE TABLE distinct (
    hourNumber int,
    key text,
    distinctValue bigint,
    PRIMARY KEY (hourNumber)
);

I want to retrieve the distinct count of all keys in a specific hour, and with this data model that can be done by reading a single partition.
The problem: I can't read from this table. system.log indicates that more than 100K tombstones were read, with no live data among them. gc_grace_seconds
is at the default (10 days), so I thought of decreasing it to 1 hour and running compaction, but is this the right approach at all? I mean, the whole
idea of replacing some millions of rows, each about 10 times, in a partition again and again, which creates a lot of tombstones, just to achieve distinct behavior?
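
For concreteness, the gc_grace change mentioned above would look roughly like the statement below. This is only a sketch of the idea being asked about, not a confirmed fix. It assumes the table is named distinct as in the schema above and that the session is already using its keyspace. The keyspace name my_keyspace in the comment is hypothetical.

```cql
-- Lower gc_grace_seconds from the default 864000 (10 days) to 3600 (1 hour),
-- so that tombstones become eligible for removal at compaction much sooner.
ALTER TABLE distinct WITH gc_grace_seconds = 3600;

-- A major compaction would then be triggered from the shell, for example:
--   nodetool compact my_keyspace distinct
-- (my_keyspace is a placeholder for the real keyspace name)
```

Note that a shorter gc_grace_seconds also shortens the window in which repairs must complete before deleted data can reappear, which is part of why I am unsure this is the right approach.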

Thanks in advance
