Re: Compaction process stuck

You probably have a very large partition in that file. `nodetool cfstats` will now show you the largest compacted partition; I suspect it's much larger than before.

On Thu, Jul 5, 2018 at 9:50 PM, atul atri <atulatri2004@xxxxxxxxx> wrote:
Hi Chris,

The compaction process has finally finished, although it took a long time.

Thank you very much for all your help.

Please let me know if you have any guidelines for making future compactions faster.

Thanks & Regards,
Atul Atri.

On 5 July 2018 at 22:05, atul atri <atulatri2004@xxxxxxxxx> wrote:
Hi Chris,

Thank you for the reply.

I have already tried running "nodetool stop compaction", but it does not help. I have restarted each node in the cluster one by one; compaction starts again and gets stuck on the same table.

Following is the 'nodetool compactionstats' output. It has been stuck at 1336035468 bytes for at least 35 hours.

pending tasks: 1
          compaction type                keyspace               table    completed        total    unit   progress
               Compaction  notification_system_v1   user_notification   1336035468   1660997721   bytes     80.44%
Active compaction remaining time :   0h00m38s
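The progress column is just completed/total, so the raw byte counters are the more useful thing to track between runs. A minimal Python sketch, assuming whitespace-separated columns in the order of the header row above (compaction type, keyspace, table, completed, total, unit, progress) and no spaces inside names; `parse_compaction_row` and `CompactionRow` are hypothetical helpers, not part of nodetool:

```python
from dataclasses import dataclass


@dataclass
class CompactionRow:
    compaction_type: str
    keyspace: str
    table: str
    completed: int
    total: int
    unit: str

    @property
    def progress_pct(self) -> float:
        # Same ratio nodetool shows in its "progress" column.
        return 100.0 * self.completed / self.total

    @property
    def remaining(self) -> int:
        # Work still to do, in the row's unit (bytes here).
        return self.total - self.completed


def parse_compaction_row(line: str) -> CompactionRow:
    """Parse one data row of `nodetool compactionstats` (hypothetical helper).

    Assumes whitespace-separated columns with no spaces inside names.
    """
    ctype, keyspace, table, completed, total, unit = line.split()[:6]
    return CompactionRow(ctype, keyspace, table, int(completed), int(total), unit)


row = parse_compaction_row(
    "Compaction  notification_system_v1  user_notification  "
    "1336035468  1660997721  bytes  80.44%"
)
print(f"{row.progress_pct:.2f}% done, {row.remaining} {row.unit} left")
```

For the row quoted above this reports 80.44% done with 324962253 bytes (about 310 MiB) left. Seeing the same `completed` value in output captured hours apart is a more direct sign of a stall than the remaining-time estimate.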

Following is the output of "nodetool cfstats":

Table: user_notification
        SSTable count: 18
        Space used (live), bytes: 17247516201
        Space used (total), bytes: 17316488652
        SSTable Compression Ratio: 0.41922805938461566
        Number of keys (estimate): 32556160
        Memtable cell count: 44717
        Memtable data size, bytes: 27705294
        Memtable switch count: 5
        Local read count: 0
        Local read latency: 0.000 ms
        Local write count: 236961
        Local write latency: 0.047 ms
        Pending tasks: 0
        Bloom filter false positives: 0
        Bloom filter false ratio: 0.00000
        Bloom filter space used, bytes: 72414688
        Compacted partition minimum bytes: 104
        Compacted partition maximum bytes: 4966933177
        Compacted partition mean bytes: 1183
        Average live cells per slice (last five minutes): 0.0
        Average tombstones per slice (last five minutes): 0.0
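These figures already point at the large-partition explanation: the maximum compacted partition is about 4.6 GiB, while the mean is 1183 bytes. A minimal Python sketch for pulling those two values out of a cfstats block, assuming the `Label: value` layout printed by this 2.1 node; `parse_cfstats`, `largest_partition_report`, and the 100 MiB cut-off are hypothetical choices for illustration, not Cassandra settings:

```python
# Hypothetical cut-off for illustration only; not a Cassandra setting.
LARGE_PARTITION_BYTES = 100 * 1024 * 1024


def parse_cfstats(text: str) -> dict[str, str]:
    """Collect 'Label: value' pairs from one table's `nodetool cfstats` block."""
    stats = {}
    for line in text.splitlines():
        label, sep, value = line.partition(":")
        if sep:
            stats[label.strip()] = value.strip()
    return stats


def largest_partition_report(stats: dict[str, str]) -> tuple[int, float, bool]:
    """Return (max partition bytes, max/mean ratio, exceeds cut-off?)."""
    max_b = int(stats["Compacted partition maximum bytes"])
    mean_b = int(stats["Compacted partition mean bytes"])
    return max_b, max_b / mean_b, max_b > LARGE_PARTITION_BYTES


sample = """\
Table: user_notification
        Compacted partition minimum bytes: 104
        Compacted partition maximum bytes: 4966933177
        Compacted partition mean bytes: 1183
"""

stats = parse_cfstats(sample)
max_b, ratio, too_big = largest_partition_report(stats)
print(f"{stats['Table']}: largest partition {max_b / 2**30:.1f} GiB, "
      f"{ratio:,.0f}x the mean, flagged={too_big}")
```

On the quoted numbers this flags the table, with the largest partition roughly four million times the mean size. Running it on cfstats output from different days shows whether that single partition keeps growing.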

Please let me know if you need any more information. I am really thankful to you for spending time on this investigation.

Thanks & Regards,
Atul Atri.

On 5 July 2018 at 20:54, Chris Lohfink <clohfink@xxxxxxxxx> wrote:
To me that looks less like a stuck compaction and more like a long-running one. Can you include the output of `nodetool compactionstats` and `nodetool cfstats`, along with the schema of the table that's being compacted (with names redacted if necessary)?

You can stop the compaction with `nodetool stop COMPACTION` or by restarting the node.


On Jul 5, 2018, at 12:08 AM, atul atri <atulatri2004@xxxxxxxxx> wrote:


We noticed that the compaction process is also hanging on a node in the backup ring. Please find attached thread dumps for both servers. We recently made a few changes to the cluster topology:

a. Added a new server in the backup datacenter and decommissioned an old server. The backup ring has only 2 servers.
b. Added a new node in the primary datacenter, which now has 4 nodes.

Is there a way we can stop this compaction? Since we have added a new node to this cluster, we are waiting to run cleanup on the node where compaction is hanging, and I am afraid that cleanup will not start until the compaction job finishes.

1. cass-logg02.prod2.thread_dump.out: Thread dump from the old node in the primary datacenter.
2. cass-logg03.prod1.thread_dump.out: Thread dump from the new node in the backup datacenter. This node was added recently.

Your help is much appreciated.

Thanks & Regards,
Atul Atri.

On 4 July 2018 at 21:15, atul atri <atulatri2004@xxxxxxxxx> wrote:
Hi Chris,
Thanks for the reply.

Unfortunately, our servers do not have jstack installed.
I tried the "kill -3 <PID>" option, but that does not generate a thread dump either.

Is there any other way I can generate a thread dump?

Thanks & Regards,
Atul Atri.

On 4 July 2018 at 20:32, Chris Lohfink <clohfink@xxxxxxxxx> wrote:
Can you take a thread dump (jstack) and share the state of the compaction threads? Also, check the logs for “Exception”.
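Once a dump is available, the compaction threads can be picked out mechanically. A minimal Python sketch, assuming the HotSpot jstack-style layout (a quoted thread name on the header line, then a `java.lang.Thread.State:` line and `at ...` frames) and that the compaction threads carry the `CompactionExecutor` name prefix; `compaction_threads` and the sample dump are made up for illustration:

```python
import re
from typing import List, Optional, Tuple

THREAD_HEADER = re.compile(r'^"([^"]+)"')                  # "thread name" ...
THREAD_STATE = re.compile(r"java\.lang\.Thread\.State: (\w+)")
FRAME = re.compile(r"^\s+at (\S+)")                         # stack frame line


def compaction_threads(
    dump: str, prefix: str = "CompactionExecutor"
) -> List[Tuple[str, str, str]]:
    """Return (thread name, state, top frame) for threads whose name starts with prefix."""
    results: List[Tuple[str, str, str]] = []
    name: Optional[str] = None
    state: Optional[str] = None
    top: Optional[str] = None

    def flush() -> None:
        if name and name.startswith(prefix):
            results.append((name, state or "?", top or "?"))

    for line in dump.splitlines():
        header = THREAD_HEADER.match(line)
        if header:
            flush()  # finish the previous thread before starting a new one
            name, state, top = header.group(1), None, None
            continue
        st = THREAD_STATE.search(line)
        if st and name:
            state = st.group(1)
            continue
        fr = FRAME.match(line)
        if fr and name and top is None:
            top = fr.group(1)  # keep only the topmost frame
    flush()
    return results


# Synthetic dump for illustration; frame names are placeholders.
SAMPLE_DUMP = '''\
"CompactionExecutor:1" daemon prio=1 tid=0x1 nid=0x2 runnable
   java.lang.Thread.State: RUNNABLE
\tat org.example.Placeholder.run(Placeholder.java:1)

"main" prio=5 tid=0x3 nid=0x4 waiting on condition
   java.lang.Thread.State: WAITING (parking)
\tat sun.misc.Unsafe.park(Native Method)
'''

for thread in compaction_threads(SAMPLE_DUMP):
    print(thread)
```

A compaction thread that stays RUNNABLE with the same top frame across dumps taken minutes apart suggests a slow, CPU-bound compaction rather than a blocked one; a BLOCKED or WAITING state would point elsewhere.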



On Jul 4, 2018, at 8:37 AM, atul atri <atulatri2004@xxxxxxxxx> wrote:


On one of our servers, the compaction process is hanging at 80%. It had been stuck for the last 3 days. Today we did a cluster restart (one host at a time), and it is again stuck at the same 80%. CPU usage is 100% and there seems to be no I/O issue. We are seeing the following kind of WARNING in system.log:

BatchStatement.java (line 226) Batch of prepared statements for [****, *****] is of size 7557, exceeding specified threshold of 5120 by 2437.

Other than this, there seems to be no error. I have tried to stop the compaction process, but it does not stop. The Cassandra version is 2.1.

Can someone please help us solve this issue?

Thanks & Regards,
Atul Atri.

