
Re: Disk full during new node bootstrap


Hi,

Well, knowing the disk behaviour would be useful to understand whether the disk is really filling up, and why.

  • What is the disk capacity?
  • Does it actually fill up?

- If it is filling up, it might mean that your nodes are running without enough free space and that any of them could go down due to a heavy compaction at any time.

- It might also come from an imbalance in ownership (use 'nodetool status <ks>' to know that for sure) or in load (run 'df -h' or 'du -sh /path_to_cassandra/' on all nodes and compare the disk in use; a sketch of this comparison follows after this list).

- The last idea that comes to mind is that the streaming throughput might be too high or the compaction throughput too low. In that case, if your data is growing way beyond the original dataset size, it is probably because compaction can't keep up with what is being streamed. As I believe a joining node runs compaction unthrottled (https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/CompactionManager.java#L149-L150), it looks like a good idea to restrict the streaming throughput (see the sketch below).
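For the load comparison, a quick sketch, assuming SSH access to every node (the host names and the /var/lib/cassandra path are assumptions, adjust them to your layout):

    # Compare actual disk usage across nodes (hosts are placeholders).
    for h in node1 node2 node3; do
        echo "== $h =="
        ssh "$h" 'df -h /var/lib/cassandra && du -sh /var/lib/cassandra/data'
    done

Comparing the 'Load' column of 'nodetool status <ks>' against these numbers shows whether one group is carrying noticeably more data than the others.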
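And for the streaming throttle itself, a minimal sketch (the 30 Mb/s value is illustrative, not a recommendation):

    # Throttle outbound streaming on the nodes feeding the bootstrap
    # (value in Mb/s; the 2.0 default is 200; applies without a restart).
    nodetool setstreamthroughput 30

    # Persistent equivalent in cassandra.yaml:
    #   stream_throughput_outbound_megabits_per_sec: 30

Once the node has joined, set it back to the default so regular streaming operations (repairs, future bootstraps) are not slowed down.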

C*heers,
-----------------------
Alain Rodriguez - @arodream - alain@xxxxxxxxxxxxxxxxx
France

The Last Pickle - Apache Cassandra Consulting



2017-02-04 11:11 GMT+01:00 techpyaasa . <techpyaasa@xxxxxxxxx>:
Cluster Information:
Name: xxxx Cluster
Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Schema versions:
397560b8-7245-3903-8828-60a97e5be4aa: [xxx.xxx.xxx.75, xxx.xxx.xxx.134, xxx.xxx.xxx.192, xxx.xxx.xxx.132, xxx.xxx.xxx.133, xxx.xxx.xxx.115, xxx.xxx.xxx.78, xxx.xxx.xxx.123, xxx.xxx.xxx.70, xxx.xxx.xxx.167, xxx.xxx.xxx.168, xxx.xxx.xxx.169, xxx.xxx.xxx.146, xxx.xxx.xxx.145, xxx.xxx.xxx.144, xxx.xxx.xxx.143, xxx.xxx.xxx.140, xxx.xxx.xxx.139, xxx.xxx.xxx.126, xxx.xxx.xxx.136, xxx.xxx.xxx.135, xxx.xxx.xxx.191, xxx.xxx.xxx.133, xxx.xxx.xxx.79, xxx.xxx.xxx.131, xxx.xxx.xxx.77]

ReleaseVersion: 2.0.17
---------------------------------------------------------------
Note: Ownership information does not include topology; for complete information, specify a keyspace
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load       Tokens  Owns   Host ID                               Rack
UN  xxx.xxx.xxx.168  847.62 GB  256     4.1%   2302491a-a8b5-4aa6-bda7-f1544064c4e3  GRP3
UN  xxx.xxx.xxx.169  819.64 GB  256     4.2%   d5e5bc3d-38de-4043-abca-08ac09f29a46  GRP1
UN  xxx.xxx.xxx.75   874.69 GB  256     4.1%   fdd32c67-3cea-4174-b59b-c1ea14e1a334  GRP1
UN  xxx.xxx.xxx.78   850.07 GB  256     4.0%   a8332f22-a75f-4d7c-8b71-7284f6fe208f  GRP3
UN  xxx.xxx.xxx.126  836.88 GB  256     4.0%   71be90d8-97db-4155-b4fc-da59d78331ef  GRP1
UN  xxx.xxx.xxx.191  751.08 GB  256     4.1%   a9023df8-a8b3-484b-a03d-0fdea35007bd  GRP3
UN  xxx.xxx.xxx.192  888.03 GB  256     3.8%   f4ad42d5-cee0-4d3e-a4f1-7cdeb5d7390a  GRP2
UN  xxx.xxx.xxx.132  688.86 GB  256     3.8%   6a465101-29e7-4792-8269-851200a70023  GRP2
UN  xxx.xxx.xxx.133  855.66 GB  256     4.0%   751ce15a-10f1-44cf-9357-04da7e21b511  GRP2
UN  xxx.xxx.xxx.134  869.32 GB  256     3.7%   bdd166fd-95a7-4119-bbae-f05fe26ddb01  GRP3
UN  xxx.xxx.xxx.70   792.15 GB  256     4.2%   2b6b642d-6842-47d4-bdc1-95226fd2b85d  GRP1
UN  xxx.xxx.xxx.167  732.82 GB  256     4.0%   45f6684f-d6a0-4cba-875c-9db459646545  GRP2
Datacenter: DC2
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load       Tokens  Owns   Host ID                               Rack
UN  xxx.xxx.xxx.136   762.85 GB  256     3.9%   ebc67006-80e6-40de-95a9-79b90b254750  GRP3
UN  xxx.xxx.xxx.139   807.68 GB  256     4.5%   e27ca655-3186-417f-a927-cab63dc34248  GRP3
UN  xxx.xxx.xxx.140   771.94 GB  256     4.1%   26531fe3-7f2c-4ce5-a41b-5c79c5976141  GRP1
UN  xxx.xxx.xxx.77    505.54 GB  256     3.5%   d1ad7194-d7fb-47cf-92d1-4206ff12f8aa  GRP1
UN  xxx.xxx.xxx.143   900.14 GB  256     4.1%   74e1009c-0506-4d7a-b517-d37182385a21  GRP2
UJ  xxx.xxx.xxx.79    636.08 GB  256     ?      91b64758-67c2-48e7-86eb-43f7509c2287  GRP3
UN  xxx.xxx.xxx.131   788 GB     256     4.0%   5b27a680-d7c0-4ead-85cc-c295b83eda5b  GRP2
UN  xxx.xxx.xxx.133   898.27 GB  256     3.8%   5b24f211-678e-4614-bd59-8ea13aa2397c  GRP1
UN  xxx.xxx.xxx.135   868.14 GB  256     4.1%   8c2b5d1c-e43e-41f4-b21e-58cb525cfbab  GRP2
UN  xxx.xxx.xxx.123   848.86 GB  256     4.0%   87cfff8f-1cfc-44c5-b608-5b40d6894182  GRP3
UN  xxx.xxx.xxx.144   830.99 GB  256     3.6%   31b8cf4b-dd08-4ee6-8c25-90ad6dadbdc4  GRP3
UN  xxx.xxx.xxx.145   832.22 GB  256     4.3%   dd8c97df-7ec9-436b-8b29-4c25a8a89184  GRP1
UN  xxx.xxx.xxx.146   830.02 GB  256     4.2%   88b52574-8569-4d58-ba43-8fe1c742eea4  GRP2
UN  xxx.xxx.xxx.115   878.5 GB   256     3.9%   20817b9e-b761-437e-aa7b-49e90483c69f  GRP1




All 5 keyspaces use 'class': 'NetworkTopologyStrategy', with replication 'DC1': '3', 'DC2': '3'.
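The exact definitions can be pulled straight from the schema, e.g. (keyspace name and host are placeholders):

    echo "DESCRIBE KEYSPACE myks;" | cqlsh xxx.xxx.xxx.75

which prints something like:

    CREATE KEYSPACE myks WITH replication =
      {'class': 'NetworkTopologyStrategy', 'DC1': '3', 'DC2': '3'};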

On Sat, Feb 4, 2017 at 3:22 PM, Alexander Dejanovski <alex@xxxxxxxxxxxxxxxxx> wrote:
Hi,

could you share the following information with us?

- "nodetool status" output
- Keyspace definitions (we need to check the replication strategy you're using on all keyspaces)
- Specifics about what you're calling "groups" in a DC. Are these racks?

Thanks

On Sat, Feb 4, 2017 at 10:41 AM laxmikanth sadula <laxmikanth524@xxxxxxxxx> wrote:
Yes, same number of tokens...
256
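(This can be double-checked on each node with the following; the yaml path varies by install:

    grep '^num_tokens' /etc/cassandra/cassandra.yaml
)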

On Sat, Feb 4, 2017 at 11:56 AM, Jonathan Haddad <jon@xxxxxxxxxxxxx> wrote:
Are you using the same number of tokens on the new node as the old ones?

On Fri, Feb 3, 2017 at 8:31 PM techpyaasa . <techpyaasa@xxxxxxxxx> wrote:
Hi,

We are using c* 2.0.17, 2 DCs, RF=3.

When I tried to add a new node to one group in a DC, the disk filled up. Can someone please tell me the best way to resolve this?

Run compaction on the nodes in that group (the group the new node joins, since data streams to the new node from the nodes of the group it is added to)?

OR

Bootstrap/add 2 (or more) nodes at a time?


Please suggest the best way to fix this.

Thanks in advance

Techpyaasa




--
Regards,
Laxmikanth
99621 38051

--
-----------------
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting