Re: cqlsh COPY ... TO ... doesn't work if one node down


Hi,

The error shows that cqlsh failed to connect to the down node.
So you should investigate why that happened.

Although you passed a different node ('10.0.0.154') to the cqlsh command,
my guess is that the down node was present in the connection pool, so a connection to it was attempted.
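
The retry delays in the quoted log double from 2.0 to 512.0 seconds, which would explain why the command seems to hang for so long. Below is a minimal sketch of that arithmetic, using only the delay values visible in the log. It does not call the driver, and the helper name doubling_delays is made up for illustration:

```python
# Delays copied from the "scheduling retry in N seconds" lines of the log.
logged_delays = [2.0, 4.0, 8.0, 16.0, 32.0, 64.0, 128.0, 256.0, 512.0]


def doubling_delays(first, count):
    """Yield `count` delays, each twice the previous one (hypothetical helper)."""
    delay = first
    for _ in range(count):
        yield delay
        delay *= 2


schedule = list(doubling_delays(2.0, len(logged_delays)))
total_wait = sum(schedule)

print(schedule == logged_delays)  # the logged delays follow a plain doubling pattern
print(total_wait)                 # scheduled waiting only, in seconds
```

The total counts only the scheduled waits (about 17 minutes); each failed connection attempt adds its own timeout on top of that.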

Ideally, the unavailability of one replica out of 5 should not affect the availability of data.
Also, the stack trace is about a 'cqlsh' connection error.
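
To illustrate the availability point, here is a rough sketch of how many replicas each common consistency level needs with RF=5 and 4 live replicas. The function name required_replicas is hypothetical, and only three levels are covered:

```python
def required_replicas(level, rf):
    """Number of replicas that must respond for a given consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return rf // 2 + 1  # majority of the replicas
    if level == "ALL":
        return rf
    raise ValueError("unsupported consistency level: %s" % level)


rf = 5             # replication factor of keyspace X
live_replicas = 4  # one of the five nodes is down

satisfiable = {
    level: live_replicas >= required_replicas(level, rf)
    for level in ("ONE", "QUORUM", "ALL")
}
print(satisfiable)
```

With one node down, reads at ONE or QUORUM can still be served; only ALL cannot. As far as I know cqlsh defaults to ONE, so the read itself should not be the problem.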

I think once you get your connection sorted, the COPY should work as usual.

Regards,
Anup
     

On 30 June 2018 at 15:05, Dmitry Simonov <dimmoborgir@xxxxxxxxx> wrote:
Hello!

I have a Cassandra cluster with 5 nodes.
There is a (relatively small) keyspace X with RF=5.
One node has gone down.

Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load       Tokens       Owns (effective)  Host ID                               Rack
UN  10.0.0.82   253.64 MB  256          100.0%            839bef9d-79af-422c-a21f-33bdcf4493c1  rack1
UN  10.0.0.154  255.92 MB  256          100.0%            ce23f3a7-67d2-47c0-9ece-7a5dd67c4105  rack1
UN  10.0.0.76   461.26 MB  256          100.0%            c8e18603-0ede-43f0-b713-3ff47ad92323  rack1
UN  10.0.0.94   575.78 MB  256          100.0%            9a324dbc-5ae1-4788-80e4-d86dcaae5a4c  rack1
DN  10.0.0.47   ?          256          100.0%            7b628ca2-4e47-457a-ba42-5191f7e5374b  rack1

I am trying to export some data using COPY TO, but it fails after long retries.
Why does it fail?
How can I make a copy?
There must be 4 copies of each row on the other (live) replicas.

cqlsh 10.0.0.154 -e "COPY X.Y TO 'backup/X.Y' WITH NUMPROCESSES=1"

Using 1 child processes

Starting copy of X.Y with columns [key, column1, value].
2018-06-29 19:12:23,661 Failed to create connection pool for new host 10.0.0.47:
Traceback (most recent call last):
  File "/usr/lib/foobar/lib/python3.5/site-packages/cassandra/cluster.py", line 2476, in run_add_or_renew_pool
    new_pool = HostConnection(host, distance, self)
  File "/usr/lib/foobar/lib/python3.5/site-packages/cassandra/pool.py", line 332, in __init__
    self._connection = session.cluster.connection_factory(host.address)
  File "/usr/lib/foobar/lib/python3.5/site-packages/cassandra/cluster.py", line 1205, in connection_factory
    return self.connection_class.factory(address, self.connect_timeout, *args, **kwargs)
  File "/usr/lib/foobar/lib/python3.5/site-packages/cassandra/connection.py", line 332, in factory
    conn = cls(host, *args, **kwargs)
  File "/usr/lib/foobar/lib/python3.5/site-packages/cassandra/io/asyncorereactor.py", line 344, in __init__
    self._connect_socket()
  File "/usr/lib/foobar/lib/python3.5/site-packages/cassandra/connection.py", line 371, in _connect_socket
    raise socket.error(sockerr.errno, "Tried connecting to %s. Last error: %s" % ([a[4] for a in addresses], sockerr.strerror or sockerr))
OSError: [Errno None] Tried connecting to [('10.0.0.47', 9042)]. Last error: timed out
2018-06-29 19:12:23,665 Host 10.0.0.47 has been marked down
2018-06-29 19:12:29,674 Error attempting to reconnect to 10.0.0.47, scheduling retry in 2.0 seconds: [Errno None] Tried connecting to [('10.0.0.47', 9042)]. Last error: timed out
2018-06-29 19:12:36,684 Error attempting to reconnect to 10.0.0.47, scheduling retry in 4.0 seconds: [Errno None] Tried connecting to [('10.0.0.47', 9042)]. Last error: timed out
2018-06-29 19:12:45,696 Error attempting to reconnect to 10.0.0.47, scheduling retry in 8.0 seconds: [Errno None] Tried connecting to [('10.0.0.47', 9042)]. Last error: timed out
2018-06-29 19:12:58,716 Error attempting to reconnect to 10.0.0.47, scheduling retry in 16.0 seconds: [Errno None] Tried connecting to [('10.0.0.47', 9042)]. Last error: timed out
2018-06-29 19:13:19,756 Error attempting to reconnect to 10.0.0.47, scheduling retry in 32.0 seconds: [Errno None] Tried connecting to [('10.0.0.47', 9042)]. Last error: timed out
2018-06-29 19:13:56,834 Error attempting to reconnect to 10.0.0.47, scheduling retry in 64.0 seconds: [Errno None] Tried connecting to [('10.0.0.47', 9042)]. Last error: timed out
2018-06-29 19:15:05,887 Error attempting to reconnect to 10.0.0.47, scheduling retry in 128.0 seconds: [Errno None] Tried connecting to [('10.0.0.47', 9042)]. Last error: timed out
2018-06-29 19:17:18,982 Error attempting to reconnect to 10.0.0.47, scheduling retry in 256.0 seconds: [Errno None] Tried connecting to [('10.0.0.47', 9042)]. Last error: timed out
2018-06-29 19:21:40,064 Error attempting to reconnect to 10.0.0.47, scheduling retry in 512.0 seconds: [Errno None] Tried connecting to [('10.0.0.47', 9042)]. Last error: timed out
<stdin>:1:(4, 'Interrupted system call')
IOError:
IOError:
IOError:
IOError:
IOError:


--
Best Regards,
Dmitry Simonov



--

Anup Shirolkar

Consultant

+61 420 602 338


    
