osdir.com

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Testing 4.0 upgrading


Hi Jason,

Thanks for responding.

I have created two jiras for these things: CASSANDRA-14841 and 
CASSANDRA-14842.

I understand the initial 4.0->3.x connection problem, since the incoming 
connection from the old node isn't accepted the 4.0 node will never know 
that it tries to connect to a old node and continuous to try to connect 
on the storage_port instead on the ssl_storage_port.

I will enable SSL debugging and activate wire tracing and see if I can 
find out more.

I can also add that I have upgraded all nodes in the cluster to 4.0 and 
once a node was upgraded to 4.0 it started talking to the other 4.0 node 
as expected. I had data in the cluster from before the upgrade and after 
doing "nodetool upgradesstables" there where no issues restarting the 
clients.

Now I will reinstall the old version and do the upgrade again and 
hopefully I can have more information soon.

/Tommy


On 2018-10-23 19:12, Jason Brown wrote:
> Hi Tommy,
>
> Thank you for taking the time to start kicking the tires on the upgrade to
> 4.0. It looks like you've found two bugs:
>
> 1) "Unknown column coordinator_port during deserialization" (reported on
> 3.x nodes)
>
> - looks like the 4.0 node isn't filtering out a column from a system table
> that 3.0 doesn't know about. Most likely due to CASSANDRA-7544. Can you
> open a JIRA for this, and tag @aweisberg?
>
> 2) SSL connection problems
>
> I unserstand the 4.0 -> 3.X connection problem, and documented it at [1] in
> MessagingService. TL;DR we don't know the version of a peer when restarting
> and need to wait for that peer to connect to the local node and pass it's
> correct messaging version (if the local node cannot interpret the handhsake
> from the peer). However, why for the inbound connection to the 4.0 node it
> is seeing SSLv2 is unclear. Can you open a separate JIRA, and we'll go from
> there? In the meantime, maybe enable the JDK's SSL debugging [2] on the 3.x
> node to see exactly what it is trying to do? Also, you can enable wire
> tracing on the 4.0 node by setting this value to true [3] and recompiling.
> We can followup further in the jira.
>
> Thanks!
>
> -Jason
>
> [1]
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/MessagingService.java#L1668
> [2]
> https://docs.oracle.com/javase/8/docs/technotes/guides/security/jsse/ReadDebug.html
> [3]
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/async/NettyFactory.java#L84
>
> On Tue, Oct 23, 2018 at 2:44 AM Tommy Stendahl <tommy.stendahl@xxxxxxxxxxxx>
> wrote:
>
>> Hi,
>>
>> I have been testing upgrade to 4.0, I started out with a cluster with
>> 3.0.15 and server encryption enabled. Due to some issues in my
>> environment I did upgrade one of the nodes to 3.11.3, I think this
>> turned out to be a good thing since I could observer the behaviour of
>> upgrading from both 3.0.15 and 3.11.3 at the same time.
>>
>> At first I didn't have much success at all, it look like found multiple
>> issues mostly with server encryption so I decided to simplify thing and
>> disabled server encryption.
>>
>> So with server encryption disabled the upgrade was working ok, what I
>> did notice was exceptions in the 3.0.15 and 3.11.3 nodes once the first
>> 4.0 node started.
>>
>> 3.0.15 exception:
>> 2018-10-22T11:05:38.883+0200 ERROR
>> [MessagingService-Incoming-/10.216.193.244] CassandraDaemon.java:223
>> Exception in thread Thread[MessagingService-Incoming-/10.216.193.244
>> ,5,main]
>> java.lang.RuntimeException: Unknown column coordinator_port during
>> deserialization
>>           at
>> org.apache.cassandra.db.Columns$Serializer.deserialize(Columns.java:433)
>> ~[apache-cassandra-3.0.15.jar:3.0.15]
>>           at
>> org.apache.cassandra.db.filter.ColumnFilter$Serializer.deserialize(ColumnFilter.java:447)
>>
>> ~[apache-cassandra-3.0.15.jar:3.0.15]
>>           at
>> org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:647)
>>
>> ~[apache-cassandra-3.0.15.jar:3.0.15]
>>           at
>> org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:584)
>>
>> ~[apache-cassandra-3.0.15.jar:3.0.15]
>>           at
>> org.apache.cassandra.io.ForwardingVersionedSerializer.deserialize(ForwardingVersionedSerializer.java:50)
>>
>> ~[apache-cassandra-3.0.15.jar:3.0.15]
>>           at org.apache.cassandra.net.MessageIn.read(MessageIn.java:98)
>> ~[apache-cassandra-3.0.15.jar:3.0.15]
>>           at
>> org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:201)
>>
>> ~[apache-cassandra-3.0.15.jar:3.0.15]
>>           at
>> org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:178)
>>
>> ~[apache-cassandra-3.0.15.jar:3.0.15]
>>           at
>> org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:92)
>>
>> ~[apache-cassandra-3.0.15.jar:3.0.15]
>>
>> 3.11.3 exception:
>> 2018-10-22T11:12:05.060+0200 ERROR
>> [MessagingService-Incoming-/10.216.193.244] CassandraDaemon.java:228
>> Exception in thread Thread[MessagingService-Incoming-/10.216.193.244
>> ,5,main]
>> java.lang.RuntimeException: Unknown column coordinator_port during
>> deserialization
>>           at
>> org.apache.cassandra.db.Columns$Serializer.deserialize(Columns.java:452)
>> ~[apache-cassandra-3.11.3.jar:3.11.3]
>>           at
>> org.apache.cassandra.db.filter.ColumnFilter$Serializer.deserialize(ColumnFilter.java:482)
>>
>> ~[apache-cassandra-3.11.3.jar:3.11.3]
>>           at
>> org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:760)
>>
>> ~[apache-cassandra-3.11.3.jar:3.11.3]
>>           at
>> org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:697)
>>
>> ~[apache-cassandra-3.11.3.jar:3.11.3]
>>           at
>> org.apache.cassandra.io.ForwardingVersionedSerializer.deserialize(ForwardingVersionedSerializer.java:50)
>>
>> ~[apache-cassandra-3.11.3.jar:3.11.3]
>>           at org.apache.cassandra.net.MessageIn.read(MessageIn.java:123)
>> ~[apache-cassandra-3.11.3.jar:3.11.3]
>>           at
>> org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:192)
>>
>> ~[apache-cassandra-3.11.3.jar:3.11.3]
>>           at
>> org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:180)
>>
>> ~[apache-cassandra-3.11.3.jar:3.11.3]
>>           at
>> org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:94)
>>
>> ~[apache-cassandra-3.11.3.jar:3.11.3]
>>
>> I got one or two of these exceptions every minute, I could not see that
>> this was causing any problems but I don't really like exceptions in the
>> log. Should we expect exceptions like this during an upgrade or is this
>> a problem we should look in to?
>>
>> So now I enabled server encryption again, with this configuration:
>> storage_port: 12700
>> ssl_storage_port: 12701
>> server_encryption_options:
>>       # set to true for allowing secure incoming connections
>>       enabled: true
>>       # If enabled and optional are both set to true, encrypted and
>> unencrypted connections are handled on the storage_port
>>       optional: false
>>       # if enabled, will open up an encrypted listening socket on
>> ssl_storage_port. Should be used
>>       # during upgrade to 4.0; otherwise, set to false.
>>       enable_legacy_ssl_storage_port: true
>>       # on outbound connections, determine which type of peers to
>> securely connect to. 'enabled' must be set to true.
>>       internode_encryption: all
>>       keystore: /usr/share/cassandra/.ssl/cil-intern/server/keystore.jks
>>       keystore_password: '*********'
>>       truststore: /usr/share/cassandra/.ssl/cil-intern/server/truststore.jks
>>       truststore_password: '**********'
>>       # More advanced defaults below:
>>       protocol: TLSv1.2
>>       # store_type: JKS
>>       cipher_suites:
>>
>> [TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_RSA_WITH_AES_128_CBC_SHA]
>>       # require_client_auth: false
>>       # require_endpoint_verification: false
>>
>> As far as I understand this configuration is correct but the 4.0 node
>> can't contact the old nodes. I have tried with the default protocol
>> setting also but it makes no difference. The behaviour is different in
>> 3.0.15 and 3.11.3, in 3.0.15 I get this exception:
>> 2018-10-22T11:57:44.366+0200 ERROR
>> [MessagingService-NettyInbound-Thread-3-8]
>> InboundHandshakeHandler.java:300 Failed to properly handshake with peer
>> /10.216.193.246:58296. Closing the channel.
>> io.netty.handler.codec.DecoderException:
>> javax.net.ssl.SSLHandshakeException: SSLv2Hello is disabled
>>           at
>>
>> io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:459)
>>           at
>>
>> io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265)
>>           at
>>
>> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
>>           at
>>
>> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
>>           at
>>
>> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
>>           at
>>
>> io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434)
>>           at
>>
>> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
>>           at
>>
>> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
>>           at
>>
>> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965)
>>           at
>>
>> io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:808)
>>           at
>> io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:417)
>>           at
>> io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:317)
>>           at
>>
>> io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:884)
>>           at
>>
>> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>>           at java.lang.Thread.run(Thread.java:748)
>> Caused by: javax.net.ssl.SSLHandshakeException: SSLv2Hello is disabled
>>           at
>> sun.security.ssl.InputRecord.handleUnknownRecord(InputRecord.java:637)
>>           at sun.security.ssl.InputRecord.read(InputRecord.java:527)
>>           at
>> sun.security.ssl.EngineInputRecord.read(EngineInputRecord.java:382)
>>           at
>> sun.security.ssl.SSLEngineImpl.readRecord(SSLEngineImpl.java:962)
>>           at
>> sun.security.ssl.SSLEngineImpl.readNetRecord(SSLEngineImpl.java:907)
>>           at sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:781)
>>           at javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:624)
>>           at
>> io.netty.handler.ssl.SslHandler$SslEngineType$3.unwrap(SslHandler.java:294)
>>           at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1275)
>>           at
>> io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1177)
>>           at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1221)
>>           at
>>
>> io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:489)
>>           at
>>
>> io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:428)
>>           ... 14 common frames omitted
>>
>> and in the 4.0 debug log I see this related to the 3.0.15 node:
>> 2018-10-22T13:25:34.414+0200 DEBUG [ScheduledFastTasks:1]
>> OutboundMessagingConnection.java:259 connection attempt 788 to
>> 10.216.193.246:12700 (GOSSIP)
>> 2018-10-22T13:25:34.414+0200 DEBUG [ScheduledFastTasks:1]
>> NettyFactory.java:321 creating outbound bootstrap to peer
>> 10.216.193.246:12700, compression: true, encryption: enabled (jdk),
>> coalesce: DISABLED, protocolV
>> ersion: 12
>> 2018-10-22T13:25:34.526+0200 DEBUG [GossipTasks:1]
>> OutboundMessagingConnection.java:259 connection attempt 789 to
>> 10.216.193.246:12700 (GOSSIP)
>> 2018-10-22T13:25:34.527+0200 DEBUG [GossipTasks:1] NettyFactory.java:321
>> creating outbound bootstrap to peer 10.216.193.246:12700, compression:
>> true, encryption: enabled (jdk), coalesce: DISABLED, protocolVersion:
>>    12
>>
>> In the 3.11.3 node I don't see any errors, I do see this:
>> 2018-10-22T13:20:36.270+0200  INFO [HANDSHAKE-/10.216.193.244]
>> OutboundTcpConnection.java:561 Handshaking version with /10.216.193.244
>>
>> and in the 4.0 node I find this:
>> 2018-10-22T13:20:36.270+0200  INFO
>> [MessagingService-NettyInbound-Thread-3-1]
>> InboundHandshakeHandler.java:146 connection from peer
>> /10.216.193.243:54262, protocol = TLSv1.2, cipher suite =
>> TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
>> 2018-10-22T13:20:41.147+0200 DEBUG [main]
>> OutboundMessagingConnection.java:259 connection attempt 0 to
>> 10.216.193.243:12700 (GOSSIP)
>> 2018-10-22T13:20:41.147+0200 DEBUG [main] NettyFactory.java:321 creating
>> outbound bootstrap to peer 10.216.193.243:12700, compression: true,
>> encryption: enabled (jdk), coalesce: DISABLED, protocolVersion: 11
>> 2018-10-22T13:20:41.248+0200 DEBUG [ScheduledFastTasks:1]
>> OutboundMessagingConnection.java:259 connection attempt 1 to
>> 10.216.193.243:12700 (GOSSIP)
>> 2018-10-22T13:20:41.248+0200 DEBUG [ScheduledFastTasks:1]
>> NettyFactory.java:321 creating outbound bootstrap to peer
>> 10.216.193.243:12700, compression: true, encryption: enabled (jdk),
>> coalesce: DISABLED, protocolV
>> ersion: 11
>>
>> It looks like the 4.0 node doesn’t accept incoming messages from the old
>> nodes on the ssl_storage_port and continues to try to contact the old
>> nodes on the storage_port that are disabled in the old nodes. I don't
>> really know what to make of this, any ideas or suggestions?
>>
>> Regards,
>> Tommy
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@xxxxxxxxxxxxxxxxxxxx
>> For additional commands, e-mail: dev-help@xxxxxxxxxxxxxxxxxxxx
>>
>>



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@xxxxxxxxxxxxxxxxxxxx
For additional commands, e-mail: dev-help@xxxxxxxxxxxxxxxxxxxx