The safest way is indeed (as suggested by Jon) to upgrade the whole cluster as quickly as possible, and to stop all operations that could generate streaming until all nodes are running the target version.
That includes repair, topology changes (bootstraps, decommissions) and rebuilds.
You should also avoid all schema changes, as they will most probably fail partially in mixed-version clusters.
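To make that rule concrete, here is a minimal sketch of a pre-flight check you could run before kicking off any of those operations. It assumes you have already collected each node's release version into a dict by some means of your own; the function names and host names are hypothetical and not part of any Cassandra tooling.

```python
# Operations to hold back while the cluster runs mixed versions
# (the list comes straight from the advice above).
STREAMING_SENSITIVE_OPS = ["repair", "bootstrap", "decommission", "rebuild", "schema change"]


def cluster_is_homogeneous(node_versions):
    """Return True only when every node reports the same release version."""
    return len(set(node_versions.values())) == 1


def blocked_operations(node_versions):
    """Return the operations to avoid for the given host -> version mapping."""
    if cluster_is_homogeneous(node_versions):
        return []
    return list(STREAMING_SENSITIVE_OPS)


if __name__ == "__main__":
    # Hypothetical cluster caught in the middle of a rolling binary upgrade.
    versions = {"node1": "3.11.3", "node2": "3.11.3", "node3": "2.2.13"}
    print(blocked_operations(versions))
```

As long as the returned list is non-empty, the cluster is still mixed and those operations should wait.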
Run a rolling upgradesstables once the whole cluster is upgraded. You can (should?) use cstar for that operation, as it can run upgradesstables with topology awareness, leaving a quorum of replicas free of the operation at all times.
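To illustrate what "leaving a quorum of replicas free" means, here is a rough sketch of topology-aware batching. This is not cstar's actual algorithm, just a simple greedy grouping under the assumption that you know the replica set of every token range and that RF is at least 3; the node names and function names are made up for the example.

```python
def quorum(rf):
    """Quorum size for a replication factor, e.g. 2 for RF=3."""
    return rf // 2 + 1


def keeps_quorum(busy, replica_sets):
    """True if every replica set still has a quorum of nodes outside `busy`."""
    return all(
        len(rs) - len(busy & set(rs)) >= quorum(len(rs))
        for rs in replica_sets
    )


def plan_batches(replica_sets):
    """Greedily group nodes into batches that can run upgradesstables together.

    A node joins the first batch that still leaves a quorum idle in every
    replica set; otherwise it starts a new batch. With RF < 3 no node can be
    busy without breaking quorum, so each node simply ends up alone.
    """
    nodes = sorted({node for rs in replica_sets for node in rs})
    batches = []
    for node in nodes:
        for batch in batches:
            if keeps_quorum(batch | {node}, replica_sets):
                batch.add(node)
                break
        else:
            batches.append({node})
    return batches


if __name__ == "__main__":
    # Hypothetical 6-node ring with RF=3: each range lives on 3 consecutive nodes.
    ring = ["n1", "n2", "n3", "n4", "n5", "n6"]
    replica_sets = [tuple(ring[(i + k) % 6] for k in range(3)) for i in range(6)]
    for batch in plan_batches(replica_sets):
        print(sorted(batch))
```

On this example ring the sketch produces three batches of two nodes each, so only one replica of any range is busy at a time.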
As upgradesstables will use compaction slots, you could raise your number of compactors to 4 at least and use "-j 2" to have two slots used by the upgradesstables. This will leave 2 compactors available for standard compactions.
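The slot arithmetic can be sketched as below. This assumes the number of compactors is the `concurrent_compactors` setting in cassandra.yaml and that `-j` is the jobs option of `nodetool upgradesstables`; the helper name is hypothetical and it only builds the command line, it does not run anything.

```python
def upgradesstables_command(concurrent_compactors=4, jobs=2):
    """Build the nodetool argv and report how many compactors stay free.

    Assumption in this sketch: jobs must be at least 1 and strictly below
    concurrent_compactors, so standard compactions always keep a slot.
    (As far as I know, nodetool treats 0 jobs as "use all compaction
    threads", which is what we want to avoid here.)
    """
    if jobs < 1 or jobs >= concurrent_compactors:
        raise ValueError("jobs must be >= 1 and < concurrent_compactors")
    free_for_compaction = concurrent_compactors - jobs
    argv = ["nodetool", "upgradesstables", "-j", str(jobs)]
    return argv, free_for_compaction


if __name__ == "__main__":
    argv, free = upgradesstables_command(concurrent_compactors=4, jobs=2)
    print(" ".join(argv), "| compactors left for standard compactions:", free)
```

With 4 compactors and 2 jobs, that leaves the 2 slots mentioned above for standard compactions.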
Alex, another TLP guy ;)
On Tue, Oct 30, 2018 at 4:21 PM Carl Mueller <firstname.lastname@example.org> wrote:
We are about to finally embark on some version upgrades for lots of clusters, 2.1.x and 2.2.x, eventually targeting 3.11.x.
I have seen recipes that do the full binary upgrade + upgrade sstables for 1 node before moving forward, while I've seen a 2016 vote by Jon Haddad (a TLP guy) that backs doing the binary version upgrades through the cluster on a rolling basis, then doing the upgradesstables on a rolling basis.
Under what cluster conditions are streaming/node replacement precluded, that is, when are we vulnerable to a cloud provider dumping one of our nodes from under us, or to a hardware failure? We ain't Apple, but we do have 30+ node datacenters and 80-100 node clusters.
Are node replacement and streaming disabled only while there are heterogeneous Cassandra versions, or until all the sstables have been upgraded in the cluster?
My instincts tell me the best thing to do is to get all the Cassandra nodes to the same version through the cluster without the upgradesstables step, and then roll through the upgradesstables as needed. My assumption is that upgradesstables is a node-local concern that doesn't impact streaming, node replacement or other situations, since Cassandra can read old-version sstables and new sstables would simply be written in the new format.