Re: [DISCUSS] changing default token behavior for 4.0


Only that it makes it easier to spin up a cluster.

I'm for removing it entirely as well, however I think we should keep it
around at least until the next major just as a safety precaution until the
algorithm is properly battle tested.

This is not a strongly held opinion though, I'm just foreseeing the "new
defaults don't work for my edge case" problem.

On Sun., 23 Sep. 2018, 04:12 Jonathan Haddad, <jon@xxxxxxxxxxxxx> wrote:

> Is there a use case for random allocation? How does it help with testing? I
> can’t see a reason to keep it around.
>
> On Sat, Sep 22, 2018 at 3:06 AM kurt greaves <kurt@xxxxxxxxxxxxxxx> wrote:
>
> > +1. I've been making a case for this for some time now, and it was actually
> > a focus of my talk last week. I'd be very happy to get this into 4.0.
> >
> > We've tested various num_tokens with the algorithm on various sized
> > clusters and we've found that typically 16 works best. With lower numbers
> > we found that balance is good initially but as a cluster gets larger you
> > have some problems. E.g. we saw that on a 60 node cluster with 8 tokens per
> > node we were seeing a difference of 22% in token ownership, but on a <=12
> > node cluster a difference of only 12%. 16 tokens on the other hand wasn't
> > perfect but generally gave a better balance regardless of cluster size, at
> > least up to 100 nodes. TBH we should probably do some proper testing and
> > record all the results for this before we pick a default (I'm happy to do
> > this - I think we can use the original testing script for this).
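
For anyone who wants to reproduce this kind of measurement, here is a minimal
Python sketch of how an ownership spread could be computed. It is not the
testing script mentioned above. It assigns tokens at random instead of running
the allocation algorithm, it only looks at primary ranges (no replication), and
the names `ownership` and `spread` as well as the metric itself (max minus min,
relative to the mean) are assumptions made for illustration. The absolute
numbers will therefore not match the 22% and 12% figures quoted above.

```python
import random

RING_SIZE = 2 ** 64  # size of the Murmur3 token space


def ownership(num_nodes, num_tokens, seed=0):
    """Fraction of the ring owned by each node (primary ranges only).

    Assumption: tokens are picked uniformly at random, which stands in for
    whatever allocator is being evaluated.
    """
    rng = random.Random(seed)
    tokens = sorted(
        (rng.randrange(RING_SIZE), node)
        for node in range(num_nodes)
        for _ in range(num_tokens)
    )
    owned = [0] * num_nodes
    # A token owns the range (previous token, token]; wrap around the ring
    # by starting from the last token shifted back by one full ring.
    prev = tokens[-1][0] - RING_SIZE
    for token, node in tokens:
        owned[node] += token - prev
        prev = token
    return [o / RING_SIZE for o in owned]


def spread(fractions):
    """Gap between the most and least loaded node, relative to the mean."""
    mean = sum(fractions) / len(fractions)
    return (max(fractions) - min(fractions)) / mean


if __name__ == "__main__":
    for nodes in (12, 60, 100):
        for vnodes in (4, 8, 16, 256):
            pct = 100 * spread(ownership(nodes, vnodes))
            print(f"{nodes:>3} nodes, {vnodes:>3} tokens: spread {pct:.0f}%")
```

Swapping the random assignment for the output of the real allocator would make
the results comparable with the ones reported in this thread.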
> >
> > But anyway, I'd say Jon is on the right track. Personally how I'd like to
> > see it is that we:
> >
> >    1. Change allocate_tokens_for_keyspace to allocate_tokens_for_rf in the
> >    same way that DSE does it, allowing a user to specify an RF to allocate
> >    from, and allowing multiple DCs.
> >    2. Add a new boolean property random_token_allocation, defaulting to
> >    false.
> >    3. Make allocate_tokens_for_rf default to *unset**.
> >    4. Make allocate_tokens_for_rf *required*** if num_tokens > 1 and
> >    random_token_allocation != true.
> >    5. Default num_tokens to 16 (or whatever we find appropriate).
> >
> > * I think setting a default is asking for trouble. When people are going to
> > add new DCs/nodes we don't want to risk them adding a node with the wrong
> > RF. I think it's safe to say that a user should have to think about this
> > before they spin up their cluster.
> > ** Following on from the above, it should be required to be set so that we
> > don't have people accidentally using random allocation. I think we should
> > really be aiming to get rid of random allocation completely, but provide a
> > new property to enable it for backwards compatibility (also for testing).
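
The five rules in the list above can be sketched as a small startup check.
This is only an illustration in Python, not Cassandra code. The property names
allocate_tokens_for_rf and random_token_allocation are taken from the proposal,
while `TokenConfig` and `validate` are hypothetical names invented for this
example, and the default of 16 is the tentative value from item 5.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TokenConfig:
    """Hypothetical holder for the proposed settings."""
    num_tokens: int = 16                          # item 5 (tentative default)
    allocate_tokens_for_rf: Optional[int] = None  # item 3: unset by default
    random_token_allocation: bool = False         # item 2: defaults to false


def validate(cfg: TokenConfig) -> None:
    """Reject configurations that would silently fall back to random tokens."""
    if cfg.num_tokens < 1:
        raise ValueError("num_tokens must be at least 1")
    # Item 4: with vnodes, an RF is required unless the operator has
    # explicitly opted in to random allocation.
    if (cfg.num_tokens > 1
            and not cfg.random_token_allocation
            and cfg.allocate_tokens_for_rf is None):
        raise ValueError(
            "allocate_tokens_for_rf must be set when num_tokens > 1 "
            "and random_token_allocation is not true"
        )


if __name__ == "__main__":
    validate(TokenConfig(allocate_tokens_for_rf=3))     # accepted
    validate(TokenConfig(random_token_allocation=True)) # accepted, opt-in
    try:
        validate(TokenConfig())                         # all defaults
    except ValueError as err:
        print("rejected:", err)
```

With every value left at its default the check fails, which is the point of
the first footnote: the operator has to pick an RF before the node starts.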
> >
> > It's worth noting that a smaller number of tokens *theoretically* increases
> > the time for replacement/rebuild, so if we're considering QUORUM
> > availability with vnodes there's an argument against having a very low
> > num_tokens. I think it's better to utilise NTS and racks to reduce the
> > chance of a QUORUM outage over banking on having a lower number of tokens,
> > as with just a low number of tokens, unless you go all the way to 1, you are
> > just relying on luck that 2 nodes don't overlap. Guess what I'm saying is
> > that I think we should be choosing a num_tokens that gives the best
> > distribution for most cluster sizes rather than choosing one that
> > "decreases" the probability of an outage.
> >
> > Also I think we should continue using CASSANDRA-13701 to track this. TBH I
> > think in general we should be a bit better at searching for and using
> > existing tickets...
> >
> > On Sat, 22 Sep 2018 at 18:13, Stefan Podkowinski <spod@xxxxxxxxxx> wrote:
> >
> > > There have already been some discussions on this here:
> > > https://issues.apache.org/jira/browse/CASSANDRA-13701
> > >
> > > The mentioned blocker there on the token allocation shouldn't exist
> > > anymore. Although it would be good to get more feedback on it, in case
> > > we want to enable it by default, along with new defaults for number of
> > > tokens.
> > >
> > >
> > > On 22.09.18 06:30, Dinesh Joshi wrote:
> > > > Jon, thanks for starting this thread!
> > > >
> > > > I have created CASSANDRA-14784 to track this.
> > > >
> > > > Dinesh
> > > >
> > > > >> On Sep 21, 2018, at 9:18 PM, Sankalp Kohli <kohlisankalp@xxxxxxxxx> wrote:
> > > >>
> > > > >> Putting it on JIRA is to make sure someone is assigned to it and it is
> > > > >> tracked. Changes should be discussed over ML like you are saying.
> > > >>
> > > > >> On Sep 21, 2018, at 21:02, Jonathan Haddad <jon@xxxxxxxxxxxxx> wrote:
> > > >>
> > > > >>>> We should create a JIRA to find what other defaults we need to revisit.
> > > > >>> Changing a default is a pretty big deal, I think we should discuss any
> > > > >>> changes to defaults here on the ML before moving it into JIRA. It's nice
> > > > >>> to get a bit more discussion around the change than what happens in JIRA.
> > > >>>
> > > > >>> We (TLP) did some testing on 4 tokens and found it to work surprisingly
> > > > >>> well. It wasn't particularly formal, but we verified the load stays
> > > > >>> pretty even with only 4 tokens as we added nodes to the cluster. Higher
> > > > >>> token count hurts availability by increasing the number of nodes any
> > > > >>> given node is a neighbor with, meaning any 2 nodes that fail have an
> > > > >>> increased chance of downtime when using QUORUM. In addition, with the
> > > > >>> recent streaming optimization it seems lower token counts will give a
> > > > >>> greater chance of a node streaming entire sstables (with LCS), meaning
> > > > >>> we'll do a better job with node density out of the box.
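
The neighbor argument above can be checked with a small simulation. The Python
sketch below is a rough model under stated assumptions, not a description of
Cassandra's replication code: tokens are placed at random, replicas are the
next rf distinct nodes clockwise on the ring (rack-unaware,
SimpleStrategy-like), and `neighbor_counts` is a hypothetical helper name. Two
nodes count as neighbors if they appear together in at least one replica set,
since losing both would then cost QUORUM for that range at RF 3.

```python
import random


def neighbor_counts(num_nodes, num_tokens, rf=3, seed=0):
    """For each node, count the distinct other nodes it shares a replica set with."""
    rng = random.Random(seed)
    # Assumption: random token placement on a unit ring.
    ring = sorted(
        (rng.random(), node)
        for node in range(num_nodes)
        for _ in range(num_tokens)
    )
    owners = [node for _, node in ring]
    size = len(owners)
    neighbors = [set() for _ in range(num_nodes)]
    for start in range(size):
        # Walk clockwise from this token until rf distinct nodes are found.
        replicas = []
        step = 0
        while len(replicas) < rf and step < size:
            node = owners[(start + step) % size]
            if node not in replicas:
                replicas.append(node)
            step += 1
        for a in replicas:
            neighbors[a].update(b for b in replicas if b != a)
    return [len(s) for s in neighbors]


if __name__ == "__main__":
    nodes = 30
    for vnodes in (1, 4, 16, 256):
        counts = neighbor_counts(nodes, vnodes)
        avg = sum(counts) / nodes
        print(f"{vnodes:>3} tokens: {avg:.1f} neighbors per node on average")
```

With a single token per node every node has exactly 4 neighbors at RF 3, and
the count grows with the token count until each node neighbors most of the
cluster.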
> > > >>>
> > > > >>> Next week I can try to put together something a little more convincing.
> > > > >>> Weekend time.
> > > >>>
> > > >>> Jon
> > > >>>
> > > >>>
> > > > >>> On Fri, Sep 21, 2018 at 8:45 PM sankalp kohli <kohlisankalp@xxxxxxxxx>
> > > > >>> wrote:
> > > >>>
> > > >>>> +1 to lowering it.
> > > > >>>> Thanks Jon for starting this. We should create a JIRA to find what other
> > > > >>>> defaults we need to revisit. (Please keep this discussion for "default
> > > > >>>> token" only.)
> > > >>>>
> > > > >>>>> On Fri, Sep 21, 2018 at 8:26 PM Jeff Jirsa <jjirsa@xxxxxxxxx> wrote:
> > > >>>>>
> > > > >>>>> Also agree it should be lowered, but definitely not to 1, and probably
> > > > >>>>> something closer to 32 than 4.
> > > >>>>>
> > > >>>>> --
> > > >>>>> Jeff Jirsa
> > > >>>>>
> > > >>>>>
> > > > >>>>>> On Sep 21, 2018, at 8:24 PM, Jeremy Hanna <jeremy.hanna1234@xxxxxxxxx> wrote:
> > > > >>>>>> I agree that it should be lowered. What I’ve seen debated a bit in the
> > > > >>>>>> past is the number but I don’t think anyone thinks that it should
> > > > >>>>>> remain 256.
> > > > >>>>>>> On Sep 21, 2018, at 7:05 PM, Jonathan Haddad <jon@xxxxxxxxxxxxx> wrote:
> > > > >>>>>>> One thing that's really, really bothered me for a while is how we
> > > > >>>>>>> still default to 256 tokens. There's no experienced operator that
> > > > >>>>>>> leaves it as is at this point, meaning the only people using 256 are
> > > > >>>>>>> the poor folks that just got started using C*. I've worked with over
> > > > >>>>>>> a hundred clusters in the last couple of years, and I think I only
> > > > >>>>>>> worked with one that had lowered it to something else.
> > > >>>>>>>
> > > > >>>>>>> I think it's time we changed the default to 4 (or 8, up for debate).
> > > >>>>>>>
> > > > >>>>>>> To improve the behavior, we need to change a couple of other things.
> > > > >>>>>>> The allocate_tokens_for_keyspace setting is... odd. It requires you
> > > > >>>>>>> to have a keyspace already created, which doesn't help on new
> > > > >>>>>>> clusters. What I'd like to do is add a new setting,
> > > > >>>>>>> allocate_tokens_for_rf, and set it to 3 by default.
> > > >>>>>>>
> > > > >>>>>>> To handle clusters that are already using 256 tokens, we could prevent
> > > > >>>>>>> the new node from joining unless a -D flag is set to explicitly allow
> > > > >>>>>>> imbalanced tokens.
> > > >>>>>>>
> > > > >>>>>>> We've agreed to a trunk freeze, but I feel like this is important
> > > > >>>>>>> enough (and pretty trivial) to do now. I'd also personally
> > > > >>>>>>> characterize this as a bug fix since 256 is horribly broken when the
> > > > >>>>>>> cluster gets to any reasonable size, but maybe I'm alone there.
> > > >>>>>>>
> > > > >>>>>>> I honestly can't think of a use case where random token allocation is
> > > > >>>>>>> a good choice anymore, so I'd be fine / ecstatic with removing it
> > > > >>>>>>> completely and requiring either allocate_tokens_for_keyspace (for
> > > > >>>>>>> existing clusters) or allocate_tokens_for_rf to be set.
> > > >>>>>>>
> > > >>>>>>> Thoughts?  Objections?
> > > >>>>>>> --
> > > >>>>>>> Jon Haddad
> > > >>>>>>> http://www.rustyrazorblade.com
> > > >>>>>>> twitter: rustyrazorblade
> > > >>>>>>
> > > ---------------------------------------------------------------------
> > > >>>>>> To unsubscribe, e-mail: dev-unsubscribe@xxxxxxxxxxxxxxxxxxxx
> > > >>>>>> For additional commands, e-mail: dev-help@xxxxxxxxxxxxxxxxxxxx
> > > >>>>>>
> > > >>>>>
> > ---------------------------------------------------------------------
> > > >>>>> To unsubscribe, e-mail: dev-unsubscribe@xxxxxxxxxxxxxxxxxxxx
> > > >>>>> For additional commands, e-mail: dev-help@xxxxxxxxxxxxxxxxxxxx
> > > >>>>>
> > > >>>>>
> > > >>>
> > > >>> --
> > > >>> Jon Haddad
> > > >>> http://www.rustyrazorblade.com
> > > >>> twitter: rustyrazorblade
> > > >>
> > > >>
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: dev-unsubscribe@xxxxxxxxxxxxxxxxxxxx
> > > > For additional commands, e-mail: dev-help@xxxxxxxxxxxxxxxxxxxx
> > > >
> > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: dev-unsubscribe@xxxxxxxxxxxxxxxxxxxx
> > > For additional commands, e-mail: dev-help@xxxxxxxxxxxxxxxxxxxx
> > >
> > >
> >
> --
> Jon Haddad
> http://www.rustyrazorblade.com
> twitter: rustyrazorblade
>