Re: [DISCUSS] changing default token behavior for 4.0


Is there a use case for random allocation? How does it help with testing? I
can’t see a reason to keep it around.

On Sat, Sep 22, 2018 at 3:06 AM kurt greaves <kurt@xxxxxxxxxxxxxxx> wrote:

> +1. I've been making a case for this for some time now, and it was actually
> a focus of my talk last week. I'd be very happy to get this into 4.0.
>
> We've tested various num_tokens with the algorithm on various sized
> clusters and we've found that typically 16 works best. With lower numbers
> we found that balance is good initially but as a cluster gets larger you
> have some problems. E.g. we saw that on a 60 node cluster with 8 tokens per
> node we were seeing a difference of 22% in token ownership, but on a <=12
> node cluster a difference of only 12%. 16 tokens on the other hand wasn't
> perfect but generally gave a better balance regardless of cluster size at
> least up to 100 nodes. TBH we should probably do some proper testing and
> record all the results for this before we pick a default (I'm happy to do
> this - think we can use the original testing script for this).
>
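For anyone who wants to reproduce this kind of measurement, here is a rough sketch of how an ownership spread could be computed. It only simulates random token placement, not the allocation algorithm discussed here. The function names, the 0..2^64 ring, and the definition of spread as (max - min) / mean ownership are my own assumptions and not necessarily what the original testing script does.

```python
import random

RING_SIZE = 2 ** 64  # assumed ring size; real Murmur3 tokens span -2^63..2^63-1


def ownership_fractions(num_nodes, num_tokens, seed=0):
    """Fraction of the ring owned by each node under random token placement."""
    rng = random.Random(seed)
    ring = sorted(
        (rng.randrange(RING_SIZE), node)
        for node in range(num_nodes)
        for _ in range(num_tokens)
    )
    owned = [0] * num_nodes
    # Each token owns the range from the previous token up to itself;
    # the first token wraps around and also covers the tail of the ring.
    prev = ring[-1][0] - RING_SIZE
    for token, node in ring:
        owned[node] += token - prev
        prev = token
    return [o / RING_SIZE for o in owned]


def spread(fractions):
    """(max - min) ownership relative to the mean (assumed definition)."""
    mean = sum(fractions) / len(fractions)
    return (max(fractions) - min(fractions)) / mean


if __name__ == "__main__":
    for num_tokens in (1, 4, 16, 256):
        s = spread(ownership_fractions(60, num_tokens))
        print(f"60 nodes, {num_tokens:>3} random tokens: spread {s:.0%}")
```

This ignores replication entirely, so it measures primary-range ownership only.
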
> But anyway, I'd say Jon is on the right track. Personally how I'd like to
> see it is that we:
>
>    1. Change allocate_tokens_for_keyspace to allocate_tokens_for_rf in the
>    same way that DSE does it, allowing a user to specify an RF to allocate
>    from, and allowing multiple DCs.
>    2. Add a new boolean property random_token_allocation, defaulting to
>    false.
>    3. Make allocate_tokens_for_rf default to *unset**.
>    4. Make allocate_tokens_for_rf *required*** if num_tokens > 1 and
>    random_token_allocation != true.
>    5. Default num_tokens to 16 (or whatever we find appropriate).
>
> * I think setting a default is asking for trouble. When people are going to
> add new DCs/nodes we don't want to risk them adding a node with the wrong
> RF. I think it's safe to say that a user should have to think about this
> before they spin up their cluster.
> ** Following above, it should be required to be set so that we don't have
> people accidentally using random allocation. I think we should really be
> aiming to get rid of random allocation completely, but provide a new
> property to enable it for backwards compatibility (also for testing).
>
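The startup check implied by points 2-4 above could look roughly like this. It is only a sketch of the proposed rules: the function name and the returned mode labels are hypothetical, and this is not actual Cassandra code. The proposal does not say what happens when random_token_allocation is true and an RF is also set, so the sketch assumes random allocation wins.

```python
from typing import Optional


def validate_token_config(num_tokens: int,
                          allocate_tokens_for_rf: Optional[int] = None,
                          random_token_allocation: bool = False) -> str:
    """Return the allocation mode implied by the settings, or raise.

    The mode labels are hypothetical and only used for illustration.
    """
    if num_tokens < 1:
        raise ValueError("num_tokens must be at least 1")
    if num_tokens == 1:
        # The proposed requirement only applies when num_tokens > 1.
        return "single-token"
    if random_token_allocation:
        # Explicit opt-in to the old behaviour (assumed to take precedence).
        return "random"
    if allocate_tokens_for_rf is None:
        # Unset by default, so the operator has to choose an RF deliberately.
        raise ValueError(
            "allocate_tokens_for_rf must be set when num_tokens > 1 "
            "and random_token_allocation is not true"
        )
    if allocate_tokens_for_rf < 1:
        raise ValueError("allocate_tokens_for_rf must be at least 1")
    return "allocate-for-rf"
```

With these rules a node configured with num_tokens=16 and nothing else would refuse to start rather than silently fall back to random tokens.
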
> It's worth noting that a smaller number of tokens *theoretically* increases
> the time for replacement/rebuild, so if we're considering QUORUM
> availability with vnodes there's an argument against having a very low
> num_tokens. I think it's better to utilise NTS and racks to reduce the
> chance of a QUORUM outage over banking on having a lower number of tokens,
> as with just a low number of tokens unless you go all the way to 1 you are
> just relying on luck that 2 nodes don't overlap. Guess what I'm saying is
> that I think we should be choosing a num_tokens that gives the best
> distribution for most cluster sizes rather than choosing one that
> "decreases" the probability of an outage.
>
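To put a number on the overlap argument, here is a rough sketch that counts how many pairs of nodes share at least one replica set, i.e. pairs whose simultaneous failure would cost QUORUM on some range at RF=3. It assumes random tokens and SimpleStrategy-style placement (walk the ring clockwise, take the first rf distinct nodes) with no racks or NTS, and the function name is made up for illustration.

```python
import random
from itertools import combinations


def overlapping_pair_fraction(num_nodes, num_tokens, rf=3, seed=0):
    """Fraction of node pairs that are replicas together for some range."""
    if num_nodes < rf:
        raise ValueError("need at least rf nodes")
    rng = random.Random(seed)
    ring = sorted(
        (rng.random(), node)
        for node in range(num_nodes)
        for _ in range(num_tokens)
    )
    n = len(ring)
    overlapping = set()
    for i in range(n):
        # Collect the first rf distinct nodes clockwise from this token.
        replicas = []
        j = i
        while len(replicas) < rf:
            node = ring[j % n][1]
            if node not in replicas:
                replicas.append(node)
            j += 1
        overlapping.update(combinations(sorted(replicas), 2))
    total_pairs = num_nodes * (num_nodes - 1) // 2
    return len(overlapping) / total_pairs


if __name__ == "__main__":
    for num_tokens in (1, 4, 16, 256):
        f = overlapping_pair_fraction(60, num_tokens)
        print(f"60 nodes, {num_tokens:>3} tokens: {f:.0%} of pairs overlap")
```

With a single token each node only overlaps with its two neighbours on either side, which is the "all the way to 1" case; anything above that depends on where the tokens happen to land.
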
> Also I think we should continue using CASSANDRA-13701 to track this. TBH I
> think in general we should be a bit better at searching for and using
> existing tickets...
>
> On Sat, 22 Sep 2018 at 18:13, Stefan Podkowinski <spod@xxxxxxxxxx> wrote:
>
> > There already have been some discussions on this here:
> > https://issues.apache.org/jira/browse/CASSANDRA-13701
> >
> > The mentioned blocker there on the token allocation shouldn't exist
> > anymore. Although it would be good to get more feedback on it, in case
> > we want to enable it by default, along with new defaults for number of
> > tokens.
> >
> >
> > On 22.09.18 06:30, Dinesh Joshi wrote:
> > > Jon, thanks for starting this thread!
> > >
> > > I have created CASSANDRA-14784 to track this.
> > >
> > > Dinesh
> > >
> > >> On Sep 21, 2018, at 9:18 PM, Sankalp Kohli <kohlisankalp@xxxxxxxxx> wrote:
> > >>
> > >> Putting it on JIRA is to make sure someone is assigned to it and it is
> > >> tracked. Changes should be discussed over ML like you are saying.
> > >>
> > >> On Sep 21, 2018, at 21:02, Jonathan Haddad <jon@xxxxxxxxxxxxx> wrote:
> > >>
> > >>>> We should create a JIRA to find what other defaults we need to revisit.
> > >>> Changing a default is a pretty big deal; I think we should discuss any
> > >>> changes to defaults here on the ML before moving it into JIRA.  It's nice
> > >>> to get a bit more discussion around the change than what happens in JIRA.
> > >>>
> > >>> We (TLP) did some testing on 4 tokens and found it to work surprisingly
> > >>> well.  It wasn't particularly formal, but we verified the load stays
> > >>> pretty even with only 4 tokens as we added nodes to the cluster.  A higher
> > >>> token count hurts availability by increasing the number of nodes any given
> > >>> node is a neighbor with, meaning any 2 nodes that fail have an increased
> > >>> chance of causing downtime when using QUORUM.  In addition, with the recent
> > >>> streaming optimization it seems lower token counts will give a greater
> > >>> chance of a node streaming entire sstables (with LCS), meaning we'll do a
> > >>> better job with node density out of the box.
> > >>>
> > >>> Next week I can try to put together something a little more convincing.
> > >>> Weekend time.
> > >>>
> > >>> Jon
> > >>>
> > >>>
> > >>> On Fri, Sep 21, 2018 at 8:45 PM sankalp kohli <kohlisankalp@xxxxxxxxx>
> > >>> wrote:
> > >>>
> > >>>> +1 to lowering it.
> > >>>> Thanks Jon for starting this. We should create a JIRA to find what other
> > >>>> defaults we need to revisit. (Please keep this discussion for "default
> > >>>> token" only.)
> > >>>>
> > >>>>> On Fri, Sep 21, 2018 at 8:26 PM Jeff Jirsa <jjirsa@xxxxxxxxx> wrote:
> > >>>>>
> > >>>>> Also agree it should be lowered, but definitely not to 1, and probably
> > >>>>> something closer to 32 than 4.
> > >>>>>
> > >>>>> --
> > >>>>> Jeff Jirsa
> > >>>>>
> > >>>>>
> > >>>>>> On Sep 21, 2018, at 8:24 PM, Jeremy Hanna <jeremy.hanna1234@xxxxxxxxx> wrote:
> > >>>>>> I agree that it should be lowered. What I’ve seen debated a bit in the
> > >>>>>> past is the number, but I don’t think anyone thinks that it should remain
> > >>>>>> 256.
> > >>>>>>> On Sep 21, 2018, at 7:05 PM, Jonathan Haddad <jon@xxxxxxxxxxxxx> wrote:
> > >>>>>>> One thing that's really, really bothered me for a while is how we
> > >>>>>>> default to 256 tokens still.  There's no experienced operator that leaves
> > >>>>>>> it as is at this point, meaning the only people using 256 are the poor
> > >>>>>>> folks that just got started using C*.  I've worked with over a hundred
> > >>>>>>> clusters in the last couple years, and I think I only worked with one that
> > >>>>>>> had lowered it to something else.
> > >>>>>>>
> > >>>>>>> I think it's time we changed the default to 4 (or 8, up for debate).
> > >>>>>>>
> > >>>>>>> To improve the behavior, we need to change a couple other things.  The
> > >>>>>>> allocate_tokens_for_keyspace setting is... odd.  It requires you have a
> > >>>>>>> keyspace already created, which doesn't help on new clusters.  What I'd
> > >>>>>>> like to do is add a new setting, allocate_tokens_for_rf, and set it to 3
> > >>>>>>> by default.
> > >>>>>>>
> > >>>>>>> To handle clusters that are already using 256 tokens, we could prevent
> > >>>>>>> the new node from joining unless a -D flag is set to explicitly allow
> > >>>>>>> imbalanced tokens.
> > >>>>>>>
> > >>>>>>> We've agreed to a trunk freeze, but I feel like this is important enough
> > >>>>>>> (and pretty trivial) to do now.  I'd also personally characterize this as
> > >>>>>>> a bug fix since 256 is horribly broken when the cluster gets to any
> > >>>>>>> reasonable size, but maybe I'm alone there.
> > >>>>>>>
> > >>>>>>> I honestly can't think of a use case where random tokens is a good
> > >>>>>>> choice anymore, so I'd be fine / ecstatic with removing it completely and
> > >>>>>>> requiring either allocate_tokens_for_keyspace (for existing clusters)
> > >>>>>>> or allocate_tokens_for_rf to be set.
> > >>>>>>>
> > >>>>>>> Thoughts?  Objections?
> > >>>>>>> --
> > >>>>>>> Jon Haddad
> > >>>>>>> http://www.rustyrazorblade.com
> > >>>>>>> twitter: rustyrazorblade
> > >>>>>>
> > ---------------------------------------------------------------------
> > >>>>>> To unsubscribe, e-mail: dev-unsubscribe@xxxxxxxxxxxxxxxxxxxx
> > >>>>>> For additional commands, e-mail: dev-help@xxxxxxxxxxxxxxxxxxxx
> > >>>>>>
> > >>>
> > >>> --
> > >>> Jon Haddad
> > >>> http://www.rustyrazorblade.com
> > >>> twitter: rustyrazorblade
-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade