Re: [DISCUSS] changing default token behavior for 4.0


Late comment, but I'll write it anyway.

The main advantage of random allocation over the new allocation strategy is
that it seems to be significantly better when dealing with node *removals*,
when the order of removal is not the inverse of the order of addition. This
can lead to severely unbalanced clusters when the new strategy is enabled.

I tend to go with the random allocation for this reason: you can freely
add/remove nodes when needed, and the data distribution will remain "good
enough". It's only when the data density becomes high enough that the new
token allocation strategy really matters, imho.
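
The effect described above can be explored with a small simulation. The following is a minimal sketch, not Cassandra's allocator. It assumes a simplified ring of 2**64 positions starting at 0, looks only at primary-range ownership (replication is ignored), and uses hypothetical helper names (random_tokens, ownership, imbalance). The imbalance metric used here, the spread between the largest and smallest owner relative to the mean, is only one possible definition and may differ from how the figures quoted further down in this thread were measured.

```python
import random

# Assumption: a simplified ring of 2**64 positions starting at 0.
RING_SIZE = 2**64


def random_tokens(num_tokens, rng):
    """Pick num_tokens random positions on the ring for one node."""
    return [rng.randrange(RING_SIZE) for _ in range(num_tokens)]


def ownership(cluster):
    """Return {node: fraction of the ring owned as primary range}.

    cluster maps a node name to its list of tokens. Each token owns the
    range from the previous token on the ring (exclusive) to itself.
    """
    ring = sorted((tok, node) for node, toks in cluster.items() for tok in toks)
    owned = {node: 0 for node in cluster}
    if len(ring) == 1:
        # A single token owns the whole ring.
        owned[ring[0][1]] = RING_SIZE
    else:
        for i, (tok, node) in enumerate(ring):
            prev = ring[i - 1][0]  # i == 0 wraps around to the last token
            owned[node] += (tok - prev) % RING_SIZE
    return {node: size / RING_SIZE for node, size in owned.items()}


def imbalance(cluster):
    """Spread between largest and smallest owner, relative to the mean share."""
    shares = ownership(cluster).values()
    mean = 1 / len(cluster)
    return (max(shares) - min(shares)) / mean


if __name__ == "__main__":
    rng = random.Random(42)
    cluster = {f"node{i}": random_tokens(16, rng) for i in range(60)}
    print(f"60 nodes: imbalance {imbalance(cluster):.2f}")

    # Remove nodes in an arbitrary order, unrelated to the order of addition.
    for name in rng.sample(sorted(cluster), 20):
        del cluster[name]
    print(f"40 nodes: imbalance {imbalance(cluster):.2f}")
```

Running it with different num_tokens values and removal orders gives a rough feel for how random allocation behaves as nodes come and go. It says nothing about the new allocation strategy, which is not modelled here.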

Hope that helps!

Tom van der Woerdt
Site Reliability Engineer

Booking.com B.V.
Vijzelstraat 66-80 Amsterdam 1017HL Netherlands


On Sat, Sep 22, 2018 at 8:12 PM Jonathan Haddad <jon@xxxxxxxxxxxxx> wrote:

> Is there a use case for random allocation? How does it help with testing? I
> can’t see a reason to keep it around.
>
> On Sat, Sep 22, 2018 at 3:06 AM kurt greaves <kurt@xxxxxxxxxxxxxxx> wrote:
>
> > +1. I've been making a case for this for some time now, and it was actually
> > a focus of my talk last week. I'd be very happy to get this into 4.0.
> >
> > We've tested various num_tokens with the algorithm on various sized
> > clusters and we've found that typically 16 works best. With lower numbers
> > we found that balance is good initially but as a cluster gets larger you
> > have some problems. E.g. we saw that on a 60 node cluster with 8 tokens per
> > node we were seeing a difference of 22% in token ownership, but on a <=12
> > node cluster a difference of only 12%. 16 tokens on the other hand wasn't
> > perfect but generally gave a better balance regardless of cluster size at
> > least up to 100 nodes. TBH we should probably do some proper testing and
> > record all the results for this before we pick a default (I'm happy to do
> > this - think we can use the original testing script for this).
> >
> > But anyway, I'd say Jon is on the right track. Personally how I'd like to
> > see it is that we:
> >
> >    1. Change allocate_tokens_for_keyspace to allocate_tokens_for_rf in the
> >    same way that DSE does it, allowing a user to specify an RF to allocate
> >    from, and allowing multiple DCs.
> >    2. Add a new boolean property random_token_allocation, defaults to
> >    false.
> >    3. Make allocate_tokens_for_rf default to *unset**.
> >    4. Make allocate_tokens_for_rf *required*** if num_tokens > 1 and
> >    random_token_allocation != true.
> >    5. Default num_tokens to 16 (or whatever we find appropriate)
> >
> > * I think setting a default is asking for trouble. When people are going to
> > add new DCs/nodes we don't want to risk them adding a node with the wrong
> > RF. I think it's safe to say that a user should have to think about this
> > before they spin up their cluster.
> > ** Following above, it should be required to be set so that we don't have
> > people accidentally using random allocation. I think we should really be
> > aiming to get rid of random allocation completely, but provide a new
> > property to enable it for backwards compatibility (also for testing).
> >
> > It's worth noting that a smaller number of tokens *theoretically* decreases
> > the time for replacement/rebuild, so if we're considering QUORUM
> > availability with vnodes there's an argument against having a very low
> > num_tokens. I think it's better to utilise NTS and racks to reduce the
> > chance of a QUORUM outage over banking on having a lower number of tokens,
> > as with just a low number of tokens unless you go all the way to 1 you are
> > just relying on luck that 2 nodes don't overlap. Guess what I'm saying is
> > that I think we should be choosing a num_tokens that gives the best
> > distribution for most cluster sizes rather than choosing one that
> > "decreases" the probability of an outage.
> >
> > Also I think we should continue using CASSANDRA-13701 to track this. TBH I
> > think in general we should be a bit better at searching for and using
> > existing tickets...
> >
> > On Sat, 22 Sep 2018 at 18:13, Stefan Podkowinski <spod@xxxxxxxxxx> wrote:
> >
> > > There have already been some discussions on this here:
> > > https://issues.apache.org/jira/browse/CASSANDRA-13701
> > >
> > > The mentioned blocker there on the token allocation shouldn't exist
> > > anymore. Although it would be good to get more feedback on it, in case
> > > we want to enable it by default, along with new defaults for number of
> > > tokens.
> > >
> > >
> > > On 22.09.18 06:30, Dinesh Joshi wrote:
> > > > Jon, thanks for starting this thread!
> > > >
> > > > I have created CASSANDRA-14784 to track this.
> > > >
> > > > Dinesh
> > > >
> > > >> On Sep 21, 2018, at 9:18 PM, Sankalp Kohli <kohlisankalp@xxxxxxxxx> wrote:
> > > >>
> > > >> Putting it on JIRA is to make sure someone is assigned to it and it is
> > > >> tracked. Changes should be discussed over ML like you are saying.
> > > >>
> > > >> On Sep 21, 2018, at 21:02, Jonathan Haddad <jon@xxxxxxxxxxxxx> wrote:
> > > >>
> > > >>>> We should create a JIRA to find what other defaults we need to revisit.
> > > >>> Changing a default is a pretty big deal, I think we should discuss any
> > > >>> changes to defaults here on the ML before moving it into JIRA. It's nice
> > > >>> to get a bit more discussion around the change than what happens in JIRA.
> > > >>>
> > > >>> We (TLP) did some testing on 4 tokens and found it to work surprisingly
> > > >>> well. It wasn't particularly formal, but we verified the load stays
> > > >>> pretty even with only 4 tokens as we added nodes to the cluster. Higher
> > > >>> token count hurts availability by increasing the number of nodes any given
> > > >>> node is a neighbor with, meaning any 2 nodes that fail have an increased
> > > >>> chance of downtime when using QUORUM. In addition, with the recent
> > > >>> streaming optimization it seems lower token counts will give a greater
> > > >>> chance of a node streaming entire sstables (with LCS), meaning we'll do a
> > > >>> better job with node density out of the box.
> > > >>>
> > > >>> Next week I can try to put together something a little more convincing.
> > > >>> Weekend time.
> > > >>>
> > > >>> Jon
> > > >>>
> > > >>>
> > > >>> On Fri, Sep 21, 2018 at 8:45 PM sankalp kohli <kohlisankalp@xxxxxxxxx> wrote:
> > > >>>
> > > >>>> +1 to lowering it.
> > > >>>> Thanks Jon for starting this. We should create a JIRA to find what other
> > > >>>> defaults we need to revisit. (Please keep this discussion for "default
> > > >>>> token" only.)
> > > >>>>
> > > >>>>> On Fri, Sep 21, 2018 at 8:26 PM Jeff Jirsa <jjirsa@xxxxxxxxx> wrote:
> > > >>>>>
> > > >>>>> Also agree it should be lowered, but definitely not to 1, and probably
> > > >>>>> something closer to 32 than 4.
> > > >>>>>
> > > >>>>> --
> > > >>>>> Jeff Jirsa
> > > >>>>>
> > > >>>>>
> > > >>>>>> On Sep 21, 2018, at 8:24 PM, Jeremy Hanna <jeremy.hanna1234@xxxxxxxxx> wrote:
> > > >>>>>> I agree that it should be lowered. What I’ve seen debated a bit in the
> > > >>>>>> past is the number but I don’t think anyone thinks that it should remain
> > > >>>>>> 256.
> > > >>>>>>> On Sep 21, 2018, at 7:05 PM, Jonathan Haddad <jon@xxxxxxxxxxxxx> wrote:
> > > >>>>>>> One thing that's really, really bothered me for a while is how we default
> > > >>>>>>> to 256 tokens still. There's no experienced operator that leaves it as is
> > > >>>>>>> at this point, meaning the only people using 256 are the poor folks that
> > > >>>>>>> just got started using C*. I've worked with over a hundred clusters in the
> > > >>>>>>> last couple years, and I think I only worked with one that had lowered it
> > > >>>>>>> to something else.
> > > >>>>>>>
> > > >>>>>>> I think it's time we changed the default to 4 (or 8, up for debate).
> > > >>>>>>>
> > > >>>>>>> To improve the behavior, we need to change a couple other things. The
> > > >>>>>>> allocate_tokens_for_keyspace setting is... odd. It requires you have a
> > > >>>>>>> keyspace already created, which doesn't help on new clusters. What I'd
> > > >>>>>>> like to do is add a new setting, allocate_tokens_for_rf, and set it to 3
> > > >>>>>>> by default.
> > > >>>>>>>
> > > >>>>>>> To handle clusters that are already using 256 tokens, we could prevent the
> > > >>>>>>> new node from joining unless a -D flag is set to explicitly allow
> > > >>>>>>> imbalanced tokens.
> > > >>>>>>>
> > > >>>>>>> We've agreed to a trunk freeze, but I feel like this is important enough
> > > >>>>>>> (and pretty trivial) to do now. I'd also personally characterize this as a
> > > >>>>>>> bug fix since 256 is horribly broken when the cluster gets to any
> > > >>>>>>> reasonable size, but maybe I'm alone there.
> > > >>>>>>>
> > > >>>>>>> I honestly can't think of a use case where random tokens is a good choice
> > > >>>>>>> anymore, so I'd be fine / ecstatic with removing it completely and
> > > >>>>>>> requiring either allocate_tokens_for_keyspace (for existing clusters)
> > > >>>>>>> or allocate_tokens_for_rf to be set.
> > > >>>>>>>
> > > >>>>>>> Thoughts?  Objections?
> > > >>>>>>> --
> > > >>>>>>> Jon Haddad
> > > >>>>>>> http://www.rustyrazorblade.com
> > > >>>>>>> twitter: rustyrazorblade
> > > >>>>>>
> > > ---------------------------------------------------------------------
> > > >>>>>> To unsubscribe, e-mail: dev-unsubscribe@xxxxxxxxxxxxxxxxxxxx
> > > >>>>>> For additional commands, e-mail: dev-help@xxxxxxxxxxxxxxxxxxxx
> > > >>>>>>
> > > >>>>>
> > > >>>>>
> > > >>>>>
> > > >>>
> > > >>> --
> > > >>> Jon Haddad
> > > >>> http://www.rustyrazorblade.com
> > > >>> twitter: rustyrazorblade
> > > >>
> > > >>
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: dev-unsubscribe@xxxxxxxxxxxxxxxxxxxx
> > > > For additional commands, e-mail: dev-help@xxxxxxxxxxxxxxxxxxxx
> > > >
> > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: dev-unsubscribe@xxxxxxxxxxxxxxxxxxxx
> > > For additional commands, e-mail: dev-help@xxxxxxxxxxxxxxxxxxxx
> > >
> > >
> >
> --
> Jon Haddad
> http://www.rustyrazorblade.com
> twitter: rustyrazorblade
>
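
The rule proposed in the numbered list above (items 2 to 5) can be written down as a small check. This is a sketch of the proposal as stated in the thread, not of any shipped Cassandra behaviour: random_token_allocation and allocate_tokens_for_rf are the setting names suggested in this discussion, the default of 16 is the value floated there, and validate_token_config is a hypothetical helper.

```python
# Assumption: 16 is the default floated in the thread, not a settled value.
PROPOSED_DEFAULT_NUM_TOKENS = 16


def validate_token_config(
    num_tokens=PROPOSED_DEFAULT_NUM_TOKENS,
    allocate_tokens_for_rf=None,    # proposed default: unset
    random_token_allocation=False,  # proposed default: false
):
    """Return a list of problems; an empty list means the settings pass.

    Hypothetical helper that encodes the proposed rule: with vnodes
    (num_tokens > 1) the operator must either give an RF to allocate for
    or explicitly opt in to random allocation.
    """
    problems = []
    if num_tokens < 1:
        problems.append("num_tokens must be at least 1")
    if (
        num_tokens > 1
        and not random_token_allocation
        and allocate_tokens_for_rf is None
    ):
        problems.append(
            "allocate_tokens_for_rf must be set when num_tokens > 1 "
            "and random_token_allocation is not true"
        )
    return problems


if __name__ == "__main__":
    # With every setting left at its proposed default the check fails,
    # which is the point: the operator has to choose an RF deliberately.
    print(validate_token_config())
    print(validate_token_config(allocate_tokens_for_rf=3))
    print(validate_token_config(random_token_allocation=True))
```

Leaving allocate_tokens_for_rf unset by default means a fresh configuration does not pass until the operator has picked an RF or opted in to random allocation, which matches the reasoning in the first footnote of that proposal.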