Re: [DISCUSS] changing default token behavior for 4.0


This sounds worthy of a bug report!  We should at least document any such inadequacy, and come up with a plan to fix it.  It would be great if you could file a ticket with a detailed example of the problem.

> On 24 Sep 2018, at 14:57, Tom van der Woerdt <tom.vanderwoerdt@xxxxxxxxxxx> wrote:
> 
> Late comment, but I'll write it anyway.
> 
> The main advantage of random allocation over the new allocation strategy is
> that it seems to be significantly better when dealing with node *removals*,
> when the order of removal is not the inverse of the order of addition. This
> can lead to severely unbalanced clusters when the new strategy is enabled.
> 
> I tend to go with the random allocation for this reason: you can freely
> add/remove nodes when needed, and the data distribution will remain "good
> enough". It's only when the data density becomes high enough that the new
> token allocation strategy really matters, imho.
> 
> Hope that helps!
> 
> Tom van der Woerdt
> Site Reliability Engineer
> 
> Booking.com B.V.
> Vijzelstraat 66-80 Amsterdam 1017HL Netherlands
> 
> 
> On Sat, Sep 22, 2018 at 8:12 PM Jonathan Haddad <jon@xxxxxxxxxxxxx> wrote:
> 
>> Is there a use case for random allocation? How does it help with testing? I
>> can’t see a reason to keep it around.
>> 
>> On Sat, Sep 22, 2018 at 3:06 AM kurt greaves <kurt@xxxxxxxxxxxxxxx> wrote:
>> 
>>> +1. I've been making a case for this for some time now, and it was actually
>>> a focus of my talk last week. I'd be very happy to get this into 4.0.
>>> 
>>> We've tested various num_tokens with the algorithm on various sized
>>> clusters and we've found that typically 16 works best. With lower numbers
>>> we found that balance is good initially but as a cluster gets larger you
>>> have some problems. E.g. we saw that on a 60 node cluster with 8 tokens
>>> per node we were seeing a difference of 22% in token ownership, but on a <=12
>>> node cluster a difference of only 12%. 16 tokens on the other hand wasn't
>>> perfect but generally gave a better balance regardless of cluster size at
>>> least up to 100 nodes. TBH we should probably do some proper testing and
>>> record all the results for this before we pick a default (I'm happy to do
>>> this - think we can use the original testing script for this).
>>> 
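
One way to record numbers like the ownership differences quoted above is to compute each node's share of the ring directly. The sketch below is not the testing script mentioned in the message and does not implement the allocation algorithm. It assumes purely random tokens, primary ranges only (no replication), and a 2^64 token space. The function names ownership and spread are made up for illustration.

```python
import random

RING = 2 ** 64  # assumed size of the token space


def ownership(num_nodes, num_tokens, seed=0):
    """Fraction of the ring primarily owned by each node, using random tokens."""
    rng = random.Random(seed)
    tokens = sorted(
        (rng.randrange(RING), node)
        for node in range(num_nodes)
        for _ in range(num_tokens)
    )
    owned = [0] * num_nodes
    # A token owns the range from the previous token up to itself;
    # the first token wraps around from the last one.
    prev = tokens[-1][0] - RING
    for token, node in tokens:
        owned[node] += token - prev
        prev = token
    return [o / RING for o in owned]


def spread(fractions):
    """Largest minus smallest ownership, relative to the ideal share 1/n."""
    ideal = 1.0 / len(fractions)
    return (max(fractions) - min(fractions)) / ideal


if __name__ == "__main__":
    for num_tokens in (4, 8, 16, 256):
        s = spread(ownership(60, num_tokens, seed=1))
        print(f"60 nodes, {num_tokens:>3} random tokens: spread = {s:.2f}x ideal")
```

Swapping the random token choice for the output of the real allocator would give figures comparable to the ones in the message; as written, the numbers only describe random allocation.
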
>>> But anyway, I'd say Jon is on the right track. Personally how I'd like to
>>> see it is that we:
>>> 
>>>   1. Change allocate_tokens_for_keyspace to allocate_tokens_for_rf in the
>>>   same way that DSE does it, allowing a user to specify an RF to allocate
>>>   from, and allowing multiple DCs.
>>>   2. Add a new boolean property random_token_allocation, defaults to
>>>   false.
>>>   3. Make allocate_tokens_for_rf default to *unset**.
>>>   4. Make allocate_tokens_for_rf *required*** if num_tokens > 1 and
>>>   random_token_allocation != true.
>>>   5. Default num_tokens to 16 (or whatever we find appropriate)
>>> 
>>> * I think setting a default is asking for trouble. When people are going to
>>> add new DCs/nodes we don't want to risk them adding a node with the wrong
>>> RF. I think it's safe to say that a user should have to think about this
>>> before they spin up their cluster.
>>> ** Following above, it should be required to be set so that we don't have
>>> people accidentally using random allocation. I think we should really be
>>> aiming to get rid of random allocation completely, but provide a new
>>> property to enable it for backwards compatibility (also for testing).
>>> 
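
Items 2 to 5 of the list above amount to a startup check, which can be sketched as follows. This is a hypothetical illustration, not Cassandra code: the parameter names are the ones proposed in the message, the default of 16 is the tentative value from item 5, and the function name check_token_config is invented.

```python
from typing import Optional


def check_token_config(
    num_tokens: int = 16,                          # item 5: tentative default
    allocate_tokens_for_rf: Optional[int] = None,  # item 3: unset by default
    random_token_allocation: bool = False,         # item 2: defaults to false
) -> None:
    """Raise ValueError if the settings violate the proposed rules."""
    if num_tokens < 1:
        raise ValueError("num_tokens must be at least 1")
    if allocate_tokens_for_rf is not None and allocate_tokens_for_rf < 1:
        raise ValueError("allocate_tokens_for_rf must be at least 1")
    # Item 4: with vnodes, the operator must either pick an RF to allocate
    # for or explicitly opt in to random allocation.
    if (
        num_tokens > 1
        and not random_token_allocation
        and allocate_tokens_for_rf is None
    ):
        raise ValueError(
            "allocate_tokens_for_rf is required when num_tokens > 1 "
            "unless random_token_allocation is true"
        )
```

With these rules the untouched defaults are rejected, so an operator has to set either allocate_tokens_for_rf or random_token_allocation before a node starts, which is the behaviour the footnotes argue for.
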
>>> It's worth noting that a smaller number of tokens *theoretically* decreases
>>> the time for replacement/rebuild, so if we're considering QUORUM
>>> availability with vnodes there's an argument against having a very low
>>> num_tokens. I think it's better to utilise NTS and racks to reduce the
>>> chance of a QUORUM outage over banking on having a lower number of tokens,
>>> as with just a low number of tokens unless you go all the way to 1 you are
>>> just relying on luck that 2 nodes don't overlap. Guess what I'm saying is
>>> that I think we should be choosing a num_tokens that gives the best
>>> distribution for most cluster sizes rather than choosing one that
>>> "decreases" the probability of an outage.
>>> 
>>> Also I think we should continue using CASSANDRA-13701 to track this. TBH I
>>> think in general we should be a bit better at searching for and using
>>> existing tickets...
>>> 
>>> On Sat, 22 Sep 2018 at 18:13, Stefan Podkowinski <spod@xxxxxxxxxx> wrote:
>>> 
>>>> There already have been some discussions on this here:
>>>> https://issues.apache.org/jira/browse/CASSANDRA-13701
>>>> 
>>>> The mentioned blocker there on the token allocation shouldn't exist
>>>> anymore. Although it would be good to get more feedback on it, in case
>>>> we want to enable it by default, along with new defaults for number of
>>>> tokens.
>>>> 
>>>> 
>>>> On 22.09.18 06:30, Dinesh Joshi wrote:
>>>>> Jon, thanks for starting this thread!
>>>>> 
>>>>> I have created CASSANDRA-14784 to track this.
>>>>> 
>>>>> Dinesh
>>>>> 
>>>>>> On Sep 21, 2018, at 9:18 PM, Sankalp Kohli <kohlisankalp@xxxxxxxxx> wrote:
>>>>>> 
>>>>>> Putting it on JIRA is to make sure someone is assigned to it and it is
>>>>>> tracked. Changes should be discussed over ML like you are saying.
>>>>>> 
>>>>>> On Sep 21, 2018, at 21:02, Jonathan Haddad <jon@xxxxxxxxxxxxx> wrote:
>>>>>> 
>>>>>>>> We should create a JIRA to find what other defaults we need to revisit.
>>>>>>> Changing a default is a pretty big deal, I think we should discuss any
>>>>>>> changes to defaults here on the ML before moving it into JIRA.  It's nice
>>>>>>> to get a bit more discussion around the change than what happens in JIRA.
>>>>>>> 
>>>>>>> We (TLP) did some testing on 4 tokens and found it to work surprisingly
>>>>>>> well.  It wasn't particularly formal, but we verified the load stays
>>>>>>> pretty even with only 4 tokens as we added nodes to the cluster.  Higher
>>>>>>> token count hurts availability by increasing the number of nodes any given
>>>>>>> node is a neighbor with, meaning any 2 nodes that fail have an increased
>>>>>>> chance of downtime when using QUORUM.  In addition, with the recent
>>>>>>> streaming optimization it seems lower token counts will give a greater
>>>>>>> chance of a node streaming entire sstables (with LCS), meaning we'll do a
>>>>>>> better job with node density out of the box.
>>>>>>> 
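
The neighbor argument above can be checked with a small simulation. This is only a sketch under stated assumptions: random tokens, a SimpleStrategy-like placement where each range is replicated on the next rf distinct nodes clockwise, and no racks or datacenters. The function name avg_neighbors is invented for illustration.

```python
import random


def avg_neighbors(num_nodes, num_tokens, rf=3, seed=0):
    """Average number of other nodes each node shares a replica set with."""
    if num_nodes < rf:
        raise ValueError("need at least rf nodes")
    rng = random.Random(seed)
    ring = sorted(
        (rng.random(), node)
        for node in range(num_nodes)
        for _ in range(num_tokens)
    )
    owners = [node for _, node in ring]
    neighbors = [set() for _ in range(num_nodes)]
    n = len(owners)
    for start in range(n):
        # Walk clockwise from this token and collect rf distinct nodes.
        replicas = []
        i = start
        while len(replicas) < rf:
            node = owners[i % n]
            if node not in replicas:
                replicas.append(node)
            i += 1
        for a in replicas:
            for b in replicas:
                if a != b:
                    neighbors[a].add(b)
    return sum(len(s) for s in neighbors) / num_nodes


if __name__ == "__main__":
    for num_tokens in (1, 4, 16, 64):
        avg = avg_neighbors(30, num_tokens)
        print(f"30 nodes, {num_tokens:>2} tokens: {avg:.1f} neighbors on average")
```

The more neighbors a node has, the more pairs of simultaneous failures leave some range with only one of three replicas up, which is what breaks QUORUM. Rack-aware placement with NTS changes the picture and is not modelled here.
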
>>>>>>> Next week I can try to put together something a little more convincing.
>>>>>>> Weekend time.
>>>>>>> 
>>>>>>> Jon
>>>>>>> 
>>>>>>> 
>>>>>>> On Fri, Sep 21, 2018 at 8:45 PM sankalp kohli <kohlisankalp@xxxxxxxxx> wrote:
>>>>>>> 
>>>>>>>> +1 to lowering it.
>>>>>>>> Thanks Jon for starting this. We should create a JIRA to find what other
>>>>>>>> defaults we need to revisit. (Please keep this discussion for "default
>>>>>>>> token" only.)
>>>>>>>> 
>>>>>>>>> On Fri, Sep 21, 2018 at 8:26 PM Jeff Jirsa <jjirsa@xxxxxxxxx> wrote:
>>>>>>>>> 
>>>>>>>>> Also agree it should be lowered, but definitely not to 1, and probably
>>>>>>>>> something closer to 32 than 4.
>>>>>>>>> 
>>>>>>>>> --
>>>>>>>>> Jeff Jirsa
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> On Sep 21, 2018, at 8:24 PM, Jeremy Hanna <jeremy.hanna1234@xxxxxxxxx> wrote:
>>>>>>>>>> I agree that it should be lowered. What I’ve seen debated a bit in the
>>>>>>>>>> past is the number but I don’t think anyone thinks that it should remain
>>>>>>>>>> 256.
>>>>>>>>>>> On Sep 21, 2018, at 7:05 PM, Jonathan Haddad <jon@xxxxxxxxxxxxx> wrote:
>>>>>>>>>>> One thing that's really, really bothered me for a while is how we
>>>>>>>>>>> default to 256 tokens still.  There's no experienced operator that leaves
>>>>>>>>>>> it as is at this point, meaning the only people using 256 are the poor
>>>>>>>>>>> folks that just got started using C*.  I've worked with over a hundred
>>>>>>>>>>> clusters in the last couple years, and I think I only worked with one
>>>>>>>>>>> that had lowered it to something else.
>>>>>>>>>>> 
>>>>>>>>>>> I think it's time we changed the default to 4 (or 8, up for debate).
>>>>>>>>>>> 
>>>>>>>>>>> To improve the behavior, we need to change a couple other things.  The
>>>>>>>>>>> allocate_tokens_for_keyspace setting is... odd.  It requires you have a
>>>>>>>>>>> keyspace already created, which doesn't help on new clusters.  What I'd
>>>>>>>>>>> like to do is add a new setting, allocate_tokens_for_rf, and set it to
>>>>>>>>>>> 3 by default.
>>>>>>>>>>> 
>>>>>>>>>>> To handle clusters that are already using 256 tokens, we could prevent
>>>>>>>>>>> the new node from joining unless a -D flag is set to explicitly allow
>>>>>>>>>>> imbalanced tokens.
>>>>>>>>>>> 
>>>>>>>>>>> We've agreed to a trunk freeze, but I feel like this is important enough
>>>>>>>>>>> (and pretty trivial) to do now.  I'd also personally characterize this as
>>>>>>>>>>> a bug fix since 256 is horribly broken when the cluster gets to any
>>>>>>>>>>> reasonable size, but maybe I'm alone there.
>>>>>>>>>>> 
>>>>>>>>>>> I honestly can't think of a use case where random tokens is a good choice
>>>>>>>>>>> anymore, so I'd be fine / ecstatic with removing it completely and
>>>>>>>>>>> requiring either allocate_tokens_for_keyspace (for existing clusters)
>>>>>>>>>>> or allocate_tokens_for_rf to be set.
>>>>>>>>>>> 
>>>>>>>>>>> Thoughts?  Objections?
>>>>>>>>>>> --
>>>>>>>>>>> Jon Haddad
>>>>>>>>>>> http://www.rustyrazorblade.com
>>>>>>>>>>> twitter: rustyrazorblade
>>>>>>>>>> 
>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@xxxxxxxxxxxxxxxxxxxx
>>>>>>>>>> For additional commands, e-mail: dev-help@xxxxxxxxxxxxxxxxxxxx
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> Jon Haddad
>>>>>>> http://www.rustyrazorblade.com
>>>>>>> twitter: rustyrazorblade
>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>> --
>> Jon Haddad
>> http://www.rustyrazorblade.com
>> twitter: rustyrazorblade
>> 

