Re: Quantifying Virtual Node Impact on Cassandra Availability
If the blob link on github doesn't work for the pdf (looks like mobile
might not like it), try:
On Mon, Apr 16, 2018 at 1:14 PM, Joseph Lynch <joe.e.lynch@xxxxxxxxx> wrote:
> Josh Snyder and I have been working on evaluating virtual nodes for large
> scale deployments and while it seems like there is a lot of anecdotal
> support for reducing the vnode count, we couldn't find any concrete
> math on the topic, so we had some fun and took a whack at quantifying how
> different choices of num_tokens impact a Cassandra cluster.
> According to the model we developed, it seems that at small cluster
> sizes there isn't much of a negative impact on availability, but when
> clusters scale up to hundreds of hosts, vnodes have a major impact on
> availability. In particular, the probability of outage during short
> failures (e.g. process restarts or failures) or permanent failure (e.g.
> disk or machine failure) appears to be orders of magnitude higher for large
> clusters.
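
As a rough illustration of the scaling argument above, here is a back-of-the-envelope sketch. It is not the model from the notebook. It assumes purely random token placement and a replication factor of 3, and it treats each token as sharing data with 2 * (RF - 1) randomly chosen other hosts. The function names and that approximation are made up for illustration only. With one host already down, it estimates the chance that a second, randomly chosen failed host shares at least one token range with it.

```python
def expected_neighbors(num_nodes, num_tokens, rf=3):
    """Expected number of distinct other hosts that share at least one
    replica set with a given host.

    Assumption (illustrative only): every token of the host touches
    2 * (rf - 1) ring slots, each owned by a uniformly random other host.
    """
    others = num_nodes - 1
    slots = 2 * (rf - 1) * num_tokens
    # Probability that a particular other host owns none of those slots.
    p_not_neighbor = (1.0 - 1.0 / others) ** slots
    return others * (1.0 - p_not_neighbor)


def p_second_failure_hits_neighbor(num_nodes, num_tokens, rf=3):
    """With one host down, chance that a second random host failure lands
    on a host sharing a token range with it (so some range has two
    replicas down)."""
    return expected_neighbors(num_nodes, num_tokens, rf) / (num_nodes - 1)


if __name__ == "__main__":
    for nodes in (12, 96, 300):
        for tokens in (1, 4, 16, 256):
            p = p_second_failure_hits_neighbor(nodes, tokens)
            print(f"hosts={nodes:4d} num_tokens={tokens:4d} p={p:.3f}")
```

Under these assumptions a host with many tokens is a replica neighbor of nearly every other host in the cluster, so almost any second failure overlaps with the first. With few tokens the overlap probability shrinks as the cluster grows.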
> The model attempts to explain why we may care about this and advances a
> few existing/new ideas for how to fix the scalability problems that vnodes
> fix without the availability (and consistency, due to the effects on repair)
> problems that a high num_tokens creates. We would of course be very interested in
> any feedback. The model source code is on github; PRs are welcome, or
> feel free to play around with the jupyter notebook to match your
> environment and see what the graphs look like. I didn't attach the pdf here
> because it's too large apparently (lots of pretty graphs).
> I know that users can always just pick whichever number they prefer, but I
> think the current default was chosen when token placement was random, and I
> wonder whether it's still the right default.
> Thank you,
> -Joey Lynch
>  https://issues.apache.org/jira/browse/CASSANDRA-13701
>  https://github.com/jolynch/python_performance_toolkit/
>  https://github.com/jolynch/python_performance_toolkit/tree/m