
Re: Repair scheduling tools


I personally would rather see improvements to Reaper, and support for
Reaper, so that repair tool improvements aren't tied to Cassandra releases.
If we get to a place where the repair tools are stable, then figuring out
how to bundle them for the best install makes sense to me.

If we add things to Cassandra that support Reaper, other repair solutions
could also take advantage of them.

Jeff

On Thu, Apr 5, 2018, 11:05 PM kurt greaves <kurt@xxxxxxxxxxxxxxx> wrote:

> Vnodes are related, and because we made them the default, lots of people
> are using them. Repairing a cluster with vnodes is a catastrophe (even a
> small one is often problematic), but we have to deal with it if we build
> in repair scheduling.
>
> Repair scheduling is very important and we should definitely include it
> with C* (sidecar long term makes most sense to me but only if we looked at
> moving other background ops to the sidecar), but I'm positive it's not
> going to work well with vnodes in their current state. Having said that, it
> should still support scheduling repairs on vnode clusters, but the
> vnode+repair problem should be fixed separately (and probably with more
> attention than we've given it) because it's a major issue.
>
> FWIW I know of 256 vnode clusters with > 100 nodes, yet I'd be surprised if
> any of them are currently successfully repairing.
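A rough sketch of why high token counts make this so painful, assuming RF=3,
random token allocation, and QUORUM consistency (my own illustration, not
code from the thread): as tokens per node grow, nearly every pair of nodes
ends up co-replicating some range, so nearly any simultaneous two-node
failure drops some range below quorum, and the number of distinct subranges
a repair scheduler has to track grows with it.

```python
import random
from itertools import combinations

def build_ring(num_nodes, tokens_per_node, seed=0):
    """Assign random tokens to each node; return the sorted ring
    as (token, node) pairs."""
    rng = random.Random(seed)
    return sorted((rng.random(), node)
                  for node in range(num_nodes)
                  for _ in range(tokens_per_node))

def replica_sets(ring, rf=3):
    """Distinct sets of rf nodes that replicate some token range:
    walk clockwise from each token, skipping repeated nodes."""
    n = len(ring)
    sets = set()
    for i in range(n):
        replicas, j = [], i
        while len(replicas) < rf:
            node = ring[j % n][1]
            if node not in replicas:
                replicas.append(node)
            j += 1
        sets.add(frozenset(replicas))
    return sets

def quorum_loss_fraction(num_nodes, tokens_per_node, rf=3):
    """Fraction of simultaneous 2-node failures that leave some
    range with fewer than 2 of its rf=3 replicas alive."""
    sets = replica_sets(build_ring(num_nodes, tokens_per_node), rf)
    pairs = list(combinations(range(num_nodes), 2))
    lost = sum(1 for a, b in pairs
               if any(a in s and b in s for s in sets))
    return lost / len(pairs)
```

With one token per node, only near-neighbours on the ring share replica
sets, so most two-node failure pairs are harmless; with hundreds of vnodes
per node, almost any pair co-replicates some range. The size of the
`replica_sets` result is also roughly the number of subranges a scheduler
must repair, which is where the operational pain shows up.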
>
> On 6 April 2018 at 03:03, Nate McCall <zznate.m@xxxxxxxxx> wrote:
>
> > I think a takeaway here is that we can't assume a level of operational
> > maturity will coincide automatically with scale. To make our core
> > features robust, we have to account for less-experienced users.
> >
> > A lot of folks on this thread have *really* strong ops and OpsViz
> > stories. Let's not forget that most of our users don't.
> > ((Un)fortunately, as a consulting firm, we tend to see the worst of
> > this).
> >
> > On Fri, Apr 6, 2018 at 2:52 PM, Jonathan Haddad <jon@xxxxxxxxxxxxx>
> wrote:
> > > Off the top of my head I can remember clusters with 600 or 700 nodes
> with
> > > 256 tokens.
> > >
> > > Not the best situation, but it’s real. 256 has been the default for
> > better
> > > or worse.
> > >
> > > On Thu, Apr 5, 2018 at 7:41 PM Joseph Lynch <joe.e.lynch@xxxxxxxxx>
> > wrote:
> > >
> > >> >
> > >> > We see this in larger clusters regularly. Usually folks have just
> > >> > 'grown into it' because it was the default.
> > >> >
> > >>
> > >> I could understand a few dozen nodes with 256 vnodes, but hundreds is
> > >> surprising. I have a whitepaper draft lying around showing how vnodes
> > >> decrease availability in large clusters by orders of magnitude, I'll
> > polish
> > >> it up and send it out to the list when I get a second.
> > >>
> > >> In the meantime, sorry for de-railing a conversation about repair
> > >> scheduling to talk about vnodes, let's chat about that in a different
> > >> thread :-)
> > >>
> > >> -Joey
> > >>
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@xxxxxxxxxxxxxxxxxxxx
> > For additional commands, e-mail: dev-help@xxxxxxxxxxxxxxxxxxxx
> >
> >
>