Re: Repair scheduling tools


Having seen so many companies struggle to run repairs successfully in
production, and having seen the success of distributed scheduled repair here
at Netflix, I strongly believe this would be a great addition to Cassandra.
I hope that we as a community will make it easier for teams to operate
Cassandra by enhancing the core product and making maintenance tasks like
repair and compaction part of the database, without external tooling. We can
put the feature behind an experimental flag so that only teams who are
confident in it enable it, while others fall back to the default repairs.
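
To make the opt-in idea concrete, here is a rough sketch of how such a flag might resolve, assuming a cluster-wide switch that defaults to off plus optional per-table overrides. Every name in it (RepairSettings, auto_repair_enabled, table_overrides, repair_mode) is hypothetical and made up for illustration; none of them is an existing Cassandra setting or API.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class RepairSettings:
    """Hypothetical opt-in settings; these names do not exist in Cassandra."""

    # Cluster-wide switch. Off by default so behavior does not change on upgrade.
    auto_repair_enabled: bool = False
    # Optional per-table overrides keyed by "keyspace.table" (assumed key format).
    table_overrides: Dict[str, bool] = field(default_factory=dict)

    def is_enabled_for(self, table: str) -> bool:
        """Return True if built-in scheduled repair should manage this table."""
        override: Optional[bool] = self.table_overrides.get(table)
        # A per-table override wins; otherwise use the cluster-wide default.
        return self.auto_repair_enabled if override is None else override


def repair_mode(settings: RepairSettings, table: str) -> str:
    """Report who is responsible for repairing the table."""
    return "scheduled" if settings.is_enabled_for(table) else "manual"
```

With these assumptions, a team could leave the cluster-wide switch off and opt in a single table through table_overrides, while every other table keeps being repaired the way it is today.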


*Regards,*

*Roopa Tangirala*

Engineering Manager CDE

*(408) 438-3156 - mobile*





On Tue, Apr 3, 2018 at 4:19 PM, Kenneth Brotman <
kenbrotman@xxxxxxxxx.invalid> wrote:

> Why not make it configurable?
>         auto_manage_repair_consistency: true (default: false)
>
> Then users can use the built in auto repair function that would be created
> or continue to handle it as now.  Default behavior would be "false" so
> nothing changes on its own.  Just wondering why not have that option?  It
> might accelerate progress as others have already suggested.
>
> Kenneth Brotman
>
> -----Original Message-----
> From: Nate McCall [mailto:zznate.m@xxxxxxxxx]
> Sent: Tuesday, April 03, 2018 1:37 PM
> To: dev
> Subject: Re: Repair scheduling tools
>
> This document does a really good job of listing some of the issues with
> coordinating scheduled repair. Regardless of which camp you fall into, it
> is certainly worth a read.
>
> On Wed, Apr 4, 2018 at 8:10 AM, Joseph Lynch <joe.e.lynch@xxxxxxxxx>
> wrote:
> > I just want to say I think it would be great for our users if we moved
> > repair scheduling into Cassandra itself. The team here at Netflix has
> > opened the ticket
> > <https://issues.apache.org/jira/browse/CASSANDRA-14346>
> > and has written a detailed design document
> > <https://docs.google.com/document/d/1RV4rOrG1gwlD5IljmrIq_t45rz7H3xs9GbFSEyGzEtM/edit#heading=h.iasguic42ger>
> > that includes problem discussion and prior art if anyone wants to
> > contribute to that. We tried to fairly discuss existing solutions,
> > what their drawbacks are, and a proposed solution.
> >
> > If we were to put this as part of the main Cassandra daemon, I think
> > it should probably be marked experimental and of course be something
> > that users opt into (table by table or cluster by cluster) with the
> > understanding that it might not fully work out of the box the first
> > time we ship it. We have to be willing to take risks but we also have
> > to be honest with our users. It may help build confidence if a few
> > major deployments use it (such as Netflix) and we are happy of course
> > to provide that QA as best we can.
> >
> > -Joey
> >
> > On Tue, Apr 3, 2018 at 10:48 AM, Blake Eggleston
> > <beggleston@xxxxxxxxx>
> > wrote:
> >
> >> Hi dev@,
> >>
> >>
> >>
> >> The question of the best way to schedule repairs came up on
> >> CASSANDRA-14346, and I thought it would be good to bring up the idea
> >> of an external tool on the dev list.
> >>
> >>
> >>
> >> Cassandra lacks any sort of tools for automating routine tasks that
> >> are required for running clusters, specifically repair. Regular
> >> repair is a must for most clusters, like compaction. This means that,
> >> especially as far as eventual consistency is concerned, Cassandra
> >> isn’t totally functional out of the box. Operators either need to
> >> find a 3rd party solution or implement one themselves. Adding this to
> >> Cassandra would make it easier to use.
> >>
> >>
> >>
> >> Is this something we should be doing? If so, what should it look like?
> >>
> >>
> >>
> >> Personally, I feel like this is a pretty big gap in the project and
> >> would like to see an out of process tool offered. Ideally, Cassandra
> >> would just take care of itself, but writing a distributed repair
> >> scheduler that you trust to run in production is a lot harder than
> >> writing a single-process management application that can fail over.
> >>
> >>
> >>
> >> Any thoughts on this?
> >>
> >>
> >>
> >> Thanks,
> >>
> >>
> >>
> >> Blake
> >>
> >>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@xxxxxxxxxxxxxxxxxxxx
> For additional commands, e-mail: dev-help@xxxxxxxxxxxxxxxxxxxx
>