OSDir

Re: Repair scheduling tools


Repair is critical for running C*, and I agree with Roopa that it needs to
be part of the offering. I think we should make it easy for new users to
run C*.

Can we have a sidecar process that we can add to the Apache Cassandra
offering and put this repair there? I am also fine with putting it in C*
itself if the sidecar is more of a long-term effort.

On Tue, Apr 3, 2018 at 6:20 PM, Roopa Tangirala <
rtangirala@xxxxxxxxxxx.invalid> wrote:

> Having seen so many companies grapple with running repairs successfully in
> production, and having seen the success of distributed scheduled repair
> here at Netflix, I strongly believe that this would be a great addition to
> the database. I am hoping we as a community will make it easy for teams to
> operate and run Cassandra by enhancing the core product and making
> maintenance tasks like repairs and compactions part of the database,
> without external tooling. We can have an experimental flag for the
> feature, so that only teams who are confident with the service enable it,
> while others can fall back to default repairs.
>
>
> *Regards,*
>
> *Roopa Tangirala*
>
> Engineering Manager CDE
>
> *(408) 438-3156 - mobile*
>
> On Tue, Apr 3, 2018 at 4:19 PM, Kenneth Brotman <
> kenbrotman@xxxxxxxxx.invalid> wrote:
>
> > Why not make it configurable?
> >         auto_manage_repair_consistency: true (default: false)
> >
> > Then users can use the built-in auto repair function that would be
> > created, or continue to handle repair as they do now. Default behavior
> > would be "false", so nothing changes on its own. Just wondering: why not
> > have that option? It might accelerate progress, as others have already
> > suggested.
> >
> > Kenneth Brotman
> >
> > -----Original Message-----
> > From: Nate McCall [mailto:zznate.m@xxxxxxxxx]
> > Sent: Tuesday, April 03, 2018 1:37 PM
> > To: dev
> > Subject: Re: Repair scheduling tools
> >
> > This document does a really good job of listing out some of the issues of
> > coordinating repair scheduling. Regardless of which camp you fall into,
> > it is certainly worth a read.
> >
> > On Wed, Apr 4, 2018 at 8:10 AM, Joseph Lynch <joe.e.lynch@xxxxxxxxx>
> > wrote:
> > > I just want to say I think it would be great for our users if we moved
> > > repair scheduling into Cassandra itself. The team here at Netflix has
> > > opened the ticket
> > > <https://issues.apache.org/jira/browse/CASSANDRA-14346>
> > > and has written a detailed design document
> > > <https://docs.google.com/document/d/1RV4rOrG1gwlD5IljmrIq_t45rz7H3xs9GbFSEyGzEtM/edit#heading=h.iasguic42ger>
> > > that includes problem discussion and prior art if anyone wants to
> > > contribute to that. We tried to fairly discuss existing solutions,
> > > what their drawbacks are, and a proposed solution.
> > >
> > > If we were to put this as part of the main Cassandra daemon, I think
> > > it should probably be marked experimental and of course be something
> > > that users opt into (table by table or cluster by cluster) with the
> > > understanding that it might not fully work out of the box the first
> > > time we ship it. We have to be willing to take risks but we also have
> > > to be honest with our users. It may help build confidence if a few
> > > major deployments use it (such as Netflix) and we are happy of course
> > > to provide that QA as best we can.
> > >
> > > -Joey
> > >
> > > On Tue, Apr 3, 2018 at 10:48 AM, Blake Eggleston
> > > <beggleston@xxxxxxxxx>
> > > wrote:
> > >
> > >> Hi dev@,
> > >>
> > >> The question of the best way to schedule repairs came up on
> > >> CASSANDRA-14346, and I thought it would be good to bring up the idea
> > >> of an external tool on the dev list.
> > >>
> > >> Cassandra lacks any sort of tooling for automating the routine tasks
> > >> required to run a cluster, specifically repair. Like compaction,
> > >> regular repair is a must for most clusters. This means that,
> > >> especially as far as eventual consistency is concerned, Cassandra
> > >> isn’t totally functional out of the box. Operators either need to
> > >> find a third-party solution or implement one themselves. Adding this
> > >> to Cassandra would make it easier to use.
> > >>
> > >> Is this something we should be doing? If so, what should it look like?
> > >>
> > >> Personally, I feel like this is a pretty big gap in the project and
> > >> would like to see an out-of-process tool offered. Ideally, Cassandra
> > >> would just take care of itself, but writing a distributed repair
> > >> scheduler that you trust to run in production is a lot harder than
> > >> writing a single-process management application that can fail over.
> > >>
> > >> Any thoughts on this?
> > >>
> > >> Thanks,
> > >>
> > >> Blake
> > >>
> > >>
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@xxxxxxxxxxxxxxxxxxxx
> > For additional commands, e-mail: dev-help@xxxxxxxxxxxxxxxxxxxx
> >
> >
>