Re: Repair scheduling tools


This document does a really good job of listing some of the issues
involved in coordinating repair scheduling. Regardless of which camp you
fall into, it is certainly worth a read.

On Wed, Apr 4, 2018 at 8:10 AM, Joseph Lynch <joe.e.lynch@xxxxxxxxx> wrote:
> I just want to say I think it would be great for our users if we moved
> repair scheduling into Cassandra itself. The team here at Netflix has
> opened the ticket <https://issues.apache.org/jira/browse/CASSANDRA-14346>
> and has written a detailed design document
> <https://docs.google.com/document/d/1RV4rOrG1gwlD5IljmrIq_t45rz7H3xs9GbFSEyGzEtM/edit#heading=h.iasguic42ger>
> that includes problem discussion and prior art if anyone wants to
> contribute to that. We tried to fairly discuss existing solutions, what
> their drawbacks are, and a proposed solution.
>
> If we were to put this as part of the main Cassandra daemon, I think it
> should probably be marked experimental and of course be something that
> users opt into (table by table or cluster by cluster) with the
> understanding that it might not fully work out of the box the first time we
> ship it. We have to be willing to take risks but we also have to be honest
> with our users. It may help build confidence if a few major deployments use
> it (such as Netflix) and we are happy of course to provide that QA as best
> we can.
>
> -Joey
>
> On Tue, Apr 3, 2018 at 10:48 AM, Blake Eggleston <beggleston@xxxxxxxxx>
> wrote:
>
>> Hi dev@,
>>
>>
>>
>> The question of the best way to schedule repairs came up on
>> CASSANDRA-14346, and I thought it would be good to bring up the idea of an
>> external tool on the dev list.
>>
>>
>>
>> Cassandra lacks tools for automating the routine tasks that are
>> required for running clusters, specifically repair. Like compaction,
>> regular repair is a must for most clusters. This means that, especially as
>> far as eventual consistency is concerned, Cassandra isn’t totally functional
>> out of the box. Operators either need to find a third-party solution or
>> implement one themselves. Adding this to Cassandra would make it easier to
>> use.
>>
>>
>>
>> Is this something we should be doing? If so, what should it look like?
>>
>>
>>
>> Personally, I feel like this is a pretty big gap in the project and would
>> like to see an out-of-process tool offered. Ideally, Cassandra would just
>> take care of itself, but writing a distributed repair scheduler that you
>> trust to run in production is a lot harder than writing a single-process
>> management application that can fail over.
>>
>>
>>
>> Any thoughts on this?
>>
>>
>>
>> Thanks,
>>
>>
>>
>> Blake
>>
>>
