Re: Repair scheduling tools


I think it's telling that Dor, Vinay, and I, who have all built sidecar
repair systems, think it's crucial to have the scheduling component in
the same process as the repair execution component. As I said in the
ticket/design doc, it is *really* hard for a repair scheduling process to
determine the internal state of the repair execution process. In our
current production system we have significant complexity in the code to
account for the differing daemon/sidecar life-cycles, repair state loss,
flaky JMX connections, authentication for the sidecar to speak JMX and
CQL, etc. There does seem to be significant concern, though, that we
can't iterate quickly in the main process and that it would be easier to
iterate as a tool/sidecar, so I'll spend some time this week sketching out
in the design the additional components and resiliency factors required to
put the scheduler into such a tool.

I do have a hard time buying that an opt-in repair *scheduling* is going to
cause heap problems or impact the daemon significantly; the scheduler
literally reads a few bytes out of a Cassandra table and makes a function
call or two, and then sleeps for 2 minutes. Repair *execution* is the
actual heap intense part and is already part of the Cassandra daemon. If
the concern is that users will start actually running repair and expose
heap issues in repair, then that's great; let's fix it!
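A minimal Python sketch of the scheduler loop described above, to show how little work it does per wake-up. The repair-state table is modeled as an in-memory dict of table name to last-repair timestamp, and `run_repair` stands in for the call into the existing repair execution code; those names, the per-table interval, and the "repair the least recently repaired table first" policy are illustrative assumptions, not the design in CASSANDRA-14346.

```python
import time
from typing import Callable, Dict, Optional

POLL_INTERVAL_S = 120  # "sleeps for 2 minutes" between checks


def pick_overdue_table(last_repaired: Dict[str, float],
                       min_interval_s: float,
                       now: float) -> Optional[str]:
    """Return the least recently repaired table if it is overdue, else None."""
    if not last_repaired:
        return None
    oldest = min(last_repaired, key=last_repaired.__getitem__)
    if now - last_repaired[oldest] >= min_interval_s:
        return oldest
    return None


def scheduler_tick(last_repaired: Dict[str, float],
                   min_interval_s: float,
                   run_repair: Callable[[str], None],
                   now: float) -> Optional[str]:
    """One wake-up: read a few bytes of state, maybe make one call."""
    table = pick_overdue_table(last_repaired, min_interval_s, now)
    if table is not None:
        # Hypothetical hook into repair execution; assumed to block until done.
        run_repair(table)
        # Record this run in the state "table" (start time, for simplicity).
        last_repaired[table] = now
    return table


def scheduler_loop(last_repaired: Dict[str, float],
                   min_interval_s: float,
                   run_repair: Callable[[str], None],
                   clock: Callable[[], float] = time.time,
                   sleep: Callable[[float], None] = time.sleep,
                   max_ticks: Optional[int] = None) -> None:
    """Tick, then sleep; clock and sleep are injectable for testing."""
    ticks = 0
    while max_ticks is None or ticks < max_ticks:
        scheduler_tick(last_repaired, min_interval_s, run_repair, clock())
        sleep(POLL_INTERVAL_S)
        ticks += 1
```

In this sketch, everything expensive happens inside `run_repair`; the scheduler itself only compares a handful of timestamps.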

If we had a Cassandra sidecar I think it would generally be great to move
all the background tasks (compaction, repair, streaming, backup, etc...)
into the sidecar to cleanly separate the "latency critical" process from
the "throughput critical" process. This would also be great from an ops
perspective because you could choose to run the sidecar in a cgroup to
control usage of network, cpu and ram (you could even pin compaction and
repair to dedicated cores so that they do not interfere with the main
process), and you could upgrade the background process much more easily
and with less risk. The key word, though, is the leading "if": as far as I
know we don't have a ticket or concrete proposal for a dedicated
Cassandra sidecar. Separately, sidecars are actually hard to do well, but I
think it's still a good direction for Cassandra to go longer term.

-Joey

On Wed, Apr 4, 2018 at 8:00 AM, Ben Bromhead <ben@xxxxxxxxxxxxxxx> wrote:

> +1 to including the implementation in Cassandra itself. It makes managed
> repair a first-class citizen, nicely rounds out Cassandra's consistency
> story, and makes it 1000x more likely that repairs will get run.
>
>
>
>
> On Wed, Apr 4, 2018 at 10:45 AM Jon Haddad <jon@xxxxxxxxxxxxx> wrote:
>
> > Implementation details aside, I’m firmly in the “it would be nice if C*
> > could take care of it” camp. Reaper is pretty damn easy to use and people
> > *still* don’t put it in prod.
> >
> >
> > > On Apr 4, 2018, at 4:16 AM, Rahul Singh <rahul.xavier.singh@xxxxxxxxx> wrote:
> > >
> > > I understand the merits of both approaches. In working with other DBs
> > > in the “old country” of SQL, we often had to write indexing sequences
> > > manually for important tables. It was “built into the product”, but to
> > > get the maximum benefit from indices we had to create indices other than
> > > the clustered (physical) index. The process still sucked. It’s never
> > > perfect.
> > >
> > > The JVM is already fraught with GC issues, and running another managed
> > > process in the same heap space is what I’m worried about. Technically,
> > > the process could ship in the same binary but be started either as a
> > > sidecar or inside the main process.
> > >
> > > Consider a process called “cassandra-agent” that runs with a scheduler
> > > driven by config or a Cassandra table. It would be distributed in the
> > > same release and started by the shell/service scripts; the end user
> > > would know about it only by examining the .sh files. This opens the
> > > possibility of including a GUI hosted in the same process without
> > > cluttering the core of Cassandra.
> > >
> > > Best,
> > >
> > > --
> > > Rahul Singh
> > > rahul.singh@xxxxxxxx
> > >
> > > Anant Corporation
> > >
> > > On Apr 4, 2018, 2:50 AM -0400, Dor Laor <dor@xxxxxxxxxxxx>, wrote:
> > >> We at Scylla implemented repair in a similar way to Cassandra Reaper.
> > >> We do that using an external application, written in Go, that manages
> > >> repair for multiple clusters and saves the data in an external Scylla
> > >> cluster. The logic resembles Reaper's, with some specific internal
> > >> sharding optimizations, and uses the Scylla REST API.
> > >>
> > >> However, I have doubts it's the ideal way. After playing a bit with
> > >> CockroachDB, I realized it's super nice to have a single binary that
> > >> repairs itself, provides a GUI, and is the core DB.
> > >>
> > >> Even while distributed, you can elect a leader node to manage the
> > >> repair in a consistent way, so the complexity can be reduced to a
> > >> minimum. Repair can write its status to the system tables and provide an
> > >> API for progress, rate control, etc.
> > >>
> > >> The big advantage of embedding repair in the core is that there is no
> > >> need to expose internal state to the repair logic. An external program
> > >> would otherwise have to deal with different versions of Cassandra,
> > >> different repair capabilities of the core (such as incremental on/off),
> > >> and so forth. A good database should schedule its own repair: it knows
> > >> whether the hinted handoff threshold was crossed or not, it knows
> > >> whether nodes were replaced, etc.
> > >>
> > >> My 2 cents. Dor
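One way to get the "elect a leader node" behavior Dor describes is a time-limited lease stored in a single row and updated with compare-and-set, which in Cassandra would plausibly map onto a lightweight transaction. The Python sketch below is only an illustration under that assumption: `LeaseTable` is a hypothetical in-memory stand-in for such a row, and the TTL values are arbitrary.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    holder: str
    expires_at: float


class LeaseTable:
    """Hypothetical single-row lease updated with compare-and-set.

    Single-threaded and in-memory here, so check-then-set is trivially
    atomic; a real cluster would need the store to guarantee that
    (assumed, e.g., via a conditional write).
    """

    def __init__(self) -> None:
        self._lease: Optional[Lease] = None

    def try_acquire(self, node: str, now: float, ttl_s: float) -> bool:
        lease = self._lease
        # Take the lease if it is free, expired, or already ours (renewal).
        if lease is None or lease.expires_at <= now or lease.holder == node:
            self._lease = Lease(holder=node, expires_at=now + ttl_s)
            return True
        return False

    def holder(self, now: float) -> Optional[str]:
        """Current leader, or None if no unexpired lease exists."""
        lease = self._lease
        if lease is None or lease.expires_at <= now:
            return None
        return lease.holder
```

Only the node holding the lease would run the scheduler; if it dies, the lease expires and another node can take over, which is the failover property a single-process tool would otherwise provide.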
> > >>
> > >> On Tue, Apr 3, 2018 at 11:13 PM, Dinesh Joshi <dinesh.joshi@xxxxxxxxx.invalid> wrote:
> > >>
> > >>> Simon,
> > >>> You could still do load-aware repair outside of the main process by
> > >>> reading Cassandra's metrics.
> > >>> In general, I don't think the maintenance tasks necessarily need to
> > >>> live in the main process. They could negatively impact the read/write
> > >>> path. Unless strictly required by the serving path, they could live in
> > >>> a sidecar process. There are multiple benefits, including isolation,
> > >>> faster iteration, and loose coupling. For example, the maintenance
> > >>> tasks could have a different GC profile than the main process and that
> > >>> would be fine. Today that is not the case.
> > >>> The only issue I see is that the project does not provide an official
> > >>> sidecar. Perhaps there should be one. We probably wouldn't have had to
> > >>> have this discussion ;)
> > >>> Dinesh
> > >>>
> > >>> On Tuesday, April 3, 2018, 10:12:56 PM PDT, Qingcun Zhou <zhouqingcun@xxxxxxxxx> wrote:
> > >>>
> > >>> Repair has been a problem for us at Uber. In general I'm in favor of
> > >>> including the scheduling logic in the Cassandra daemon. It has the
> > >>> benefit of enabling something like load-aware repair, e.g., only
> > >>> scheduling repair while there is no ongoing compaction or while traffic
> > >>> is low. As proposed by others, we can expose keyspace/table-level
> > >>> configurations so that users can opt in.
> > >>> Regarding the risk, yes, there will be problems at the beginning, but
> > >>> in the long run users will appreciate that repair works out of the box,
> > >>> just like compaction. We have large Cassandra deployments and can work
> > >>> with the Netflix folks on intensive testing to boost user confidence.
> > >>>
> > >>> On the other hand, have we looked into how other NoSQL databases do
> > >>> repair? Is there a sidecar process?
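The load-aware gate Simon mentions (which Dinesh notes could equally be driven from outside the process by reading Cassandra's metrics) might look like the following Python sketch. `NodeLoad` and its fields are hypothetical inputs that a scheduler would fill from the node's compaction and client-request metrics; no specific metric names are assumed. The sketch takes the conservative reading and requires both no active compaction and low traffic, plus the table-level opt-in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeLoad:
    """Hypothetical snapshot a scheduler would fill from the node's metrics."""
    active_compactions: int
    client_requests_per_s: float


def should_start_repair(load: NodeLoad,
                        max_requests_per_s: float,
                        table_opted_in: bool) -> bool:
    """Conservative load-aware gate: opted in, no compaction, low traffic."""
    if not table_opted_in:
        return False  # keyspace/table-level opt-in comes first
    if load.active_compactions > 0:
        return False  # wait for ongoing compactions to finish
    return load.client_requests_per_s <= max_requests_per_s
```

Because the gate only consumes numbers, it works the same whether it runs in the daemon or in a sidecar polling metrics; what differs is how fresh and reliable those numbers are.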
> > >>>
> > >>>
> > >>> On Tue, Apr 3, 2018 at 9:21 PM, sankalp kohli <kohlisankalp@xxxxxxxxx> wrote:
> > >>>
> > >>>> Repair is critical for running C*, and I agree with Roopa that it needs
> > >>>> to be part of the offering. I think we should make it easy for new
> > >>>> users to run C*.
> > >>>>
> > >>>> Can we have a sidecar process that we add to the Apache Cassandra
> > >>>> offering and put this repair there? I am also fine putting it in C* if
> > >>>> a sidecar is more long term.
> > >>>>
> > >>>> On Tue, Apr 3, 2018 at 6:20 PM, Roopa Tangirala <rtangirala@xxxxxxxxxxx.invalid> wrote:
> > >>>>
> > >>>>> Having seen so many companies grapple with running repairs
> > >>>>> successfully in production, and having seen the success of
> > >>>>> distributed scheduled repair here at Netflix, I strongly believe that
> > >>>>> this would be a great addition to the database. I am hoping we as a
> > >>>>> community will make it easy for teams to operate and run Cassandra by
> > >>>>> enhancing the core product and making maintenance tasks like repairs
> > >>>>> and compactions part of the database, without external tooling. We
> > >>>>> can have an experimental flag for the feature so that only teams who
> > >>>>> are confident with the service enable it, while others fall back to
> > >>>>> default repairs.
> > >>>>>
> > >>>>>
> > >>>>> *Regards,*
> > >>>>>
> > >>>>> *Roopa Tangirala*
> > >>>>>
> > >>>>> Engineering Manager CDE
> > >>>>>
> > >>>>> *(408) 438-3156 - mobile*
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> On Tue, Apr 3, 2018 at 4:19 PM, Kenneth Brotman <kenbrotman@xxxxxxxxx.invalid> wrote:
> > >>>>>
> > >>>>>> Why not make it configurable?
> > >>>>>> auto_manage_repair_consistancy: true (default: false)
> > >>>>>>
> > >>>>>> Then users can use the built-in auto repair function that would be
> > >>>>>> created, or continue to handle it as they do now. The default would
> > >>>>>> be "false", so nothing changes on its own. Just wondering: why not
> > >>>>>> have that option? It might accelerate progress, as others have
> > >>>>>> already suggested.
> > >>>>>>
> > >>>>>> Kenneth Brotman
> > >>>>>>
> > >>>>>> -----Original Message-----
> > >>>>>> From: Nate McCall [mailto:zznate.m@xxxxxxxxx]
> > >>>>>> Sent: Tuesday, April 03, 2018 1:37 PM
> > >>>>>> To: dev
> > >>>>>> Subject: Re: Repair scheduling tools
> > >>>>>>
> > >>>>>> This document does a really good job of listing out some of the
> > >>>>>> issues of coordinating repair scheduling. Regardless of which camp you
> > >>>>>> fall into, it is certainly worth a read.
> > >>>>>>
> > >>>>>> On Wed, Apr 4, 2018 at 8:10 AM, Joseph Lynch <joe.e.lynch@xxxxxxxxx> wrote:
> > >>>>>>> I just want to say I think it would be great for our users if we
> > >>>>>>> moved repair scheduling into Cassandra itself. The team here at
> > >>>>>>> Netflix has opened the ticket
> > >>>>>>> <https://issues.apache.org/jira/browse/CASSANDRA-14346>
> > >>>>>>> and has written a detailed design document
> > >>>>>>> <https://docs.google.com/document/d/1RV4rOrG1gwlD5IljmrIq_t45rz7H3xs9GbFSEyGzEtM/edit#heading=h.iasguic42ger>
> > >>>>>>> that includes problem discussion and prior art if anyone wants to
> > >>>>>>> contribute to it. We tried to fairly discuss existing solutions,
> > >>>>>>> their drawbacks, and a proposed solution.
> > >>>>>>>
> > >>>>>>> If we were to put this in the main Cassandra daemon, I think it
> > >>>>>>> should probably be marked experimental and, of course, be something
> > >>>>>>> that users opt into (table by table or cluster by cluster) with the
> > >>>>>>> understanding that it might not fully work out of the box the first
> > >>>>>>> time we ship it. We have to be willing to take risks, but we also
> > >>>>>>> have to be honest with our users. It may help build confidence if a
> > >>>>>>> few major deployments (such as Netflix) use it, and we are of course
> > >>>>>>> happy to provide that QA as best we can.
> > >>>>>>>
> > >>>>>>> -Joey
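The opt-in model Joey describes (table by table or cluster by cluster), combined with a cluster-wide switch like the `auto_manage_repair_consistancy` flag Kenneth proposed, implies some precedence rule. A minimal Python sketch, assuming a per-table setting overrides the cluster-wide one and that scheduled repair stays off when neither is set; the function name and the precedence order are hypothetical.

```python
from typing import Mapping, Optional


def repair_enabled(table: str,
                   cluster_default: Optional[bool],
                   table_overrides: Mapping[str, bool]) -> bool:
    """Resolve whether scheduled repair is enabled for one table.

    Assumed precedence: explicit per-table setting, then the cluster-wide
    flag, then off (so nothing changes unless someone opts in).
    """
    if table in table_overrides:
        return table_overrides[table]
    if cluster_default is not None:
        return cluster_default
    return False
```

Defaulting to off matches the "nothing changes on its own" behavior asked for earlier in the thread, while still letting an operator exclude a single table from a cluster-wide opt-in.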
> > >>>>>>>
> > >>>>>>> On Tue, Apr 3, 2018 at 10:48 AM, Blake Eggleston <beggleston@xxxxxxxxx> wrote:
> > >>>>>>>
> > >>>>>>>> Hi dev@,
> > >>>>>>>>
> > >>>>>>>> The question of the best way to schedule repairs came up on
> > >>>>>>>> CASSANDRA-14346, and I thought it would be good to bring up the idea
> > >>>>>>>> of an external tool on the dev list.
> > >>>>>>>>
> > >>>>>>>> Cassandra lacks any sort of tools for automating routine tasks that
> > >>>>>>>> are required for running clusters, specifically repair. Like
> > >>>>>>>> compaction, regular repair is a must for most clusters. This means
> > >>>>>>>> that, especially as far as eventual consistency is concerned,
> > >>>>>>>> Cassandra isn’t totally functional out of the box. Operators either
> > >>>>>>>> need to find a 3rd party solution or implement one themselves.
> > >>>>>>>> Adding this to Cassandra would make it easier to use.
> > >>>>>>>>
> > >>>>>>>> Is this something we should be doing? If so, what should it look
> > >>>>>>>> like?
> > >>>>>>>>
> > >>>>>>>> Personally, I feel like this is a pretty big gap in the project and
> > >>>>>>>> would like to see an out-of-process tool offered. Ideally, Cassandra
> > >>>>>>>> would just take care of itself, but writing a distributed repair
> > >>>>>>>> scheduler that you trust to run in production is a lot harder than
> > >>>>>>>> writing a single-process management application that can fail over.
> > >>>>>>>>
> > >>>>>>>> Any thoughts on this?
> > >>>>>>>>
> > >>>>>>>> Thanks,
> > >>>>>>>>
> > >>>>>>>> Blake
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>
> > >>>>>> ---------------------------------------------------------------------
> > >>>>>> To unsubscribe, e-mail: dev-unsubscribe@xxxxxxxxxxxxxxxxxxxx
> > >>>>>> For additional commands, e-mail: dev-help@xxxxxxxxxxxxxxxxxxxx
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>
> > >>>>
> > >>>
> > >>>
> > >>>
> > >>> --
> > >>> Thank you & Best Regards,
> > >>> --Simon (Qingcun) Zhou
> > >>>
> >
> >
> >
> --
> Ben Bromhead
> CTO | Instaclustr <https://www.instaclustr.com/>
> +1 650 284 9692
> Reliability at Scale
> Cassandra, Spark, Elasticsearch on AWS, Azure, GCP and Softlayer
>