Re: Proposing an Apache Cassandra Management process

Over the last year I too have built a framework similar to cstar, but for
our purposes at SmartThings. The intention is to open-source it, but it
needs a round of polish first, since it is really more of a utility toolbox
for our small Cassandra group.

It relies heavily on ssh and does its work on the nodes themselves, using
the software already installed there. It also assumes a bash shell. It is
Groovy based, uses JSch for ssh, and does not use ssh config (for better or
worse). It is fairly AWS-centric, although I would like to abstract out the
AWS dependence. We use it for backups, restores, and some data
collection/triage analysis. Backups can be incremental or full: an
incremental compares the current sstable set against the previous one and
uploads only the newer sstables, along with a manifest that links to the
locations of the already-uploaded sstables. This part in particular is very
AWS/S3-centric, and abstracting that dependency would be a primary focus.
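
To make the incremental idea concrete, here is a minimal Python sketch of
the comparison step. It is not the framework's actual code (which is Groovy
and S3-specific); the function name, the manifest shape (sstable file name
mapped to remote location), and the example paths are all assumptions made
for illustration.

```python
from typing import Dict, Iterable, List, Tuple


def plan_incremental_backup(
    current_sstables: Iterable[str],
    previous_manifest: Dict[str, str],
    backup_prefix: str,
) -> Tuple[List[str], Dict[str, str]]:
    """Decide what to upload and build the manifest for this backup.

    previous_manifest maps sstable file name -> remote location (assumed shape).
    Returns (files_to_upload, new_manifest).
    """
    to_upload: List[str] = []
    manifest: Dict[str, str] = {}
    for name in sorted(current_sstables):
        if name in previous_manifest:
            # Already uploaded by an earlier backup: link to that location.
            manifest[name] = previous_manifest[name]
        else:
            # New since the previous backup: upload under this backup's prefix.
            manifest[name] = f"{backup_prefix}/{name}"
            to_upload.append(name)
    return to_upload, manifest


if __name__ == "__main__":
    previous = {"mc-1-big-Data.db": "backups/full-01/mc-1-big-Data.db"}
    current = {"mc-1-big-Data.db", "mc-2-big-Data.db"}
    upload, manifest = plan_incremental_backup(current, previous, "backups/incr-02")
    print(upload)
    print(manifest)
```

Sstables that have been compacted away since the previous backup simply do
not appear in the new manifest, so a restore only needs the latest manifest
to find every file.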

Our clusters sit behind a variety of access modes: bastions, jumpboxes,
once upon a time direct global ssh access, and now global IPs with internal
backend access. We will also have IPv6 soon. We run 2.1.x, 2.2.x, and 3.1
(and still some DSE community), so the framework attempts to deal with all
the intricacies and headaches that entails. The clusters vary a lot in
which security features are enabled (SSL, passwords, password files, etc.).
Not all operations have been exercised against 2.2, and especially not
3.1, but our big project is a push to 3.x, and this tool will enable a lot
of that, so I hope to work out many of the idiosyncrasies of those versions
as we go through the upgrades. We will also be running Kubernetes soon,
and I will look into abstracting the access
method to the nodes to use maybe kubectl commands once I get to know kuber

I only recently found out about cstar. I'm going to look at it and see
whether its way of doing things is better than ours, especially with
regard to ssh connection maintenance and that sort of thing. One thing I
have found useful is a "registry" that stores all the cluster-specific
idiosyncrasies, so you can just run <command> <environment> <cluster> for
lots of things.
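
As a rough illustration of the registry idea, the Python sketch below keys
per-cluster settings by (environment, cluster). Every name in it
(ClusterProfile, its fields, the example entries and host name) is
hypothetical and only meant to show the shape of the lookup, not how our
tool or cstar actually stores this.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class ClusterProfile:
    """Cluster-specific quirks a command needs to know (illustrative fields)."""
    cassandra_version: str
    access_mode: str                  # e.g. "bastion", "jumpbox", "direct"
    bastion_host: Optional[str] = None
    client_ssl: bool = False
    password_auth: bool = False


# Hypothetical entries, keyed by (environment, cluster).
REGISTRY: Dict[Tuple[str, str], ClusterProfile] = {
    ("prod", "events"): ClusterProfile(
        "2.1.x", "bastion", bastion_host="bastion.prod.example",
        client_ssl=True, password_auth=True,
    ),
    ("dev", "events"): ClusterProfile("3.1", "direct"),
}


def lookup(environment: str, cluster: str) -> ClusterProfile:
    """Return the profile for a cluster, or fail with a readable message."""
    try:
        return REGISTRY[(environment, cluster)]
    except KeyError:
        raise KeyError(f"no registry entry for {environment}/{cluster}") from None


def describe(command: str, environment: str, cluster: str) -> str:
    """Show how '<command> <environment> <cluster>' would be resolved."""
    profile = lookup(environment, cluster)
    via = f"via {profile.bastion_host}" if profile.bastion_host else "directly"
    return (f"{command} on {environment}/{cluster} "
            f"(Cassandra {profile.cassandra_version}) {via}")


if __name__ == "__main__":
    print(describe("backup", "prod", "events"))
```

The point of the design is that the operator types only three words and
the per-cluster differences (version, access path, security settings) are
resolved from one place.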

The framework isn't particularly efficient in some ways, especially with
connection pooling/caching/conservation. It wastes a lot of time
reconnecting, but slow and steady works OK for our backups and avoids
disrupting the work the clusters need to do. Parallelism helps a lot, and
cstar may have a lot of good ideas for throttling, parallelism, etc.
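
One simple way to combine parallelism with throttling is a bounded worker
pool, sketched below in Python with the standard concurrent.futures
module. This is only an illustration under assumed names (run_on_nodes,
task, max_parallel); it is not how cstar or our framework schedules work.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, TypeVar

T = TypeVar("T")


def run_on_nodes(
    nodes: Iterable[str],
    task: Callable[[str], T],
    max_parallel: int = 2,
) -> Dict[str, T]:
    """Run task(node) for every node, at most max_parallel at a time."""
    node_list = list(nodes)
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # pool.map yields results in the same order as node_list.
        results = list(pool.map(task, node_list))
    return dict(zip(node_list, results))


if __name__ == "__main__":
    # Stand-in task; a real one would open an ssh session to the node.
    print(run_on_nodes(["node-a", "node-b", "node-c"], lambda n: f"ok:{n}"))
```

Keeping max_parallel small limits how many nodes are busy with a backup
or other heavy operation at once, at the cost of a longer total run.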

We also plan to use it to build some dashboards/UI at some point.

On Thu, Oct 4, 2018 at 7:20 PM Mick Semb Wever <mck@xxxxxxxxxx> wrote:

> Dinesh / Sankalp,
> My suggestion was to document the landscape in hope and an attempt to
> better understand the requirements possible to a side-car.  It wasn't a
> suggestion to patchwork together everything. But rather as part of
> brainstorming, designing, and exercising an inclusive process to see what's
> been done and how it could have been done better.
> A well designed side-car could also be a valuable fundamental to some of
> these third-party solutions, not just our own designs and ideals. Maybe, I
> hope, that's already obvious.
> It would be really fantastic to see more explorative documentation in
> confluence. Part of that can be to list up all these external tools,
> listing their goals, state, and how a side-car might help them. Reaching
> out to their maintainers to be involved in the process would be awesome
> too. I can start something in the cwiki (but i'm on vacation this week),
> I've also given you write-access Dinesh.
> > I also haven't seen a process to propose & discuss larger changes to
> Cassandra. The Cassandra contribution[1] guide possibly needs to be
> updated. Some communities have a process which facilitate things. See Kafka
> Improvement Process[2], Spark Improvement Process[3].
> Bringing this up was gold, imho. I would love to see something like this
> exist in the C* community (also in cwiki), and the side-car brainstorming
> and design used to test and flesh it out.
> regards,
> Mick
> On Sun, 30 Sep 2018, at 05:19, Dinesh Joshi wrote:
> > > On Sep 27, 2018, at 7:35 PM, Mick Semb Wever <mck@xxxxxxxxxx> wrote:
> > >
> > > Reaper,
> >
> > I have looked at this already.
> >
> > > Priam,
> >
> > I have looked at this already.
> >
> > > Marcus Olsson's offering,
> >
> > This isn't OSS.
> >
> > > CStar,
> >
> > I have looked at this already.
> >
> > > OpsCenter.
> >
> > Latest release is only compatible with DSE and not Apache Cassandra[1]
> >
> > > Then there's a host of command line tools like:
> > > ic-tools,
> > > ctop (was awesome, but is it maintained anymore?),
> > > tablesnap.
> >
> > These are interesting tools and I don't think they do what we're
> > interested in doing.
> >
> > > And maybe it's worth including the diy approach people take…
> > > pssh/dsh/clusterssh/mussh/fabric, etc
> >
> > What's the point? You can definitely add this to the website as helpful
> > documentation.
> >
> > The proposal in the original thread was to create something that is
> > supported by the Apache Cassandra project learning from the tooling
> > we've all built over the years. The fact that everyone has a sidecar or
> > their own internal tooling is an indicator that the project has room to
> > grow. It will certainly help this project be more user friendly (at
> > least for operators).
> >
> > I, as a user and a developer, do not want to use a patchwork of
> > disparate tools. Does anybody oppose this on technical grounds? If you
> > do, please help me understand why would you prefer using a patchwork of
> > tools vs something that is part of the Cassandra project?
> >
> > Thanks,
> >
> > Dinesh
> >
> > [1] https://docs.datastax.com/en/opscenter/6.0/opsc/opscPolicyChanges.html
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@xxxxxxxxxxxxxxxxxxxx
> > For additional commands, e-mail: dev-help@xxxxxxxxxxxxxxxxxxxx
> >