Re: Proposing an Apache Cassandra Management process

This is a good initiative. We have advocated for and run a sidecar for the
past 5+ years, and we have learned a lot and improved it along the way. We
look forward to incrementally moving features from Priam (such as backup,
HTTP -> JMX, etc.) to this sidecar as they make sense.

Vinay Chella

On Fri, Apr 13, 2018 at 7:01 AM, Eric Evans <john.eric.evans@xxxxxxxxx>
wrote:

> On Thu, Apr 12, 2018 at 4:41 PM, Dinesh Joshi
> <dinesh.joshi@xxxxxxxxx.invalid> wrote:
> > Hey all -
> > With the uptick in discussion around Cassandra operability and after
> > discussing potential solutions with various members of the community, we
> > would like to propose the addition of a management process/sub-project into
> > Apache Cassandra. The process would be responsible for common operational
> > tasks like bulk execution of nodetool commands, backup/restore, and health
> > checks, among others. We feel we have a proposal that will garner some
> > discussion and debate but is likely to reach consensus.
> > While the community, in large part, agrees that these features should
> > exist “in the database”, there is debate on how they should be implemented.
> > Primarily, whether or not to use an external process or build on
> > CassandraDaemon. This is an important architectural decision but we feel
> > the most critical aspect is not where the code runs but that the operator
> > still interacts with the notion of a single database. Multi-process
> > databases are as old as Postgres and continue to be common in newer systems
> > like Druid. As such, we propose a separate management process for the
> > following reasons:
> >
> >    - Resource isolation & Safety: Features in the management process
> > will not affect C*'s read/write path which is critical for stability. An
> > isolated process has several technical advantages including preventing use
> > of unnecessary dependencies in CassandraDaemon, separation of JVM resources
> > like thread pools and heap, and preventing bugs from adversely affecting
> > the main process. In particular, GC tuning can be done separately for the
> > two processes, hopefully helping to improve, or at least not adversely
> > affect, tail latencies of the main process.
> >
> >    - Health Checks & Recovery: Currently users implement health checks
> > in their own sidecar process. Implementing them in the serving process does
> > not make sense because if the JVM running the CassandraDaemon goes south,
> > the healthchecks and potentially any recovery code may not be able to run.
> > Having a management process running in isolation opens up the possibility
> > to not only report the health of the C* process such as long GC pauses or
> > stuck JVM but also to recover from it. Having a list of basic health checks
> > that are tested with every C* release and officially supported will help
> > boost confidence in C* quality and make it easier to operate.
> >
> >    - Reduced Risk: By having a separate Daemon we open the possibility
> > to contribute features that otherwise would not have been considered before,
> > e.g. a UI. A library that started many background threads and is operated
> > completely differently would likely be considered too risky for
> > CassandraDaemon but is a good candidate for the management process.
> Makes sense IMO.
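
To make the "Health Checks & Recovery" point above concrete, here is a
minimal sketch of a liveness probe that runs in a separate JVM from the one
it checks. The class name PortHealthCheck and the method isReachable are
hypothetical and not part of the proposal; the default port 9042 assumes the
native transport is on its default port.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical sketch: an out-of-process liveness probe for a Cassandra node.
final class PortHealthCheck {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host unreachable.
            return false;
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "127.0.0.1";
        // Assumption: 9042 is the default native transport port.
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 9042;
        boolean up = isReachable(host, port, 2000);
        System.out.println(host + ":" + port + " " + (up ? "UP" : "DOWN"));
    }
}
```

A successful TCP connect only shows that the port is accepting connections.
Detecting long GC pauses or a stuck JVM would likely need a deeper check,
such as timing a real request, which this sketch does not attempt.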
> > What can go into the management process?
> >    - Features that are non-essential for serving reads & writes, e.g.
> > Backup/Restore or Running Health Checks against the CassandraDaemon, etc.
> >
> >    - Features that do not make the management process critical for
> > functioning of the serving process. In other words, if someone does not
> > wish to use this management process, they are free to disable it.
> >
> > We would like to initially build a minimal set of features such as health
> > checks and bulk commands into the first iteration of the management
> > process. We would use the same software stack that is used to build the
> > current CassandraDaemon binary. This would be critical for sharing code
> > between CassandraDaemon & management processes. The code should live
> > in-tree to make this easy.
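
As an illustration of what "bulk commands" could look like, here is a rough
sketch that fans one command out to a list of hosts in parallel and collects
a result per host. Everything in it is an assumption for illustration: the
class BulkCommand, the method runOnAll, and the pluggable runner (which in
practice might wrap JMX or a nodetool invocation) are hypothetical names,
not anything the proposal specifies.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.BiFunction;

// Hypothetical sketch: run one command against many nodes concurrently.
final class BulkCommand {

    /**
     * Applies runner(host, command) for every host using a fixed-size pool.
     * Returns host -> output in input order; a failure on one host is
     * recorded as "FAILED: ..." and does not abort the others.
     */
    static Map<String, String> runOnAll(List<String> hosts,
                                        String command,
                                        BiFunction<String, String, String> runner,
                                        int parallelism) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            // Submit everything first so the hosts are processed concurrently.
            Map<String, Future<String>> pending = new LinkedHashMap<>();
            for (String host : hosts) {
                Callable<String> task = () -> runner.apply(host, command);
                pending.put(host, pool.submit(task));
            }
            // Then wait for each result in submission order.
            Map<String, String> results = new LinkedHashMap<>();
            for (Map.Entry<String, Future<String>> entry : pending.entrySet()) {
                try {
                    results.put(entry.getKey(), entry.getValue().get());
                } catch (ExecutionException ex) {
                    results.put(entry.getKey(), "FAILED: " + ex.getCause());
                }
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

Keeping the runner pluggable leaves the transport question open, and the
parallelism argument caps how many nodes are touched at once.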
> > With regard to more in-depth features like repair scheduling and
> > discussions around compaction in or out of CassandraDaemon, while the
> > management process may be a suitable host, it is not our goal to decide
> > that at this time. The management process could be used in these cases, as
> > they meet the criteria above, but other technical/architectural reasons may
> > exist for why it should not be.
> > We are looking forward to your comments on our proposal,
> Sounds good to me.
> Personally, I'm a little less interested in things like
> health/availability checks and metrics collection, because there are
> already tools to solve this problem (and most places will already be
> using them).  I'm more interested in things like cluster status,
> streaming, repair, etc.  Something to automate/centralize
> database-specific command and control, and improve visibility.
> In-tree also makes sense (tools/ maybe?), but I would suggest working
> out of a branch initially, and seeking inclusion when there is
> something more concrete to discuss.
> --
> Eric Evans
> john.eric.evans@xxxxxxxxx