Re: Proposing an Apache Cassandra Management process
Thank you all for the feedback. Nate made a Google doc with the proposal in it, and it is linked from the ticket. I will try to flesh it out as I gather your input.
On Wednesday, April 18, 2018, 5:12:49 PM PDT, kurt greaves <kurt@xxxxxxxxxxxxxxx> wrote:
For anyone looking, Dinesh already made a ticket: CASSANDRA-14395
On 18 April 2018 at 18:14, Vinay Chella <vinaykumarcse@xxxxxxxxx> wrote:
> This is a good initiative. We have advocated for and run a sidecar for the
> past 5+ years, and we've learned and improved it a lot. We look forward to
> moving features from Priam (such as backup, HTTP -> JMX, etc) incrementally
> to this sidecar as they make sense.
> Vinay Chella
> On Fri, Apr 13, 2018 at 7:01 AM, Eric Evans <john.eric.evans@xxxxxxxxx> wrote:
> > On Thu, Apr 12, 2018 at 4:41 PM, Dinesh Joshi
> > <email@example.com> wrote:
> > > Hey all -
> > > With the uptick in discussion around Cassandra operability and after
> > discussing potential solutions with various members of the community, we
> > would like to propose the addition of a management process/sub-project to
> > Apache Cassandra. The process would be responsible for common operational
> > tasks like bulk execution of nodetool commands, backup/restore, and health
> > checks, among others. We feel we have a proposal that will garner some
> > discussion and debate but is likely to reach consensus.
> > > While the community, in large part, agrees that these features should
> > exist “in the database”, there is debate on how they should be implemented.
> > Primarily, whether or not to use an external process or build on
> > CassandraDaemon. This is an important architectural decision but we feel
> > the most critical aspect is not where the code runs but that the operator
> > still interacts with the notion of a single database. Multi-process
> > databases are as old as Postgres and continue to be common in newer databases
> > like Druid. As such, we propose a separate management process for the
> > following reasons:
> > >
> > > - Resource isolation & Safety: Features in the management process
> > will not affect C*'s read/write path which is critical for stability. An
> > isolated process has several technical advantages including preventing use
> > of unnecessary dependencies in CassandraDaemon, separation of JVM resources
> > like thread pools and heap, and preventing bugs from adversely affecting
> > the main process. In particular, GC tuning can be done separately for the
> > two processes, hopefully helping to improve, or at least not adversely
> > affect, tail latencies of the main process.
> > >
> > > - Health Checks & Recovery: Currently users implement health checks
> > in their own sidecar process. Implementing them in the serving process does
> > not make sense because if the JVM running the CassandraDaemon goes south,
> > the healthchecks and potentially any recovery code may not be able to run.
> > Having a management process running in isolation opens up the possibility
> > to not only report the health of the C* process such as long GC pauses or
> > stuck JVM but also to recover from it. Having a list of basic health checks
> > that are tested with every C* release and officially supported will help
> > boost confidence in C* quality and make it easier to operate.
> > >
> > > - Reduced Risk: By having a separate Daemon we open the possibility
> > to contribute features that otherwise would not have been considered
> > eg. a UI. A library that started many background threads and is operated
> > completely differently would likely be considered too risky for
> > CassandraDaemon but is a good candidate for the management process.
> > Makes sense IMO.
> > > What can go into the management process?
> > > - Features that are non-essential for serving reads & writes for eg.
> > Backup/Restore or Running Health Checks against the CassandraDaemon, etc.
> > >
> > > - Features that do not make the management process critical for
> > functioning of the serving process. In other words, if someone does not
> > wish to use this management process, they are free to disable it.
> > >
> > > We would like to initially build a minimal set of features such as health
> > checks and bulk commands into the first iteration of the management
> > process. We would use the same software stack that is used to build the
> > current CassandraDaemon binary. This would be critical for sharing code
> > between CassandraDaemon & management processes. The code should live
> > in-tree to make this easy.
> > > With regards to more in-depth features like repair scheduling and
> > discussions around compaction in or out of CassandraDaemon, while the
> > management process may be a suitable host, it is not our goal to decide
> > that at this time. The management process could be used in these cases, as
> > they meet the criteria above, but other technical/architectural reasons may
> > exist for why it should not be.
> > > We are looking forward to your comments on our proposal,
> > Sounds good to me.
> > Personally, I'm a little less interested in things like
> > health/availability checks and metrics collection, because there are
> > already tools to solve this problem (and most places will already be
> > using them). I'm more interested in things like cluster status,
> > streaming, repair, etc. Something to automate/centralize
> > database-specific command and control, and improve visibility.
> > In-tree also makes sense (tools/ maybe?), but I would suggest working
> > out of a branch initially, and seeking inclusion when there is
> > something more concrete to discuss.
> > --
> > Eric Evans
> > john.eric.evans@xxxxxxxxx