
Re: Which approach should we use for exposing metrics through Virtual tables?

+1 on doing this on a case-by-case basis. The threadpool_metrics looks reasonable. It's best not to shoehorn all metrics into a single table with all possible columns.

    On Friday, June 22, 2018, 8:11:33 AM PDT, Chris Lohfink <clohfink@xxxxxxxxx> wrote:  
 I think this can really be case by case. The tpstats layout (I have a patch for that, by the way, in CASSANDRA-14523 <https://issues.apache.org/jira/browse/CASSANDRA-14523>) is pretty intuitive in the way you listed. Table metrics are another beast and we will likely have many tables for them, i.e. a table for viewing latencies, caches, on-disk statistics... They can be discussed in their respective tickets.

Having a general table for viewing all metrics is, I think, an additional thing (i.e. like the table_stats below), not for general-use browsing but to provide an alternative to JMX. The custom tables that expose things in a nice (attempted, at least) intuitive manner won't have _all_ the metrics, and it's very likely that people will want them for reporting. Unfortunately the metrics are currently not something you can readily expose in a single table, as some are identified by type/scope while others use type/keyspace/scope, type/keyspace, or type/path/scope, so there will likely need to be some breakup here with things like "table_metrics", "client_metrics", "streaming_metrics", etc.
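
To illustrate why one primary key cannot cover every metric, here is a minimal Python sketch that groups metric identifiers by the fields they carry. The sample identifiers and the helper names (key_shape, group_by_shape) are hypothetical and only mirror the shapes listed above; they are not taken from Cassandra's code.

```python
from collections import defaultdict

# Hypothetical metric identifiers. The field names mirror the shapes listed
# above (type/scope, type/keyspace/scope, type/keyspace, type/path/scope);
# the values are made up for illustration.
METRICS = [
    {"type": "Client", "scope": "connectedNativeClients"},
    {"type": "Table", "keyspace": "ks1", "scope": "t1"},
    {"type": "Keyspace", "keyspace": "ks1"},
    {"type": "ThreadPools", "path": "request", "scope": "ReadStage"},
    {"type": "Table", "keyspace": "ks1", "scope": "t2"},
]


def key_shape(metric):
    """Return the identifier fields a metric carries, in a stable order."""
    order = ("type", "keyspace", "path", "scope")
    return tuple(field for field in order if field in metric)


def group_by_shape(metrics):
    """Group metrics so that each group could share one primary key."""
    groups = defaultdict(list)
    for metric in metrics:
        groups[key_shape(metric)].append(metric)
    return dict(groups)


if __name__ == "__main__":
    for shape, members in group_by_shape(METRICS).items():
        print("/".join(shape), "->", len(members), "metric(s)")
```

Each distinct shape would need its own key columns, which is one way to arrive at a split along the lines of "table_metrics", "client_metrics" and so on.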

I agree with Benedict that we should try not to expose internal implementation details in the metrics, because there are always changes. However, it is kind of necessary at some level for these "generalized" metrics. This is something the "custom" tables that expose data in the nodetool way don't have as many issues with, and it is what I personally have been working on first.


> On Jun 22, 2018, at 5:14 AM, Benjamin Lerer <benjamin.lerer@xxxxxxxxxxxx> wrote:
> Hi,
> I would like to start working on exposing the metrics through virtual
> tables in CASSANDRA-14537
> <https://issues.apache.org/jira/browse/CASSANDRA-14537>
> We already had a long discussion in CASSANDRA-7622 about which schema to
> use to expose the metrics; unfortunately, in the end I was not truly
> convinced by any solution (including my own).
> I would like to lay out the possible solutions with their limitations and
> advantages, to find out which solution people prefer or to see
> if somebody can come up with another solution.
> In CASSANDRA-7622, Chris Lohfink proposed to expose the table metrics using
> the following schema:
> VIRTUAL TABLE table_stats (
>    keyspace_name TEXT,
>    table_name TEXT,
>    metric TEXT,
>    value DOUBLE,
>    fifteen_min_rate DOUBLE,
>    five_min_rate DOUBLE,
>    mean_rate DOUBLE,
>    one_min_rate DOUBLE,
>    p75th DOUBLE,
>    p95th DOUBLE,
>    p99th DOUBLE,
>    p999th DOUBLE,
>    min BIGINT,
>    max BIGINT,
>    mean DOUBLE,
>    std_dev DOUBLE,
>    median DOUBLE,
>    count BIGINT,
>    PRIMARY KEY( keyspace_name,  table_name , metric));
> This approach has some advantages:
>  - It is easy to use for all the metric categories that we have
>  (http://cassandra.apache.org/doc/latest/operating/metrics.html)
>  - The number of columns is relatively small and fits in the cqlsh console.
> The main disadvantage that I see with that approach is that it might not
> always be very readable. A Gauge or a Counter metric will have data for only
> one column and will return NULL for all the others. If you know precisely
> which metric is what and you only target that type of metric, you can build
> your query in such a way that the output is nicely formatted.
> Unfortunately, I do not expect every user to know which metric is what.
> The output format can also be problematic for monitoring tools, as they
> might have to use some extra logic to determine how to process each metric.
> My preferred approach was to use metrics as columns. For example, for the
> threadpool metrics it would give the following schema:
> VIRTUAL TABLE threadpool_metrics (
>    pool_name TEXT,
>    active INT,
>    pending INT,
>    completed BIGINT,
>    blocked BIGINT,
>    total_blocked BIGINT,
>    max_pool_size INT,
>    PRIMARY KEY( pool_name )
> )
> That approach provides an output similar to that of nodetool
> tpstats, which will be, in my opinion, more readable than the previous
> approach.
> Unfortunately, it also has several serious drawbacks:
>  - It works for a small set of metrics but does not work well for the
>  table or keyspace metrics, where we have more than 63 metrics. If you
>  split the histograms, meters and timers into multiple columns you easily
>  reach more than a hundred columns. As Chris pointed out in CASSANDRA-7622,
>  it makes the whole thing unusable.
>  - It also does not work properly for sets of metrics like the commit log
>  metrics, because you cannot get a natural primary key and will have to
>  somehow create a fake one.
> Nodetool solved the table and keyspace metric problems by splitting them
> into subsets (e.g. tablestats, tablehistograms). We could take a similar
> approach and group metrics in meaningful sub-groups and expose them using
> the second approach.
> I tried to put myself in the shoes of a user who has limited knowledge
> of the C* metrics, but at the end of the day I am certainly not the best
> person to figure out what the best solution is here. So I would like to
> have your feedback on that problem.
> Chris, if I was wrong on some part or forgot something, feel free to
> correct me.
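
The "extra logic" concern raised in the quoted message about table_stats can be sketched roughly as follows. This Python sketch assumes that a Gauge or Counter fills only the `value` column and that rates and percentiles distinguish meters, histograms and timers; that column-to-kind mapping and the helper names (populated, guess_kind) are assumptions for illustration, not something the proposal specifies.

```python
# Statistic columns from the proposed table_stats schema.
STAT_COLUMNS = (
    "value", "fifteen_min_rate", "five_min_rate", "mean_rate", "one_min_rate",
    "p75th", "p95th", "p99th", "p999th", "min", "max", "mean", "std_dev",
    "median", "count",
)


def populated(row):
    """Return only the statistic columns that are not NULL (None) in a row."""
    return {c: row[c] for c in STAT_COLUMNS if row.get(c) is not None}


def guess_kind(row):
    """Guess the metric kind from which columns carry data.

    The mapping below is an assumption made for this sketch.
    """
    cols = set(populated(row))
    if cols == {"value"}:
        return "gauge_or_counter"
    has_rates = "one_min_rate" in cols
    has_percentiles = "p99th" in cols
    if has_rates and has_percentiles:
        return "timer"
    if has_percentiles:
        return "histogram"
    if has_rates:
        return "meter"
    return "unknown"


if __name__ == "__main__":
    # Hypothetical rows, as a driver might return them with NULLs as None.
    gauge_row = {"metric": "SomeGauge", "value": 4.0, "p99th": None}
    timer_row = {"metric": "SomeTimer", "one_min_rate": 1.5, "p99th": 20.0}
    print(guess_kind(gauge_row), guess_kind(timer_row))
```

A consumer of the metrics-as-columns tables would not need this step, since every column in a table such as threadpool_metrics has a fixed meaning.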