docs/metrics.md: document additional "labels"

Recently we started to make more use of the concept of metric labels - several metrics which share the same name, but differ in the value of some label such as "group" (for different scheduling groups). This patch documents this feature in docs/metrics.md, gives the example of scheduling groups, and explains a couple more relevant Prometheus syntax tricks.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190909113803.15383-1-nyh@scylladb.com>
@@ -57,6 +57,31 @@ sum of all shards and all nodes.
 The "type" label should be ignored - it appears for historic reasons
 (it was used by collectd) and is planned to be removed in the future.
 
+## Additional metric labels
+In some cases, we have several metrics which measure the same thing but for
+different cases. For example, Scylla has about a dozen _scheduling groups_
+(see isolation.md), and we would like to get some statistics - e.g. the
+scheduler queue length - separately for each of these scheduling groups.
+
+One option is to have a dozen different metrics with different names, e.g.,
+`scylla_scheduler_queue_length_main`, `scylla_scheduler_queue_length_statement`
+for the two scheduling groups called "main" and "statement".
+
+However, there is a second option - which we chose in this case. The second
+option is to have just one metric name, and qualify it by a **label** with
+a value. In this case, we have one metric name `scylla_scheduler_queue_length`,
+and metrics on different scheduling groups differ by the `group` label:
+`scylla_scheduler_queue_length{group="main"}` and
+`scylla_scheduler_queue_length{group="statement"}`.
+
+Each metric reported by Scylla often has multiple labels, e.g.,
+```
+scylla_scheduler_queue_length{group="main",shard="0",type="gauge"} 0.000000
+```
+This metric has the `group` label, saying to which scheduling group this
+measurement pertains, and also `shard` and `type` labels which we described
+in the previous section.
+
 ## Per-table metrics
 Most of Scylla's metrics are global (in each shard). Scylla also supports
 per-table metrics, which are maintained separately for each table in the
@@ -179,8 +204,14 @@ a user asks to graph some metric `xyz` the result is a graph with multiple
 lines, one line for each shard and node. The syntax `xyz{instance="..."}`
 will limit the lines to all shards of just one node (given the node's IP
 address), and the syntax `xyz{instance="...",shard="0"}` will show only
-one shard of one node. The syntax `sum(xyz)` will plot just one line, with
-the total of the metric `xyz` over all shards in all nodes.
+one shard of one node. The syntax `xyz{group=~"memtable.*"}` will show
+only metrics where the `group` label matches the given regular expression.
+
+The syntax `sum(xyz)` will plot just one line, with the total of the metric
+`xyz` over all shards in all nodes. It's also possible to plot partial sums -
+for example `sum(xyz) by (group)` generates a separate sum (and plot line)
+for each value of the label `group`.
+
 The expression `irate(xyz[1m])` graphs the rate of change (i.e.,
 the derivative) of the metric `xyz`. In this last example, the "1m"
 selector is ignored by the `irate()` function, but some duration is required