messaging_service: Define metrics domain for client connections

A recent seastar update added RPC metrics (scylladb/seastar#1753). The
reported metrics group sockets together based on their "metrics_domain"
configuration option. This patch sets that domain for client connections
so that scylla's RPC metrics are grouped in a meaningful way.

The domain as this patch defines it includes two strings:

First, the datacenter the server lives in. Grouping metrics for
connections to different datacenters makes little sense for several
reasons. For example, packet delays _will_ differ for local-DC vs
cross-DC traffic, and mixing those latencies together is pointless.
Another example: the amount of traffic may also differ for local- vs
cross-DC connections, e.g. because of different usage of encryption
and/or compression.

Second, each verb-idx gets its own domain. This makes it possible to
tell e.g. query-related traffic apart from gossiper traffic. The
existing isolation cookie is used as is for that.

Note that the metrics are _not_ per server node. So e.g. two gossiper
connections to two different nodes (in one DC) will belong to the same
domain, and thus their stats will be summed when reported.
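
The grouping described above can be illustrated with a minimal,
standalone sketch. It is not the seastar or scylla implementation: the
socket_stats and connection types, the helper names and the sample
cookie/DC values used with them are all hypothetical. Only the domain
format (isolation cookie, then ":<datacenter>" when the DC is known)
follows this patch.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for the per-socket counters an RPC layer reports.
struct socket_stats {
    uint64_t bytes_sent = 0;
};

// Hypothetical description of one client connection.
struct connection {
    std::string peer;                      // remote node address
    std::string isolation_cookie;          // identifies the verb-idx
    std::optional<std::string> datacenter; // empty if topology is unknown
    socket_stats stats;
};

// Same shape as the domain built by this patch: the isolation cookie,
// followed by ":<datacenter>" only when the peer's DC is known.
std::string make_metrics_domain(const std::string& isolation_cookie,
                                const std::optional<std::string>& datacenter) {
    std::string domain = isolation_cookie;
    if (datacenter) {
        domain += ":" + *datacenter;
    }
    return domain;
}

// Sums stats of all connections that share a domain. The peer address is
// deliberately not part of the key, so two nodes in one DC are merged.
std::map<std::string, socket_stats>
aggregate_by_domain(const std::vector<connection>& conns) {
    std::map<std::string, socket_stats> result;
    for (const auto& c : conns) {
        auto& slot = result[make_metrics_domain(c.isolation_cookie, c.datacenter)];
        slot.bytes_sent += c.stats.bytes_sent;
    }
    return result;
}
```

With this sketch, two connections with the same cookie to different
nodes in the same DC end up under a single key, while a connection with
that cookie to another DC gets a key of its own.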

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#15785
Pavel Emelyanov
2023-10-20 18:23:59 +03:00
committed by Kamil Braun
parent efd65aebb2
commit 492b842929
2 changed files with 24 additions and 0 deletions

@@ -294,6 +294,27 @@ bool messaging_service::is_same_rack(inet_address addr) const {
return topo.get_rack(addr) == topo.get_rack();
}
// The socket metrics domain defines the way RPC metrics are grouped
// for different sockets. Thus, the domain includes:
//
// - Target datacenter name, because it's pointless to merge networking
//   stats for connections that are known in advance to have different
//   timings and rates
// - The verb-idx to tell different RPC channels from each other. For
//   that the isolation cookie suits very well, because these cookies
//   are different for different indices and are more informative than
//   plain numbers
sstring messaging_service::client_metrics_domain(unsigned idx, inet_address addr) const {
    sstring ret = _scheduling_info_for_connection_index[idx].isolation_cookie;
    if (_token_metadata) {
        const auto& topo = _token_metadata->get()->get_topology();
        if (topo.has_endpoint(addr)) {
            ret += ":" + topo.get_datacenter(addr);
        }
    }
    return ret;
}
future<> messaging_service::ban_host(locator::host_id id) {
return container().invoke_on_all([id] (messaging_service& ms) {
if (ms._banned_hosts.contains(id) || ms.is_shutting_down()) {
@@ -884,6 +905,7 @@ shared_ptr<messaging_service::rpc_protocol_client_wrapper> messaging_service::ge
opts.tcp_nodelay = must_tcp_nodelay;
opts.reuseaddr = true;
opts.isolation_cookie = _scheduling_info_for_connection_index[idx].isolation_cookie;
opts.metrics_domain = client_metrics_domain(idx, id.addr); // not just `addr` as the latter may be internal IP
assert(!must_encrypt || _credentials);

@@ -528,6 +528,8 @@ private:
bool is_host_banned(locator::host_id);
sstring client_metrics_domain(unsigned idx, inet_address addr) const;
public:
// Return rpc::protocol::client for a shard which is a ip + cpuid pair.
shared_ptr<rpc_protocol_client_wrapper> get_rpc_client(messaging_verb verb, msg_addr id);