Commit Graph

673 Commits

Author SHA1 Message Date
Pavel Emelyanov
47958a4b37 storage_service: Re-gossiping snitch data in reconfiguration callback
Nowadays it's done inside snitch, and snitch needs to carry gossiper
refernece for that. There's an ongoing effort in de-globalizing snitch
and fixing its dependencies. This patch cuts this snitch->gossiper link
to facilitate the mentioned effort.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-09-23 14:31:55 +03:00
Pavel Emelyanov
9e7407ff91 replication_strategy: Construct temp tokens in place
Otherwise, the token_metadata object is default-initialized, then it's
move-assigned from another object.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-09-22 19:19:32 +03:00
Pavel Emelyanov
d540af2cb0 topology: Define copy-sonctructor with init-lists
Otherwise the topology is default-constructed, then its fields
are copy-assigned with the data from the copy-from reference.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-09-22 19:18:58 +03:00
Pavel Emelyanov
5edeecf39b token_metadata: Provide dc/rack for bootstrapping nodes
The token_metadata::calculate_pending_ranges_for_bootstrap() makes a
clone of itself and adds bootstrapping nodes to the clone to calculate
ranges. Currently added nodes lack the dc/rack which affects the
calculations the bad way.

Unfortunately, the dc/rack for those nodes is not available on topology
(yet) and needs pretty heavy patching to have. Fortunately, the only
caller of this method has gossiper at hand to provide the dc/rack from.

fixes: #11531

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #11596
2022-09-22 06:55:52 +03:00
Michał Chojnowski
47844689d8 token_metadata: make local_dc_filter a lambda, not a std::function
This std::function causes allocations, both on construction
and in other operations. This costs ~2200 instructions
for a DC-local query. Fix that.

Closes #11494
2022-09-09 18:05:46 +02:00
Pavel Emelyanov
42c9f35374 topology: Mark compare_endpoints() arguments as const
Continuation to debfcc0e (snitch: Move sort_by_proximity() to topology).
The passed addresses are not modified by the helper. They are not yet
const because the method was copy-n-pasted from snitch where it wasn't
such.

tests: unit(dev)

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20220906074708.29574-1-xemul@scylladb.com>
2022-09-06 11:03:13 +03:00
Avi Kivity
ae4b2ee583 locator: token_metadata: drop unused and dangerous accessors
The mutable get_datacenter_endpoints() and get_datacenter_racks() are
dangerous since they expose internal members without enforcing class
invariants. Fortunately they are unused, so delete them.

Closes #11454
2022-09-06 06:08:02 +03:00
Pavel Emelyanov
debfcc0eff snitch: Move sort_by_proximity() to topology
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-09-05 15:17:04 +03:00
Pavel Emelyanov
41973c5bf7 topology: Add "enable proximity sorting" bit
There's one corner case in nodes sorting by snitch. The simple snitch
code overloads the call and doesn't sort anything. The same behavior
should be preserved by (future) topology implementation, but it doesn't
know the snitch name. To address that the patch adds a boolean switch on
topology that's turned off by main code when it sees the snitch is
"simple" one.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-09-05 15:15:07 +03:00
Pavel Emelyanov
b6fdea9a79 code: Call sort_endpoints_by_proximity() via topology
The method is about to be moved from snitch to topology, this patch
prepares the rest of the code to use the latter to call it. The
topology's method just calls snitch, but it's going to change in the
next patch.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-09-05 15:14:01 +03:00
Pavel Emelyanov
4184091f1c snitch, code: Remove get_sorted_list_by_proximity()
There are two sorting methods in snitch -- one sorts the list of
addresses in place, the other one creates a sorted copy of the passed
const list (in fact -- the passed reference is not const, but it's not
modified by the method). However, both callers of the latter anyway
create their own temporary list of address, so they don't really benefit
from snitch generating another copy.

So this patch leaves just one sorting method -- the in-place one.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-09-05 15:11:37 +03:00
Pavel Emelyanov
642e50f3e3 snitch: Move is_worth_merging_for_range_query to proxy
Proxy is the only place that calls this method. Also the method name
suggests it's not something "generic", but rather an internal logic of
proxy's query processing.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-09-05 15:10:46 +03:00
Avi Kivity
61769d3b21 Merge "Make messaging service use topology for DC/RACK" from Pavel E
"
Messaging needs to know DC/RACK for nodes to decide whether it needs to
do encryption or compression depending on the options. As all the other
services did it still uses snitch to get it, but simple switch to use
topology needs extra care.

The thing is that messaging can use internal IP instead of endpoints.
Currently it's snitch who tries har^w somehow to resolve this, in
particular -- if the DC/RACK is not found for the given argument it
assumes that it might be internal IP and calls back messaging to convert
it to the endpoint. However, messaging does know when it uses which
address and can do this conversion itself.

So this set eliminates few more global snitch usages and drops the
knot tieing snitch, gossiper and messaging with each-other.
"

* 'br-messaging-use-topology-1.2' of https://github.com/xemul/scylla:
  messaging: Get DC/RACK from topology
  messaging, topology: Keep shared_token_metadata* on messaging
  messaging: Add is_same_{dc|rack} helpers
  snitch, messaging: Dont relookup dc/rack on internal IP
2022-09-04 13:54:34 +03:00
Pavel Emelyanov
6dedc69608 topology: Do not add bootstrapping nodes to topology
Recent change in topology (commit 4cbe6ee9 titled
"topology: Require entry in the map for update_normal_tokens()")
made token_metadata::update_normal_tokens() require the entry presense
in the embedded topology object. Respectively, the commit in question
equipped most callers of update_normal_tokens() with preceeding
topology update call to satisfy the requirement.

However, tokens are put into token_metadata not only for normal state,
but also for bootstrapping, and one place that added bootstrapping
tokens errorneously got topology update. This is wrong -- node must
not be present in the topology until switching into normal state. As
the result several tests with bootstrapping nodes started to fail.

The fix removes topology update for bootstrapping nodes, but this
change reveals few other places that piggy-backed this mistaken
update, so noy _they_ need to update topology themselves.

tests: https://jenkins.scylladb.com/job/releng/job/Scylla-CI/2040/
       update_cluster_layout_tests.py::test_simple_add_new_node_while_schema_changes_with_repair
       update_cluster_layout_tests.py::test_simple_kill_new_node_while_bootstrapping_with_parallel_writes_in_multidc
       repair_based_node_operations_test.py::test_lcs_reshape_efficiency

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20220902082753.17827-1-xemul@scylladb.com>
2022-09-04 13:53:38 +03:00
Pavel Emelyanov
c08c370c2c snitch, messaging: Dont relookup dc/rack on internal IP
When getting dc/rack snitch may perform two lookups -- first time it
does it using the provided IP, if nothing is found snitch assumes that
the IP is internal one, gets the corresponding public one and searches
again.

The thing is that the only code that may come to snitch with internal
IP is the messaging service. It does so in two places: when it tries
to connect to the given endpoing and when it accepts a connection.

In the former case messaging performs public->internal IP conversion
itself and goes to snitch with the internal IP value. This place can get
simpler by just feeding the public IP to snich, and converting it to the
internal only to initiate the connection.

In the latter case the accepted IP can be either, but messaging service
has the public<->private map onboard and can do the conversion itself.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-09-01 11:32:34 +03:00
Pavel Emelyanov
6405aba748 toplogy: Use the provided dc/rack info
Previous patches made all the callers of topology.update_endpoint()
(via token_metadata.update_topology()) provide correct dc/rack info
for the endpoint. It's now possible to stop using global snitch by
topology and just rely on the dc/rack argument.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-08-26 10:02:00 +03:00
Pavel Emelyanov
c043f6fa96 topology: Some renames after previous patch
The topology::update_endpoint() is now a plain wrapper over private
::add_endpoint() method of the same class. It's simpler to merge them

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-08-26 09:46:26 +03:00
Pavel Emelyanov
4cbe6ee9f4 topology: Require entry in the map for update_normal_tokens()
The method in question tries to be on the safest side and adds the
enpoint for which it updates the tokens into the topology. From now on
it's up to the caller to put the endpoint into topology in advance.

So most of what this patch does is places topology.update_endpoint()
into the relevant places of the code.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-08-26 09:44:08 +03:00
Pavel Emelyanov
5fc9854eae topology: Make update_endpoint() accept dc-rack info
The method in question populates topology's internal maps with endpoint
vs dc/rack relations. As for today the dc/rack values are taken from the
global snitch object (which, in turn, goes to gossiper, system keyspace
and its internal non-updateable cache for that).

This patch prepares the ground for providing the dc/rack externally via
argument. By now it's just and argument with empty strings, but next
patches will populate it with real values (spoiler: in 99% it's storage
service that calls this method and each call will know where to get it
from for sure)

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-08-26 09:41:09 +03:00
Pavel Emelyanov
7305061674 replication_strategy: Accept dc-rack as get_pending_address_ranges argument
The method creates a copy of token metadata and pushes an endpoint (with
some tokens) into it. Next patches will require providing dc/rack info
together with the endpoint, this patch prepares for that.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-08-26 09:39:44 +03:00
Avi Kivity
df87949241 Merge "Remove batch tokens update helper" from Pavel E
"
On token_metadata there are two update_normal_tokens() overloads --
one updates tokens for a single endpoint, another one -- for a set
(well -- std::map) of them. Other than updating the tokens both
methods also may add an endpoint to the t.m.'s topology object.

There's an ongoing effort in moving the dc/rack information from
snitch to topology, and one of the changes made in it is -- when
adding an entry to topology, the dc/rack info should be provided
by the caller (which is in 99% of the cases is the storage service).
The batched tokens update is extremely unfriendly to the latter
change. Fortunately, this helper is only used by tests, the core
code always uses fine-grained tokens updating.
"

* 'br-tokens-update-relax' of https://github.com/xemul/scylla:
  token_metadata: Indentation fix after prevuous patch
  token_metadata: Remove excessive empty tokens check
  token_metadata: Remove batch tokens updating method
  tests: Use one-by-one tokens updating method
2022-08-25 12:01:58 +02:00
Pavel Emelyanov
d8c5044eee token_metadata: Indentation fix after prevuous patch
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-08-24 08:24:21 +03:00
Pavel Emelyanov
8238c38e9f token_metadata: Remove excessive empty tokens check
After the previous patch empty passed tokens make the helper co_return
early, so this if is the dead code

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-08-24 08:24:21 +03:00
Pavel Emelyanov
056d21c050 token_metadata: Remove batch tokens updating method
No users left.

The endpoint_tokens.empty() check is removed, only tests could trigger
it, but they didn't and are patched out.

Indentation is left broken

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-08-24 08:24:21 +03:00
Pavel Emelyanov
18fa5038b1 replication_strategy: Remove unused method
The get_pending_address_ranges() accepting a single token is not in use,
its peer that accepts a set of tokens is

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #11358
2022-08-23 20:23:50 +02:00
Benny Halevy
d295d8e280 everywhere: define locator::host_id as a strong tagged_uuid type
So it can be distinguished from other uuid-based
identifiers in the system.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #11276
2022-08-12 06:01:44 +03:00
Benny Halevy
9167b857e9 abstract_replication_strategy: calculate_effective_replication_map: optimize for static replication strategies
For replication strategies like "everywhere"
and "local" that return the same set of endpoints
for all tokens, we can call rs->calculate_natural_endpoints
one once and reuse the result for all token.

Note that ideally the replication_map could contain only
a single token range for this case, but that does't seem to work yet.

Add maybe_yield() calls to the tight loop
to prevent reactor stalls on large clusters when copying
a long vector returned by everywhere_replication_strategy
to potentially 1000's of tokens in the map.

Nicholas Peshek wrote in
https://github.com/scylladb/scylladb/issues/10337#issuecomment-1211152370

about similar patch by Geoffrey Beausire:
994c6ecf3c

> Yep. That dropped our startup from 3000+ seconds to about 40.

Fixes #10337

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-11 10:35:29 +03:00
Benny Halevy
eb678e723b abstract_replication_strategy: add has_uniform_natural_endpoints
So that using calaculate_natural_endpoints can be optimized
for strategies that return the same endpoints for all tokens,
namely everywhere_replication_strategy and local_strategy.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-11 10:34:14 +03:00
Benny Halevy
91ab8ee1c3 effective_replication_map: make get_range_addresses asynchronous
So it may yield, preenting reactor stalls as seen in #11005.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-08 17:31:01 +03:00
Benny Halevy
e541009f65 effective_replication_map: add get_replication_strategy
And use it in storage_service::get_changed_ranges_for_leaving.

A following patch will pass the e_r_m to
storage_service::get_changed_ranges_for_leaving, rather than
getting it there.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-08 17:31:00 +03:00
Benny Halevy
6794e15163 effective_replication_map: get_range_addresses: use the precalculated replication_map
There is no need to call get_natural_endpoints for every token
in sorted_tokens order, since we can just get the precalculated
per-token endpoints already in the _replication_map member.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-08 17:31:00 +03:00
Benny Halevy
1d4aea4441 abstract_replication_strategy: get_pending_address_ranges: prevent extra vector copies
Reduce large allocations and reactor stalls seen in #11005
by open coding `get_address_ranges` and using std::vector::insert
to efficiently appending the ranges returned by `get_primary_ranges_for`
onto the returned token_range_vector in contrast to building
an unordered_multimap<inet_address, dht::token_range> first in
`get_address_ranges` and traversing it and adding one token_range
at a time.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-08 17:31:00 +03:00
Benny Halevy
7811b0d0aa abstract_replication_strategy: reindent 2022-08-08 17:31:00 +03:00
Benny Halevy
ebe1edc091 utils: sequenced_set: expose set and contains method
And use that in sights using the endpoint set
returned by abstract_replication_strategy::calculate_natural_endpoints.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-08 17:31:00 +03:00
Benny Halevy
7017ad6822 abstract_replication_strategy: calculate_natural_endpoints: return endpoint_set
So it could be used also for easily searching for an endpoint.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-08 17:31:00 +03:00
Botond Dénes
fbbe2529c1 Merge "Remove global snitch usage from consistency_level.cc" from Pavel Emelyanov
"
There are several helpers in this .cc file that need to get datacenter
for endpoints. For it they use global snitch, because there's no other
place out there to get that data from.

The whole dc/rack info is now moving to topology, so this set patches
the consistency_level.cc to get the topology. This is done two ways.
First, the helpers that have keyspace at hand may get the topology via
ks's effective_replication_map.

Two difficult cases are db::is_local() and db.count_local_endpoints()
because both have just inet_address at hand. Those are patched to be
methods of topology itself and all their callers already mess with
token metadata and can get topology from it.
"

* 'br-consistency-level-over-topology' of https://github.com/xemul/scylla:
  consistency_level: Remove is_local() and count_local_endpoints()
  storage_proxy: Use topology::local_endpoints_count()
  storage_proxy: Use proxy's topology for DC checks
  storage_proxy: Keep shared_ptr<proxy> on digest_read_resolver
  storage_proxy: Use topology local_dc_filter in its methods
  storage_proxy: Mark some digest_read_resolver methods private
  forwarding_service: Use topology local_dc_filter
  storage_service: Use topology local_dc_filter
  consistency_level: Use topology local_dc_filter
  consitency-level: Call count_local_endpoints from topology
  consistency_level: Get datacenter from topology
  replication_strategy: Remove hold snitch reference
  effective_replication_map: Get datacenter from topology
  topology: Add local-dc detection shugar
2022-08-05 13:31:55 +03:00
Pavel Emelyanov
f84ee8f0fb consistency_level: Get datacenter from topology
In some of db/consistency_level.cc helpers the topology can be
obtained from keyspace's effective replication map

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-08-05 12:19:47 +03:00
Pavel Emelyanov
00f166809e replication_strategy: Remove hold snitch reference
When the strategy is constructed there's no place to get snitch from
so the global instance is used. However, after previous patch the
replication strategy no longer needs snitch, so this dependency can
be dropped

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-08-05 12:19:43 +03:00
Pavel Emelyanov
298213f27f effective_replication_map: Get datacenter from topology
Now it gets it from snitch, but the dc/rack info is being relocated
onto topology. The topology is in turn already there

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-08-05 12:19:31 +03:00
Botond Dénes
df203a48af Merge "Remove reconnectable_snitch_helper" from Pavel Emelyanov
"
The helper is in charge of receiving INTERNAL_IP app state from
gossiper join/change notifications, updating system.peers with it
and kicking messaging service to update its preferred ip cache
along with initiating clients reconnection.

Effectively this helper duplicates the topology tracking code in
storage-service notifiers. Removing it makes less code and drops
a bunch of unwanted cross-components dependencies, in particular:

- one qctx call is gone
- snitch (almost) no longer needs to get messaging from gossiper
- public:private IP cache becomes local to messaging and can be
  moved to topology at low cost

Some nice minor side effect -- this helper was left unsubscribed
from gossiper on stop and snitch rename. Now its all gone.
"

* 'br-remove-reconnectible-snitch-helper-2' of https://github.com/xemul/scylla:
  snitch: Remove reconnectable snitch helper
  snitch, storage_service: Move reconnect to internal_ip kick
  snitch, storage_service: Move system.peers preferred_ip update
  snitch: Export prefer-local
2022-08-04 13:06:05 +03:00
Benny Halevy
0dfd92d0b3 token_metadata: allow update_normal_token_owners to yield
Given #11146, we see a 10ms stall when calculate_natural_endpoints
calls get_all_endpoints that up until this patch performed a
similar loop on the `_token_to_endpoint_map`, so to prevent such
a stall with large number of tokens, turn update_normal_token_owners
async, and allow yielding in the per-token tight loop.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-02 10:49:32 +03:00
Benny Halevy
4f8ccef2c1 token_metadata: get_all_endpoints: return const unordered_set<inet_address>&
There's no need to transform it into a vector.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-02 10:49:08 +03:00
Benny Halevy
a980f94d85 token_metadata: impl: keep the set of normal token owners as a member
We don't need to recalculate the unique set of normal token
everytime we change `_token_to_endpoint_map`.
Similarly, this doesn't have to be done in `get_all_endpoints`.

Instead we can maintain it inexpensively in
`remove_endpoint`, and let `count_normal_token_owners`
just return its size and `get_all_endpoints` just return
the saved set.

Note that currently topology is not updated accurately
in update_normal_token() and it may contain endpoint
that do no longer own any tokens.

If we did update topology accurately there, we
could use its locations map instead as its keys are equivalent
to the unordered_set<inet_address> we implement here.

Closes #11128

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-02 10:49:07 +03:00
Pavel Emelyanov
ee0828b506 topology: Add local-dc detection shugar
It's often needed to check if an endpoint sits in the same DC as the
current node. It can be done by

    topo.get_datacenter() == topo.get_datacenter(endpoint)

but in some cases a RAII filter function can be helpful.

Also there's a db::count_local_endpoints() that is surprisingly in use,
so add it to topology as well. Next patches will make use of both.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-07-30 17:58:45 +03:00
Benny Halevy
cf47db2bdb token_metadata: document that update_normal_tokens is unsafe
Currently, if token_metadata_impl::update_normal_tokens
throws an exception before it's done, it leaves the
token_metadata_impl members partially updated
and we have no way of recovering from that.

The existing use cases take that into account
and always call it on a cloned, temporary copy of the token
metadata, so if it throws, the temporary copy is tossed away
without being applied back.

So just cement this, by adding cautions in the token_metadata
class declaration.

Closes #11127

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20220728144821.130518-1-bhalevy@scylladb.com>
2022-07-29 05:38:56 +03:00
Asias He
6152f5b858 locator: Speed up abstract_replication_strategy::get_address_ranges
To get the list of tokens for a given node, we loop through all the
tokens and calculate the nodes that are responsible for the token.

In case of the everywhere_topology, we know any node that is part of the
the ring will be responsible for all tokens.

This patch adds a fast path for everywhere_topology to avoid calculating
natural endpoints.

Refs #10337
Refs #10817
Refs #10836
Refs #10837
2022-07-26 18:53:09 +08:00
Asias He
9a8a80527b locator: Speed up simple_strategy::calculate_natural_endpoint
If the number of nodes in the cluster is smaller than the desired
replication factor we should return the loop when endpoints already
contains all the nodes in the cluster because no more nodes could be
added to endpoints lists

Refs #10337
Refs #10817
Refs #10836
Refs #10837
2022-07-26 18:53:09 +08:00
Asias He
4c714dfe3b token_metadata: Speed up count_normal_token_owners
Currently, a set of nodes is built from _token_to_endpoint_map to get
the number of nodes in _token_to_endpoint_map.

To make it faster so we can call it on a fast path in the following
patch, a _nr_normal_token_owners member is introduced to track the
number.

Refs #10337
Refs #10817
Refs #10836
Refs #10837
2022-07-26 18:53:09 +08:00
Pavel Emelyanov
40d6ea973c snitch: Remove reconnectable snitch helper
It's now no-op

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-07-26 13:51:05 +03:00
Pavel Emelyanov
b91f7e9ec4 snitch, storage_service: Move reconnect to internal_ip kick
The same thing as in previous patch -- when gossiper issues
on_join/_change notification, storage service can kick messaging
service to update its internal_ip cache and reconnect to the peer.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-07-26 13:48:46 +03:00