Commit Graph

163 Commits

Author SHA1 Message Date
Tomasz Grabiec
f2fdf37415 token_metadata: Add non-const getter of tablet_metadata
Needed for tests.
2023-07-25 21:08:51 +02:00
Tomasz Grabiec
e110167a2a locator: Store node shard count in topology
Will be needed by tablet allocator.
2023-06-21 00:58:25 +02:00
Petr Gusev
f6b019c229 raft topology: add fence_version
It's stored outside of topology table,
since it's updated not through RAFT, but
with a new 'fence' raft command.
The current value is cached in shared_token_metadata.
An initial fence version is loaded in main
during storage_service initialisation.
2023-06-15 15:48:00 +04:00
Petr Gusev
4f99302c2b raft_topology: add barrier_and_drain cmd
We use utils::phased_barrier. The new phase
is started each time the version is updated.
We track all instances of token_metadata,
when an instance is destroyed the
corresponding phased_barrier::operation is
released.
2023-06-15 15:48:00 +04:00
Petr Gusev
253d8a8c65 token_metadata: add topology version
It's stored in as a static column in topology table,
will be updated at various steps of the topology
change state machine.

The initial value is 1, zero means that topology
versions are not yet supported, will be
used in RPC handling.
2023-06-15 15:48:00 +04:00
Petr Gusev
5976277c2c token_metadata: drop has_pending_ranges and migration_info
Use the new erm::has_pending_ranges function, drop the
old implementation from token_metadata.
2023-05-21 13:17:42 +04:00
Petr Gusev
8cb709d3d6 token_metadata: drop update_pending_ranges
The function storage_service::update_pending_ranges is
turned to update_topology_changes_info.
The pending_endpoints and read_endpoints will be
computed later, when the erms are rebuilt.
2023-05-21 13:17:42 +04:00
Petr Gusev
87307781c4 effective_replication_map: use new get_pending_endpoints and get_endpoints_for_reading
We already use the new pending_endpoints from erm though
the get_pending_ranges virtual function, in this commit
we update all the remaining places to use the new
implementation in erm, as well as remove the old implementation
in token_metadata.
2023-05-21 13:17:42 +04:00
Petr Gusev
10bf8c7901 token_metadata: introduce topology_change_info
We plan to move pending_endpoints and read_endpoints, along
with their computation logic, from token_metadata to
vnode_effective_replication_map. The vnode_effective_replication_map
seems more appropriate for them since it contains functionally
similar _replication_map and we will be able to reuse
pending_endpoints/read_endpoints across keyspaces
sharing the same factory_key.

At present, pending_endpoints and read_endpoints are updated in the
update_pending_ranges function. The update logic comprises two
parts - preparing data common to all keyspaces/replication_strategies,
and calculating the migration_info for specific keyspaces. In this commit,
we introduce a new topology_change_info structure to hold the first
part's data add create an update_topology_change_info function to
update it. This structure will later be used in
vnode_effective_replication_map to compute pending_endpoints
and read_endpoints. This enables the reuse of topology_change_info
across all keyspaces, unlike the current update_pending_ranges
implementation, which is another benefit of this refactoring.

The update_topology_change_info implementation is mostly derived from
update_pending_ranges, there are a few differences though:
* replacing async and thread with plain co_awaits;
* adding a utils::clear_gently call for the previous value
to mitigate reactor stalls if target_token_metadata grows large;
* substituting immediately invoked lambdas with simple variables and
blocks to reduce noise, as lambdas would need to be converted into coroutines.

The original update_pending_ranges remains unchanged, and will be
removed entirely upon transitioning to the new implementation.
Meanwhile, we add an update_topology_change_info call to
storage_service::update_pending_ranges so that we can
iteratively switch the system to the new implementation.
2023-05-19 19:04:43 +04:00
Petr Gusev
51e80691ef token_metadata: replace set_topology_transition_state with set_read_new
This helps isolate topology::transition_state dependencies,
token_metadata doesn't need the entire enum, just  this
boolean flag.
2023-05-19 19:04:43 +04:00
Petr Gusev
0e4e2df657 token_metadata: add endpoints for reading
In this patch we add
token_metadata::set_topology_transition_state method.
If the current state is
write_both_read_new update_pending_ranges
will compute new ranges for read requests. The default value
of topology_transition_state is null, meaning no read
ranges are computed. We will add the appropriate
set_topology_transition_state calls later.

Also, we add endpoints_for_reading method to get
read endpoints based on the computed ranges.
2023-05-09 18:41:59 +04:00
Petr Gusev
56c2b3e893 token_metadata_impl: refactor update_pending_ranges
Now update_pending_ranges is quite complex, mainly
because it tries to act efficiently and update only
the affected intervals. However, it uses the function
abstract_replication_strategy::get_ranges, which calls
calculate_natural_endpoints for every token
in the ring anyway.

Our goal is to start reading from the new replicas for
ranges in write_both_read_new state. In the current
code structure this is quite difficult to do, so
in this commit we first simplify update_pending_ranges.

The main idea of the refactoring is to build a new version
of token_metadata based on all planned changes
(join, bootstrap, replace) and then for each token
range compare the result of calculate_natural_endpoints on
the old token_metadata and on the new one.
Those endpoints that are in the new version and
are not in the old version should be added to the pending_ranges.

The add_mapping function is extracted for the
future - we are going to use it to handle read mappings.

Special care is taken when replacing with the same IP.
The coordinator employs the
get_natural_endpoints_without_node_being_replaced function,
which excludes such endpoints from its result. If we compare
the new (merged) and current token_metadata configurations, such
endpoints will also be absent from pending_endpoints since
they exist in both. To address this, we copy the current
token_metadata and remove these endpoints prior to comparison.
This ensures that nodes being replaced are treated
like those being deleted.
2023-05-09 13:56:28 +04:00
Tomasz Grabiec
e6b76ac4b9 dht: token_metadata: Introduce get_my_id() 2023-04-24 10:49:37 +02:00
Tomasz Grabiec
fceb5f8cf6 locator: Introduce tablet_metadata
token_metadata now stores tablet metadata with information about
tablets in the system.
2023-04-24 10:49:37 +02:00
Tomasz Grabiec
34a9c62ae5 locator: token_metadata: Fix confusing comment on ring_range()
It could be interpreted to mean that the search token is excluded.
2023-04-24 10:49:36 +02:00
Tomasz Grabiec
e4865bd4d1 dht, storage_proxy: Abstract token space splitting
Currently, scans are splitting partition ranges around tokens. This
will have to change with tablets, where we should split at tablet
boundaries.

This patch introduces token_range_splitter which abstracts this
task. It is provided by effective_replication_map implementation.
2023-04-24 10:49:36 +02:00
Benny Halevy
e635aa30d6 token_metadata: get endpoint to node map from topology
Don't maintain a "shadow" endpoint_to_host_id_map in token_metadata_impl.

Instead, get the nodes_by_endpoint map from topology
and use it to build the endpoint_to_host_id_map.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-11 15:48:30 +03:00
Benny Halevy
c17df1759e topology: add node state
Add a simple node state model with:
`joining`, `normal`, `leaving`, and `left` states
to help managing nodes during replace
with the the same ip address.

Later on, this could also help prevent nodes
that were decommissioned, removed, or replaced
from rejoining the cluster.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-02 20:18:31 +03:00
Benny Halevy
f3d5df5448 locator: add class node
And keep per node information (idx, host_id, endpoint, dc_rack, is_pending)
in node objects, indexed by topology on several indices like:
idx, host_id, endpoint, current/pending, per dc, per dc/rack.

The node index is a shorthand identifier for the node.

node* and index are valid while the respective topology instance is valid.
To be used, the caller must hold on to the topology / token_metadata object
(e.g. via a token_metadata_ptr or effective_replication_map)

Refs #6403

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

topology: add node idx

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-02 20:13:02 +03:00
Benny Halevy
fd1a2591b5 shared_token_metadata: mutate_token_metadata: replicate to all shards
storage_service::replicate_to_all_cores has a sophisticated way
to mutate the token_metadata and effective_replication_map
on shard 0 and cloning those to all other shards, applying
the changes only mutate and clone succeeded on all shards
so we don't end up with only some of the shards with the mutated
copy if an error happend mid-way (and then we would need to
roll-back the change for exception safety).

shared_token_metadata::mutate_token_metadata is currently only called from
a unit test that needs to mutate the token metadata only on shard 0,
but a following patch will require doing that on all shards.

This change adds this capbility by enforcing the call to be
on shard 0m mutating the token_metdata into a temporary pending copy
and cloning it on all other shards.  Only then, when all shard
succeeded, set the modified token_metadata on all shards.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-02 20:07:17 +03:00
Benny Halevy
68141d0aac topology: get rid of pending state
Now, with a44ca06906,
is_normal_token_owner that replaced is_member
does not rely anymore on the pending status
of endpoints in topology.

With that we can get rid of this state and just keep
all endpoints we know about in the topology.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-12-13 14:17:18 +02:00
Asias He
4571fcf9e7 token_metadata: Rename is_member to is_normal_token_owner
The name is_normal_token_owner is more clear than is_member.
The is_normal_token_owner reflects what it really checks.
2022-11-18 09:29:20 +08:00
Asias He
965097cde5 token_metadata: Add docs for is_member
Make it clear, is_member checks if a node is part of the token ring and
checks nothing else.
2022-11-18 09:28:56 +08:00
Avi Kivity
76be6402ed Merge 'repair: harden effective replication map' from Benny Halevy
As described in #11993 per-shard repair_info instances get the effective_replication_map on their own with no centralized synchronization.

This series ensures that the effective replication maps used by repair (and other associated structures like the token metadata and topology) are all in sync with the one used to initiate the repair operation.

While at at, the series includes other cleanups in this area in repair and view that are not fixes as the calls happen in synchronous functions that do not yield.

Fixes #11993

Closes #11994

* github.com:scylladb/scylladb:
  repair: pass erm down to get_hosts_participating_in_repair and get_neighbors
  repair: pass effective_replication_map down to repair_info
  repair: coroutinize sync_data_using_repair
  repair: futurize do_repair_start
  effective_replication_map: add global_effective_replication_map
  shared_token_metadata: get_lock is const
  repair: sync_data_using_repair: require to run on shard 0
  repair: require all node operations to be called on shard 0
  repair: repair_info: keep effective_replication_map
  repair: do_repair_start: use keyspace erm to get keyspace local ranges
  repair: do_repair_start: use keyspace erm for get_primary_ranges
  repair: do_repair_start: use keyspace erm for get_primary_ranges_within_dc
  repair: do_repair_start: check_in_shutdown first
  repair: get_db().local() where needed
  repair: get topology from erm/token_metdata_ptr
  view: get_view_natural_endpoint: get topology from erm
2022-11-17 13:29:02 +02:00
Benny Halevy
2c677e294b shared_token_metadata: get_lock is const
The lock is acquired using an a function that
doesn't modify the shared_token_metadata object.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-11-17 07:58:21 +02:00
Benny Halevy
d0bd305d16 locator: refactor topology out of token_metadata
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-11-16 21:55:54 +02:00
Benny Halevy
297a4de4e4 locator: add types.hh
To export low-level types that are used by oher modules
for the locator interfaces.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-11-16 21:53:05 +02:00
Benny Halevy
0c94ffcc85 topology: delete copy constructor
Topology is copied only from token_metadata_impl::clone_only_token_map
which copies the token_metadata_impl with yielding to prevent reactor
stalls.  This should apply to topology as well, so
add a clone_gently function for cloning the topology
from token_metadata_impl::clone_only_token_map.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-11-16 15:27:28 +02:00
Benny Halevy
b74807cb8a locator: token_metadata: add parse_host_id_and_endpoint
To be used for specifying nodes either by their
host_id or ip address and using the token_metadata
to resolve the mapping.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-10-28 07:38:13 +03:00
Benny Halevy
340a5a0c94 api: storage_service: remove_node: validate host_id
The node to be removed must be identified by its host_id.
Validate that at the api layer and pass the parsed host_id
down to storage_service::removenode.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-10-28 07:38:13 +03:00
Pavel Emelyanov
b6061bb97d topology: Move all post-configuration to topology::config
Because of snitch ex-dependencies some bits on topology were initialized
with nasty post-start calls. Now it all can be removed and the initial
topology information can be provided by topology::config

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-10-11 05:18:31 +03:00
Pavel Emelyanov
b61bd6cf56 topology: Put local dc/rack on topology early
Startup code needs to know the dc/rack of the local node early, way
before nodes starts any communication with the ring. This information is
available when snitch activates, but it starts _after_ token-metadata,
so the only way to put local dc/rack in topology is via a startup-time
special API call. This new init_local_endpoint() is temporary and will
be removed later in this set

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-10-11 05:17:08 +03:00
Pavel Emelyanov
da75552e1f topology: Add pending locations collection
Nowadays the topology object only keeps info about nodes that are normal
members of the ring. Nodes that are joining or bootstrapping or leaving
are out of it. However, one of the goals of this patchset is to make
topology object provide dc/rack info for _all_ nodes, even those in
transitive state.

The introduced _pending_locations is about to hold the dc/rack info for
transitive endpoints. When a node becomes member of the ring it is moved
from pending (if it's there) to current locations, when it leaves the
ring it's moved back to pending.

For now the new collection is just added and the add/remove/get API is
extended to maintain it, but it's not really populated. It will come in
the next patch

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-10-11 05:17:08 +03:00
Pavel Emelyanov
d60ebc5ace token_metadata: Add config, spread everywhere
Next patches will need to provide some early-start data for topology.
The standard way of doing it is via service config, so this patch adds
one. The new config is empty in this patch, to be filled later

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-10-11 05:17:08 +03:00
Pavel Emelyanov
5edeecf39b token_metadata: Provide dc/rack for bootstrapping nodes
The token_metadata::calculate_pending_ranges_for_bootstrap() makes a
clone of itself and adds bootstrapping nodes to the clone to calculate
ranges. Currently added nodes lack the dc/rack which affects the
calculations the bad way.

Unfortunately, the dc/rack for those nodes is not available on topology
(yet) and needs pretty heavy patching to have. Fortunately, the only
caller of this method has gossiper at hand to provide the dc/rack from.

fixes: #11531

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #11596
2022-09-22 06:55:52 +03:00
Michał Chojnowski
47844689d8 token_metadata: make local_dc_filter a lambda, not a std::function
This std::function causes allocations, both on construction
and in other operations. This costs ~2200 instructions
for a DC-local query. Fix that.

Closes #11494
2022-09-09 18:05:46 +02:00
Pavel Emelyanov
42c9f35374 topology: Mark compare_endpoints() arguments as const
Continuation to debfcc0e (snitch: Move sort_by_proximity() to topology).
The passed addresses are not modified by the helper. They are not yet
const because the method was copy-n-pasted from snitch where it wasn't
such.

tests: unit(dev)

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20220906074708.29574-1-xemul@scylladb.com>
2022-09-06 11:03:13 +03:00
Avi Kivity
ae4b2ee583 locator: token_metadata: drop unused and dangerous accessors
The mutable get_datacenter_endpoints() and get_datacenter_racks() are
dangerous since they expose internal members without enforcing class
invariants. Fortunately they are unused, so delete them.

Closes #11454
2022-09-06 06:08:02 +03:00
Pavel Emelyanov
debfcc0eff snitch: Move sort_by_proximity() to topology
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-09-05 15:17:04 +03:00
Pavel Emelyanov
41973c5bf7 topology: Add "enable proximity sorting" bit
There's one corner case in nodes sorting by snitch. The simple snitch
code overloads the call and doesn't sort anything. The same behavior
should be preserved by (future) topology implementation, but it doesn't
know the snitch name. To address that the patch adds a boolean switch on
topology that's turned off by main code when it sees the snitch is
"simple" one.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-09-05 15:15:07 +03:00
Pavel Emelyanov
b6fdea9a79 code: Call sort_endpoints_by_proximity() via topology
The method is about to be moved from snitch to topology, this patch
prepares the rest of the code to use the latter to call it. The
topology's method just calls snitch, but it's going to change in the
next patch.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-09-05 15:14:01 +03:00
Pavel Emelyanov
c043f6fa96 topology: Some renames after previous patch
The topology::update_endpoint() is now a plain wrapper over private
::add_endpoint() method of the same class. It's simpler to merge them

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-08-26 09:46:26 +03:00
Pavel Emelyanov
5fc9854eae topology: Make update_endpoint() accept dc-rack info
The method in question populates topology's internal maps with endpoint
vs dc/rack relations. As for today the dc/rack values are taken from the
global snitch object (which, in turn, goes to gossiper, system keyspace
and its internal non-updateable cache for that).

This patch prepares the ground for providing the dc/rack externally via
argument. By now it's just and argument with empty strings, but next
patches will populate it with real values (spoiler: in 99% it's storage
service that calls this method and each call will know where to get it
from for sure)

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-08-26 09:41:09 +03:00
Pavel Emelyanov
056d21c050 token_metadata: Remove batch tokens updating method
No users left.

The endpoint_tokens.empty() check is removed, only tests could trigger
it, but they didn't and are patched out.

Indentation is left broken

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-08-24 08:24:21 +03:00
Benny Halevy
d295d8e280 everywhere: define locator::host_id as a strong tagged_uuid type
So it can be distinguished from other uuid-based
identifiers in the system.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #11276
2022-08-12 06:01:44 +03:00
Botond Dénes
fbbe2529c1 Merge "Remove global snitch usage from consistency_level.cc" from Pavel Emelyanov
"
There are several helpers in this .cc file that need to get datacenter
for endpoints. For it they use global snitch, because there's no other
place out there to get that data from.

The whole dc/rack info is now moving to topology, so this set patches
the consistency_level.cc to get the topology. This is done two ways.
First, the helpers that have keyspace at hand may get the topology via
ks's effective_replication_map.

Two difficult cases are db::is_local() and db.count_local_endpoints()
because both have just inet_address at hand. Those are patched to be
methods of topology itself and all their callers already mess with
token metadata and can get topology from it.
"

* 'br-consistency-level-over-topology' of https://github.com/xemul/scylla:
  consistency_level: Remove is_local() and count_local_endpoints()
  storage_proxy: Use topology::local_endpoints_count()
  storage_proxy: Use proxy's topology for DC checks
  storage_proxy: Keep shared_ptr<proxy> on digest_read_resolver
  storage_proxy: Use topology local_dc_filter in its methods
  storage_proxy: Mark some digest_read_resolver methods private
  forwarding_service: Use topology local_dc_filter
  storage_service: Use topology local_dc_filter
  consistency_level: Use topology local_dc_filter
  consitency-level: Call count_local_endpoints from topology
  consistency_level: Get datacenter from topology
  replication_strategy: Remove hold snitch reference
  effective_replication_map: Get datacenter from topology
  topology: Add local-dc detection shugar
2022-08-05 13:31:55 +03:00
Benny Halevy
4f8ccef2c1 token_metadata: get_all_endpoints: return const unordered_set<inet_address>&
There's no need to transform it into a vector.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-02 10:49:08 +03:00
Pavel Emelyanov
ee0828b506 topology: Add local-dc detection shugar
It's often needed to check if an endpoint sits in the same DC as the
current node. It can be done by

    topo.get_datacenter() == topo.get_datacenter(endpoint)

but in some cases a RAII filter function can be helpful.

Also there's a db::count_local_endpoints() that is surprisingly in use,
so add it to topology as well. Next patches will make use of both.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-07-30 17:58:45 +03:00
Benny Halevy
cf47db2bdb token_metadata: document that update_normal_tokens is unsafe
Currently, if token_metadata_impl::update_normal_tokens
throws an exception before it's done, it leaves the
token_metadata_impl members partially updated
and we have no way of recovering from that.

The existing use cases take that into account
and always call it on a cloned, temporary copy of the token
metadata, so if it throws, the temporary copy is tossed away
without being applied back.

So just cement this, by adding cautions in the token_metadata
class declaration.

Closes #11127

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20220728144821.130518-1-bhalevy@scylladb.com>
2022-07-29 05:38:56 +03:00
Pavel Emelyanov
b6f7c8da8b topology: Add get_rack/_datacenter methods
For now they just forward the request to snitch. Once topology is
properly updated boot-time dc/rack info and knows internal IP
it will be able to serve request on its own.

For convenience overloads without arguments return dc/rack for
current node.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-06-22 11:47:26 +03:00