scylla

Author	SHA1	Message	Date
Benny Halevy	b71f229fc2	topology: node: update_node: do not override internal changed flag by state option Currently, opt_st overrides the internal `changed` flag by setting it with the opt_st changed status. Instead, it should use `\|=` to keep it true if it is already so. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #13502	2023-04-13 17:46:59 +02:00
Benny Halevy	7b76369ffc	topology: add for_each_node To eventually replace token_metadata::get_endpoint_to_host_id_map_for_reading Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-11 15:55:39 +03:00
Benny Halevy	e635aa30d6	token_metadata: get endpoint to node map from topology Don't maintain a "shadow" endpoint_to_host_id_map in token_metadata_impl. Instead, get the nodes_by_endpoint map from topology and use it to build the endpoint_to_host_id_map. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-11 15:48:30 +03:00
Benny Halevy	c17df1759e	topology: add node state Add a simple node state model with: `joining`, `normal`, `leaving`, and `left` states to help managing nodes during replace with the the same ip address. Later on, this could also help prevent nodes that were decommissioned, removed, or replaced from rejoining the cluster. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-02 20:18:31 +03:00
Benny Halevy	027f188a97	topology: remove dead code Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-02 20:13:04 +03:00
Benny Halevy	f3d5df5448	locator: add class node And keep per node information (idx, host_id, endpoint, dc_rack, is_pending) in node objects, indexed by topology on several indices like: idx, host_id, endpoint, current/pending, per dc, per dc/rack. The node index is a shorthand identifier for the node. node* and index are valid while the respective topology instance is valid. To be used, the caller must hold on to the topology / token_metadata object (e.g. via a token_metadata_ptr or effective_replication_map) Refs #6403 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> topology: add node idx Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-02 20:13:02 +03:00
Benny Halevy	006e02410f	topology: rename update_endpoint to add_or_update_endpoint To reflect what it does, Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-02 20:08:03 +03:00
Benny Halevy	df1c92649e	topology: define get_{rack,datacenter} inline Define get_location() that gets the location for the local node, and use either this entry point or get_location(inet_address) to get the respective dc or rack. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-02 20:07:49 +03:00
Benny Halevy	fd1a2591b5	shared_token_metadata: mutate_token_metadata: replicate to all shards storage_service::replicate_to_all_cores has a sophisticated way to mutate the token_metadata and effective_replication_map on shard 0 and cloning those to all other shards, applying the changes only mutate and clone succeeded on all shards so we don't end up with only some of the shards with the mutated copy if an error happend mid-way (and then we would need to roll-back the change for exception safety). shared_token_metadata::mutate_token_metadata is currently only called from a unit test that needs to mutate the token metadata only on shard 0, but a following patch will require doing that on all shards. This change adds this capbility by enforcing the call to be on shard 0m mutating the token_metdata into a temporary pending copy and cloning it on all other shards. Only then, when all shard succeeded, set the modified token_metadata on all shards. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-02 20:07:17 +03:00
Benny Halevy	9cce01a12c	locator: endpoint_dc_rack: refactor default_location Refactor the thread_local default_location out of topology::get_location so it can be used elsewhere. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-02 20:06:53 +03:00
Benny Halevy	5ba5371631	locator: endpoint_dc_rack: define default operator== and get rid of the ad-hoc implementation in network_topology_strategy. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-02 20:06:52 +03:00
Avi Kivity	6aa91c13c5	Merge 'Optimize topology::compare_endpoints' from Benny Halevy The code for compare_endpoints originates at the dawn of time (`bc034aeaec`) and is called on the fast path from storage_proxy via `sort_by_proximity`. This series considerably reduces the function's footprint by: 1. carefully coding the many comparisons in the function so to reduce the number of conditional banches (apparently the compiler isn't doing a good enough job at optimizing it in this case) 2. avoid sstring copy in topology::get_{datacenter,rack} Closes #12761 * github.com:scylladb/scylladb: topology: optimize compare_endpoints to_string: add print operators for std::{weak,partial}_ordering utils: to_sstring: deinline std::strong_ordering print operator move to_string.hh to utils/ test: network_topology: add test_topology_compare_endpoints	2023-03-07 15:17:19 +02:00
Kefu Chai	c5d1a69859	build: cmake: link couple libraries as whole archive turns out we are using static variables to register entries in global registries, and these variables are not directly referenced, so linker just drops them when linking the executables or shared libraries. to address this problem, we just link the whole archive. another option would be create a linker script or pass --undefined=<symbol> to linker. neither of them is straightforward. a helper function is introduced to do this, as we cannot use CMake 3.24 as yet. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-04 13:11:25 +08:00
Kefu Chai	563fbb2d11	build: cmake: extract more subsystem out into its own CMakeLists.txt namely, cdc, compaction, dht, gms, lang, locator, mutation_writer, raft, readers, replica, service, tools, tracing and transport. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-02 10:15:25 +08:00
Kefu Chai	410035f03d	abstract_replication_strategy: remove unnecessary `virtual` specifier `effective_replication_map` is not a base class of any other class. so there is no need to mark any of its member function as `virtual`. this change should address following waring from Clang: ``` /home/kefu/dev/scylladb/seastar/include/seastar/core/shared_ptr.hh:205:9: error: delete called on non-final 'locator::effective_replication_map' that has virtual functions but non-virtual destructor [-Werror,-Wdelete-non-abstract-non-virtual-dtor] delete value_ptr; ^ /home/kefu/dev/scylladb/seastar/include/seastar/core/shared_ptr.hh:202:9: note: in instantiation of member function 'seastar::internal::lw_shared_ptr_accessors_esft<locator::effective_replication_map>::dispose' requested here dispose(static_cast<T*>(counter)); ^ /home/kefu/dev/scylladb/seastar/include/seastar/core/shared_ptr.hh:317:27: note: in instantiation of member function 'seastar::internal::lw_shared_ptr_accessors_esft<locator::effective_replication_map>::dispose' requested here accessors<T>::dispose(_p); ^ /home/kefu/dev/scylladb/locator/abstract_replication_strategy.hh:263:12: note: in instantiation of member function 'seastar::lw_shared_ptr<locator::effective_replication_map>::~lw_shared_ptr' requested here return make_lw_shared<effective_replication_map>(std::move(rs), std::move(tmptr), std::move(replication_map), replication_factor); ^ ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #12992	2023-02-26 19:16:28 +02:00
Kefu Chai	0cb842797a	treewide: do not define/capture unused variables these warnings are found by Clang-17 after removing `-Wno-unused-lambda-capture` and '-Wno-unused-variable' from the list of disabled warnings in `configure.py`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-15 22:57:18 +02:00
Benny Halevy	bb36237cf4	topology: optimize compare_endpoints This function is called on the fast data path from storage_proxy when sorting multiple endpoints by proximity. This change calculates numeric node diff metrics based on each address proximity to a given node (by <dc, rack, same node>) to eliminate logic branches in the function and reduce its footprint. based on objdump -d output, compare_endpoints footprint was reduced by 58.5% (3632 / 8752 bytes) with clang version 15.0.7 (Fedora 15.0.7-1.fc37) Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-02-15 11:48:24 +02:00
Benny Halevy	e7af35a64d	test: network_topology: add test_topology_compare_endpoints Add a regression unit test for topology::compare_endpoint before it's optimized in the following patches. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-02-15 11:09:02 +02:00
Benny Halevy	b2d3c1fcc2	abstract_replication_strategy: add for_each_natural_endpoint_until Currently, effective_replication_map::do_get_ranges accepts a functor that traverses the natural endpoints of each token to decide whether a token range should be returned or not. This is done by copying the natural endpoints vector for each token. However, other than special strategies like everywhere and local, the functor can be called on the precalculated inet_address_vector_replica_set in the replication_map and there's no need to copy it for each call. for_each_natural_endpoint_until passes a reference to the function down to the abstract replication strategy to let it work either on the precalculated inet_address_vector_replica_set or on a ad-hoc vector prepared by the replication strategy. The function returns stop_iteration::yes when a match or mismatch are found, or stop_iteration::no while it has no definite result. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #12737	2023-02-13 16:30:24 +02:00
Michał Chojnowski	5edf965526	locator: token_metadata: unify get_address_ranges() and get_ranges() get_address_ranges() and get_ranges() perform almost the same computation. They return the same ranges -- the only difference is that get_address_ranges() returns them in unspecified order, while get_ranges() returns them in sorted order. Therefore the result of get_ranges() is also a valid result for get_address_ranges(), and the two functions can be unified to avoid code duplication. This patch does just that.	2023-02-04 22:55:08 +01:00
Michał Chojnowski	9e57b21e0c	locator: token_metadata: get rid of a quadratic behaviour in get_address_ranges() Some callees of update_pending_ranges use the variant of get_address_ranges() which builds a hashmap of all <endpoint, owned range> pairs. For everywhere_topology, the size of this map is quadratic in the number of endpoints, making it big enough to cause contiguous allocations of tens of MiB for clusters of realistic size, potentially causing trouble for the allocator (as seen e.g. in #12724). This deserves a correction. This patch removes the quadratic variant of get_address_ranges() and replaces its uses with its linear counterpart. Refs #10337 Refs #10817 Refs #10836 Refs #10837 Fixes #12724	2023-02-04 22:38:04 +01:00
Kefu Chai	ccc03dd1ec	cql3, locator: call fmt::format_to() explicitly since format_to() is defined included by both fmt and std namepaces, without specifying which one to use, we'd fail to build with the standard library which implements std::format_to(). yes, we are `using namespace std` somewhere. this change should address the FTBFS with GCC-13. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-01-30 21:50:11 +08:00
Pavel Emelyanov	f6e8b64334	snitch: Use set_my_dc_and_rack() on all shards Most of snitch drivers set _my_dc and _my_rack with direct assignment thus skipping the sanity checks for dc/rack being empty. On other shards they call set_my_dc_and_rack() helper which warns the empty value and replaces it with some defaults. It's better to use the helper on all shards in order to have the same dc/rack values everywhere. refs: #12185 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #12524	2023-01-24 14:17:06 +02:00
Pavel Emelyanov	e20738cd7d	azure_snitch: Handle empty zone returned from IMDS Azure metadata API may return empty zone sometimes. If that happens shard-0 gets empty string as its rack, but propagates UNKNOWN_RACK to other shards. Empty zones response should be handled regardless. refs: #12185 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #12274	2023-01-09 11:57:45 +02:00
Gleb Natapov	6f104982e1	topology: use std::erase_if on std::map instead of ad-hoc loop There is std::erase_if since c++20. We can use it here. Message-Id: <20221228144944.3299711-6-gleb@scylladb.com>	2023-01-02 11:45:52 +02:00
Pavel Emelyanov	d561495f0d	Merge 'topology: get rid of pending state' from Benny Halevy Now, with `a44ca06906`, is_normal_token_owner that replaced is_member does not rely anymore on the pending status of endpoints in topology. With that we can get rid of this state and just keep all endpoints we know about in the topology. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #12294 * github.com:scylladb/scylladb: topology: get rid of pending state topology: debug log update and remove endpoint	2022-12-14 19:28:35 +03:00
Pavel Emelyanov	fe4cf231bc	snitch: Check http response codes to be OK Several snitch drivers make http requests to get region/dc/zone/rack/whatever from the cloud provider. They blindly rely on the response being successfull and read response body to parse the data they need from. That's not nice, add checks for requests finish with http OK statuses. refs: #12185 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #12287	2022-12-13 14:49:18 +02:00
Benny Halevy	68141d0aac	topology: get rid of pending state Now, with `a44ca06906`, is_normal_token_owner that replaced is_member does not rely anymore on the pending status of endpoints in topology. With that we can get rid of this state and just keep all endpoints we know about in the topology. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-12-13 14:17:18 +02:00
Benny Halevy	f2753eba30	topology: debug log update and remove endpoint Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-12-13 14:17:13 +02:00
Michał Jadwiszczak	dd46a92e23	storage_service, locator: extract describe_ring() `describe_ring()` was implemented as a method of `storage_service`. This patch extracts it from there to a standalone helper function in `locator/util.hh`.	2022-12-10 12:51:05 +01:00
Pavel Emelyanov	ae79669fd2	topology: Be less restrictive about missing endpoints Recent changes in topology restricted the get_dc/get_rack calls. Older code was trying to locate the endpoint in gossiper, then in system keyspace cache and if the endpoint was not found in both -- returned "default" location. New code generates internal error in this case. This approach already helped to spot several BUGs in code that had been eventually fixed, but echoes of that change still pop up. This patch relaxes the "missing endpoint" case by printing a warning in logs and returning back the "default" location like old code did. tests: update_cluster_layout_tests.py::* hintedhandoff_additional_test.py::TestHintedHandoff::test_hintedhandoff_rebalance bootstrap_test.py::TestBootstrap::test_decommissioned_wiped_node_can_join bootstrap_test.py::TestBootstrap::test_failed_bootstap_wiped_node_can_join materialized_views_test.py::TestMaterializedViews::test_decommission_node_during_mv_insert_4_nodes refs: #11900 refs: #12054 fixes: #11870 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #12067	2022-11-28 22:01:09 +02:00
Benny Halevy	996eac9569	topology: add get_datacenters Returns an unordered set of datacenter names to be used by network_topology_replication_strategy and for ks_prop_defs. The set is kept in sync with _dc_endpoints. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #12023	2022-11-23 18:39:36 +02:00
Asias He	4571fcf9e7	token_metadata: Rename is_member to is_normal_token_owner The name is_normal_token_owner is more clear than is_member. The is_normal_token_owner reflects what it really checks.	2022-11-18 09:29:20 +08:00
Asias He	965097cde5	token_metadata: Add docs for is_member Make it clear, is_member checks if a node is part of the token ring and checks nothing else.	2022-11-18 09:28:56 +08:00
Asias He	a495b71858	token_metadata: Do not use topology info for is_member check Since commit `a980f94` (token_metadata: impl: keep the set of normal token owners as a member), we have a set, _normal_token_owners, which contains all the nodes in the ring. We can use _normal_token_owners to check if a node is part of the ring directly instead of going through the _toplogy indirectly. Fixes #11935	2022-11-18 09:28:56 +08:00
Asias He	f2ca790883	token_metadata: Check node is part of the topology instead of the ring update_normal_tokens is the way to add a new node into the ring. We should not require a new node to already be in the ring to be able to add it to the ring. The current code works accidentally because is_member is checking if a node is in the topology We should use _topology.has_endpoint to check if a node is part of the topology explicitly.	2022-11-18 09:28:56 +08:00
Pavel Emelyanov	a396c27efc	Merge 'message: messaging_service: fix topology_ignored for pending endpoints in get_rpc_client' from Kamil Braun `get_rpc_client` calculates a `topology_ignored` field when creating a client which says whether the client's endpoint had topology information when this client was created. This is later used to check if that client needs to be dropped and replaced with a new client which uses the correct topology information. The `topology_ignored` field was incorrectly calculated as `true` for pending endpoints even though we had topology information for them. This would lead to unnecessary drops of RPC clients later. Fix this. Remove the default parameter for `with_pending` from `topology::has_endpoint` to avoid similar bugs in the future. Apparently this fixes #11780. The verbs used by decommission operation use RPC client index 1 (see `do_get_rpc_client_idx` in message/messaging_service.cc). From local testing with additional logging I found that by the time this client is created (i.e. the first verb in this group is used), we already know the topology. The node is pending at that point - hence the bug would cause us to assume we don't know the topology, leading us to dropping the RPC client later, possibly in the middle of a decommission operation. Fixes: #11780 Closes #11942 * github.com:scylladb/scylladb: message: messaging_service: check for known topology before calling is_same_dc/rack test: reenable test_topology::test_decommission_node_add_column test/pylib: util: configurable period in wait_for message: messaging_service: fix topology_ignored for pending endpoints in get_rpc_client message: messaging_service: topology independent connection settings for GOSSIP verbs	2022-11-17 20:14:32 +03:00
Avi Kivity	76be6402ed	Merge 'repair: harden effective replication map' from Benny Halevy As described in #11993 per-shard repair_info instances get the effective_replication_map on their own with no centralized synchronization. This series ensures that the effective replication maps used by repair (and other associated structures like the token metadata and topology) are all in sync with the one used to initiate the repair operation. While at at, the series includes other cleanups in this area in repair and view that are not fixes as the calls happen in synchronous functions that do not yield. Fixes #11993 Closes #11994 * github.com:scylladb/scylladb: repair: pass erm down to get_hosts_participating_in_repair and get_neighbors repair: pass effective_replication_map down to repair_info repair: coroutinize sync_data_using_repair repair: futurize do_repair_start effective_replication_map: add global_effective_replication_map shared_token_metadata: get_lock is const repair: sync_data_using_repair: require to run on shard 0 repair: require all node operations to be called on shard 0 repair: repair_info: keep effective_replication_map repair: do_repair_start: use keyspace erm to get keyspace local ranges repair: do_repair_start: use keyspace erm for get_primary_ranges repair: do_repair_start: use keyspace erm for get_primary_ranges_within_dc repair: do_repair_start: check_in_shutdown first repair: get_db().local() where needed repair: get topology from erm/token_metdata_ptr view: get_view_natural_endpoint: get topology from erm	2022-11-17 13:29:02 +02:00
Pavel Emelyanov	2add9ba292	Merge 'Refactor topology out of token_metadata' from Benny Halevy This series moves the topology code from locator/token_metadata.{cc,hh} out to localtor/topology.{cc,hh} and introduces a shared header file: locator/types.hh contains shared, low level definitions, in anticipation of https://github.com/scylladb/scylladb/pull/11987 While at it, the token_metadata functions are turned into coroutines and topology copy constructor is deleted. The copy functionality is moved into an async `clone_gently` function that allows yielding while copying the topology. Closes #12001 * github.com:scylladb/scylladb: locator: refactor topology out of token_metadata locator: add types.hh topology: delete copy constructor token_metadata: coroutinize clone functions	2022-11-17 13:55:34 +03:00
Benny Halevy	1e2741d2fe	abstract_replication_strategy: recognized_options: return unordered_set An unordered_set is more efficient and there is no need to return an ordered set for this purpose. This change facilitates a follow-up change of adding topology::get_datacenters(), returning an unordered_set of datacenter names. Refs #11987 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #12003	2022-11-17 11:27:05 +02:00
Benny Halevy	4b9269b7e2	effective_replication_map: add global_effective_replication_map Class to hold a coherent view of a keyspace effective replication map on all shards. To be used in a following patch to pass the sharded keyspace e_r_m:s to repair. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-17 08:07:01 +02:00
Benny Halevy	2c677e294b	shared_token_metadata: get_lock is const The lock is acquired using an a function that doesn't modify the shared_token_metadata object. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-17 07:58:21 +02:00
Benny Halevy	d0bd305d16	locator: refactor topology out of token_metadata Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-16 21:55:54 +02:00
Benny Halevy	297a4de4e4	locator: add types.hh To export low-level types that are used by oher modules for the locator interfaces. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-16 21:53:05 +02:00
Benny Halevy	0c94ffcc85	topology: delete copy constructor Topology is copied only from token_metadata_impl::clone_only_token_map which copies the token_metadata_impl with yielding to prevent reactor stalls. This should apply to topology as well, so add a clone_gently function for cloning the topology from token_metadata_impl::clone_only_token_map. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-16 15:27:28 +02:00
Benny Halevy	4f4fc7fe22	token_metadata: coroutinize clone functions Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-16 15:27:28 +02:00
Kamil Braun	1bd2471c19	message: messaging_service: fix topology_ignored for pending endpoints in get_rpc_client `get_rpc_client` calculates a `topology_ignored` field when creating a client which says whether the client's endpoint had topology information when topology was created. This is later used to check if that client needs to be dropped and replaced with a new client which uses the correct topology information. The `topology_ignored` field was incorrectly calculated as `true` for pending endpoints even though we had topology information for them. This would lead to unnecessary drops of RPC clients later. Fix this. Remove the default parameter for `with_pending` from `topology::has_endpoint` to avoid similar bugs in the future. Apparently this fixes #11780. The verbs used by decommission operation use RPC client index 1 (see `do_get_rpc_client_idx` in message/messaging_service.cc). From local testing with additional logging I found that by the time this client is created (i.e. the first verb in this group is used), we already know the topology. The node is pending at that point - hence the bug would cause us to assume we don't know the topology, leading us to dropping the RPC client later, possibly in the middle of a decommission operation. Fixes: #11780	2022-11-16 14:01:50 +01:00
Botond Dénes	cbf9be9715	Merge 'Avoid 0.0.0.0 (and :0) as preferred IP' from Pavel Emelyanov Despite docs discourage from using INADDR_ANY as listen address, this is not disabled in code. Worse -- some snitch drivers may gossip it around as the INTERNAL_IP state. This set prevents this from happening and also adds a sanity check not to use this value if it somehow sneaks in. Closes #11846 * github.com:scylladb/scylladb: messaging_service: Deny putting INADD_ANY as preferred ip messaging_service: Toss preferred ip cache management gossiping_property_file_snitch: Dont gossip INADDR_ANY preferred IP gossiping_property_file_snitch: Make _listen_address optional	2022-11-16 08:30:42 +02:00
Benny Halevy	ff5527deb1	topology: copy _sort_by_proximity in copy constructor Fixes #11962 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #11965	2022-11-14 13:59:56 +03:00
Benny Halevy	b74807cb8a	locator: token_metadata: add parse_host_id_and_endpoint To be used for specifying nodes either by their host_id or ip address and using the token_metadata to resolve the mapping. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-10-28 07:38:13 +03:00

1 2 3 4 5 ...

648 Commits