scylla

Author	SHA1	Message	Date
Petr Gusev	ef534ac876	rebuild_with_repair, replace_with_repair: use new token_metadata Just mechanical changes to the new token_metadata. All the boost and topology tests pass with this change.	2023-12-12 23:19:53 +04:00
Petr Gusev	93263bf9e7	bootstrap: use new token_metadata Just mechanical changes to the new token_metadata. All the boost and topology tests pass with this change.	2023-12-12 23:19:53 +04:00
Petr Gusev	d9283bd025	tablets: switch to token_metadata2 locator_topology_test, network_topology_strategy_test and tablets_test are fully switched to the host_id-based token_metadata, meaning they no longer populate the old token_metadata. All the boost and topology tests pass with this change.	2023-12-12 23:19:53 +04:00
Petr Gusev	f5038f6c72	calculate_effective_replication_map: use new token_metadata In this commit we switch the function calculate_effective_replication_map to use the new token_metadata. We do this by employing our new helper calculate_natural_ips function. We can't use this helper for current_endpoints/target_endpoints though, since in that case we won't add the IP to the pending_endpoints in the replace-with-same-ip scenario The token_metadata_test is migrated to host_ids in the same commit to make it pass. Other tests work because they fill both versions of the token_metadata, but for this test it was simpler to just migrate it straight away. The test constructs the old token_metadata over the new token_metadata, this means only the get_new() method will work on it. That's why we also need to switch some other functions (maybe_remove_node_being_replaced, do_get_natural_endpoints, get_replication_factor) to the new version in the same commit. All the boost and topology tests pass with this change.	2023-12-12 23:19:53 +04:00
Petr Gusev	fe3c543c4e	calculate_natural_endpoints: fix formatting	2023-12-12 23:19:53 +04:00
Petr Gusev	d5b4b02b28	abstract_replication_strategy: calculate_natural_endpoints: make it work with both versions of token_metadata We've updated all the places where token_metadata is mutated, and now we can progress to the next stage of the refactoring - gradually switching the read code paths. The calculate_natural_endpoints function is at the core of all of them. It decides to what nodes the given token should be replicated to for the given token_metadata. It has a lot of usages in various contexts, we can't switch them all in one commit, so instead we allowed the function to behave in both ways. If use_host_id parameter is false, the function uses the provided token_metadata as is and returns endpoint_set as a result. If it's true, it uses get_new() on the provided token_metadata and returns host_id_set as a result. The scope of the whole refactoring is limited to the erm data structure, its interface will be kept inet_address based for now. This means we'll often need to resolve host_ids to inet_address-es as soon as we got a result from calculated_natural_endpoints. A new calculate_natural_ips function is added for convenience. It uses the new token_metadata and immediately resolves returned host_id-s to inet_address-es. The auxiliary declarations natural_ep_type, set_type, vector_type, get_self_id, select_tm are introduced only for the sake of migration, they will be removed later.	2023-12-12 23:19:53 +04:00
Petr Gusev	e4253776a1	locator::topology: allow being_replaced and replacing nodes to have the same IP When we're replacing a node with the same IP address, we want the following behavior: * host_id -> IP mapping should work and return the same IP address for two different host_ids - old and new. * the IP -> host_id mapping should return the host_id of the old (replaced) host. This variant is most convenient for preserving the current behavior of the code, especially the functions maybe_remove_node_being_replaced, erm::get_natural_endpoints_without_node_being_replaced, erm::get_pending_endpoints. The 'being_replaced' node will be properly removed in maybe_remove_node_being_replaced and 'replacing' node will be added to the pending_endpoints.	2023-12-11 12:51:34 +04:00
Petr Gusev	5a1418fdba	token_metadata: get_endpoint_for_host_id -> get_endpoint_for_host_id_if_known This commit fixes an inconsistency in method names: get_host_id and get_host_id_if_known are (internal_error, returns null), but there was only one method for the opposite conversion - get_endpoint_for_host_id, and it returns null. In this commit we change it to on_internal_error if it can't find the argument and add another method get_endpoint_for_host_id_if_known which returns null in this case. We can't use get_endpoint_for_host_id/get_host_id in host_id_or_endpoint::resolve since it's called from storage_service::parse_node_list -> token_metadata::parse_host_id_and_endpoint, and exceptions are caught and handled in `storage_service::parse_node_list`.	2023-12-11 12:51:34 +04:00
Petr Gusev	08b47d645a	token_metadata: get_host_id: exception -> on_internal_error It's a bug to use get_host_id on a non-existent endpoint, so on_internal_error is more appropriate. Also, it's easier to debug since it provides a backtrace. If a missing inet_address is expected, get_host_id_if_known should be used instead. We update one such case in storage_service::force_remove_completion. Other usages of get_host_id are correct.	2023-12-11 12:51:34 +04:00
Petr Gusev	39bbe5f457	token_metadata: add get_all_ips method This is convenient for migrating code that uses get_all_endpoints.	2023-12-11 12:51:34 +04:00
Petr Gusev	9edf0709e6	token_metadata: support host_id-based version In this commit we enhance token_metadata with a pointer to the new host_id-based generic_token_metadata specialisation (token_metadata2). The idea is that in the following commits we'll go over all token_metadata modifications and make the corresponding modifications to its new host_id-based alternative. The pointer to token_metadata2 is stored in the generic_token_metadata::_new_value field. The pointer can be mutable, immutable, or absent altogether (std::monostate). It's mutable if this generic_token_metadata owns it, meaning it was created using the generic_token_metadata(config cfg) constructor. It's immutable if the generic_token_metadata(lw_shared_ptr<const token_metadata2> new_value); constructor was used. This means this old token_metadata is a wrapper for new token_metadata and we can only use the get_new() method on it. The field _new_value is empty for the new host_id-based token_metadata version. The generic_token_metadata(std::unique_ptr<token_metadata_impl<NodeId>> impl, token_metadata2 new_value); constructor is used for clone methods. We clone both versions, and we need to pass a cloned token_metadata2 into constructor. There are two overloads of get_new, for mutable and immutable generic_token_metadata. Both of them throws an exception if they can't get the appropriate pointer. There is also a get_new_strong method, which returns an immutable owning pointer. This is convenient since a lot of API's want an owning pointer. We can't make the get_new/get_new_strong API simpler and use get_new_strong everywhere since it mutate the original generic_token_metadata by incrementing the reference counter and this causes raises when it's passed between shards in replicate_to_all_cores.	2023-12-11 12:51:34 +04:00
Petr Gusev	63f64f3303	token_metadata: make it a template with NodeId=inet_address/host_id NodeId is used in all internal token_metadata data structures, that previously used inet_address. We choose topology::key_kind based on the value of the template parameter. generic_token_metadata::update_topology overload with host_id parameter is added to make update_topology_change_info work, it now uses NodeId as a parameter type. topology::remove_endpoint(host_id) is added to make generic_token_metadata::remove_endpoint(NodeId) work. pending_endpoints_for and endpoints_for_reading are just removed - they are not used and not implemented. The declarations were left by mistake from a refactoring in which these methods were moved to erm. generic_token_metadata_base is extracted to contain declarations, common to both token_metadata versions. Templates are explicitly instantiated inside token_metadata.cc, since implementation part is also a template and it's not exposed to the header. There are no other behavioral changes in this commit, just syntax fixes to make token_metadata a template.	2023-12-11 12:51:34 +04:00
Petr Gusev	c9fbe3d377	locator: make dc_rack_fn a template In the next commits token_metadata will be made a template with NodeId=inet_address\|host_id parameter. This parameter will be passed to dc_rack_fn function, so it also should be made a template.	2023-12-11 12:51:33 +04:00
Piotr Dulikowski	5227b71363	locator/topology: add key_kind parameter For the host_id-based token_metadata we want host_id to be the main node key, meaning it should be used in add_or_update_endpoint to find the node to update. For the inet_address-based token_metadata version we want to retain the old behaviour during transition period. In this commit we introduce key_kind parameter and use key_kind::inet_address in all current topology usages. Later we'll use key_kind::host_id for the new token_metadata. In the last commits of the series, when the new token_metadata version is used everywhere, we will remove key_kind enum.	2023-12-11 12:51:33 +04:00
Petr Gusev	2f137776c3	token_metadata: topology_change_info: change field types to token_metadata_ptr In subsequent commits we'll need the following api for token_metadata: token_metadata(token_metadata2_ptr); get_new() -> token_metadata2* where token_metadata2 is the new version of token_metadata, based on host_id. In other words: * token_metadata knows the new version of itself and returns a pointer to it through get_new() * token_metadata can be constructed based solely on the new version, without its own implementation. In this case the only method we can use on it is get_new. This allows to pass token_metadata2 to API's with token_metadata in method signature, if these APIs are known to only use the get_new method on the passed token_metadata. And back to topology_change_info - if we got it from the new token_metadata we want to be able to construct token_metadata from token_metadata2 contained in it, and this requires it to be a ptr, not value.	2023-12-11 12:51:33 +04:00
Petr Gusev	f21f23483c	token_metadata: drop unused method get_endpoint_to_token_map_for_reading	2023-12-11 12:51:22 +04:00
Alexander Turetskiy	f30b5473ab	cql: Reject empty options while altering a keyspace Reject ALTER KEYSPACE request for NetworkTopologyStrategy when replication options are missed. Also reject CREATE KEYSPACE with no replication factor options. Cassandra has a default_keyspace_rf configuration that may allow such CREATE KEYSPACE commands, but Scylla doesn't have this option (refs #16028). fixes #10036 Closes scylladb/scylladb#16221	2023-12-10 17:44:35 +02:00
Avi Kivity	9c0f05efa1	Merge 'Track tablet streaming under global sessions to prevent side-effects of failed streaming' from Tomasz Grabiec Tablet streaming involves asynchronous RPCs to other replicas which transfer writes. We want side-effects from streaming only within the migration stage in which the streaming was started. This is currently not guaranteed on failure. When streaming master fails (e.g. due to RPC failing), it can be that some streaming work is still alive somewhere (e.g. RPC on wire) and will have side-effects at some point later. This PR implements tracking of all operations involved in streaming which may have side-effects, which allows the topology change coordinator to fence them and wait for them to complete if they were already admitted. The tracking and fencing is implemented by using global "sessions", created for streaming of a single tablet. Session is globally identified by UUID. The identifier is assigned by the topology change coordinator, and stored in system.tablets. Sessions are created and closed based on group0 state (tablet metadata) by the barrier command sent to each replica, which we already do on transitions between stages. Also, each barrier waits for sessions which have been closed to be drained. The barrier is blocked only if there is some session with work which was left behind by unsuccessful streaming. In which case it should not be blocked for long, because streaming process checks often if the guard was left behind and stops if it was. This mechanism of tracking is fault-tolerant: session id is stored in group0, so coordinator can make progress on failover. The barriers guarantee that session exists on all replicas, and that it will be closed on all replicas. Closes scylladb/scylladb#15847 * github.com:scylladb/scylladb: test: tablets: Add test for failed streaming being fenced away error_injection: Introduce poll_for_message() error_injection: Make is_enabled() public api: Add API to kill connection to a particular host range_streamer: Do not block topology change barriers around streaming range_streamer, tablets: Do not keep token metadata around streaming tablets: Fail gracefully when migrating tablet has no pending replica storage_service, api: Add API to disable tablet balancing storage_service, api: Add API to migrate a tablet storage_service, raft topology: Run streaming under session topology guard storage_service, tablets: Use session to guard tablet streaming tablets: Add per-tablet session id field to tablet metadata service: range_streamer: Propagate topology_guard to receivers streaming: Always close the rpc::sink storage_service: Introduce concept of a topology_guard storage_service: Introduce session concept tablets: Fix topology_metadata_guard holding on to the old erm docs: Document the topology_guard mechanism	2023-12-07 16:29:02 +02:00
Tomasz Grabiec	d1c1b59236	storage_service, api: Add API to disable tablet balancing Load balancing needs to be disabled before making a series of manual migrations so that we don't fight with the load balancer. Also will be used in tests to ensure tablets stick to expected locations.	2023-12-06 18:36:17 +01:00
Tomasz Grabiec	1f57d1ea28	storage_service, api: Add API to migrate a tablet Will be used in tests, or for hot fixes in production.	2023-12-06 18:36:17 +01:00
Tomasz Grabiec	5381792401	tablets: Add per-tablet session id field to tablet metadata range_streamer will pick it up when creating topology_guard. It's materialized in memory only for migrating tablets in tablet_transition_info.	2023-12-06 18:36:17 +01:00
Botond Dénes	d2a88cd8de	Merge 'Typos: fix typos in code' from Yaniv Kaul Fixes some more typos as found by codespell run on the code. In this commit, there are more user-visible errors. Refs: https://github.com/scylladb/scylladb/issues/16255 Closes scylladb/scylladb#16289 * github.com:scylladb/scylladb: Update unified/build_unified.sh Update main.cc Update dist/common/scripts/scylla-housekeeping Typos: fix typos in code	2023-12-06 07:36:41 +02:00
Yaniv Kaul	ae2ab6000a	Typos: fix typos in code Fixes some more typos as found by codespell run on the code. In this commit, there are more user-visible errors. Refs: https://github.com/scylladb/scylladb/issues/16255	2023-12-05 15:18:11 +02:00
Benny Halevy	4d461fc788	locator: replication strategies: use locator::topology rather than fb_utilities Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-12-05 08:42:49 +02:00
Benny Halevy	86716b2048	locator: topology: add helpers to retrieve this host_id and address And respective `is_me()` predicates, to prepare for getting rid of fb_utilities. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-12-05 08:42:49 +02:00
Benny Halevy	52412087b7	snitch: pass broadcast_address in snitch_config To untangle snitch from fb_utilities. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-12-05 08:42:49 +02:00
Benny Halevy	94fc8e2a9a	snitch: add optional get_broadcast_address method and set broadcast_address / broadcast_rpc_address in main to remove this dependency of snitch on fb_utilities. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-12-05 08:42:49 +02:00
Benny Halevy	1d0e71308b	locator: ec2_multi_region_snitch: keep local public address as member To be used in the next patch to retrieve the broadcast_address. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-12-05 08:42:49 +02:00
Benny Halevy	90af71ffa7	ec2_multi_region_snitch: reindent load_config Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-12-05 08:42:49 +02:00
Benny Halevy	fecb597ad6	ec2_multi_region_snitch: coroutinize load_config Now that ec2_snitch::load_config is a coroutine there's no need for a seastar thread here either. Refs #16241 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-12-05 08:42:49 +02:00
Benny Halevy	cb7e096a59	ec2_snitch: reindent load_config Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-12-05 08:42:49 +02:00
Benny Halevy	1c1a048d3f	ec2_snitch: coroutinize load_config Fixes #16241 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-12-05 08:42:48 +02:00
Yaniv Kaul	c658bdb150	Typos: fix typos in comments Fixes some typos as found by codespell run on the code. In this commit, I was hoping to fix only comments, not user-visible alerts, output, etc. Follow-up commits will take care of them. Refs: https://github.com/scylladb/scylladb/issues/16255 Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>	2023-12-02 22:37:22 +02:00
Tomasz Grabiec	b06a0078fb	Merge 'Support for sending tablet info to the drivers' from Sylwia Szunejko There is a need for sending tablet info to the drivers so they can be tablet aware. For the best performance we want to get this info lazily only when it is needed. The info is send when driver asks about the information that the specific tablet contains and it is directed to the wrong node/shard so it could use that information for every subsequent query. If we send the query to the wrong node/shard, we want to send the RESULT message with additional information about the tablet (replicas and token range) in custom_payload. Mechanism for sending custom_payload added. Sending custom_payload tested using three node cluster and cqlsh queries. I used RF=1 so choosing wrong node was testable. I also manually tested it with the python-driver and confirmed that the tablet info can be deserialized properly. Automatic tests added. Closes scylladb/scylladb#15410 * github.com:scylladb/scylladb: docs: add documentation about sending tablet info to protocol extensions Add tests for sending tablet info cql3: send tablet if wrong node/shard is used during modification statement cql3: send tablet if wrong node/shard is used during select statement locator: add function to check locality locator: add function to check if host is local transport: add function to add tablet info to the result_message transport: add support for setting custom payload	2023-11-22 17:44:07 +02:00
sylwiaszunejko	954d51389c	locator: add function to check locality	2023-11-22 09:23:43 +01:00
sylwiaszunejko	a0c8531875	locator: add function to check if host is local	2023-11-21 15:15:20 +01:00
Kefu Chai	efd65aebb2	build: cmake: add check-header target to have feature parity with `configure.py`. we won't need this once we migrate to C++20 modules. but before that day comes, we need to stick with C++ headers. we generate a rule for each .hh files to create a corresponding .cc and then compile it, in order to verify the self-containness of that header. so the number of rule is quite large, to avoid the unnecessary overhead. the check-header target is enabled only if `Scylla_CHECK_HEADERS` option is enabled. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#15913	2023-11-13 10:27:06 +02:00
Benny Halevy	a1acf6854b	everywhere: reduce dependencies on i_partitioner.hh Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-11-05 20:47:44 +02:00
Benny Halevy	6de1cc2993	locator: resolve the dependency of token_metadata.hh on token_range_splitter.hh define token_metadata_ptr in token_metadata_fwd.hh So that the declaration of `make_splitter` can be moved to token_range_splitter.hh, where it belongs, and so token_metadata.hh won't have to include it. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-11-05 20:01:29 +02:00
Tomasz Grabiec	d5539e080d	tablets: Implement cleanup step This change adds a stub for tablet cleanup on the replica side and wires it into the tablet migration process. The handling on replica side is incomplete because it doesn't remove the actual data yet. It only flushes the memtables, so that all data is in sstables and none requires a memtable flush. This patch is necessary to make decommission work. Otherwise, a memtable flush would happen when the decommissioned node is put in the drained state (as in nodetool drain) and it would fail on missing host id mapping (node is no longer in topology), which is examined by the tablet sharder when producing sstable sharding metadata. Leading to abort due to failed memtable flush.	2023-09-14 12:45:10 +02:00
Tomasz Grabiec	6a62aca3a9	locator: Introduce tablet_metadata_guard Will be used to synchronize long-running tablet operations with topology coordinator. It blocks barriers like erm_ptr, but refreshes if change is irrelevant, so behaves as if the erm_ptr's scope was narrowed down to a single tablet.	2023-09-14 12:45:10 +02:00
Tomasz Grabiec	532ec84210	locator, replica: Add a way to wait for table's effective_replication_map change	2023-09-14 12:08:54 +02:00
Benny Halevy	7119c1d8cc	token_metadata: update_topology: make endpoint_dc_rack arg optional It's better to pass a disengaged optional when the caller doesn't have the information rather than passing the default dc_rack location so the latter will never implicitly override a known endpoint dc/rack location. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #15300	2023-09-11 16:16:19 +02:00
Botond Dénes	b062b245ad	Merge 'Don't cache dc:rack on system keyspace local cache' from Pavel Emelyanov The local node's dc:rack pair is cached on system keyspace on start. However, most of other code don't need it as they get dc:rack from topology or directly from snitch. There are few places left that still mess with sysks cache, but they are easy to patch. So after this patch all the core code uses two sources of dc:rack -- topology / snitch -- instead of three. Closes #15280 * github.com:scylladb/scylladb: system_keyspace: Don't require snitch argument on start system_keyspace: Don't cache local dc:rack pair system_keyspace: Save local info with explicit location storage_service: Get endpoint location from snitch, not system keyspace snitch: Introduce and use get_location() method repair: Local location variables instead of system keyspace's one repair: Use full endpoint location instead of datacenter part	2023-09-11 10:26:26 +03:00
Benny Halevy	574c7e349a	locator: topology: is_configured_this_node: delete spurious semicolumn Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-09-06 12:24:09 +03:00
Benny Halevy	115462be17	locator: topology: is_configured_this_node: compare host_id first Since `5d1f60439a` we have this node's host_id in topology config, so it can be used to determine this node when adding it. Prepare for extending the token_metadata interface to provide host_id in update_topology. We would like to compare the host_id first to be able to distinguish this node from a node we're replacing that may have the same ip address (but different host_id). Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-09-06 12:24:09 +03:00
Pavel Emelyanov	d2bd203cba	snitch: Introduce and use get_location() method There are some places out there that generate locator::endpoint_dc_rack pair out of snitch's get_datacenter() and get_rack() calls. Generalize those with snitch's new method. It will also be used by next patch. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-09-05 12:52:30 +03:00
Benny Halevy	5afc242814	token_metadata: get_endpoint_to_host_id_map_for_reading: just inform that normal node has null host_id It is too early to require that all nodes in normal state have a non-null host_id. The assertion was added in `44c14f3e2b` but unfortunately there are several call sites where we add the node as normal, but without a host_id and we patch it in later on. In the future we should be able to require that once we identify nodes by host_id over gossiper and in token_metadata. Fixes scylladb/scylladb#15181 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #15184	2023-08-28 21:40:55 +03:00
Botond Dénes	e7af2a7de8	Merge 'token_metadata::get_endpoint_to_host_id_map_for_reading: restrict to token owners' from Benny Halevy And verify the they returned host_id isn't null. Call on_internal_error_noexcept in that case since all token owners are expected to have their host_id set. Aborting in testing would help fix issues in this area. Fixes scylladb/scylladb#14843 Refs scylladb/scylladb#14793 Closes #14844 * github.com:scylladb/scylladb: api: storage_service: improve description of /storage_service/host_id token_metadata: get_endpoint_to_host_id_map_for_reading: restrict to token owners	2023-08-23 13:55:14 +03:00
Kamil Braun	cdc3cd2b79	Merge 'raft: add fencing tests' from Petr Gusev In this PR a simple test for fencing is added. It exercises the data plane, meaning if it somehow happens that the node has a stale topology version, then requests from this node will get an error 'stale topology'. The test just decrements the node version manually through CQL, so it's quite artificial. To test a more real-world scenario we need to allow the topology change fiber to sometimes skip unavailable nodes. Now the algorithm fails and retries indefinitely in this case. The PR also adds some logs, and removes one seemingly redundant topology version increment, see the commit messages for details. Closes #14901 * github.com:scylladb/scylladb: test_fencing: add test_fence_hints test.py: output the skipped tests test.py: add skip_mode decorator and fixture test.py: add mode fixture hints: add debug log for dropped hints hints: send_one_hint: extend the scope of file_send_gate holder pylib: add ScyllaMetrics hints manager: add send_errors counter token_metadata: add debug logs fencing: add simple data plane test random_tables.py: add counter column type raft topology: don't increment version when transitioning to node_state::normal	2023-08-22 16:28:21 +02:00

1 2 3 4 5 ...

689 Commits