scylla

Author	SHA1	Message	Date
Anna Stuchlik	bca39b2a93	doc: remove Serverless from the Drivers page This commit removes the information about ScyllaDB Cloud Serverless, which is no longer valid. Closes scylladb/scylladb#16700	2024-01-15 15:36:51 +02:00
Botond Dénes	66bef6e961	cql3: cluster_describe_statement: don't produce range ownership for tablet keyspaces Tablet keyspaces have per/table range ownership, which cannot currently be expressed in a DESC CLUSTER statement, which describes range ownership in the current keyspace (if set). Until we figure out how to represent range ownership (tablets) of all tables of a keyspace, we disable range ownership for tablet keyspaces. Fixes: #16483 Closes scylladb/scylladb#16713	2024-01-15 14:03:54 +01:00
Patryk Wrobel	aec0db1b96	cql_auth_query_test.cc: do not rely on templated operator<< This change is intended to remove the dependency to operator<<(std::ostream&, const std::unordered_set<seastar::sstring>&) from test/boost/cql_auth_query_test.cc. It prepares the test for removal of the templated helpers. Such removal is one of goals of the referenced issue that is linked below. Refs: #13245 Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com> Closes scylladb/scylladb#16758	2024-01-15 13:30:05 +02:00
Kefu Chai	ece2bd2f6e	service: do not include unused headers these unused includes were identified by clangd. see https://clangd.llvm.org/guides/include-cleaner#unused-include-warning for more details on the "Unused include" warning. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16764	2024-01-15 13:29:33 +02:00
Kefu Chai	fc97d91f1a	auth: add fmt::format for auth::resource and friends before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we * define a formatter for `auth::resource` and friends, * update their callers of `operator<<` to use `fmt::print()`. * drop `operator<<`, as they are not used anymore. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16765	2024-01-15 13:26:39 +02:00
Kefu Chai	f344e13066	types: add formatter for data_value before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we define a formatter for data_value, but its operator<<() is preserved as we are still using the generic homebrew formatter for formatting std::vector, which in turn uses operator<< of the element type. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16767	2024-01-15 13:18:23 +02:00
Kefu Chai	218334eaf5	test/nodetool: use build/$CMAKE_BUILD_TYPE when appropriate because the CMake-generated build.ninja is located under build/, and it puts the `scylla` executable at build/$CMAKE_BUILD_TYPE/scylla, instead of at build/$scylla_build_mode/scylla, so let's adapt to this change accordingly. we will promote this change to a shared place if we have similar needs in other tests as well. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16775	2024-01-15 12:52:35 +02:00
Pavel Emelyanov	dd892b0d8a	code: Enable tablets if cluster feature is enabled If the TABLETS map is missing in the CREATE KEYSPACE statement the tablets are anyway enabled if the respective cluster feature is enabled. To opt-out keyspaces one may use TABLETS = { 'enabled': false } syntax. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-01-15 13:12:12 +03:00
Pavel Emelyanov	4838eeb201	test: Turn off tablets feature by default Next patches will make per-keyspace initial_tables option really optional and turn tablets ON when the feature is ON. This will break all other tests' assumptions, that they are testing vnodes replication. So turn the feature off by default, tests that do need tables will need to explicitly enable this feature on their own Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-01-15 13:12:12 +03:00
Pavel Emelyanov	ae7da54f88	test: Move test_tablet_drain_failure_during_decommission to another suite In its current location it will be started with 3 pre-created scylla nodes with default features ON. Next patch will exclude `tablets` from the default list, so the test needs to create servers on its own Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-01-15 13:12:12 +03:00
Pavel Emelyanov	46b36d8c07	test/tablets: Enable tables for real on test keyspace When started cql_test_env creates a test keyspace. Some tablets test cases create a table in this keyspace, but misuse the whole feature. The thing is that while tablets feature is ON in those test cases, the keyspace itself does _not_ have the initial_tables option and thus tablets are not enabled for the ks' table for real. Currently test cases work just because this table is only used as a transparent table ID placeholder. If turning on tablets for the keyspace, several test cases would get broken for two reasons. First, the tables map will no longer be empty on test start. Second, applying changes to tablet metadata may not be visible, because test case uses "random" timestamp, that can be less than the initial metadata mutations' timestamp. This patch fixes all three places: 1. enables tables for the test keyspace 2. removes assumption that the initial metadata is empty 3. uses large enough timestamp for subsequent mutations Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-01-15 13:12:12 +03:00
Pavel Emelyanov	2376b699e0	test/tablets: Make timestamp local Just to make next patching simpler Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-01-15 13:12:12 +03:00
Pavel Emelyanov	f3a69bfaca	cql3: Add feature service to as_ks_metadata_update() To call prepare_options() with tablets feature state later Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-01-15 13:12:12 +03:00
Pavel Emelyanov	4dede19e4f	cql3: Add feature service to ks_prop_defs::as_ks_metadata() To call prepare_options() with tablets feature state later Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-01-15 13:12:12 +03:00
Pavel Emelyanov	267770bf0f	cql3: Add feature service to get_keyspace_metadata() To be passed down to ks_prop_defs::as_ks_metadata() Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-01-15 13:12:12 +03:00
Pavel Emelyanov	6cb3055059	cql: Add tablets on/off switch to CREATE KEYSPACE Now the user can do CREATE KEYSPACE ... WITH TABLETS = { 'enabled': false } to turn tablets off. It will be useful in the future to opt-out keyspace from tablets when they will be turned on by default based on cluster features only. Also one can do just CREATE KEYSPACE ... WITH TABLETS = { 'enabled': true } and let Scylla select the initial tablets value by its own Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-01-15 13:12:11 +03:00
Pavel Emelyanov	941f6d8fca	cql: Move initial_tablets from REPLICATION to TABLETS in DDL This patch changes the syntax of enabling tablets from CREATE KEYSPACE ... WITH REPLICATION = { ..., 'initial_tablets': <int> } to be CREATE KEYSPACE ... WITH TABLETS = { 'initial': <int> } and updates all tests accordingly. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-01-15 13:04:48 +03:00
Pavel Emelyanov	4c4a9679d8	network_topology_strategy: Estimate initial_tablets if 0 is set If user configured zero initial tablets (spoiler: or this value was set automagically when enabling tablets behind the scenes) we still need some value to start with and this patch calculates one. The math is based on topology and RF so that all shards are covered: initial_tablets = max(nr_shards_in(dc) / RF_in(dc) for dc in datacenters) The estimation is done when a table is created, not when the keyspace is created. For that, the keyspace is configured with zero initial tablets, and at table-creation time zero is converted into auto-estimated value. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-01-15 13:04:48 +03:00
Kamil Braun	423234841e	Merge 'add automatic sstable cleanup to the topology coordinator' from Gleb For correctness sstable cleanup has to run between (some) topology changes. Sometimes even a failed topology change may require running the cleanup. The series introduces automatic sstable cleanup step to the topology change coordinator. Unlike other operations it is not represented as a global transition state, but done by each node independently which allows cleanup to run without locking the topology state machine so tablet code can run in parallel with the cleanup. It is done by having a cleanup state flag for each node in the topology. The flag is a tri state: "clean" - the node is clean, "needed" - cleanup is needed (but not running), "running" - cleanup is running. No topology operation can proceed if there is a node in "running" state, but some operation can proceed even if there are nodes in "needed" state. If the coordinator needs to perform a topology operation that cannot run while there are nodes that need cleanup the coordinator will start one automatically and continue only after cleanup completes. There is also a possibility to kick cleanup manually through the new RAFT API call. 
* 'cleanup-needed-v8' of https://github.com/gleb-cloudius/scylla: test: add test for automatic cleanup procedure test: add test for topology requests queue management storage_service: topology coordinator: add error injection point to be able to pause the topology coordinator storage_service: topology coordinator: add logging to removenode and decommission storage_service: topology_coordinator: introduce cleanup REST API integrated with the topology coordinator storage_service: topology coordinator: manage cluster cleanup as part of the topology management storage_service: topology coordinator: provide a version of get_excluded_nodes that does not need node_to_work_on as a parameter test: use servers_see_each_other when needed test: add servers_see_each_other helper storage_service: topology coordinator: make topology coordinator lifecycle subscriber system_keyspace: raft topology: load ignore nodes parameter together with removenode topology request storage_service: topology coordinator: introduce sstable cleanup fiber storage_proxy: allow to wait for all ongoing writes storage_service: topology coordinator: mark nodes as needing cleanup when required storage_service: add mark_nodes_as_cleanup_needed function vnode_effective_replication_map: add get_all_pending_nodes() function vnode_effective_replication_map: pre calculate dirty endpoints during topology change raft topology: add cleanup state to the topology state machine	2024-01-14 18:54:02 +01:00
Gleb Natapov	f8b90aeb14	test: add test for automatic cleanup procedure The test runs two bootstraps and checks that there is no cleanup in between. Then it runs a decommission and checks that cleanup runs automatically and then it runs one more decommission and checks that no cleanup runs again. Second part checks manual cleanup triggering. It adds a node, triggers cleanup through the REST API, checks that it runs, decommissions a node and checks that the cleanup did not run again.	2024-01-14 15:45:53 +02:00
Gleb Natapov	5882855669	test: add test for topology requests queue management This test creates a 5 node cluster with 2 down nodes (A and B). After that it creates a queue of 3 topology operations: bootstrap, removenode A and removenode B with ignore_nodes=A. It checks that all operations manage to complete. Then it downs one node and creates a queue with two requests: bootstrap and decommission. Since none can proceed both should be canceled.	2024-01-14 15:45:53 +02:00
Gleb Natapov	ba7aa0d582	storage_service: topology coordinator: add error injection point to be able to pause the topology coordinator	2024-01-14 15:45:53 +02:00
Gleb Natapov	1afc891bd5	storage_service: topology coordinator: add logging to removenode and decommission Add some useful logging to removenode and decommission to be used by tests later.	2024-01-14 15:45:53 +02:00
Gleb Natapov	97ab3f6622	storage_service: topology_coordinator: introduce cleanup REST API integrated with the topology coordinator Introduce new REST API "/storage_service/cleanup_all" that, when triggered, instructs the topology coordinator to initiate cluster wide cleanup on all dirty nodes. It is done by introducing new global command "global_topology_request::cleanup".	2024-01-14 15:45:53 +02:00
Gleb Natapov	0adb3904d8	storage_service: topology coordinator: manage cluster cleanup as part of the topology management Sometimes it is unsafe to start a new topology operation before cleanup runs on dirty nodes. This patch detects the situation when the topology operation to be executed cannot be run safely until all dirty nodes do cleanup and initiates the cleanup automatically. It also waits for cleanup to complete before proceeding with the topology operation. There can be a situation where a node that needs cleanup dies and will never clear the flag. In this case if a topology operation that wants to run next does not have this node in its ignore node list it may get stuck forever. To fix this the patch also introduces the "liveness aware" request queue management: we do not simply choose _a_ request to run next, but go over the queue and find requests that can proceed considering the node liveness situation. If there are multiple requests eligible to run the patch introduces the order based on the operation type: replace, join, remove, leave, rebuild. The order is chosen so as not to trigger cleanup needlessly.	2024-01-14 15:45:50 +02:00
Nadav Har'El	2d04070120	Update seastar submodule * seastar 0ffed835...8b9ae36b (4): > net/posix: Track ap-server ports conflict Fixes #16720 > include/seastar/core: do not include unused header > build: expose flag like -std=c++20 via seastar.pc > src: include used headers for C++ modules build Closes scylladb/scylladb#16769	2024-01-14 14:51:11 +02:00
Gleb Natapov	c9b7bd5a33	storage_service: topology coordinator: provide a version of get_excluded_nodes that does not need node_to_work_on as a parameter Needed by the next patch.	2024-01-14 14:44:07 +02:00
Gleb Natapov	0e68073b22	test: use servers_see_each_other when needed In the next patch we want to abort topology operations if there are not enough live nodes to perform them. This will break tests that do a topology operation right after restarting a node since a topology coordinator may still not see the restarted node as alive. Fix all those tests to wait between restart and a topology operation until UP state propagates.	2024-01-14 14:44:07 +02:00
Gleb Natapov	455ffaf5d8	test: add servers_see_each_other helper The helper makes sure that all nodes in the cluster see each other as alive.	2024-01-14 14:44:07 +02:00
Gleb Natapov	067267ff76	storage_service: topology coordinator: make topology coordinator lifecycle subscriber We want to change the coordinator to consider node liveness when processing the topology operation queue. If there are not enough live nodes to process any of the ops we want to cancel them. For that to work we need to be able to kick the coordinator if the liveness situation changes.	2024-01-14 14:44:07 +02:00
Gleb Natapov	a4ac64a652	system_keyspace: raft topology: load ignore nodes parameter together with removenode topology request Next patch will need ignore nodes list while processing removenode request. Load it.	2024-01-14 14:44:07 +02:00
Gleb Natapov	f70c4127c6	storage_service: topology coordinator: introduce sstable cleanup fiber Introduce a fiber that waits on a topology event and when it sees that the node it runs on needs to perform sstable cleanup it initiates one for each non tablet, non local table and resets "cleanup" flag back to "clean" in the topology.	2024-01-14 14:44:07 +02:00
Gleb Natapov	5b246920ae	storage_proxy: allow to wait for all ongoing writes We want to be able to wait for all writes started through the storage proxy before a fence is advanced. Add phased_barrier that is entered on each local write operation before checking the fence to do so. A write will be either tracked by the phased_barrier or fenced. This will be needed to wait for all non fenced local writes to complete before starting a cleanup.	2024-01-14 14:44:07 +02:00
Gleb Natapov	b2ba77978c	storage_service: topology coordinator: mark nodes as needing cleanup when required A cleanup needs to run when a node loses ownership of a range (during bootstrap) or if a range movement to a normal node failed (removenode, decommission failure). Mark all dirty nodes as "cleanup needed" in those cases.	2024-01-14 14:43:59 +02:00
Gleb Natapov	dbededb1a6	storage_service: add mark_nodes_as_cleanup_needed function The function creates a mutation that sets cleanup to "needed" for each normal node that, according to the erm, has data it does not own after successful or unsuccessful topology operation.	2024-01-14 14:43:33 +02:00
Gleb Natapov	23a27ccc24	vnode_effective_replication_map: add get_all_pending_nodes() function Add a function that returns all nodes that have had vnodes moved to them during a topology change operation. Needed to know which nodes need to do cleanup in case of failed topology change operation.	2024-01-14 14:37:16 +02:00
Gleb Natapov	a8f11852da	vnode_effective_replication_map: pre calculate dirty endpoints during topology change Some topology change operations cause some nodes to lose ranges. This information is needed to know which nodes need to do cleanup after topology operation completes. Pre calculate it during erm creation.	2024-01-14 14:11:19 +02:00
Gleb Natapov	cc54796e23	raft topology: add cleanup state to the topology state machine The patch adds cleanup state to the persistent and in memory state and handles the loading. The state can be "clean" which means no cleanup needed, "needed" which means the node is dirty and needs to run cleanup at some point, "running" which means that cleanup is running by the node right now and when it will be completed the state will be reset to "clean".	2024-01-14 13:30:54 +02:00
Nadav Har'El	1bcaeb89c7	view: revert cleanup filter that doesn't work with tablets This patch reverts commit `10f8f13b90` from November 2022. That commit added to the "view update generator", the code which builds view updates for staging sstables, a filter that ignores ranges that do not belong to this node. However, 1. I believe this filter was never necessary, because the view update code already silently ignores base updates which do not belong to this replica (see get_view_natural_endpoint()). After all, the view update needs to know that this replica is the Nth owner of the base update to send its update to the Nth view replica, but if no such N exists, no view update is sent. 2. The code introduced for that filter used a per-keyspace replication map, which was ok for vnodes but no longer works for tablets, and causes the operation using it to fail. 3. The filter was used every time the "view update generator" was used, regardless of whether any cleanup is necessary or not, so every such operation would fail with tablets. So for example the dtest test_mvs_populating_from_existing_data fails with tablets: * This test has view building in parallel with automatic tablet movement. * Tablet movement is streaming. * When streaming happens before view building has finished, the streamed sstables get "view update generator" run on them. This causes the problematic code to be called. Before this patch, the dtest test_mvs_populating_from_existing_data fails when tablets are enabled. After this patch, it passes. Fixes #16598 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2024-01-14 13:24:44 +02:00
Nadav Har'El	0fe40f729e	mv: sleep a bit before view-update-generator restart The "view update generator" is responsible for generating view updates for staging sstables (such as coming from repair). If the processing fails, the code retries - immediately. If there is some persistent bug, such as issue #16598, we will have a tight loop of error messages, potentially a gigabyte of identical messages every second. In this patch we simply add a sleep of one second after view update generation fails before retrying. We can still get many identical error messages if there is some bug, but not more than one per second. Refs #16598. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2024-01-14 13:13:52 +02:00
Kamil Braun	4e18f8b453	Merge 'topology_state_load: stop waiting for IP-s' from Petr Gusev The loop in `id2ip` lambda makes problems if we are applying an old raft log that contains long-gone nodes. In this case, we may never receive the `IP` for a node and stuck in the loop forever. In this series we replace the loop with an if - we just don't update the `host_id <-> ip` mapping in the `token_metadata.topology` if we don't have an `IP` yet. The PR moves `host_id -> IP` resolution to the data plane, now it happens each time the IP-based methods of `erm` are called. We need this because IPs may not be known at the time the erm is built. The overhead of `raft_address_map` lookup is added to each data plane request, but it should be negligible. In this PR `erm/resolve_endpoints` continues to treat missing IP for `host_id` as `internal_error`, but we plan to relax this in the follow-up (see this PR first comment). Closes scylladb/scylladb#16639 * github.com:scylladb/scylladb: raft ips: rename gossiper_state_change_subscriber_proxy -> raft_ip_address_updater gossiper_state_change_subscriber_proxy: call sync_raft_topology_nodes storage_service: topology_state_load: remove IP waiting loop storage_service: sync_raft_topology_nodes: add target_node parameter storage_service: sync_raft_topology_nodes: move loops to the end storage_service: sync_raft_topology_nodes: rename extract process_left_node and process_transition_node storage_service: sync_raft_topology_nodes: rename add_normal_node -> process_normal_node storage_service: sync_raft_topology_nodes: move update_topology up storage_service: topology_state_load: remove clone_async/clear_gently overhead storage_service: fix indentation storage_service: extract sync_raft_topology_nodes storage_service: topology_state_load: move remove_endpoint into mutate_token_metadata address_map: move gossiper subscription logic into storage_service topology_coordinator: exec_global_command: small refactor, use contains + reformat 
storage_service: wait_for_ip for new nodes storage_service.idl.hh: fix raft_topology_cmd.command declaration erm: for_each_natural_endpoint_until: use is_vnode == true erm: switch the internal data structures to host_id-s erm: has_pending_ranges: switch to host_id	2024-01-12 18:46:51 +01:00
Petr Gusev	e24bee545b	raft ips: rename gossiper_state_change_subscriber_proxy -> raft_ip_address_updater	2024-01-12 18:29:22 +04:00
Petr Gusev	6e7bbc94f4	gossiper_state_change_subscriber_proxy: call sync_raft_topology_nodes When a node changes its IP we need to store the mapping in system.peers and update token_metadata.topology and erm in-memory data structures. The test_change_ip was improved to verify this new behaviour. Before this patch the test didn't check that IPs used for data requests are updated on IP change. In this commit we add the read/write check. It fails on insert with 'node unavailable' error without the fix.	2024-01-12 18:28:57 +04:00
Petr Gusev	6d6e1ba8fb	storage_service: topology_state_load: remove IP waiting loop The loop makes problems if we are applying an old raft log that contains long-gone nodes. In this case, we may never receive the IP for a node and stuck in the loop forever. The idea of the patch is to replace the loop with an if - we just don't update the host_id <-> ip mapping in the token_metadata.topology if we don't have an IP yet. When we get the mapping later, we'll call sync_raft_topology_nodes again from gossiper_state_change_subscriber_proxy.	2024-01-12 15:37:50 +04:00
Petr Gusev	260874c860	storage_service: sync_raft_topology_nodes: add target_node parameter If it's set, instead of going over all the nodes in raft topology, the function will update only the specified node. This parameter will be used in the next commit, in the call to sync_raft_topology_nodes from gossiper_state_change_subscriber_proxy.	2024-01-12 15:37:50 +04:00
Petr Gusev	a9d58c3db5	storage_service: sync_raft_topology_nodes: move loops to the end	2024-01-12 15:37:50 +04:00
Petr Gusev	d1bce3651b	storage_service: sync_raft_topology_nodes: rename extract process_left_node and process_transition_node	2024-01-12 15:37:50 +04:00
Petr Gusev	aa37b6cfd3	storage_service: sync_raft_topology_nodes: rename add_normal_node -> process_normal_node	2024-01-12 15:37:50 +04:00
Petr Gusev	a508d7ffc5	storage_service: sync_raft_topology_nodes: move update_topology up In this and the following commits we prepare sync_raft_topology_nodes to handle target_node parameter - the single host_id which should be updated.	2024-01-12 15:37:50 +04:00
Petr Gusev	1b12f4b292	storage_service: topology_state_load: remove clone_async/clear_gently overhead Before the patch we used to clone the entire token_metadata and topology only to immediately drop everything in clear_gently. This is a sheer waste.	2024-01-12 15:37:50 +04:00
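Note on commit 4c4a9679d8: the estimate quoted in that message, initial_tablets = max(nr_shards_in(dc) / RF_in(dc) for dc in datacenters), can be illustrated with a minimal Python sketch. This is not the ScyllaDB implementation. The function name, the dictionary inputs, the use of ceiling division, the skipping of datacenters with RF 0, and the fallback value of 1 are all assumptions made for illustration.

```python
import math

def estimate_initial_tablets(shards_per_dc, rf_per_dc):
    """Hypothetical helper mirroring the formula in the commit message.

    shards_per_dc: total number of shards in each datacenter (assumed input).
    rf_per_dc: replication factor configured for each datacenter (assumed input).
    """
    estimates = []
    for dc, shards in shards_per_dc.items():
        rf = rf_per_dc.get(dc, 0)
        if rf <= 0:
            # Assumption: a datacenter without replicas does not contribute.
            continue
        # Assumption: round up so that every shard can hold at least one replica.
        estimates.append(math.ceil(shards / rf))
    # Assumption: fall back to a single tablet when nothing can be estimated.
    return max(estimates, default=1)
```

For a cluster with 24 shards and RF 3 in one datacenter and 8 shards and RF 2 in another, the sketch picks the larger per-datacenter value, so the datacenter with more shards per replica determines the starting tablet count.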

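Note on commit 0adb3904d8: the "liveness aware" queue management described there (scan the whole queue, keep the requests that can proceed, then prefer replace, join, remove, leave, rebuild) can be sketched as below. Only the ordering of operation types comes from the commit message. The `Request` type, the `can_proceed` rule (every dead node must be the request's own target or listed in its ignore list), and the function names are hypothetical and chosen only to make the example runnable.

```python
from dataclasses import dataclass

# Order taken from the commit message: replace, join, remove, leave, rebuild.
PRIORITY = {"replace": 0, "join": 1, "remove": 2, "leave": 3, "rebuild": 4}

@dataclass(frozen=True)
class Request:
    kind: str                             # one of the PRIORITY keys
    node: str                             # node the request operates on
    ignore_nodes: frozenset = frozenset() # dead nodes the request tolerates

def can_proceed(req, dead_nodes):
    # Hypothetical eligibility rule: each dead node is either the node being
    # removed or replaced by this request, or is explicitly ignored by it.
    tolerated = set(req.ignore_nodes)
    if req.kind in ("remove", "replace"):
        tolerated.add(req.node)
    return set(dead_nodes) <= tolerated

def pick_next(queue, dead_nodes):
    """Return the eligible request with the best priority, or None."""
    eligible = [r for r in queue if can_proceed(r, dead_nodes)]
    if not eligible:
        return None
    return min(eligible, key=lambda r: PRIORITY[r.kind])

# Scenario loosely following the test in commit 5882855669: A and B are down.
queue = [
    Request("join", "N6"),
    Request("remove", "A"),
    Request("remove", "B", frozenset({"A"})),
]
first = pick_next(queue, {"A", "B"})
```

Under these assumptions only the removal of B is eligible while both A and B are down, so it runs first; once B is gone the removal of A becomes eligible, and the bootstrap can run last. A queue in which nothing is eligible yields `None`, which corresponds to the case where the requests are canceled.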

40738 Commits