scylla

Author	SHA1	Message	Date
Tomasz Grabiec	61532eb53b	topology_state_machine: Introduce lock transition Will be used in load balancer tests to prevent concurrent topology operations, in particular background load balancing. load balancer will be invoked explicitly by the test. Disabling load balancer in topology is not a solution, because we want the explicit call to perform the load balancing.	2025-02-07 16:09:21 +01:00
Ferenc Szili	a59618e83d	truncate: create session during request handling Currently, the session ID under which the truncate for tablets request is running is created during the request creation and queuing. This is a problem because this could overwrite the session ID of any ongoing operation on system.topology#session This change moves the creation of the session ID for truncate from the request creation to the request handling. Fixes #22613 Closes scylladb/scylladb#22615	2025-02-04 22:11:24 +01:00
Ferenc Szili	9fa254e9a8	truncate: trigger truncate logic from transition state instead of global request handler Before this change, the logic of truncate for tablets was triggered from topology_coordinator::handle_global_request(). This was done without using a topology transition state which remained empty throughout the truncate handler's execution. This change moves the truncate logic to a new method topology_coordinator::handle_truncate_table(). This method is now called as a handler of the truncate_table topology transition state instead of a handler of the truncate_table global topology request.	2025-01-22 11:08:26 +01:00
Ferenc Szili	29ead7014e	truncate: add truncate_table transition state Truncate table for tablets is implemented as a global topology operation. However, it does not have a transition state associated with it, and performs the truncate logic in handle_global_request() while topology::tstate remains empty. This creates problems because topology::is_busy() uses transition_state to determine if the topology state machine is busy, and will return false even though a truncate operation is ongoing. This change adds a new transition state: truncate_table	2025-01-22 10:44:36 +01:00
Tomasz Grabiec	c7f78edc78	Merge 'repair: Wire repair_time in system.tablets for tombstone gc' from Asias He The repair_time in system.tablets will be updated when repair runs successfully. We can now use it to update the repair time for tombstone gc, i.e, when the system.tablets.repair_time is propagated, call gc_state.update_repair_time() on the node that is the owner of the tablet. Since `b3b3e880d3` ("repair: Reduce hints and batchlog flush"), the repair time that could be used for tombstone gc might be smaller than when the repair is started, so the actual repair time for tombstone gc is returned by the repair rpc call from the repair master node. Fixes #17507 New feature. No backport is needed. Closes scylladb/scylladb#21896 * github.com:scylladb/scylladb: repair: Stop using rpc to update repair time for repairs scheduled by scheduler repair: Wire repair_time in system.tablets for tombstone gc test: Disable flush_cache_time for two tablet repair tests test: Introduce guarantee_repair_time_next_second helper repair: Return repair time for repair_service::repair_tablet service: Add tablet_operation.hh	2025-01-20 18:08:49 +01:00
Botond Dénes	47989b1503	Merge 'tasks: add tablet resize virtual task' from Aleksandra Martyniuk In this change, tablet_virtual_task starts supporting tablet resize (i.e. split and merge). Users can see running resize tasks - finished tasks are not presented with the task manager API. A new task state "suspended" is added. If a resize was revoked, it will appear to users as suspended. We assume that the resize was revoked when the tablet number didn't change. Fixes: #21366. Fixes: #21367. No backport, new feature Closes scylladb/scylladb#21891 * github.com:scylladb/scylladb: test: boost: check resize_task_info in tablet_test.cc test: add tests to check revoked resize virtual tasks test: add tests to check the list of resize virtual tasks test: add tests to check split and merge virtual tasks status test: test_tablet_tasks: generalize functions replica: service: add split virtual task's children replica: service: pass parent info down to storage_group::split tasks: children of virtual tasks aren't internal by default tasks: initialize shard in task_info ctor service: extend tablet_virtual_task::abort service: return status_helper struct from tablet_virtual_task::get_status_helper service: extend tablet_virtual_task::wait tasks: add suspended task state service: extend tablet_virtual_task::get_status service: extend tablet_virtual_task::contains service: extend tablet_virtual_task::get_stats service: add service::task_manager_module::get_nodes tasks: add task_manager::get_nodes tasks: drop noexcept from module::get_nodes replica: service: add resize_task_info static column to system.tablets locator: extend tablet_task_info to cover resize tasks	2025-01-17 14:24:07 +02:00
Asias He	53e6025aa6	repair: Wire repair_time in system.tablets for tombstone gc The repair_time in system.tablets will be updated when repair runs successfully. We can now use it to update the repair time for tombstone gc, i.e, when the system.tablets.repair_time is propagated, call gc_state.update_repair_time() on the node that is the owner of the tablet. Since `b3b3e880d3` ("repair: Reduce hints and batchlog flush"), the repair time that could be used for tombstone gc might be smaller than when the repair is started, so the actual repair time for tombstone gc is returned by the repair rpc call from the repair master node. Fixes #17507	2025-01-17 16:12:05 +08:00
Gleb Natapov	f5fa4d9742	topology coordinator: drop get_endpoint_for_host_id_if_known usage Now that we have gossiper::get_endpoint_state_ptr that works on host ids there is no need to translate id to ip at all.	2025-01-15 16:30:29 +02:00
Kefu Chai	7215d4bfe9	utils: do not include unused headers these unused includes were identified by clang-include-cleaner. after auditing these source files, all of the reports have been confirmed. please note, because quite a few source files relied on `utils/to_string.hh` to pull in the specialization of `fmt::formatter<std::optional<T>>`, after removing `#include <fmt/std.h>` from `utils/to_string.hh`, we have to include `fmt/std.h` directly. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2025-01-14 07:56:39 -05:00
Aleksandra Martyniuk	18b829add8	replica: service: add resize_task_info static column to system.tablets Add resize_task_info static column to system.tablets. Set or delete resize_task_info value when the resize_decision is changed. Reflect the column content in tablet_map.	2025-01-10 10:03:07 +01:00
Emil Maskovsky	115005d863	raft: refactor the voters api to allow enabling voters The raft voters api implementation only allowed to make a node to be a non-voter, but for the "limited voters" feature we need to also have the option to make the node a voter (from within the topology coordinator). Modifying the api to allow both adding and removing voters. This in particular tries to simplify the API by not having to add another set of new functions to make a voter, but having a single setter that allows to modify the node configuration to either become a voter or a non-voter. Fixes: scylladb/scylladb#21914 Refs: scylladb/scylladb#18793 Closes scylladb/scylladb#21899	2025-01-07 15:25:50 +01:00
Asias He	935dcd69fa	repair: Remove repair_task_info only when repair is finished In case of error, repair will be moved into the end_repair stage. We should not remove repair_task_info in this case because the repair task requested by the user is not finished yet. To fix, we should remove repair_task_info at the end of repair stage. Tests are added to ensure failed repair is not reported as finished. Closes scylladb/scylladb#21973	2025-01-07 16:19:40 +02:00
Botond Dénes	69150f0680	Merge 'Fix edge case issues related to tablet draining ' from Tomasz Grabiec Main problem: If we're draining the last node in a DC, we won't have a chance to evaluate candidates and notice that constraints cannot be satisfied (N < RF). Draining will succeed and node will be removed with replicas still present on that node. This will cause later draining in the same DC to fail when we will have 2 replicas which need relocation for a given tablet. The expected behavior is for draining to fail, because we cannot keep the RF in the DC. This is consistent, for example, with what happens when removing a node in a 2-node cluster with RF=2. Fixes #21826 Secondary problem: We allowed tablet_draining transition to be exited with undrained nodes, leaving replicas on nodes in the "left" state. Third problem: We removed DOWN nodes from the candidate node set, even when draining. This is not safe because it may lead to overload. This also makes the "main problem" more likely by extending it to the scenario when the DC is DOWN. The overload part is not a problem in practice currently, since migrations will block on global topology barrier if there are DOWN nodes. Closes scylladb/scylladb#21928 * github.com:scylladb/scylladb: tablets: load_balancer: Fail when draining with no candidate nodes tablets: load_balancer: Ignore skip_list when draining tablets: topology_coordinator: Keep tablet_draining transition if nodes are not drained	2025-01-07 13:04:00 +02:00
Avi Kivity	f3eade2f62	treewide: relicense to ScyllaDB-Source-Available-1.0 Drop the AGPL license in favor of a source-available license. See the blog post [1] for details. [1] https://www.scylladb.com/2024/12/18/why-were-moving-to-a-source-available-license/	2024-12-18 17:45:13 +02:00
Aleksandra Martyniuk	d0cda8ebef	replica: check enabled features in tablet_map_to_mutation Before adding a value to a new column in tablet_map_to_mutation check if the column is supported by the whole cluster. Closes scylladb/scylladb#21941	2024-12-17 07:02:11 +02:00
Tomasz Grabiec	2de3c079b2	tablets: topology_coordinator: Keep tablet_draining transition if nodes are not drained Empty plan with nodes to drain meant that we can exit tablet_draining transition and move to the next stage of decommission/removenode. In case tablet scheduler creates an empty plan for some reason but there are still undrained tablets, that could put topology in an invalid state. For example, this can currently happen if there are no non-draining nodes in a DC. This patch adds a safety net in the topology coordinator which prevents moving forward with undrained tablets.	2024-12-16 16:54:59 +01:00
Avi Kivity	fe9fcdfe30	task_manager.hh: replace boost ranges with std ranges Standardize on one range library to reduce dependency load. Unfortunately, std::views::concat (the replacement for boost::join), is C++26 only. We use two separate inserts to the result vector to compensate, and rationalize it by saying that boost::join() is likely slow due to the need for type-erasure. Closes scylladb/scylladb#21834	2024-12-16 13:08:02 +02:00
Botond Dénes	5880a1b90b	Merge 'tasks: add tablet migration virtual task' from Aleksandra Martyniuk In this change, tablet_virtual_task starts supporting tablet migration, in addition to tablet repair. Both tablet operations reuse the same virtual_task because their task data is retrieved similarly. However, it changes nothing from the task manager API users' perspective. They can list running migrations or check their statuses all the same as if migration had its own virtual_task. Users can see running migration tasks - finished tasks are not presented with the task manager API. However, the result of the migration (whether it succeeded or failed) would be presented to users, if they use wait API. If a migration was reverted, it will appear to users as failed. We assume that the migration was reverted, when its destination does not contain a tablet replica. Fixes: https://github.com/scylladb/scylladb/issues/21365. No backport, new feature Closes scylladb/scylladb#21729 * github.com:scylladb/scylladb: test: boost: check migration_task_info in tablet_test.cc replica: add repair related fields to tablet_map_to_mutation test: add tests to check the failed migration virtual tasks test: add tests to check the list of migration virtual tasks test: add tests to check migration virtual tasks status test: topology_tasks: generalize repair task functions service: extend tablet_virtual_task::abort service: extend tablet_virtual_task::wait service: extend tablet_virtual_task::get_status_helper service: extend tablet_virtual_task::contains service: extend tablet_virtual_task::get_stats service: tasks: make get_table_id a method of virtual_task_hint service: tasks: extend virtual_task_hint replica: service: add migration_task_info column to system.tablets locator: extend tablet_task_info to cover migration tasks locator: rename tablet_task_info methods	2024-12-13 10:54:03 +02:00
muthu90tech	e49381119d	locator: topology: use node& instead of node* This change goes thru locator:topology to use node& instead of node* where nullptr is not possible. There are places where the node object is used in unordered_set, in those cases the node is wrapped in std::reference_wrapper. Fixes scylladb/scylladb#20357 Closes scylladb/scylladb#21863	2024-12-12 13:22:55 +01:00
Aleksandra Martyniuk	bc17535427	test: add tests to check the failed migration virtual tasks	2024-12-11 15:17:16 +01:00
Aleksandra Martyniuk	b473efbefd	test: add tests to check migration virtual tasks status	2024-12-11 15:17:15 +01:00
Aleksandra Martyniuk	9fad3a621a	replica: service: add migration_task_info column to system.tablets Add migration_task_info column to system.tablets. Set migration_task_info value on migration request if the feature is enabled in the cluster. Reflect the column content in tablet_metadata.	2024-12-11 12:07:36 +01:00
Aleksandra Martyniuk	dee6404aa4	locator: rename tablet_task_info methods	2024-12-11 12:07:36 +01:00
Tomasz Grabiec	8e60a0b831	Merge 'truncate: make TRUNCATE TABLE safe with tablets' from Ferenc Szili Currently truncating a table works by issuing an RPC to all the nodes which call `database::truncate_table_on_all_shards()`, which makes sure that older writes are dropped. It works with tablets, but is not safe. A concurrent replication process may bring back old data. This change makes TRUNCATE TABLE a topology operation, so that it excludes with other processes in the system which could interfere with it. More specifically, it makes TRUNCATE a global topology request. Backporting is not needed. Fixes #16411 Closes scylladb/scylladb#19789 * github.com:scylladb/scylladb: docs: docs: topology-over-raft: Document truncate_table request storage_proxy: fix indentation and remove empty catch/rethrow test: add tests for truncate with tablets storage_proxy: use new TRUNCATE for tablets truncate: make TRUNCATE a global topology operation storage_service: move logic of wait_for_topology_request_completion() RPC: add truncate_with_tablets RPC with frozen_topology_guard feature_service: added cluster feature for system.topology schema change system.topology_requests: change schema storage_proxy: propagate group0 client and TSM dependency	2024-12-10 17:50:50 +01:00
Ferenc Szili	e65a235fd5	test: add tests for truncate with tablets This patch adds the unit tests for truncate with tablets. test_truncate_while_migration() triggers a tablet migration, then runs a TRUNCATE TABLE for the table containing the tablet being migrated. test_truncate_with_concurrent_drop() starts a truncate, then attempts to drop the table while it is being truncated. test_truncate_while_node_restart() validates the case where a replica node is restarted while truncate is running. test_truncate_with_coordinator_crash() validates if truncate is correctly completed in cases where the topology coordinator has crashed or restarted after the truncate session is cleared, but before the truncate request is finalized.	2024-12-09 16:38:50 +01:00
Ferenc Szili	93cfeb9160	truncate: make TRUNCATE a global topology operation This commit adds the code needed to create a TRUNCATE global topology request. It also adds the handler for this request to the topology coordinator. The execution of the truncate operation is not canceled on a timeout, but the query coordinator side will return a timeout error.	2024-12-09 16:38:37 +01:00
Tomasz Grabiec	7e2875d648	Merge 'Add tablet merge support' from Raphael Raph Carvalho The goal of merge is to reduce the tablet count for a shrinking table. Similar to how split increases the count while the table is growing. The load balancer decision to merge is implemented today (came with infrastructure introduced for split), but it wasn't handled until now. Initial tablet count is respected while the table is in "growing mode". For example, the table leaves it if there was a need to split above the initial tablet count. After the table leaves the mode, the average size can be trusted to determine that the table is shrinking. Merge decision is emitted if the average tablet size is 50% of the target. Hysteresis is applied to avoid oscillations between split and merges. Similar to split, the decision to merge is recorded in tablet map's resize_type field with the string "merge". This is important in case of coordinator failover, so new coordinator continues from where the old left off. Unlike split, the preparation phase during merge is not done by the replica (with split compactions), but rather by the coordinator by co-locating sibling tablets in the same node's shard. We can define sibling tablets as tablets that have contiguous range and will become one after merge. The concept is based on the power-of-two constraint and token contiguity. For example, in a table with 4 tablets, tablets of ids 0 and 1 are siblings, 2 and 3 are also siblings. The algorithm for co-locating sibling tablets is very simple. The balancer is responsible for it, and it will emit migrations so that "odd" tablet will follow the "even" one. For example, tablet 1 will be migrated to where tablet 0 lives. Co-location is low in priority, it's not the end of the world to delay merge, but it's not ideal to delay e.g. decommission or even regular load balancing as that can translate into temporary unbalancing, impacting the user activities. 
So co-location migrations will happen when there is no more important work to do. While regular balancing is higher in priority, it will not undo the co-location work done so far. It does that by treating co-located tablets as if they were already merged. The load inversion convergence check was adjusted so the balancer understands when two tablets are being migrated instead of one, to avoid oscillations. When balancer completes co-location work for a table undergoing merge, it will put the id of the table into the resize_plan, which is about communicating with the topology coordinator that a table is ready for it. With all sibling tablets co-located, the coordinator can resize the tablet map (reduce it by a factor of 2) and record the new map into group0. All the replicas will react to it (on token metadata update) by merging the storage (memtable(s) + sstables) of sibling tablets into one. Fixes #18181. system test details: test: https://github.com/pehala/scylla-cluster-tests/blob/tablets_split_merge/tablets_split_merge_test.py yaml file: https://github.com/pehala/scylla-cluster-tests/blob/tablets_split_merge/test-cases/features/tablets/tablets-split-merge-test.yaml instance type: i3.8xlarge nodes: 3 target tablet size: 0.5G (scaled down by 10, to make it easier to trigger splits and merges) description: multiple cycles of growing and shrinking the data set in order to trigger splits and merges. data_set_size: ~100G initial_tablets: 64, so it grew to 128 tablets on split, and back to 64 on merge. 
latency of reads and writes that happened in parallel to split and merge: ``` $ for i in scylla-bench; do cat $i | grep "Mode\|99th:\|99\.9th:"; done Mode: write 99.9th: 3.145727ms 99th: 1.998847ms 99.9th: 3.145727ms 99th: 2.031615ms Mode: read 99.9th: 3.145727ms 99th: 2.031615ms 99.9th: 3.145727ms 99th: 2.031615ms Mode: write 99.9th: 3.047423ms 99th: 1.933311ms 99.9th: 3.047423ms 99th: 1.933311ms Mode: read 99.9th: 3.145727ms 99th: 1.900543ms 99.9th: 3.145727ms 99th: 1.900543ms Mode: write 99.9th: 5.079039ms 99th: 3.604479ms 99.9th: 35.389439ms 99th: 25.624575ms Mode: write 99.9th: 3.047423ms 99th: 1.998847ms 99.9th: 3.047423ms 99th: 1.998847ms Mode: read 99.9th: 3.080191ms 99th: 2.031615ms 99.9th: 3.112959ms 99th: 2.031615ms ``` Closes scylladb/scylladb#20572 github.com:scylladb/scylladb: docs: Document tablet merging tests/boost: Add test to verify correctness of balancer decisions during merge tests/topology_experimental_raft: Add tablet merge test service: Handle exception when retrying split service: Co-locate sibling tablets for a table undergoing merge gms: Add cluster feature for tablet merge service: Make merge of resize plan commutative replica: Implement merging of compaction groups on merge completion replica: Handle tablet merge completion service: Implement tablet map resize for merge locator: Introduce merge_tablet_info() service: Rename topology::transition_state::tablet_split_finalization service: Respect initial_tablet_count if table is in growing mode service: Wire migration_tablet_set into the load balancer locator: Add tablet_map::sibling_tablets() service: Introduce sorted_replicas_for_tablet_load() locator/tablets: Extend tablet_replica equality comparator to three-way service: Introduce alias to per-table candidate map type service: Add replication constraint check variant for migration_tablet_set service: Add convergence check variant for migration_tablet_set service: Add migration helpers for migration_tablet_set 
service/tablet_allocator: Introduce migration_tablet_set service: Introduce migration_plan::add(migrations_vector) locator/tablets: Introduce tablet_map::for_each_sibling_tablets() locator/tablets: Introduce tablet_map::needs_merge() locator/tablets: Introduce resize_decision::initial_decision() locator/tablets: Fix return type of three-way comparison operators service: Extract update of node load on migrations service: Extract converge check for intra-node migration service: Extract erase of tablet replicas from candidate list scripts/tablet-mon: Allow visualization of tablet id	2024-12-06 18:06:20 +01:00
Abhinav	6c90a25014	Fix gossiper orphan node floating problem by adding a remover fiber In the current scenario, if during startup, a node crashes after initiating gossip and before joining group0, then it keeps floating in the gossiper forever because the raft based gossiper purging logic is only effective once node joins group0. This orphan node hinders the successor node from same ip to join cluster since it collides with it during gossiper shadow round. This commit intends to fix this issue by adding a background thread which periodically checks for such orphan entries in gossiper and removes them. A test is also added to verify this logic. This test fails without this background thread enabled, hence verifying the behavior. Fixes: scylladb/scylladb#20082 Closes scylladb/scylladb#21600	2024-12-06 10:45:07 +01:00
Piotr Dulikowski	def51e252d	Merge 'service/topology_coordinator: migrate view builder only if all nodes are up' from Michał Jadwiszczak The migration process is doing read with consistency level ALL, requiring all nodes to be alive. Fixes scylladb/scylladb#20754 The PR should be backported to 6.2, this version has view builder on group0. Closes scylladb/scylladb#21708 * github.com:scylladb/scylladb: test/topology_custom/test_view_build_status: add reproducer service/topology_coordinator: migrate view builder only if all nodes are up	2024-12-06 09:07:07 +01:00
Ferenc Szili	36d35d2297	RPC: add truncate_with_tablets RPC with frozen_topology_guard This change introduces a new truncate_with_tablets RPC with a parameter of type service::frozen_topology_guard. This is materialized on replica nodes into a topology_guard which guarantees that truncate is performed under a global session, which, in turn, makes sure that we don't execute truncate as a result of stale RPCs. Also, this RPC does not have a timeout. Timeout will be handled on the coordinator side, and the truncate operation will not be allowed to time out.	2024-12-04 11:30:07 +01:00
Raphael S. Carvalho	e00798f1b1	service: Rename topology::transition_state::tablet_split_finalization This transition state will be reused by merge completion, so let's rename it to tablet_resize_finalization. The completion handling path will also be reused, so let's rename functions involved similarly. The old name "tablet split finalization" is deprecated but still recognized and points to the correct transition. Otherwise, the reverse lookup would fail when populating topology system table which last state was split finalization. NOTE: I thought of adding a new tablet_merge_finalization, but it would complicate things since more than one table could be ready for either split or merge, so you need a generic transition state for handling resize completion. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2024-12-03 20:45:20 -03:00
Gleb Natapov	96309224ff	raft_address_map: remove raft address map It is no longer used.	2024-12-02 10:31:14 +02:00
Gleb Natapov	cbb6148a36	topology coordinator: drop expiring entries in gossiper address map on error injections since raft one is no longer used	2024-12-02 10:31:13 +02:00
Gleb Natapov	2c17fa6370	topology coordinator: drop raft_address_map dependency raft_address_map is not used by the coordinator code any longer.	2024-12-02 10:31:11 +02:00
Gleb Natapov	aba4ae0ca1	topology coordinator: rename wait_for_ip to wait_for_gossiper and drop raft address map usage What wait_for_ip actually does is wait for a node to appear in the gossiper since this is when it is added to the raft address map. Drop the usage of the address map and check the gossiper directly.	2024-12-02 10:31:11 +02:00
Gleb Natapov	414ec6d5bb	topology coordinator: get rid of host id to ip translations Now we have enough functionality in the gossiper and messaging service to get rid of ip2id function in the topology coordinator. We can use host ids directly.	2024-12-02 10:31:11 +02:00
Gleb Natapov	be5caec54e	service: make address_map raft independent We want to start using address map class outside of raft, so let's make it work on host_id instead of raft::servers_id and move it outside of raft.	2024-12-01 12:12:29 +02:00
Kefu Chai	f436edfa22	mutation: remove unused "#include"s these unused includes are identified by clang-include-cleaner. after auditing the source files, all of the reports have been confirmed. please note, because `mutation/mutation.hh` does not include `seastar/coroutine/maybe_yield.hh` anymore, and quite a few source files were relying on this header to bring in the declaration of `maybe_yield()`, we have to include this header in the places where this symbol is used. the same applies to `seastar/core/when_all.hh`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-11-29 14:01:44 +08:00
Michał Jadwiszczak	66071d8097	service/topology_coordinator: migrate view builder only if all nodes are up The migration process is doing read with consistency level ALL, requiring all nodes to be alive. This patch also adds the topology state machine notification when a node is up.	2024-11-28 12:11:08 +01:00
Kefu Chai	a5ee0c896b	treewide: migrate from boost::adaptors::filtered to std::views::filter Modernize the codebase by replacing Boost range adaptors with C++23 standard library views, reducing external dependencies and leveraging modern C++ language features. Key Changes: - Replace `boost::adaptors::filtered` with `std::views::filter` - Remove `#include <boost/range/adaptor/filtered.hpp>` - Utilize standard library range views Motivation: - Reduce project's external dependency footprint - Leverage standard library's range and view capabilities - Improve long-term code maintainability - Align with modern C++ best practices Implementation Challenges and Considerations: 1. Range Conversion and Move Semantics - `std::ranges::to` adaptor requires rvalue references - Necessitated updates to variable and parameter constness - Example: `cql3/restrictions/statement_restrictions.cc` modified to remove `const` from `common` to enable efficient range conversion 2. Range Iteration and Mutation - Range views may mutate internal state during iteration - Cannot pass ranges by const reference in some scenarios - Solution: Pass ranges by rvalue reference to explicitly indicate state invalidation Limitations: - One instance of `boost::adaptors::filtered` temporarily preserved due to lack of a C++23 alternative for `boost::join()` - A comprehensive replacement will be addressed in a follow-up change This change is part of our ongoing effort to modernize the codebase, reducing external dependencies and adopting modern C++ practices. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21648	2024-11-26 14:26:50 +02:00
Asias He	b71a563030	repair: Add core tablet repair scheduler support This adds a new tablet migration kind: repair. It allows tablet repair scheduler to use this migration kind to schedule repair jobs. The current repair scheduler implementation does the following: - A tablet is picked to be repaired when the time since last repair is bigger than a threshold (auto repair mode) or it is requested by user (manual repair mode) - The tablet repair can be scheduled along with tablet migration and rebuild. It runs in the tablet_migration track. - Repair jobs are scheduled in a smart way so that at any point in time, there are no more than configured jobs per shard, which is similar to scylla manager's control. In this patch, both the manual repair and the auto repair are not enabled yet.	2024-11-20 09:42:41 +08:00
Pavel Emelyanov	39cb93be3c	treewide,error_injection: Use inject(wait_for_message) and fix tests This is continuation of previous patch, this time also update tests that wait for specific message in logs (to make sure injection handler was called and paused the code execution). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-30 16:53:33 +03:00
Kamil Braun	101c1d50f0	Merge 'fix nodetool status to show zero-token nodes' from Abhinav Kumar Jha In the current scenario, the nodetool status doesn’t display information regarding zero token nodes. For example, if 5 nodes are spun by the administrator, out of which, 2 nodes are zero token nodes, then nodetool status only shows information regarding the 3 non-zero token nodes. This commit intends to fix this issue by leveraging the “/storage_service/host_id ” API and adding appropriate logic in scylla-nodetool.cc to support zero token nodes. A test is also added in nodetool/test_status.py to verify this logic. This test fails without this commit’s zero token node support logic, hence verifying the behavior. This PR fixes a bug. Hence we need to backport it. Backporting needs to be done only to 6.2 version, since earlier versions don't support zero token nodes. Fixes: scylladb/scylladb#19849 Fixes: scylladb/scylladb#17857 Closes scylladb/scylladb#20909 * github.com:scylladb/scylladb: fix nodetool status to show zero-token nodes test: move `wait_for_first_completed` to pylib/util.py token_metadata: rename endpoint_to_host_id_map getter and add support for joining nodes	2024-10-28 12:19:36 +01:00
Abhinav	c00d40b239	fix nodetool status to show zero-token nodes In the current scenario, the nodetool status doesn’t display information regarding zero token nodes. For example, if 5 nodes are spun by the administrator, out of which, 2 nodes are zero token nodes, then nodetool status only shows information regarding the 3 non-zero token nodes. This commit intends to fix this issue by leveraging the “/storage_service/host_id ” API and adding appropriate logic in scylla-nodetool.cc to support zero token nodes. Robust topology tests are added, which spin up scylla nodes and confirm nodetool status output for various cases, providing good coverage. A test is also added in nodetool/test_status.py to verify this logic. These tests fail without this commit’s zero token node support logic, hence verifying the behavior. The test `test_status_keyspace_joining_node` has been removed. This test is based on case where host_id=None, which is impossible. Since we now use host_id_map for node discovery in nodetool, the nodes with "host_id=None" go undetected. Since this case is anyway impossible, we can get rid of this. This PR fixes a bug. Hence we need to backport it. Backporting needs to be done only to 6.2 version, since earlier versions don't support zero token nodes. Fixes: scylladb/scylladb#19849	2024-10-25 13:28:09 +05:30
Kamil Braun	f5c60e538d	Merge 'cql/tablets: fix retrying ALTER tablets KEYSPACE' from Piotr Smaron ALTER tablets-enabled KEYSPACES (KS) may fail due to `group0_concurrent_modification`, in which case it's repeated by a `for` loop surrounding the code. But because raft's `add_entry` consumes the raft's guard (by `std::move`'ing the guard object), retries of ALTER KS will use a moved-from guard object, which is UB, potentially a crash. The fix is to remove the before mentioned `for` loop altogether and rethrow the exception, as the `rf_change` event will be repeated by the topology state machine if it receives the concurrent modification exception, because the event will remain present in the global requests queue, hence it's going to be executed as the very next event. Note: refactor is implemented in the follow-up commit. Fixes: scylladb/scylladb#21102 Should be backported to every 6.x branch, as it may lead to a crash. Closes scylladb/scylladb#21121 * github.com:scylladb/scylladb: test: add UT to test retrying ALTER tablets KEYSPACE cql/tablets: fix indentation in `rf_change` event handler cql/tablets: fix retrying ALTER tablets KEYSPACE	2024-10-23 10:01:21 +02:00
Piotr Smaron	522bede8ec	test: add UT to test retrying ALTER tablets KEYSPACE The newly added testcase is based on the already existing `test_alter_dropped_tablets_keyspace`. A new error injection is created, which stops the ALTER execution just before the changes are submitted to RAFT. In the meantime, a new schema change is performed using the 2nd node in the cluster, thus causing the 1st node to retry the ALTER statement.	2024-10-22 18:22:01 +02:00
Piotr Smaron	3f4c8a30e3	cql/tablets: fix indentation in `rf_change` event handler Just moved the code that previously was under a `for` loop by 1 tab, i.e. 4 spaces, to the left.	2024-10-22 18:22:01 +02:00
Piotr Smaron	de511f56ac	cql/tablets: fix retrying ALTER tablets KEYSPACE ALTER tablets-enabled KEYSPACES (KS) may fail due to `group0_concurrent_modification`, in which case it's repeated by a `for` loop surrounding the code. But because raft's `add_entry` consumes the raft's guard (by `std::move`'ing the guard object), retries of ALTER KS will use a moved-from guard object, which is UB, potentially a crash. The fix is to remove the before mentioned `for` loop altogether and rethrow the exception, as the `rf_change` event will be repeated by the topology state machine if it receives the concurrent modification exception, because the event will remain present in the global requests queue, hence it's going to be executed as the very next event. `topology_coordinator::handle_topology_coordinator_error` handling the case of `group0_concurrent_modification` has been extended with logging in order not to write catch-log-throw boilerplate. Note: refactor is implemented in the follow-up commit. Fixes: scylladb/scylladb#21102	2024-10-22 18:22:00 +02:00
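The hazard described in de511f56ac can be illustrated with a small, self-contained sketch. Everything here is a hypothetical stand-in rather than Scylla code: `guard` models a move-only group0 guard, `add_entry` models a call that consumes the guard by value, and `run_until_done` models the state machine re-running a request that stayed in the queue. In this sketch the moved-from state is made detectable on purpose (a null `std::unique_ptr`); in the real code a moved-from guard is simply not safe to use.

```cpp
#include <memory>
#include <stdexcept>
#include <utility>

// Hypothetical stand-in for a group0 guard: a move-only token.
struct guard {
    std::unique_ptr<int> token = std::make_unique<int>(1);
    bool valid() const { return token != nullptr; }
};

// Hypothetical stand-in for group0_concurrent_modification.
struct concurrent_modification : std::runtime_error {
    concurrent_modification() : std::runtime_error("concurrent modification") {}
};

// Hypothetical stand-in for add_entry: takes the guard by value, so the
// caller's guard is consumed. Fails `failures_left` times before succeeding.
void add_entry(guard g, int& failures_left) {
    if (!g.valid()) {
        throw std::logic_error("moved-from guard used");
    }
    if (failures_left > 0) {
        --failures_left;
        throw concurrent_modification();
    }
}

// Problematic shape: the retry loop reuses `g` after it was moved from.
bool retry_in_place(guard g, int failures) {
    for (;;) {
        try {
            add_entry(std::move(g), failures);
            return true;
        } catch (const concurrent_modification&) {
            // Falls through to the next iteration with a moved-from `g`.
        }
    }
}

// Fixed shape: one attempt per guard; the exception propagates to the caller.
void handle_request(guard g, int& failures_left) {
    add_entry(std::move(g), failures_left);
}

// Hypothetical outer loop: the request stays queued, so each retry starts
// with a fresh guard. Returns the number of attempts that were needed.
int run_until_done(int failures) {
    int attempts = 0;
    for (;;) {
        ++attempts;
        try {
            handle_request(guard{}, failures);
            return attempts;
        } catch (const concurrent_modification&) {
            // Retry with a new guard on the next iteration.
        }
    }
}
```

With these stand-ins, `retry_in_place` trips over the moved-from guard on its second iteration, while `run_until_done` retries cleanly because no guard outlives a single attempt.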
Kefu Chai	5cd619a60c	treewide: s/boost::adaptors::map_keys/std::views::keys/ now that we are allowed to use C++23. we now have the luxury of using `std::views::keys`. in this change, we: - replace `boost::adaptors::map_keys` with `std::views::keys` - update affected code to work with `std::views::keys` to reduce the dependency to boost for better maintainability, and leverage standard library features for better long-term support. this change is part of our ongoing effort to modernize our codebase and reduce external dependencies where possible. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21198	2024-10-21 12:47:52 +03:00
Piotr Smaron	e0c1a51642	cql/tablets: handle MVs in ALTER tablets KEYSPACE ALTERing tablets-enabled KEYSPACES (KS) didn't account for materialized views (MV), and only produced tablets mutations changing tables. With this patch we're producing tablets mutations for both tables and MVs, hence when e.g. we change the replication factor (RF) of a KS, both the tables' RFs and MVs' RFs are updated along with tablets replicas. The `test_tablet_rf_change` testcase has been extended to also verify that MVs' tablets replicas are updated when RF changes. Fixes: #20240 Closes scylladb/scylladb#21007	2024-10-09 10:51:18 +02:00

177 Commits