scylla

Author	SHA1	Message	Date
Tomasz Grabiec	f1bda8d4c1	tablets: load_balancer: Scale down tablet count to respect per-shard tablet count goal The limit is enforced by controlling average per-shard tablet replica count in a given DC, which is controlled by per-table tablet count. This is effective in respecting the limit on individual shards as long as tablet replicas are distributed evenly between shards. There is no attempt to move tablets around in order to enforce limits on individual shards in case of imbalance between shards. If the average per-shard tablet count exceeds the limit, all tables which contribute to it (have replicas in the DC) are scaled down by the same factor. Due to rounding up to the nearest power of 2, we may overshoot the per-shard goal by at most a factor of 2. If different DCs want different scale factors of a given table, the lowest scale factor is chosen for a given table. The limit is configurable. It's a global per-cluster config which controls how many tablet replicas per shard in total we consider to be still ok. It controls tablet allocator behavior, when choosing initial tablet count. Even though it's a per-node config, we don't support different limits per node. All nodes must have the same value of that config. It's similar in that regard to other scheduler config items like tablets_initial_scale_factor and target_tablet_size_in_bytes.	2025-02-19 16:29:07 +01:00
Tomasz Grabiec	94b5165ac7	tablets: Use scheduler's make_sizing_plan() to decide about tablet count of a new table This makes decisions made by the scheduler consistent with decisions made on table creation, with regard to tablet count. We want to avoid over-allocation of tablets when table is created, which would then be reduced by the scheduler's scaling logic. Not just to avoid wasteful migrations post table creation, but to respect the per-shard goal. To respect the per-shard goal, the algorithm will no longer be as simple as looking at hints, and we want to share the algorithm between the scheduler and initial tablet allocator. So invoke the scheduler to get the tablet count when table is created.	2025-02-19 14:40:07 +01:00
Tomasz Grabiec	dd68c1e526	tablets: load_balancer: Determine desired count from size separately from count from options For debugging purposes. Later we will want to know which rule determined the count.	2025-02-19 14:40:07 +01:00
Tomasz Grabiec	e4c5e2ab55	tablets: load_balancer: Determine resize decision from target tablet count The flow is simpler this way, since the decision cannot now be mismatched with target tablet count.	2025-02-19 14:40:07 +01:00
Tomasz Grabiec	35192e2d6f	tablets: load_balancer: Allow splits even if table stats not available This is in preparation for using the sizing plan during table creation where we never have size stats, and hints are the only determining factor for target tablet count.	2025-02-19 14:40:07 +01:00
Tomasz Grabiec	d3ffea77e6	tablets: load_balancer: Extract make_sizing_plan() Resize plan making will now happen in two stages: 1) Determine desired tablet counts per table (sizing plan) 2) Schedule resize decisions We need an intermediate step in the resize plan making, which gives us the planned tablet counts, so that we can plug this part of the algorithm into initial tablet allocation on table construction. We want decisions made by the scheduler to be consistent with decisions made on table creation. We want to avoid over-allocation of tablets when a table is created, which would then be reduced by the scheduler. Not just to avoid wasteful migrations post table creation, but to respect the per-shard goal. To respect the per-shard goal, the algorithm will no longer be as simple as looking at hints, and we want to share the algorithm between the scheduler and initial tablet allocator. Also, this sizing plan will be later plugged into a virtual table for observability.	2025-02-19 14:40:06 +01:00
Tomasz Grabiec	33db0d4fea	tablets: Add formatter for resize_decision::way_type	2025-02-19 14:39:40 +01:00
Tomasz Grabiec	b7e5919fdd	tablets: load_balancer: Simplify resize_urgency_cmp() Logic is preserved since target tablet size is constant for all tables. Dropping d.target_max_tablet_size() will allow us to move it to the load_balancer scope.	2025-02-19 14:39:40 +01:00
Tomasz Grabiec	997007a2df	tablets: load_balancer: Keep config items as instance members It fits preexisting pattern for other config items, and makes the code less cluttered because we don't have to carry config items across calls.	2025-02-19 14:39:39 +01:00
Tomasz Grabiec	ce959818a3	locator: network_topology_strategy: Simplify calculate_initial_tablets_from_topology()	2025-02-19 14:38:50 +01:00
Tomasz Grabiec	f043c83ba5	tablets: Change the meaning of initial_scale to mean min-avg-tablets-per-shard Currently the scale is applied post rounding up of tablet count so that tablet count per shard is at least 1. In order to be able to use the scale to increase tablet count per shard, we need to apply it prior to division by RF, otherwise we will overshoot per-shard tablet replica count. Example: 4 nodes, -c1, rf=3, initial_tablets_scale=10 Before: initial_tablet_count=20, tablets-per-shard=15 After: initial_tablet_count=14, tablets-per-shard=10.5	2025-02-19 14:38:50 +01:00
Tomasz Grabiec	2463e524ed	tablets: Set default initial tablet count scale to 10 This will result in new tables having at least 10 tablet replicas per shard by default. We want this to reduce tablet load imbalance due to differences in tablet count per shard, where some shards have 1 tablet and some shards have 2 tablets. With higher tablet count per shard, this difference-by-one is less relevant. Fixes #21967 In some tests, we explicitly set the initial scale to 1 as some of the existing tests assume 1 compaction group per shard. test.py uses a lower default. Having many tablets per shard slows down certain topology operations like decommission/replace/removenode, where the running time is proportional to tablet count, not data size, because constant cost (latency) of migration dominates. This latency is due to group0 operations and barriers. This is especially pronounced in debug mode. Scheduler allows at most 2 migrations per shard, so this latency becomes a determining factor for decommission speed. To avoid this problem in tests, we use lower default for tablet count per shard, 2 in debug/dev mode and 4 in release mode. Alternatively, we could compensate by allowing more concurrency when migrating small tablets, but there's no infrastructure for that yet. I observed that with 10 tablets per shard, debug-mode topology_custom.mv/test_mv_topology_change starts to time out during removenode (30 s).	2025-02-19 14:38:50 +01:00
Tomasz Grabiec	8eedb551b5	tablets: network_topology_strategy: Coroutinize calculate_initial_tablets_from_topology() To insert preemption points later.	2025-02-19 14:38:49 +01:00
Tomasz Grabiec	eef18d879c	tablets: load_balancer: Extract get_schema_and_rs() For better readability.	2025-02-19 14:38:49 +01:00
Tomasz Grabiec	9d600dd783	tablets: load_balancer: Drop test_mode tablets_test is now creating proper schema in the database, so test_mode is no longer needed.	2025-02-19 14:38:48 +01:00
yangpeiyu2_yewu	0de232934a	mutation_writer/multishard_writer.cc: wrap writer into futurize_invoke Wrap the writer in seastar::futurize_invoke to make sure that close() for the mutation_reader can be executed before destruction. Fixes #22790 Closes scylladb/scylladb#22812	2025-02-19 13:00:45 +02:00
Pavel Emelyanov	d79eec2e76	sstable: Unfriend sstable_directory class It was only needed there for create_pending_deletion_log() method to get private "_storage" from sstable. Now it's all gone and friendship can be broken. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2025-02-19 13:09:04 +03:00
Pavel Emelyanov	96a867c869	sstable_directory: Move sstable_directory::pending_delete_result ... to where it belongs -- to the filesystem storage driver itself. Continuation of the previous patch. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2025-02-19 13:09:04 +03:00
Pavel Emelyanov	f6de6d6887	sstable_directory: Calculate prefixes outside of create_pending_deletion_log() The method in question walks the list of sstables and accumulates sstables' prefixes into a set on pending_delete_result object. The set in question is not used at all in this method and is in fact alien to it -- the p.d._result object is used by the filesystem storage driver as atomic deletion prepare/commit transparent context. That said, move the whole pending_delete_result to where it belongs and relax the create_pending_deletion_log() to only return the log directory path string. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2025-02-19 13:09:04 +03:00
Pavel Emelyanov	b0c1a77528	sstable_directory: Introduce local pending_delete_log variable This is simply to reduce the churn in the next patch, nothing special here. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2025-02-19 13:09:04 +03:00
Pavel Emelyanov	5b92c4549e	sstable_directory: Relax toc file dumping to deletion log The current code takes sstable prefix() (e.g. the /foo/bar string), then trims from its front the basedir (e.g. the /foo/ string) and then writes the remainder, a slash and TOC component name (e.g. the xxx-TOC.txt string). The final result is "bar/xxx-TOC.txt" string. Taking into account that sstable.toc_filename() renders into sstable.prefix + slash + component-name, the above result can be achieved by trimming basedir directory from toc_filename(). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2025-02-19 13:09:04 +03:00
Botond Dénes	820f196a49	replica/database: setup_scylla_memory_diagnostics_producer() un-static semaphore dump lambda The lambda which dumps the diagnostics for each semaphore, is static. Considering that said lambda captures a local (writeln) by reference, this is wrong on two levels: * The writeln captured on the shard which happens to initialize this static, will be used on all shards. * The writeln captured on the first dump, will be used on later dumps, possibly triggering a segfault. Drop the `static` to make the lambda local and resolve this problem. Fixes: scylladb/scylladb#22756 Closes scylladb/scylladb#22776	2025-02-19 12:22:16 +03:00
Nadav Har'El	a7bf36831c	test: remove spammy deprecation warnings Recently, when running Alternator tests we get hundreds of warnings like the following from basically all test files: /usr/lib/python3.12/site-packages/botocore/crt/auth.py:59: DeprecationWarning: datetime.datetime.utcnow() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.now(datetime.UTC). /usr/local/lib/python3.12/site-packages/pytest_elk_reporter.py:299: DeprecationWarning: datetime.datetime.utcnow() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.now(datetime.UTC). These warnings all come from two libraries that we use in the tests - botocore is used by Alternator tests, and elk reporter is a plugin that we don't actually use, but it is installed by dtest and we often see it in our runs as well. These warnings are of no interest to us - not only do we not care if botocore uses some deprecated Python APIs and will need to be updated in the future, all these warnings are hiding real warnings about deprecated things we actually use in our own test code. The patch modifies test/pytest.ini (used by all our Python tests, including but not limited to Alternator tests) to ignore deprecation warnings from inside these two libraries, botocore and elk_reporter. After this patch, test/alternator/run finishes without any warnings at all. test/cqlpy does still have a few warnings left, which earlier were hidden by the thousands of spammy warnings eliminated in this patch. We fix one of these warnings in this patch: ResultSet indexing support will be removed in 4.0. Consider using ResultSet.one() by doing exactly what the warning recommended. Some deprecation warnings in test/cqlpy remain in calls to get_query_trace(). The "blame" for these warnings is misplaced - this function is part of the cassandra driver, but Python seems to think it's part of our test code so I can't avoid them with the pytest.ini trick, I'm not sure why. So I don't know yet how to eliminate these last warnings. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#22881	2025-02-19 12:15:51 +03:00
Avi Kivity	45b2026209	service: raft: drop unused dependency from group0_state_machine_merger.hh Reduces dependency load. Closes scylladb/scylladb#22781	2025-02-19 12:14:58 +03:00
Kefu Chai	d1f117620a	build: restrict -Xclang options to Clang compiler only Modify CMake configuration to only apply "-Xclang" options when building with the Clang compiler. These options are Clang-specific and can cause errors or warnings when used with other compilers like g++. This change: - Adds compiler detection to conditionally apply Clang-specific flags - Prevents build failures when using non-Clang compilers Previously, the build system would apply these flags universally, which could lead to compilation errors with other compilers. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#22899	2025-02-19 12:13:35 +03:00
Kefu Chai	d384b0a63e	utils: use std::to_underlying() when appropriate Use std::to_underlying() when comparing unsigned types with enumeration values to fix type mismatch warnings in GCC-14. This specifically addresses an issue in utils/advanced_rpc_compressor.hh where comparing a uint8_t with 0 triggered a '-Werror=type-limits' warning: ``` error: comparison is always false due to limited range of data type [-Werror=type-limits] if (x < 0 || x >= static_cast<underlying>(type::COUNT)) ~~^~~ ``` Using std::to_underlying() provides clearer type semantics and avoids this kind of comparison warning. This change improves code readability while maintaining the same behavior. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#22898	2025-02-19 12:12:28 +03:00
Benny Halevy	cc281ff88d	test_tablet_repair_scheduler: prepare_multi_dc_repair: use create_new_test_keyspace and return the keyspace unique name to the caller. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2025-02-19 09:35:33 +02:00
Aleksandra Martyniuk	f8e4198e72	service: tasks: hold token_metadata_ptr in tablet_virtual_task Hold token_metadata_ptr in tablet_virtual_task methods that iterate over tablets, to keep the tablet_map alive. Fixes: https://github.com/scylladb/scylladb/issues/22316. Closes scylladb/scylladb#22740	2025-02-19 09:33:53 +02:00
Dusan Malusev	4e6ea232d2	docs: add instruction for installing cassandra-stress Signed-off-by: Dusan Malusev <dusan.malusev@scylladb.com> Closes scylladb/scylladb#21723	2025-02-19 09:25:16 +02:00
Benny Halevy	cbe79b20f7	test/repair: create_table_insert_data_for_repair: create keyspace with unique name and return it to the caller Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2025-02-19 08:56:07 +02:00
Benny Halevy	9829b1594f	topology_tasks/test_tablet_tasks: use new_test_keyspace Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2025-02-19 08:52:59 +02:00
Benny Halevy	12f85ce57c	topology_tasks/test_node_ops_tasks: use new_test_keyspace Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2025-02-19 08:52:59 +02:00
Benny Halevy	0564e95c51	topology_custom/test_zero_token_nodes_no_replication: use create_new_test_keyspace Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2025-02-19 08:52:59 +02:00
Benny Halevy	46b1850f0c	topology_custom/test_zero_token_nodes_multidc: use create_new_test_keyspace Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2025-02-19 08:52:59 +02:00
Benny Halevy	b810791fbb	topology_custom/test_view_build_status: use new_test_keyspace Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2025-02-19 08:52:59 +02:00
Benny Halevy	2d4af01281	topology_custom/test_truncate_with_tablets: use new_test_keyspace Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2025-02-19 08:52:58 +02:00
Benny Halevy	16ef78075c	topology_custom/test_topology_failure_recovery: use new_test_keyspace Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2025-02-19 08:43:35 +02:00
Benny Halevy	96d327fb83	topology_custom/test_tablets_removenode: use create_new_test_keyspace Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2025-02-19 08:43:35 +02:00
Benny Halevy	f30e4c6917	topology_custom/test_tablets_migration: use new_test_keyspace Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2025-02-19 08:43:35 +02:00
Benny Halevy	20f7eda16e	topology_custom/test_tablets_merge: use new_test_keyspace Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2025-02-19 08:43:35 +02:00
Benny Halevy	5ff3153912	topology_custom/test_tablets_intranode: use new_test_keyspace Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2025-02-19 08:43:35 +02:00
Benny Halevy	e59aca66bf	topology_custom/test_tablets_cql: use new_test_keyspace And create_new_test_keyspace when we need drop to be explicit. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2025-02-19 08:43:35 +02:00
Benny Halevy	6b37d04aa9	topology_custom/test_tablets2: use *new_test_keyspace Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2025-02-19 08:43:35 +02:00
Benny Halevy	0b88ea9798	topology_custom/test_tablets2: test_schema_change_during_cleanup: drop unused check function Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2025-02-19 08:43:35 +02:00
Benny Halevy	649e68c6db	topology_custom/test_tablets: use new_test_keyspace Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2025-02-19 08:43:35 +02:00
Benny Halevy	005ceb77d3	topology_custom/test_table_desc_read_barrier: use new_test_keyspace Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2025-02-19 08:43:35 +02:00
Benny Halevy	50a8f5c1c0	topology_custom/test_shutdown_hang: use new_test_keyspace Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2025-02-19 08:43:35 +02:00
Benny Halevy	4fd6c2d24e	topology_custom/test_select_from_mutation_fragments: use new_test_keyspace Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2025-02-19 08:43:35 +02:00
Benny Halevy	72bc4016e7	topology_custom/test_rpc_compression: use new_test_keyspace Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2025-02-19 08:43:35 +02:00
Benny Halevy	47326d01b7	topology_custom/test_reversed_queries_during_simulated_upgrade_process: use new_test_keyspace Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2025-02-19 08:43:35 +02:00
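
The arithmetic in the example of commit f043c83ba5 (4 nodes, -c1, rf=3, initial_tablets_scale=10) can be reproduced with a minimal Python sketch. The two formulas below are inferred from the numbers quoted in that commit message and are not taken from the Scylla source; the function names are hypothetical.

```python
import math

def initial_tablet_count_before(shards, rf, scale):
    # Assumed old behaviour: round the tablet count up so that every shard
    # gets at least one tablet replica, then multiply by the scale.
    return math.ceil(shards / rf) * scale

def initial_tablet_count_after(shards, rf, scale):
    # Assumed new behaviour: apply the scale before dividing by RF, so the
    # scale acts as a minimum average tablet replica count per shard.
    return math.ceil(shards * scale / rf)

def replicas_per_shard(tablets, rf, shards):
    # Average number of tablet replicas hosted by one shard.
    return tablets * rf / shards

shards = 4 * 1  # 4 nodes with 1 shard each (-c1)
before = initial_tablet_count_before(shards, rf=3, scale=10)
after = initial_tablet_count_after(shards, rf=3, scale=10)
print(before, replicas_per_shard(before, 3, shards))
print(after, replicas_per_shard(after, 3, shards))
```

Under these assumptions the old rule lands well above the requested scale because the rounding happens before the multiplication, while the new rule only overshoots by the final round-up.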

46837 Commits
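
The scale-down rule described in commit f1bda8d4c1 can be illustrated with a rough Python sketch. It assumes a simplified model in which each DC has a shard count and a per-table replication factor; the data layout, the function names and the exact rounding are illustrative assumptions, not the load balancer's actual code.

```python
import math

def next_power_of_two(n):
    # Smallest power of two that is >= n (and at least 1).
    k = max(1, math.ceil(n))
    return 1 << (k - 1).bit_length()

def scale_down(tablet_counts, dcs, goal):
    """Hypothetical model of the per-shard goal.

    tablet_counts: {table: tablet_count}
    dcs: {dc_name: {"shards": int, "rf": {table: rf_in_that_dc}}}
    goal: allowed average tablet replicas per shard
    """
    factor = {table: 1.0 for table in tablet_counts}
    for dc in dcs.values():
        replicas = sum(tablet_counts[t] * rf for t, rf in dc["rf"].items())
        avg_per_shard = replicas / dc["shards"]
        if avg_per_shard > goal:
            dc_factor = goal / avg_per_shard
            # Every table with replicas in this DC is scaled by the same
            # factor; across DCs the lowest factor wins.
            for t in dc["rf"]:
                factor[t] = min(factor[t], dc_factor)
    # Rounding up to a power of two may overshoot the goal by at most 2x.
    return {
        t: n if factor[t] >= 1.0 else next_power_of_two(n * factor[t])
        for t, n in tablet_counts.items()
    }

dcs = {"dc1": {"shards": 4, "rf": {"a": 3, "b": 3}}}
plan = scale_down({"a": 256, "b": 64}, dcs, goal=100)
print(plan)
```

In this example the average before scaling is 240 replicas per shard; after scaling it is 120, which is above the goal of 100 but within the factor-of-2 overshoot that the commit message attributes to power-of-2 rounding.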