Commit Graph

46837 Commits

Author SHA1 Message Date
Avi Kivity
6e70e69246 test/lib: mutation_assertions: deinline
While generally better to reduce inline code, here we get
rid of the clustering_interval_set.hh dependency, which in turns
depends on boost interval_set, a large dependency.

incremental_compaction_test.cc is adjusted for a missing header.

Closes scylladb/scylladb#22957
2025-02-25 11:40:54 +01:00
Calle Wilund
e49f2046e5 generic_server: Update conditions for is_broken_pipe_or_connection_reset
Refs scylla-enterprise#5185
Fixes #22901

If a tls socket gets EPIPE the error is not translated to a specific
gnutls error code, but only a generic ERROR_PULL/PUSH. Since we treat
EPIPE as ignorable for plain sockets, we need to unwind nested exception
here to detect that the error was in fact due to this, so we can suppress
log output for this.

Closes scylladb/scylladb#22888
2025-02-25 10:35:11 +02:00
Kefu Chai
9fdbe0e74b tree: Remove unused boost headers
This commit eliminates unused boost header includes from the tree.

Removing these unnecessary includes reduces dependencies on the
external Boost.Adapters library, leading to faster compile times
and a slightly cleaner codebase.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#22997
2025-02-25 10:32:32 +03:00
Kefu Chai
42335baec5 backup_task: Use INFO level for upload abort during shutdown
When a backup upload is aborted due to instance shutdown, change the log
level from ERROR to INFO since this is expected behavior. Previously,
 `abort_requested_exception` during upload would trigger an ERROR log, causing
test failures since error logs indicate unexpected issues.

This change:
- Catches `abort_requested_exception` specifically during file uploads
- Logs these shutdown-triggered aborts at INFO level instead of ERROR
- Aligns with how `abort_requested_exception` is handled elsewhere in the service

This prevents false test failures while still informing administrators
about aborted uploads during shutdown.

Fixes scylladb/scylladb#22391

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#22995
2025-02-25 10:32:10 +03:00
Benny Halevy
55dbf5493c docs: document the views-with-tablets experimental feature
Refs scylladb/scylladb#22217

Fixes scylladb/scylladb#22893

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#22896
2025-02-24 17:23:08 +01:00
Avi Kivity
d99df7af6c Merge 'Respect per-shard tablet goal and 10x default per-shard tablet count' from Tomasz Grabiec
This series achieves two things:

1) changes default number of tablet replicas per shard to be 10 in order to reduce load imbalance between shards

    This will result in new tables having at least 10 tablet replicas per
    shard by default.

    We want this to reduce tablet load imbalance due to differences in
    tablet count per shard, where some shards have 1 tablet and some
    shards have 2 tablets. With higher tablet count per shard, this
    difference-by-one is less relevant.

    Fixes https://github.com/scylladb/scylladb/issues/21967

2) introduces a global goal for tablet replica count per shard and adds logic to tablet scheduler to respect it by controlling per-table tablet count

    The per-shard goal is enforced by controlling average per-shard tablet replica
    count in a given DC, which is controlled by per-table tablet
    count. This is effective in respecting the limit on individual shards
    as long as tablet replicas are distributed evenly between shards.
    There is no attempt to move tablets around in order to enforce limits
    on individual shards in case of imbalance between shards.

    If the average per-shard tablet count exceeds the limit, all tables
    which contribute to it (have replicas in the DC) are scaled down
    by the same factor. Due to rounding up to the nearest power of 2,
    we may overshoot the per-shard goal by at most a factor of 2.

    The scaling is applied after computing desired tablet count due to
    all other factors: per-table tablet count hints, defaults, average tablet size.

    If different DCs want different scale factors of a given table, the
    lowest scale factor is chosen for a given table.

    When creating a new table, its tablet count is determined by tablet
    scheduler using the scheduler logic, as if the table was already created.
    So any scaling due to per-shard tablet count goal is reflected immediately
    when creating a table. It may however still take some time for the system
    to shrink existing tables. We don't reject requests to create new tables.

    Fixes #21458

Closes scylladb/scylladb#22522

* github.com:scylladb/scylladb:
  config, tablets: Allow tablets_initial_scale_factor to be a fraction
  test: tablets_test: Test scaling when creating lots of tables
  test: tablets_test: Test tablet count changes on per-table option and config changes
  test: tablets_test: Add support for auto-split mode
  test: cql_test_env: Expose db config
  config: Make tablets_initial_scale_factor live-updateable
  tablets: load_balancer: Pick initial_scale_factor from config
  tablets, load_balancer: Fix and improve logging of resize decisions
  tablets, load_balancer: Log reason for target tablet count
  tablets: load_balancer: Move hints processing to tablet scheduler
  tablets: load_balancer: Scale down tablet count to respect per-shard tablet count goal
  tablets: Use scheduler's make_sizing_plan() to decide about tablet count of a new table
  tablets: load_balancer: Determine desired count from size separately from count from options
  tablets: load_balancer: Determine resize decision from target tablet count
  tablets: load_balancer: Allow splits even if table stats not available
  tablets: load_balancer: Extract make_sizing_plan()
  tablets: Add formatter for resize_decision::way_type
  tablets: load_balancer: Simplify resize_urgency_cmp()
  tablets: load_balancer: Keep config items as instance members
  locator: network_topology_strategy: Simplify calculate_initial_tablets_from_topology()
  tablets: Change the meaning of initial_scale to mean min-avg-tablets-per-shard
  tablets: Set default initial tablet count scale to 10
  tablets: network_topology_stragy: Coroutinize calculate_initial_tablets_from_topology()
  tablets: load_balancer: Extract get_schema_and_rs()
  tablets: load_balancer: Drop test_mode
2025-02-24 17:59:26 +02:00
Łukasz Paszkowski
9ec1a457d6 alter_keyspace_statement: Include tablets information in system.topology
Altering a keyspace (that has tablets enabled) without changing
tablets attributes, i.e. no `AND tablets = {...}` results in incorrect
"Update Keyspace..." log message being printed. The printed log
contains "tablets={"enabled":false}".

Refs https://github.com/scylladb/scylladb/issues/22261

Closes scylladb/scylladb#22324
2025-02-24 15:11:14 +02:00
Botond Dénes
6ae3076b4e Merge 'tablet-mon.py: Improve split&merge visualization and make tablet id text optional in table mode' from Tomasz Grabiec
Tablet sequeunce number was part of the tablet identifier together
with last token, so on split and merge all ids changed and it appeared
in the simulator as all tablets of a table dropping and being created
anew. That's confusing. After this change, only last token is part of
the id, so split appears as adding tablets and merge appears as
removing half the tablets, which is more accurate.

Also includes an enhancement to make showing of tablet id text
optional in table mode.

Closes scylladb/scylladb#22981

* github.com:scylladb/scylladb:
  tablet-mon.py: Don't show merges and splits as full table recreations
  tablet-mon.py: Add toggle for tablet ids
2025-02-24 15:09:54 +02:00
Takuya ASADA
f2a8ae101b dist/docker: drop hostname package, use Python API
We currently depends on hostname command to get local IP, but we can do
this on Python API.
After the change, we can drop the package.

Closes scylladb/scylladb#22909
2025-02-24 15:03:44 +02:00
Anna Stuchlik
d0a48c5661 doc: remove the reference to the 6.2 version
This commit removes the OSS version name, which is irrelevant
and confusing for 2025.1 and later users.
Also, it updates the warning to avoid specifying the release
when the deprecated feature will be removed.

Fixes https://github.com/scylladb/scylladb/issues/22839

Closes scylladb/scylladb#22936
2025-02-24 15:02:11 +02:00
Botond Dénes
6ab16006a2 Merge 'Untangle sstable-directory vs sstable in pending log creation code' from Pavel Emelyanov
There's a sstable_directory::create_pending_deletion_log() helper method that's called by sstable's filesystem_storage atomic-delete methods and that prepares the deletion log for a bunch of sstables. For that method to do its job it needs to get private sstable->_storage field (which is always the filesystem_storage one), also the atomic-delete transparent context object is leaked into the sstable_directory code and low-level sstable storage code needs to include higher-level sstable_directory header.

This patch unties these knots. As the result:
- friendship between sstable and sstable_directory is removed
- transparent atomic_delete_context is encapsulated in storage.(cc|hh) code
- less code for create_pending_deletion_log() to dump TOC filename into log

Closes scylladb/scylladb#22823

* github.com:scylladb/scylladb:
  sstable: Unfriend sstable_directory class
  sstable_directory: Move sstable_directory::pending_delete_result
  sstable_directory: Calculate prefixes outside of create_pending_deletion_log()
  sstable_directory: Introduce local pending_delete_log variable
  sstable_directory: Relax toc file dumping to deletion log
2025-02-24 14:58:37 +02:00
Paweł Zakrzewski
854d2917a1 cql3/select_statement: reject PER PARTITION LIMIT with SELECT DISTINCT
Before this patch we silently allowed and ignored PER PARTITION LIMIT.
SELECT DISTINCT requires all the partition key columns, which means that
setting PER PARTITION LIMIT is redundant - only one result will be
returned from every partition anyway.

Cassandra behaves the same way, so this patch also ensures
compatibility.

Fixes scylladb/scylladb#15109

Closes scylladb/scylladb#22950
2025-02-24 14:50:18 +02:00
Yaron Kaikov
e6227f9a25 install-dependencies.sh: update node_exporter to 1.9.0
Update node_exporter to 1.9.0 to resolve the following CVE's
https://github.com/advisories/GHSA-49gw-vxvf-fc2g
https://github.com/advisories/GHSA-8xfx-rj4p-23jm
https://github.com/advisories/GHSA-crqm-pwhx-j97f
https://github.com/advisories/GHSA-j7vj-rw65-4v26

Fixes: https://github.com/scylladb/scylladb/issues/22884

regenerate frozen toolchain with optimized clang from
* https://devpkg.scylladb.com/clang/clang-19.1.7-Fedora-41-aarch64.tar.gz
* https://devpkg.scylladb.com/clang/clang-19.1.7-Fedora-41-x86_64.tar.gz

Closes scylladb/scylladb#22987
2025-02-24 13:49:36 +02:00
Avi Kivity
1891e10b7b sstables: writer.hh: drop unneeded boost depedencies
Closes scylladb/scylladb#22955
2025-02-24 13:26:44 +03:00
Avi Kivity
58d4d8142a install-dependencies.sh: harden pip_packages against shellcheck
pip_packages is an associative array, which in bash is constructed
as ([key]=value...). In our case the value is often empty (indicating
no version constraint). Shellcheck warns against it, since `[key]= x`
could be a mistype of `[key]=x`. It's not in our case, but shellcheck
doesn't know that.

Make shellcheck happier by specifying the empty values explicitly.

Closes scylladb/scylladb#22990
2025-02-24 13:26:10 +03:00
Kefu Chai
dfa40972bb topology_custom/test_zero_token_nodes_multidc: Enhance test logging and error handling
Add verbose logging to identify failing test combinations in multi-DC
setup:

- Log replication factor (RF) and consistency level (CL) for each test
  iteration
- Add validation checks for empty result sets

Improve error handling:
- Before indexing in a list, use `assert` to check for its emptiness
- Use assertion failures instead of exceptions for clearer test diagnostics

This change helps debug test failures by showing which RF/CL
combinations cause inconsistent results between zero-token and regular
nodes.

Refs scylladb/scylladb#22967

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#22968
2025-02-24 11:09:51 +01:00
Kefu Chai
7bf7817e8a docs/cql: s/wasm32-wasi/wasm32-wasip1/
Rust's WASI target of wasm32-wasi was renamed to wasm32-wasip1, see
https://blog.rust-lang.org/2024/04/09/updates-to-rusts-wasi-targets.html.
and our building system has been adapted to this change. let's update
the document to reflect this change.

Fixes scylladb/scylladb#20878
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#21184
2025-02-24 11:06:46 +01:00
Patryk Jędrzejczak
de751cad03 Merge 'test/topology_experimental_raft: add test_topology_upgrade_stuck' from Piotr Dulikowski
The test simulates the cluster getting stuck during upgrade to raft
topology due to majority loss, and then verifies that it's possible to
get out of the situation by performing recovery and redoing the upgrade.

Fixes: #17410

Closes scylladb/scylladb#17675

* https://github.com/scylladb/scylladb:
  test/topology_experimental_raft: add test_topology_upgrade_stuck
  test.py: bump minimum python version to 3.11
  test.py: move gather_safely to pylib utils
  cdc: generation: don't capture token metadata when retrying update
  test.py: topology: ignore hosts when waiting for group0 consistency
  raft: add error injection that drops append_entries
  topology_coordinator: add injection which makes upgrade get stuck
2025-02-24 11:02:32 +01:00
Kefu Chai
d92646a17e install.sh: simplify check_usermode_support()
because we don't care about the exact output of grep, let's silence
its output. also, no need to check for the string is empty, so let's
just use the status code of the grep for the return value of the
function, more idiomatic this way.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#22737
2025-02-24 11:29:30 +03:00
Evgeniy Naydanov
99be9ac8d8 test.py: test_random_failures: improve handling of hung node
In some cases the paused/unpaused node can hang not after 30s timeout.
This make the test flaky.  Change the condition to always check the
coordinator's log if there is a hung node.

Add `stop_after_streaming` to the list of error injections which can
cause a node's hang.

Also add a wait for a new coordinator election in cluster events
which cause such elections.

Closes scylladb/scylladb#22825
2025-02-24 10:23:05 +03:00
Kefu Chai
fd52b0a3cc cql3: fix false-positive "used-after-move" warning in clang-tidy
`slice.is_reversed()` was falsely flagged as accessing moved data, since
the underlying enum_set remains valid after move. However, to improve code
clarity and silence the warning, now reference `command->slice` directly
instead, which is guaranteed to be valid as the move target.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#22971
2025-02-23 18:58:35 +02:00
Marcin Maliszkiewicz
f34ea308b3 transport: remove unused _request_cpu from connection 2025-02-23 18:32:14 +02:00
Benny Halevy
7a4c563e40 feed_writers: optimize error path
Eliminate one try/catch block around call to wr.close()
by using coroutine::as_future.

Mark error paths as `[[unlikely]]`.

Use `coroutine::return_exception_ptr` to avoid rethrowing
the final exception.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#22831
2025-02-23 18:22:39 +02:00
Dawid Mędrek
138645f744 install-dependencies.sh: Make script capable of updating pip packages
Before these changes, the script didn't update the listed pip packages
if they were already installed. If the latest version of Scylla started
using new features and required an updated Python driver, for example,
the developers (and possibly the user) were forced to update it manually.

In this commit, we modify the script so that it updates the installed
packages when run. This should make things easier for everyone.

Closes scylladb/scylladb#22912
2025-02-23 16:26:50 +02:00
Yaron Kaikov
084f4d2ee3 .github/scripts/auto-backport.py: search for Fixes also in commits
In #22650 the backport process wasn't completed since the PR body didn't include the Fixes ref as expected but the commits did have it

Expanding the search for `Fixes` to include commits in the same PR

Fixes: https://github.com/scylladb/scylla-pkg/issues/4899

Closes scylladb/scylladb#22988
2025-02-23 13:20:28 +02:00
Pavel Emelyanov
a6c882e4e3 sstables: Remove dead get_config() and db::config declarations
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#22974
2025-02-21 15:56:04 +01:00
Tomasz Grabiec
62d53d2a47 tablet-mon.py: Don't show merges and splits as full table recreations
Tablet sequeunce number was part of the tablet identifier together
with last token, so on split and merge all ids changed and it appeared
in the simulator as all tablets of a table dropping and being created
anew. That's confusing. After this change, only last token is part of
the id, so split appears as adding tablets and merge appears as
removing half the tablets, which is more accurate.
2025-02-21 15:34:48 +01:00
Tomasz Grabiec
7227d70d4d tablet-mon.py: Add toggle for tablet ids 2025-02-21 15:34:48 +01:00
Kefu Chai
a80d7e6159 test/pylib: test/pylib: Simplify boolean logic in pagination check
Replace complex boolean expression:
```py
  not driver_response_future.has_more_pages or not all_pages
```
with clearer equivalent:
```py
  driver_response_future.has_more_pages and all_pages
```
The new expression is more intuitive as it directly checks for both
conditions (having more pages and wanting all pages) rather than using double
negation.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#22969
2025-02-21 14:21:09 +03:00
Emil Maskovsky
574224491d raft/test: adjust the "raft_ignore_nodes" test for limited voters
Before the limited voters feature, the "raft_ignore_nodes" test was
relying upon the fact that all nodes will become voters.

With the limited voters feature, the test needs to be adjusted to
ensure that we do not lose the majority of the cluster. This could
happen when there are 7 nodes, but only 5 of them are voters - then if
we kill 3 nodes randomly we might end up with only 2 voters left.

Therefore we need to ensure that we only stop the appropriate number of
voter nodes. So we need to determine which nodes became voters and which
ones are non-voters, and select the nodes to be stopped based on that.

That means with 7 nodes and 5 voters, we can stop up to 2 voter nodes,
but at least one of the stopped nodes must be a non-voter.

Fixes: scylladb/scylladb#22902

Refs: scylladb/scylladb#18793
Refs: scylladb/scylladb#21969

Closes scylladb/scylladb#22904
2025-02-20 18:42:03 +01:00
Patryk Jędrzejczak
6bb1ed2ef4 Merge 'Merge topology_tasks and topology_random_failures into topology_custom' from Artsiom Mishuta
Now that we support suite subfolders, there is no
need to create an own suite for topology_tasks and topology_random_failures.

Closes scylladb/scylladb#22879

* https://github.com/scylladb/scylladb:
  test.py: merge topology_tasks suite into topology_custom suite
  test.py: merge topology_random_failures suite into topology_customs
2025-02-20 16:02:45 +01:00
Patryk Jędrzejczak
78c227c521 Merge 'raft topology: Add support for raft topology init to happen before group0 initialization' from Abhinav Kumar Jha
In the current scenario, the problem discovered is that there is a time
gap between group0 creation and raft_initialize_discovery_leader call.
Because of that, the group0 snapshot/apply entry enters wrong values
from the disk(null) and updates the in-memory variables to wrong values.
During the above time gap, the in-memory variables have wrong values and
perform absurd actions.

This PR removes the variable `_manage_topology_change_kind_from_group0`
which was used earlier as a work around for correctly handling
`topology_change_kind` variable, it was brittle and had some bugs
(causing issues like scylladb/scylladb#21114). The reason for this bug
that _manage_topology_change_kind used to block reading from disk and
was enabled after group0 initialization and starting raft server for the
restart case. Similarly, it was hard to manage `topology_change_kind`
using `_manage_topology_change_kind_from_group0` correctly in bug free
manner.

Post `_manage_topology_change_kind_from_group0` removal, careful
management of `topology_change_kind` variable was needed for maintaining
correct `topology_change_kind` in all scenarios. So this PR also
performs a refactoring to populate all init data to system tables even
before group0 creation(via `raft_initialize_discovery_leader` function).
Now because `raft_initialize_discovery_leader` happens before the group
0 creation, we write mutations directly to system tables instead of a
group 0 command. Hence, post group0 creation, the node can read the
correct values from system tables and correct values are maintained
throughout.

Added a new function `initialize_done_topology_upgrade_state` which
takes care of updating the correct upgrade state to system tables before
starting group0 server. This ensures that the node can read the correct
values from system tables and correct values are maintained throughout.

By moving `raft_initialize_discovery_leader` logic to happen before
starting group0 server, and not as group0 command post server start, we
also get rid of the potential problem of init group0 command not being
the 1st command on the server. Hence ensuring full integrity as expected
by programmer.

This PR fixes a bug. Hence we need to backport it.

Fixes: scylladb/scylladb#21114

Closes scylladb/scylladb#22484

* https://github.com/scylladb/scylladb:
  storage_service: Remove the variable _manage_topology_change_kind_from_group0
  storage_service: fix indentation after the previous commit
  raft topology: Add support for raft topology system tables initialization to happen before group0 initialization
  service/raft: Refactor mutation writing helper functions.
2025-02-20 14:42:39 +01:00
Benny Halevy
29b795709b token_group_based_splitting_mutation_writer: maybe_switch_to_new_writer: prevent double close
Currently, maybe_switch_to_new_writer resets _current_writer
only in a continuation after closing the current writer.
This leaves a window of vulnerability if close() yields,
and token_group_based_splitting_mutation_writer::close()
is called. Seeing the engaged _current_writer, close()
will call _current_writer->close() - which must be called
exactly once.

Solve this when switching to a new writer by resetting
_current_writer before closing it and potentially yielding.

Fixes #22715

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#22922
2025-02-20 15:41:09 +03:00
Kefu Chai
ccbfe4f669 compaction: replace boost::range::find with std::ranges::find
Replace boost::range::find() calls with std::ranges::find(). This
change reduces external dependencies and modernizes the codebase.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#22942
2025-02-20 14:25:08 +02:00
Anna Stuchlik
a28bbc22bd doc: remove references to Enterprise
This commit removes the redundant references to Enterprise,
which are no longer valid.

Fixes https://github.com/scylladb/scylladb/issues/22927

Closes scylladb/scylladb#22930
2025-02-20 11:24:34 +02:00
Raphael S. Carvalho
4d8a333a7f storage_service: Don't retry split when table is dropped
The split monitor wasn't handling the scenario where the table being
split is dropped. The monitor would be unable to find the tablet map
of such a table, and the error would be treated as a retryable one
causing the monitor to fall into an endless retry loop, with sleeps
in between. And that would block further splits, since the monitor
would be busy with the retries. The fix is about detecting table
was dropped and skipping to the next candidate, if any.

Fixes #21859.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#22933
2025-02-20 10:13:55 +01:00
Gleb Natapov
914c9f1711 treewide: include build_mode.hh for SCYLLA_BUILD_MODE_RELEASE where it is missing
Fixes: #22914

Closes scylladb/scylladb#22915
2025-02-20 10:50:04 +03:00
Botond Dénes
1f553457dc Merge 'test/topology: use standard new_test_keyspace functions' from Benny Halevy
This PR improves and refactors the test.topology.util new_test_keyspace generator
and adds a corresponding create_new_test_keyspace function to be used by most if not
all topology unit tests in order to standardize the way the tests create keyspaces
and to mitigate the python driver create keyspace retry issue: https://github.com/scylladb/python-driver/issues/317

Fixes #22342
Fixes #21905
Refs https://github.com/scylladb/scylla-enterprise/issues/5060

* No backport required, though may be desired to stabilize CI also in release branches.

Closes scylladb/scylladb#22399

* github.com:scylladb/scylladb:
  test_tablet_repair_scheduler: prepare_multi_dc_repair: use create_new_test_keyspace
  test/repair: create_table_insert_data_for_repair: create keyspace with unique name
  topology_tasks/test_tablet_tasks: use new_test_keyspace
  topology_tasks/test_node_ops_tasks: use new_test_keyspace
  topology_custom/test_zero_token_nodes_no_replication: use create_new_test_keyspace
  topology_custom/test_zero_token_nodes_multidc: use create_new_test_keyspace
  topology_custom/test_view_build_status: use new_test_keyspace
  topology_custom/test_truncate_with_tablets: use new_test_keyspace
  topology_custom/test_topology_failure_recovery: use new_test_keyspace
  topology_custom/test_tablets_removenode: use create_new_test_keyspace
  topology_custom/test_tablets_migration: use new_test_keyspace
  topology_custom/test_tablets_merge: use new_test_keyspace
  topology_custom/test_tablets_intranode: use new_test_keyspace
  topology_custom/test_tablets_cql: use new_test_keyspace
  topology_custom/test_tablets2: use *new_test_keyspace
  topology_custom/test_tablets2: test_schema_change_during_cleanup: drop unused check function
  topology_custom/test_tablets: use new_test_keyspace
  topology_custom/test_table_desc_read_barrier: use new_test_keyspace
  topology_custom/test_shutdown_hang: use new_test_keyspace
  topology_custom/test_select_from_mutation_fragments: use new_test_keyspace
  topology_custom/test_rpc_compression: use new_test_keyspace
  topology_custom/test_reversed_queries_during_simulated_upgrade_process: use new_test_keyspace
  topology_custom/test_raft_snapshot_truncation: use create_new_test_keyspace
  topology_custom/test_raft_no_quorum: use new_test_keyspace
  topology_custom/test_raft_fix_broken_snapshot: use new_test_keyspace
  topology_custom/test_query_rebounce: use new_test_keyspace
  topology_custom/test_not_enough_token_owners: use new_test_keyspace
  topology_custom/test_node_shutdown_waits_for_pending_requests: use new_test_keyspace
  topology_custom/test_node_isolation: use create_new_test_keyspace
  topology_custom/test_mv_topology_change: use new_test_keyspace
  topology_custom/test_mv_tablets_replace: use new_test_keyspace
  topology_custom/test_mv_tablets_empty_ip: use new_test_keyspace
  topology_custom/test_mv_tablets: use new_test_keyspace
  topology_custom/test_mv_read_concurrency: use new_test_keyspace
  topology_custom/test_mv_fail_building: use new_test_keyspace
  topology_custom/test_mv_delete_partitions: use new_test_keyspace
  topology_custom/test_mv_building: use new_test_keyspace
  topology_custom/test_mv_backlog: use new_test_keyspace
  topology_custom/test_mv_admission_control: use new_test_keyspace
  topology_custom/test_major_compaction: use new_test_keyspace
  topology_custom/test_maintenance_mode: use new_test_keyspace
  topology_custom/test_lwt_semaphore: use new_test_keyspace
  topology_custom/test_ip_mappings: use new_test_keyspace
  topology_custom/test_hints: use new_test_keyspace
  topology_custom/test_group0_schema_versioning: use new_test_keyspace
  topology_custom/test_data_resurrection_after_cleanup: use new_test_keyspace
  topology_custom/test_read_repair_with_conflicting_hash_keys: use new_test_keyspace
  topology_custom/test_read_repair: use new_test_keyspace
  topology_custom/test_compacting_reader_tombstone_gc_with_data_in_memtable: use new_test_keyspace
  topology_custom/test_commitlog_segment_data_resurrection: use new_test_keyspace
  topology_custom/test_change_replication_factor_1_to_0: use new_test_keyspace
  topology/test_tls: test_upgrade_to_ssl: use new_test_keyspace
  test/topology/util: new_test_keyspace: drop keyspace only on success
  test/topology/util: refactor new_test_keyspace
  test/topology/util: CREATE KEYSPACE IF NOT EXISTS
  test/topology/util: new_test_keyspace: accept ManagerClient
2025-02-20 09:43:15 +02:00
Kefu Chai
ddfd438434 cql3: replace boost::accumulate() with std::ranges::fold_left()
Replace boost::accumulate() calls with std::ranges::fold_left(). This
change reduces external dependencies and modernizes the codebase.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#22924
2025-02-20 09:32:17 +03:00
Kefu Chai
5be39740a8 tree: migrate from boost::find to std::ranges algorithms
Replace boost::find() calls with std::ranges::find() and std::ranges::contains()
to leverage modern C++ standard library features. This change reduces external
dependencies and modernizes the codebase.

The following changes were made:
- Replaced boost::find() with std::ranges::find() where index/iterator is needed
- Used std::ranges::contains() for simple element presence checks

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#22920
2025-02-20 09:28:57 +03:00
Tomasz Grabiec
1a7023c85a config, tablets: Allow tablets_initial_scale_factor to be a fraction
We may want fewer than 1 tablets per shard in large clusters.

The per-table option is a fraction, so for consistency, this should be
too.
2025-02-19 16:29:08 +01:00
Tomasz Grabiec
2b2fa0203e test: tablets_test: Test scaling when creating lots of tables 2025-02-19 16:29:08 +01:00
Tomasz Grabiec
0e111990a1 test: tablets_test: Test tablet count changes on per-table option and config changes 2025-02-19 16:29:08 +01:00
Tomasz Grabiec
5e471c6f1b test: tablets_test: Add support for auto-split mode
rebalance_tablets() was performing migrations and merges automatically
but not splits, because splits need to be acked by replicas via
load_stats. It's inconvenient in tests which want to rebalance to the
equilibrium point. This patch changes rebalance_tablets() to split
automatically by default, can be disabled for tests which expect
differently.

shared_load_stats was introduced to provide a stable holder of
load_stats which can be reused across rebalance_tablets() calls.
2025-02-19 16:29:08 +01:00
Tomasz Grabiec
f3b63bfeff test: cql_test_env: Expose db config 2025-02-19 16:29:08 +01:00
Tomasz Grabiec
3d01ce3707 config: Make tablets_initial_scale_factor live-updateable 2025-02-19 16:29:08 +01:00
Tomasz Grabiec
7e4a61953d tablets: load_balancer: Pick initial_scale_factor from config
So that it can be live-updated.
2025-02-19 16:29:08 +01:00
Tomasz Grabiec
41789962ef tablets, load_balancer: Fix and improve logging of resize decisions
Resize is no longer only due to avg tablet size. Log avg tablet size as an
information, not the reason, and log the true reason for target tablet
count.
2025-02-19 16:29:07 +01:00
Tomasz Grabiec
d1ccbee7f9 tablets, load_balancer: Log reason for target tablet count
Helps in debugging.
2025-02-19 16:29:07 +01:00
Tomasz Grabiec
029505b179 tablets: load_balancer: Move hints processing to tablet scheduler
Hints have common meaning for all strategies, so the logic
belongs more to make_sizing_plan().

As a side effect, we can reuse shard capacity computation across
tables, which reduces computational complexity from O(tables*nodes) to
O(tables * DCs + nodes)
2025-02-19 16:29:07 +01:00