scylla

Author	SHA1	Message	Date
Nadav Har'El	830e52008d	test/alternator: add more tests for TagResource Issue #16904 discovered that Alternator refuses to allow an empty tag value while it's useful (and DynamoDB allows it). This brought to my attention that our test coverage of the TagResource operation was lacking. So this patch adds more tests for some corner cases of TagResource which we missed, including the allowed lengths of tag keys and values. These tests reproduce #16904 (the case of empty tag value) and also #16908 (allowing and correctly counting unicode letters), and also add regression testing to cases which we already handled correctly. As usual, all the new tests also pass on DynamoDB. Refs #16904 Refs #16908 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2024-01-23 11:55:22 +02:00
Botond Dénes	a48881801a	replica/tablets: drop keyspace_name from system.tablets partition-key The name of the keyspace being part of the partition key is not useful, the table_id already uniquely identifies the table. The keyspace name being part of the key, means that code wanting to interact with this table, often has to resolve the table id, just to be able to provide the keyspace name. This is counter productive, so make the keyspace_name just a static column instead, just like table_name already is. Fixes: #16377 Closes scylladb/scylladb#16881	2024-01-22 13:12:02 +01:00
Kamil Braun	1007ac4956	Merge 'sync_raft_topology_nodes: force_remove_endpoint for left nodes only if an IP is not used by other nodes' from Petr Gusev Before the patch we called `gossiper.remove_endpoint` for IP-s of the left nodes. The problem is that in replace-with-same-ip scenario we called `gossiper.remove_endpoint` for IP which is used by the new, replacing node. The `gossiper.remove_endpoint` method puts the IP into quarantine, which means gossiper will ignore all events about this IP for `quarantine_delay` (one minute by default). If we immediately replace just replaced node with the same IP again, the bootstrap will fail since the gossiper events are blocked for this IP, and we won't be able to resolve an IP for the new host_id. Another problem was that we called gossiper.remove_endpoint method, which doesn't remove an endpoint from `_endpoint_state_map`, only from live and unreachable lists. This means the IP will keep circulating in the gossiper message exchange between cluster nodes until full cluster restart. This patch fixes both of these problems. First, we rely on the fact that when topology coordinator moves the `being_replaced` node to the left state, the IP of the `replacing` node is known to all nodes. This means before removing an IP from the gossiper we can check if this IP is currently used by another node in the current raft topology. This is done by constructing the `used_ips` map based on normal and transition nodes. This map is cached to avoid quadratic behaviour. Second, we call `gossiper.force_remove_endpoint`, not `gossiper.remove_endpoint`. This function removes and IP from `_endpoint_state_map`, as well as from live and unreachable lists. Closes scylladb/scylladb#16820 * github.com:scylladb/scylladb: get_peer_info_for_update: update only required fields in raft topology mode get_peer_info_for_update: introduce set_field lambda storage_service::on_change: fix indent storage_service::on_change: skip handle_state functions in raft topology mode test_replace_different_ip: check old IP is removed from gossiper test_replace: check two replace with same IP one after another storage_service: sync_raft_topology_nodes: force_remove_endpoint for left nodes only if an IP is not used by other nodes	2024-01-22 11:25:55 +01:00
Botond Dénes	742bc1bd11	test/topology_experimental_raft: test_tablet.py: disable flaky test Skip test_tablet_missing_data_repair, it is failing a lot breaking promotion and CI. Can't revert because the PR introducing it was already piled on. So disable while investigated. Refs: #16859 Closes scylladb/scylladb#16879	2024-01-22 11:49:05 +02:00
Avi Kivity	9e8b65f587	chunked_vector: remove range constructor Standard containers don't have constructors that take ranges; instead people use boost::copy_range or C++23 std::ranges::to. Make the API more uniform by removing this special constructor. The only caller, in a test, is adjusted. Closes scylladb/scylladb#16905	2024-01-22 10:26:15 +02:00
Nadav Har'El	0bef50ef0c	cql-pytest: add "--vnodes" option to "run" script Running test/cql-pytest/run now defaults to enabling the "tablets" experimental feature when running Scylla - and tests detect this and use this feature as appropriate. This is the correct default going forward, but in the short term it would be nice to also have an option to easily do a manual test run without tablets. So this patch adds a "--vnodes" option to the test/cql-pytest/run script. This option causes "run" to run Scylla without enabling the "tablets" experimental feature. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#16896	2024-01-22 09:35:11 +02:00
Eliran Sinvani	0e5a8cad62	Add test for mv prepared statements invalidation on base alter Issue #16392 describes a bug where when a base table is altered, it's materialized views prepared statements are not invalidated which in turn causes them to return missing data. This test reproduces this bug and serves as a regression test for this problem. Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>	2024-01-21 15:44:06 +02:00
Petr Gusev	1e00889842	test_replace_different_ip: check old IP is removed from gossiper In this commit we modify the existing test_replace_different_ip. We add the check that the old IP is not contained in alive or down lists, which means it's completely wiped from gossiper. This test is failing without the force_remove_endpoint fix from a previous commit. We also check that the state of local system.peers table is correct.	2024-01-19 20:36:52 +04:00
Mikołaj Grzebieluch	c589793a9e	test.py: test_maintenance_socket: remove pytest.xfail Issue https://github.com/scylladb/python-driver/issues/278 was fixed in https://github.com/scylladb/python-driver/pull/279. Closes scylladb/scylladb#16873	2024-01-19 14:54:15 +01:00
Botond Dénes	b50d9bb802	Merge 'Add code coverage support' from Eliran Sinvani This mini-set includes code coverage support for ScyllaDB, it provides: 1. Support for building ScyllaDB with coverage support. 2. Utilities for processing coverage profiling data 3. test.py support for generation and processing of coverage profiling into an lcov trace files which can later be used to produce HTML or textual coverage reports. Refs #16323 Closes scylladb/scylladb#16784 * github.com:scylladb/scylladb: Add code coverage documentation test.py: support code coverage code coverage: Add libraries for coverage handling test.py: support --coverage and --coverage-mode configure.py support coverage profiles on standrad build modes	2024-01-19 15:27:44 +02:00
Pavel Emelyanov	e62114214f	Merge 'More logging for Raft-based topology' from Kamil Braun Currently if topology coordinator gets stuck in a CI test run it's hard to debug this (e.g. scylladb/scylladb#16708). We can add a lot of logging inside topology coordinator code to aid debugging, without spamming the logs -- these are relatively rare control plane events. Closes scylladb/scylladb#16749 * github.com:scylladb/scylladb: test/pylib: scylla_cluster: enable raft_topology=debug level by default raft topology: increase level of some TRACE messages raft topology: log when entering transition states raft topology: don't include null ID in exclude_nodes raft topology: INFO log when executing global commands and updating topology state storage_service: separate logger for raft topology	2024-01-19 16:19:44 +03:00
Botond Dénes	04881b3915	test/cql-pytest: run.py,suite.yaml: enable tablets by default All the preparations are done, the tests can now run with tablets.	2024-01-19 03:46:38 -05:00
Botond Dénes	075be5a04a	test/cql-pytest: sprinkle xfail_tablets and skip_with_tablets as needed For tests that cover functionality, which doesn't yet work with tablets. These tests and the respective functionality they test, are expected to be fixed soon, and then these fixtures will be removed.	2024-01-19 03:46:38 -05:00
Botond Dénes	6e6bee4368	test/cql-pytest: disable tablets for some keyspace-altering tests When tablets are enabled on a keyspace, they cannot be altered to simple replication strategy anymore. These keyspaces are testing exactly that, so disable tablets on the initial keyspace create statements.	2024-01-19 03:46:38 -05:00
Botond Dénes	5f11aa940d	test/cql-pytest: test_keyspace.py: test_storage_options_local(): fix for tablets This test expects a keyspace with local storage option, to not have a row in system_schema.scylla_keyspace. With tablets enabled by default, this won't be the case. Adjust the test to check for the specific storage-related columns instead.	2024-01-19 03:46:38 -05:00
Nadav Har'El	f92d2b4928	test/cql-pytest: fix test_tablets.py to set initial_tablets correctly Recently, in commit `49026dc319`, the way to choose the number of tablets in a new keyspace changed. This broke the test we had for a memory leak when many tablets were used, which saw the old syntax wasn't recognized and assumed Scylla is running without tablet support - so the test was skipped. Let's fix the syntax. After this patch the test passes if the tablets experimental feature is enabled, and only skipped if it isn't. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2024-01-19 03:46:38 -05:00
Botond Dénes	2119faf7fe	test/cql-pytest: add tablet detection logic and fixtures Add keyspace_has_tablets() utility function, which, given a keyspace, returns whether it is using tablets or not. In addition, 3 new fixtures are added: * has_tablets - does scylla has tablets by default? * xfail_tablets - the test is marked xfail, when tablets are enabled by default. * skip_with_tablets - the test is skipped when tablets are enabled by default, because it might crash with tablets. We expect the latter two to be removed soon(ish), as we make all test, and the functionality they test work with tablets.	2024-01-19 03:46:38 -05:00
Botond Dénes	6e53264bc3	test/cql-pytest: extract is_scylla check into util.py This logic is currently in the scylla_only fixture, but we want to re-use this in other utility functions in the next patches too.	2024-01-19 03:46:38 -05:00
Petr Gusev	070de5c551	test_replace: check two replace with same IP one after another This is a test case for the problem, described in the previous commit. Before that fix the second replace failed since it couldn't resolve an IP for the new host_id.	2024-01-19 12:24:04 +04:00
Avi Kivity	d65ce16cf6	Merge 'Prevent empty compaction tasks in cleanup, upgrade sstables, and add_sstable' from Benny Halevy This short series prevents the creation of compaction tasks when we know in advance that they have nothing to do. This is possible in the clean path by: - improve the detection of candidates for cleanup by skipping sstables that require cleanup but are already being compacted - checking that list of sstables selected for cleanup isn't empty before creating the cleanup task For upgrade sstables, and generally when rewriting all sstable: launch the task only if the list off candidate sstables isn't empty. For regular compaction, when triggered via `table::add_sstable_and_update_cache`, we currently trigger compaction (by calling `submit`) on all compaction groups while the sstable is added only to one of them. Also, it is typically called for maintenance sstables that are awaiting offstrategy compaction, in which case we can skip calling `submit` entirely since the caller triggers offstrategy compaction at a later stage. Refs scylladb/scylladb#15673 Refs scylladb/scylladb#16694 Fixes scylladb/scylladb#16803 Closes scylladb/scylladb#16808 * github.com:scylladb/scylladb: table: add_sstable_and_update_cache: trigger compaction only in compaction group compaction_manager: perform_task_on_all_files: return early when there are no sstables to compact compaction_manager: perform_cleanup: use compaction_manager::eligible_for_compaction	2024-01-18 19:47:33 +02:00
Kefu Chai	a1dcddd300	utils: do not include unused headers these unused includes were identified by clangd. see https://clangd.llvm.org/guides/include-cleaner#unused-include-warning for more details on the "Unused include" warning. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16833	2024-01-18 12:50:06 +02:00
Kamil Braun	e4918c0d31	test/pylib: scylla_cluster: enable raft_topology=debug level by default To help debugging test.py failures in CI.	2024-01-18 11:24:16 +01:00
Kamil Braun	71957b4320	storage_service: separate logger for raft topology Allows selectively enabling higher logging levels for just raft-topology related things, without doing it for the entire storage_service (which includes things like gossiper callbacks). Also gets rid of the redundant "raft topology:" prefix which was also not included everywhere.	2024-01-18 11:24:14 +01:00
Eliran Sinvani	c7dff1b81b	test.py: support code coverage test.py already support the routing of coverage data into a predetermined folder under the `tmpdir` logs folder. This patch extends on that and leverage the code coverage processing libraries to produce test coverage lcov files and a coverage summary at the end of the run. The reason for not generating the full report (which can be achieved with a one liner through the `coverage_utils.py` cli) is that it is assumed that unit testing is not necessarily the "last stop" in the testing process and it might need to be joined with other coverage information that is created at other testing stages (for example dtest). The result of this patch is that when running test.py with one of the coverage options (`--coverage` / `--mode-coverage`) it will perform another step of processing and aggregating the profiling information created. Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>	2024-01-18 11:11:34 +02:00
Eliran Sinvani	00a55abdd6	code coverage: Add libraries for coverage handling Coverage handling is divided into 3 steps: 1. Generation of profiling data from a run of an instrumented file (which this patch doesn't cover) 2. Processing of profiling data, which involves indexing the profile and producing the data in some format that can be manipulated and unified. 3. Generate some reporting based on this data. The following patch is aiming to deal with the last two steps by providing a cli and a library for this end. This patch adds two libraries: 1. `coverage_utils.py` which is a library for manipulating coverage data, it also contains a cli for the (assumed) most common operations that are needed in order to eventually generate coverage reporting. 2. `lcov_utils.py` - which is a library to deal with lcov format data, which is a textual form containing a source dependant coverage data. An example of such manipulation can be `coverage diff` operation which produces a set like difference operation. cov_a - cov_b = diff where diff is an lcov formated file containing coverage data for code cov_a that is not covered at all in cov_b. The libraries and cli main goal is to provide a unified way to handle coverage data in a way that can be easily scriptable and extensible. This will pave the way for automating the coverage reporting and processing in test.py and in jenkins piplines (for example to also process dtest or sct coverage reporting) Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>	2024-01-18 11:11:34 +02:00
Eliran Sinvani	f4b6c9074a	test.py: support --coverage and --coverage-mode We aim to support code coverage reporting as part of our development process, to this end, we will need the ability to "route" the dumped profiles from scylla and unit test to a predetermined location. We can consider profile data as logged data that should persist after tests have been run. For this we add two supported options to test.py: --coverage - which means that all suits on all modes will participate in coverage. --coverage-mode - which can be used to "turn on" coverage support only for some of the modes in this run. The strategy chosen is to save the profile data in `tmpdir`/mode/coverage/%m.profraw (ref: https://clang.llvm.org/docs/SourceBasedCodeCoverage.html#running-the-instrumented-program) This means that for every suite the profiling data of each object is going to be merged into the same file (llvm claims to lock the file so concurrency is fine). More resolution than the suite level seems to not give us anything useful (at least not at the moment). Moreover, it can also be achieved by running a single test. Data in the suite level will help us to detect suits that don't generate coverage data at all and to fix this or to skip generating the profiles for them. Also added support of 'coverage' parameter in the `suite.yaml` file, which can be used to disable coverage for a specific suite, this parameter defaults to True but if a suite is known to not generate profiles or the suite profile data is not needed or obfuscate the result it can be set to false in order to cancel profiles routing and processing for this suite. Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>	2024-01-18 11:11:34 +02:00
Botond Dénes	8087bc72f0	Merge 'Basic tablet repair support' from Asias He This patch adds basic tablet repair support. Below is an example showing how tablet repairs works. The `nodetool repair -pr` cmd was performed on all the nodes, which makes sure no duplication repair work will be performed and each tablet will be repaired exactly once. Three nodes in the cluster. RF = 2. 16 initial tablets. Tablets: ``` cqlsh> SELECT * FROM system.tablets; keyspace_name \| table_id \| last_token \| table_name \| tablet_count \| new_replicas \| replicas \| session \| stage ---------------+--------------------------------------+----------------------+------------+--------------+--------------+----------------------------------------------------------------------------------------+---------+------- ks1 \| 3ffadad0-a552-11ee-bc15-66412bbb6978 \| -8070450532247928833 \| standard1 \| 16 \| null \| [(951cb5bc-5749-481a-9645-4dd0f624f24a, 6), (2dd3808d-6601-4483-b081-adf41ef094e5, 5)] \| null \| null ks1 \| 3ffadad0-a552-11ee-bc15-66412bbb6978 \| -6917529027641081857 \| standard1 \| 16 \| null \| [(2dd3808d-6601-4483-b081-adf41ef094e5, 0), (19caaeb3-d754-4704-a998-840df53eb54c, 5)] \| null \| null ks1 \| 3ffadad0-a552-11ee-bc15-66412bbb6978 \| -5764607523034234881 \| standard1 \| 16 \| null \| [(19caaeb3-d754-4704-a998-840df53eb54c, 2), (2dd3808d-6601-4483-b081-adf41ef094e5, 3)] \| null \| null ks1 \| 3ffadad0-a552-11ee-bc15-66412bbb6978 \| -4611686018427387905 \| standard1 \| 16 \| null \| [(951cb5bc-5749-481a-9645-4dd0f624f24a, 5), (2dd3808d-6601-4483-b081-adf41ef094e5, 4)] \| null \| null ks1 \| 3ffadad0-a552-11ee-bc15-66412bbb6978 \| -3458764513820540929 \| standard1 \| 16 \| null \| [(19caaeb3-d754-4704-a998-840df53eb54c, 1), (951cb5bc-5749-481a-9645-4dd0f624f24a, 0)] \| null \| null ks1 \| 3ffadad0-a552-11ee-bc15-66412bbb6978 \| -2305843009213693953 \| standard1 \| 16 \| null \| [(951cb5bc-5749-481a-9645-4dd0f624f24a, 7), (2dd3808d-6601-4483-b081-adf41ef094e5, 1)] \| null \| null ks1 \| 3ffadad0-a552-11ee-bc15-66412bbb6978 \| -1152921504606846977 \| standard1 \| 16 \| null \| [(19caaeb3-d754-4704-a998-840df53eb54c, 7), (951cb5bc-5749-481a-9645-4dd0f624f24a, 1)] \| null \| null ks1 \| 3ffadad0-a552-11ee-bc15-66412bbb6978 \| -1 \| standard1 \| 16 \| null \| [(951cb5bc-5749-481a-9645-4dd0f624f24a, 2), (2dd3808d-6601-4483-b081-adf41ef094e5, 7)] \| null \| null ks1 \| 3ffadad0-a552-11ee-bc15-66412bbb6978 \| 1152921504606846975 \| standard1 \| 16 \| null \| [(951cb5bc-5749-481a-9645-4dd0f624f24a, 6), (19caaeb3-d754-4704-a998-840df53eb54c, 2)] \| null \| null ks1 \| 3ffadad0-a552-11ee-bc15-66412bbb6978 \| 2305843009213693951 \| standard1 \| 16 \| null \| [(2dd3808d-6601-4483-b081-adf41ef094e5, 5), (951cb5bc-5749-481a-9645-4dd0f624f24a, 7)] \| null \| null ks1 \| 3ffadad0-a552-11ee-bc15-66412bbb6978 \| 3458764513820540927 \| standard1 \| 16 \| null \| [(2dd3808d-6601-4483-b081-adf41ef094e5, 1), (19caaeb3-d754-4704-a998-840df53eb54c, 3)] \| null \| null ks1 \| 3ffadad0-a552-11ee-bc15-66412bbb6978 \| 4611686018427387903 \| standard1 \| 16 \| null \| [(2dd3808d-6601-4483-b081-adf41ef094e5, 7), (951cb5bc-5749-481a-9645-4dd0f624f24a, 1)] \| null \| null ks1 \| 3ffadad0-a552-11ee-bc15-66412bbb6978 \| 5764607523034234879 \| standard1 \| 16 \| null \| [(19caaeb3-d754-4704-a998-840df53eb54c, 6), (2dd3808d-6601-4483-b081-adf41ef094e5, 2)] \| null \| null ks1 \| 3ffadad0-a552-11ee-bc15-66412bbb6978 \| 6917529027641081855 \| standard1 \| 16 \| null \| [(19caaeb3-d754-4704-a998-840df53eb54c, 5), (951cb5bc-5749-481a-9645-4dd0f624f24a, 3)] \| null \| null ks1 \| 3ffadad0-a552-11ee-bc15-66412bbb6978 \| 8070450532247928831 \| standard1 \| 16 \| null \| [(2dd3808d-6601-4483-b081-adf41ef094e5, 0), (19caaeb3-d754-4704-a998-840df53eb54c, 7)] \| null \| null ``` node1: ``` $nodetool repair -p 7199 -pr ks1 standard1 [shard 0:strm] repair - repair[f7ac8fb6-8e49-4b31-8c7d-0d493064977c]: starting user-requested repair for keyspace ks1, repair id 6, options {{trace -> false}, {primaryRange -> true}, {columnFamilies -> standard1}, {jobThreads -> 1}, {incremental -> false}, {parallelism -> parallel}} [shard 0:strm] repair - repair[f7ac8fb6-8e49-4b31-8c7d-0d493064977c] Repair 1 out of 5 tablets: table=ks1.standard1 tablet_id=2 range=(-6917529027641081857,-5764607523034234881] replicas={19caaeb3-d754-4704-a998-840df53eb54c:2, 2dd3808d-6601-4483-b081-adf41ef094e5:3} primary_replica_only=true [shard 2:strm] repair - repair[f7ac8fb6-8e49-4b31-8c7d-0d493064977c]: stats: repair_reason=repair, keyspace=ks1, tables={standard1}, ranges_nr=1, round_nr=2, round_nr_fast_path_already_synced=2, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=6, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.07399633 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.1, 7174440}, {127.0.0.2, 7174440}}, row_from_disk_nr={{127.0.0.1, 15330}, {127.0.0.2, 15330}}, row_from_disk_bytes_per_sec={{127.0.0.1, 92.4651}, {127.0.0.2, 92.4651}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.1, 207172}, {127.0.0.2, 207172}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={} [shard 0:strm] repair - repair[f7ac8fb6-8e49-4b31-8c7d-0d493064977c] Repair 2 out of 5 tablets: table=ks1.standard1 tablet_id=4 range=(-4611686018427387905,-3458764513820540929] replicas={19caaeb3-d754-4704-a998-840df53eb54c:1, 951cb5bc-5749-481a-9645-4dd0f624f24a:0} primary_replica_only=true [shard 1:strm] repair - repair[f7ac8fb6-8e49-4b31-8c7d-0d493064977c]: stats: repair_reason=repair, keyspace=ks1, tables={standard1}, ranges_nr=1, round_nr=2, round_nr_fast_path_already_synced=2, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=6, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.07302664 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.1, 7195032}, {127.0.0.3, 7195032}}, row_from_disk_nr={{127.0.0.1, 15374}, {127.0.0.3, 15374}}, row_from_disk_bytes_per_sec={{127.0.0.1, 93.9618}, {127.0.0.3, 93.9618}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.1, 210526}, {127.0.0.3, 210526}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={} [shard 0:strm] repair - repair[f7ac8fb6-8e49-4b31-8c7d-0d493064977c] Repair 3 out of 5 tablets: table=ks1.standard1 tablet_id=6 range=(-2305843009213693953,-1152921504606846977] replicas={19caaeb3-d754-4704-a998-840df53eb54c:7, 951cb5bc-5749-481a-9645-4dd0f624f24a:1} primary_replica_only=true [shard 7:strm] repair - repair[f7ac8fb6-8e49-4b31-8c7d-0d493064977c]: stats: repair_reason=repair, keyspace=ks1, tables={standard1}, ranges_nr=1, round_nr=2, round_nr_fast_path_already_synced=2, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=6, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.06781354 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.1, 7095816}, {127.0.0.3, 7095816}}, row_from_disk_nr={{127.0.0.1, 15162}, {127.0.0.3, 15162}}, row_from_disk_bytes_per_sec={{127.0.0.1, 99.7898}, {127.0.0.3, 99.7898}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.1, 223584}, {127.0.0.3, 223584}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={} [shard 0:strm] repair - repair[f7ac8fb6-8e49-4b31-8c7d-0d493064977c] Repair 4 out of 5 tablets: table=ks1.standard1 tablet_id=12 range=(4611686018427387903,5764607523034234879] replicas={19caaeb3-d754-4704-a998-840df53eb54c:6, 2dd3808d-6601-4483-b081-adf41ef094e5:2} primary_replica_only=true [shard 6:strm] repair - repair[f7ac8fb6-8e49-4b31-8c7d-0d493064977c]: stats: repair_reason=repair, keyspace=ks1, tables={standard1}, ranges_nr=1, round_nr=2, round_nr_fast_path_already_synced=2, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=6, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.06793772 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.1, 7150572}, {127.0.0.2, 7150572}}, row_from_disk_nr={{127.0.0.1, 15279}, {127.0.0.2, 15279}}, row_from_disk_bytes_per_sec={{127.0.0.1, 100.376}, {127.0.0.2, 100.376}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.1, 224897}, {127.0.0.2, 224897}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={} [shard 0:strm] repair - repair[f7ac8fb6-8e49-4b31-8c7d-0d493064977c] Repair 5 out of 5 tablets: table=ks1.standard1 tablet_id=13 range=(5764607523034234879,6917529027641081855] replicas={19caaeb3-d754-4704-a998-840df53eb54c:5, 951cb5bc-5749-481a-9645-4dd0f624f24a:3} primary_replica_only=true [shard 5:strm] repair - repair[f7ac8fb6-8e49-4b31-8c7d-0d493064977c]: stats: repair_reason=repair, keyspace=ks1, tables={standard1}, ranges_nr=1, round_nr=2, round_nr_fast_path_already_synced=2, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=6, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.068579935 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.1, 7129512}, {127.0.0.3, 7129512}}, row_from_disk_nr={{127.0.0.1, 15234}, {127.0.0.3, 15234}}, row_from_disk_bytes_per_sec={{127.0.0.1, 99.1432}, {127.0.0.3, 99.1432}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.1, 222135}, {127.0.0.3, 222135}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={} [shard 0:strm] repair - repair[f7ac8fb6-8e49-4b31-8c7d-0d493064977c]: Finished user-requested repair for tablet keyspace=ks1 tables={standard1} repair_id=6 duration=0.352379s ``` node2: ``` $nodetool repair -p 7200 -pr ks1 standard1 [shard 0:strm] repair - repair[5c805f0c-4ff2-4c5c-88df-bb318d559e0e]: starting user-requested repair for keyspace ks1, repair id 1, options {{trace -> false}, {primaryRange -> true}, {columnFamilies -> standard1}, {jobThreads -> 1}, {incremental -> false}, {parallelism -> parallel}} [shard 0:strm] repair - repair[5c805f0c-4ff2-4c5c-88df-bb318d559e0e] Repair 1 out of 6 tablets: table=ks1.standard1 tablet_id=1 range=(-8070450532247928833,-6917529027641081857] replicas={2dd3808d-6601-4483-b081-adf41ef094e5:0, 19caaeb3-d754-4704-a998-840df53eb54c:5} primary_replica_only=true [shard 0:strm] repair - repair[5c805f0c-4ff2-4c5c-88df-bb318d559e0e]: stats: repair_reason=repair, keyspace=ks1, tables={standard1}, ranges_nr=1, round_nr=2, round_nr_fast_path_already_synced=2, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=6, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.07016466 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.1, 7212816}, {127.0.0.2, 7212816}}, row_from_disk_nr={{127.0.0.1, 15412}, {127.0.0.2, 15412}}, row_from_disk_bytes_per_sec={{127.0.0.1, 98.0362}, {127.0.0.2, 98.0362}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.1, 219655}, {127.0.0.2, 219655}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={} [shard 0:strm] repair - repair[5c805f0c-4ff2-4c5c-88df-bb318d559e0e] Repair 2 out of 6 tablets: table=ks1.standard1 tablet_id=9 range=(1152921504606846975,2305843009213693951] replicas={2dd3808d-6601-4483-b081-adf41ef094e5:5, 951cb5bc-5749-481a-9645-4dd0f624f24a:7} primary_replica_only=true [shard 5:strm] repair - repair[5c805f0c-4ff2-4c5c-88df-bb318d559e0e]: stats: repair_reason=repair, keyspace=ks1, tables={standard1}, ranges_nr=1, round_nr=2, round_nr_fast_path_already_synced=2, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=6, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.07180758 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.2, 7236216}, {127.0.0.3, 7236216}}, row_from_disk_nr={{127.0.0.2, 15462}, {127.0.0.3, 15462}}, row_from_disk_bytes_per_sec={{127.0.0.2, 96.104}, {127.0.0.3, 96.104}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.2, 215325}, {127.0.0.3, 215325}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={} [shard 0:strm] repair - repair[5c805f0c-4ff2-4c5c-88df-bb318d559e0e] Repair 3 out of 6 tablets: table=ks1.standard1 tablet_id=10 range=(2305843009213693951,3458764513820540927] replicas={2dd3808d-6601-4483-b081-adf41ef094e5:1, 19caaeb3-d754-4704-a998-840df53eb54c:3} primary_replica_only=true [shard 1:strm] repair - repair[5c805f0c-4ff2-4c5c-88df-bb318d559e0e]: stats: repair_reason=repair, keyspace=ks1, tables={standard1}, ranges_nr=1, round_nr=2, round_nr_fast_path_already_synced=2, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=6, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.06772773 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.1, 7039188}, {127.0.0.2, 7039188}}, row_from_disk_nr={{127.0.0.1, 15041}, {127.0.0.2, 15041}}, row_from_disk_bytes_per_sec={{127.0.0.1, 99.1188}, {127.0.0.2, 99.1188}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.1, 222080}, {127.0.0.2, 222080}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={} [shard 0:strm] repair - repair[5c805f0c-4ff2-4c5c-88df-bb318d559e0e] Repair 4 out of 6 tablets: table=ks1.standard1 tablet_id=11 range=(3458764513820540927,4611686018427387903] replicas={2dd3808d-6601-4483-b081-adf41ef094e5:7, 951cb5bc-5749-481a-9645-4dd0f624f24a:1} primary_replica_only=true [shard 7:strm] repair - repair[5c805f0c-4ff2-4c5c-88df-bb318d559e0e]: stats: repair_reason=repair, keyspace=ks1, tables={standard1}, ranges_nr=1, round_nr=2, round_nr_fast_path_already_synced=2, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=6, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.07025768 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.2, 7229664}, {127.0.0.3, 7229664}}, row_from_disk_nr={{127.0.0.2, 15448}, {127.0.0.3, 15448}}, row_from_disk_bytes_per_sec={{127.0.0.2, 98.1351}, {127.0.0.3, 98.1351}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.2, 219876}, {127.0.0.3, 219876}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={} [shard 0:strm] repair - repair[5c805f0c-4ff2-4c5c-88df-bb318d559e0e] Repair 5 out of 6 tablets: table=ks1.standard1 tablet_id=14 range=(6917529027641081855,8070450532247928831] replicas={2dd3808d-6601-4483-b081-adf41ef094e5:0, 19caaeb3-d754-4704-a998-840df53eb54c:7} primary_replica_only=true [shard 0:strm] repair - repair[5c805f0c-4ff2-4c5c-88df-bb318d559e0e]: stats: repair_reason=repair, keyspace=ks1, tables={standard1}, ranges_nr=1, round_nr=2, round_nr_fast_path_already_synced=2, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=6, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.0719635 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.1, 7225452}, {127.0.0.2, 7225452}}, row_from_disk_nr={{127.0.0.1, 15439}, {127.0.0.2, 15439}}, row_from_disk_bytes_per_sec={{127.0.0.1, 95.7531}, {127.0.0.2, 95.7531}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.1, 214539}, {127.0.0.2, 214539}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={} [shard 0:strm] repair - repair[5c805f0c-4ff2-4c5c-88df-bb318d559e0e] Repair 6 out of 6 tablets: table=ks1.standard1 tablet_id=15 range=(8070450532247928831,9223372036854775807] replicas={2dd3808d-6601-4483-b081-adf41ef094e5:4, 19caaeb3-d754-4704-a998-840df53eb54c:3} primary_replica_only=true [shard 4:strm] repair - repair[5c805f0c-4ff2-4c5c-88df-bb318d559e0e]: stats: repair_reason=repair, keyspace=ks1, tables={standard1}, ranges_nr=1, round_nr=2, round_nr_fast_path_already_synced=2, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=6, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.0691715 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.1, 7122960}, {127.0.0.2, 7122960}}, row_from_disk_nr={{127.0.0.1, 15220}, {127.0.0.2, 15220}}, row_from_disk_bytes_per_sec={{127.0.0.1, 98.2049}, {127.0.0.2, 98.2049}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.1, 220033}, {127.0.0.2, 220033}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={} [shard 0:strm] repair - repair[5c805f0c-4ff2-4c5c-88df-bb318d559e0e]: Finished user-requested repair for tablet keyspace=ks1 tables={standard1} repair_id=1 duration=0.42178s ``` node3: ``` $nodetool repair -p 7300 -pr ks1 standard1 [shard 0:strm] repair - repair[350b97f3-f06e-470f-9164-43997a4f82a6]: starting user-requested repair for keyspace ks1, repair id 1, options {{trace -> false}, {primaryRange -> true}, {columnFamilies -> standard1}, {jobThreads -> 1}, {incremental -> false}, {parallelism -> parallel}} [shard 0:strm] repair - repair[350b97f3-f06e-470f-9164-43997a4f82a6] Repair 1 out of 5 tablets: table=ks1.standard1 tablet_id=0 range=(minimum token,-8070450532247928833] replicas={951cb5bc-5749-481a-9645-4dd0f624f24a:6, 2dd3808d-6601-4483-b081-adf41ef094e5:5} primary_replica_only=true [shard 6:strm] repair - repair[350b97f3-f06e-470f-9164-43997a4f82a6]: stats: repair_reason=repair, keyspace=ks1, tables={standard1}, ranges_nr=1, round_nr=2, round_nr_fast_path_already_synced=2, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=6, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.07126866 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.2, 7133256}, {127.0.0.3, 7133256}}, row_from_disk_nr={{127.0.0.2, 15242}, {127.0.0.3, 15242}}, row_from_disk_bytes_per_sec={{127.0.0.2, 95.4529}, {127.0.0.3, 95.4529}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.2, 213867}, {127.0.0.3, 213867}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={} [shard 0:strm] repair - repair[350b97f3-f06e-470f-9164-43997a4f82a6] Repair 2 out of 5 tablets: table=ks1.standard1 tablet_id=3 range=(-5764607523034234881,-4611686018427387905] replicas={951cb5bc-5749-481a-9645-4dd0f624f24a:5, 2dd3808d-6601-4483-b081-adf41ef094e5:4} primary_replica_only=true [shard 5:strm] repair - repair[350b97f3-f06e-470f-9164-43997a4f82a6]: stats: repair_reason=repair, keyspace=ks1, tables={standard1}, ranges_nr=1, round_nr=2, round_nr_fast_path_already_synced=2, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=6, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.0701025 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.2, 7138404}, {127.0.0.3, 7138404}}, row_from_disk_nr={{127.0.0.2, 15253}, {127.0.0.3, 15253}}, row_from_disk_bytes_per_sec={{127.0.0.2, 97.1108}, {127.0.0.3, 97.1108}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.2, 217581}, {127.0.0.3, 217581}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={} [shard 0:strm] repair - repair[350b97f3-f06e-470f-9164-43997a4f82a6] Repair 3 out of 5 tablets: table=ks1.standard1 tablet_id=5 range=(-3458764513820540929,-2305843009213693953] replicas={951cb5bc-5749-481a-9645-4dd0f624f24a:7, 2dd3808d-6601-4483-b081-adf41ef094e5:1} primary_replica_only=true [shard 7:strm] repair - repair[350b97f3-f06e-470f-9164-43997a4f82a6]: stats: repair_reason=repair, keyspace=ks1, tables={standard1}, ranges_nr=1, round_nr=2, round_nr_fast_path_already_synced=2, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=6, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.06859512 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.2, 7171632}, {127.0.0.3, 7171632}}, row_from_disk_nr={{127.0.0.2, 15324}, {127.0.0.3, 15324}}, row_from_disk_bytes_per_sec={{127.0.0.2, 99.7068}, {127.0.0.3, 99.7068}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.2, 223398}, {127.0.0.3, 223398}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={} [shard 0:strm] repair - repair[350b97f3-f06e-470f-9164-43997a4f82a6] Repair 4 out of 5 tablets: table=ks1.standard1 tablet_id=7 range=(-1152921504606846977,-1] replicas={951cb5bc-5749-481a-9645-4dd0f624f24a:2, 2dd3808d-6601-4483-b081-adf41ef094e5:7} primary_replica_only=true [shard 2:strm] repair - repair[350b97f3-f06e-470f-9164-43997a4f82a6]: stats: repair_reason=repair, keyspace=ks1, tables={standard1}, ranges_nr=1, round_nr=2, round_nr_fast_path_already_synced=2, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=6, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.06975318 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.2, 7105176}, {127.0.0.3, 7105176}}, row_from_disk_nr={{127.0.0.2, 15182}, {127.0.0.3, 15182}}, row_from_disk_bytes_per_sec={{127.0.0.2, 97.1429}, {127.0.0.3, 97.1429}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.2, 217653}, {127.0.0.3, 217653}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={} [shard 0:strm] repair - repair[350b97f3-f06e-470f-9164-43997a4f82a6] Repair 5 out of 5 tablets: table=ks1.standard1 tablet_id=8 range=(-1,1152921504606846975] replicas={951cb5bc-5749-481a-9645-4dd0f624f24a:6, 19caaeb3-d754-4704-a998-840df53eb54c:2} primary_replica_only=true [shard 6:strm] repair - repair[350b97f3-f06e-470f-9164-43997a4f82a6]: stats: repair_reason=repair, keyspace=ks1, tables={standard1}, ranges_nr=1, round_nr=2, round_nr_fast_path_already_synced=2, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=6, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.070810474 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.1, 7023276}, {127.0.0.3, 7023276}}, row_from_disk_nr={{127.0.0.1, 15007}, {127.0.0.3, 15007}}, row_from_disk_bytes_per_sec={{127.0.0.1, 94.5894}, {127.0.0.3, 94.5894}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.1, 211932}, {127.0.0.3, 211932}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={} [shard 0:strm] repair - repair[350b97f3-f06e-470f-9164-43997a4f82a6]: Finished user-requested repair for tablet keyspace=ks1 tables={standard1} repair_id=1 duration=0.351395s ``` Fixes #16599 Closes scylladb/scylladb#16600 * github.com:scylladb/scylladb: test: Add test_tablet_missing_data_repair test: Add test_tablet_repair test: Allow timeout in server_stop_gracefully test: Increase STOP_TIMEOUT_SECONDS repair: Wire tablet repair with the user repair request repair: Pass raft_address_map to repair service repair: Add host2ip_t type repair: Add finished user-requested log for vnode table too repair: Log error in the rpc_stream_handler repair: Make row_level repair work with tablet repair: Add get_dst_shard_id repair: Add shard to repair_node_state repair: Add shard map to repair_neighbors	2024-01-18 09:13:00 +02:00
Asias He	1399dc0ff2	test: Add test_tablet_missing_data_repair The test verifies repair brings the missing rows to the owner. - Shutdown part of the nodes in the cluster - Insert data - Start all nodees - Run repair - Shutdown part of the nodes - Check all data is present	2024-01-18 08:49:06 +08:00
Asias He	bfe5894a9f	test: Add test_tablet_repair A basic repair test that verifies tablet repair works.	2024-01-18 08:49:06 +08:00
Asias He	39912d7bed	test: Allow timeout in server_stop_gracefully The default is 60s. Sometimes it takes more than 60s to stop a node for some reason.	2024-01-18 08:49:06 +08:00
Asias He	276b04a572	test: Increase STOP_TIMEOUT_SECONDS It is observed that the stop of scylla took more than 60s to finish in some cases. Increase the hard coded stop timeout.	2024-01-18 08:49:06 +08:00
Botond Dénes	f22fc88a64	Merge 'Configure service levels interval' from Michał Jadwiszczak Service level controller updates itself in interval. However the interval time is hardcoded in main to 10 seconds and it leads to long sleeps in some of the tests. This patch moves this value to `service_levels_interval_ms` command line option and sets this value to 0.5s in cql-pytest. Closes scylladb/scylladb#16394 * github.com:scylladb/scylladb: test:cql-pytest: change service levels intervals in tests configure service levels interval	2024-01-17 12:24:49 +02:00
Benny Halevy	51a46aa83b	compaction_manager: perform_task_on_all_files: return early when there are no sstables to compact Prevent the creation of a compaction task when the list of sstables is known to be empty ahead of time. Refs scylladb/scylladb#16694 Fixes scylladb/scylladb#16803 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-01-17 11:53:39 +02:00
Calle Wilund	af0772d605	commitlog: Add wait_for_pending_deletes Refs #16757 Allows waiting for all previous and pending segment deletes to finish. Useful if a caller of `discard_completed_segments` (i.e. a memtable flush target) not only wants to ensure segments are clean and released, but thoroughly deleted/recycled, and hence no treat to resurrecting data on crash+restart. Test included. Closes scylladb/scylladb#16801	2024-01-17 09:30:55 +02:00
Tomasz Grabiec	49026dc319	Merge 'Turn on tablets on keyspace by default when the feature is enabled' from Pavel Emelyanov To enable tablets replication one needs to turn on the (experimental) feature and specify the `initial_tablets: N` option when creating a keyspace. We want tablets to become default in the future and allow users to explicitly opt it out if they want to. This PR solves this by changing the CREATE KEYSPACE syntax wrt tablets options. Now there's a new TABLETS options map and the usage is * `CREATE KEYSPACE ...` will turn tablets on or off based on cluster feature being enabled/disabled * `CREATE KEYSPACE ... WITH TABLETS = { 'enabled': false }` will turn tablets off regardless of what * `CREATE KEYSPACE ... WITH TABLETS = { 'enabled': true }` will try to enable tablets with default configuration * `CREATE KEYSPACE ... WITH TABLETS = { 'initial': <int> }` is now the replacement for `REPLICATION = { ... 'initial_tablets': <int> }` thing fixes: #16319 Closes scylladb/scylladb#16364 * github.com:scylladb/scylladb: code: Enable tablets if cluster feature is enabled test: Turn off tablets feature by default test: Move test_tablet_drain_failure_during_decommission to another suite test/tablets: Enable tables for real on test keyspace test/tablets: Make timestamp local cql3: Add feature service to as_ks_metadata_update() cql3: Add feature service to ks_prop_defs::as_ks_metadata() cql3: Add feature service to get_keyspace_metadata() cql: Add tablets on/off switch to CREATE KEYSPACE cql: Move initial_tablets from REPLICATION to TABLETS in DDL network_topology_strategy: Estimate initial_tablets if 0 is set	2024-01-16 00:15:10 +01:00
Patryk Wrobel	aec0db1b96	cql_auth_query_test.cc: do not rely on templated operator<< This change is intended to remove the dependency to operator<<(std::ostream&, const std::unordered_set<seastar::sstring>&) from test/boost/cql_auth_query_test.cc. It prepares the test for removal of the templated helpers. Such removal is one of goals of the referenced issue that is linked below. Refs: #13245 Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com> Closes scylladb/scylladb#16758	2024-01-15 13:30:05 +02:00
Kefu Chai	fc97d91f1a	auth: add fmt::format for auth::resource and friends before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we * define a formatter for `auth::resource` and friends, * update their callers of `operator<<` to use `fmt::print()`. * drop `operator<<`, as they are not used anymore. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16765	2024-01-15 13:26:39 +02:00
Kefu Chai	218334eaf5	test/nodetool: use build/$CMAKE_BUILD_TYPE when appropriate because the CMake-generated build.ninja is located under build/, and it puts the `scylla` executable at build/$CMAKE_BUILD_TYPE/scylla, instead of at build/$scylla_build_mode/scylla, so let's adapt to this change accordingly. we will promote this change to a shared place if we have similar needs in other tests as well. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16775	2024-01-15 12:52:35 +02:00
Pavel Emelyanov	dd892b0d8a	code: Enable tablets if cluster feature is enabled If the TABLETS map is missing in the CREATE KEYSPACE statement the tablets are anyway enabled if the respective cluster feature is enabled. To opt-out keyspaces one may use TABLETS = { 'enabled': false } syntax. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-01-15 13:12:12 +03:00
Pavel Emelyanov	4838eeb201	test: Turn off tablets feature by default Next patches will make per-keyspace initial_tables option really optional and turn tablets ON when the feature is ON. This will break all other tests' assumptions, that they are testing vnodes replication. So turn the feature off by default, tests that do need tables will need to explicitly enable this feature on their own Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-01-15 13:12:12 +03:00
Pavel Emelyanov	ae7da54f88	test: Move test_tablet_drain_failure_during_decommission to another suite In its current location it will be started with 3 pre-created scylla nodes with default features ON. Next patch will exclude `tablets` from the default list, so the test needs to create servers on its own Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-01-15 13:12:12 +03:00
Pavel Emelyanov	46b36d8c07	test/tablets: Enable tables for real on test keyspace When started cql_test_env creates a test keyspace. Some tablets test cases create a table in this keyspace, but misuse the whole feature. The thing is that while tablets feature is ON in those test cases, the keyspace itself doesn _not_ have the initial_tables option and thus tablets are not enabled for the ks' table for real. Currently test cases work just because this table is only used as a transparent table ID placeholder. If turning on tablets for the keyspace, several test cases would get broken for two reasons. First, the tables map will no longer be empty on test start. Second, applying changes to tablet metadata may not be visible, becase test case uses "ranom" timestamp, that can be less that the initial metadata mutations' timestamp. This patch fixes all three places: 1. enables tables for the test keyspace 2. removes assumption that the initial metadata is empty 3. uses large enough timestamp for subsequent mutations Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-01-15 13:12:12 +03:00
Pavel Emelyanov	2376b699e0	test/tablets: Make timestamp local Just to make next patching simpler Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-01-15 13:12:12 +03:00
Pavel Emelyanov	6cb3055059	cql: Add tablets on/off switch to CREATE KEYSPACE Now the user can do CREATE KEYSPACE ... WITH TABLETS = { 'enabled': false } to turn tablets off. It will be useful in the future to opt-out keyspace from tablets when they will be turned on by default based on cluster features only. Also one can do just CREATE KEYSPACE ... WITH TABLETS = { 'enabled': true } and let Scylla select the initial tablets value by its own Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-01-15 13:12:11 +03:00
Pavel Emelyanov	941f6d8fca	cql: Move initial_tablets from REPLICATION to TABLETS in DDL This patch changes the syntax of enabling tablets from CREATE KEYSPACE ... WITH REPLICATION = { ..., 'initial_tablets': <int> } to be CREATE KEYSPACE ... WITH TABLETS = { 'initial': <int> } and updates all tests accordingly. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-01-15 13:04:48 +03:00
Gleb Natapov	f8b90aeb14	test: add test for automatic cleanup procedure The test runs two bootstraps and checks that there is no cleanup in between. Then it runs a decommission and checks that cleanup runs automatically and then it runs one more decommission and checks that no cleanup runs again. Second part checks manual cleanup triggering. It adds a node, triggers cleanup through the REST API, checks that is runs, decommissions a node and check that the cleanup did not run again.	2024-01-14 15:45:53 +02:00
Gleb Natapov	5882855669	test: add test for topology requests queue management This test creates a 5 node cluster with 2 down nodes (A and B). After that it creates a queue of 3 topology operation: bootstrap, removenode A and removenode B with ignore_nodes=A. Check that all operation manage to complete. Then it downs one node and creates a queue with two requests: bootstrap and decommission. Since none can proceed both should be canceled.	2024-01-14 15:45:53 +02:00
Gleb Natapov	97ab3f6622	storage_service: topology_coordinator: introduce cleanup REST API integrated with the topology coordinator Introduce new REST API "/storage_service/cleanup_all" that, when triggered, instructs the topology coordinator to initiate cluster wide cleanup on all dirty nodes. It is done by introducing new global command "global_topology_request::cleanup".	2024-01-14 15:45:53 +02:00
Gleb Natapov	0e68073b22	test: use servers_see_each_other when needed In the next patch we want to abort topology operations if there is no enough live nodes to perform them. This will break tests that do a topology operation right after restarting a node since a topology coordinator may still not see the restarted node as alive. Fix all those tests to wait between restart and a topology operation until UP state propagates.	2024-01-14 14:44:07 +02:00
Gleb Natapov	455ffaf5d8	test: add servers_see_each_other helper The helper makes sure that all nodes in the cluster see each other as alive.	2024-01-14 14:44:07 +02:00

1 2 3 4 5 ...

6155 Commits