Commit Graph

40905 Commits

Author SHA1 Message Date
Kefu Chai
b5ff098f28 thrift: add formatter for cassandra::ConsistencyLevel::type
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for
cassandra::ConsistencyLevel::type.
please note, the operator<< for `cassandra::ConsistencyLevel::type`
is generated using `thrift` command line tool, which does not emit
specialization for fmt::formatter yet, so we need to use
`fmt::ostream_formatter` to implement the formatter for this type.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17013
2024-01-29 10:10:35 +02:00
Pavel Emelyanov
3abdb3c7ee tablets: Remove tablet_aware_replication_strategy::parse_initial_tablets
It's now unused, string with initial tablets its parsed elsewhere

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#17010
2024-01-29 10:03:38 +02:00
Kefu Chai
912c588975 thrift: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17012
2024-01-29 10:02:30 +02:00
Kefu Chai
abb12979f8 raft: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17011
2024-01-29 10:00:56 +02:00
Kefu Chai
8f38bd5376 commitlog: add formatter for db::replay_position
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `db::replay_position`,
and drop its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17014
2024-01-29 09:59:30 +02:00
Botond Dénes
d3c1be9107 Merge 'alternator: enable tablets by default if experimental feature is enabled' from Nadav Har'El
This series does a similar change to Alternator as was done recently to CQL:

1. If the "tablets" experimental feature in enabled, new Alternator tables will use tablets automatically, without requiring an option on each new table. A default choice of initial_tablets is used. These choices can still be overridden per-table if the user wants to.
3. In particular, all test/alternator tests will also automatically run with tablets enabled
4. However, some tests will fail on tablets because they use features that haven't yet been implemented with tablets - namely Alternator Streams (Refs #16317) and Alternator TTL (Refs #16567). These tests will - until those features are implemented with tablets - continue to be run without tablets.
5. An option is added to the test/alternator/run to allow developers to manually run tests without tablets enabled, if they wish to (this option will be useful in the short term, and can be removed later).

Fixes #16355

Closes scylladb/scylladb#16900

* github.com:scylladb/scylladb:
  test/alternator: add "--vnodes" option to run script
  alternator: use tablets by default, if available
  test/alternator: run some tests without tablets
2024-01-29 09:22:13 +02:00
Kefu Chai
cb5453d534 .git: only allow codespell to run on master branch
so that non-master branches are not read by 3rd-party tools unless
they are audited.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16999
2024-01-29 09:04:20 +02:00
Kefu Chai
f96d25a0a7 tool: check for existence of keyspace before getting it
in general, user should save output of `DESC foo.bar` to a file,
and pass the path to the file as the argument of `--schema-file`
option of `scylla sstable` commands. the CQL statement generated
from `DESC` command always include the keyspace name of the table.
but in case user create the CQL statement manually and misses
the keyspace name. he/she would have following assertion failure
```
scylla: cql3/statements/cf_statement.cc:49: virtual const sstring &cql3::statements::raw::cf_statement::keyspace() const: Assertion `_cf_name->has_keyspace()' failed.
```
this is not a great user experience.

so, in this change, we check for the existence of keyspace before
looking it up. and throw a runtime error with a better error mesage.
so when the CQL statement does not have the keyspace name, the new
error message would look like:
```
error processing arguments: could not load schema via schema-file: std::runtime_error (tools::do_load_schemas(): CQL statement does not have keyspace specified)
```

since this check is only performed by `do_load_schemas()` which
care about the existence of keyspace, and it only expects the
CQL statement to create table/keyspace/type, we just override the
new `has_keyspace()` method of the corresponding types derived
from `cf_statement`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16981
2024-01-29 09:02:01 +02:00
Anna Stuchlik
dfa88ccc28 doc: document nodetool resetlocalschema
This adds the documentation for the nodetool resetlocalschema
command.
The syntax description is based on the description for Cassandra
and the ScyllaDB help for nodetool.

Fixes https://github.com/scylladb/scylladb/issues/16286

Closes scylladb/scylladb#16790
2024-01-28 21:09:02 +01:00
Kefu Chai
fe3bc00045 topology_coordinator: fix misspellings in log
these misspellings are identified by codespell.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17006
2024-01-26 16:50:39 +02:00
Dawid Medrek
b92fb3537a main: Postpone start-up of hint manager
In this commit, we postpone the start-up
of the hint manager until we obtain information
about other nodes in the cluster.

When we start the hint managers, one of the
things that happen is creating endpoint
managers -- structures managed by
db::hints::manager. Whether we create
an instance of endpoint manager depends on
the value returned by host_filter::can_hint_for,
which, in turn, may depend on the current state
of locator::topology.

If locator::topology is incomplete, some endpoint
managers may not be started even though they
should (because the target node IS part of the
cluster and we SHOULD send hints to it if there
are some).

The situation like that can happen because we
start the hint managers too early. This commit
aims to solve that problem. We only start
the hint managers when we've gathered information
about the other nodes in the cluster and created
the locator::topology using it.

Hinted Handoff is not negatively affected by these
changes since in between the previous point of
starting the hint managers and the current one,
all of the mutations performed by
service::storage_proxy target the local node, so
no hints would need to be generated anyway.

Fixes scylladb/scylladb#11870
Closes scylladb/scylladb#16511
2024-01-26 12:49:40 +01:00
Botond Dénes
c6fd4dffbb Merge 'Remove anonymous namespaces from headers' from Patryk Wróbel
Anonymous namespace implies internal linkage for its members.
When it is defined in a header, then each translation unit,
which includes such header defines its own unique instance
of members of the unnamed namespace that are ODR-used within
that translation unit.

This can lead to unexpected results including code bloat
or undefined behavior due to ODR violations.

This PR removes unnamed namespaces from header files.

References:

- [CppCoreGuidelines: "SF.21: Don’t use an unnamed (anonymous) namespace in a header"](https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#sf21-dont-use-an-unnamed-anonymous-namespace-in-a-header)

- [SEI CERT C++: "DCL59-CPP. Do not define an unnamed namespace in a header file"](https://wiki.sei.cmu.edu/confluence/display/cplusplus/DCL59-CPP.+Do+not+define+an+unnamed+namespace+in+a+header+file)

Closes scylladb/scylladb#16998

* github.com:scylladb/scylladb:
  utils/config_file_impl.hh: remove anonymous namespace from header
  mutation/mutation.hh: remove anonymous namespace from header
2024-01-26 13:20:17 +02:00
Kefu Chai
a9d781d70f test/nodetool: only test "storage_service/cleanup_all" with scylla
this RESTful API is a scylla specific extension and is only used
by scylla-nodetool. currently, the java-based nodetool does not use
it at all, so mark it with "scylla_only".

one can verify this change with:
```
pytest --mode=debug --nodetool=cassandra test_cleanup.py::test_cleanup
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17001
2024-01-26 13:19:15 +02:00
Botond Dénes
582ddc70ec Merge 'test/nodetool: return a randomized address if not running with unshare' from Kefu Chai
we should allow user to run nodetool tests without `test.py`. but there
are good chance that the host could be reused by multiple tests or
multiple users who could be using port 12345. by randomizing the IP and
port, they would have better chance to complete the test without running
into used port problem.

Closes scylladb/scylladb#16996

* github.com:scylladb/scylladb:
  test/nodetool: return a randomized address if not running with unshare
  test/nodetool: return an address from loopback_network fixture
2024-01-26 13:15:58 +02:00
Kefu Chai
9ee6c00c84 docs: fix misspellings
these misspellings are identified by codespell.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17005
2024-01-26 13:14:21 +02:00
Kefu Chai
72cec22932 repair: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16993
2024-01-26 13:12:38 +02:00
Kamil Braun
4f736894e1 Merge 'Add maintenance mode' from Mikołaj Grzebieluch
In this mode, the node is not reachable from the outside, i.e.
* it refuses all incoming RPC connections,
* it does not join the cluster, thus
  * all group0 operations are disabled (e.g. schema changes),
  * all cluster-wide operations are disabled for this node (e.g. repair),
  * other nodes see this node as dead,
  * cannot read or write data from/to other nodes,
* it does not open Alternator and Redis transport ports and the TCP CQL port.

The only way to make CQL queries is to use the maintenance socket. The node serves only local data.

To start the node in maintenance mode, use the `--maintenance-mode true` flag or set `maintenance_mode: true` in the configuration file.

REST API works as usual, but some routes are disabled:
* authorization_cache
* failure_detector
* hinted_hand_off_manager

This PR also updates the maintenance socket documentation:
* add cqlsh usage to the documentation
* update the documentation to use `WhiteListRoundRobinPolicy`

Fixes #5489.

Closes scylladb/scylladb#15346

* github.com:scylladb/scylladb:
  test.py: add test for maintenance mode
  test.py: generalize usage of cluster_con
  test.py: when connecting to node in maintenance mode use maintenance socket
  docs: add maintenance mode documentation
  main: add maintenance mode
  main: move some REST routes initialization before joining group0
  message_service: add sanity check that rpc connections are not created in the maintenance mode
  raft_group0_client: disable group0 operations in the maintenance mode
  service/storage_service: add start_maintenance_mode() method
  storage_service: add MAINTENANCE option to mode enum
  service/maintenance_mode: add maintenance_mode_enabled bool class
  service/maintenance_mode: move maintenance_socket_enabled definition to seperate file
  db/config: add maintenance mode flag
  docs: add cqlsh usage to maintenance socket documentation
  docs: update maintenance socket documentation to use WhiteListRoundRobinPolicy
2024-01-26 11:02:34 +01:00
Kefu Chai
637dd73079 sstable/storage: use fs::path to represent _dir and _temp_dir
they are directories, and we are concating strings to build the paths
to the sstable components. so it would be more elegant to use fs::path
for manipulating paths.

this change was inspired by the discussion on passing the relative
path to sstable to `scylla sstables`, where we use the
`path::parent_path()` as the dir of sstable, and then concatenate
it with the filename component. but if the `parent_path()` method
returns an empty string, we end up with a path like
"/me-42-big-TOC.txt", which is not reachable. what we should be
reading is "me-42-big-TOC.txt". so, we should better off either
using `fs::path` or enforcing the absolute path.

since we already using "/" as separator, and concatenating strings,
this is an opportunity to switch over to `fs::path` to address
the problem and to avoid the string concatenating.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16982
2024-01-26 09:54:41 +02:00
Patryk Wrobel
6faa178f10 utils/config_file_impl.hh: remove anonymous namespace from header
Anonymous namespace implies internal linkage for its members.
When it is defined in a header, then each translation unit,
which includes such header defines its own unique instance
of members of the unnamed namespace that are ODR-used within
that translation unit.

This can lead to unexpected results including code bloat
or undefined behavior due to ODR violations.

This change aligns the code with the following guidelines:
 - CppCoreGuidelines: "SF.21: Don’t use an unnamed (anonymous)
                       namespace in a header"
 - SEI CERT C++: "DCL59-CPP. Do not define an unnamed namespace
                  in a header file"

Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
2024-01-26 08:44:44 +01:00
Patryk Wrobel
c218333afb cql3/type_json.cc: move stringstream content instead of copying it
C++20 introduced a new overload of std::ofstringstream::str()
that is selected when the mentioned member function is called
on r-value.

The new overload returns a string, that is move-constructed
from the underlying string instead of being copy-constructed.

This change applies std::move() on stringstream objects before
calling str() member function to avoid copying of the underlying
buffer.

Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>

Closes scylladb/scylladb#16990
2024-01-26 09:41:09 +02:00
Kefu Chai
36e81f93d2 .git: do not apply codespell to licenses
we should keep the licenses as they are, even with misspellings.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16992
2024-01-26 09:39:27 +02:00
Patryk Wrobel
ba488b10ec mutation/mutation.hh: remove anonymous namespace from header
Anonymous namespace implies internal linkage for its members.
When it is defined in a header, then each translation unit,
which includes such header defines its own unique instance
of members of the unnamed namespace that are ODR-used within
that translation unit.

This can lead to unexpected results including code bloat
or undefined behavior due to ODR violations.

This change aligns the code with the following guidelines:
 - CppCoreGuidelines: "SF.21: Don’t use an unnamed (anonymous)
                       namespace in a header"
 - SEI CERT C++: "DCL59-CPP. Do not define an unnamed namespace
                  in a header file"

Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
2024-01-26 08:38:39 +01:00
Kefu Chai
01727a5399 test/nodetool: return a randomized address if not running with unshare
we should allow user to run nodetool tests without `test.py`. but there
are good chance that the host could be reused by multiple tests or
multiple users who could be using port 12345. by randomizing the IP and
port, they would have better chance to complete the test without running
into used port problem.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-01-26 13:32:47 +08:00
Kefu Chai
358d30fd29 test/nodetool: return an address from loopback_network fixture
* rename "maybe_setup_loopback_network" to "server_address"
* return an address from the fixture

this change prepares for bringing back the randomized IP and port,
in case users run this test without test.py, by randomizing the
IP and port, they would have better chance to complete the test
without running into used port problem.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-01-26 13:20:37 +08:00
Avi Kivity
03313d359e Merge ' db: commitlog_replayer: ignore mutations affected by (tablet) cleanups ' from Michał Chojnowski
To avoid data resurrection, mutations deleted by cleanup operations should be skipped during commitlog replay.

This series implements the above for tablet cleanups, by using a new system table which holds records of cleanup operations.

Fixes #16752

Closes scylladb/scylladb#16888

* github.com:scylladb/scylladb:
  test: test_tablets: add a test for cleanup after migration
  test: pylib: add ScyllaCluster.wipe_sstables
  test: boost: add commitlog_cleanup_test
  db: commitlog_replayer: ignore mutations affected by (tablet) cleanups
  replica: table: garbage-collect irrelevant system.commitlog_cleanups records
  db: commitlog: add min_position()
  replica: table: populate system.commitlog_cleanups on tablet cleanup
  db: system_keyspace: add system.commitlog_cleanups
  replica: table: refresh compound sstable set after tablet cleanup
2024-01-25 20:51:03 +02:00
Patryk Wrobel
a858daf038 service/client_state.cc: remove redundant copying
db::schema_tables::all_table_names() returns std::vector<sstring>.
Usage of range-for loop without reference results in copying each
of the elements of the traversed container. Such copying is redundant.

This change introduces usage of const reference to avoid copying.

Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>

Closes scylladb/scylladb#16983
2024-01-25 20:35:05 +02:00
Kamil Braun
543ad0987a Merge 'raft topology: send barrier_and_drain to a decommissioning node' from Patryk Jędrzejczak
We didn't send the `barrier_and_drain` command to a
decommissioning node that could still be coordinating requests. It
could happen that a decommissioning node sent a request with an
old topology version after normal nodes received the new fence
version. Then, the request would fail on replicas with the stale
topology exception.

This PR fixes this problem by modifying `exec_global_command`.
From now on, it sends `barrier_and_drain` to a decommissioning
node.

We also stop filtering stale topology exceptions in
`test_topology_ops`. We added this filter after detecting the bug
fixed by this PR.

Fixes scylladb/scylladb#15804
Fixes scylladb/scylladb#16579
Fixes scylladb/scylladb#16642

Closes scylladb/scylladb#16797

* github.com:scylladb/scylladb:
  test: test_topology_ops: remove failed mutations filter
  raft topology: send barrier_and_drain to a decommissioning node
  raft topology: ensure at most one transitioning node
2024-01-25 16:09:02 +01:00
Kefu Chai
ee28cf2285 test.py: s/defalt/default/
this typo was identified by codespell

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16980
2024-01-25 16:54:07 +02:00
Botond Dénes
6d5ee6d48a Merge 'test/nodetool: run nodetool tests using "unshare"' from Kefu Chai
before this change, we use a random address when launching
rest_api_mock server, but there are chances that the randomly
picked address conflicts with an already-used address on the
host. and the subprocess fails right away with the returncode of
1 upon this failure, but we just continue on and check the readiness
of the already-dead server. actually, we've seen test failures
caused by the EADDRINUSE failure, and when we checked the readiness
of the rest_api_mock by sending HTTP request and reading the
response, what we had is not a JSON encoded response but a webpage,
which was likely the one returned by a minio server.

in this change, we

* specify the "launcher" option of nodetool
  test suite to "unshare", so that all its tests are launched
  in separated namespaces.
* do not use a random address for the mock server, as the network
  namespaces are separated.

Fixes #16542

Closes scylladb/scylladb#16773

* github.com:scylladb/scylladb:
  test/nodetool: run nodetool tests using "unshare"
  test.py: add "launcher" option support
2024-01-25 16:53:49 +02:00
Mikołaj Grzebieluch
763911af5b test.py: add test for maintenance mode
The test checks that in maintenance mode server A is not available for other
nodes and for clients. It is possible to connect by the maintenance socket
to server A and perform local CQL operations.
2024-01-25 15:27:53 +01:00
Mikołaj Grzebieluch
ca35e352f5 test.py: generalize usage of cluster_con
Add option to pass load_balancing policy.
Change hosts type to list of IPs or cassandra.Endpoint.
2024-01-25 15:27:53 +01:00
Mikołaj Grzebieluch
77a656bfd6 test.py: when connecting to node in maintenance mode use maintenance socket
A node in the maintenance socket hasn't an opened regular CQL port.
To connect to the node, the scylla cluster needs to use the node's maintenance socket.
2024-01-25 15:27:53 +01:00
Mikołaj Grzebieluch
9c07a189e8 docs: add maintenance mode documentation 2024-01-25 15:27:53 +01:00
Mikołaj Grzebieluch
0bdbd6e8f5 main: add maintenance mode
In maintenance mode:
* Group0 doesn't start and the node doesn't join the token ring to behave as a dead
node to others,
* Group0 operations are disabled and result in an error,
* Only the maintenance socket listens for CQL requests,
* The storage service initialises token_metadata with the local node as the only node
on the token ring.

Maintenance mode is enabled by passing the --maintenance-mode flag.

Maintenance mode starts before the group0 is initialised.
2024-01-25 15:27:53 +01:00
Mikołaj Grzebieluch
617adde9c9 main: move some REST routes initialization before joining group0
Move REST endpoints that don't need connection with other nodes, before joining the group0.
This way, they can be initialized in the maintenance mode.

Move `snapshot_ctl` along with routes because of snapshots API and tasks API.
Its constructor is a noop, so it is safe to move it.
2024-01-25 15:27:53 +01:00
Mikołaj Grzebieluch
d8de209dcf message_service: add sanity check that rpc connections are not created in the maintenance mode
In maintenance mode, a node shouldn't be able to communicate with other nodes.

To make sure this does not happen, the sanity check is added.
2024-01-25 15:27:53 +01:00
Mikołaj Grzebieluch
c08266cfe5 raft_group0_client: disable group0 operations in the maintenance mode
In maintenance mode, the node doesn't communicate with other nodes, so it doesn't
start or apply group0 operations. Users can still try to start it, e.g. change
the schema, and the node can't allow it.

Init _upgrade_state with recovery in the maintenance mode.
Throw an error if the group0 operation is started in maintenance mode.
2024-01-25 15:27:53 +01:00
Mikołaj Grzebieluch
97641f646a service/storage_service: add start_maintenance_mode() method
In the maintenance mode, other nodes won't be available thus we disabled joining
the token ring and the token metadata won't be populated with the local node's endpoint.
When a CQL query is executed it checks the `token_metadata` structure and fails if it is empty.

Add a method that initialises `token_metadata` with the local node as the only node in the token ring.
2024-01-25 15:27:53 +01:00
Mikołaj Grzebieluch
c530756837 storage_service: add MAINTENANCE option to mode enum
join_cluster and start_maintenance_mode are incompatible.
To make sure that only one is called when the node starts, add the MAINTENANCE option.

start_maintenance_mode sets _operation_mode to MAINTENANCE.
join_cluster sets _operation_mode to STARTING.

set_mode will result in an internal error if:
* it tries to set MAINTENANCE mode when the _operation_mode is other than NONE,
  i.e. start_maintenance_mode is called after join_cluster (or it is called during
  the drain, but it also shouldn't happen).
* it tries to set STARTING mode when the mode is set to MAINTENANCE,
  i.e. join_cluster is called after start_maintenance_mode.
2024-01-25 15:27:53 +01:00
Mikołaj Grzebieluch
d4c22fc86c service/maintenance_mode: add maintenance_mode_enabled bool class 2024-01-25 15:27:53 +01:00
Mikołaj Grzebieluch
8b2f0e38d9 service/maintenance_mode: move maintenance_socket_enabled definition to seperate file 2024-01-25 15:27:53 +01:00
Mikołaj Grzebieluch
e6a83b9819 db/config: add maintenance mode flag 2024-01-25 15:27:53 +01:00
Mikołaj Grzebieluch
81ef9fc91e docs: add cqlsh usage to maintenance socket documentation
After https://github.com/scylladb/scylla-cqlsh/pull/67, the user can use
cqlsh to connect to the node by maintenance socket.
2024-01-25 15:27:53 +01:00
Botond Dénes
c67698ea06 compaction/compaction_manager: perform_cleanup(): hold the compaction gate
While the cleanup is ongoing. Otherwise, a concurrent table drop might
trigger a use-after-free, as we have seen in dtests recently.

Fixes: #16770

Closes scylladb/scylladb#16874
2024-01-25 14:52:50 +01:00
Mikołaj Grzebieluch
2c34d9fcd8 docs: update maintenance socket documentation to use WhiteListRoundRobinPolicy
After https://github.com/scylladb/python-driver/pull/287, the user can use
WhiteListRoundRobinPolicy to connect to the node by maintenance socket.
2024-01-25 14:52:24 +01:00
Pavel Emelyanov
bf3cae4992 Merge 'tests: utils: error injection: print time duration instead of count' from Kefu Chai
before this change, we always cast the wait duration to millisecond,
even if it could be using a higher resolution. actually
`std::chrono::steady_clock` is using `nanosecond` for its duration,
so if we inject a deadline using `steady_clock`, we could be awaken
earlier due to the narrowing of the duration type caused by the
duration_cast.

in this change, we just use the duration as it is. this should allow
the caller to use the resolution provided by Seastar without losing
the precision. the tests are updated to print the time duration
instead of count to provide information with a higher resolution.

Fixes #15902

Closes scylladb/scylladb#16264

* github.com:scylladb/scylladb:
  tests: utils: error injection: print time duration instead of count
  error_injection: do not cast to milliseconds when injecting timeout
2024-01-25 16:13:27 +03:00
Avi Kivity
69d597075a Merge 'tablets: Add support for removenode and replace handling' from Tomasz Grabiec
New tablet replicas are allocated and rebuilt synchronously with node
operations. They are safely rebuilt from all existing replicas.
The list of ignored nodes passed to node operations is respected.

Tablet scheduler is responsible for scheduling tablet rebuilding transition which
changes the replicas set. The infrastructure for handling decommission
in tablet scheduler is reused for this.

Scheduling is done incrementally, respecting per-shard load
limits. Rebuilding transitions are recognized by load calculation to
affect all tablet replicas.

New kind of tablet transition is introduced called "rebuild" which
adds new tablet replica and rebuilds it from existing replicas. Other
than that, the transition goes through the same stages as regular
migration to ensure safe synchronization with request coordinators.

In this PR we simply stream from all tablet replicas. Later we should
switch to calling repair to avoid sending excessive amounts of data.

Fixes https://github.com/scylladb/scylladb/issues/16690.

Closes scylladb/scylladb#16894

* github.com:scylladb/scylladb:
  tests: tablets: Add tests for removenode and replace
  tablets: Add support for removenode and replace handling
  topology_coordinator: tablets: Do not fail in a tight loop
  topology_coordinator: tablets: Avoid warnings about ignored failured future
  storage_service, topology: Track excluded state in locator::topology
  raft topology: Introduce param-less topology::get_excluded_nodes()
  raft topology: Move get_excluded_nodes() to topology
  tablets: load_balancer: Generalize load tracking
  tablets: Introduce get_migration_streaming_info() which works on migration request
  tablets: Move migration_to_transition_info() to tablets.hh
  tablets: Extract get_new_replicas() which works on migraiton request
  tablets: Move tablet_migration_info to tablets.hh
  tablets: Store transition kind per tablet
2024-01-25 14:49:43 +02:00
Patryk Jędrzejczak
b348014745 test: test_topology_ops: remove failed mutations filter
We added this filter after detecting a bug in the Raft-based
topology. We weren't sending `barrier_and_drain` commands to a
decommissioning node that could still be coordinating requests.
It could cause stale topology exceptions on replicas if the
decommissioning node sent a request with an old topology version
after normal nodes received the new fence version.

This bug has been fixed in the previous commit, so we remove the
filter.
2024-01-25 13:42:48 +01:00
Patryk Jędrzejczak
9aebd6dd96 raft topology: send barrier_and_drain to a decommissioning node
Before this patch, we didn't send the `barrier_and_drain` command
to a decommissioning node that could still be coordinating
requests. It could happen that a decommissioning node sent
a request with an old topology version after normal nodes received
the new fence version. Then, the request would fail on replicas
with the stale topology exception.

We fix this problem by modifying `exec_global_command`. From now
on, it sends `barrier_and_drain` to a decommissioning node, which
can also be in the `left_token_ring` state.
2024-01-25 13:42:48 +01:00
Patryk Jędrzejczak
378cbd0b70 raft topology: ensure at most one transitioning node
We add a sanity check to ensure at most one transitioning node at
a time. If there is more, something must have gone wrong.

In the future, we might implement concurrent topology operations.
Then, we will remove this sanity check.

We also extend the comment describing `transition_nodes` so that
it better explains why we use a map and how it should be handled.
2024-01-25 13:42:46 +01:00