This refactoring is a follow-up for https://github.com/scylladb/scylladb/pull/13376, move per keyspace data structures related to topology changes from `token_metadata` to `erm`.
We move `pending_endpoints` and `read_endpoints`, along with their computation logic, from `token_metadata` to `vnode_effective_replication_map`. The `vnode_effective_replication_map` seems more appropriate for them since it contains functionally similar `replication_map` and we will be able to reuse `pending_endpoints/read_endpoints` across keyspaces sharing the same `factory_key`.
At present, `pending_endpoints` and `read_endpoints` are updated in the `update_pending_ranges` function. The update logic comprises two parts - preparing data common to all keyspaces/replication_strategies, and calculating the `migration_info` for specific keyspaces. In this PR we introduce a new `topology_change_info` structure to hold the first part's data and create an `update_topology_change_info` function to update it. This structure will be used in `vnode_effective_replication_map` to compute `pending_endpoints` and `read_endpoints`. This enables the reuse of `topology_change_info` across all keyspaces, unlike the current `update_pending_ranges` implementation, which is another benefit of this refactoring.
The PR also optimises `replication_map` memory usage for the case `natural_endpoints_depend_on_token == false`. We store endpoints list only once with special key
instead of duplicating them for each `vnode` token.
The original `update_pending_ranges` remains unchanged during the PR commits, and will be removed entirely upon transitioning to the new implementation.
Closes#13715
* github.com:scylladb/scylladb:
token_metadata_test: add a test for everywhere strategy
token_metadata_test: check read_endpoints when bootstrapping first node
token_metadata_test: refactor tests, extract create_erm
token_metadata: drop has_pending_ranges and migration_info
effective_replication_map: add has_pending_ranges
token_metadata: drop update_pending_ranges
effective_replication_map: use new get_pending_endpoints and get_endpoints_for_reading
token_metadata_test.cc: create token_metadata and replication_strategy as shared pointers
vnode_effective_replication_map: get_pending_endpoints and get_endpoints_for_reading
calculate_effective_replication_map: compute pending_endpoints and read_endpoints
vnode_erm: optimize replication_map
vnode_erm::get_range_addresses: use sorted_tokens
abstract_replication_strategy.hh: de-virtualize natural_endpoints_depend_on_token
sequenced_set: add extract_vector method
effective_replication_map: clone_endpoints_gently -> clone_data_gently
vnode_erm: gentle destruction of _pending_endpoints and _read_endpoints
stall_free.hh: add clear_gently for rvalues
stall_free.hh: relax Container requirement
token_metadata: add pending_endpoints and read_endpoints to vnode_effective_replication_map
token_metadata: introduce topology_change_info
token_metadata: replace set_topology_transition_state with set_read_new
`check_and_repair_cdc_streams` is an existing API which you can use when the
current CDC generation is suboptimal, e.g. after you decommissioned a node the
current generation has more stream IDs than you need. In that case you can do
`nodetool checkAndRepairCdcStreams` to create a new generation with fewer
streams.
It also works when you change number of shards on some node. We don't
automatically introduce a new generation in that case but you can use
`checkAndRepairCdcStreams` to create a new generation with restored
shard-colocation.
This PR implements the API on top of raft topology, it was originally
implemented using gossiper. It uses the `commit_cdc_generation` topology
transition state and a new `publish_cdc_generation` state to create new CDC
generations in a cluster without any nodes changing their `node_state`s in the
process.
Closes#13683
* github.com:scylladb/scylladb:
docs: update topology-over-raft.md
test: topology_experimental_raft: test `check_and_repair_cdc` API
raft topology: implement `check_and_repair_cdc_streams` API
raft topology: implement global request handling
raft topology: introduce `prepare_new_cdc_generation_data`
raft_topology: `get_node_to_work_on_opt`: return guard if no node found
raft topology: remove `node_to_work_on` from `commit_cdc_generation` transition
raft topology: separate `publish_cdc_generation` state
raft topology: non-node-specific `exec_global_command`
raft topology: introduce `start_operation()`
raft topology: non-node-specific `topology_mutation_builder`
topology_state_machine: introduce `global_topology_request`
topology_state_machine: use `uint16_t` for `enum_class`es
raft topology: make `new_cdc_generation_data_uuid` topology-global
We already use the new pending_endpoints from erm though
the get_pending_ranges virtual function, in this commit
we update all the remaining places to use the new
implementation in erm, as well as remove the old implementation
in token_metadata.
This implicit link it pretty bad, because feature service is a low-level
one which lots of other services depend on. System keyspace is opposite
-- a high-level one that needs e.g. query processor and database to
operate. This inverse dependency is created by the feature service need
to commit enabled features' names into system keyspace on cluster join.
And it uses the qctx thing for that in a best-effort manner (not doing
anything if it's null).
The dependency can be cut. The only place when enabled features are
committed is when gossiper enables features on join or by receiving
state changes from other nodes. By that time the
sharded<system_keyspace> is up and running and can be used.
Despite gossiper already has system keyspace dependency, it's better not
to overload it with the need to mess with enabling and persisting
features. Instead, the feature_enabler instance is equipped with needed
dependencies and takes care of it. Eventually the enabler is also moved
to feature_service.cc where it naturally belongs.
Fixes: #13837Closes#13172
* github.com:scylladb/scylladb:
gossiper: Remove features and sysks from gossiper
system_keyspace: De-static save_local_supported_features()
system_keyspace: De-static load_|save_local_enabled_features()
system_keyspace: Move enable_features_on_startup to feature_service (cont)
system_keyspace: Move enable_features_on_startup to feature_service
feature_service: Open-code persist_enabled_feature_info() into enabler
gms: Move feature enabler to feature_service.cc
gms: Move gossiper::enable_features() to feature_service::enable_features_on_join()
gms: Persist features explicitly in features enabler
feature_service: Make persist_enabled_feature_info() return a future
system_keyspace: De-static load_peer_features()
gms: Move gossiper::do_enable_features to persistent_feature_enabler::enable_features()
gossiper: Enable features and register enabler from outside
gms: Add feature_service and system_keyspace to feature_enabler
The `system_keyspace` has several methods to query the tables in it. These currently require a storage proxy parameter, because the read has to go through storage-proxy. This PR uses the observation that all these reads are really local-replica reads and they only actually need a relatively small code snippet from storage proxy. These small code snippets are exported into standalone function in a new header (`replica/query.hh`). Then the system keyspace code is patched to use these new standalone functions instead of their equivalent in storage proxy. This allows us to replace the storage proxy dependency with a much more reasonable dependency on `replica::database`.
This PR patches the system keyspace code and the signatures of the affected methods as well as their immediate callers. Indirect callers are only patched to the extent it was needed to avoid introducing new includes (some had only a forward-declaration of storage proxy and so couldn't get database from it). There are a lot of opportunities left to free other methods or maybe even entire subsystems from storage proxy dependency, but this is not pursued in this PR, instead being left for follow-ups.
This PR was conceived to help us break the storage proxy -> storage service -> system tables -> storage proxy dependency loop, which become a major roadblock in migrating from IP -> host_id. After this PR, system keyspace still indirectly depends on storage proxy, because it still uses `cql3::query_processor` in some places. This will be addressed in another PR.
Refs: #11870Closes#13869
* github.com:scylladb/scylladb:
db/system_keyspace: remove dependency on storage_proxy
db/system_keyspace: replace storage_proxy::query*() with replica:: equivalent
replica: add query.hh
The auto& db = proxy.local().get_db() is called few lines above this
patch, so the &db can be reused for invoke_on_all() call.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes#13896
before this change, alternator_timeout_in_ms is not live-updatable,
as after setting executor's default timeout right before creating
sharded executor instances, they never get updated with this option
anymore. but many users would like to set the driver timers based on
server timers. we need to enable them to configure timeout even
when the server is still running.
in this change,
* `alternator_timeout_in_ms` is marked as live-updateable
* `executor::_s_default_timeout` is changed to a thread_local variable,
so it can be updated by a per-shard updateable_value. and
it is now a updateable_value, so its variable name is updated
accordingly. this value is set in the ctor of executor, and
it is disconnected from the corresponding named_value<> option
in the dtor of executor.
* alternator_timeout_in_ms is passed to the constructor of
executor via sharded_parameter, so `executor::_timeout_in_ms` can
be initialized on per-shard basis
* `executor::set_default_timeout()` is dropped, as we already pass
the option to executor in its ctor.
Fixes#12232Closes#13300
* github.com:scylladb/scylladb:
alternator: split the param list of executor ctor into multi lines
alternator,config: make alternator_timeout_in_ms live-updateable
Currently, error injections can be enabled either through HTTP or CQL.
While these mechanisms are effective for injecting errors after a node
has already started, it can't be reliably used to trigger failures
shortly after node start. In order to support this use case, this commit
adds possibility to enable some error injections via config.
A configuration option `error_injections_at_startup` is added. This
option uses our existing configuration framework, so it is possible to
supply it either via CLI or in the YAML configuration file.
- When passed in commandline, the option is parsed as a
semicolon-separated list of error injection names that should be
enabled. Those error injections are enabled in non-oneshot mode.
The CLI option is marked as not used in release mode and does not
appear in the option list.
Example:
--error-injections-at-startup failure_point1;failure_point2
- When provided in YAML config, the option is parsed as a list of items.
Each item is either a string or a map or parameters. This method is
more flexible as it allows to provide parameters for each injection
point. At this time, the only benefit is that it allows enabling
points in oneshot mode, but more parameters can be added in the future
if needed.
Explanatory example:
error_injections_at_startup:
- failure_point1 # enabled in non-oneshot mode
- name: failure_point2 # enabled in oneshot mode
one_shot: true # due to one_shot optional parameter
The primary goal of this feature is to facilitate testing of raft-based
cluster features. An error injection will be used to enable an
additional feature to simulate node upgrade.
Tests: manual
Closes#13861
The methods that take storage_proxy as argument can now accept a
replica::database instead. So update their signatures and update all
callers. With that, system_keyspace.* no longer depends on storage_proxy
directly.
Use the recently introduced replica side query utility functions to
query the content of the system tables. This allows us to cut the
dependency of the system keyspace on storage proxy.
The methods still take storage proxy parameter, this will be replaced
with replica::database in the next patch.
There is still one hidden storage proxy dependency left, via
clq3::query_processor. This will be addressed later.
Currently, when a user has permissions on a function/all functions in
keyspace, and the function/keyspace is dropped, the user keeps the
permissions. As a result, when a new function/keyspace is created
with the same name (and signature), they will be able to use it even
if no permissions on it are granted to them.
Simliarly to regular UDFs, the same applies to UDAs.
After this patch, the corresponding permissions on functions are dropped
when a function/keyspace is dropped.
Fixes#13820Closes#13823
We are going to use remapped_endpoints_for_reading, we need
to make sure we use it in the right place. The
get_live_sorted_endpoints function looks like what we
need - it's used in all read code paths.
From its name, however, this was not obvious.
Also, we add the parameter ks_name as we'll need it
to pass to remapped_endpoints_for_reading.
`topology` currently contains the `requests` map, which is suitable for
node-specific requests such as "this node wants to join" or "this node
must be removed". But for requests for operations that affect the
cluster as a whole, a separate request type and field is more
appropriate. Introduce one.
The enum currently contains the option `new_cdc_generation` for requests
to create a new CDC generation in the cluster. We will implement the
whole procedure in later commits.
- make it a static column in `system.topology`
- move it from node-specific `ring_slice` to cluster-global `topology`
We will use it in scenarios where no node is transitioning.
Also make it `std::optional` in topology for consistency with other
fields (previously, the 'no value' state for this field was represented
using default-constructed `utils::UUID`).
since #13452, we switched most of the caller sites from std::regex
to boost::regex. in this change, all occurences of `#include <regex>`
are dropped unless std::regex is used in the same source file.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes#13765
In https://github.com/scylladb/scylladb/pull/13482 we renamed the reader permit states to more descriptive names. That PR however only covered only the states themselves and their usages, as well as the documentation in `docs/dev`.
This PR is a followup to said PR, completing the name changes: renaming all symbols, names, comments etc, so all is consistent and up-to-date.
Closes#13573
* github.com:scylladb/scylladb:
reader_concurrency_semaphore: misc updates w.r.t. recent permit state name changes
reader_concurrency_semaphore: update permit members w.r.t. recent permit state name changes
reader_concurrency_semaphore: update RAII state guard classes w.r.t. recent permit state name changes
reader_concurrency_semaphore: update API w.r.t. recent permit state name changes
reader_concurrency_semaphore: update stats w.r.t. recent permit state name changes
Current S3 client was tested over minio and it takes few more touches to work with amazon S3.
The main challenge here is to support singed requests. The AWS S3 server explicitly bans unsigned multipart-upload requests, which in turn is the essential part of the sstables S3 backend, so we do need signing. Signing a request has many options and requirements, one of them is -- request _body_ can be or can be not included into signature calculations. This is called "(un)signed payload". Requests sent over plain HTTP require payload signing (i.e. -- request body should be included into signature calculations), which can a bit troublesome, so instead the PR uses unsigned payload (i.e. -- doesn't include the request body into signature calculation, only necessary headers and query parameters), but thus also needs HTTPS.
So what this set does is makes the existing S3 client code sign requests. In order to sign the request the code needs to get AWS key and secret (and region) from somewhere and this somewhere is the conf/object_storage.yaml config file. The signature generating code was previously merged (moved from alternator code) and updated to suit S3 client needs.
In order to properly support HTTPS the PR adds special connection factory to be used with seastar http client. The factory makes DNS resolving of AWS endpoint names and configures gnutls systemtrust.
fixes: #13425Closes#13493
* github.com:scylladb/scylladb:
doc: Add a document describing how to configure S3 backend
s3/test: Add ability to run boost test over real s3
s3/client: Sign requests if configured
s3/client: Add connection factory with DNS resolve and configurable HTTPS
s3/client: Keep server port on config
s3/client: Construct it with config
s3/client: Construct it with sstring endpoint
sstables: Make s3_storage with endpoint config
sstables_manager: Keep object storage configs onboard
code: Introduce conf/object_storage.yaml configuration file
In order to access real S3 bucket, the client should use signed requests
over https. Partially this is due to security considerations, partially
this is unavoidable, because multipart-uploading is banned for unsigned
requests on the S3. Also, signed requests over plain http require
signing the payload as well, which is a bit troublesome, so it's better
to stick to secure https and keep payload unsigned.
To prepare signed requests the code needs to know three things:
- aws key
- aws secret
- aws region name
The latter could be derived from the endpoint URL, but it's simpler to
configure it explicitly, all the more so there's an option to use S3
URLs without region name in them we could want to use some time.
To keep the described configuration the proposed place is the
object_storage.yaml file with the format
endpoints:
- name: a.b.c
port: 443
aws_key: 12345
aws_secret: abcdefghijklmnop
...
When loaded, the map gets into db::config and later will be propagated
down to sstables code (see next patch).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
in this series, we try to use `generation_type` as a proxy to hide the consumers from its underlying type. this paves the road to the UUID based generation identifier. as by then, we cannot assume the type of the `value()` without asking `generation_type` first. better off leaving all the formatting and conversions to the `generation_type`. also, this series changes the "generation" column of sstable registry table to "uuid", and convert the value of it to the original generation_type when necessary, this paves the road to a world with UUID based generation id.
Closes#13652
* github.com:scylladb/scylladb:
db: use uuid for the generation column in sstable registry table
db, sstable: add operator data_value() for generation_type
db, sstable: print generation instead of its value
* change the "generation" column of sstable registry table from
bigint to uuid
* from helper to convert UUID back to the original generation
in the long run, we encourage user to use uuid based generation
identifier. but in the transition period, both bigint based and uuid
based identifiers are used for the generation. so to cater both
needs, we use a hackish way to store the integer into UUID. to
differentiate the was-integer UUID from the geniune UUID, we
check the UUID's most_significant_bits. because we only support
serialize UUID v1, so if the timestamp in the UUID is zero,
we assume the UUID was generated from an integer when converting it
back to a generation identififer.
also, please note, the only use case of using generation as a
column is the sstable_registry table, but since its schema is fixed,
we cannot store both a bigint and a UUID as the value of its
`generation` column, the simpler way forward is to use a single type
for the generation. to be more efficient and to preserve the type of
the generation, instead of using types like ascii string or bytes,
we will always store the generation as a UUID in this table, if the
generation's identifier is a int64_t, the value of the integer will
be used as the least significant bits of the UUID.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
`clustering_key_columns()` returns a range view, and `front()` returns
the reference to its first element. so we cannot assume the availability
of this reference after the expression is evaluated. to address this
issue, let's capture the returned range by value, and keep the first
element by reference.
this also silences warning from GCC-13:
```
/home/kefu/dev/scylladb/db/schema_tables.cc:3654:30: error: possibly dangling reference to a temporary [-Werror=dangling-reference]
3654 | const column_definition& first_view_ck = v->clustering_key_columns().front();
| ^~~~~~~~~~~~~
/home/kefu/dev/scylladb/db/schema_tables.cc:3654:79: note: the temporary was destroyed at the end of the full expression ‘(& v)->view_ptr::operator->()->schema::clustering_key_columns().boost::iterator_range<__gnu_cxx::__normal_iterator<const column_definition*, std::vector<column_definition> > >::<anonymous>.boost::iterator_range_detail::iterator_range_base<__gnu_cxx::__normal_iterator<const column_definition*, std::vector<column_definition> >, boost::iterators::random_access_traversal_tag>::<anonymous>.boost::iterator_range_detail::iterator_range_base<__gnu_cxx::__normal_iterator<const column_definition*, std::vector<column_definition> >, boost::iterators::bidirectional_traversal_tag>::<anonymous>.boost::iterator_range_detail::iterator_range_base<__gnu_cxx::__normal_iterator<const column_definition*, std::vector<column_definition> >, boost::iterators::incrementable_traversal_tag>::front()’
3654 | const column_definition& first_view_ck = v->clustering_key_columns().front();
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~
```
Fixes#13720
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes#13721
this series silences the warnings from GCC 13. some of these changes are considered as critical fixes, and posted separately.
see also #13243Closes#13723
* github.com:scylladb/scylladb:
cdc: initialize an optional using its value type
compaction: disambiguate type name
db: schema_tables: drop unused variable
reader_concurrency_semaphore: fix signed/unsigned comparision
locator: topology: disambiguate type names
raft: disambiguate promise name in raft::awaited_conf_changes
We change the meaning and name of `replication_state`: previously it was meant
to describe the "state of tokens" of a specific node; now it describes the
topology as a whole - the current step in the 'topology saga'. It was moved
from `ring_slice` into `topology`, renamed into `transition_state`, and the
topology coordinator code was modified to switch on it first instead of node
state - because there may be no single transitioning node, but the topology
itself may be transitioning.
This PR was extracted from #13683, it contains only the part which refactors
the infrastructure to prepare for non-node specific topology transitions.
Closes#13690
* github.com:scylladb/scylladb:
raft topology: rename `update_replica_state` -> `update_topology_state`
raft topology: remove `transition_state::normal`
raft topology: switch on `transition_state` first
raft topology: `handle_ring_transition`: rename `res` to `exec_command_res`
raft topology: parse replaced node in `exec_global_command`
raft topology: extract `cleanup_group0_config_if_needed` from `get_node_to_work_on`
storage_service: extract raft topology coordinator fiber to separate class
raft topology: rename `replication_state` to `transition_state`
raft topology: make `replication_state` a topology-global state
this also silence the warning from GCC-13:
```
/home/kefu/dev/scylladb/db/schema_tables.cc:1489:10: error: variable ‘ts’ set but not used [-Werror=unused-but-set-variable]
1489 | auto ts = db_clock::now();
| ^~
```
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
so we can apply `execute_cql()` on `generation_type` directly without
extracting its value using `generation.value()`. this paves the road to
adding UUID based generation id to `generation_type`. as by then, we
will have both UUID based and integer based `generation_type`, so
`generation_type::value()` will not be able to represent its value
anymore. and this method will be replaced by `operator data_value()` in
this use case.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
this change prepares for the change to use `variant<UUID, int64_t>`
as the value of `generation_type`. as after this change, the "value"
of a generation would be a UUID or an integer, and we don't want to
expose the variant in generation's public interface. so the `value()`
method would be changed or removed by then.
this change takes advantage of the fact that the formatter of
`generation_type` always prints its value. also, it's better to
reuse `generation_type` formatter when appropriate.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
The new name is more generic - it describes the current step of a
'topology saga` (a sequence of steps used to implement a larger topology
operation such as bootstrap).
Previously it was part of `ring_slice`, belonging to a specific node.
This commit moves it into `topology`, making it a cluster-global
property.
The `replication_state` column in `system.topology` is now `static`.
This will allow us to easily introduce topology transition states that
do not refer to any specific node. `commit_cdc_generation` will be such
a state, allowing us to commit a new CDC generation even though all
nodes are normal (none are transitioning). One could argue that the
other states are conceptually already cluster-global: for example,
`write_both_read_new` doesn't affect only the tokens of a bootstrapping
(or decommissioning etc.) node; it affects replica sets of other tokens
as well (with RFs greater than 1).
This PR introduces an experimental feature called "tablets". Tablets are
a way to distribute data in the cluster, which is an alternative to the
current vnode-based replication. Vnode-based replication strategy tries
to evenly distribute the global token space shared by all tables among
nodes and shards. With tablets, the aim is to start from a different
side. Divide resources of replica-shard into tablets, with a goal of
having a fixed target tablet size, and then assign those tablets to
serve fragments of tables (also called tablets). This will allow us to
balance the load in a more flexible manner, by moving individual tablets
around. Also, unlike with vnode ranges, tablet replicas live on a
particular shard on a given node, which will allow us to bind raft
groups to tablets. Those goals are not yet achieved with this PR, but it
lays the ground for this.
Things achieved in this PR:
- You can start a cluster and create a keyspace whose tables will use
tablet-based replication. This is done by setting `initial_tablets`
option:
```
CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy',
'replication_factor': 3,
'initial_tablets': 8};
```
All tables created in such a keyspace will be tablet-based.
Tablet-based replication is a trait, not a separate replication
strategy. Tablets don't change the spirit of replication strategy, it
just alters the way in which data ownership is managed. In theory, we
could use it for other strategies as well like
EverywhereReplicationStrategy. Currently, only NetworkTopologyStrategy
is augmented to support tablets.
- You can create and drop tablet-based tables (no DDL language changes)
- DML / DQL work with tablet-based tables
Replicas for tablet-based tables are chosen from tablet metadata
instead of token metadata
Things which are not yet implemented:
- handling of views, indexes, CDC created on tablet-based tables
- sharding is done using the old method, it ignores the shard allocated in tablet metadata
- node operations (topology changes, repair, rebuild) are not handling tablet-based tables
- not integrated with compaction groups
- tablet allocator piggy-backs on tokens to choose replicas.
Eventually we want to allocate based on current load, not statically
Closes#13387
* github.com:scylladb/scylladb:
test: topology: Introduce test_tablets.py
raft: Introduce 'raft_server_force_snapshot' error injection
locator: network_topology_strategy: Support tablet replication
service: Introduce tablet_allocator
locator: Introduce tablet_aware_replication_strategy
locator: Extract maybe_remove_node_being_replaced()
dht: token_metadata: Introduce get_my_id()
migration_manager: Send tablet metadata as part of schema pull
storage_service: Load tablet metadata when reloading topology state
storage_service: Load tablet metadata on boot and from group0 changes
db, migration_manager: Notify about tablet metadata changes via migration_listener::on_update_tablet_metadata()
migration_notifier: Introduce before_drop_keyspace()
migration_manager: Make prepare_keyspace_drop_announcement() return a future<>
test: perf: Introduce perf-tablets
test: Introduce tablets_test
test: lib: Do not override table id in create_table()
utils, tablets: Introduce external_memory_usage()
db: tablets: Add printers
db: tablets: Add persistence layer
dht: Use last_token_of_compaction_group() in split_token_range_msb()
locator: Introduce tablet_metadata
dht: Introduce first_token()
dht: Introduce next_token()
storage_proxy: Improve trace-level logging
locator: token_metadata: Fix confusing comment on ring_range()
dht, storage_proxy: Abstract token space splitting
Revert "query_ranges_to_vnodes_generator: fix for exclusive boundaries"
db: Exclude keyspace with per-table replication in get_non_local_strategy_keyspaces_erms()
db: Introduce get_non_local_vnode_based_strategy_keyspaces()
service: storage_proxy: Avoid copying keyspace name in write handler
locator: Introduce per-table replication strategy
treewide: Use replication_strategy_ptr as a shorter name for abstract_replication_strategy::ptr_type
locator: Introduce effective_replication_map
locator: Rename effective_replication_map to vnode_effective_replication_map
locator: effective_replication_map: Abstract get_pending_endpoints()
db: Propagate feature_service to abstract_replication_strategy::validate_options()
db: config: Introduce experimental "TABLETS" feature
db: Log replication strategy for debugging purposes
db: Log full exception on error in do_parse_schema_tables()
db: keyspace: Remove non-const replication strategy getter
config: Reformat
in C++20, compiler generate operator!=() if the corresponding
operator==() is already defined, the language now understands
that the comparison is symmetric in the new standard.
fortunately, our operator!=() is always equivalent to
`! operator==()`, this matches the behavior of the default
generated operator!=(). so, in this change, all `operator!=`
are removed.
in addition to the defaulted operator!=, C++20 also brings to us
the defaulted operator==() -- it is able to generated the
operator==() if the member-wise lexicographical comparison.
under some circumstances, this is exactly what we need. so,
in this change, if the operator==() is also implemented as
a lexicographical comparison of all memeber variables of the
class/struct in question, it is implemented using the default
generated one by removing its body and mark the function as
`default`. moreover, if the class happen to have other comparison
operators which are implemented using lexicographical comparison,
the default generated `operator<=>` is used in place of
the defaulted `operator==`.
sometimes, we fail to mark the operator== with the `const`
specifier, in this change, to fulfil the need of C++ standard,
and to be more correct, the `const` specifier is added.
also, to generate the defaulted operator==, the operand should
be `const class_name&`, but it is not always the case, in the
class of `version`, we use `version` as the parameter type, to
fulfill the need of the C++ standard, the parameter type is
changed to `const version&` instead. this does not change
the semantic of the comparison operator. and is a more idiomatic
way to pass non-trivial struct as function parameters.
please note, because in C++20, both operator= and operator<=> are
symmetric, some of the operators in `multiprecision` are removed.
they are the symmetric form of the another variant. if they were
not removed, compiler would, for instance, find ambiguous
overloaded operator '=='.
this change is a cleanup to modernize the code base with C++20
features.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes#13687
Fix two issues with the replace operation introduced by recent PRs.
Add a test which performs a sequence of basic topology operations (bootstrap,
decommission, removenode, replace) in a new suite that enables the `raft`
experimental feature (so that the new topology change coordinator code is used).
Fixes: #13651Closes#13655
* github.com:scylladb/scylladb:
test: new suite for testing raft-based topology
test: remove topology_custom/test_custom.py
raft topology: don't require new CDC generation UUID to always be present
raft topology: include shard_count/ignore_msb during replace
That's, in fact, an independent change, because feature enabler doesn't
need this method. So this patch is like "while at it" thing, but on the
other hand it ditches one more qctx usage.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
All callers now have the system keyspace instance at hand.
Unfortunately, this de-static doesn't allow more qctx drops, because
both methods use set_|get_scylla_local_param helpers that do use qctx
and are still in use by other static methods.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This code belongs to feature service, system keyspace shoulnd't be aware
of any pecularities of startup features enabling, only loading and
saving the feature lists.
For now the move happens only in terms of code declarations, the
implementation is kept in its old place to reduce the patch churn.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
During node replace we don't introduce a new CDC generation, only during
regular bootstrap. Instead of checking that `new_cdc_generation_uuid`
must be present whenever there's a topology transition, only check it
when we're in `commit_cdc_generation` state.
Will be used by tablet-based replication strategies, for which
effective replication map is different per table.
Also, this patch adapts existing users of effective replication map to
use the per-table effective replication map.
For simplicity, every table has an effective replication map, even if
the erm is per keyspace. This way the client code can be uniform and
doesn't have to check whether replication strategy is per table.
Not all users of per-keyspace get_effective_replication_map() are
adapted yet to work per-table. Those algorithms will throw an
exception when invoked on a keyspace which uses per-table replication
strategy.
`seastar::current_backtrace()` can be quite heavey.
When we pass it to a log message in relatively detailed log_level
(debug/trace), we pay the price of `current_backtrace` every time,
but we rarely print the message.
Closes#13527
* github.com:scylladb/scylladb:
locator/topology: call seastar::current_backtrace only when log_level is enabled
schema_tables: call seastar::current_backtrace only when log_level is enabled
This series cleans up the generation and value types used in gms / gossiper.
Currently we use a blend of int, int32_t, and int64_t around messaging.
This change defines gms::generation_type and gms::version_type as int32_t
and add check in non-release modes that the respective int64 value passed over messaging do not overflow 32 bits.
Closes#12966
* github.com:scylladb/scylladb:
gossiper: version_generator: add {debug_,}validate_gossip_generation
gms: gossip_digest: use generation_type and version_type
gms: heart_beat_state: use generation_type and version_type
gms: versioned_value: use version_type
gms: version_generator: define version_type and generation_type strong types
utils: move generation-number to gms
utils: add tagged_integer
gms: versioned_value: make members private
scylla-gdb: add get_gms_versioned_value
gms: versioned_value: delete unused compare_to function
gms: gossip_digest: delete unused compare_to function
The only reason why it's there (right next to compaction_fwd.hh) is
because the database::table_truncate_state subclass needs the definition
of compaction_manager::compaction_reenabler subclass.
However, the former sub is not used outside of database.cc and can be
defined in .cc. Keeping it outside of the header allows dropping the
compaction_manager.hh from database.hh thus greatly reducing its fanout
over the code (from ~180 indirect inclusions down to ~20).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes#13622
Derived from utils::tagged_integer, using different tags,
the types are incompatible with each other and require explicit
typecasting to- and from- their value type.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>