Commit Graph

725 Commits

Author SHA1 Message Date
Pavel Emelyanov
eb3b237e05 tablet_allocator: Add initial tablets scale to config
When allocating tablets for table for the frist time their initial count
is calculated so that each shard in a cluster gets one tablet. It may
happen that more than one initial tablet per shard is better, e.g. perf
tests typically rely on that.

It's possible to specify the initial tablets count when creating a
keyspace, this number doesn't take the cluster topology into
consideration and may also be not very nice.

As a temporary solution (e.g. for perf tests) we may add a configurable
that scales the initial number of calculated tablets by some factor

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-01-22 19:14:45 +03:00
Kefu Chai
ce076b5ae3 gossiping_property_file_snitch: drop unused using namespace
we don't use any symbol in this namespace, in this function, so drop it.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16893
2024-01-21 16:48:37 +02:00
Pavel Emelyanov
8595d64d01 locator: Handle replication factor of 0 for initial_tablets calculations
When calculating per-DC tablets the formula is shards_in_dc / rf_in_dc,
but the denominator in it can be configured to be literally zero and the
division doesn't work.

Fix by assuming zero tablets for dcs with zero rf

fixes: #16844

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#16861
2024-01-18 19:42:08 +02:00
Kefu Chai
0ae81446ef ./: not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16766
2024-01-17 16:30:14 +02:00
Pavel Emelyanov
941f6d8fca cql: Move initial_tablets from REPLICATION to TABLETS in DDL
This patch changes the syntax of enabling tablets from

  CREATE KEYSPACE ... WITH REPLICATION = { ..., 'initial_tablets': <int> }

to be

  CREATE KEYSPACE ... WITH TABLETS = { 'initial': <int> }

and updates all tests accordingly.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-01-15 13:04:48 +03:00
Pavel Emelyanov
4c4a9679d8 network_topology_strategy: Estimate initial_tablets if 0 is set
If user configured zero initial tablets (spoiler: or this value was set
automagically when enabling tablets begind the scenes) we still need
some value to start with and this patch calculates one.

The math is based on topology and RF so that all shards are covered:

initial_tablets = max(nr_shards_in(dc) / RF_in(dc) for dc in datacenters)

The estimation is done when a table is created, not when the keyspace is
created. For that, the keyspace is configured with zero initial tabled,
and table-creation time zero is converted into auto-estimated value.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-01-15 13:04:48 +03:00
Gleb Natapov
23a27ccc24 vnode_effective_replication_map: add get_all_pending_nodes() function
Add a function that returns all nodes that have vnode been moved to them
during a topology change operation. Needed to know which nodes need to
do cleanup in case of failed topology change operation.
2024-01-14 14:37:16 +02:00
Gleb Natapov
a8f11852da vnode_effective_replication_map: pre calculate dirty endpoints during topology change
Some topology change operations causes some nodes loose ranges. This
information is needed to know which nodes need to do cleanup after
topology operation completes. Pre calculate it during erm creation.
2024-01-14 14:11:19 +02:00
Petr Gusev
41c15814e6 erm: for_each_natural_endpoint_until: use is_vnode == true
This is an optimisation - for_each_natural_endpoint_until is
called only for vnode tokens, we don't need to run the
binary search for it in tm.first_token.

Also the function is made private since it's only used
in erm itself.
2024-01-12 12:23:22 +04:00
Petr Gusev
07f2ec63c7 erm: switch the internal data structures to host_id-s
Before this patch the host_id -> IP mapping was done
in calculate_effective_replication_map. This function
is called from mutate_token_metadata, which means we
have to have an IP for each host_id in topology_state_load,
otherwise we get an error. We are going to remove
the IP waiting loop from topology_state_load, so
we need to get rid of IPs resolution from
calculate_effective_replication_map.

In this patch we move the host_id -> IP resolution to
the data plane. When a write or read request is sent
the target endpoints are requested from erm through
get_natural_endpoints_without_node_being_replaced,
get_pending_endpoints and get_endpoints_for_reading
methods and this is where the IP resolution
will now occur.
2024-01-12 12:23:22 +04:00
Petr Gusev
1928dc73a8 erm: has_pending_ranges: switch to host_id
In the next patches we are going to change erm data structures
(replication_map and ring_mapping) from IP to host_id. Having
locator::host_id instead of IP in has_pending_ranges arguments
makes this transition easier.
2024-01-12 12:23:19 +04:00
Pavel Emelyanov
cfeff893c6 network_topology_strategy: Print map of dc:rf pairs in one go
The strategy constructor prints the dc:rf at the end making the sstring
for it by hand. Modern fmt-based logger can format unordered_map-s on
its own. The message would look slightly different though:

  Configured datacenter replicas are: foo:1 bar:2

into

  Configured datacenter replicas are: {"foo": 1, "bar": 2}

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#16443
2024-01-09 11:30:49 +02:00
Kefu Chai
2c394e3f6f tablets: remove unused #includes
the removed #include headers are not used, so let's drop their
`#include`s.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16619
2024-01-03 15:30:40 +01:00
Benny Halevy
ad8a9104d8 endpoint_state subscriptions: batch on_change notification
Rather than calling on_change for each particular
application_state, pass an endpoint_state::map_type
with all changed states, to be processed as a batch.

In particular, thise allows storage_service::on_change
to update_peer_info once for all changed states.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-31 18:37:34 +02:00
Takuya ASADA
0b894a7cac locator::ec2_snitch: change retry logic to exponential backoff
Since Amazon recommended to use exponential backoff logic when retries
to call AWS API, we should switch the logic on ec2_snitch.

see https://docs.aws.amazon.com/general/latest/gr/api-retries.html

Related with #12160

Closes scylladb/scylladb#13442
2023-12-25 18:17:23 +02:00
Pavel Emelyanov
c43501d973 locator,schema: Move initial tablets from r.s. options to params
The option is kepd in DDL, but is _not_ stored in
system_schema.keyspaces. Instead, it's removed from the provided options
and kept in scylla_keyspaces table in its own column. All the places
that had optional initial_tablets disengaged now set this value up the
way the find appropriate.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-25 16:07:10 +03:00
Pavel Emelyanov
562fcf0c19 locator: Keep optional initial_tablets on r.s. params
Now all the callers have it at hands (spoiler: not yet initialized, but
still) so the params can also have it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-25 16:02:41 +03:00
Pavel Emelyanov
45f4276de6 locator: Pass abstract_replication_strategy& into validate_tablet_options()
It will need to check if the r.s. in question had been marked as
per-table one in next patches.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-25 15:56:49 +03:00
Pavel Emelyanov
bf824d79d9 locator: Carry r.s. params into process_tablet_options()
The latter method is the one that will need extended params in next
patches. It's called from network_topology_strategy() constructor which
already has params at hand.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-25 15:56:02 +03:00
Pavel Emelyanov
a943bd927b locator: Call create_replication_strategy() with r.s. params
Previous patch added params to r.s. classes' constructors, but callers
don't construct those directly, instead they use the create_r.s.()
wrapper. This patch adds params to the wrapper too.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-25 15:54:59 +03:00
Pavel Emelyanov
f88ba0bf5a locator: Wrap replication_strategy_config_options into replication_strategy_params
When replication strategy class is created caller parr const reference
on the config options which is, in turn, a map<string, string>. In the
future r.s. classes will need to get "scylla specific" info along with
legacy options and this patch prepares for that by passing more generic
params argument into constructor. Currently the only inhabitant of the
new params is the legacy options.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-25 15:53:03 +03:00
Pavel Emelyanov
ecbafd81f2 locator: Use local members in ..._replication_strategy constructors
The `config_options` arg had been used to initialize `_config_options`
field of the base abstract_replication_strategy class, so it's more
idiomatic to use the latter. Also it makes next patches simpler.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-25 15:51:51 +03:00
Raphael S. Carvalho
bcbba9a5e3 locator: Introduce tablet_map::get_tablet_id_and_range_side(token)
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-12-17 11:26:32 -03:00
Kefu Chai
10a11c2886 token_metadata: pass node id when formatting it
before this change, we use the format string of
"Can't replace node {} with itself", but fail to include the host id as seastar::format()'s arguments. this fails the compile-time check of fmt, which is yet merged. so, if we really run into this problem, {fmt} would throw before the intended runtime_error is raised -- currently, seastar::log formats the logging messages at runtime, this is not intended.

in this change, we pass `existing_node`, so it can be formatted, and the
intended error message can be printed in log.

Refs 11a4908683
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16422
2023-12-15 16:43:44 +01:00
Kefu Chai
764d1e01da locator: include used headers
* exceptions/exceptions.hh is not used
* std::set is not used, while std::unordered_set is uset

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16406
2023-12-14 16:14:01 +01:00
Petr Gusev
9d93a518ac topology: remove_endpoint: remove inet_address overload
The overload was used only in tests.
2023-12-12 23:19:54 +04:00
Petr Gusev
fbf507b1ba token_metadata: topology: cleanup add_or_update_endpoint
Make host_id parameter non-optional and
move it to the beginning of the arguments list.

Delete unused overloads of add_or_update_endpoint.

Delete unused overload of token_metadata::update_topology
with inet_address argument.
2023-12-12 23:19:54 +04:00
Petr Gusev
11a4908683 token_metadata: add_replacing_endpoint: forbid replacing node with itself
This used to work before in replace-with-same-ip scenario, but
with host_id-s it's no longer relevant.

base_token_metadata has been removed from topology_change_info
because the conditions needed for its creation
are no longer met.
2023-12-12 23:19:54 +04:00
Petr Gusev
3b59919a9c topology: drop key_kind, host_id is now the primary key 2023-12-12 23:19:54 +04:00
Petr Gusev
8c551f9104 dc_rack_fn: make it non-template 2023-12-12 23:19:54 +04:00
Petr Gusev
7b55ccbd8e token_metadata: drop the template
Replace token_metadata2 ->token_metadata,
make token_metadata back non-template.

No behavior changes, just compilation fixes.
2023-12-12 23:19:54 +04:00
Petr Gusev
799f747c8f shared_token_metadata: switch to the new token_metadata 2023-12-12 23:19:54 +04:00
Petr Gusev
11cc21d0a9 erm: switch to the new token_metadata
In this commit we replace token_metadata with token_metadata2
in the erm interface and field types. To accommodate the change
some of strategy-related methods are also updated.

All the boost and topology tests pass with this change.
2023-12-12 23:19:53 +04:00
Petr Gusev
f53f34f989 storage_service: get_token_to_endpoint_map: use new token_metadata
The token_metadata::get_normal_and_bootstrapping_token_to_endpoint_map
method was used only here. It's inlined in this
commit since it's too specific and incurs the overhead
of creating an intermediate map.
2023-12-12 23:19:53 +04:00
Petr Gusev
b2fb650098 calculate_natural_endpoints: fix indentation 2023-12-12 23:19:53 +04:00
Petr Gusev
80ccbc0d53 calculate_natural_endpoints: switch to token_metadata2
All usages of calculate_natural_endpoints are migrated,
now we can change its interface to take token_metadata2
instead of token_metadata.
2023-12-12 23:19:53 +04:00
Petr Gusev
ef534ac876 rebuild_with_repair, replace_with_repair: use new token_metadata
Just mechanical changes to the new token_metadata.

All the boost and topology tests pass with this change.
2023-12-12 23:19:53 +04:00
Petr Gusev
93263bf9e7 bootstrap: use new token_metadata
Just mechanical changes to the new token_metadata.

All the boost and topology tests pass with this change.
2023-12-12 23:19:53 +04:00
Petr Gusev
d9283bd025 tablets: switch to token_metadata2
locator_topology_test, network_topology_strategy_test and
tablets_test are fully switched to the host_id-based token_metadata,
meaning they no longer populate the old token_metadata.

All the boost and topology tests pass with this change.
2023-12-12 23:19:53 +04:00
Petr Gusev
f5038f6c72 calculate_effective_replication_map: use new token_metadata
In this commit we switch the function
calculate_effective_replication_map to use the new
token_metadata. We do this by employing our new helper
calculate_natural_ips function. We can't use this helper for
current_endpoints/target_endpoints though,
since in that case we won't add the IP to the
pending_endpoints in the replace-with-same-ip scenario

The token_metadata_test is migrated to host_ids in the same
commit to make it pass. Other tests work because they fill
both versions of the token_metadata, but for this test it was
simpler to just migrate it straight away. The test constructs
the old token_metadata over the new token_metadata,
this means only the get_new() method will work on it. That's
why we also need to switch some other functions
(maybe_remove_node_being_replaced, do_get_natural_endpoints,
get_replication_factor) to the new version in the same commit.

All the boost and topology tests pass with this change.
2023-12-12 23:19:53 +04:00
Petr Gusev
fe3c543c4e calculate_natural_endpoints: fix formatting 2023-12-12 23:19:53 +04:00
Petr Gusev
d5b4b02b28 abstract_replication_strategy: calculate_natural_endpoints: make it work with both versions of token_metadata
We've updated all the places where token_metadata
is mutated, and now we can progress to the next stage
of the refactoring - gradually switching the read
code paths.

The calculate_natural_endpoints function
is at the core of all of them. It decides to what nodes
the given token should be replicated to for the given
token_metadata. It has a lot of usages in various contexts,
we can't switch them all in one commit, so instead we
allowed the function to behave in both ways. If
use_host_id parameter is false, the function uses the provided
token_metadata as is and returns endpoint_set as a result.
If it's true, it uses get_new() on the provided token_metadata
and returns host_id_set as a result.

The scope of the whole refactoring is limited to the erm data
structure, its interface will be kept inet_address based for now.
This means we'll often need to resolve host_ids to inet_address-es
as soon as we got a result from calculated_natural_endpoints.
A new calculate_natural_ips function is added for convenience.
It uses the new token_metadata and immediately resolves
returned host_id-s to inet_address-es.

The auxiliary declarations natural_ep_type, set_type, vector_type,
get_self_id, select_tm are introduced only for the sake of
migration, they will be removed later.
2023-12-12 23:19:53 +04:00
Petr Gusev
e4253776a1 locator::topology: allow being_replaced and replacing nodes to have the same IP
When we're replacing a node with the same IP address, we want
the following behavior:
  * host_id -> IP mapping should work and return the same IP address for two
  different host_ids - old and new.
  * the IP -> host_id mapping should return the host_id of the old (replaced)
  host.
This variant is most convenient for preserving the current behavior
of the code, especially the functions maybe_remove_node_being_replaced,
erm::get_natural_endpoints_without_node_being_replaced,
erm::get_pending_endpoints. The 'being_replaced' node will be properly removed in
maybe_remove_node_being_replaced and 'replacing' node will be added to
the pending_endpoints.
2023-12-11 12:51:34 +04:00
Petr Gusev
5a1418fdba token_metadata: get_endpoint_for_host_id -> get_endpoint_for_host_id_if_known
This commit fixes an inconsistency in method names:
get_host_id and get_host_id_if_known are
(internal_error, returns null), but there was only
one method for the opposite conversion - get_endpoint_for_host_id,
and it returns null. In this commit we change it to on_internal_error
if it can't find the argument and add another method
get_endpoint_for_host_id_if_known which returns null in this case.

We can't use get_endpoint_for_host_id/get_host_id
in host_id_or_endpoint::resolve since it's called
from storage_service::parse_node_list
-> token_metadata::parse_host_id_and_endpoint,
and exceptions are caught and handled in
`storage_service::parse_node_list`.
2023-12-11 12:51:34 +04:00
Petr Gusev
08b47d645a token_metadata: get_host_id: exception -> on_internal_error
It's a bug to use get_host_id on a non-existent endpoint,
so on_internal_error is more appropriate. Also, it's
easier to debug since it provides a backtrace.

If a missing inet_address is expected, get_host_id_if_known
should be used instead. We update one such case in
storage_service::force_remove_completion. Other
usages of get_host_id are correct.
2023-12-11 12:51:34 +04:00
Petr Gusev
39bbe5f457 token_metadata: add get_all_ips method
This is convenient for migrating code that uses
get_all_endpoints.
2023-12-11 12:51:34 +04:00
Petr Gusev
9edf0709e6 token_metadata: support host_id-based version
In this commit we enhance token_metadata with a pointer to the
new host_id-based generic_token_metadata specialisation (token_metadata2).
The idea is that in the following commits we'll go over all token_metadata
modifications and make the corresponding modifications to its new
host_id-based alternative.

The pointer to token_metadata2 is stored in the
generic_token_metadata::_new_value field. The pointer can be
mutable, immutable, or absent altogether (std::monostate).
It's mutable if this generic_token_metadata owns it, meaning
it was created using the generic_token_metadata(config cfg)
constructor. It's immutable if the
generic_token_metadata(lw_shared_ptr<const token_metadata2> new_value);
constructor was used. This means this old token_metadata is a wrapper for
new token_metadata and we can only use the get_new() method on it. The field
_new_value is empty for the new host_id-based token_metadata version.

The generic_token_metadata(std::unique_ptr<token_metadata_impl<NodeId>> impl, token_metadata2 new_value);
constructor is used for clone methods. We clone both versions,
and we need to pass a cloned token_metadata2 into constructor.

There are two overloads of get_new, for mutable and immutable
generic_token_metadata. Both of them throws an exception if
they can't get the appropriate pointer. There is also a
get_new_strong method, which returns an immutable owning
pointer. This is convenient since a lot of API's want an
owning pointer. We can't make the get_new/get_new_strong API
simpler and use get_new_strong everywhere since it mutate the
original generic_token_metadata by incrementing the reference
counter and this causes raises when it's passed between
shards in replicate_to_all_cores.
2023-12-11 12:51:34 +04:00
Petr Gusev
63f64f3303 token_metadata: make it a template with NodeId=inet_address/host_id
NodeId is used in all internal token_metadata data structures, that
previously used inet_address. We choose topology::key_kind based
on the value of the template parameter.

generic_token_metadata::update_topology overload with host_id
parameter is added to make update_topology_change_info work,
it now uses NodeId as a parameter type.

topology::remove_endpoint(host_id) is added to make
generic_token_metadata::remove_endpoint(NodeId) work.

pending_endpoints_for and endpoints_for_reading are just removed - they
are not used and not implemented. The declarations were left by mistake
from a refactoring in which these methods were moved to erm.

generic_token_metadata_base is extracted to contain declarations, common
to both token_metadata versions.

Templates are explicitly instantiated inside token_metadata.cc, since
implementation part is also a template and it's not exposed to the header.

There are no other behavioral changes in this commit, just syntax
fixes to make token_metadata a template.
2023-12-11 12:51:34 +04:00
Petr Gusev
c9fbe3d377 locator: make dc_rack_fn a template
In the next commits token_metadata will be
made a template with NodeId=inet_address|host_id
parameter. This parameter will be passed to dc_rack_fn
function, so it also should be made a template.
2023-12-11 12:51:33 +04:00
Piotr Dulikowski
5227b71363 locator/topology: add key_kind parameter
For the host_id-based token_metadata we want host_id
to be the main node key, meaning it should be used
in add_or_update_endpoint to find the node to update.
For the inet_address-based token_metadata version
we want to retain the old behaviour during transition period.

In this commit we introduce key_kind parameter and use
key_kind::inet_address in all current topology usages.
Later we'll use key_kind::host_id for the new token_metadata.

In the last commits of the series, when the new token_metadata
version is used everywhere, we will remove key_kind enum.
2023-12-11 12:51:33 +04:00