Since 5d1f60439a we have
this node's host_id in topology config, so it can be used
to determine this node when adding it.
Prepare for extending the token_metadata interface
to provide host_id in update_topology.
We would like to compare the host_id first to be able to distinguish
this node from a node we're replacing that may have the same ip address
(but different host_id).
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
In branch 5.2 we erase `dc` from `_datacenters` if there are
no more endpoints listed in `_dc_endpoints[dc]`.
This was lost unintentionally in f3d5df5448
and this commit restores that behavior, and fixes test_remove_endpoint.
Fixesscylladb/scylladb#14896
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes#14897
Currently the node::state is coarse grained
so one cannot distinguish between e.g. a leaving
node due to decommission (where the node is used
for reading) vs. due to remove node (where the
node is not used for reading).
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
when comparing signed and unsigned numbers, the compiler promotes
the signed number to coomon type -- in this case, the unsigned type,
so they can be compared. but sometimes, it matters. and after the
promotion, the comparison yields the wrong result. this can be
manifested using a short sample like:
```
int main(int argc, char **argv) {
int x = -1;
unsigned y = 2;
fmt::print("{}\n", x < y);
return 0;
}
```
this error can be identified by `-Werror=sign-compare`, but before
enabling this compiling option. let's use `std::cmp_*()` to compare
them.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
The `locator::topology::config::this_host_id` field is redundant
in all places that use `locator::topology::config`, so we can
safely remove it.
Closes#14638Closes#14723
The eps reference was reused to manipulate
the racks dictionary. This resulted in
assigning a set of nodes from the racks
dictionary to an element of the _dc_endpoints dictionary.
The problem was demonstrated by the dtest
test_decommission_last_node_in_rack
(scylladb/scylla-dtest#3299).
The test set up four nodes, three on one rack
and one on another, all within a single data
center (dc). It then switched to a
'network_topology_strategy' for one keyspace
and tried to decommission the single node
on the second rack. This decomission command
with error message 'zero replica after the removal.'
This happened because unindex_node assigned
the empty list from the second rack
as a value for the single dc in
_dc_endpoints dictionary. As a result,
we got empty nodes list for single dc in
natural_endpoints_tracker::_all_endpoints,
node_count == 0 in data_center_endpoints,
_rf_left == 0, so
network_topology_strategy::calculate_natural_endpoints
rejected all the endpoints and returned an empty
endpoint_set. In
repair_service::do_decommission_removenode_with_repair
this caused the 'zero replica after the removal' error.
With this fix the test passes both with
--consistent-cluster-management option and
without it.
The specific unit test for this problem was added.
Fixes: #14184Closes#14673
`seastar::current_backtrace()` can be quite heavey.
When we pass it to a log message in relatively detailed log_level
(debug/trace), we pay the price of `current_backtrace` every time,
but we rarely print the message.
Closes#13527
* github.com:scylladb/scylladb:
locator/topology: call seastar::current_backtrace only when log_level is enabled
schema_tables: call seastar::current_backtrace only when log_level is enabled
Fixes a problem when raft-based topology is enabled, which loads
topology from storage. It starts by clearing topology and then adding
nodes one by one. Before this patch, this violates internal invariant
of topology object which puts the local node as the first node. This
would manifest by triggering an assert in topology::pop_node() which
throws if popping the node at index 0 in order to keep the information
about local node around. This is normally prevented by a check in
topology::remove_node() which avoid calling pop_node() if removing the
local node. But since there is no node which is marked as local, this
check allows the first node to be popped.
To fix the problem I lift the invariant that local node is always in
_nodes. We still have information about local node in config. Instead
of keeping it in _nodes, we recognize it as part of indexing. We also
allow removing the local node like a regular node.
The path which reloads topology works correctly after this, the local
node will be recognized when (if) it is added to the topology.
Fixes#13495
topology config may designate a different node than
get_broadcast_address() as local node. In particular, some tests don't
designate any node as the local node, which leads to logic errors
where current get_location(ep) for ep which happens to have the
address 127.0.0.1 returns location of the first node in _nodes rather
than ep.
Fix by looking up in _nodes first and fall back to local node if it's
equal to configured local node (if any).
Config is separate from state of the topology (nodes it
contains). Preserving the config will make it easier in later patches
to maintain invariants for cloned instances.
`seastar::current_backtrace()` can be quite heavey.
When we pass it to a log message in relatively detailed log_level
(debug/trace), we pay the price of `current_backtrace` every time,
but we rarely print the message.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Current if index_node throws when trying to
add an already indexed node, pop_node might
unindex the existing node instead of the new one.
Instead, with this change, unindex_node looks up
the node by its pointer and removed it from the
index map only if it's found there so to clean up
safely after index_node throws (at any stage).
Add a unit test to verify that.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Currently, opt_st overrides the internal `changed` flag
by setting it with the opt_st changed status.
Instead, it should use `|=` to keep it true if it is already so.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes#13502
Add a simple node state model with:
`joining`, `normal`, `leaving`, and `left` states
to help managing nodes during replace
with the the same ip address.
Later on, this could also help prevent nodes
that were decommissioned, removed, or replaced
from rejoining the cluster.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
And keep per node information (idx, host_id, endpoint, dc_rack, is_pending)
in node objects, indexed by topology on several indices like:
idx, host_id, endpoint, current/pending, per dc, per dc/rack.
The node index is a shorthand identifier for the node.
node* and index are valid while the respective topology instance is valid.
To be used, the caller must hold on to the topology / token_metadata object
(e.g. via a token_metadata_ptr or effective_replication_map)
Refs #6403
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
topology: add node idx
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Define get_location() that gets the location
for the local node, and use either this entry point
or get_location(inet_address) to get the respective
dc or rack.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Refactor the thread_local default_location out of
topology::get_location so it can be used elsewhere.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
This function is called on the fast data path
from storage_proxy when sorting multiple endpoints
by proximity.
This change calculates numeric node diff metrics
based on each address proximity to a given node
(by <dc, rack, same node>) to eliminate logic
branches in the function and reduce its footprint.
based on objdump -d output, compare_endpoints
footprint was reduced by 58.5% (3632 / 8752 bytes)
with clang version 15.0.7 (Fedora 15.0.7-1.fc37)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Now, with a44ca06906,
is_normal_token_owner that replaced is_member
does not rely anymore on the pending status
of endpoints in topology.
With that we can get rid of this state and just keep
all endpoints we know about in the topology.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Recent changes in topology restricted the get_dc/get_rack calls. Older
code was trying to locate the endpoint in gossiper, then in system
keyspace cache and if the endpoint was not found in both -- returned
"default" location.
New code generates internal error in this case. This approach already
helped to spot several BUGs in code that had been eventually fixed, but
echoes of that change still pop up.
This patch relaxes the "missing endpoint" case by printing a warning in
logs and returning back the "default" location like old code did.
tests: update_cluster_layout_tests.py::*
hintedhandoff_additional_test.py::TestHintedHandoff::test_hintedhandoff_rebalance
bootstrap_test.py::TestBootstrap::test_decommissioned_wiped_node_can_join
bootstrap_test.py::TestBootstrap::test_failed_bootstap_wiped_node_can_join
materialized_views_test.py::TestMaterializedViews::test_decommission_node_during_mv_insert_4_nodes
refs: #11900
refs: #12054fixes: #11870
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes#12067
Returns an unordered set of datacenter names
to be used by network_topology_replication_strategy
and for ks_prop_defs.
The set is kept in sync with _dc_endpoints.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes#12023