28 Commits

Author SHA1 Message Date
Benny Halevy
d903d03bf8 locator: topology: node::state: make fine grained
Currently the node::state is coarse grained
so one cannot distinguish between e.g. a leaving
node due to decommission (where the node is used
for reading) vs. due to remove node (where the
node is not used for reading).
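
A minimal sketch of the distinction, not the patch itself. The state
names below are hypothetical (this message does not list the new
states); only the reading rule is taken from the text above.

```cpp
#include <cassert>

// Hypothetical fine-grained states; the names are illustrative only.
enum class node_state {
    joining,
    normal,
    being_decommissioned,  // leaving, still used for reading
    being_removed,         // leaving, not used for reading
    left,
};

// A coarse-grained "leaving" check cannot tell the two cases apart.
constexpr bool is_leaving(node_state s) {
    return s == node_state::being_decommissioned || s == node_state::being_removed;
}

// With fine-grained states the read path can make the distinction.
constexpr bool used_for_reading(node_state s) {
    return s == node_state::normal || s == node_state::being_decommissioned;
}

inline void demo_node_state() {
    assert(is_leaving(node_state::being_decommissioned));
    assert(is_leaving(node_state::being_removed));
    assert(used_for_reading(node_state::being_decommissioned));
    assert(!used_for_reading(node_state::being_removed));
}
```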

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-07-31 10:33:48 +03:00
Kefu Chai
3129ae3c8c treewide: compare signed and unsigned using std::cmp_*()
when comparing signed and unsigned numbers, the compiler converts
the signed number to a common type (in this case, the unsigned type)
so they can be compared. sometimes this matters: after the
conversion, the comparison yields the wrong result. this can be
demonstrated using a short sample like:

```
#include <fmt/core.h>

int main(int argc, char **argv) {
    int x = -1;
    unsigned y = 2;
    // prints "false": x is converted to unsigned before the comparison
    fmt::print("{}\n", x < y);
    return 0;
}
```

this error can be identified by `-Werror=sign-compare`, but before
enabling this compiler option, let's use `std::cmp_*()` to compare
them.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-07-18 10:27:18 +08:00
Patryk Jędrzejczak
7ae7be0911 locator: remove this_host_id from topology::config
The `locator::topology::config::this_host_id` field is redundant
in all places that use `locator::topology::config`, so we can
safely remove it.

Closes #14638

Closes #14723
2023-07-17 14:57:36 +02:00
Petr Gusev
3737bf8fa2 topology.cc: unindex_node: _dc_racks removal fix
The eps reference was reused to manipulate
the racks dictionary. This resulted in
assigning a set of nodes from the racks
dictionary to an element of the _dc_endpoints dictionary.

The problem was demonstrated by the dtest
test_decommission_last_node_in_rack
(scylladb/scylla-dtest#3299).
The test set up four nodes, three on one rack
and one on another, all within a single data
center (dc). It then switched to a
'network_topology_strategy' for one keyspace
and tried to decommission the single node
on the second rack. This decommission command
failed with the error message 'zero replica after the removal.'
This happened because unindex_node assigned
the empty list from the second rack
as a value for the single dc in
_dc_endpoints dictionary. As a result,
we got empty nodes list for single dc in
natural_endpoints_tracker::_all_endpoints,
node_count == 0 in data_center_endpoints,
_rf_left == 0, so
network_topology_strategy::calculate_natural_endpoints
rejected all the endpoints and returned an empty
endpoint_set. In
repair_service::do_decommission_removenode_with_repair
this caused the 'zero replica after the removal' error.
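
The following is a reconstruction of the bug's general shape, not
the actual topology code. The struct and function names are
hypothetical; only the two dictionaries and the four-node layout
come from the description above. The key point is that assigning
to a C++ reference writes through it rather than rebinding it.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

using endpoint_set = std::set<std::string>;

// Hypothetical stand-in for the two dictionaries named above.
struct dc_rack_index {
    std::map<std::string, endpoint_set> dc_endpoints;
    std::map<std::string, std::map<std::string, endpoint_set>> dc_racks;

    void add(const std::string& dc, const std::string& rack, const std::string& ep) {
        dc_endpoints[dc].insert(ep);
        dc_racks[dc][rack].insert(ep);
    }

    // Buggy shape: `eps` is reused, so the assignment copies the rack's
    // set into dc_endpoints[dc] instead of pointing `eps` at the rack.
    void remove_buggy(const std::string& dc, const std::string& rack, const std::string& ep) {
        auto& eps = dc_endpoints[dc];
        eps.erase(ep);
        eps = dc_racks[dc][rack];  // overwrites the per-dc set
        eps.erase(ep);             // still edits dc_endpoints[dc]
    }

    // Fixed shape: one reference per dictionary.
    void remove_fixed(const std::string& dc, const std::string& rack, const std::string& ep) {
        auto& dc_eps = dc_endpoints[dc];
        dc_eps.erase(ep);
        auto& rack_eps = dc_racks[dc][rack];
        rack_eps.erase(ep);
    }
};

// Three nodes on one rack and one on another, all in a single dc.
inline dc_rack_index four_node_cluster() {
    dc_rack_index idx;
    idx.add("dc1", "rack1", "n1");
    idx.add("dc1", "rack1", "n2");
    idx.add("dc1", "rack1", "n3");
    idx.add("dc1", "rack2", "n4");
    return idx;
}

inline void demo_unindex() {
    auto bad = four_node_cluster();
    bad.remove_buggy("dc1", "rack2", "n4");
    assert(bad.dc_endpoints["dc1"].empty());  // the dc lost all its nodes

    auto good = four_node_cluster();
    good.remove_fixed("dc1", "rack2", "n4");
    assert(good.dc_endpoints["dc1"].size() == 3);
}
```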

With this fix the test passes both with
--consistent-cluster-management option and
without it.

The specific unit test for this problem was added.

Fixes: #14184

Closes #14673
2023-07-13 11:16:01 +03:00
Tomasz Grabiec
e110167a2a locator: Store node shard count in topology
Will be needed by tablet allocator.
2023-06-21 00:58:25 +02:00
Botond Dénes
85abece927 Merge 'Restrict logging of current_backtrace to log_level' from Benny Halevy
`seastar::current_backtrace()` can be quite heavy.
When we pass it to a log message in relatively detailed log_level
(debug/trace), we pay the price of `current_backtrace` every time,
but we rarely print the message.

Closes #13527

* github.com:scylladb/scylladb:
  locator/topology: call seastar::current_backtrace only when log_level is enabled
  schema_tables: call seastar::current_backtrace only when log_level is enabled
2023-04-24 08:50:32 +03:00
Tomasz Grabiec
0ec700cd00 locator: topology: Fix move assignment
Defaulted assignment doesn't update node::_topology.
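
A sketch of why a defaulted move assignment is not enough when
nodes carry a back-pointer to their owner. The classes here are
hypothetical; `owner` stands in for node::_topology.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

class topology_sketch;

struct node_sketch {
    const topology_sketch* owner = nullptr;  // back-pointer to the owning topology
};

class topology_sketch {
    std::vector<std::unique_ptr<node_sketch>> _nodes;

    // Re-point every node at this instance after taking over the vector.
    void rebind() {
        for (auto& n : _nodes) {
            n->owner = this;
        }
    }
public:
    topology_sketch() = default;
    topology_sketch(topology_sketch&& o) noexcept : _nodes(std::move(o._nodes)) {
        rebind();
    }
    // A defaulted operator= would move _nodes but leave each node's
    // owner pointing at the moved-from object.
    topology_sketch& operator=(topology_sketch&& o) noexcept {
        if (this != &o) {
            _nodes = std::move(o._nodes);
            rebind();
        }
        return *this;
    }
    node_sketch& add_node() {
        _nodes.push_back(std::make_unique<node_sketch>());
        _nodes.back()->owner = this;
        return *_nodes.back();
    }
    bool all_nodes_point_here() const {
        for (const auto& n : _nodes) {
            if (n->owner != this) {
                return false;
            }
        }
        return true;
    }
};

inline void demo_move_assignment() {
    topology_sketch a;
    a.add_node();
    a.add_node();
    topology_sketch b;
    b = std::move(a);
    assert(b.all_nodes_point_here());
}
```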
2023-04-20 23:39:18 +02:00
Tomasz Grabiec
6ed841b8d7 locator: topology: Add printer
2023-04-20 23:39:18 +02:00
Tomasz Grabiec
7d3384089a locator: topology: Recognize local node as part of indexing it
Fixes a problem when raft-based topology is enabled, which loads
topology from storage. It starts by clearing topology and then adding
nodes one by one. Before this patch, this violated an internal invariant
of the topology object, which puts the local node as the first node. This
would manifest by triggering an assert in topology::pop_node(), which
throws when popping the node at index 0 in order to keep the information
about the local node around. This is normally prevented by a check in
topology::remove_node(), which avoids calling pop_node() when removing the
local node. But since there is no node which is marked as local, this
check allows the first node to be popped.

To fix the problem I lift the invariant that local node is always in
_nodes. We still have information about local node in config. Instead
of keeping it in _nodes, we recognize it as part of indexing. We also
allow removing the local node like a regular node.

The path which reloads topology works correctly after this: the local
node will be recognized when (if) it is added to the topology.

Fixes #13495
2023-04-20 23:39:18 +02:00
Tomasz Grabiec
eb9d6df8bf locator: topology: Fix get_location(ep) for local node
topology config may designate a different node than
get_broadcast_address() as local node. In particular, some tests don't
designate any node as the local node, which leads to logic errors
where current get_location(ep) for ep which happens to have the
address 127.0.0.1 returns location of the first node in _nodes rather
than ep.

Fix by looking up ep in _nodes first and falling back to the local node
if ep is equal to the configured local node (if any).
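
The lookup order can be sketched as follows. This is a simplified,
hypothetical model (string addresses, a plain map for _nodes, and
an optional return for the not-found case), not the real signature.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <unordered_map>

struct location {
    std::string dc;
    std::string rack;
};

// Hypothetical, simplified stand-in for the topology object.
struct location_lookup {
    std::unordered_map<std::string, location> nodes;  // address -> location
    std::optional<std::string> local_address;         // configured local node, if any
    location local_location;                          // location of the local node

    std::optional<location> get_location(const std::string& ep) const {
        // 1. Prefer what is known about ep itself.
        if (auto it = nodes.find(ep); it != nodes.end()) {
            return it->second;
        }
        // 2. Fall back to the local node only if ep is the configured local node.
        if (local_address && *local_address == ep) {
            return local_location;
        }
        // What happens otherwise is out of scope for this sketch.
        return std::nullopt;
    }
};

inline void demo_get_location() {
    location_lookup t;  // no local node designated, as in some tests
    t.nodes["10.0.0.5"] = location{"dc_first", "rack_first"};
    t.nodes["127.0.0.1"] = location{"dc1", "rack1"};
    assert(t.get_location("127.0.0.1")->dc == "dc1");
}
```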
2023-04-20 23:39:18 +02:00
Tomasz Grabiec
0a675291dd locator: topology: Fix typo
2023-04-20 23:39:18 +02:00
Tomasz Grabiec
0b1dfb2683 locator: topology: Preserve config when cloning
Config is separate from state of the topology (nodes it
contains). Preserving the config will make it easier in later patches
to maintain invariants for cloned instances.
2023-04-20 23:39:18 +02:00
Benny Halevy
58129fad92 locator/topology: call seastar::current_backtrace only when log_level is enabled
`seastar::current_backtrace()` can be quite heavy.
When we pass it to a log message in relatively detailed log_level
(debug/trace), we pay the price of `current_backtrace` every time,
but we rarely print the message.
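
A sketch of the guard pattern, using a hypothetical minimal logger
and a counter in place of the real backtrace call. The names
`logger_sketch` and `expensive_backtrace` are assumptions made for
illustration; they stand in for the real logger's level check and
for `seastar::current_backtrace()`.

```cpp
#include <cassert>
#include <string>

enum class log_level { error, warn, info, debug, trace };  // least to most verbose

// Hypothetical minimal logger with a level check.
struct logger_sketch {
    log_level threshold = log_level::info;
    int printed = 0;

    bool is_enabled(log_level l) const { return l <= threshold; }
    void log(log_level l, const std::string& msg) {
        if (is_enabled(l)) {
            ++printed;  // a real logger would write msg somewhere
        }
        (void)msg;
    }
};

inline int backtraces_captured = 0;

// Stand-in for an expensive call such as capturing a backtrace.
inline std::string expensive_backtrace() {
    ++backtraces_captured;
    return "<backtrace>";
}

// Pays for the backtrace even when the message is dropped.
inline void log_eager(logger_sketch& lg) {
    lg.log(log_level::debug, "removing node, at: " + expensive_backtrace());
}

// Pays for the backtrace only when the message will be printed.
inline void log_guarded(logger_sketch& lg) {
    if (lg.is_enabled(log_level::debug)) {
        lg.log(log_level::debug, "removing node, at: " + expensive_backtrace());
    }
}
```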

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-16 14:22:06 +03:00
Benny Halevy
e29994b2aa topology: add_node, unindex_node: make exception safe
Currently, if index_node throws when trying to
add an already indexed node, pop_node might
unindex the existing node instead of the new one.

Instead, with this change, unindex_node looks up
the node by its pointer and removes it from the
index map only if it's found there, so as to clean up
safely after index_node throws (at any stage).
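
A minimal sketch of the idea with a single index, assuming a
hypothetical `node_index` class keyed by host id. The real code
maintains several indices; the pointer check is the part that
matters here.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <unordered_map>

struct node {
    std::string host_id;
};

class node_index {
    std::unordered_map<std::string, const node*> _by_host_id;
public:
    void index_node(const node* n) {
        if (!_by_host_id.emplace(n->host_id, n).second) {
            throw std::runtime_error("node already indexed: " + n->host_id);
        }
    }
    // Erase the entry only if it refers to this very node object, so
    // cleaning up after a failed index_node never drops another node.
    void unindex_node(const node* n) {
        auto it = _by_host_id.find(n->host_id);
        if (it != _by_host_id.end() && it->second == n) {
            _by_host_id.erase(it);
        }
    }
    void add_node(const node* n) {
        try {
            index_node(n);
        } catch (...) {
            unindex_node(n);  // safe even though n was never indexed
            throw;
        }
    }
    const node* find(const std::string& host_id) const {
        auto it = _by_host_id.find(host_id);
        return it == _by_host_id.end() ? nullptr : it->second;
    }
};

inline bool add_node_throws(node_index& idx, const node* n) {
    try {
        idx.add_node(n);
    } catch (const std::runtime_error&) {
        return true;
    }
    return false;
}
```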

Add a unit test to verify that.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-14 17:51:05 +03:00
Benny Halevy
b71f229fc2 topology: node: update_node: do not override internal changed flag by state option
Currently, opt_st overrides the internal `changed` flag
by setting it with the opt_st changed status.
Instead, it should use `|=` to keep it true if it is already so.
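
For illustration, a simplified and hypothetical update function
(the field names and the `opt_dc` parameter are assumptions; only
`opt_st` and `changed` come from the text above):

```cpp
#include <cassert>
#include <optional>
#include <string>

enum class state { joining, normal, leaving, left };

struct node_sketch {
    std::string dc;
    state st = state::normal;
};

// Returns true if anything about the node changed.
inline bool update_node(node_sketch& n,
                        std::optional<std::string> opt_dc,
                        std::optional<state> opt_st) {
    bool changed = false;
    if (opt_dc && *opt_dc != n.dc) {
        n.dc = *opt_dc;
        changed = true;
    }
    if (opt_st) {
        // `changed = (*opt_st != n.st)` would discard the dc change above.
        changed |= (*opt_st != n.st);
        n.st = *opt_st;
    }
    return changed;
}

inline void demo_update_node() {
    node_sketch n{"dc1", state::normal};
    // dc changes, state stays the same: the result must still be true.
    assert(update_node(n, std::string("dc2"), state::normal));
}
```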

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #13502
2023-04-13 17:46:59 +02:00
Benny Halevy
7b76369ffc topology: add for_each_node
To eventually replace token_metadata::get_endpoint_to_host_id_map_for_reading

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-11 15:55:39 +03:00
Benny Halevy
c17df1759e topology: add node state
Add a simple node state model with:
`joining`, `normal`, `leaving`, and `left` states
to help manage nodes during replace
with the same IP address.

Later on, this could also help prevent nodes
that were decommissioned, removed, or replaced
from rejoining the cluster.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-02 20:18:31 +03:00
Benny Halevy
027f188a97 topology: remove dead code
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-02 20:13:04 +03:00
Benny Halevy
f3d5df5448 locator: add class node
And keep per node information (idx, host_id, endpoint, dc_rack, is_pending)
in node objects, indexed by topology on several indices like:
idx, host_id, endpoint, current/pending, per dc, per dc/rack.

The node index is a shorthand identifier for the node.

node* and index are valid while the respective topology instance is valid.
To be used, the caller must hold on to the topology / token_metadata object
(e.g. via a token_metadata_ptr or effective_replication_map)

Refs #6403

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

topology: add node idx

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-02 20:13:02 +03:00
Benny Halevy
006e02410f topology: rename update_endpoint to add_or_update_endpoint
To reflect what it does.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-02 20:08:03 +03:00
Benny Halevy
df1c92649e topology: define get_{rack,datacenter} inline
Define get_location() that gets the location
for the local node, and use either this entry point
or get_location(inet_address) to get the respective
dc or rack.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-02 20:07:49 +03:00
Benny Halevy
9cce01a12c locator: endpoint_dc_rack: refactor default_location
Refactor the thread_local default_location out of
topology::get_location so it can be used elsewhere.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-02 20:06:53 +03:00
Benny Halevy
bb36237cf4 topology: optimize compare_endpoints
This function is called on the fast data path
from storage_proxy when sorting multiple endpoints
by proximity.

This change calculates numeric node diff metrics
based on each address proximity to a given node
(by <dc, rack, same node>) to eliminate logic
branches in the function and reduce its footprint.
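
One way such a metric could look, as a sketch only. The bit layout
and the names `proximity` and `compare_by_proximity` are
assumptions; the actual patch may compute its metrics differently.

```cpp
#include <cassert>
#include <string>

struct endpoint {
    std::string address;
    std::string dc;
    std::string rack;
};

// Higher score means closer to `origin`: same dc = 1, same rack = 2, same node = 4.
inline int proximity(const endpoint& origin, const endpoint& other) {
    const int same_dc = origin.dc == other.dc;
    // A rack name only counts when the dc matches as well.
    const int same_rack = same_dc & static_cast<int>(origin.rack == other.rack);
    const int same_node = origin.address == other.address;
    return (same_node << 2) | (same_rack << 1) | same_dc;
}

// Negative if `a` is closer to `origin` than `b`, positive if farther, zero if tied.
inline int compare_by_proximity(const endpoint& origin, const endpoint& a, const endpoint& b) {
    return proximity(origin, b) - proximity(origin, a);
}

inline void demo_proximity() {
    const endpoint local{"10.0.0.1", "dc1", "rack1"};
    const endpoint same_rack{"10.0.0.2", "dc1", "rack1"};
    const endpoint same_dc{"10.0.0.3", "dc1", "rack2"};
    assert(compare_by_proximity(local, same_rack, same_dc) < 0);
}
```

The comparison itself has no if/else chain: each relation becomes a
0/1 value, and the ordering falls out of one subtraction.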

Based on objdump -d output, the compare_endpoints
footprint was reduced by 58.5% (from 8752 to 3632 bytes)
with clang version 15.0.7 (Fedora 15.0.7-1.fc37).

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-02-15 11:48:24 +02:00
Benny Halevy
68141d0aac topology: get rid of pending state
Now, with a44ca06906,
is_normal_token_owner that replaced is_member
does not rely anymore on the pending status
of endpoints in topology.

With that we can get rid of this state and just keep
all endpoints we know about in the topology.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-12-13 14:17:18 +02:00
Benny Halevy
f2753eba30 topology: debug log update and remove endpoint
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-12-13 14:17:13 +02:00
Pavel Emelyanov
ae79669fd2 topology: Be less restrictive about missing endpoints
Recent changes in topology restricted the get_dc/get_rack calls. Older
code tried to locate the endpoint in the gossiper, then in the system
keyspace cache, and if the endpoint was found in neither, returned the
"default" location.

New code generates internal error in this case. This approach already
helped to spot several BUGs in code that had been eventually fixed, but
echoes of that change still pop up.

This patch relaxes the "missing endpoint" case by printing a warning in
logs and returning the "default" location like the old code did.
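
A rough sketch of the relaxed behavior. The struct, the default
location values, and the use of std::cerr for the warning are all
hypothetical stand-ins.

```cpp
#include <cassert>
#include <iostream>
#include <string>
#include <unordered_map>

struct dc_rack {
    std::string dc;
    std::string rack;
};

// Hypothetical lookup that tolerates unknown endpoints.
struct lenient_lookup {
    std::unordered_map<std::string, dc_rack> nodes;        // address -> location
    dc_rack default_location{"default_dc", "default_rack"};  // placeholder values
    int warnings = 0;

    const dc_rack& get_location(const std::string& ep) {
        if (auto it = nodes.find(ep); it != nodes.end()) {
            return it->second;
        }
        // Previously an internal error; now a warning plus the default.
        ++warnings;
        std::cerr << "warning: no location for endpoint " << ep
                  << ", using default\n";
        return default_location;
    }
};
```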

tests: update_cluster_layout_tests.py::*
       hintedhandoff_additional_test.py::TestHintedHandoff::test_hintedhandoff_rebalance
       bootstrap_test.py::TestBootstrap::test_decommissioned_wiped_node_can_join
       bootstrap_test.py::TestBootstrap::test_failed_bootstap_wiped_node_can_join
       materialized_views_test.py::TestMaterializedViews::test_decommission_node_during_mv_insert_4_nodes

refs: #11900
refs: #12054
fixes: #11870

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #12067
2022-11-28 22:01:09 +02:00
Benny Halevy
996eac9569 topology: add get_datacenters
Returns an unordered set of datacenter names
to be used by network_topology_replication_strategy
and for ks_prop_defs.

The set is kept in sync with _dc_endpoints.
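
A sketch of keeping the two containers in sync, using a
hypothetical `dc_tracker` class. Dropping a datacenter once its
last endpoint is removed is an assumption of this sketch.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <unordered_set>

class dc_tracker {
    std::unordered_map<std::string, std::unordered_set<std::string>> _dc_endpoints;
    std::unordered_set<std::string> _datacenters;
public:
    void add_endpoint(const std::string& dc, const std::string& ep) {
        _dc_endpoints[dc].insert(ep);
        _datacenters.insert(dc);
    }
    void remove_endpoint(const std::string& dc, const std::string& ep) {
        auto it = _dc_endpoints.find(dc);
        if (it == _dc_endpoints.end()) {
            return;
        }
        it->second.erase(ep);
        if (it->second.empty()) {
            // Update both containers together so they never disagree.
            _datacenters.erase(dc);
            _dc_endpoints.erase(it);
        }
    }
    const std::unordered_set<std::string>& get_datacenters() const {
        return _datacenters;
    }
};
```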

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #12023
2022-11-23 18:39:36 +02:00
Benny Halevy
d0bd305d16 locator: refactor topology out of token_metadata
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-11-16 21:55:54 +02:00