Commit Graph

1677 Commits

Author SHA1 Message Date
Petr Gusev
5a1418fdba token_metadata: get_endpoint_for_host_id -> get_endpoint_for_host_id_if_known
This commit fixes an inconsistency in method names:
get_host_id and get_host_id_if_known are
(internal_error, returns null), but there was only
one method for the opposite conversion - get_endpoint_for_host_id,
and it returns null. In this commit we change it to on_internal_error
if it can't find the argument and add another method
get_endpoint_for_host_id_if_known which returns null in this case.

We can't use get_endpoint_for_host_id/get_host_id
in host_id_or_endpoint::resolve since it's called
from storage_service::parse_node_list
-> token_metadata::parse_host_id_and_endpoint,
and exceptions are caught and handled in
`storage_service::parse_node_list`.
2023-12-11 12:51:34 +04:00
Petr Gusev
08b47d645a token_metadata: get_host_id: exception -> on_internal_error
It's a bug to use get_host_id on a non-existent endpoint,
so on_internal_error is more appropriate. Also, it's
easier to debug since it provides a backtrace.

If a missing inet_address is expected, get_host_id_if_known
should be used instead. We update one such case in
storage_service::force_remove_completion. Other
usages of get_host_id are correct.
2023-12-11 12:51:34 +04:00
Petr Gusev
c9fbe3d377 locator: make dc_rack_fn a template
In the next commits token_metadata will be
made a template with NodeId=inet_address|host_id
parameter. This parameter will be passed to dc_rack_fn
function, so it also should be made a template.
2023-12-11 12:51:33 +04:00
Avi Kivity
9c0f05efa1 Merge 'Track tablet streaming under global sessions to prevent side-effects of failed streaming' from Tomasz Grabiec
Tablet streaming involves asynchronous RPCs to other replicas which transfer writes. We want side-effects from streaming only within the migration stage in which the streaming was started. This is currently not guaranteed on failure. When streaming master fails (e.g. due to RPC failing), it can be that some streaming work is still alive somewhere (e.g. RPC on wire) and will have side-effects at some point later.

This PR implements tracking of all operations involved in streaming which may have side-effects, which allows the topology change coordinator to fence them and wait for them to complete if they were already admitted.

The tracking and fencing is implemented by using global "sessions", created for streaming of a single tablet. Session is globally identified by UUID. The identifier is assigned by the topology change coordinator, and stored in system.tablets. Sessions are created and closed based on group0 state (tablet metadata) by the barrier command sent to each replica, which we already do on transitions between stages. Also, each barrier waits for sessions which have been closed to be drained.

The barrier is blocked only if there is some session with work which was left behind by unsuccessful streaming. In which case it should not be blocked for long, because streaming process checks often if the guard was left behind and stops if it was.

This mechanism of tracking is fault-tolerant: session id is stored in group0, so coordinator can make progress on failover. The barriers guarantee that session exists on all replicas, and that it will be closed on all replicas.

Closes scylladb/scylladb#15847

* github.com:scylladb/scylladb:
  test: tablets: Add test for failed streaming being fenced away
  error_injection: Introduce poll_for_message()
  error_injection: Make is_enabled() public
  api: Add API to kill connection to a particular host
  range_streamer: Do not block topology change barriers around streaming
  range_streamer, tablets: Do not keep token metadata around streaming
  tablets: Fail gracefully when migrating tablet has no pending replica
  storage_service, api: Add API to disable tablet balancing
  storage_service, api: Add API to migrate a tablet
  storage_service, raft topology: Run streaming under session topology guard
  storage_service, tablets: Use session to guard tablet streaming
  tablets: Add per-tablet session id field to tablet metadata
  service: range_streamer: Propagate topology_guard to receivers
  streaming: Always close the rpc::sink
  storage_service: Introduce concept of a topology_guard
  storage_service: Introduce session concept
  tablets: Fix topology_metadata_guard holding on to the old erm
  docs: Document the topology_guard mechanism
2023-12-07 16:29:02 +02:00
Tomasz Grabiec
c228f2c940 range_streamer, tablets: Do not keep token metadata around streaming
It holds back global token metadata barrier during streaming, which
limits parallelism of load balancing.
2023-12-06 18:36:17 +01:00
Tomasz Grabiec
d1c1b59236 storage_service, api: Add API to disable tablet balancing
Load balancing needs to be disabled before making a series of manual
migrations so that we don't fight with the load balancer.

Also will be used in tests to ensure tablets stick to expected locations.
2023-12-06 18:36:17 +01:00
Tomasz Grabiec
1f57d1ea28 storage_service, api: Add API to migrate a tablet
Will be used in tests, or for hot fixes in production.
2023-12-06 18:36:17 +01:00
Tomasz Grabiec
31c995332c storage_service, raft topology: Run streaming under session topology guard
Prevents stale streaming operation from running beyond topology
operation they were started in. After the session field is cleared, or
changed to something else, the old topology_guard used by streaming is
interrupted and fenced and the next barrier will join with any
remaining work.
2023-12-06 18:36:17 +01:00
Tomasz Grabiec
080169cad6 storage_service, tablets: Use session to guard tablet streaming 2023-12-06 18:36:17 +01:00
Tomasz Grabiec
fd3c089ccc service: range_streamer: Propagate topology_guard to receivers 2023-12-06 18:36:16 +01:00
Botond Dénes
d2a88cd8de Merge 'Typos: fix typos in code' from Yaniv Kaul
Fixes some more typos as found by codespell run on the code. In this commit, there are more user-visible errors.

Refs: https://github.com/scylladb/scylladb/issues/16255

Closes scylladb/scylladb#16289

* github.com:scylladb/scylladb:
  Update unified/build_unified.sh
  Update main.cc
  Update dist/common/scripts/scylla-housekeeping
  Typos: fix typos in code
2023-12-06 07:36:41 +02:00
Avi Kivity
12f160045b Merge 'Get rid of fb_utilities' from Benny Halevy
utils::fb_utilities is a global in-memory registry for storing and retrieving broadcast_address and broadcat_rpc_address.
As part of the effort to get rid of all global state, this series gets rid of fb_utilities.
This will eventually allow e.g. cql_test_env to instantiate multiple scylla server nodes, each serving on its own address.

Closes scylladb/scylladb#16250

* github.com:scylladb/scylladb:
  treewide: get rid of now unused fb_utilities
  tracing: use locator::topology rather than fb_utilities
  streaming: use locator::topology rather than fb_utilities
  raft: use locator::topology/messaging rather than fb_utilities
  storage_service: use locator::topology rather than fb_utilities
  storage_proxy: use locator::topology rather than fb_utilities
  service_level_controller: use locator::topology rather than fb_utilities
  misc_services: use locator::topology rather than fb_utilities
  migration_manager: use messaging rather than fb_utilities
  forward_service: use messaging rather than fb_utilities
  messaging_service: accept broadcast_addr in config rather than via fb_utilities
  messaging_service: move listen_address and port getters inline
  test: manual: modernize message test
  table: use gossiper rather than fb_utilities
  repair: use locator::topology rather than fb_utilities
  dht/range_streamer: use locator::topology rather than fb_utilities
  db/view: use locator::topology rather than fb_utilities
  database: use locator::topology rather than fb_utilities
  db/system_keyspace: use topology via db rather than fb_utilities
  db/system_keyspace: save_local_info: get broadcast addresses from caller
  db/hints/manager: use locator::topology rather than fb_utilities
  db/consistency_level: use locator::topology rather than fb_utilities
  api: use locator::topology rather than fb_utilities
  alternator: ttl: use locator::topology rather than fb_utilities
  gossiper: use locator::topology rather than fb_utilities
  gossiper: add get_this_endpoint_state_ptr
  test: lib: cql_test_env: pass broadcast_address in cql_test_config
  init: get_seeds_from_db_config: accept broadcast_address
  locator: replication strategies: use locator::topology rather than fb_utilities
  locator: topology: add helpers to retrieve this host_id and address
  snitch: pass broadcast_address in snitch_config
  snitch: add optional get_broadcast_address method
  locator: ec2_multi_region_snitch: keep local public address as member
  ec2_multi_region_snitch: reindent load_config
  ec2_multi_region_snitch: coroutinize load_config
  ec2_snitch: reindent load_config
  ec2_snitch: coroutinize load_config
  thrift: thrift_validation: use std::numeric_limits rather than fb_utilities
2023-12-05 19:40:14 +02:00
Yaniv Kaul
ae2ab6000a Typos: fix typos in code
Fixes some more typos as found by codespell run on the code.
In this commit, there are more user-visible errors.

Refs: https://github.com/scylladb/scylladb/issues/16255
2023-12-05 15:18:11 +02:00
Tomasz Grabiec
0e42fe4c3c storage_service: Introduce concept of a topology_guard
topology_guard is used to track distributed operations started by the
topology change coordinator, e.g. streaming, to make sure that those
operations have no side effects after topology change coordinator
moved to the next migration stage, of a given tablet or of the whole
ring.

topology_guard can be sent over the wire in the form of
frozen_topology_guard. It can be materialized again on the other
side. While in transit, it doesn't block the coordinator barriers. But
if the coordinator moved on, materialization of the guard will
fail. So tracking safety is preserved.

In this patch, the guard implementation is based on tracking work
under global sessions, but the concept is flexible and other
mechanisms can be used without changing user code.
2023-12-05 14:09:35 +01:00
Tomasz Grabiec
d3d83869ce storage_service: Introduce session concept 2023-12-05 14:09:34 +01:00
Benny Halevy
b3bede8141 storage_service: use locator::topology rather than fb_utilities
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 13:23:27 +02:00
Kamil Braun
52ae6b8738 Merge 'fix shutdown order between group0 and storage service' from Gleb
Storage service uses group0 internally, but group0 is create long after
storage service is initialized and passed to it using ss::set_group0()
function. What it means is that during shutdown group0 is destroyed
before ss::stop() is called and thus storage service is left with a
dangling reference. Fix it by introducing a function that cancels all
group0 operations and waits for background fibers to complete. For that
we need separate abort source for group0 operation which the patch
series also introduces.

* 'gleb/group0-ss-shutdown' of github.com:scylladb/scylla-dev:
  storage_service: topology coordinator: ignore abort_requested_exception in background fibers
  storage_service: fix de-initialization order between storage service and group0_service
2023-12-05 11:20:52 +01:00
Yaniv Kaul
c658bdb150 Typos: fix typos in comments
Fixes some typos as found by codespell run on the code.
In this commit, I was hoping to fix only comments, not user-visible alerts, output, etc.
Follow-up commits will take care of them.

Refs: https://github.com/scylladb/scylladb/issues/16255
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
2023-12-02 22:37:22 +02:00
Gleb Natapov
3ddc1458ee storage_service: topology coordinator: ignore abort_requested_exception in background fibers
The exception may be thrown by "event" CV during shutdown.
2023-11-30 17:52:40 +02:00
Gleb Natapov
8ed8b151da storage_service: fix de-initialization order between storage service and group0_service
Storage service uses group0 internally, but group0 is create long after
storage service is initialized and passed to it using ss::set_group0()
function. But what it means is that during shutdown group0 is destroyed
before ss::stop() is called and thus storage service is left with a
dangling reference. Fix it by introducing a function that cancels all
group0 operations and waits for background fibers to complete. For that
we need separate abort source for group0 operation which the patch also
introduces.
2023-11-30 17:52:38 +02:00
Kamil Braun
8a14839a00 Merge 'handle more failures during topology operations' from Gleb
This series adds handling for more failures during a topology operation
(we already handle a failure during streaming). Here we add handling of
tablet draining errors by aborting the operation and handling of errors
after streaming where an operation cannot be aborted any longer. If the
error happens when rollback is no longer possible we wait for ring delay
and proceed to the next step. Each individual patch that adds the sleep
has an explanation what the consequences of the patch are.

* 'gleb/topology-coordinator-failures' of github.com:scylladb/scylla-dev:
  test: add test to check errro handling during tablet draining
  test: fix test_topology_streaming_failure test to not grep the whole file
  storage_service: add error injection into the tablet migration code
  storage_service: topology coordinator: rollback on handle_tablet_migration failure during tablet_draining stage
  storage_service: topology coordinator: do not retry the metadata barrier forever in write_both_read_new state
  storage_service: topology coordinator: do not retry the metadata barrier forever in left_token_ring state
  storage_service: topology coordinator: return a node that is being removed from get_excluded_nodes
  storage_service: topology_coordinator: use new rollback_to_normal state in the rollback procedure
  storage_service: topology coordinator: add rollback_to_normal node state
  storage_service: topology coordinator: put fence version into the raft state
  storage_service: topology coordinator: do fencing even if draining failed
2023-11-29 19:02:35 +01:00
Petr Gusev
dca28417b2 storage_service: drop unused method handle_state_replacing_update_pending_ranges 2023-11-27 12:37:26 +01:00
Tomasz Grabiec
ae5220478c tablets: Release group0 guard when waiting for streaming to finish
This bug manifested as delays in DDL statement execution, which had to
wait until streaming is finished so that the topology change
coordinator releases the guard.

The reason is that topology change coordinator didn't release the
group0 guard if there is no work to do with active migrations, and
awaits the condition variable without leaving the scope.

Fixes #16182

Closes scylladb/scylladb#16183
2023-11-27 12:24:27 +01:00
Gleb Natapov
c83ff5a0dd storage_service: add error injection into the tablet migration code 2023-11-27 13:09:58 +02:00
Gleb Natapov
4ebdddc31b storage_service: topology coordinator: rollback on handle_tablet_migration failure during tablet_draining stage
During remove or decommission as a first step tables are drained from
the leaving node. Theoretically this step may fail. Rollback the
topology operation if it happen. Since some tables may stay in migration
state the topology needs to go to the tablet_migration state. Lets do it
always since it should be save to do it even if there is no on going
tablet migrations.
2023-11-27 13:09:58 +02:00
Gleb Natapov
7267376eac storage_service: topology coordinator: do not retry the metadata barrier forever in write_both_read_new state
Handle the barrier failure by sleeping for a "ring delay" and
continuing. The purpose of the barrier is to wait for all reads to
old replica set to complete and fence the remaining requests.  If the
barrier fails we give the fence some time to propagate and continue with
the topology change. Of fence did not propagate we may have stale reads,
but this is not worse that we have with gossiper.
2023-11-23 15:30:10 +02:00
Gleb Natapov
7ea8fa459c storage_service: topology coordinator: do not retry the metadata barrier forever in left_token_ring state
Handle the barrier failure by sleeping for a "ring delay" and
continuing. The purpose of the barrier is to wait for unfinished writes
to decommissioned node complete. If barrier fails we give them some time
to complete and then proceed with node decommission. The worse thing
that may happen if some write will fail because the node will be
shutdown.
2023-11-23 15:30:10 +02:00
Gleb Natapov
11b7ee32ec storage_service: topology coordinator: return a node that is being removed from get_excluded_nodes
Not that is removed is dead, so no need to talk to it.
2023-11-23 15:30:10 +02:00
Gleb Natapov
4c76b8b59f storage_service: topology_coordinator: use new rollback_to_normal state in the rollback procedure
Go through the rollback_to_normal state when the node needs to move to
normal during the rollback and update fence in this state before moving
the node to normal. This guaranties that the fence update will not
be missed. Not that when a node moves to left state it already passes
through left_token_ring which guaranties the same.
2023-11-23 15:29:36 +02:00
Gleb Natapov
95dd0e453d storage_service: topology coordinator: add rollback_to_normal node state
When a topology coordinator rolls back from unsuccessful topology operation it
advances the fence (which is now in the raft state) after moving to normal
state. We do not want this to fail (only majority of nodes is needed for
it to not to), but currently it may fail in case the coordinator moves
to another node after changing the rollback node's state to normal, but
before updating the fence. To solve that the rollback operation needs to
go through a new rollback_to_normal state that will do the fencing
before moving to normal. This patch introduces that state, but does not use
it yet.
2023-11-23 15:27:28 +02:00
Kamil Braun
03ecc8457c Merge 'raft topology: reject replace if the node being replaced is not dead' from Patryk Jędrzejczak
The replace operation is defined to succeed only if the node being
replaced is dead. We should reject this operation when the failure
detector considers the node being replaced alive.

Apart from adding this change, this PR adds a test case -
`test_replacing_alive_node_fails` - that verifies it. A few testing
framework adjustments were necessary to implement this test and
to avoid flakiness in other tests that use the replace operation after
the change. From now, we need to ensure that all nodes see the
node being replaced as dead before starting the replace. Otherwise,
the check added in this PR could reject the replace.

Additionally, this PR changes the replace procedure in a way that
if the replacing node reuses the IP of the node being replaced, other
nodes can see it as alive only after the topology coordinator accepts
its join request. The replacing node may become alive before the
topology coordinator checks if the node being replaced is dead. If
that happens and the replacing node reuses the IP of the node being
replaced, the topology coordinator cannot know which of these two
nodes is alive and whether it should reject the join request.

Fixes #15863

Closes scylladb/scylladb#15926

* github.com:scylladb/scylladb:
  test: add test_replacing_alive_node_fails
  raft topology: reject replace if the node being replaced is not dead
  raft topology: add the gossiper ref to topology_coordinator
  test: test_cluster_features: stop gracefully before replace
  test: decrease failure_detector_timeout_in_ms in replace tests
  test: move test_replace to topology_custom
  test: server_add: wait until the node being replaced is dead
  test: server_add: add support for expected errors
  raft topology: join: delay advertising replacing node if it reuses IP
  raft topology: join: fix a condition in validate_joining_node
2023-11-23 10:31:59 +01:00
Patryk Jędrzejczak
bf7a67224c raft topology: reject replace if the node being replaced is not dead
The replace operation is defined to succeed only if the node being
replaced is dead. We should reject this operation when the failure
detector considers the node being replaced alive.
2023-11-21 12:39:16 +01:00
Patryk Jędrzejczak
94ffdb4792 raft topology: add the gossiper ref to topology_coordinator
It is used in the following commit.
2023-11-21 12:39:16 +01:00
Patryk Jędrzejczak
ee45a1c430 raft topology: join: delay advertising replacing node if it reuses IP
After this change, other nodes can see the replacing node as alive
only after the topology coordinator accepts its join request.

In the following commits, we make the topology coordinator reject
join requests if the node being replaced is considered alive by the
gossiper. However, the replacing node may become alive before the
topology coordinator does the validation. If the replacing node
reuses the IP of the node being replaced, the topology coordinator
cannot know which of these two nodes is alive and whether it should
reject the join request.

The gossiper-based topology also delays the replacing node from
advertising itself if it reuses the IP. To achieve the same effect
in raft-based topology, we only need to move the definition of
replacing_a_node_with_same_ip. However, there is a code that puts
bootstrap tokens of the node being replaced into the gossiper
state, and it depends on replacing_a_node_with_same_ip and
replacing_a_node_with_diff_ip being always false in the raft-based
topology mode. We prevent it from breaking by changing the
condition.
2023-11-21 12:39:16 +01:00
Patryk Jędrzejczak
c0e4b8e9c0 raft topology: join: fix a condition in validate_joining_node
It was incorrect. node.rs->state evaluated to node_state::none
for both join and replace.
2023-11-21 12:39:13 +01:00
Pavel Emelyanov
f4626f6b8e storage_service: Drop (un)init_messaging_service_part() pair
It's no longer needed.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-20 13:59:08 +03:00
Pavel Emelyanov
c42c13e658 storage_service: Init/Deinit RPC handlers in constructor/stop
All the services that need to register RPC handlers do it in service
constructor or .start() method. Unregistration happens in .stop().
Storage service explicitly (de)initializes its RPC handlers in dedicated
calls, but there's no point in that. The handlers' accessibility is
determined by messaging service start_lister/shutdown, handlers
themselves can be registered any time before it and unregistered any
time after it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-20 13:57:07 +03:00
Pavel Emelyanov
40cb9dd66f storage_service: Dont capture container() on RPC handler
The handlers are about to be initialized from inside storage_service
constructor. At that time container() is not yet available and its
invalid to capture it on handlers' lambda. Fortunately, there's only one
handler that does it, other handlers capture 'this' and call container()
explicitly. This patch fixes the remaining one to do the same.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-20 13:55:56 +03:00
Pavel Emelyanov
cc76f03f63 storage_service: Use storage_service::_sys_dist_ks in some places
The main goal here is to drop sys.dist.ks argument from the
init_messaging_service call to make future patching simpler. While doing
it it turned out that the argument was needed to be passed all the way
down to the mark_existing_views_as_built(), so this patch also dropes
this argument from this whole call-trace.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-20 13:53:55 +03:00
Pavel Emelyanov
4df5af931a storage_service: Add explicit dependency on system dist. keyspace
This effectively reverts bc051387c5 (storage_service: Remove sys_dist_ks
from storage_service dependencies) since now storage service needs the
sys. disk. ks not only cluster join time. Next patch will make more use
of it as well.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-20 13:52:42 +03:00
Pavel Emelyanov
a7f23930cb storage_service: Rurn query processor pointer into reference
It's non-nullptr all the time after previous patch and can be a
reference instead

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-20 13:52:04 +03:00
Pavel Emelyanov
e59544674a storage_service: Add explicity query_processor dependency
It's now set via a dedicated call that happens after query processor is
started. Now query processor is started before storage service and the
latter can get the q.p. local reference via constructor.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-20 13:51:09 +03:00
Gleb Natapov
6edbf4b663 storage_service: topology coordinator: put fence version into the raft state
Currently when the coordinator decides to move the fence it issues an
RPC to each node and each node locally advances fence version. This is
fine if there are no failures or failures are handled by retrying
fencing, but if we want to allow topology changes to progress even in
the presence of barrier failures it is easier to store the fence version
in the raft state. The nodes that missed fence rpc may easily catch up
to the latest fence version by simply executing a raft barrier.
2023-11-19 15:28:08 +02:00
Gleb Natapov
f04e890690 storage_service: topology coordinator: do fencing even if draining failed
Token metadata barrier consists for two steps. First old request are
drained and then requests that are not drained are fenced. But currently
if draining fails then fencing is note done. This is fine if the
barrier's failure handled by retrying, but we when to start handling
errors differently. In fact during topology operation rollback we
already do not retry failed barrier.

The patch fixes the metadata barrier to do fencing even if draining
failed.
2023-11-14 13:06:41 +02:00
Kamil Braun
d24b305712 Merge 'raft topology: join: do not time out waiting for the node to be joined' from Patryk Jędrzejczak
When a node tries to join the cluster, it asks the topology
coordinator to add them and then waits for the response. The
response is not guaranteed to come back. If the topology
coordinator cannot contact the joining node, it moves the node to
the left state and moves on.

Currently, to handle the case when the response does not come back,
the joining node gives up waiting for it after 3 minutes. However,
it might take more time for the topology coordinator to start
handling the request to join, as it might be working on other tasks
like adding other nodes, performing tablet migrations, etc. In
general, any timeout duration would be unreliable.

Therefore, we get rid of the timeout. From now on, the operator
will be responsible for shutting down the node if the topology
coordinator fails to deliver the rejection.

Additionally, after removing the timeout, we adjust the topology
coordinator. We make it try sending the response (both acceptance
and rejection) only once since we do not care if it fails anymore. We
only need to ensure that the joining node is moved to the left state
if sending fails.

Fixes #15865

Closes scylladb/scylladb#15944

* github.com:scylladb/scylladb:
  raft topology: fix indentation
  raft topology: join: try sending the response only once
  raft topology: join: do not time out waiting for the node to be joined
  group 0: group0_handshaker: add the abort_source parameter to post_server_start
2023-11-13 15:02:27 +01:00
Botond Dénes
2b11a02b67 Merge 'Improvements to gossiper shadow round' from Kamil Braun
Remove `fall_back_to_syn_msg` which is not necessary in newer Scylla versions.
Fix the calculation of `nodes_down` which could count a single node multiple times.
Make shadow round mandatory during bootstrap and replace -- these operations are unsafe to do without checking features first, which are obtained during the shadow round (outside raft-topology mode).
Finally, during node restart, allow the shadow round to be skipped when getting `timeout_error`s from contact points, not only when getting `closed_error`s (during restart it's best-effort anyway, and in general it's impossible to distinguish between a dead node and a partitioned node).
More details in commit messages.

Ref: https://github.com/scylladb/scylladb/issues/15675

Closes scylladb/scylladb#15941

* github.com:scylladb/scylladb:
  gossiper: do_shadow_round: increment `nodes_down` in case of timeout
  gossiper: do_shadow_round: fix `nodes_down` calculation
  storage_service: make shadow round mandatory during bootstrap/replace
  gossiper: do_shadow_round: remove default value for nodes param
  gossiper: do_shadow_round: remove `fall_back_to_syn_msg`
2023-11-13 13:37:13 +02:00
Patryk Jędrzejczak
2d7bfeb3fa raft topology: fix indentation
Broken in the previous commit.
2023-11-10 12:36:37 +01:00
Patryk Jędrzejczak
e94c7cff28 raft topology: join: try sending the response only once
When a node tries to join the cluster, it asks the topology
coordinator to add them and then waits for the response.
In the previous commit, we have made the operator responsible for
shutting down the joining node if the topology coordinator fails
to deliver a response by removing the timeout. In this commit, we
adjust the topology coordinator. We make it try sending the
response (both acceptance and rejection) only once since we do not
care if it fails anymore. We only need to ensure that the joining
node is moved to the left state if sending fails.
2023-11-10 12:36:37 +01:00
Patryk Jędrzejczak
4ffa692cb3 raft topology: join: do not time out waiting for the node to be joined
When a node tries to join the cluster, it asks the topology
coordinator to add them and then waits for the response. The
response is not guaranteed to come back. If the topology
coordinator cannot contact the joining node, it moves the node to
the left state and moves on.

Currently, to handle the case when the response does not come back,
the joining node gives up waiting for it after 3 minutes. However,
it might take more time for the topology coordinator to start
handling the request to join, as it might be working on other tasks
like adding other nodes, performing tablet migrations, etc. In
general, any timeout duration would be unreliable.

Therefore, we get rid of the timeout. From now on, the operator
will be responsible for shutting down the node if the topology
coordinator fails to deliver the rejection.

This change additionally fixes the TODO in
raft_group0::join_group0.
2023-11-10 12:36:37 +01:00
Patryk Jędrzejczak
5f36e1d7f2 group 0: group0_handshaker: add the abort_source parameter to post_server_start
Used in the following commit to enable the clean shutdown of a
node that does not receive the join rejection from the topology
coordinator.
2023-11-10 12:35:38 +01:00