scylla

Author	SHA1	Message	Date
Petr Gusev	5a1418fdba	token_metadata: get_endpoint_for_host_id -> get_endpoint_for_host_id_if_known This commit fixes an inconsistency in method names: get_host_id and get_host_id_if_known are (internal_error, returns null), but there was only one method for the opposite conversion - get_endpoint_for_host_id, and it returns null. In this commit we change it to on_internal_error if it can't find the argument and add another method get_endpoint_for_host_id_if_known which returns null in this case. We can't use get_endpoint_for_host_id/get_host_id in host_id_or_endpoint::resolve since it's called from storage_service::parse_node_list -> token_metadata::parse_host_id_and_endpoint, and exceptions are caught and handled in `storage_service::parse_node_list`.	2023-12-11 12:51:34 +04:00
Petr Gusev	08b47d645a	token_metadata: get_host_id: exception -> on_internal_error It's a bug to use get_host_id on a non-existent endpoint, so on_internal_error is more appropriate. Also, it's easier to debug since it provides a backtrace. If a missing inet_address is expected, get_host_id_if_known should be used instead. We update one such case in storage_service::force_remove_completion. Other usages of get_host_id are correct.	2023-12-11 12:51:34 +04:00
Petr Gusev	c9fbe3d377	locator: make dc_rack_fn a template In the next commits token_metadata will be made a template with NodeId=inet_address|host_id parameter. This parameter will be passed to dc_rack_fn function, so it also should be made a template.	2023-12-11 12:51:33 +04:00
Avi Kivity	9c0f05efa1	Merge 'Track tablet streaming under global sessions to prevent side-effects of failed streaming' from Tomasz Grabiec Tablet streaming involves asynchronous RPCs to other replicas which transfer writes. We want side-effects from streaming only within the migration stage in which the streaming was started. This is currently not guaranteed on failure. When streaming master fails (e.g. due to RPC failing), it can be that some streaming work is still alive somewhere (e.g. RPC on wire) and will have side-effects at some point later. This PR implements tracking of all operations involved in streaming which may have side-effects, which allows the topology change coordinator to fence them and wait for them to complete if they were already admitted. The tracking and fencing is implemented by using global "sessions", created for streaming of a single tablet. Session is globally identified by UUID. The identifier is assigned by the topology change coordinator, and stored in system.tablets. Sessions are created and closed based on group0 state (tablet metadata) by the barrier command sent to each replica, which we already do on transitions between stages. Also, each barrier waits for sessions which have been closed to be drained. The barrier is blocked only if there is some session with work which was left behind by unsuccessful streaming. In which case it should not be blocked for long, because streaming process checks often if the guard was left behind and stops if it was. This mechanism of tracking is fault-tolerant: session id is stored in group0, so coordinator can make progress on failover. The barriers guarantee that session exists on all replicas, and that it will be closed on all replicas. Closes scylladb/scylladb#15847 * github.com:scylladb/scylladb: test: tablets: Add test for failed streaming being fenced away error_injection: Introduce poll_for_message() error_injection: Make is_enabled() public api: Add API to kill connection to a particular host range_streamer: Do not block topology change barriers around streaming range_streamer, tablets: Do not keep token metadata around streaming tablets: Fail gracefully when migrating tablet has no pending replica storage_service, api: Add API to disable tablet balancing storage_service, api: Add API to migrate a tablet storage_service, raft topology: Run streaming under session topology guard storage_service, tablets: Use session to guard tablet streaming tablets: Add per-tablet session id field to tablet metadata service: range_streamer: Propagate topology_guard to receivers streaming: Always close the rpc::sink storage_service: Introduce concept of a topology_guard storage_service: Introduce session concept tablets: Fix topology_metadata_guard holding on to the old erm docs: Document the topology_guard mechanism	2023-12-07 16:29:02 +02:00
Tomasz Grabiec	c228f2c940	range_streamer, tablets: Do not keep token metadata around streaming It holds back global token metadata barrier during streaming, which limits parallelism of load balancing.	2023-12-06 18:36:17 +01:00
Tomasz Grabiec	d1c1b59236	storage_service, api: Add API to disable tablet balancing Load balancing needs to be disabled before making a series of manual migrations so that we don't fight with the load balancer. Also will be used in tests to ensure tablets stick to expected locations.	2023-12-06 18:36:17 +01:00
Tomasz Grabiec	1f57d1ea28	storage_service, api: Add API to migrate a tablet Will be used in tests, or for hot fixes in production.	2023-12-06 18:36:17 +01:00
Tomasz Grabiec	31c995332c	storage_service, raft topology: Run streaming under session topology guard Prevents stale streaming operation from running beyond topology operation they were started in. After the session field is cleared, or changed to something else, the old topology_guard used by streaming is interrupted and fenced and the next barrier will join with any remaining work.	2023-12-06 18:36:17 +01:00
Tomasz Grabiec	080169cad6	storage_service, tablets: Use session to guard tablet streaming	2023-12-06 18:36:17 +01:00
Tomasz Grabiec	fd3c089ccc	service: range_streamer: Propagate topology_guard to receivers	2023-12-06 18:36:16 +01:00
Botond Dénes	d2a88cd8de	Merge 'Typos: fix typos in code' from Yaniv Kaul Fixes some more typos as found by codespell run on the code. In this commit, there are more user-visible errors. Refs: https://github.com/scylladb/scylladb/issues/16255 Closes scylladb/scylladb#16289 * github.com:scylladb/scylladb: Update unified/build_unified.sh Update main.cc Update dist/common/scripts/scylla-housekeeping Typos: fix typos in code	2023-12-06 07:36:41 +02:00
Avi Kivity	12f160045b	Merge 'Get rid of fb_utilities' from Benny Halevy utils::fb_utilities is a global in-memory registry for storing and retrieving broadcast_address and broadcast_rpc_address. As part of the effort to get rid of all global state, this series gets rid of fb_utilities. This will eventually allow e.g. cql_test_env to instantiate multiple scylla server nodes, each serving on its own address. Closes scylladb/scylladb#16250 * github.com:scylladb/scylladb: treewide: get rid of now unused fb_utilities tracing: use locator::topology rather than fb_utilities streaming: use locator::topology rather than fb_utilities raft: use locator::topology/messaging rather than fb_utilities storage_service: use locator::topology rather than fb_utilities storage_proxy: use locator::topology rather than fb_utilities service_level_controller: use locator::topology rather than fb_utilities misc_services: use locator::topology rather than fb_utilities migration_manager: use messaging rather than fb_utilities forward_service: use messaging rather than fb_utilities messaging_service: accept broadcast_addr in config rather than via fb_utilities messaging_service: move listen_address and port getters inline test: manual: modernize message test table: use gossiper rather than fb_utilities repair: use locator::topology rather than fb_utilities dht/range_streamer: use locator::topology rather than fb_utilities db/view: use locator::topology rather than fb_utilities database: use locator::topology rather than fb_utilities db/system_keyspace: use topology via db rather than fb_utilities db/system_keyspace: save_local_info: get broadcast addresses from caller db/hints/manager: use locator::topology rather than fb_utilities db/consistency_level: use locator::topology rather than fb_utilities api: use locator::topology rather than fb_utilities alternator: ttl: use locator::topology rather than fb_utilities gossiper: use locator::topology rather than fb_utilities gossiper: add get_this_endpoint_state_ptr test: lib: cql_test_env: pass broadcast_address in cql_test_config init: get_seeds_from_db_config: accept broadcast_address locator: replication strategies: use locator::topology rather than fb_utilities locator: topology: add helpers to retrieve this host_id and address snitch: pass broadcast_address in snitch_config snitch: add optional get_broadcast_address method locator: ec2_multi_region_snitch: keep local public address as member ec2_multi_region_snitch: reindent load_config ec2_multi_region_snitch: coroutinize load_config ec2_snitch: reindent load_config ec2_snitch: coroutinize load_config thrift: thrift_validation: use std::numeric_limits rather than fb_utilities	2023-12-05 19:40:14 +02:00
Yaniv Kaul	ae2ab6000a	Typos: fix typos in code Fixes some more typos as found by codespell run on the code. In this commit, there are more user-visible errors. Refs: https://github.com/scylladb/scylladb/issues/16255	2023-12-05 15:18:11 +02:00
Tomasz Grabiec	0e42fe4c3c	storage_service: Introduce concept of a topology_guard topology_guard is used to track distributed operations started by the topology change coordinator, e.g. streaming, to make sure that those operations have no side effects after topology change coordinator moved to the next migration stage, of a given tablet or of the whole ring. topology_guard can be sent over the wire in the form of frozen_topology_guard. It can be materialized again on the other side. While in transit, it doesn't block the coordinator barriers. But if the coordinator moved on, materialization of the guard will fail. So tracking safety is preserved. In this patch, the guard implementation is based on tracking work under global sessions, but the concept is flexible and other mechanisms can be used without changing user code.	2023-12-05 14:09:35 +01:00
Tomasz Grabiec	d3d83869ce	storage_service: Introduce session concept	2023-12-05 14:09:34 +01:00
Benny Halevy	b3bede8141	storage_service: use locator::topology rather than fb_utilities Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-12-05 13:23:27 +02:00
Kamil Braun	52ae6b8738	Merge 'fix shutdown order between group0 and storage service' from Gleb Storage service uses group0 internally, but group0 is created long after storage service is initialized and is passed to it using the ss::set_group0() function. This means that during shutdown group0 is destroyed before ss::stop() is called, and thus storage service is left with a dangling reference. Fix it by introducing a function that cancels all group0 operations and waits for background fibers to complete. For that we need a separate abort source for group0 operations, which the patch series also introduces. * 'gleb/group0-ss-shutdown' of github.com:scylladb/scylla-dev: storage_service: topology coordinator: ignore abort_requested_exception in background fibers storage_service: fix de-initialization order between storage service and group0_service	2023-12-05 11:20:52 +01:00
Yaniv Kaul	c658bdb150	Typos: fix typos in comments Fixes some typos as found by codespell run on the code. In this commit, I was hoping to fix only comments, not user-visible alerts, output, etc. Follow-up commits will take care of them. Refs: https://github.com/scylladb/scylladb/issues/16255 Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>	2023-12-02 22:37:22 +02:00
Gleb Natapov	3ddc1458ee	storage_service: topology coordinator: ignore abort_requested_exception in background fibers The exception may be thrown by "event" CV during shutdown.	2023-11-30 17:52:40 +02:00
Gleb Natapov	8ed8b151da	storage_service: fix de-initialization order between storage service and group0_service Storage service uses group0 internally, but group0 is created long after storage service is initialized and is passed to it using the ss::set_group0() function. This means that during shutdown group0 is destroyed before ss::stop() is called, and thus storage service is left with a dangling reference. Fix it by introducing a function that cancels all group0 operations and waits for background fibers to complete. For that we need a separate abort source for group0 operations, which the patch also introduces.	2023-11-30 17:52:38 +02:00
Kamil Braun	8a14839a00	Merge 'handle more failures during topology operations' from Gleb This series adds handling for more failures during a topology operation (we already handle a failure during streaming). Here we add handling of tablet draining errors by aborting the operation and handling of errors after streaming where an operation cannot be aborted any longer. If the error happens when rollback is no longer possible we wait for ring delay and proceed to the next step. Each individual patch that adds the sleep has an explanation what the consequences of the patch are. * 'gleb/topology-coordinator-failures' of github.com:scylladb/scylla-dev: test: add test to check errro handling during tablet draining test: fix test_topology_streaming_failure test to not grep the whole file storage_service: add error injection into the tablet migration code storage_service: topology coordinator: rollback on handle_tablet_migration failure during tablet_draining stage storage_service: topology coordinator: do not retry the metadata barrier forever in write_both_read_new state storage_service: topology coordinator: do not retry the metadata barrier forever in left_token_ring state storage_service: topology coordinator: return a node that is being removed from get_excluded_nodes storage_service: topology_coordinator: use new rollback_to_normal state in the rollback procedure storage_service: topology coordinator: add rollback_to_normal node state storage_service: topology coordinator: put fence version into the raft state storage_service: topology coordinator: do fencing even if draining failed	2023-11-29 19:02:35 +01:00
Petr Gusev	dca28417b2	storage_service: drop unused method handle_state_replacing_update_pending_ranges	2023-11-27 12:37:26 +01:00
Tomasz Grabiec	ae5220478c	tablets: Release group0 guard when waiting for streaming to finish This bug manifested as delays in DDL statement execution, which had to wait until streaming is finished so that the topology change coordinator releases the guard. The reason is that topology change coordinator didn't release the group0 guard if there is no work to do with active migrations, and awaits the condition variable without leaving the scope. Fixes #16182 Closes scylladb/scylladb#16183	2023-11-27 12:24:27 +01:00
Gleb Natapov	c83ff5a0dd	storage_service: add error injection into the tablet migration code	2023-11-27 13:09:58 +02:00
Gleb Natapov	4ebdddc31b	storage_service: topology coordinator: rollback on handle_tablet_migration failure during tablet_draining stage During remove or decommission, as a first step, tablets are drained from the leaving node. Theoretically this step may fail. Roll back the topology operation if that happens. Since some tablets may stay in migration state, the topology needs to go to the tablet_migration state. Let's always do it, since it should be safe even if there are no ongoing tablet migrations.	2023-11-27 13:09:58 +02:00
Gleb Natapov	7267376eac	storage_service: topology coordinator: do not retry the metadata barrier forever in write_both_read_new state Handle the barrier failure by sleeping for a "ring delay" and continuing. The purpose of the barrier is to wait for all reads to the old replica set to complete and to fence the remaining requests. If the barrier fails we give the fence some time to propagate and continue with the topology change. If the fence did not propagate we may have stale reads, but this is no worse than what we have with gossiper.	2023-11-23 15:30:10 +02:00
Gleb Natapov	7ea8fa459c	storage_service: topology coordinator: do not retry the metadata barrier forever in left_token_ring state Handle the barrier failure by sleeping for a "ring delay" and continuing. The purpose of the barrier is to wait for unfinished writes to the decommissioned node to complete. If the barrier fails we give them some time to complete and then proceed with node decommission. The worst that may happen is that some write will fail because the node will be shut down.	2023-11-23 15:30:10 +02:00
Gleb Natapov	11b7ee32ec	storage_service: topology coordinator: return a node that is being removed from get_excluded_nodes A node that is being removed is dead, so there is no need to talk to it.	2023-11-23 15:30:10 +02:00
Gleb Natapov	4c76b8b59f	storage_service: topology_coordinator: use new rollback_to_normal state in the rollback procedure Go through the rollback_to_normal state when the node needs to move to normal during the rollback and update the fence in this state before moving the node to normal. This guarantees that the fence update will not be missed. Note that when a node moves to the left state it already passes through left_token_ring, which guarantees the same.	2023-11-23 15:29:36 +02:00
Gleb Natapov	95dd0e453d	storage_service: topology coordinator: add rollback_to_normal node state When a topology coordinator rolls back from an unsuccessful topology operation it advances the fence (which is now in the raft state) after moving to normal state. We do not want this to fail (only a majority of nodes is needed for it not to), but currently it may fail in case the coordinator moves to another node after changing the rollback node's state to normal, but before updating the fence. To solve that, the rollback operation needs to go through a new rollback_to_normal state that will do the fencing before moving to normal. This patch introduces that state, but does not use it yet.	2023-11-23 15:27:28 +02:00
Kamil Braun	03ecc8457c	Merge 'raft topology: reject replace if the node being replaced is not dead' from Patryk Jędrzejczak The replace operation is defined to succeed only if the node being replaced is dead. We should reject this operation when the failure detector considers the node being replaced alive. Apart from adding this change, this PR adds a test case - `test_replacing_alive_node_fails` - that verifies it. A few testing framework adjustments were necessary to implement this test and to avoid flakiness in other tests that use the replace operation after the change. From now, we need to ensure that all nodes see the node being replaced as dead before starting the replace. Otherwise, the check added in this PR could reject the replace. Additionally, this PR changes the replace procedure in a way that if the replacing node reuses the IP of the node being replaced, other nodes can see it as alive only after the topology coordinator accepts its join request. The replacing node may become alive before the topology coordinator checks if the node being replaced is dead. If that happens and the replacing node reuses the IP of the node being replaced, the topology coordinator cannot know which of these two nodes is alive and whether it should reject the join request. Fixes #15863 Closes scylladb/scylladb#15926 * github.com:scylladb/scylladb: test: add test_replacing_alive_node_fails raft topology: reject replace if the node being replaced is not dead raft topology: add the gossiper ref to topology_coordinator test: test_cluster_features: stop gracefully before replace test: decrease failure_detector_timeout_in_ms in replace tests test: move test_replace to topology_custom test: server_add: wait until the node being replaced is dead test: server_add: add support for expected errors raft topology: join: delay advertising replacing node if it reuses IP raft topology: join: fix a condition in validate_joining_node	2023-11-23 10:31:59 +01:00
Patryk Jędrzejczak	bf7a67224c	raft topology: reject replace if the node being replaced is not dead The replace operation is defined to succeed only if the node being replaced is dead. We should reject this operation when the failure detector considers the node being replaced alive.	2023-11-21 12:39:16 +01:00
Patryk Jędrzejczak	94ffdb4792	raft topology: add the gossiper ref to topology_coordinator It is used in the following commit.	2023-11-21 12:39:16 +01:00
Patryk Jędrzejczak	ee45a1c430	raft topology: join: delay advertising replacing node if it reuses IP After this change, other nodes can see the replacing node as alive only after the topology coordinator accepts its join request. In the following commits, we make the topology coordinator reject join requests if the node being replaced is considered alive by the gossiper. However, the replacing node may become alive before the topology coordinator does the validation. If the replacing node reuses the IP of the node being replaced, the topology coordinator cannot know which of these two nodes is alive and whether it should reject the join request. The gossiper-based topology also delays the replacing node from advertising itself if it reuses the IP. To achieve the same effect in raft-based topology, we only need to move the definition of replacing_a_node_with_same_ip. However, there is a code that puts bootstrap tokens of the node being replaced into the gossiper state, and it depends on replacing_a_node_with_same_ip and replacing_a_node_with_diff_ip being always false in the raft-based topology mode. We prevent it from breaking by changing the condition.	2023-11-21 12:39:16 +01:00
Patryk Jędrzejczak	c0e4b8e9c0	raft topology: join: fix a condition in validate_joining_node It was incorrect. node.rs->state evaluated to node_state::none for both join and replace.	2023-11-21 12:39:13 +01:00
Pavel Emelyanov	f4626f6b8e	storage_service: Drop (un)init_messaging_service_part() pair It's no longer needed. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-11-20 13:59:08 +03:00
Pavel Emelyanov	c42c13e658	storage_service: Init/Deinit RPC handlers in constructor/stop All the services that need to register RPC handlers do it in service constructor or .start() method. Unregistration happens in .stop(). Storage service explicitly (de)initializes its RPC handlers in dedicated calls, but there's no point in that. The handlers' accessibility is determined by messaging service start_lister/shutdown, handlers themselves can be registered any time before it and unregistered any time after it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-11-20 13:57:07 +03:00
Pavel Emelyanov	40cb9dd66f	storage_service: Dont capture container() on RPC handler The handlers are about to be initialized from inside storage_service constructor. At that time container() is not yet available and it's invalid to capture it on handlers' lambda. Fortunately, there's only one handler that does it, other handlers capture 'this' and call container() explicitly. This patch fixes the remaining one to do the same. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-11-20 13:55:56 +03:00
Pavel Emelyanov	cc76f03f63	storage_service: Use storage_service::_sys_dist_ks in some places The main goal here is to drop sys.dist.ks argument from the init_messaging_service call to make future patching simpler. While doing it, it turned out that the argument needed to be passed all the way down to mark_existing_views_as_built(), so this patch also drops this argument from the whole call trace. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-11-20 13:53:55 +03:00
Pavel Emelyanov	4df5af931a	storage_service: Add explicit dependency on system dist. keyspace This effectively reverts `bc051387c5` (storage_service: Remove sys_dist_ks from storage_service dependencies) since now storage service needs the sys. dist. ks not only at cluster join time. Next patch will make more use of it as well. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-11-20 13:52:42 +03:00
Pavel Emelyanov	a7f23930cb	storage_service: Rurn query processor pointer into reference It's non-nullptr all the time after previous patch and can be a reference instead Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-11-20 13:52:04 +03:00
Pavel Emelyanov	e59544674a	storage_service: Add explicity query_processor dependency It's now set via a dedicated call that happens after query processor is started. Now query processor is started before storage service and the latter can get the q.p. local reference via constructor. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-11-20 13:51:09 +03:00
Gleb Natapov	6edbf4b663	storage_service: topology coordinator: put fence version into the raft state Currently when the coordinator decides to move the fence it issues an RPC to each node and each node locally advances fence version. This is fine if there are no failures or failures are handled by retrying fencing, but if we want to allow topology changes to progress even in the presence of barrier failures it is easier to store the fence version in the raft state. The nodes that missed fence rpc may easily catch up to the latest fence version by simply executing a raft barrier.	2023-11-19 15:28:08 +02:00
Gleb Natapov	f04e890690	storage_service: topology coordinator: do fencing even if draining failed Token metadata barrier consists of two steps. First old requests are drained and then requests that are not drained are fenced. But currently if draining fails then fencing is not done. This is fine if the barrier's failure is handled by retrying, but we want to start handling errors differently. In fact during topology operation rollback we already do not retry a failed barrier. The patch fixes the metadata barrier to do fencing even if draining failed.	2023-11-14 13:06:41 +02:00
Kamil Braun	d24b305712	Merge 'raft topology: join: do not time out waiting for the node to be joined' from Patryk Jędrzejczak When a node tries to join the cluster, it asks the topology coordinator to add them and then waits for the response. The response is not guaranteed to come back. If the topology coordinator cannot contact the joining node, it moves the node to the left state and moves on. Currently, to handle the case when the response does not come back, the joining node gives up waiting for it after 3 minutes. However, it might take more time for the topology coordinator to start handling the request to join, as it might be working on other tasks like adding other nodes, performing tablet migrations, etc. In general, any timeout duration would be unreliable. Therefore, we get rid of the timeout. From now on, the operator will be responsible for shutting down the node if the topology coordinator fails to deliver the rejection. Additionally, after removing the timeout, we adjust the topology coordinator. We make it try sending the response (both acceptance and rejection) only once since we do not care if it fails anymore. We only need to ensure that the joining node is moved to the left state if sending fails. Fixes #15865 Closes scylladb/scylladb#15944 * github.com:scylladb/scylladb: raft topology: fix indentation raft topology: join: try sending the response only once raft topology: join: do not time out waiting for the node to be joined group 0: group0_handshaker: add the abort_source parameter to post_server_start	2023-11-13 15:02:27 +01:00
Botond Dénes	2b11a02b67	Merge 'Improvements to gossiper shadow round' from Kamil Braun Remove `fall_back_to_syn_msg` which is not necessary in newer Scylla versions. Fix the calculation of `nodes_down` which could count a single node multiple times. Make shadow round mandatory during bootstrap and replace -- these operations are unsafe to do without checking features first, which are obtained during the shadow round (outside raft-topology mode). Finally, during node restart, allow the shadow round to be skipped when getting `timeout_error`s from contact points, not only when getting `closed_error`s (during restart it's best-effort anyway, and in general it's impossible to distinguish between a dead node and a partitioned node). More details in commit messages. Ref: https://github.com/scylladb/scylladb/issues/15675 Closes scylladb/scylladb#15941 * github.com:scylladb/scylladb: gossiper: do_shadow_round: increment `nodes_down` in case of timeout gossiper: do_shadow_round: fix `nodes_down` calculation storage_service: make shadow round mandatory during bootstrap/replace gossiper: do_shadow_round: remove default value for nodes param gossiper: do_shadow_round: remove `fall_back_to_syn_msg`	2023-11-13 13:37:13 +02:00
Patryk Jędrzejczak	2d7bfeb3fa	raft topology: fix indentation Broken in the previous commit.	2023-11-10 12:36:37 +01:00
Patryk Jędrzejczak	e94c7cff28	raft topology: join: try sending the response only once When a node tries to join the cluster, it asks the topology coordinator to add them and then waits for the response. In the previous commit, we have made the operator responsible for shutting down the joining node if the topology coordinator fails to deliver a response by removing the timeout. In this commit, we adjust the topology coordinator. We make it try sending the response (both acceptance and rejection) only once since we do not care if it fails anymore. We only need to ensure that the joining node is moved to the left state if sending fails.	2023-11-10 12:36:37 +01:00
Patryk Jędrzejczak	4ffa692cb3	raft topology: join: do not time out waiting for the node to be joined When a node tries to join the cluster, it asks the topology coordinator to add them and then waits for the response. The response is not guaranteed to come back. If the topology coordinator cannot contact the joining node, it moves the node to the left state and moves on. Currently, to handle the case when the response does not come back, the joining node gives up waiting for it after 3 minutes. However, it might take more time for the topology coordinator to start handling the request to join, as it might be working on other tasks like adding other nodes, performing tablet migrations, etc. In general, any timeout duration would be unreliable. Therefore, we get rid of the timeout. From now on, the operator will be responsible for shutting down the node if the topology coordinator fails to deliver the rejection. This change additionally fixes the TODO in raft_group0::join_group0.	2023-11-10 12:36:37 +01:00
Patryk Jędrzejczak	5f36e1d7f2	group 0: group0_handshaker: add the abort_source parameter to post_server_start Used in the following commit to enable the clean shutdown of a node that does not receive the join rejection from the topology coordinator.	2023-11-10 12:35:38 +01:00

1677 Commits