scylla/locator at e63d8ae272c8d0191bf7e2365933c596d64b85fe - scylla - Iran Web Center Git

bahman/scylla

Tomasz Grabiec e63d8ae272 Merge 'Handle tablet migration failure while streaming' from Pavel Emelyanov

A node can be lost during a tablet migration that involves it. The migration then gets stuck and blocks the topology state machine. To recover, the current procedure is for the admin to run nodetool removenode or replace the node. This marks the node as "ignored", so the tablet state machine can pick this up and abort the migration.

This PR implements the handling for the streaming stage only and adds a test for it. Checking the other stages needs more work on failure injection, so that failures can be injected into a specific barrier.

To handle streaming failure, two new stages are introduced -- cleanup_target and revert_migration. The former cleans the pending replica, which may have received some data before streaming stopped working. The latter is like end_migration, but doesn't commit new_replicas into the replicas field.
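The failure path described above can be pictured as a small state machine. The following is only an illustrative sketch, not the code from tablets.hh or tablets.cc: the `stage` enum, the `tablet_transition` struct, the `pending_has_data` flag and the `step()` function are all hypothetical, and the successful path is collapsed into a direct jump from streaming to end_migration. Only the stage names streaming, cleanup_target, revert_migration and end_migration, and the replicas/new_replicas fields, come from the description.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical, simplified stage set. The real transition has more stages;
// "done" is added here only to terminate the sketch.
enum class stage { streaming, cleanup_target, revert_migration, end_migration, done };

// Hypothetical model of one tablet's in-flight migration.
struct tablet_transition {
    stage current = stage::streaming;
    std::set<std::string> replicas;      // committed replica set
    std::set<std::string> new_replicas;  // replica set proposed by the migration
    bool pending_has_data = false;       // assumption: target may hold partially streamed data
};

// Advances the transition by one stage. `streaming_failed` models the case
// where a node involved in the migration was lost and marked as ignored.
inline void step(tablet_transition& t, bool streaming_failed) {
    switch (t.current) {
    case stage::streaming:
        // On failure, take the new error path instead of finishing normally.
        t.current = streaming_failed ? stage::cleanup_target : stage::end_migration;
        break;
    case stage::cleanup_target:
        // Drop whatever the pending replica received before streaming stopped.
        t.pending_has_data = false;
        t.current = stage::revert_migration;
        break;
    case stage::revert_migration:
        // Like end_migration, but new_replicas is discarded, not committed.
        assert(!t.pending_has_data);  // target must be cleaned before reverting
        t.new_replicas.clear();
        t.current = stage::done;
        break;
    case stage::end_migration:
        // Normal completion: new_replicas becomes the committed replica set.
        t.replicas = t.new_replicas;
        t.new_replicas.clear();
        t.current = stage::done;
        break;
    case stage::done:
        break;
    }
}
```

In this sketch, calling `step()` repeatedly with `streaming_failed` set to true walks streaming, cleanup_target, revert_migration, and leaves `replicas` untouched, which is the property the description attributes to revert_migration.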

refs: #16527

Closes scylladb/scylladb#17360

* github.com:scylladb/scylladb:
  test/topology: Add checking error paths for failed migration
  topology.tablets_migration: Handle failed streaming
  topology.tablets_migration: Add cleanup_target transition stage
  topology.tablets_migration: Add revert_migration transition stage
  storage_service: Rewrap cleanup stage checking in cleanup_tablet()
  test/topology: Move helpers to get tablet replicas to pylib

2024-02-20 18:50:55 +01:00

abstract_replication_strategy.cc

Rename keyspace::get_effective_replication_map()

2024-02-13 20:22:02 +02:00

abstract_replication_strategy.hh

abstract_replication_strategy: Make validate_replication_factor return value

2024-02-02 14:36:47 +03:00

azure_snitch.cc

treewide: replace seastar::future::get0() with seastar::future::get()

2024-02-02 22:12:57 +08:00

azure_snitch.hh

treewide: remove empty comments in top-of-files

2022-05-13 07:11:58 +02:00

CMakeLists.txt

build: cmake: add check-header target

2023-11-13 10:27:06 +02:00

ec2_multi_region_snitch.cc

locator: do not include unused headers

2024-01-23 09:12:23 +02:00

ec2_multi_region_snitch.hh

endpoint_state subscriptions: batch on_change notification

2023-12-31 18:37:34 +02:00

ec2_snitch.cc

locator: do not include unused headers

2024-01-23 09:12:23 +02:00

ec2_snitch.hh

locator::ec2_snitch: change retry logic to exponential backoff

2023-12-25 18:17:23 +02:00

everywhere_replication_strategy.cc

locator: Wrap replication_strategy_config_options into replication_strategy_params

2023-12-25 15:53:03 +03:00

everywhere_replication_strategy.hh

locator: Wrap replication_strategy_config_options into replication_strategy_params

2023-12-25 15:53:03 +03:00

gce_snitch.cc

treewide: replace seastar::future::get0() with seastar::future::get()

2024-02-02 22:12:57 +08:00

gce_snitch.hh

treewide: remove empty comments in top-of-files

2022-05-13 07:11:58 +02:00

gossiping_property_file_snitch.cc

treewide: replace seastar::future::get0() with seastar::future::get()

2024-02-02 22:12:57 +08:00

gossiping_property_file_snitch.hh

locator: do not include unused headers

2024-01-23 09:12:23 +02:00

host_id.hh

everywhere: define locator::host_id as a strong tagged_uuid type

2022-08-12 06:01:44 +03:00

load_sketch.hh

tree: remove unnecessary yields around for_each_tablet()

2024-02-12 17:10:25 +01:00

local_strategy.cc

locator: Wrap replication_strategy_config_options into replication_strategy_params

2023-12-25 15:53:03 +03:00

local_strategy.hh

locator: do not include unused headers

2024-01-23 09:12:23 +02:00

network_topology_strategy.cc

Merge 'tablets: Make sure topology has enough endpoints for RF' from Pavel Emelyanov

2024-02-06 22:38:11 +01:00

network_topology_strategy.hh

tablet_allocator: Add initial tablets scale to config

2024-01-22 19:14:45 +03:00

production_snitch_base.cc

snitch: pass broadcast_address in snitch_config

2023-12-05 08:42:49 +02:00

production_snitch_base.hh

locator: do not include unused headers

2024-01-23 09:12:23 +02:00

rack_inferring_snitch.cc

locator: do not include unused headers

2024-01-23 09:12:23 +02:00

rack_inferring_snitch.hh

snitch: pass broadcast_address in snitch_config

2023-12-05 08:42:49 +02:00

simple_snitch.cc

…

simple_snitch.hh

locator: do not include unused headers

2024-01-23 09:12:23 +02:00

simple_strategy.cc

replication_strategy: Do not convert string RF into int twice

2024-02-02 14:38:17 +03:00

simple_strategy.hh

locator: do not include unused headers

2024-01-23 09:12:23 +02:00

snitch_base.cc

locator: do not include unused headers

2024-01-23 09:12:23 +02:00

snitch_base.hh

locator: do not include unused headers

2024-01-23 09:12:23 +02:00

tablet_metadata_guard.hh

token_metadata: drop the template

2023-12-12 23:19:54 +04:00

tablet_replication_strategy.hh

tablets: Remove tablet_aware_replication_strategy::parse_initial_tablets

2024-01-29 10:03:38 +02:00

tablet_sharder.hh

locator: do not include unused headers

2024-01-23 09:12:23 +02:00

tablets.cc

topology.tablets_migration: Add cleanup_target transition stage

2024-02-20 08:59:06 +03:00

tablets.hh

topology.tablets_migration: Add cleanup_target transition stage

2024-02-20 08:59:06 +03:00

token_metadata_fwd.hh

token_metadata: drop the template

2023-12-12 23:19:54 +04:00

token_metadata.cc

token_metadata: pass node id when formatting it

2023-12-15 16:43:44 +01:00

token_metadata.hh

locator: do not include unused headers

2024-01-23 09:12:23 +02:00

token_range_splitter.hh

token_metadata: drop the template

2023-12-12 23:19:54 +04:00

topology.cc

topology: print node* with node_printer

2024-02-20 14:35:56 +03:00

topology.hh

topology: Expand formatter<locator::node>

2024-02-09 13:49:15 +03:00

types.hh

dc_rack_fn: make it non-template

2023-12-12 23:19:54 +04:00

util.cc

Rename keyspace::get_effective_replication_map()

2024-02-13 20:22:02 +02:00

util.hh

storage_service, locator: extract describe_ring()

2022-12-10 12:51:05 +01:00