service: migration_manager: Run group0 barrier in gossip scheduling group

Fixes two issues.

One is potential priority inversion. The barrier is executed using the scheduling group of the first fiber which triggers it, and the rest block waiting on it. For example, CQL statements which need to sync the schema on the replica side can block on a barrier triggered by streaming. That's undesirable. This is theoretical and has not been confirmed in the field.

The second problem is blocking the error path. This barrier is called from the streaming error-handling path. If the streaming concurrency semaphore is exhausted and streaming fails due to a timeout while obtaining the permit in check_needs_view_update_path(), the error path will block too, because it also attempts to obtain the permit as part of the group0 barrier. Running the barrier in the gossip scheduling group prevents this.

Fixes #24925

(cherry picked from commit ee2fa58bd6)
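The priority-inversion mechanism can be modelled outside Seastar with a small standard-C++ sketch. This is not Scylla or Seastar code. A thread-local string stands in for the scheduling group, the hypothetical `with_group()` loosely mimics `seastar::with_scheduling_group()`, and a deferred `std::shared_future` models a coalesced barrier whose work runs in the context of whichever caller waits on it first. The names `current_group`, `make_naive_barrier()`, and `make_pinned_barrier()` are illustrative assumptions.

```cpp
#include <future>
#include <string>
#include <utility>

// Hypothetical stand-in for a per-fiber scheduling group:
// just a name stored per thread.
thread_local std::string current_group = "main";

// Hypothetical analogue of seastar::with_scheduling_group(): runs f() with
// current_group set to `group` and restores the previous value afterwards,
// also if f() throws.
template <typename F>
auto with_group(const std::string& group, F f) {
    struct restore_t {
        std::string saved;
        ~restore_t() { current_group = saved; }
    } restore{std::exchange(current_group, group)};
    return f();
}

// Naive coalesced barrier: the deferred work runs on the first get(), in the
// group of whichever caller gets there first; later callers only wait for
// the cached result. The "work" simply reports the group it ran in.
std::shared_future<std::string> make_naive_barrier() {
    return std::async(std::launch::deferred, [] { return current_group; }).share();
}

// Pinned barrier: the work switches to a fixed group itself, so the group of
// the triggering caller no longer matters.
std::shared_future<std::string> make_pinned_barrier(std::string pinned) {
    return std::async(std::launch::deferred, [pinned = std::move(pinned)] {
        return with_group(pinned, [] { return current_group; });
    }).share();
}
```

Waiting on the naive barrier first from a `"streaming"` context and then from a `"statement"` context reports `"streaming"` both times: the statement caller inherits the streaming group. The pinned variant reports `"gossip"` regardless of which caller triggers it, which corresponds to what wrapping `start_group0_operation()` in `with_scheduling_group()` does in the patch.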
2025-07-11 15:48:44 +02:00
parent 36d2f80f38
commit 434ecdee0e
1 changed file with 3 additions and 1 deletion
--- a/service/migration_manager.cc
+++ b/service/migration_manager.cc
@@ -56,7 +56,9 @@ migration_manager::migration_manager(migration_notifier& notifier, gms::feature_
        , _group0_barrier(this_shard_id() == 0 ?
            std::function<future<>()>([this] () -> future<> {
                // This will run raft barrier and will sync schema with the leader
-                (void)co_await start_group0_operation();
+                return with_scheduling_group(_storage_proxy.get_db().local().get_gossip_scheduling_group(), [this] {
+                    return start_group0_operation().discard_result();
+                });
            }) :
            std::function<future<>()>([this] () -> future<> {
                co_await container().invoke_on(0, [] (migration_manager& mm) -> future<> {