Merge 'raft: read_barrier: update local commit_idx to read_idx when it's safe' from Patryk Jędrzejczak

When the local entry with `read_idx` belongs to the current term, it's
safe to update the local `commit_idx` to `read_idx`.

The motivation for this change is to speed up read barriers. `wait_for_apply`
executed at the end of `read_barrier` is delayed until the follower learns
that the entry with `read_idx` is committed. It usually happens quickly in
the `read_quorum` message. However, non-voters don't receive this message,
so they have to wait for `append_entries`. If no new entries are being
added, `append_entries` can come only from `fsm::tick_leader()`. For group0,
this happens once every 100ms.

The issue above significantly slows down cluster setups in tests. Nodes
join group0 as non-voters, and then they are met with several read barriers
just after a write to group0. One example is `global_token_metadata_barrier`
in `write_both_read_new` performed just after `update_topology_state` in
`write_both_read_old`.

I tested the performance impact of this change with the following test:
```python
for _ in range(10):
    await manager.servers_add(3)
```
It consistently takes 44-45s with the change and 50-51s without the change
in dev mode.

No backport:
- non-critical performance improvement mostly relevant in tests,
- the change requires some soak time in master.

Closes scylladb/scylladb#28891

* github.com:scylladb/scylladb:
  raft: server: fix the repeating typo
  raft: clarify the comment about read_barrier_reply
  raft: read_barrier: update local commit_idx to read_idx when it's safe
  raft: log: clarify the specification of term_for
This commit is contained in:
Dawid Mędrek
2026-03-06 18:50:08 +01:00
6 changed files with 25 additions and 7 deletions

View File

@@ -1108,6 +1108,14 @@ std::optional<std::pair<read_id, index_t>> fsm::start_read_barrier(server_id req
return std::make_pair(id, _commit_idx);
}
void fsm::maybe_update_commit_idx_for_read(index_t read_idx) {
// read_idx from the leader might not be replicated to the local node yet.
const bool in_local_log = read_idx <= _log.last_idx();
if (in_local_log && log_term_for(read_idx) == get_current_term()) {
advance_commit_idx(read_idx);
}
}
void fsm::stop() {
if (is_leader()) {
// Become follower to stop accepting requests

View File

@@ -480,6 +480,15 @@ public:
std::optional<std::pair<read_id, index_t>> start_read_barrier(server_id requester);
// Update the commit index to the read index (a read barrier result from the leader) if the local entry with the
// read index belongs to the current term.
//
// Satisfying the condition above guarantees that the local log matches the current leader's log up to the read
// index (the Log Matching Property), so the current leader won't drop the local entry with the read index.
// Moreover, this entry has been committed by the leader, so future leaders also won't drop it (the Leader
// Completeness Property). Hence, updating the commit index is safe.
void maybe_update_commit_idx_for_read(index_t read_idx);
size_t in_memory_log_size() const {
return _log.in_memory_size();
}

View File

@@ -167,7 +167,7 @@ public:
// and non matching term is in second
std::pair<bool, term_t> match_term(index_t idx, term_t term) const;
// Return term number of the entry matching the index. If the
// entry is not in the log and does not match snapshot index,
// entry is in the truncated part of the log and does not match snapshot index,
// return an empty optional.
// Used to validate the log matching rule.
std::optional<term_t> term_for(index_t idx) const;

View File

@@ -505,7 +505,7 @@ using add_entry_reply = std::variant<entry_id, transient_error, commit_status_un
// std::monostate {} if the leader cannot execute the barrier because
// it did not commit any entries yet
// raft::not_a_leader if the node is not a leader
// index_t index that is safe to read without breaking linearizability
// index_t index that is safe to read once applied without breaking linearizability
using read_barrier_reply = std::variant<std::monostate, index_t, raft::not_a_leader>;
using rpc_message = std::variant<append_request,

View File

@@ -1561,6 +1561,7 @@ future<> server_impl::read_barrier(seastar::abort_source* as) {
co_return stop_iteration::no;
}
read_idx = std::get<index_t>(res);
_fsm->maybe_update_commit_idx_for_read(read_idx);
co_return stop_iteration::yes;
});

View File

@@ -77,7 +77,7 @@ public:
// committed locally means simply that the commit index is beyond this entry's index.
//
// The caller may pass a pointer to an abort_source to make the operation abortable.
// It it passes nullptr, the operation is unabortable.
// If it passes nullptr, the operation is unabortable.
//
// Successful `add_entry` with `wait_type::committed` does not guarantee that `state_machine::apply` will be called
// locally for this entry. Between the commit and the application we may receive a snapshot containing this entry,
@@ -125,7 +125,7 @@ public:
// returned even in case of a successful config change.
//
// The caller may pass a pointer to an abort_source to make the operation abortable.
// It it passes nullptr, the operation is unabortable.
// If it passes nullptr, the operation is unabortable.
//
// Exceptions:
// raft::conf_change_in_progress
@@ -206,7 +206,7 @@ public:
// future has resolved successfully.
//
// The caller may pass a pointer to an abort_source to make the operation abortable.
// It it passes nullptr, the operation is unabortable.
// If it passes nullptr, the operation is unabortable.
//
// Exceptions:
// raft::request_aborted
@@ -251,7 +251,7 @@ public:
// the call as before, but term should be different.
//
// The caller may pass a pointer to an abort_source to make the function abortable.
// It it passes nullptr, the function is unabortable.
// If it passes nullptr, the function is unabortable.
//
// Exceptions:
// raft::stopped_error
@@ -267,7 +267,7 @@ public:
// `raft::server_id`.
//
// The caller may pass a pointer to an abort_source to make the function abortable.
// It it passes nullptr, the function is unabortable.
// If it passes nullptr, the function is unabortable.
//
// Exceptions:
// raft::stopped_error