Merge 'raft: read_barrier: update local commit_idx to read_idx when it's safe' from Patryk Jędrzejczak
When the local entry with `read_idx` belongs to the current term, it's
safe to update the local `commit_idx` to `read_idx`.
The motivation for this change is to speed up read barriers. `wait_for_apply`
executed at the end of `read_barrier` is delayed until the follower learns
that the entry with `read_idx` is committed. It usually happens quickly in
the `read_quorum` message. However, non-voters don't receive this message,
so they have to wait for `append_entries`. If no new entries are being
added, `append_entries` can come only from `fsm::tick_leader()`. For group0,
this happens once every 100ms.
The issue above significantly slows down cluster setups in tests. Nodes
join group0 as non-voters, and then they are met with several read barriers
just after a write to group0. One example is `global_token_metadata_barrier`
in `write_both_read_new` performed just after `update_topology_state` in
`write_both_read_old`.
I tested the performance impact of this change with the following test:
```python
for _ in range(10):
await manager.servers_add(3)
```
It consistently takes 44-45s with the change and 50-51s without the change
in dev mode.
No backport:
- non-critical performance improvement mostly relevant in tests,
- the change requires some soak time in master.
Closes scylladb/scylladb#28891
* github.com:scylladb/scylladb:
raft: server: fix the repeating typo
raft: clarify the comment about read_barrier_reply
raft: read_barrier: update local commit_idx to read_idx when it's safe
raft: log: clarify the specification of term_for
This commit is contained in:
@@ -1108,6 +1108,14 @@ std::optional<std::pair<read_id, index_t>> fsm::start_read_barrier(server_id req
|
||||
return std::make_pair(id, _commit_idx);
|
||||
}
|
||||
|
||||
void fsm::maybe_update_commit_idx_for_read(index_t read_idx) {
|
||||
// read_idx from the leader might not be replicated to the local node yet.
|
||||
const bool in_local_log = read_idx <= _log.last_idx();
|
||||
if (in_local_log && log_term_for(read_idx) == get_current_term()) {
|
||||
advance_commit_idx(read_idx);
|
||||
}
|
||||
}
|
||||
|
||||
void fsm::stop() {
|
||||
if (is_leader()) {
|
||||
// Become follower to stop accepting requests
|
||||
|
||||
@@ -480,6 +480,15 @@ public:
|
||||
|
||||
std::optional<std::pair<read_id, index_t>> start_read_barrier(server_id requester);
|
||||
|
||||
// Update the commit index to the read index (a read barrier result from the leader) if the local entry with the
|
||||
// read index belongs to the current term.
|
||||
//
|
||||
// Satisfying the condition above guarantees that the local log matches the current leader's log up to the read
|
||||
// index (the Log Matching Property), so the current leader won't drop the local entry with the read index.
|
||||
// Moreover, this entry has been committed by the leader, so future leaders also won't drop it (the Leader
|
||||
// Completeness Property). Hence, updating the commit index is safe.
|
||||
void maybe_update_commit_idx_for_read(index_t read_idx);
|
||||
|
||||
size_t in_memory_log_size() const {
|
||||
return _log.in_memory_size();
|
||||
}
|
||||
|
||||
@@ -167,7 +167,7 @@ public:
|
||||
// and non matching term is in second
|
||||
std::pair<bool, term_t> match_term(index_t idx, term_t term) const;
|
||||
// Return term number of the entry matching the index. If the
|
||||
// entry is not in the log and does not match snapshot index,
|
||||
// entry is in the truncated part of the log and does not match snapshot index,
|
||||
// return an empty optional.
|
||||
// Used to validate the log matching rule.
|
||||
std::optional<term_t> term_for(index_t idx) const;
|
||||
|
||||
@@ -505,7 +505,7 @@ using add_entry_reply = std::variant<entry_id, transient_error, commit_status_un
|
||||
// std::monostate {} if the leader cannot execute the barrier because
|
||||
// it did not commit any entries yet
|
||||
// raft::not_a_leader if the node is not a leader
|
||||
// index_t index that is safe to read without breaking linearizability
|
||||
// index_t index that is safe to read once applied without breaking linearizability
|
||||
using read_barrier_reply = std::variant<std::monostate, index_t, raft::not_a_leader>;
|
||||
|
||||
using rpc_message = std::variant<append_request,
|
||||
|
||||
@@ -1561,6 +1561,7 @@ future<> server_impl::read_barrier(seastar::abort_source* as) {
|
||||
co_return stop_iteration::no;
|
||||
}
|
||||
read_idx = std::get<index_t>(res);
|
||||
_fsm->maybe_update_commit_idx_for_read(read_idx);
|
||||
co_return stop_iteration::yes;
|
||||
});
|
||||
|
||||
|
||||
@@ -77,7 +77,7 @@ public:
|
||||
// committed locally means simply that the commit index is beyond this entry's index.
|
||||
//
|
||||
// The caller may pass a pointer to an abort_source to make the operation abortable.
|
||||
// It it passes nullptr, the operation is unabortable.
|
||||
// If it passes nullptr, the operation is unabortable.
|
||||
//
|
||||
// Successful `add_entry` with `wait_type::committed` does not guarantee that `state_machine::apply` will be called
|
||||
// locally for this entry. Between the commit and the application we may receive a snapshot containing this entry,
|
||||
@@ -125,7 +125,7 @@ public:
|
||||
// returned even in case of a successful config change.
|
||||
//
|
||||
// The caller may pass a pointer to an abort_source to make the operation abortable.
|
||||
// It it passes nullptr, the operation is unabortable.
|
||||
// If it passes nullptr, the operation is unabortable.
|
||||
//
|
||||
// Exceptions:
|
||||
// raft::conf_change_in_progress
|
||||
@@ -206,7 +206,7 @@ public:
|
||||
// future has resolved successfully.
|
||||
//
|
||||
// The caller may pass a pointer to an abort_source to make the operation abortable.
|
||||
// It it passes nullptr, the operation is unabortable.
|
||||
// If it passes nullptr, the operation is unabortable.
|
||||
//
|
||||
// Exceptions:
|
||||
// raft::request_aborted
|
||||
@@ -251,7 +251,7 @@ public:
|
||||
// the call as before, but term should be different.
|
||||
//
|
||||
// The caller may pass a pointer to an abort_source to make the function abortable.
|
||||
// It it passes nullptr, the function is unabortable.
|
||||
// If it passes nullptr, the function is unabortable.
|
||||
//
|
||||
// Exceptions:
|
||||
// raft::stopped_error
|
||||
@@ -267,7 +267,7 @@ public:
|
||||
// `raft::server_id`.
|
||||
//
|
||||
// The caller may pass a pointer to an abort_source to make the function abortable.
|
||||
// It it passes nullptr, the function is unabortable.
|
||||
// If it passes nullptr, the function is unabortable.
|
||||
//
|
||||
// Exceptions:
|
||||
// raft::stopped_error
|
||||
|
||||
Reference in New Issue
Block a user