partition_version: fix violation of "older versions are evicted first" during schema upgrades
A schema upgrade appends a MVCC version B after an existing version A. The last dummy in B is added to the front of LRU, so it will be evicted after the entries in A. This alone doesn't quite violate the "older versions are evicted first" rule, because the new last dummy carries no information. But apply_monotonically generally assumes that entries on the same position have the obvious eviction order, even if they carry no information. Thus, after the merge, the rule can become broken. The proposed fix is as follows: - In the case where A is merged into B, the merged last dummy inherits the link of A. - The merging of B into anything is prevented until its merge with A is finished. This is relatively hacky, because it still involves a state that goes against some natural expectations granted by the "older versions..." rule. A less hacky fix would be to ensure that the new dummy is inserted into a proper place in the eviction order to begin with. Or, better yet, we could eliminate the rule altogether. Aside from being very hard to maintain, it also prevents the introduction of any eviction algorithm other than LRU.
This commit is contained in:
@@ -454,12 +454,12 @@ stop_iteration mutation_partition_v2::apply_monotonically(const schema& s, const
|
||||
if (same_schema) [[likely]] {
|
||||
// Newer evictable versions store complete rows
|
||||
i->row() = std::move(src_e.row());
|
||||
// Need to preserve the LRU link of the later version in case it's
|
||||
// the last dummy entry which holds the partition entry linked in LRU.
|
||||
i->swap(src_e);
|
||||
} else {
|
||||
i->apply_monotonically(s, p_s, std::move(src_e));
|
||||
}
|
||||
// Need to preserve the LRU link of the later version in case it's
|
||||
// the last dummy entry which holds the partition entry linked in LRU.
|
||||
i->swap(src_e);
|
||||
tracker->remove(src_e);
|
||||
} else {
|
||||
// Avoid row compaction if no newer range tombstone.
|
||||
|
||||
@@ -227,6 +227,17 @@ stop_iteration partition_snapshot::merge_partition_versions(mutation_application
|
||||
if (!_version_merging_state) {
|
||||
_version_merging_state = apply_resume();
|
||||
}
|
||||
if (prev->prev() && prev->prev()->_is_being_upgraded) [[unlikely]] {
|
||||
// Give up.
|
||||
//
|
||||
// While `prev->prev()` is being upgraded into `prev`,
|
||||
// `prev`'s last dummy violates the usual eviction order.
|
||||
// Merging it into `current` could break the "older versions are evicted first".
|
||||
//
|
||||
// There is no harm in giving up here. After the upgrade finishes,
|
||||
// `prev`'s snapshot will slide to `current` and pick up where we left.
|
||||
return stop_iteration::yes;
|
||||
}
|
||||
if (!prev->_is_being_upgraded && prev->get_schema()->version() != current->get_schema()->version()) {
|
||||
// The versions we are attempting to merge have different schemas.
|
||||
// In this scenario the older version has to be upgraded before
|
||||
|
||||
Reference in New Issue
Block a user