partition_version: fix violation of "older versions are evicted first" during schema upgrades

A schema upgrade appends a MVCC version B after an existing version A.

The last dummy in B is added to the front of LRU,
so it will be evicted after the entries in A.

This alone doesn't quite violate the "older versions are evicted first" rule,
because the new last dummy carries no information. But apply_monotonically
generally assumes that entries on the same position have the obvious
eviction order, even if they carry no information. Thus, after the merge,
the rule can become broken.

The proposed fix is as follows:

- In the case where A is merged into B, the merged last dummy
  inherits the link of A.
- The merging of B into anything is prevented until its merge with A is finished.

This is relatively hacky, because it still involves a state that
goes against some natural expectations granted by the "older versions..."
rule. A less hacky fix would be to ensure that the new dummy is inserted
into a proper place in the eviction order to begin with.

Or, better yet, we could eliminate the rule altogether.
Aside from being very hard to maintain, it also prevents the introduction
of any eviction algorithm other than LRU.
This commit is contained in:
Michał Chojnowski
2023-11-02 19:31:58 +01:00
parent 2aac8690c7
commit 9ccd4ea416
2 changed files with 14 additions and 3 deletions

View File

@@ -454,12 +454,12 @@ stop_iteration mutation_partition_v2::apply_monotonically(const schema& s, const
if (same_schema) [[likely]] {
// Newer evictable versions store complete rows
i->row() = std::move(src_e.row());
// Need to preserve the LRU link of the later version in case it's
// the last dummy entry which holds the partition entry linked in LRU.
i->swap(src_e);
} else {
i->apply_monotonically(s, p_s, std::move(src_e));
}
// Need to preserve the LRU link of the later version in case it's
// the last dummy entry which holds the partition entry linked in LRU.
i->swap(src_e);
tracker->remove(src_e);
} else {
// Avoid row compaction if no newer range tombstone.

View File

@@ -227,6 +227,17 @@ stop_iteration partition_snapshot::merge_partition_versions(mutation_application
if (!_version_merging_state) {
_version_merging_state = apply_resume();
}
if (prev->prev() && prev->prev()->_is_being_upgraded) [[unlikely]] {
// Give up.
//
// While `prev->prev()` is being upgraded into `prev`,
// `prev`'s last dummy violates the usual eviction order.
// Merging it into `current` could break the "older versions are evicted first".
//
// There is no harm in giving up here. After the upgrade finishes,
// `prev`'s snapshot will slide to `current` and pick up where we left.
return stop_iteration::yes;
}
if (!prev->_is_being_upgraded && prev->get_schema()->version() != current->get_schema()->version()) {
// The versions we are attempting to merge have different schemas.
// In this scenario the older version has to be upgraded before