mutation_partition: compare_row_marker_for_merge: consider ttl in case expiry is the same

As in compare_atomic_cell_for_merge, consider the row marker
ttl when ordering two row markers that are both expiring
and have the same expiration time.
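The tie-break can be sketched in isolation as follows. This is a
minimal illustration, not the ScyllaDB code: expiring_marker and
compare_expiring_for_merge are hypothetical names, plain
std::chrono::seconds stands in for the gc_clock types, and only the
case where both markers are live and expiring is modelled.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical stand-in for a live, expiring row marker (not ScyllaDB's row_marker).
struct expiring_marker {
    std::int64_t timestamp;       // write timestamp
    std::chrono::seconds expiry;  // absolute expiration time
    std::chrono::seconds ttl;     // time-to-live given at write time
};

// Returns 1 if left wins the merge, -1 if right wins, 0 if they are equivalent.
constexpr int compare_expiring_for_merge(const expiring_marker& left,
                                         const expiring_marker& right) noexcept {
    // Largest write timestamp wins.
    if (left.timestamp != right.timestamp) {
        return left.timestamp > right.timestamp ? 1 : -1;
    }
    // Latest expiry wins.
    if (left.expiry != right.expiry) {
        return left.expiry < right.expiry ? -1 : 1;
    }
    // Same expiry: the derived write time is (expiry - ttl), so the
    // marker with the smaller ttl was written later and wins.
    if (left.ttl != right.ttl) {
        return left.ttl < right.ttl ? 1 : -1;
    }
    return 0;
}

// Same timestamp and expiry; derived write times are 900 and 700.
constexpr expiring_marker written_later{10, std::chrono::seconds(1000), std::chrono::seconds(100)};
constexpr expiring_marker written_earlier{10, std::chrono::seconds(1000), std::chrono::seconds(300)};

static_assert(compare_expiring_for_merge(written_later, written_earlier) == 1,
              "smaller ttl (later derived write time) wins");
static_assert(compare_expiring_for_merge(written_earlier, written_later) == -1,
              "comparison is antisymmetric");
```

Without the ttl step, the two markers above would compare as equal
even though they were written at different times.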

This was missed in a57c087c89
and a085ef74ff.

With that in mind, add documentation to compare_row_marker_for_merge
and a note in each function referring to the other, since both
implement the same ordering algorithm (cells additionally use the
cell value to break a tie).

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Benny Halevy
2023-06-13 10:00:10 +03:00
parent 6717e45ff0
commit 0aa13f70eb
2 changed files with 20 additions and 2 deletions


@@ -68,6 +68,10 @@ atomic_cell::atomic_cell(const abstract_type& type, atomic_cell_view other)
 // Based on Cassandra's resolveRegular function:
 // - https://github.com/apache/cassandra/blob/e4f31b73c21b04966269c5ac2d3bd2562e5f6c63/src/java/org/apache/cassandra/db/rows/Cells.java#L79-L119
+//
+// Note: the ordering algorithm for cell is the same as for rows,
+// except that the cell value is used to break a tie in case all other attributes are equal.
+// See compare_row_marker_for_merge.
 std::strong_ordering
 compare_atomic_cell_for_merge(atomic_cell_view left, atomic_cell_view right) {
     // Largest write timestamp wins.


@@ -1108,20 +1108,34 @@ operator<<(std::ostream& os, const mutation_partition::printer& p) {
 constexpr gc_clock::duration row_marker::no_ttl;
 constexpr gc_clock::duration row_marker::dead;
+// Note: the ordering algorithm for rows is the same as for cells,
+// except that there is no cell value to break a tie in case all other attributes are equal.
+// See compare_atomic_cell_for_merge.
 int compare_row_marker_for_merge(const row_marker& left, const row_marker& right) noexcept {
+    // Largest write timestamp wins.
     if (left.timestamp() != right.timestamp()) {
         return left.timestamp() > right.timestamp() ? 1 : -1;
     }
+    // Tombstones always win reconciliation with live rows of the same timestamp
     if (left.is_live() != right.is_live()) {
         return left.is_live() ? -1 : 1;
     }
     if (left.is_live()) {
+        // Prefer expiring rows (which will become tombstones at some future date) over live rows.
+        // See https://issues.apache.org/jira/browse/CASSANDRA-14592
         if (left.is_expiring() != right.is_expiring()) {
             // prefer expiring cells.
             return left.is_expiring() ? 1 : -1;
         }
-        if (left.is_expiring() && left.expiry() != right.expiry()) {
-            return left.expiry() < right.expiry() ? -1 : 1;
+        // If both are expiring, choose the cell with the latest expiry or derived write time.
+        if (left.is_expiring()) {
+            if (left.expiry() != right.expiry()) {
+                return left.expiry() < right.expiry() ? -1 : 1;
+            } else if (left.ttl() != right.ttl()) {
+                // The cell write time is derived by (expiry - ttl).
+                // Prefer row that was written later (and has a smaller ttl).
+                return left.ttl() < right.ttl() ? 1 : -1;
+            }
         }
     } else {
         // Both are either deleted or missing