mutation_partition: compare_row_marker_for_merge: consider ttl in case expiry is the same
As in compare_atomic_cell_for_merge, we want to consider the row marker ttl for ordering, in case both are expiring and have the same expiration time.

This was missed in a57c087c89 and a085ef74ff.

With that in mind, add documentation to compare_row_marker_for_merge and a mutual note to both functions about their equivalence.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
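The ordering described in the message can be illustrated with a small standalone model. This is only a sketch: `marker_model`, `compare_marker_model` and `check_examples` are hypothetical names invented for illustration, the fields are simplified stand-ins for the real `row_marker` accessors, and the dead-vs-dead branch is left out because this sketch does not cover it.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical stand-in for row_marker (assumption: not the real class).
struct marker_model {
    std::int64_t timestamp;
    bool live;
    std::chrono::seconds ttl;     // zero means "not expiring" in this sketch
    std::chrono::seconds expiry;  // seconds since an arbitrary epoch

    bool is_expiring() const { return live && ttl.count() > 0; }
};

// Returns 1 if `l` wins the merge, -1 if `r` wins, 0 if this sketch cannot tell.
int compare_marker_model(const marker_model& l, const marker_model& r) {
    // Largest write timestamp wins.
    if (l.timestamp != r.timestamp) {
        return l.timestamp > r.timestamp ? 1 : -1;
    }
    // A dead marker beats a live one with the same timestamp.
    if (l.live != r.live) {
        return l.live ? -1 : 1;
    }
    if (l.live) {
        // An expiring marker beats a non-expiring one.
        if (l.is_expiring() != r.is_expiring()) {
            return l.is_expiring() ? 1 : -1;
        }
        if (l.is_expiring()) {
            // Later expiry wins.
            if (l.expiry != r.expiry) {
                return l.expiry < r.expiry ? -1 : 1;
            }
            // Same expiry: the derived write time is (expiry - ttl),
            // so the smaller ttl was written later and wins.
            if (l.ttl != r.ttl) {
                return l.ttl < r.ttl ? 1 : -1;
            }
        }
    }
    // Dead-vs-dead ordering is intentionally omitted from this sketch.
    return 0;
}

// Small self-check covering each tie-break level of the sketch.
void check_examples() {
    using std::chrono::seconds;
    marker_model older{10, true, seconds(100), seconds(1000)};  // written at 900
    marker_model newer{10, true, seconds(50), seconds(1000)};   // written at 950
    assert(compare_marker_model(older, newer) == -1);
    assert(compare_marker_model(newer, older) == 1);

    marker_model plain{10, true, seconds(0), seconds(0)};
    assert(compare_marker_model(older, plain) == 1);

    marker_model dead{10, false, seconds(0), seconds(0)};
    assert(compare_marker_model(plain, dead) == -1);

    marker_model later_ts{11, true, seconds(0), seconds(0)};
    assert(compare_marker_model(later_ts, dead) == 1);
}
```

In the first pair both markers share timestamp and expiry, so before this change the comparison would have fallen through without a decision; with the ttl tie-break the marker with the smaller ttl (the later derived write time) wins.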
@@ -68,6 +68,10 @@ atomic_cell::atomic_cell(const abstract_type& type, atomic_cell_view other)
 
 // Based on Cassandra's resolveRegular function:
 //  - https://github.com/apache/cassandra/blob/e4f31b73c21b04966269c5ac2d3bd2562e5f6c63/src/java/org/apache/cassandra/db/rows/Cells.java#L79-L119
+//
+// Note: the ordering algorithm for cell is the same as for rows,
+// except that the cell value is used to break a tie in case all other attributes are equal.
+// See compare_row_marker_for_merge.
 std::strong_ordering
 compare_atomic_cell_for_merge(atomic_cell_view left, atomic_cell_view right) {
     // Largest write timestamp wins.
@@ -1108,20 +1108,34 @@ operator<<(std::ostream& os, const mutation_partition::printer& p) {
 constexpr gc_clock::duration row_marker::no_ttl;
 constexpr gc_clock::duration row_marker::dead;
 
+// Note: the ordering algorithm for rows is the same as for cells,
+// except that there is no cell value to break a tie in case all other attributes are equal.
+// See compare_atomic_cell_for_merge.
 int compare_row_marker_for_merge(const row_marker& left, const row_marker& right) noexcept {
+    // Largest write timestamp wins.
     if (left.timestamp() != right.timestamp()) {
         return left.timestamp() > right.timestamp() ? 1 : -1;
     }
+    // Tombstones always win reconciliation with live rows of the same timestamp
     if (left.is_live() != right.is_live()) {
         return left.is_live() ? -1 : 1;
     }
     if (left.is_live()) {
+        // Prefer expiring rows (which will become tombstones at some future date) over live rows.
+        // See https://issues.apache.org/jira/browse/CASSANDRA-14592
         if (left.is_expiring() != right.is_expiring()) {
             // prefer expiring cells.
             return left.is_expiring() ? 1 : -1;
         }
-        if (left.is_expiring() && left.expiry() != right.expiry()) {
-            return left.expiry() < right.expiry() ? -1 : 1;
+        // If both are expiring, choose the cell with the latest expiry or derived write time.
+        if (left.is_expiring()) {
+            if (left.expiry() != right.expiry()) {
+                return left.expiry() < right.expiry() ? -1 : 1;
+            } else if (left.ttl() != right.ttl()) {
+                // The cell write time is derived by (expiry - ttl).
+                // Prefer row that was written later (and has a smaller ttl).
+                return left.ttl() < right.ttl() ? 1 : -1;
+            }
         }
     } else {
         // Both are either deleted or missing