database: fix overflow when computing data distribution over shards
We store the per-shard chunk count in a uint64_t vector
global_offset, and then convert the counts to offsets with
a prefix sum:
```c++
// [1, 2, 3, 0] --> [0, 1, 3, 6]
std::exclusive_scan(global_offset.begin(), global_offset.end(), global_offset.begin(), 0, std::plus());
```
However, std::exclusive_scan takes the accumulator type from the
initial value, 0, which is an int, instead of from the range being
iterated, which is of uint64_t.
As a result, the prefix sum is computed as a 32-bit integer value. If
it exceeds 0x8000'0000, it becomes negative. It is then extended to
64 bits and stored. The result is a huge 64-bit number. Later on
we try to find an sstable with this chunk and fail, crashing on
an assertion.
An example of the failure can be seen here: https://godbolt.org/z/6M8aEbo57
The fix is simple: the initial value is passed as uint64_t instead of int.
Fixes https://github.com/scylladb/scylladb/issues/27417
Closes scylladb/scylladb#27418
This commit is contained in:
committed by
Piotr Dulikowski
parent
9d2f7c3f52
commit
9696ee64d0
@@ -3704,7 +3704,7 @@ future<utils::chunked_vector<temporary_buffer<char>>> database::sample_data_file
|
||||
}), std::ref(state));
|
||||
|
||||
// [1, 2, 3, 0] --> [0, 1, 3, 6]
|
||||
std::exclusive_scan(global_offset.begin(), global_offset.end(), global_offset.begin(), 0, std::plus());
|
||||
std::exclusive_scan(global_offset.begin(), global_offset.end(), global_offset.begin(), uint64_t(0), std::plus());
|
||||
|
||||
// We can't generate random non-negative integers smaller than 0,
|
||||
// so let's just deal with the `total_chunks == 0` case with an early return.
|
||||
|
||||
Reference in New Issue
Block a user