test/cluster/test_view_building_coordinator: fix view_updates_drained predicate

The previous fix for the flakiness in test_file_streaming waited for
the scylla_database_view_update_backlog metric to drop to 0 via
wait_for(view_updates_drained, ...). However, the predicate returned
True/False, while wait_for treats any non-None result as 'done' and
keeps retrying only on None. As a result, when the backlog was
non-zero the predicate returned False, which wait_for interpreted as
success, so it returned immediately. The test could then stop
servers[0]/servers[1] before the view updates that new_server
generated from the migrated staging sstable were actually delivered,
leaving a partially populated MV (e.g. 431/1000 rows) and a failing
assertion.
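To illustrate the failure mode, here is a minimal sketch of the retry
contract described above. `wait_for_sketch` is a hypothetical stand-in
for the suite's wait_for. It assumes only the semantics stated in this
message: retry while the predicate returns None, and return any other
value. `fake_backlog` is a hypothetical list that stands in for the
per-shard metric.

```python
import asyncio
import time


async def wait_for_sketch(pred, deadline, period=0.01):
    """Hypothetical stand-in for wait_for: retry while pred() returns None."""
    while True:
        assert time.time() < deadline, "timed out"
        res = await pred()
        if res is not None:  # False is not None, so it ends the wait
            return res
        await asyncio.sleep(period)


async def main():
    fake_backlog = [5, 3, 0]  # hypothetical backlog values seen on successive polls
    calls = 0

    async def buggy_pred():
        nonlocal calls
        backlog = fake_backlog[min(calls, len(fake_backlog) - 1)]
        calls += 1
        if backlog > 0:
            return False  # meant "not yet", but wait_for sees a non-None result
        return True

    result = await wait_for_sketch(buggy_pred, deadline=time.time() + 5)
    return result, calls


result, calls = asyncio.run(main())
print(result, calls)  # False 1: the wait ended after a single poll
```

The wait ends on the first poll even though the backlog is still 5.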

Fix the predicate to return None instead of False when the backlog is
not yet drained, so wait_for will actually retry until the metric
reaches 0 (or the deadline is hit).
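With the fix, the predicate returns None while any shard still reports
a backlog. The sketch below uses a hypothetical `wait_for_sketch` that
retries only on None, as described above. It also uses hypothetical
per-poll shard snapshots in place of the real
scylla_database_view_update_backlog metric.

```python
import asyncio
import time


async def wait_for_sketch(pred, deadline, period=0.01):
    """Hypothetical stand-in for wait_for: retry while pred() returns None."""
    while True:
        assert time.time() < deadline, "timed out"
        res = await pred()
        if res is not None:
            return res
        await asyncio.sleep(period)


async def main():
    # Hypothetical backlog values for two shards, one snapshot per poll.
    snapshots = [[5, 2], [1, 0], [0, 0]]
    polls = 0

    async def fixed_pred():
        nonlocal polls
        shards = snapshots[min(polls, len(snapshots) - 1)]
        polls += 1
        for backlog in shards:
            if backlog > 0:
                return None  # not drained yet: ask wait_for to retry
        return True  # every shard reports 0

    result = await wait_for_sketch(fixed_pred, deadline=time.time() + 5)
    return result, polls


result, polls = asyncio.run(main())
print(result, polls)  # True 3
```

The wait now continues until every shard reports 0, or until the
deadline assertion fires.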

Fixes SCYLLADB-1182

Closes scylladb/scylladb#29587
Michał Jadwiszczak
2026-04-21 17:14:41 +02:00
committed by Avi Kivity
parent 67b3ad94a0
commit 878f341338


@@ -753,13 +753,15 @@ async def test_file_streaming(manager: ManagerClient):
    # View updates generated by staging sstables aren't awaited in the consumer.
    # So it's possible that the view building task is finished but not all view updates were
    # written. To remove the flakiness we can wait until scylla_database_view_update_backlog metric drops to 0.
    # The metric tracks the memory used by view updates generated by the local node, so we query
    # it on new_server, which generated the updates from the migrated staging sstable.
    # Fixes scylladb/scylladb#26683
    async def view_updates_drained():
        local_metrics = await manager.metrics.query(new_server.ip_addr)
        for shard in range(smp):
            backlog = local_metrics.get("scylla_database_view_update_backlog", {'shard':str(shard)})
            if backlog > 0:
-                return False
+                return None
        return True
    await wait_for(view_updates_drained, deadline=time.time() + 30)