diff --git a/real_dirty_memory_accounter.hh b/real_dirty_memory_accounter.hh
index 461ce10b39..6bd74fb65c 100644
--- a/real_dirty_memory_accounter.hh
+++ b/real_dirty_memory_accounter.hh
@@ -12,7 +12,81 @@
 #include "row_cache.hh"
 #include "replica/dirty_memory_manager.hh"
 
-// makes sure that cache update handles real dirty memory correctly.
+// # Flush control overview
+//
+// It is good for memtables to grow as big as possible before they are flushed
+// to disk, because this reduces the number of sstables, which in turn reduces
+// the number of disk reads on queries and/or reduces the necessary compaction
+// work.
+//
+// On the other hand, the flush should start early enough to leave enough space
+// in RAM for new data incoming during the flush. Otherwise RAM could be
+// overfilled and the database would have to be throttled until some space is
+// freed.
+//
+// It is good for a memtable flush to happen as slowly as possible, because it's
+// a non-interactive task, which should take as few resources from
+// interactive tasks as possible.
+//
+// On the other hand, the flush has to progress fast enough to keep up with
+// incoming writes, otherwise the size of data in RAM will keep growing
+// until RAM is overfilled.
+//
+// The balance of the above is kept by dirty_memory_manager and flush_controller.
+// They attempt to make flushes as delayed and slow as possible without risking
+// an OOM situation.
+//
+// # Flush control implementation
+//
+// Flush delay and speed are based on formulas involving total available RAM,
+// total ("real") memtable memory and "unspooled" (not flushed to disk yet) memtable
+// memory. See dirty_memory_manager for details.
+//
+// ("Dirty" is a term borrowed from general caching terminology, where it usually means
+// cache entries which were modified and have to be written back before being discarded.
+// In Scylla, "dirty memory" simply means "memory taken up by memtable data").
+//
+// (In some contexts the implementation might find it more natural to talk about
+// "spooled" memory instead of "unspooled". "Spooled" is just the difference between
+// "real" and "unspooled").
+//
+// Every memtable is kept in its own LSA region, which tracks exact "real" changes
+// for that memtable (the allocator is the source of truth about memory usage).
+// While the memtable is active, its "unspooled" is equal to its "real".
+//
+// Once a flush starts, the remaining "unspooled" is tracked by flush_memory_accounter.
+// As flush_reader reads a memtable and passes its data to an sstable writer,
+// it asks flush_memory_accounter to decrement "unspooled" memory counters.
+//
+// The accounting by flush_memory_accounter is not exact (for example it
+// doesn't account for the memtable tree nodes, only the contents stored there) but
+// it doesn't have to be. It is only used by flush control formulas, so it only
+// has to be accurate enough for the heuristics to work.
+//
+// When the memtable flush is finished, the amount of "unspooled" memory is corrected
+// from its inexact value to the now-exact 0.
+//
+// After a memtable flush finishes, the memtable has to be merged into the
+// cache, to update or invalidate existing cache entries, so that combined RAM data
+// stays up to date with sstables after the memtable disappears.
+//
+// During this merge, "real" decreases, and this has to be accounted for.
+// Unfortunately we can't rely on LSA's counters (which are the source of
+// truth for memory accounting) for "real", because the merge
+// requires the memtable LSA region to be merged into the cache LSA region.
+// As soon as the merge starts, all memtable data is considered a part of the cache
+// by the LSA.
+//
+// So, as with flush_memory_accounter earlier, we have real_dirty_memory_accounter
+// whose job is to incrementally decrease "real" as data is merged into the cache.
+// This might also be slightly inexact.
+//
+// As row_cache::update() progresses, it estimates the amount of processed memory
+// as well as it can, and asks real_dirty_memory_accounter to "unpin" it from "real".
+//
+// Once the merge finishes, the "real" for the merged memtable is reduced from
+// its slightly inexact value to the now-exact 0.
+//
 class real_dirty_memory_accounter {
     replica::dirty_memory_manager& _mgr;
     cache_tracker& _tracker;
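
Note for readers of this hunk: the accounting lifecycle described in the comment (estimate-based decrements during the work, then a correction to exact 0 at the end) can be modelled with a small standalone sketch. This is not Scylla code. `dirty_counters`, `estimating_accounter`, `release()`, `finish()` and `lifecycle_example()` are hypothetical names invented for illustration, and the byte counts are made up. The same toy class stands in for both flush_memory_accounter (acting on "unspooled") and real_dirty_memory_accounter (acting on "real"), on the assumption that both follow the pattern the comment describes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the manager-level counters (not dirty_memory_manager).
struct dirty_counters {
    std::size_t real = 0;       // all memory taken up by memtable data
    std::size_t unspooled = 0;  // the part of "real" not yet written to disk

    // "Spooled" is just the difference between "real" and "unspooled".
    std::size_t spooled() const {
        assert(unspooled <= real);
        return real - unspooled;
    }
};

// Hypothetical accounter: lowers one counter in estimated steps while work
// progresses, then removes whatever the estimates missed when the work is done.
class estimating_accounter {
    std::size_t& _counter;   // the shared counter being decreased
    std::size_t _remaining;  // what this memtable still contributes to it
public:
    estimating_accounter(std::size_t& counter, std::size_t exact_size)
        : _counter(counter), _remaining(exact_size) {}

    // Release an estimated amount, clamped so that one memtable never
    // releases more than it originally contributed.
    void release(std::size_t estimate) {
        const std::size_t n = std::min(estimate, _remaining);
        _remaining -= n;
        _counter -= n;
    }

    // Correct the inexact running value to the now-exact 0 for this memtable.
    void finish() {
        _counter -= _remaining;
        _remaining = 0;
    }

    std::size_t remaining() const { return _remaining; }
};

// Walks one 1000-byte memtable through flush and then cache merge.
inline dirty_counters lifecycle_example() {
    dirty_counters c;
    c.real = c.unspooled = 1000;  // active memtable: "unspooled" == "real"
    {
        estimating_accounter flush(c.unspooled, 1000);
        flush.release(400);
        flush.release(500);  // estimates miss 100 bytes (e.g. tree nodes)
        flush.finish();      // "unspooled" corrected to 0
    }
    {
        estimating_accounter merge(c.real, 1000);
        merge.release(950);  // estimate of memory already moved to the cache
        merge.finish();      // "real" corrected to 0
    }
    return c;
}
```

The clamp in `release()` is a design choice of this sketch only: it keeps an overestimate from pushing a shared counter below what other memtables contribute. Whether the real accounters clamp is not stated in the comment.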