The current implementation reclaims memory from SSTables only when a new
SSTable is created. An upcoming patch will move this reclaim logic into
the existing component reloader fiber. To support this change, the
`maybe_reclaim_components()` method is updated to reclaim memory until
the total memory consumption falls below the configured threshold.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Introduce `get_components_memory_reclaim_threshold()`, which returns
the components' memory threshold based on the total available memory.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Extract the code from
`increment_total_reclaimable_memory_and_maybe_reclaim()` that reclaims
the components memory into `maybe_reclaim_components()`. The extracted
new method will be used by a following patch to handle reclaim within
the components reload fiber.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Extract the logic that reloads reclaimed components into memory in the
`components_reloader_fiber()` method into a separate method. This is in
preparation for moving the reclaim logic into the same fiber.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Test now uses default internal part size, but for performance
comparisons its good to make it configurable.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The test picks up a file and uploads it into the bucket, then prints the
time it took and uploading speed. For now it's enough, with existing S3
latencies more timing details can be obtained by turning on trace
logging on s3 logger.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Now this test is all about reading objects. Rename some bits in it so
that they can be re-used by future uploading test as well.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Now the download test first creates a temporary object and then reads
data from it. It's good to have an option to download pre-existing file.
This option will also be used for uploading test (next patches)
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Add the --sockets NR option that limits the number of sockets the
underlying http client is configured to have.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The test spawns several fibers that read the same file in parallel.
There's not much point in it, just makes the code harder to maintain.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
In order to allow concurrent truncate table operations (for the time being,
only for a single table) we have to remove the limitation allowing only one
truncate table fiber per shard.
This change adds the ability to collect the active truncate fibers in
storage_proxy::remote into std::list<> instead of having just a single
truncate fiber. These fibers are waited for completion during
storage_proxy::remote::stop().
In the current scenario, topology_change_kind variable, was been handled using
_manage_topology_change_kind_from_group0 variable. This method was brittle
and had some bugs(e.g. for restart case, it led to a time gap between group0
server start and topology_change_kind being managed via group0)
Post _manage_topology_change_kind_from_group0 removal, careful management of
topology_change_kind variable was needed for maintaining correct
topology_change_kind in all scenarios. So this PR also performs a refactoring
to populate all init data to system tables even before group0 creation(via
raft_initialize_discovery_leader function). Now because raft_initialize_discovery_leader
happens before the group 0 creation, we write mutations directly to system
tables instead of a group 0 command. Hence, post group0 creation, the node
can read the correct values from system tables and correct values are
maintained throughout.
Added a new function initialize_done_topology_upgrade_state which takes
care of updating the correct upgrade state to system tables before starting
group0 server. This ensures that the node can read the correct values from
system tables and correct values are maintained throughout.
By moving raft_initialize_discovery_leader logic to happen before starting
group0 server, and not as group0 command post server start, we also get rid
of the potential problem of init group0 command not being the 1st command on
the server. Hence ensuring full integrity as expected by programmer.
Fixes: scylladb/scylladb#21114
Currently, the tablet repair scheduler repairs all replicas of a tablet.
It does not support hosts or DCs selection. It should be enough for most
cases. However, users might still want to limit the repair to certain
hosts or DCs in production. #21985 added the preparation work to add the
config options for the selection. This patch adds the hosts or DCs
selection support.
Fixes#22417
Previously, download_task_impl's destructor would destroy per-shard progress
elements on whatever shard the task was destroyed on. In multi-shard
environments, this caused "shared_ptr accessed on non-owner cpu" errors when
attempting to free memory allocated on a different shard.
Fix by:
- Convert progress_per_shard into a sharded service
- Stop the service on owner shards during cleanup using coroutines
- Add operator+= to stream_progress to leverage seastar's built-in adder
instead of a custom adder struct
Alternative approaches considered:
1. Using foreign_ptr: Rejected as it would require interface changes
that complicate stream delegation. foreign_ptr manages the underlying
pointee with another smart pointer but does not expose the smart
pointer instance in its APIs, making it impossible to use
shared_ptr<stream_progress> in the interface.
2. Using vector<stream_progress>: Rejected for similar interface
compatibility reasons.
This solution maintains the existing interfaces while ensuring proper
cross-shard cleanup.
Fixesscylladb/scylladb#22759
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Convert tasks::task_manager::task::impl::release_resources() to a coroutine
to prepare for upcoming changes that will implement asynchronous resource
release.
This is a preparatory refactoring that enables future coroutine-based
implementation of resource cleanup logic.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
The timeout of 10 seconds is too small for CI.
I didn't mean to make it so short, it was an accident.
Fix that by changing the timeout to 10 minutes.
Fixesscylladb/scylladb#22832Closesscylladb/scylladb#22836
Token metadata API now depend on gossiper to do ip to host id mappings,
so initialized it after the gossiper is initialized and de-initialized
it before gossiper is stopped.
Fixes: scylladb/scylladb#22743Closesscylladb/scylladb#22760
We need to allow replacing nodetool from scylla-enterprise-tools < 2024.2,
just like we did for scylla-tools < 5.5.
This is required to make packages able to upgrade from 2024.1.
Fixes#22820Closesscylladb/scylladb#22821
Serializing raft::append_request for transmission requires approximately
the same amount of memory as its size. This means when the Raft
library replicates a log item to M servers, the log item is
effectively copied M times. To prevent excessive memory usage
and potential out-of-memory issues, we limit the total memory
consumption of in-flight raft::append_request messages.
Fixes [scylladb/scylladb#14411](https://github.com/scylladb/scylladb/issues/14411)
test_complex_null_values is currently flaky, causing many failures
in CI. The reason for the failures is unclear, and a fix might not
be simple, so because UDFs are experimental, for now let's skip
this test until the corresponding issue is fixed.
Refs scylladb/scylladb#22799Closesscylladb/scylladb#22818
Now that we support suite subfolders,
As an example, this commit move mv tests into a separate folder
custom test.py lookup also works.
tests can be run as:
1. ./tools/toolchain/dbuild ./test.py --no-gather-metrics --mode=dev topology_custom/mv/tablets/test_mv_tablets_empty_ip
2. ./tools/toolchain/dbuild ./test.py --no-gather-metrics --mode=dev topology_custom/mv/tablets
3. ./tools/toolchain/dbuild ./test.py --no-gather-metrics --mode=dev topology_custom/mv
Creating an own folder used to be needed for two reasons:
- we want a separate test suite, with its own settings
- we want to structure tests, e.g. tablets, raft, schema, gossip.
We've been creating many folders recently. However, test suite
infrastructure is expensive in test.py - each suite has its own
pool of servers, concurrency settings and so on.
Make it possible to structure tests without too many suites,
by supporting subfolders within a suite.
Fixes#20570
Said method passes down its `diff` input to `mutate_internal()`, after
some std::ranges massaging. Said massaging is destructive -- it moves
items from the diff. If the output range is iterated-over multiple
times, only the first time will see the actual output, further
iterations will get an empty range.
When trace-level logging is enabled, this is exactly what happens:
`mutate_internal()` iterates over the range multiple times, first to log
its content, then to pass it down the stack. This ends up resulting in
a range with moved-from elements being pased down and consequently write
handlers being created with nullopt mutations.
Make the range re-entrant by materializing it into a vector before
passing it to `mutate_internal()`.
Fixes: scylladb/scylladb#21907Fixes: scylladb/scylladb#21714Closesscylladb/scylladb#21910
When building Scylla with ThinLTO enabled (default with Clang), the linker
spawns threads equal to the number of CPU cores during linking. This high
parallelism can cause out-of-memory (OOM) issues in CI environments,
potentially freezing the build host or triggering the OOM killer.
In this change:
1. Rename `LINK_MEM_PER_JOB` to `Scylla_RAM_PER_LINK_JOB` and make it
user-configurable
2. Add `Scylla_PARALLEL_LINK_JOBS` option to directly control concurrent
link jobs (useful for hosts with large RAM)
3. Increase the default value of `Scylla_PARALLEL_LINK_JOBS` to 16 GiB
when LTO is enabled
4. Default to 2 parallel link jobs when LTO is enabled if the calculated
number if less than 2 for faster build.
Notes:
- Host memory is shared across job pools, so pool separation alone doesn't help
- Ninja lacks per-job memory quota support
- Only affects link parallelism in LTO-enabled builds
See
https://clang.llvm.org/docs/ThinLTO.html#controlling-backend-parallelismFixesscylladb/scylladb#22275
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#22383
Oversized materialized view and index names are rejected;
Materialized view names with invalid symbols are rejected.
fixes: #20755Closesscylladb/scylladb#21746
The Intel Optimizaton Manual states that branches with relative offsets
greater than 2GB suffer a penalty. They cite a 6% improvement when this
is avoided. Our code doesn't rely heavily on dynamically linked
libraries, so I don't expect a similar win, but it's still better to do
it than not.
Eliminate long branches by asking the dynamic linker to restrict itself
to the lower 4GB of the address space. I saw that it maps libraries
at 1GB+ addresses, so this satisfies the limitation.
Fix is from the Intel Optimization Manual as well.
This change was ported from ScyllaDB Enterprise.
Closesscylladb/scylladb#22498
This command exists but is not registered. There is a test for it, but
it happens to work only because scylla table is a prefix of scylla
tables (another command), so gdb invokes that other command instead.
Replace boost::remove_if() with the standard library's std::erase_if() or std::ranges::remove_if() to reduce external dependencies and simplify the codebase. This change eliminates the requirement for boost::range and makes the implementation more maintainable.
---
it's a cleanup, hence no need to backport.
Closesscylladb/scylladb#22788
* github.com:scylladb/scylladb:
service: migrate from boost::range::remove_if() to std::ranges::remove_if
sstable: migrate from boost::remove_if() to std::erase_if()
Replace boost::copy() with the standard library's std::ranges::copy()
to reduce external dependencies and simplify the codebase. This change
eliminates the requirement for boost::range and makes the implementation
more maintainable.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#22789
Set true to wait for the repair to complete. Set false to skip waiting
for the repair to complete. When the option is not provided, it defaults
to false.
It is useful for management tool that wants the api to be async.
Fixes#22418Closesscylladb/scylladb#22436
prepare helpfully prints out the path where optimized clang is stored,
but a couple of typos mean it prints out an empty string. Fix that.
Closesscylladb/scylladb#22714
Developers are expected to run new cqlpy tests against Cassandra - to
verify that the new test itself is correct. Usually there is no need
to run the entire cqlpy test suite against Cassandra, but when users do
this, it isn't confidence-inspiring to see hundreds of tests failing.
In this patch I fix many but not all of these failures.
Refs #11690 (which will remain open until we fix all the failures on
Cassandra)
* Fixed the "compact_storage" fixture recently introduced to enable the
deprecated feature in Scylla for the tests. This fixture was broken on
Cassandra and caused all compact-storage related tests to fail
on Cassandra.
* Marked all tests in test_tombstone_limit.py as scylla_only - as they
check the Scylla-only query_tombstone_page_limit configuration option.
* Marked all tests in test_service_level_api.py as scylla_only - as they
check the Scylla-only service levels feature.
* Marked a test specific to the Scylla-only IncrementalCompactionStrategy
as scylla_only. Some tests mix STCS and ICS testing in one test - this
is a mistake and isn't fixed in this patch.
* Various tests in test_tablets.py forgot to use skip_without_tablets
to skip them on Cassandra or older Scylla that doesn't have the
tablets feature.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
x
Closesscylladb/scylladb#22561