tablet_repair_task_impl is run as a part of tablet repair. Make it
a child of tablet repair virtual task.
tablet_repair_task_impl started by /storage_service/repair_async API
(vnode repair) does not have a parent, as it is the top-level task
in that case.
Repair service is started after storage service, while storage service needs to reference repair one for its needs. Recently it was noticed, that this reverse order may cause troubles and was fixed with the help of an extra gate. That's not nice and makes the start-stop mess even worse. The correct fix is to fix the order both services start/stop in.
Closesscylladb/scylladb#22368
* github.com:scylladb/scylladb:
Revert "repair: add repair_service gate"
main: Start repair before storage service
repair: Check for sharded<view-builder> when constructing row_level_repair
Currently, data sync repair handles most no_such_keyspace exceptions,
but it omits the preparation phase, where the exception could be thrown
during make_global_effective_replication_map.
Skip the keyspace repair if no_such_keyspace is thrown during preparations.
Fixes: #22073.
Requires backport to 6.1 and 6.2 as they contain the bug
Closesscylladb/scylladb#22473
* github.com:scylladb/scylladb:
test: add test to check if repair handles no_such_keyspace
repair: handle keyspace dropped
The flush api could not detect if the node is down and fail the flush
before the timeout. This patch detects if there is down node and skip
the flush if so, since the flush will fail after the timeout in this
case anyway.
The slowness due to the flush timeout in
compaction_test.py::TestCompaction::test_delete_tombstone_gc_node_down
is fixed with this patch.
Fixes#22413Closesscylladb/scylladb#22445
Currently, data sync repair handles most no_such_keyspace exceptions,
but it omits the preparation phase, where the exception could be thrown
during make_global_effective_replication_map.
Skip the keyspace repair if no_such_keyspace is thrown during preparations.
This reverts commit 32ab58cdea.
Now repair service starts before and stops after storage server, so the
problem described in the commit is no longer relevant.
Currently initialization order of repair and view-builder is not
correct, so there are several places in repair code that check for v.b.
to be initialized before doing anything. There's one more place that
needs that care -- the construction of row_level_repair object.
The class instantiates helper objects that reply on view_builder to be
fully initialized and is itself created by many other task types from
repair code.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The repair_time in system.tablets will be updated when repair runs
successfully. We can now use it to update the repair time for tombstone
gc, i.e, when the system.tablets.repair_time is propagated, call
gc_state.update_repair_time() on the node that is the owner of the
tablet.
Since b3b3e880d3 ("repair: Reduce hints and batchlog flush"), the
repair time that could be used for tombstone gc might be smaller than
when the repair is started, so the actual repair time for tombstone gc
is returned by the repair rpc call from the repair master node.
Fixes#17507
New feature. No backport is needed.
Closesscylladb/scylladb#21896
* github.com:scylladb/scylladb:
repair: Stop using rpc to update repair time for repairs scheduled by scheduler
repair: Wire repair_time in system.tablets for tombstone gc
test: Disable flush_cache_time for two tablet repair tests
test: Introduce guarantee_repair_time_next_second helper
repair: Return repair time for repair_service::repair_tablet
service: Add tablet_operation.hh
If a tablet repair is scheduled by tablet repair scheduler, the repair
time for tombstone gc will be updated when the system.tablet.repair_time
is updated. Skip updating using rpc calls in this case.
The repair time returned by repair_service::repair_tablet considers the
hints and batchlog flush time, so it could be used for the tombstone gc
purpose.
One of its caller is in the RESTful API which gets ips from the user, so
we convert ips to ids inside the API handler using gossiper before
calling the function. We need to deprecate ip based API and move to host
id based.
In main.cc storage_service is started before and stopped after
repair_service. storage_service keeps a reference to sharded
repair_service and calls its methods, but nothing ensures that
repair_service's local instance would be alive for the whole
execution of the method.
Add a gate to repair_service and enter it in storage_service
before executing methods on local instances of repair_service.
Fixes: #21964.
Closesscylladb/scylladb#22145
this unused include was identifier by clang-include-cleaner. after
auditing task_manager_module.hh, the report has been confirmed.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#22200
Currently task_manager_module::is_aborted checks the tasks local
to caller's shard on a given shard.
Fix the method to check the task map local to the given shard.
Fixes: #22156.
Closesscylladb/scylladb#22161
now that we are allowed to use C++23. we now have the luxury of using
`std::ranges::to`.
in this change, we:
- replace `boost::copy_range` to `std::ranges::to`
- remove unused `#include` of boost headers
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#21880
Similar to 9ace191616 (repair: Enable
small table optimization for RBNO bootstrap and decommission), this
patch enables small table optimization for RBNO rebuild.
This is useful for rebuild ops which is used for building an empty DC.
Fixes: #21951Closesscylladb/scylladb#21952
these unused includes are identified by clang-include-cleaner. after
auditing the source files, all of the reports have been confirmed.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#21837
"
The series moves node ops, repair and streaming verbs to IDL. Also
contains IDL related cleanups.
In addition to the CI tested manually by bootstrapping a node with the
series into a cluster of old nodes with repair and streaming both in
gossiper and raft mode. This exercises repair, streaming and node_ops
paths.
"
* 'gleb/move-more-rpcs-to-idl-v3' of github.com:scylladb/scylla-dev:
repair: repair_flush_hints_batchlog_request::target_nodes is not used any more, so mark it as such
streaming: move streaming verbs to IDL
messaging_service: move repair verbs to IDL
node_ops: move node_ops_cmd to IDL
idl: rename partition_checksum.dist.hh to repair.dist.hh
idl: move node_ops related stuff from the repair related IDL
This change goes thru locator:topology to use node&
instead of node* where nullptr is not possible. There are
places where the node object is used in unordered_set, in
those cases the node is wrapped in std::reference_wrapper.
Fixesscylladb/scylladb#20357Closesscylladb/scylladb#21863
"
This rather large patch series moves storage proxy and some adjacent
services (like migration manager) to use host ids to identify nodes rather
than ips. Messaging service gains a capability to address nodes by host
ids (which allows dropping translations from topology coordinator code
that worked on host ids already) and also makes sure that a node with
incorrect host id will reject a message (can happen during address
changes).
The series gets rid of the raft address map completely and replaces it with
the gossiper address map which is managed by the gossiper since translation
is now done in the layer below raft.
Fixes: scylladb/scylladb#6403
perf-simple-query -- smp 1 -m 1G output
Before:
enable-cache=1
Running test with config: {partitions=10000, concurrency=100, mode=read, frontend=cql, query_single_key=no, counters=no}
Disabling auto compaction
Creating 10000 partitions...
64336.82 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 41291 insns/op, 24485 cycles/op, 0 errors)
62669.58 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 41277 insns/op, 24695 cycles/op, 0 errors)
69172.12 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 41326 insns/op, 24463 cycles/op, 0 errors)
56706.60 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 41143 insns/op, 24513 cycles/op, 0 errors)
56416.65 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 41186 insns/op, 24851 cycles/op, 0 errors)
throughput: mean=61860.35 standard-deviation=5395.48 median=62669.58 median-absolute-deviation=5153.75 maximum=69172.12 minimum=56416.65
instructions_per_op: mean=41244.62 standard-deviation=76.90 median=41276.94 median-absolute-deviation=58.55 maximum=41326.19 minimum=41142.80
cpu_cycles_per_op: mean=24601.35 standard-deviation=167.39 median=24512.64 median-absolute-deviation=116.65 maximum=24851.45 minimum=24462.70
After:
enable-cache=1
Running test with config: {partitions=10000, concurrency=100, mode=read, frontend=cql, query_single_key=no, counters=no}
Disabling auto compaction
Creating 10000 partitions...
65237.35 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 40733 insns/op, 23145 cycles/op, 0 errors)
59283.09 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 40624 insns/op, 23948 cycles/op, 0 errors)
70851.03 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 40625 insns/op, 23027 cycles/op, 0 errors)
70549.61 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 40650 insns/op, 23266 cycles/op, 0 errors)
68634.96 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 40622 insns/op, 22935 cycles/op, 0 errors)
throughput: mean=66911.21 standard-deviation=4814.60 median=68634.96 median-absolute-deviation=3638.40 maximum=70851.03 minimum=59283.09
instructions_per_op: mean=40650.89 standard-deviation=47.55 median=40624.60 median-absolute-deviation=27.11 maximum=40733.37 minimum=40622.33
cpu_cycles_per_op: mean=23264.16 standard-deviation=402.12 median=23145.29 median-absolute-deviation=237.63 maximum=23947.96 minimum=22934.59
CI: https://jenkins.scylladb.com/job/scylla-master/job/scylla-ci/13531/
SCT (longevity-100gb-4h with nemesis_selector: ['topology_changes']): https://jenkins.scylladb.com/view/staging/job/scylla-staging/job/gleb/job/move-to-host-id/3/
Tested mixed cluster manually.
"
* 'gleb/move-to-host-id-v2' of github.com:scylladb/scylla-dev: (55 commits)
group0: drop unused field from replace_info struct
test: rename raft_address_map_test to address_map_test and move if from raft tests
raft_address_map: remove raft address map
topology coordinator: do not modify expire state for left/new nodes any more in raft address map
topology coordinator: drop expiring entries in gossiper address map on error injections since raft one is no longer used
group0: drop raft address map dependency from raft_rpc
group0: move raft_ticker_type definition from raft_address_map.hh
storage_service: do not update raft address map on gossiper events
group0: drop raft address map dependency from raft_server_with_timeouts
group0: move group0 upgrade code to host ids
repair: drop raft address map dependency
group0: remove unused raft address map getter from raft_group0
group0: drop raft address map from group0_state_machine dependency since it is not used there any more
group0: remove dependency on raft address map from group0_state_id_handler
gossiper: add get_application_state_ptr that searches by host_id
gossiper: change get_live_token_owners to return host ids
view: move view building to host id
hints: use host id to send hints
storage_proxy: remove id_vector_to_addr since it is no longer used
db: consistency_level: change is_sufficient_live_nodes to work on host ids
...
now that we are allowed to use C++23. we now have the luxury of using
`std::views::transform`.
in this change, we:
- replace `boost::adaptors::transformed` with `std::views::transform`
- use `fmt::join()` when appropriate where `boost::algorithm::join()`
is not applicable to a range view returned by `std::view::transform`.
- use `std::ranges::fold_left()` to accumulate the range returned by
`std::view::transform`
- use `std::ranges::fold_left()` to get the maximum element in the
range returned by `std::view::transform`
- use `std::ranges::min()` to get the minimal element in the range
returned by `std::view::transform`
- use `std::ranges::equal()` to compare the range views returned
by `std::view::transform`
- remove unused `#include <boost/range/adaptor/transformed.hpp>`
- use `std::ranges::subrange()` instead of `boost::make_iterator_range()`,
to feed `std::views::transform()` a view range.
to reduce the dependency to boost for better maintainability, and
leverage standard library features for better long-term support.
this change is part of our ongoing effort to modernize our codebase
and reduce external dependencies where possible.
limitations:
there are still a couple places where we are still using
`boost::adaptors::transformed` due to the lack of a C++23 alternative
for `boost::join()` and `boost::adaptors::uniqued`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#21700
these unused includes are identified by clang-include-cleaner. after
auditing the source files, all of the reports have been confirmed.
please note, because `mutation/mutation.hh` does not include
`seastar/coroutine/maybe_yield.hh` anymore, and quite a few source
files were relying on this header to bring in the declaration of
`maybe_yield()`, we have to include this header in the places where
this symbol is used. the same applies to `seastar/core/when_all.hh`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
When users start an operation asynchronously with API, they are expected to check the operation's status. Hence, the status should be kept in task manager for reasonable time after the operation is done. The operations that are started internally usually don't need to stay in task manager for that long.
Add api_task_ttl that will be used for tasks started with API. By default it's 1 hour. The time for which non-API tasks stay in task manager isn't changed.
Fixes: #21499.
Refs: #21425.
No backport needed - previous versions may use task_ttl
Closesscylladb/scylladb#21505
* github.com:scylladb/scylladb:
test: add test to check user_task_ttl
tasks: api: move make_task method
docs: nodetool: update backup and restore commands docs
docs: update task manager docs
nodetool: add nodetool tasks user-ttl command
node_ops: use user task ttl for node ops virtual task
tasks: use user_task_ttl for tasks started by user
api: task_manager: add /task_manager/user_ttl to get and set user task ttl
tasks: add task_manager::task::is_user_task method
tasks: keep updateable_value of task_ttl in task manager
db: config: add user_task_ttl_seconds named value
Modernize the codebase by replacing Boost range adaptors with C++23 standard library views,
reducing external dependencies and leveraging modern C++ language features.
Key Changes:
- Replace `boost::adaptors::filtered` with `std::views::filter`
- Remove `#include <boost/range/adaptor/filtered.hpp>`
- Utilize standard library range views
Motivation:
- Reduce project's external dependency footprint
- Leverage standard library's range and view capabilities
- Improve long-term code maintainability
- Align with modern C++ best practices
Implementation Challenges and Considerations:
1. Range Conversion and Move Semantics
- `std::ranges::to` adaptor requires rvalue references
- Necessitated updates to variable and parameter constness
- Example: `cql3/restrictions/statement_restrictions.cc` modified to remove `const`
from `common` to enable efficient range conversion
2. Range Iteration and Mutation
- Range views may mutate internal state during iteration
- Cannot pass ranges by const reference in some scenarios
- Solution: Pass ranges by rvalue reference to explicitly indicate
state invalidation
Limitations:
- One instance of `boost::adaptors::filtered` temporarily preserved
due to lack of a C++23 alternative for `boost::join()`
- A comprehensive replacement will be addressed in a follow-up change
This change is part of our ongoing effort to modernize the codebase,
reducing external dependencies and adopting modern C++ practices.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#21648
The non local strategy system keyspaces usually contain very litte data.
All the tables within them have to be repaired for all the token ranges,
which could be large in clusters with a large number of nodes. In multiple
DC setup, the repair in RBNO is dominated by the network latency. As a
result, it takes a long time to repair those tables even if they are
almost empty.
To speed up the RBNO bootstrap, especially for starting empty clusters,
this patch enables small table optimization for RBNO for system tables.
We could enable it for small user tables as a follow up.
Tests:
1) A 5ms latency is added to simulate cross dc network delay, 256 tokens
per node, 10 nodes:
- Before
topology_custom dev topology_custom.test_boot_time.1 1287.06s
- After
topology_custom dev topology_custom.test_boot_time.1 12.48s
The test shows 100X boot time improvement
2) A SCT test to bootstrap 3 DCs, 3 nodes in each DC.
- Before
Time to bootstrap = 1h23m
- After
Time to bootstrap = 13m
The test shows 6X bootstrap time improvement
Fixes#19131
This adds a new tablet migration kind: repair. It allows tablet repair
scheduler to use this migration kind to schedule repair jobs.
The current repair scheduler implementation does the following:
- A tablet is picked to be repaired when is requested by user
- The tablet repair can be scheduled along with tablet migration and
rebuild. It runs in the tablet_migration track.
- Repair jobs are scheduled in a smart way so that at any point in time,
there are no more than configured jobs per shard, which is similar to
scylla manager's control.
New feature. No backport is needed.
Closesscylladb/scylladb#21088
* github.com:scylladb/scylladb:
test: Add tests for tablet repair scheduler
repair: Add restful API for tablet repair
repair: Add tablet repair scheduler internal API support
docs: Update system_keyspace.md for tablet repair related info
docs: Add docs for tablet repair migration
repair: Add core tablet repair scheduler support
messaging_service: Introduce TABLET_REPAIR verb
tablet_allocator: Introduce stream_weight for tablet_migration_streaming_info
network_topology_strategy: Preserve fields of task_info in reallocate_tablets
Currently, task_manager_module::abort_all_repairs marks top-level repairs as aborted (but does not abort them) and aborts all existing shard tasks.
A running repair checks whether its id isn't contained in _aborted_pending_repairs and then proceeds to create shard tasks. If abort_all_repairs is executed after _aborted_pending_repairs is checked but before shard tasks are created, then those new tasks won't be aborted. The issue is the most severe for tablet_repair_task_impl that checks the _aborted_pending_repairs content from different shards, that do not see the top-level task. Hence the repair isn't stopped but it creates shard repair tasks on all shards but the one that initialized repair.
Abort top-level tasks in abort_all_repairs. Fix the shard on which the task abort is checked.
Fixes: #21612.
Needs backport to 6.1 and 6.2 as they contain the bug.
Closesscylladb/scylladb#21616
* github.com:scylladb/scylladb:
test: add test to check if repair is properly aborted
repair: add shard param to task_manager_module::is_aborted
repair: use task abort source to abort repair
repair: drop _aborted_pending_repairs and utilize tasks abort mechanism
repair: fix task_manager_module::abort_all_repairs
This adds a new tablet migration kind: repair. It allows tablet repair
scheduler to use this migration kind to schedule repair jobs.
The current repair scheduler implementation does the following:
- A tablet is picked to be repaired when the time since last repair is
bigger than a threshold (auto repair mode) or it is requested by user
(manual repair mode)
- The tablet repair can be scheduled along with tablet migration and
rebuild. It runs in the tablet_migration track.
- Repair jobs are scheduled in a smart way so that at any point in time,
there are no more than configured jobs per shard, which is similar to
scylla manager's control.
In this patch, both the manual repair and the auto repair are not
enabled yet.
Currently, task_manager_module::is_aborted checks whether a task
with given id was aborted on this shard.
In tablet_repair_task_impl::run, is_aborted method is called on all
shards to check if the parent task was aborted. However, even
for aborted parent, is_aborted will return true only on owner shard
of the parent.
Pass shard param to task_manager_module::is_aborted that indicates
which shard to check.
Currently, task_manager_module::abort_all_repairs marks top-level
repairs as aborted (but does not abort them) and aborts all shard
tasks. If after that a top-level repair creates a shard task,
the new shard repair won't be aborted.
Abort top-level repair tasks in abort_all_repairs. They will abort
their children and newly created shard tasks will be immediately
aborted.
tablet_repair_task_impl keeps a vector of tablet_repair_task_meta,
each of which keeps an effective_replication_map_ptr. So, after
the task completes, the token metadata version will not change for
task_ttl seconds.
Implement tablet_repair_task_impl::release_resources method that clears
tablet_repair_task_meta vector when the task finishes.
Set task_ttl to 1h in test_tablet_repair to check whether the test
won't time out.
Fixes: #21503.
Closesscylladb/scylladb#21504