scylla

Author	SHA1	Message	Date
Avi Kivity	d237d0a4ea	Update seastar submodule * seastar 71036ebcc0...5b95d1d798 (3): > rpc stream: do not abort stream queue if stream connection was closed without error > resource: fallback to sysconf when failed to detect memory size from hwloc > Merge 'scheduling_group: improve scheduling group creation exception safety' from Michael Litvak scylla-gdb.py adjusted for scheduling_group_specific data structure changes in Seastar. As part of that, a gratuitous dereference of std::unique_ptr, which fails for std::unique_ptr<void*, ...>, was removed.	2025-02-03 00:10:38 +02:00
Botond Dénes	b70dccb638	sstables: disk_types: disk_set_of_tagged_union: boost::variant -> std::variant In the spirit of using standard-library types, instead of boost ones where possible. Although a disk type, it is serialized/deserialized with custom code, so the change shouldn't cause any changes in the disk representation.	2025-01-27 09:29:26 -05:00
Botond Dénes	a095b3bb80	scylla-gdb.py: std_variant: fix get() It calls self.get_with_type() with one too many params.	2025-01-27 09:29:26 -05:00
Piotr Dulikowski	7383013f43	replica/database: add reader concurrency semaphore groups Replace the reader concurrency semaphores for user reads and view updates with the newly introduced reader concurrency semaphore group, which assigns a semaphore for each service level. Each group is statically assigned to some pool of memory on startup and dynamically distributes this memory between the semaphores, relative to the number of shares of the corresponding scheduling group. The intent of having a separate reader concurrency semaphore for each scheduling group is to prevent priority inversion issues due to reads with different priorities waiting on the same semaphore, as well as make memory allocation more fair between service levels due to the adjusted number of shares.	2025-01-02 07:13:34 +01:00
Botond Dénes	e942c074f2	compaction/compaction_manager: make _tasks an intrusive list _tasks is currently std::list<shared_ptr<compaction_task_executor>>, but it has no role in keeping the instances alive; this is done by the fibers which create the task (and pin a shared ptr instance). This lends itself to an intrusive list, avoiding that extra allocation upon push_back(). Using an intrusive list also makes it simpler and much cheaper (O(1) vs. O(N)) to remove tasks from the _tasks list. This will be made use of in the next patch. Code using _tasks has to be updated because the value_type changes from shared_ptr<compaction_task_executor> to compaction_task_executor&.	2024-11-03 10:17:11 +02:00
Avi Kivity	b9df3aec12	gdb: avoid @classmethod/@property combinations The @classmethod/@property combination was deprecated in Python 3.11 and removed[1] in Python 3.13. It's used in scylla-gdb.py, breaking it with Python 3.13. To fix, just make all users (size_t and _vptr_type) top-level functions. The definitions are all identical and don't need to be in class scope. [1] https://docs.python.org/3.13/library/functions.html#classmethod Closes scylladb/scylladb#21349	2024-10-29 19:37:07 +02:00
Wojciech Mitros	242079d70b	mv: add a dedicated read concurrency semaphore for view update read before writes When writing to some tables with materialized views, we need to read from the base table first to perform a delete of the old view row. When doing so, the memory used for the read is tracked by the user read concurrency semaphore. When we have a large number of such reads, we may use up all of the semaphore units, causing the following reads to be queued. When we have some user reads coming at the same time, these reads can have very high latency due to the write workload on the base table. We want to avoid this, so that the write workload doesn't have a high impact on the latency of the read workload. This is fixed in this patch by adding a separate read concurrency semaphore just for view update read-before-writes. With the new semaphore, even if there are many view update read-before-writes, they will be queued on a different semaphore than the user reads, and they won't impact their latency. The second issue fixed by this patch is the concurrency of the view updates, which is currently unlimited. Because of that, view updates may take up so much memory that we may run out of memory. This is fixed by using the read admission on the view update concurrency semaphore. This limits the number of concurrent view update reads to max_count_concurrent_view_update_reads; all other incoming view update reads are queued using just a small chunk of memory. Without this, the reads would also get queued after exceeding view_update_reader_concurrency_semaphore_serialize_limit_multiplier, but they would take much more memory while staying in the queue. 
The new semaphore has half the capacity of the regular user read concurrency semaphore and is currently used only for user writes - it's used independently of the scheduling group on which we base the read semaphore selection, but we use a different code path for streaming (not database::do_apply) and we shouldn't have view updates in system writes or during compaction. Fixes https://github.com/scylladb/scylladb/issues/8873 Fixes https://github.com/scylladb/scylladb/issues/15805	2024-10-21 11:02:06 +02:00
Botond Dénes	38088daa1f	scylla-gdb.py: drop compatibility code for EOL releases Any release < 6.0 or < 2023.1 is EOL and need not be supported by scylla-gdb.py anymore. Remove compatibility code for these releases. Closes scylladb/scylladb#20918	2024-10-03 15:42:08 +03:00
Tomasz Grabiec	8e047e8fff	gdb: Add std::set wrapper Allows accessing std::set fields from gdb, e.g.: (gdb) python for e in std_set(_promoted_index._blocks): print(e) Closes scylladb/scylladb#20650	2024-09-20 08:24:15 +03:00
Tomasz Grabiec	e70ce4d6ed	gdb: Introduce "scylla sstable-dump-cached-index" command	2024-09-17 14:41:18 +02:00
Tomasz Grabiec	9f0eed263d	gdb: Introduce "scylla sstable-promoted-index" command	2024-09-17 14:41:13 +02:00
Tomasz Grabiec	2c463ead59	gdb: Fix range printer for singular ranges Before, it printed [x, +inf) instead of {x}	2024-09-17 14:30:28 +02:00
Kefu Chai	7dd63c891f	scylla-gdb.py: lazy-evaluate the constants instead of evaluating the constants in-class, accessing them via a cached class property. it would be handy if we could source `scylla-gdb.py` in `.gdbinit`, but this script accesses some symbols which are not available without a file being debugged. so gdb fails to load the init script: ``` Traceback (most recent call last): File "/home/kefu/dev/scylladb/scylla-gdb.py", line 167, in <module> class intrusive_slist: File "/home/kefu/dev/scylladb/scylla-gdb.py", line 168, in intrusive_slist size_t = gdb.lookup_type('size_t') ^^^^^^^^^^^^^^^^^^^^^^^^^ gdb.error: No type named size_t. ``` so we have to `file path/to/scylla` and then `source scylla-gdb.py` every time we debug scylla or a seastar application, instead of loading `scylla-gdb.py` in `.gdbinit`. the reason is that the script accesses debug symbols, like `gdb.lookup_type('size_t')`, in-class. so when the python interpreter reads the script, it evaluates this statement, but at that moment, the debug symbols are not loaded, so `source scylla-gdb.py` fails in `.gdbinit`. in this change, we transform all these class variables to cached properties, so that they * are evaluated on-demand * are evaluated only once at most this addresses the pain at the expense of verbosity. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-08-30 17:05:29 +08:00
Kefu Chai	5cffb23aa3	scylla-gdb.py: use chunked_fifo to represent _sink._pending_io we switched from `circular_buffer` to `chunked_fifo` to represent `io_sink::_pending_io` in the latest seastar. to be prepared for this change, let's * add a `chunked_fifo` class in `scylla-gdb.py`. * use `circular_buffer` as a fallback of `chunked_fifo`. instead of doing this the other way around, we try to send the message that the latest seastar uses `chunked_fifo`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#20280	2024-08-27 08:44:56 +03:00
Botond Dénes	53a6ec05ed	Merge 'replica: remove rwlock for protecting iteration over storage group map' from Raphael "Raph" Carvalho rwlock was added to protect iterations against concurrent updates to the map. the updates can happen when allocating a new tablet replica or removing an old one (tablet cleanup). the rwlock is very problematic because it can result in topology changes blocked, as updating token metadata takes the exclusive lock, which is serialized with table wide ops like split / major / explicit flush (and those can take a long time). to get rid of the lock, we can copy the storage group map and guard individual groups with a gate (not a problem since map is expected to have a maximum of ~100 elements). so cleanup can close that gate (carefully closed after stopping individual groups such that migrations aren't blocked by long-running ops like major), and ongoing iterations (e.g. triggered by nodetool flush) can skip a group that was closed, as such a group is being migrated out. Fixes #18821. 
``` WRITE ===== ./build/release/scylla perf-simple-query --smp 1 --memory 2G --initial-tablets 10 --tablets --write - BEFORE 65559.52 tps ( 59.6 allocs/op, 16.4 logallocs/op, 14.3 tasks/op, 52841 insns/op, 30946 cycles/op, 0 errors) 67408.05 tps ( 59.3 allocs/op, 16.0 logallocs/op, 14.3 tasks/op, 53018 insns/op, 30874 cycles/op, 0 errors) 67714.72 tps ( 59.3 allocs/op, 16.0 logallocs/op, 14.3 tasks/op, 53026 insns/op, 30881 cycles/op, 0 errors) 67825.57 tps ( 59.3 allocs/op, 16.0 logallocs/op, 14.3 tasks/op, 53015 insns/op, 30821 cycles/op, 0 errors) 67810.74 tps ( 59.3 allocs/op, 16.0 logallocs/op, 14.3 tasks/op, 53009 insns/op, 30828 cycles/op, 0 errors) throughput: mean=67263.72 standard-deviation=967.40 median=67714.72 median-absolute-deviation=547.02 maximum=67825.57 minimum=65559.52 instructions_per_op: mean=52981.61 standard-deviation=79.09 median=53014.96 median-absolute-deviation=36.54 maximum=53025.79 minimum=52840.56 cpu_cycles_per_op: mean=30869.90 standard-deviation=50.23 median=30874.06 median-absolute-deviation=42.11 maximum=30945.94 minimum=30820.89 - AFTER 65448.76 tps ( 59.5 allocs/op, 16.4 logallocs/op, 14.3 tasks/op, 52788 insns/op, 31013 cycles/op, 0 errors) 67290.83 tps ( 59.3 allocs/op, 16.0 logallocs/op, 14.3 tasks/op, 53025 insns/op, 30950 cycles/op, 0 errors) 67646.81 tps ( 59.3 allocs/op, 16.0 logallocs/op, 14.3 tasks/op, 53025 insns/op, 30909 cycles/op, 0 errors) 67565.90 tps ( 59.3 allocs/op, 16.0 logallocs/op, 14.3 tasks/op, 53058 insns/op, 30951 cycles/op, 0 errors) 67537.32 tps ( 59.3 allocs/op, 16.0 logallocs/op, 14.3 tasks/op, 52983 insns/op, 30963 cycles/op, 0 errors) throughput: mean=67097.93 standard-deviation=931.44 median=67537.32 median-absolute-deviation=467.97 maximum=67646.81 minimum=65448.76 instructions_per_op: mean=52975.85 standard-deviation=108.07 median=53024.55 median-absolute-deviation=49.45 maximum=53057.99 minimum=52788.49 cpu_cycles_per_op: mean=30957.17 standard-deviation=37.43 median=30951.31 
median-absolute-deviation=7.51 maximum=31013.01 minimum=30908.62 READ ===== ./build/release/scylla perf-simple-query --smp 1 --memory 2G --initial-tablets 10 --tablets - BEFORE 79423.36 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 41840 insns/op, 26820 cycles/op, 0 errors) 81076.70 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 41837 insns/op, 26583 cycles/op, 0 errors) 80927.36 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 41829 insns/op, 26629 cycles/op, 0 errors) 80539.44 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 41841 insns/op, 26735 cycles/op, 0 errors) 80793.10 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 41864 insns/op, 26662 cycles/op, 0 errors) throughput: mean=80551.99 standard-deviation=661.12 median=80793.10 median-absolute-deviation=375.37 maximum=81076.70 minimum=79423.36 instructions_per_op: mean=41842.20 standard-deviation=13.26 median=41840.14 median-absolute-deviation=5.68 maximum=41864.50 minimum=41829.29 cpu_cycles_per_op: mean=26685.88 standard-deviation=93.31 median=26662.18 median-absolute-deviation=56.47 maximum=26820.08 minimum=26582.68 - AFTER 79464.70 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 41799 insns/op, 26761 cycles/op, 0 errors) 80954.58 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 41803 insns/op, 26605 cycles/op, 0 errors) 81160.90 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 41811 insns/op, 26555 cycles/op, 0 errors) 81263.10 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 41814 insns/op, 26527 cycles/op, 0 errors) 81162.97 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 41806 insns/op, 26549 cycles/op, 0 errors) throughput: mean=80801.25 standard-deviation=755.54 median=81160.90 median-absolute-deviation=361.72 maximum=81263.10 minimum=79464.70 instructions_per_op: mean=41806.47 standard-deviation=5.85 median=41806.05 median-absolute-deviation=4.05 maximum=41813.86 minimum=41799.36 cpu_cycles_per_op: mean=26599.22 
standard-deviation=94.84 median=26554.54 median-absolute-deviation=50.51 maximum=26761.06 minimum=26527.05 ``` Closes scylladb/scylladb#19469 * github.com:scylladb/scylladb: replica: remove rwlock for protecting iteration over storage group map replica: get rid of fragile compaction group intrusive list	2024-07-12 15:45:36 +03:00
Michał Chojnowski	fdd8b03d4b	scylla-gdb.py: add $coro_frame() Adds a convenience function for inspecting the coroutine frame of a given seastar task. Short example of extracting a coroutine argument: ``` (gdb) p $coro_frame(seastar::local_engine->_current_task) $1 = { __resume_fn = 0x2485f80 <sstables::parse(schema const&, sstables::sstable_version_types, sstables::random_access_reader&, sstables::statistics&)>, ... PointerType_7 = 0x601008e67880, ... __coro_index = 0 '\000' ... (gdb) p $downcast_vptr($->PointerType_7) $2 = (schema *) 0x601008e67880 ``` Closes scylladb/scylladb#19479	2024-07-10 21:46:27 +03:00
Raphael S. Carvalho	c539b7c861	replica: remove rwlock for protecting iteration over storage group map rwlock was added to protect iterations against concurrent updates to the map. the updates can happen when allocating a new tablet replica or removing an old one (tablet cleanup). the rwlock is very problematic because it can result in topology changes blocked, as updating token metadata takes the exclusive lock, which is serialized with table wide ops like split / major / explicit flush (and those can take a long time). to get rid of the lock, we can copy the storage group map and guard individual groups with a gate (not a problem since map is expected to have a maximum of ~100 elements). so cleanup can close that gate (carefully closed after stopping individual groups such that migrations aren't blocked by long-running ops like major), and ongoing iterations (e.g. triggered by nodetool flush) can skip a group that was closed, as such a group is being migrated out. Check documentation added to compaction_group.hh to understand how concurrent iterations and updates to the map work without the rwlock. Yielding variants that iterate over groups are no longer returning group id since id stability can no longer be guaranteed without serializing split finalization and iteration. Fixes #18821. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2024-07-09 16:59:24 -03:00
Botond Dénes	9544c364be	scylla-gdb.py: introduce scylla large-objects The equivalent of small-objects, but for large objects (spans). Allows listing the objects of a large-class, and therefore investigating a run-away class, by attempting to identify the owners of the objects in it. Written to investigate #16493 Closes scylladb/scylladb#16711	2024-07-09 10:21:09 +03:00
Michał Chojnowski	c7dc3b9b58	scylla-gdb.py: add line information to coroutine names in `scylla fiber` For convenience. Note that this line info only points to the function as a whole, not to the current suspend point. I think there's no facility for converting the `__coro_index` to the current suspend point automatically. Before: ``` (gdb) scylla fiber seastar::local_engine->_current_task [shard 1] #0 (task) 0x0000601008e8e970 0x000000000047aae0 vtable for seastar::internal::coroutine_traits_base<void>::promise_type + 16 (.resume is seastar::future<void> sstables::parse<unsigned int, std::pair<sstables::metadata_type, unsigned int> >(schema const&, sstables::sstable_version_types, sstables::random_access_reader&, sstables::disk_array<unsigned int, std::pair<sstables::metadata_type, unsigned int> >&) [clone .resume] ) [shard 1] #1 (task) 0x00006010092acf10 0x000000000047aae0 vtable for seastar::internal::coroutine_traits_base<void>::promise_type + 16 (.resume is sstables::parse(schema const&, sstables::sstable_version_types, sstables::random_access_reader&, sstables::statistics&) [clone .resume] ) [shard 1] #2 (task) 0x0000601008e648d0 0x000000000047aae0 vtable for seastar::internal::coroutine_traits_base<void>::promise_type + 16 (.resume is sstables::sstable::read_simple<(sstables::component_type)8, sstables::statistics>(sstables::statistics&)::{lambda(sstables::sstable_version_types, seastar::file&&, unsigned long)#1}::operator()(sstables::sstable_version_types, seastar::file&&, unsigned long) const [clone .resume] ) ``` After: ``` (gdb) scylla fiber seastar::local_engine->_current_task [shard 1] #0 (task) 0x0000601008e8e970 0x000000000047aae0 vtable for seastar::internal::coroutine_traits_base<void>::promise_type + 16 (sstables::parse<unsigned int, std::pair<sstables::metadata_type, unsigned int> >(schema const&, sstables::sstable_version_types, sstables::random_access_reader&, sstables::disk_array<unsigned int, std::pair<sstables::metadata_type, unsigned int> 
>&) at sstables/sstables.cc:352) [shard 1] #1 (task) 0x00006010092acf10 0x000000000047aae0 vtable for seastar::internal::coroutine_traits_base<void>::promise_type + 16 (sstables::parse(schema const&, sstables::sstable_version_types, sstables::random_access_reader&, sstables::statistics&) at sstables/sstables.cc:570) [shard 1] #2 (task) 0x0000601008e648d0 0x000000000047aae0 vtable for seastar::internal::coroutine_traits_base<void>::promise_type + 16 (sstables::sstable::read_simple<(sstables::component_type)8, sstables::statistics>(sstables::statistics&)::{lambda(sstables::sstable_version_types, seastar::file&&, unsigned long)#1}::operator()(sstables::sstable_version_types, seastar::file&&, unsigned long) const at sstables/sstables.cc:992) ``` Closes scylladb/scylladb#19478	2024-06-25 13:55:10 +03:00
Avi Kivity	fdc1449392	treewide: rename flat_mutation_reader_v2 to mutation_reader flat_mutation_reader_v2 was introduced in a pair of commits in 2021: `e3309322c3` "Clone flat_mutation_reader related classes into v2 variants" `08b5773c12` "Adapt flat_mutation_reader_v2 to the new version of the API" as a replacement for flat_mutation_reader, using range_tombstone_change instead of range_tombstone to represent range tombstones. See those commits for more information. The transition was incremental; the last use of the original flat_mutation_reader was removed in 2022 in commit `026f8cc1e7` "db: Use mutation_partition_v2 in mvcc" In turn, flat_mutation_reader was introduced in 2017 in commit `748205ca75` "Introduce flat_mutation_reader" to transition from a mutation_reader that nested rows within a partition in a separate stream, to a flat reader that streamed partitions and rows in the same stream. Here, we reclaim the original name and rename the awkward flat_mutation_reader_v2 to mutation_reader. Note that mutation_fragment_v2 remains since we still use the original for compatibility, sometimes. Some notes about the transition: - files were also renamed. In one case (flat_mutation_reader_test.cc), the rename target already existed, so we rename to mutation_reader_another_test.cc. - a namespace 'mutation_reader' with two definitions existed (in mutation_reader_fwd.hh). Its contents were folded into the mutation_reader class. As a result, a few #includes had to be adjusted. Closes scylladb/scylladb#19356	2024-06-21 07:12:06 +03:00
Michał Chojnowski	c901139d07	scylla-gdb.py: print coroutine names in `scylla fiber` Enriches the output of `scylla fiber` with resolved names of coroutine resume functions. Before: ``` [shard 2] #0 (task) 0x0000602004c9fbf0 0x0000000000642880 vtable for seastar::internal::coroutine_traits_base<void>::promise_type + 16 [shard 2] #1 (task) 0x0000602000344c90 0x0000000000642880 vtable for seastar::internal::coroutine_traits_base<void>::promise_type + 16 [shard 2] #2 (task) 0x0000602004b30c50 0x0000000000642880 vtable for seastar::internal::coroutine_traits_base<void>::promise_type + 16 ``` After: ``` [shard 2] #0 (task) 0x0000602004c9fbf0 0x0000000000642880 vtable for seastar::internal::coroutine_traits_base<void>::promise_type + 16 (.resume is seastar::future<void> sstables::parse<unsigned int, std::pair<sstables::metadata_type, unsigned int> >(schema const&, sstables::sstable_version_types, sstables::random_access_reader&, sstables::disk_array<unsigned int, std::pair<sstables::metadata_type, unsigned int> >&) [clone .resume] ) [shard 2] #1 (task) 0x0000602000344c90 0x0000000000642880 vtable for seastar::internal::coroutine_traits_base<void>::promise_type + 16 (.resume is sstables::parse(schema const&, sstables::sstable_version_types, sstables::random_access_reader&, sstables::statistics&) [clone .resume] ) [shard 2] #2 (task) 0x0000602004b30c50 0x0000000000642880 vtable for seastar::internal::coroutine_traits_base<void>::promise_type + 16 (.resume is sstables::sstable::read_simple<(sstables::component_type)8, sstables::statistics>(sstables::statistics&)::{lambda(sstables::sstable_version_types, seastar::file&&, unsigned long)#1}::operator()(sstables::sstable_version_types, seastar::file&&, unsigned long) const [clone .resume] ) ``` Closes scylladb/scylladb#19091	2024-06-04 22:32:17 +03:00
Marcin Maliszkiewicz	2ab143fb40	db: auth: move auth tables to system keyspace Separate keyspace which also behaves as system brings little benefit while creating some compatibility problems like schema digest mismatch during rollback. So we decided to move auth tables into system keyspace. Fixes https://github.com/scylladb/scylladb/issues/18098 Closes scylladb/scylladb#18769	2024-05-26 22:30:42 +03:00
Tomasz Grabiec	4d84451cf1	sstables, gdb: Track readers in a linked list For the purpose of scylla-gdb.py command "scylla active-sstables". Before the patch, readers were located by scanning the heap for live objects with vtable pointers corresponding to readers. It was observed that the test scylla_gdb/test_misc.py::test_active_sstables started failing like this: gdb.error: Error occurred in Python: Cannot access memory at address 0x300000000000000 This could be explained by there being a live object on the heap which used to be a reader but now is a different object, and the _sst field contains some other data which is not a pointer. To fix, track readers explicitly in a linked list so that the gdb script can reliably walk readers. Fixes #18618.	2024-05-16 00:28:46 +02:00
Lakshmi Narayanan Sreethar	3ef2f79d14	sstable: renamed intrusive list link type Renamed the intrusive list link type to differentiate it from the set link type that will be added in an upcoming patch. Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-05-09 17:48:58 +05:30
Kefu Chai	0b0e661a85	build: bring abseil submodule back because of https://bugzilla.redhat.com/show_bug.cgi?id=2278689, the rebuilt abseil package provided by fedora has different settings than the ones used if the tree is built with the sanitizer enabled. this inconsistency leads to a crash. to address this problem, we have to reinstate the abseil submodule, so we can build it with the same compiler options with which we build the tree. in this change * Revert "build: drop abseil submodule, replace with distribution abseil" * update CMake building system with abseil header include settings * bump up the abseil submodule to the latest LTS branch of abseil: lts_2024_01_16 * update scylla-gdb.py to adapt to the new structure of flat_hash_map This reverts commit `8635d24424`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18511	2024-05-05 23:31:09 +03:00
Kefu Chai	3b50c39a83	scylla-gdb: access io_queue::_streams and io_queue::_fgs with static_vector in seastar's b28342fa5a301de3facf5e83dc691524a6b20604, we switched * `io_queue::_streams` from `boost::container::small_vector<fair_queue, 2>` to `boost::container::static_vector<fair_queue, 2>` * `io_queue::_fgs` from `std::vector<std::unique_ptr<fair_group>>` to `boost::container::static_vector<fair_group, 2>` so we need to update the gdb script accordingly to reflect this change, and to avoid the nested try-except blocks, we switch to a `while` statement to simplify the code structure. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18165	2024-04-04 11:39:10 +03:00
Kefu Chai	50c6fc1141	scylla-gdb: use current_scheduling_group_ptr instead of task_queue._current Seastar removed `task_queue::_current` in 258b11220d343d8c7ae1a2ab056fb5e202723cc8 . let's adapt scylla-gdb.py accordingly. despite that `current_scheduling_group_ptr()` is an internal API, it's been around for a while, and relatively stable. so let's use it instead. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#17720	2024-03-11 13:13:59 +02:00
Nadav Har'El	a36c8b28dd	Merge 'scylla-gdb.py: fixes warnings raised by flake8' from Kefu Chai this changeset addresses some warnings raised by flake8 in hope to improve the readability of this script in general. Closes scylladb/scylladb#17668 * github.com:scylladb/scylladb: scylla-gdb: s/if not foo is None/if foo is not None/ scylla-gdb.py: add space after keyword scylla-gdb.py: remove extraneous spaces scylla-gdb.py: use 2 empty lines between top-level funcs/classes scylla-gdb.py: replace <tab> with 4 spaces scylla-gdb: fix the indent	2024-03-07 10:41:15 +02:00
Kefu Chai	4f8b618be7	scylla-gdb: s/if not foo is None/if foo is not None/ more readable this way. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-03-07 13:46:38 +08:00
Kefu Chai	643a6d5bda	scylla-gdb.py: add space after keyword it'd be more pythonic to just put an expression after `assert`, instead of wrapping it in a pair of parentheses. and there is no need to add `;` after `break`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-03-07 13:46:38 +08:00
Kefu Chai	8c65f92f1f	scylla-gdb.py: remove extraneous spaces Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-03-07 13:46:38 +08:00
Kefu Chai	12c06c39c3	scylla-gdb.py: use 2 empty lines between top-level funcs/classes and 1 empty line for nested functions/classes, to be more PEP8 compliant. for better readability. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-03-07 13:46:38 +08:00
Kefu Chai	8e3b22c76a	scylla-gdb.py: replace <tab> with 4 spaces do not mix tab and spaces for indent Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-03-07 13:46:38 +08:00
Kefu Chai	c4b679fe3b	scylla-gdb: fix the indent indent should be multiple of 4 spaces. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-03-07 13:46:38 +08:00
Avi Kivity	c5f01349b1	Merge 'Add specialized tablet_sstable_set' from Benny Halevy Make a specialized sstable_set for tablets via tablet_storage_group_manager::make_sstable_set. This sstable set takes a snapshot of the storage_groups (compound) sstable_sets and maps the selected tokens directly into the tablet compound_sstable_set. This sstable_set provides much more efficient access to the table's sstable sets as it takes advantage of the disjointness of sstable sets between tablets/storage_groups, and making it is cheaper than rebuilding a complete partitioned_sstable_set from all sstables in the table. Fixes #16876 Cassandra-stress setup: ``` $ sudo cpupower frequency-set -g userspace $ build/release/scylla (developer-mode options) --smp=16 --memory=8G --experimental-features=consistent-topology-changes --experimental-features=tablets cqlsh> CREATE KEYSPACE keyspace1 WITH replication={'class':'NetworkTopologyStrategy', 'replication_factor':1} AND tablets={'initial':2048}; $ ./tools/java/tools/bin/cassandra-stress write no-warmup n=10000000 -pop 'seq=1...10000000' -rate threads=128 $ scylla-api-client system drop_sstable_caches POST $ ./tools/java/tools/bin/cassandra-stress read no-warmup duration=60s -pop 'dist=uniform(1..10000000)' -rate threads=128 $ scylla-api-client system drop_sstable_caches POST $ ./tools/java/tools/bin/cassandra-stress mixed no-warmup duration=60s -pop 'dist=uniform(1..10000000)' -rate threads=128 ``` Baseline (`0a7854ea4d`) vs. 
fix (`0c2c00f01b`) Throughput (op/s): workload \| baseline \| fix ---------\|----------\|---------- write \| 76,806 \| 100,787 read \| 34,330 \| 106,099 mixed \| 32,195 \| 79,246 Closes scylladb/scylladb#17149 * github.com:scylladb/scylladb: table: tablet_storage_group_manager: make tablet_sstable_set storage_group_manager: add make_sstable_set tablet_storage_group_manager: handle_tablet_split_completion: pre-calc new_tablet_count table: tablet_storage_group_manager: storage_group_of: do not validate in release build mode table: move compaction_group_list and storage_group_vector to storage_group_manager compaction_group::table_state: get_group_id: become self-sufficient compaction_group, table: make_compound_sstable_set: declare as const tablet_storage_group_manager: precalculate my_host_id and _tablet_map table: coroutinize update_effective_replication_map	2024-03-06 23:59:39 +02:00
Benny Halevy	7f203f0551	table: move compaction_group_list and storage_group_vector to storage_group_manager So the storage_group_manager can be used later by table_sstable_set. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-03-06 10:35:33 +02:00
Marcin Maliszkiewicz	9144d8203b	db: add system_auth_v2 keyspace The new keyspace is added similarly to the system_schema keyspace: it is registered via system_keyspace::make, which calls all_tables to build its schema. A dummy table 'roles' is added because keyspaces are currently registered by walking through their tables. Full table schemas will be added in subsequent commits. The change can be observed via cqlsh: cassandra@cqlsh> describe keyspaces; system_auth_v2 system_schema system system_distributed_everywhere system_auth system_distributed system_traces cassandra@cqlsh> describe keyspace system_auth_v2; CREATE KEYSPACE system_auth_v2 WITH replication = {'class': 'LocalStrategy'} AND durable_writes = true; CREATE TABLE system_auth_v2.roles ( role text PRIMARY KEY ) WITH bloom_filter_fp_chance = 0.01 AND caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'} AND comment = 'comment' AND compaction = {'class': 'SizeTieredCompactionStrategy'} AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'} AND crc_check_chance = 1.0 AND dclocal_read_repair_chance = 0.0 AND default_time_to_live = 0 AND gc_grace_seconds = 604800 AND max_index_interval = 2048 AND memtable_flush_period_in_ms = 0 AND min_index_interval = 128 AND read_repair_chance = 0.0 AND speculative_retry = '99.0PERCENTILE';	2024-03-01 10:40:29 +01:00
Nadav Har'El	b0233c0833	Merge 'interval: rename nonwrapping_interval to interval' from Avi Kivity Our interval template started life as `range`, and supported wrapping to follow Cassandra's convention of wrapping around the maximum token. We later recognized that an interval type should usually be non-wrapping and split it into wrapping_range and nonwrapping_range, with `range` aliasing wrapping_range to preserve compatibility. Even later, we realized the name was already taken by C++ ranges and so renamed it to `interval`. Given that intervals are usually non-wrapping, the default `interval` type is non-wrapping. We can now simplify it further, recognizing that everyone assumes that an interval is non-wrapping and so doesn't need the nonwrapping_interval_designation. We just rename nonwrapping_interval to `interval` and remove the type alias. Closes scylladb/scylladb#17455 * github.com:scylladb/scylladb: interval: rename nonwrapping_interval to interval interval: rename interval_test to wrapping_interval_test	2024-02-22 14:03:43 +02:00
Kefu Chai	9ee728dab9	scylla-gdb: use raw string when '\' is not used in an escape sequence when '\' does not start an escape sequence, Python complains at seeing it. but it continues anyway by considering '\' as a separate char. but the warning message is still annoying: ``` scylla-gdb.py: 2417: SyntaxWarning: invalid escape sequence '\-' branches = (r" \|-- ", " \-- ") ``` when sourcing this script. so, let's mark these strings as raw strings. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#17466	2024-02-22 09:03:26 +02:00
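The raw-string fix described in this commit can be illustrated with a small, self-contained sketch. It is not taken from scylla-gdb.py: the helper name `compile_literal` and the sample literals are hypothetical, chosen only to mirror the `" \-- "` string quoted in the warning. The sketch compiles the same literal with and without the `r` prefix and records any warnings Python emits at compile time. The exact warning category for an invalid escape depends on the Python version, so the code reports category names rather than assuming one.

```python
import warnings


def compile_literal(source):
    """Compile and evaluate a string literal given as source text.

    Returns the resulting string and the names of any warning
    categories raised while compiling it. (Hypothetical helper,
    for illustration only.)
    """
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        value = eval(compile(source, "<example>", "eval"))
    return value, [w.category.__name__ for w in caught]


# Source text of the two literals. The doubled backslashes here are
# only needed because the literals are themselves embedded in strings.
plain_source = '" \\-- "'   # the source characters: " \-- "
raw_source = 'r" \\-- "'    # the source characters: r" \-- "

plain_value, plain_warnings = compile_literal(plain_source)
raw_value, raw_warnings = compile_literal(raw_source)

# Both spellings produce the same string; only the non-raw one is
# expected to trigger an "invalid escape sequence" warning.
print(repr(plain_value), plain_warnings)
print(repr(raw_value), raw_warnings)
```

Because both forms evaluate to the same characters, switching to a raw string silences the warning without changing what the script prints.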
Avi Kivity	51df8b9173	interval: rename nonwrapping_interval to interval Our interval template started life as `range`, and supported wrapping to follow Cassandra's convention of wrapping around the maximum token. We later recognized that an interval type should usually be non-wrapping and split it into wrapping_range and nonwrapping_range, with `range` aliasing wrapping_range to preserve compatibility. Even later, we realized the name was already taken by C++ ranges and so renamed it to `interval`. Given that intervals are usually non-wrapping, the default `interval` type is non-wrapping. We can now simplify it further, recognizing that everyone assumes that an interval is non-wrapping and so doesn't need the nonwrapping_interval designation. We just rename nonwrapping_interval to `interval` and remove the type alias.	2024-02-21 19:43:17 +02:00
Michał Chojnowski	5a3e4a1cc0	utils: managed_bytes: optimize memory usage for small buffers managed_bytes is implemented as a chain of blob_storage objects. Each blob_storage contains 24 bytes of metadata. But in the most common case -- when there is only a single element in the chain -- 16 bytes of this metadata are trivial/unused. This is a regrettable waste because managed_bytes is used for every database cell in the memtables and cache. It means that every value of size >= 7 bytes (smaller ones fit in the inline storage of managed_bytes) receives 16 bytes of useless overhead. To correct that, this patch adds to managed_bytes an alternative storage layout -- used for buffers small enough to fit in one contiguous fragment -- which only stores the necessary minimum of metadata. (That is: a pointer to the parent, to facilitate moving the storage during memory defragmentation).	2024-02-09 20:56:20 +01:00
Lakshmi Narayanan Sreethar	76f0d5e35b	reader_permit: store schema_ptr instead of raw schema pointer Store schema_ptr in reader permit instead of storing a const pointer to schema to ensure that the schema doesn't get changed elsewhere when the permit is holding on to it. Also update the constructors and all the relevant callers to pass down schema_ptr instead of a raw pointer. Fixes #16180 Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com> Closes scylladb/scylladb#16658	2024-01-11 08:37:56 +02:00
Benny Halevy	cdd5605d81	gms: endpoint_state: change application_state_map to std::unordered_map State changes are processed as a batch and there is no reason to maintain them as an ordered map. Instead, use a std::unordered_map that is more efficient. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-12-31 18:37:34 +02:00
Raphael S. Carvalho	15de1cdcbc	replica: Introduce concept of storage group Storage group is the storage of tablets. This new concept is helpful for tablet splitting, where the storage of tablet will be split in multiple compaction groups, where each can be compacted independently. The reason for not going with arena concept is that it added complexity, and it felt much more elegant to keep compaction group unchanged which at the end of the day abstracts the concept of a set of sstables that can be compacted and operated independently. When splitting, the storage group for a tablet may therefore own multiple compaction groups, left, right, and main, where main keeps the data that needs splitting. When splitting completes, only left and right compaction groups will be populated. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-12-17 11:40:09 -03:00
Avi Kivity	2b8392b8b8	Merge 'database, reader_concurrency_semaphore: deduplicate reader_concurrency_semaphore metrics ' from Botond Dénes Reduce code duplication by defining each metric just once, instead of three times, by having the semaphore register metrics by itself. This also makes the lifecycle of metrics contained in that of the semaphore. This is important on enterprise where semaphores are added and removed, together with service levels. We don't want all semaphores to export metrics, so a new parameter is introduced and all call-sites make a call whether they opt-in or not. Fixes: https://github.com/scylladb/scylladb/issues/16402 Closes scylladb/scylladb#16383 * github.com:scylladb/scylladb: database, reader_concurrency_semaphore: deduplicate reader_concurrency_semaphore metrics reader_concurrency_semaphore: add register_metrics constructor parameter sstables: name sstables_manager	2023-12-14 18:26:24 +02:00
Avi Kivity	7fce057cda	database, reader_concurrency_semaphore: deduplicate reader_concurrency_semaphore metrics reader_concurrency_semaphore metrics are triplicated: each metric is registered for streaming, user, and system classes. To fix, just move the metrics registration from database to reader_concurrency_semaphore, so each reader_concurrency_semaphore instantiated will register its metrics (if its creator asked for it). Adjust the names given to reader_concurrency_semaphore so we don't change the labels. scylla-gdb is adjusted to support the new names.	2023-12-13 09:16:18 -05:00
Yaniv Kaul	0b0a3ee7fc	Typos: fix typos in code Last batch, hopefully. Using codespell, went over the docs and fixed some typos. Refs: https://github.com/scylladb/scylladb/issues/16255 Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com> Closes scylladb/scylladb#16388	2023-12-13 10:45:21 +02:00
Pavel Emelyanov	0c69a312db	Update seastar submodule * seastar bab1625c...17183ed4 (73): > thread_pool: Reference reactor, not point to > sstring: inherit publicly from string_view formatter > circleci: use conditional steps > weak_ptr: include used header > build: disable the -Wunused-* warnings for checkheaders > resource: move variable into smaller lexical scope > resource: use structured binding when appropriate > httpd: Added server and client addresses to request structure > io_queue: do not dereference moved-away shared pointer > treewide: explicitly define ctor and assignment operator > memory: use `err` for the error string > doc: Add document describing all the math behind IO scheduler > io_queue: Add flow-rate based self slowdown backlink > io_queue: Make main throttler uncapped > io_queue: Add queue-wide metrics > io_queue: Introduce "flow monitor" > io_queue: Count total number of dispatched and completed requests so far > io_queue: Introduce io_group::io_latency_goal() > tests: test the vector overload for when_all_succeed > core: add a vector overload to when_all_succeed > loop: Fix iterator_range_estimate_vector_capacity for random iters > loop: Add test for iterator_range_estimate_vector_capacity > core/posix return old behaviour using non-portable pthread_attr_setaffinity_np when present > memory: s/throw()/noexcept/ > build: enable -Wdeprecated compiler option > reactor: mark kernel_completion's dtor protected > tests: always wait for promise > http, json, net: define-generated copy ctor for polymorphic types > treewide: do not define constexpr static out-of-line > reactor: do not define dtor of kernel_completion > http/exception: stop using dynamic exception specification > metrics: replace vector with deque > metrics: change metadata vector to deque > utils/backtrace.hh: make simple_backtrace formattable > reactor: Unfriend disk_config_params > reactor: Move add_to_flush_poller() to internal namespace > reactor: Unfriend a bunch of sched group 
template calls > rpc_test: Test rpc send glitches > net: Implement batch flush support for existing sockets > iostream: Configure batch flushes if sink can do it > net: Added remote address accessors > circleci: update the image to CircleCI "standard" image > build: do not add header check target if no headers to check > build: pass target name to seastar_check_self_contained > build: detect glibc features using CMake > build: extract bits checking libc into CheckLibc.cmake > http/exception: add formatter for httpd::base_exception > http/client: Mark write_body() const > http/client: Introduce request::_bytes_written > http/client: Mark maybe_wait_for_continue() const > http/client: Mark send_request_head() const > http/client: Detach setup_request() > http/api_docs: copy in api_docs's copy constructor > script: do not inherit from object > scripts: addr2line: change StdinBacktraceIterator to a function > scripts: addr2line: use yield instead defining a class > tests: skip tests that require backtrace if execinfo.h is not found > backtrace: check for existence of execinfo.h > core: use ino_t and off_t as glibc sets these to 64bit if 64bit api is used > core: add sleep_abortable instantiation for manual_clock > tls: Return EPIPE exception when writing to shutdown socket > http/client: Don't cache connection if server advertises it > http/client: Mark connection as "keep in cache" > core: fix strerror_r usage from glibc extension > reactor: access sigevent.sigev_notify_thread_id with a macro > posix: use pthread_setaffinity_np instead of pthread_attr_setaffinity_np > reactor: replace __mode_t with mode_t > reactor: change sys/poll.h to posix poll.h > rpc: Add unit test for per-domain metrics > rpc: Report client connections metrics > rpc: Count dead client stats > rpc: Add seastar::rpc::metrics > rpc: Make public queues length getters io-scheduler fixes refs: #15312 refs: #11805 http client fixes refs: #13736 refs: #15509 rpc fixes refs: #15462 Closes 
scylladb/scylladb#15774	2023-10-19 20:52:37 +03:00
Benny Halevy	d00e49a1bb	gossiper: keep and serve shared endpoint_state_ptr in map This commit changes the interface to using endpoint_state_ptr = lw_shared_ptr<const endpoint_state> so that users can get a snapshot of the endpoint_state that they must not modify in-place anyhow. While internally, gossiper still has the legacy helpers to manage the endpoint_state. Fixes scylladb/scylladb#14799 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-08-31 09:34:36 +03:00
Pavel Emelyanov	5c95b1cb7f	scylla-gdb: Remove _cost_capacity from fair-group debug This field is about to be removed in newer seastar, so it shouldn't be checked in scylla-gdb (see also `ae6fdf1599`) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #15203	2023-08-29 16:13:50 +03:00

454 Commits