scylla

Author	SHA1	Message	Date
Tomasz Grabiec	fbc6076e6a	storage_service, tablets: Move get_leaving_replica() to tablets.cc For better encapsulation of tablet-specific code.	2023-07-31 01:45:23 +02:00
Tomasz Grabiec	18a59ab5ff	locator: tablets: Move std::hash definition earlier Will be needed in order to define a struct which has unordered_set<tablet_replica> as a field.	2023-07-31 01:45:23 +02:00
Tomasz Grabiec	889f2ceb1e	storage_service: Advance tablets independently This change makes the topology state machine advance each tablet independently which allows them to finish migrations at different speeds, not at the speed of the slowest tablet. It will also open the possibility of starting new transitions concurrently with already active ones. This is implemented by having a single transition state "tablet migration", and handling it by scanning all the transitions and advancing tablet state machines. Updates and barriers are batched for all tablets in each cycle. One complication is the tracking of streaming sessions. The operations are no longer nested in the scope of a single handle method, and cannot be waited on explicitly, as that would inhibit progress of the coordinator, which starts later migrations. They live as independent fibers, which are associated with tablets in a transient data structure which lives within the coordinator instance. This data structure is consulted for a given tablet in each cycle of the handle_tablet_migration() pump to check if streaming has finished and we can move the tablet to the next stage. Only if the pump has no work does it wait for any streaming to finish by blocking on the _topo_sm.event.	2023-07-31 01:45:23 +02:00
Tomasz Grabiec	2811b1df0a	topology_coordinator: Fix missed notification on abort If _as is aborted while the coordinator is in the middle of handling, and decides to go to sleep, it may go to sleep without noticing that it was aborted. Fix by checking before blocking on the condition variable. In general, every condition which can cause signal() should be checked before when(). This patch doesn't fix all the cases. For example, signal() can be called when there arrives a new topology request. This can happen after the coordinator checked because it releases the guard before calling when().	2023-07-31 01:45:23 +02:00
Tomasz Grabiec	e338679266	tablets: Add formatter for tablet_migration_info	2023-07-31 01:45:23 +02:00
Avi Kivity	accd6271bc	Merge 'tools: introduce tool_app_template and migrate all tools to it' from Botond Dénes The scaffolding required to have a working scylla tool app, is considerable, leading to a large amount of boilerplate code in each such app. This logic is also very similar across the two tool apps we have and would presumably be very similar in any future app. This PR extracts this logic into `tools/utils.hh` and introduces `tool_app_template`, which is similar to `seastar::app_template` in that it centralizes all the option handling and more in a single class, that each tool has to just instantiate and then call `run()` to run the app. This cuts down on the repetition and boilerplate in our current tool apps and make prototyping new tool apps much easier. Closes #14855 * github.com:scylladb/scylladb: tools/utils.hh: remove unused headers tools/utils: make get_selected_operation() and configure_tool_mode() private tools/utils.hh: de-template get_selected_operation() tools/scylla-types: migrate to tools_app_template tools/scylla-types: prepare for migration to tool_app_template tools/scylla-sstable.cc: fix indentation tools/scylla-sstables: migrate to tool_app_template tools/scylla-sstables: prepare for migration to tool_app_template tools: extract tool app skeleton to utils.hh	2023-07-30 18:31:10 +03:00
Pavel Emelyanov	b8d1c7fc0b	sstables-format-selector: Add and use system_keyspace dependency The selector keeps selected format in system.local and uses static db::system_keyspace::(get\|set)_scylla_local_param() helpers to access it. The helpers are turning into non-static so the selector should call those on system_keyspace object, not class Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #14871	2023-07-30 18:12:16 +03:00
Avi Kivity	1c3d22b717	build: update frozen toolchain to Fedora 38 This refreshes clang to 16.0.6 and libstdc++ to 13.1.1. compiler-rt, libasan, and libubsan are added to install-dependencies.sh since they are no longer pulled in as dependencies. Closes #13730	2023-07-30 03:08:48 +03:00
Avi Kivity	14dee7a946	Revert "build: build with -O0 if Clang >= 16 is used" This reverts commit `fb05fddd7d`. After `1554b5cb61` ("Update seastar submodule"), which fixed a coroutine bug in Seastar, it is no longer necessary. Also revert the related "build: drop the warning on -O0 might fail tests" (`894039d444`).	2023-07-29 08:07:04 +03:00
Avi Kivity	1554b5cb61	Update seastar submodule * seastar c0e618bbb...0784da876 (11): > Revert "metrics: Remove registered_metric::operator()" > build: use new behavior defined by CMP0127 > build: pass -DBOOST_NO_CXX98_FUNCTION_BASE to C++ compiler > coroutine: fix a use-after-free in maybe_yield Ref #13730. > Merge 'sstring: add more accessors' from Kefu Chai > Merge 'semaphore: semaphore_units: return units when reassigned' from Benny Halevy > metrics: do not define defaulted copy assignment operator > HTTP headers in http_response are now case insensitive > rpc: Make server._proto a reference > Merge 'Cleanup class metrics::registered_metrics' from Pavel Emelyanov > core: undefine fallthrough to fix compilation error Closes #14862	2023-07-28 23:45:30 +03:00
Tomasz Grabiec	4e9d95d78c	Merge 'Compact data before streaming' from Botond Dénes Currently, streaming and repair process and send data as-is. This is wasteful: streaming might be sending data which is expired or covered by tombstones, taking up valuable bandwidth and processing time. Repair additionally could be exposed to artificial differences, due to different nodes being in different states of compactness. This PR adds opt-in compaction to `make_streaming_reader()`, then opts in all users. The main difference being in how these choose the current compaction time to use: * Load'n'stream and streaming uses the current time on the local node. * Repair uses a centrally chosen compaction time, generated on the repair master and propagated to all repair followers. This is to ensure all repair participants work with the exact state of compactness. Importantly, this compaction does not purge tombstones (tombstone GC is disabled completely). Fixes: https://github.com/scylladb/scylladb/issues/3561 Closes #14756 * github.com:scylladb/scylladb: replica: make_[multishard_]streaming_reader(): make compaction_time mandatory repair/row_level: opt in to compacting the stream streaming: opt-in to compacting the stream sstables_loader: opt-in for compacting the stream replica/table: add optional compacting to make_multishard_streaming_reader() replica/table: add optional compacting to make_streaming_reader() db/config: add config item for enabling compaction for streaming and repair repair: log the error which caused the repair to fail readers: compacting_reader: use compact_mutation_state::abandon_current_partition() mutation/mutation_compactor: allow user to abandon current partition	2023-07-28 16:42:13 +02:00
Pavel Emelyanov	24fdd4297b	schema_tables: Use query_processor argument in save_system_schema() ... instead of global qctx. The now used qctx->execute_cql() just calls the query_processor::execute_internal with cache_internal::yes Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #14874	2023-07-28 15:55:16 +02:00
Kefu Chai	cc2bbde8f1	test: use BOOST_CHECK_EQUAL when appropriate in compaction_manager_basic_test compaction_manager_basic_test checks the stats of compaction_manager to verify that there are no ongoing or pending compactions after triggering the compaction and waiting for its completion. but in #14865, there are still active compaction(s) after the compaction_manager's stats show there is at least one task completed. to understand this issue better, let's use `BOOST_CHECK_EQUAL()` instead of `BOOST_REQUIRE()`, so that the test does not error out when the check fails, and we can have better understanding of the status when the test fails. Refs #14865 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #14872	2023-07-28 15:45:07 +03:00
Botond Dénes	1eca60fe10	tools/utils.hh: remove unused headers	2023-07-28 08:41:34 -04:00
Botond Dénes	cbcb20f0f9	tools/utils: make get_selected_operation() and configure_tool_mode() private Their only user is in tools/utils.cc, so move them there, into an anonymous namespace.	2023-07-28 08:41:34 -04:00
Botond Dénes	fc0c87002c	tools/utils.hh: de-template get_selected_operation() It now has a single user, so it doesn't have to be a template. For now, make the method inline, so it can stay in the header. It will be moved to utils.cc in the next patch.	2023-07-28 08:41:16 -04:00
Botond Dénes	8caf258539	tools/scylla-types: migrate to tools_app_template Discard the locally coded app skeleton and reuse the tool app template instead. Reduces boilerplate greatly.	2023-07-28 08:30:53 -04:00
Botond Dénes	68a452be00	tools/scylla-types: prepare for migration to tool_app_template Make options more declarative and create a local reference to app.configuration() in the main lambda. To facilitate further patching.	2023-07-28 08:30:53 -04:00
Botond Dénes	7598c23359	tools/scylla-sstable.cc: fix indentation Broken in the previous patch.	2023-07-28 08:30:53 -04:00
Botond Dénes	d082622ab9	tools/scylla-sstables: migrate to tool_app_template Removing a great amount of boilerplate, streamlining the main method.	2023-07-28 08:30:53 -04:00
Botond Dénes	092650b20b	tools/scylla-sstables: prepare for migration to tool_app_template Make options more declarative. To facilitate further patching.	2023-07-28 08:30:53 -04:00
Botond Dénes	89d7d80fce	tools: extract tool app skeleton to utils.hh The skeleton of the two existing scylla-native tools (scylla-types and scylla-sstable) is very similar. By skeleton, I mean all the boilerplate around creating and configuring a seastar::app_template, representing operations/command and their options, and presenting and selecting these. To facilitate code-sharing and quick development of any new tools, extract this skeleton from scylla-sstable.cc into tools/utils.hh, in the form of a new tool_app_template, which wraps a seastar::app_template and centralizes all the boilerplate logic in a single place. The extracted code is not a simple copy-paste, although many elements are simply copied. The original code is not removed yet.	2023-07-28 08:30:53 -04:00
Botond Dénes	3a51053e66	Merge 'De-static system_keyspace::_group0_ methods' from Pavel Emelyanov These are users of global `qctx` variable or call `(get\|set)_scylla_local_param(_as)?` which, in turn, also reference the `qctx`. Unfortunately, the latter(s) are still in use by other code and cannot be marked non-static in this PR Closes #14869 * github.com:scylladb/scylladb: system_keyspace: De-static set_raft_group0_id() system_keyspace: De-static get_raft_group0_id() system_keyspace: De-static get_last_group0_state_id() system_keyspace: De-static group0_history_contains() raft: Add system_keyspace argument to raft_group0::join_group0()	2023-07-28 14:53:22 +03:00
Kefu Chai	df041c7dc8	build: cmake: add missing source file TLS certificate authenticator registers itself using a `class_registrator`. that's why CMake is able to build without compiling this source file. but for the sake of completeness, and to be sync with configure.py, let's add it to CMake. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #14866	2023-07-28 14:30:58 +03:00
Pavel Emelyanov	d311784721	system_keyspace: De-static set_raft_group0_id() The caller is group0 code with sys_ks local variable Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-07-28 13:13:59 +03:00
Pavel Emelyanov	7837bc7d5a	system_keyspace: De-static get_raft_group0_id() The callers are in group0 code that have sys_ks local variable/argument Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-07-28 13:13:11 +03:00
Pavel Emelyanov	26dd7985a8	system_keyspace: De-static get_last_group0_state_id() The caller is raft_group0_client with sys.ks. dependency reference and group0_state_machine with raft_group0_client exporting its sys.ks. This makes it possible to instantly drop one more qctx reference Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-07-28 13:12:04 +03:00
Pavel Emelyanov	3de0efd32c	system_keyspace: De-static group0_history_contains() The caller is raft_group0_client with sys.ks. dependency reference. This allows to drop one qctx reference right at once Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-07-28 13:11:08 +03:00
Pavel Emelyanov	0dbe83ce89	raft: Add system_keyspace argument to raft_group0::join_group0() The method will need one to access db::system_keyspace methods. The sys.ks. is at hand and in use in both callers Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-07-28 13:10:24 +03:00
Avi Kivity	d73a393670	main: increase Seastar reactor task quota in debug mode Debug mode is so slow that the work:poll ratio decreases, leading to even more slowness as more polling is done for the same amount of work. Increase the task quota to recover some performance. Ref #14752. Closes #14820	2023-07-28 10:34:18 +03:00
Avi Kivity	cf81eef370	Merge 'schema_mutations, migration_manager: Ignore empty partitions in per-table digest' from Tomasz Grabiec Schema digest is calculated by querying for mutations of all schema tables, then compacting them so that all tombstones in them are dropped. However, even if the mutation becomes empty after compaction, we still feed its partition key. If the same mutations were compacted prior to the query, because the tombstones expire, we won't get any mutation at all and won't feed the partition key. So schema digest will change once an empty partition of some schema table is compacted away. Tombstones expire 7 days after schema change which introduces them. If one of the nodes is restarted after that, it will compute a different table schema digest on boot. This may cause performance problems. When sending a request from coordinator to replica, the replica needs schema_ptr of exact schema version requested by the coordinator. If it doesn't know that version, it will request it from the coordinator and perform a full schema merge. This adds latency to every such request. Schema versions which are not referenced are currently kept in cache for only 1 second, so if request flow has low-enough rate, this situation results in perpetual schema pulls. After `ae8d2a550d` (5.2.0), it is more likely to run into this situation, because table creation generates tombstones for all schema tables relevant to the table, even the ones which will be otherwise empty for the new table (e.g. computed_columns). This change introduces a cluster feature which when enabled will change digest calculation to be insensitive to expiry by ignoring empty partitions in digest calculation. When the feature is enabled, schema_ptrs are reloaded so that the window of discrepancy during transition is short and no rolling restart is required. A similar problem was fixed for per-node digest calculation in c2ba94dc39e4add9db213751295fb17b95e6b962. 
Per-table digest calculation was not fixed at that time because we didn't persist enabled features and they were not enabled early-enough on boot for us to depend on them in digest calculation. Now they are enabled before non-system tables are loaded so digest calculation can rely on cluster features. Fixes #4485. Manually tested using ccm on cluster upgrade scenarios and node restarts. Closes #14441 * github.com:scylladb/scylladb: test: schema_change_test: Verify digests also with TABLE_DIGEST_INSENSITIVE_TO_EXPIRY enabled schema_mutations, migration_manager: Ignore empty partitions in per-table digest migration_manager, schema_tables: Implement migration_manager::reload_schema() schema_tables: Avoid crashing when table selector has only one kind of tables	2023-07-28 00:01:33 +03:00
Anna Stuchlik	8ee6f6ecb6	doc: add the requirement to upgrade drivers This commit adds a requirement to upgrade ScyllaDB drivers before upgrading ScyllaDB. The requirement to upgrade the Monitoring Stack has been moved to the new section so that both prerequisites are documented together. NOTE: The information is added to the 5.2-to-5.3 upgrade guide because all future upgrade guides will be based on this one (as it's the latest one). If 5.3 is released, this commit should be backported to branch-5.3. Refs https://github.com/scylladb/scylladb/issues/13958 Closes #14771	2023-07-27 15:21:38 +02:00
Patryk Jędrzejczak	b81a6037f1	test: pylib: ensure ScyllaCluster.add_server does not start a second cluster If the cluster isn't empty and all servers are stopped, calling ScyllaCluster.add_server can start a new cluster. That's because ScyllaCluster._seeds uses the running servers to calculate the seed node list, so if all nodes are down, the new node would select only itself as a seed, starting a new cluster. As a single ScyllaCluster should describe a single cluster, we make ScyllaCluster.add_server fail when called on a non-empty cluster with all its nodes stopped. Closes #14804	2023-07-27 13:27:23 +02:00
Botond Dénes	7351c8424d	mutation/mutation_rebuilder: add comment about validity of returned mutation reference Closes #14853	2023-07-27 12:13:46 +03:00
Alexey Novikov	ff721ec3e3	make timestamp string format cassandra compatible when we convert timestamp into string it must look like: '2017-12-27T11:57:42.500Z' it concerns any conversion except JSON timestamp format JSON string has space as time separator and must look like: '2017-12-27 11:57:42.500Z' both formats always contain milliseconds and timezone specification Fixes #14518 Fixes #7997 Closes #14726	2023-07-27 12:01:09 +03:00
Botond Dénes	b599f15b26	replica: make_[multishard_]streaming_reader(): make compaction_time mandatory Now that all users have opted in unconditionally, there is no point in keeping this optional. Make it mandatory to make sure there are no opt-out by mistake. The global override via enable_compacting_data_for_streaming_and_repair config item still remains, allowing compaction to be force turned-off.	2023-07-27 04:57:52 -04:00
Botond Dénes	fdaf908967	repair/row_level: opt in to compacting the stream Using a centrally generated compaction-time, generated on the repair master and propagated to all repair followers. For repair it is imperative that all participants use the exact same compaction time, otherwise there can be artificial differences between participants, generating unnecessary repair activity. If a repair follower doesn't get a compaction-time from the repair master, it uses a locally generated one. This is no worse than the previous state of each node being on some undefined state of compaction.	2023-07-27 04:57:50 -04:00
Botond Dénes	5452fd1ce4	streaming: opt-in to compacting the stream Use locally generated compaction time on each node. This could lead to different nodes making different decisions on what is expired or not. But this is already the case for streaming, as what exactly is expired depends on when compaction last ran.	2023-07-27 03:22:11 -04:00
Botond Dénes	5a73c3374e	sstables_loader: opt-in for compacting the stream No point in loading expired/covered data.	2023-07-27 03:22:11 -04:00
Botond Dénes	2f8d77e97b	replica/table: add optional compacting to make_multishard_streaming_reader() Doing to make_multishard_streaming_reader() what the previous commit did to make_streaming_reader(). In fact, the new compaction_time parameter is simply forwarded to the make_streaming_reader() on the shard readers. Call sites are updated, but none opt in just yet.	2023-07-27 03:22:11 -04:00
Botond Dénes	42b0dd5558	replica/table: add optional compacting to make_streaming_reader() Opt-in is possible by passing an engaged `compaction_time` (gc_clock::time_point) to the method. When this new parameter is disengaged, no compaction happens. Note that there is a global override, via the enable_compacting_data_for_streaming_and_repair config item, which can force-disable this compaction. Compaction done on the output of the streaming reader does not garbage-collect tombstones! All call-sites are adjusted (the new parameter is not defaulted), but none opt in yet. This will be done in separate commit per user.	2023-07-27 03:22:11 -04:00
Botond Dénes	9e3987fc96	db/config: add config item for enabling compaction for streaming and repair Compacting can greatly reduce the amount of data to be processed by streaming and repair, but with certain data shapes, its effectiveness can be reduced and its CPU overhead might outweigh the benefits. This should very rarely be the case, but leave an off switch in case this becomes a problem in a deployment. Not wired yet.	2023-07-27 03:22:11 -04:00
Botond Dénes	a22446afe0	repair: log the error which caused the repair to fail Instead of just a boolean _failed flag, persist the error message of the exception which caused the repair to fail, and include it in the log message announcing the failure.	2023-07-27 03:22:11 -04:00
Botond Dénes	ac44efea11	readers: compacting_reader: use compact_mutation_state::abandon_current_partition() When next_partition() or fast_forward_to() is called. Instead of trying to simulate a properly closed partition by injecting synthetic mutation fragments to properly close it.	2023-07-27 02:50:44 -04:00
Botond Dénes	326c3b92e5	mutation/mutation_compactor: allow user to abandon current partition Currently, the compactor requires a valid stream and thus abandoning a partition in the middle was not possible. This causes some complications for the compacting reader, which implements methods such as `next_partition()` which is possibly called in the middle of a partition. In this case the compacting reader attempts to close the partition properly by inserting a synthetic partition-end fragment into the stream. This is not enough however as it doesn't close any range tombstone changes that might be active. Instead of piling on more complexity, add an API to the compactor which allows abandoning the current partition.	2023-07-27 02:50:44 -04:00
Kefu Chai	1b7bde2e9e	compaction_manager: use range in compacting_sstable_registration simpler than the "begin, end" iterator pair. and also tighten the type constraints, now require the value type to be sstables::shared_sstable. this matches what we are expecting in the implementation. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #14678	2023-07-27 09:40:20 +03:00
Pavel Emelyanov	e9218e6873	system_keyspace: Don't update schema version in .setup() The db.get_version() called that early returns value that database got construction-time, i.e. -- empty_version thing. It makes little sense committing it into the system k.s. all the more so the "real" version is calculated and updated few steps after .setup(). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #14833	2023-07-27 09:38:57 +03:00
Pavel Emelyanov	c017117340	system_keyspace: Remove qctx usage from load_topology_state() Fortunately, this is pretty simple -- the only caller is storage_service that has sharded<system_keyspace> dependency reference Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #14824	2023-07-27 08:56:40 +03:00
Raphael S. Carvalho	050ce9ef1d	cached_file: Evict unused pages that aren't linked to LRU yet It was found that cached_file dtor can hit the following assert after OOM: cached_file_test: utils/cached_file.hh:379: cached_file::~cached_file(): Assertion `_cache.empty()' failed. cached_file's dtor iterates through all entries and evicts those that are linked to LRU, under the assumption that all unused entries were linked to LRU. That's partially correct. get_page_ptr() may fetch more than 1 page due to read ahead, but it will only call cached_page::share() on the first page, the one that will be consumed now. share() is responsible for automatically placing the page into LRU once refcount drops to zero. If the read is aborted midway, before cached_file has a chance to hit the 2nd page (read ahead) in cache, it will remain there with refcount 0 and unlinked to LRU, in hope that a subsequent read will bring it out of that state. Our main user of cached_file is per-sstable index caching. If the scenario above happens, and the sstable and its associated cached_file is destroyed, before the 2nd page is hit, cached_file will not be able to clear all the cache because some of the pages are unused and not linked. A page read ahead will be linked into LRU so it doesn't sit in memory indefinitely. Also allowing for cached_file dtor to clear all cache if some of those pages brought in advance aren't fetched later. A reproducer was added. Fixes #14814. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes #14818	2023-07-27 00:01:46 +02:00
Anna Stuchlik	3ed6754afc	doc: update info about cassandra superuser Fixes https://github.com/scylladb/scylla-docs/issues/4028 The goal of this update is to discourage the use of the default cassandra superuser in favor of a custom super user - and explain why it's a good practice. The scope of this commit: - Adding a new page on creating a custom superuser. The page collects and clarifies the information about the cassandra superuser from other pages. - Remove the (incomplete) information about superuser from the Authorization and Authentication pages, and add the link to the new page instead. Additionally, this update will result in better searchability and ensures language clarity. Closes #14829	2023-07-26 23:15:31 +03:00
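
The two timestamp string formats described in commit ff721ec3e3 ("make timestamp string format cassandra compatible") can be illustrated with a minimal Python sketch. This is not ScyllaDB's implementation (which is C++); the function name `format_timestamp` and its `json_style` parameter are hypothetical and exist only for this illustration. It assumes a timezone-aware input and always emits milliseconds and a `Z` suffix, as the commit message requires.

```python
from datetime import datetime, timezone


def format_timestamp(ts: datetime, json_style: bool = False) -> str:
    """Render a timezone-aware datetime in one of the two formats
    named in the commit message (hypothetical helper, illustration only)."""
    ts = ts.astimezone(timezone.utc)       # normalize to UTC so the 'Z' suffix is valid
    sep = " " if json_style else "T"       # JSON form uses a space as the time separator
    millis = ts.microsecond // 1000        # always emit exactly three millisecond digits
    return f"{ts.strftime('%Y-%m-%d')}{sep}{ts.strftime('%H:%M:%S')}.{millis:03d}Z"


# The sample instant used in the commit message.
example = datetime(2017, 12, 27, 11, 57, 42, 500000, tzinfo=timezone.utc)
print(format_timestamp(example))                   # 2017-12-27T11:57:42.500Z
print(format_timestamp(example, json_style=True))  # 2017-12-27 11:57:42.500Z
```

The only difference between the two outputs is the separator between date and time; both carry milliseconds and the timezone marker.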


38149 Commits