scylla

Author	SHA1	Message	Date
Petr Gusev	5a1418fdba	token_metadata: get_endpoint_for_host_id -> get_endpoint_for_host_id_if_known This commit fixes an inconsistency in method names: get_host_id and get_host_id_if_known are (internal_error, returns null), but there was only one method for the opposite conversion - get_endpoint_for_host_id, and it returns null. In this commit we change it to on_internal_error if it can't find the argument and add another method get_endpoint_for_host_id_if_known which returns null in this case. We can't use get_endpoint_for_host_id/get_host_id in host_id_or_endpoint::resolve since it's called from storage_service::parse_node_list -> token_metadata::parse_host_id_and_endpoint, and exceptions are caught and handled in `storage_service::parse_node_list`.	2023-12-11 12:51:34 +04:00
Petr Gusev	63f64f3303	token_metadata: make it a template with NodeId=inet_address/host_id NodeId is used in all internal token_metadata data structures, that previously used inet_address. We choose topology::key_kind based on the value of the template parameter. generic_token_metadata::update_topology overload with host_id parameter is added to make update_topology_change_info work, it now uses NodeId as a parameter type. topology::remove_endpoint(host_id) is added to make generic_token_metadata::remove_endpoint(NodeId) work. pending_endpoints_for and endpoints_for_reading are just removed - they are not used and not implemented. The declarations were left by mistake from a refactoring in which these methods were moved to erm. generic_token_metadata_base is extracted to contain declarations, common to both token_metadata versions. Templates are explicitly instantiated inside token_metadata.cc, since implementation part is also a template and it's not exposed to the header. There are no other behavioral changes in this commit, just syntax fixes to make token_metadata a template.	2023-12-11 12:51:34 +04:00
Nadav Har'El	300e549267	tablets, mv: disable self-pairing when tablets are used A write to a base table can generate one or more writes to a materialized view. The write to RF base replicas need to cause writes to RF view replicas. Our MV implementation, based on Cassandra's implementation, does this via "pairing": Each one of the base replicas involved in this write sends each view update to exactly one view replica. The function get_view_natural_endpoint() tells a base replica which of the view replicas it should send the update to. The standard pairing is based on the ring order: The first owner of the base token sends to the first owner of the view token, the second to the second, and so on. However, the existing code also uses an optimization we call self-pairing: If a single node is both a base replica and a base replica, the pairing is modified so this node sends the update to itself. This patch disables the self-pairing optimization in keyspaces that use tablets: The self-pairing optimization can cause the pairing to change after token ranges are moved between nodes, so it can break base-view consistency in some edge cases, leading to "ghost rows". With tablets, these range movements become even more frequent - they can happen even if the cluster doesn't grow. This is why we want to solve this problem for tablets. For backward compatibility and to avoid sudden inconsistencies emerging during upgrades, we decided to continue using the self-pairing optimization for keyspaces that are not using tablets (i.e., using vnoodes). Currently, we don't introduce a "CREATE MATERIALIZED VIEW" option to override these defaults - i.e., we don't provide a way to disable self-pairing with vnodes or to enable them with tablets. We could introduce such a schema flag later, if we ever want to (and I'm not sure we want to). It's important to note, that in some cases, this change has implications on when view updates become synchronous, in the tablets case. For example: * If we have 3 nodes and RF=3, with the self-pairing optimization each node is paired with itself, the view update is local, and is implicitly synchronous (without requiring a "synchronous_updates" flag). * In the same setup with tablets, without the self-pairing optimization (due to this patch), this is not guaranteed. Some view updates may not be synchronous, i.e., the base write will not wait for the view write. If the user really wants synchronous updates, they should be requested explicitly, with the "synchronous_updates" view option. Fixes #16260. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#16272	2023-12-06 17:11:17 +02:00
Benny Halevy	63b556123b	db/view: use locator::topology rather than fb_utilities Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-12-05 08:55:46 +02:00
Nadav Har'El	4505a86f46	tablets, mv: fix base-view pairing to consider base replication map In the view update code, the function get_view_natural_endpoint() determines which view replica this base replica should send an update to. It currently gets the view table's replication map (i.e., the map from view tokens to lists of replicas holding the token), but assumes that this is also the base table's replication map. This assumption was true with vnodes, but is no longer true with tablets - the base table's replication map can be completely different from the view table's. By looking at the wrong mapping, get_view_natural_endpoint() can believe that this node isn't really a base-replica and drop the view update. Alternatively, it can think it is a base replica - but use the wrong base-view pairing and create base-view inconsistencies. This patch solves this bug - get_view_natural_endpoint() now gets two separate replication maps - the base's and the view's. The callers need to remember what the base table was (in some cases they didn't care at the point of the call), and pass it to the function call. This patch also includes a simple test that reproduces the bug, and confirms it is fixed: The test has a 6-node cluster using tablets and a base table with RF=1, and writes one row to it. Before this patch, the code usually gets confused, thinking the base replica isn't a replica and loses the view update. With this patch, the view update works. Fixes #16227. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#16228	2023-12-04 16:38:54 +02:00
Yaniv Kaul	2b73793a39	Update db/view/view.cc	2023-12-03 10:07:45 +02:00
Yaniv Kaul	c658bdb150	Typos: fix typos in comments Fixes some typos as found by codespell run on the code. In this commit, I was hoping to fix only comments, not user-visible alerts, output, etc. Follow-up commits will take care of them. Refs: https://github.com/scylladb/scylladb/issues/16255 Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>	2023-12-02 22:37:22 +02:00
Nadav Har'El	62f89d49e5	tablets, mv: fix on_internal_error on write to base table This situation before this patch is that when tablets are enabled for a keyspace, we can create a materialized view but later any write to the base table fails with an on_internal_error(), saying that: "Tried to obtain per-keyspace effective replication map of test but it's per-table." Indeed, with tablets, the replication is different for each table - it's not the same for the entire keyspace. So this patch changes the view update code to take the replication map from the specific base table, not the keyspace. This is good enough to get materialized-views reads and writes working in a simple single-node case, as the included test demonstrates (the test fails with on_internal_error() before this patch, and passes afterwards). But this fix is not perfect - the base-view pairing code really needs to consider not only the base table's replication map, but also the view table's replication map - as those can be different. We'll fix this remaining problem as a followup in a separate patch - it will require a substantially more elaborate test to reproduce the need for the different mapping and to verify that fix. Fixes #16209. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#16211	2023-11-29 15:29:17 +01:00
Pavel Emelyanov	3471f30b58	view_update_generator: Unplug from database later Patch `967ebacaa4` (view_update_generator: Move abort kicking to do_abort()) moved unplugging v.u.g from database from .stop() to .do_abort(). The latter call happens very early on stop -- once scylla receives SIGINT. However, database may still need v.u.g. plugged to flush views. This patch moves unplug to later, namely to .stop() method of v.u.g. which happens after database is drained and should no longer continue view updates. fixes: #16001 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#16091	2023-11-20 11:47:55 +02:00
Benny Halevy	a1acf6854b	everywhere: reduce dependencies on i_partitioner.hh Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-11-05 20:47:44 +02:00
Avi Kivity	ef7db6df99	Merge 'schema_tables: turn view schema fixing code into a sanity check' from Kamil Braun The purpose of `maybe_fix_legacy_secondary_index_mv_schema` was to deal with legacy materialized view schemas used for secondary indexes, schemas which were created before the notion of "computed columns" was introduced. Back then, secondary index schemas would use a regular "token" column. Later it became a computed column and old schemas would be migrated during rolling upgrade. The migration code was introduced in 2019 (`db8d4a0cc6`) and then fixed in 2020 (`d473bc9b06`). The fix was present in Enterprise 2022.1 and in OSS 4.5. So, assuming that users don't try crazy things like upgrading from 2021.X to 2023.X (which we do not support), all clusters will have already executed the migration code once they upgrade to 2023.X, meaning we can get rid of it. The main motivation of this PR is to get rid of the `db::schema_tables::merge_schema` call in `parse_schema_tables`. In Raft mode this was the only call to `merge_schema` outside "group 0 code" and in fact it is unsafe -- it uses locally generated mutations with locally generated timestamp (`api::new_timestamp()`), so if we actually did it, we would permanently diverge the group 0 state machine across nodes (the schema pulling code is disabled in Raft mode). Fortunately, this should be dead code by now, as explained in the previous paragraph. The migration code is now turned into a sanity check, if the users try something crazy, they will get an error instead of silent data corruption. Closes scylladb/scylladb#15695 * github.com:scylladb/scylladb: view: remove unused `_backing_secondary_index` schema_tables: turn view schema fixing code into a sanity check schema_tables: make comment more precise feature_service: make COMPUTED_COLUMNS feature unconditionally true	2023-10-31 13:23:19 +02:00
Marcin Maliszkiewicz	020a9c931b	db: view: run local materialized view mutations on a separate smp service group When base write triggers mv write and it needs to be send to another shard it used the same service group and we could end up with a deadlock. This fix affects also alternator's secondary indexes. Testing was done using (yet) not committed framework for easy alternator performance testing: https://github.com/scylladb/scylladb/pull/13121. I've changed hardcoded max_nonlocal_requests config in scylla from 5000 to 500 and then ran: ./build/release/scylla perf-alternator-workloads --workdir /tmp/scylla-workdir/ --smp 2 \ --developer-mode 1 --alternator-port 8000 --alternator-write-isolation forbid --workload write_gsi \ --duration 60 --ring-delay-ms 0 --skip-wait-for-gossip-to-settle 0 --continue-after-error true --concurrency 2000 Without the patch when scylla is overloaded (i.e. number of scheduled futures being close to max_nonlocal_requests) after couple seconds scylla hangs, cpu usage drops to zero, no progress is made. We can confirm we're hitting this issue by seeing under gdb: p seastar::get_smp_service_groups_semaphore(2,0)._count $1 = 0 With the patch I wasn't able to observe the problem, even with 2x concurrency. I was able to make the process hang with 10x concurrency but I think it's hitting different limit as there wasn't any depleted smp service group semaphore and it was happening also on non mv loads. Fixes https://github.com/scylladb/scylladb/issues/15844 Closes scylladb/scylladb#15845	2023-10-29 18:30:32 +02:00
Kamil Braun	db49ccccb0	view: remove unused `_backing_secondary_index` This boolean was only used for a sanity check which was replaced with a stronger sanity check in the previous commit that doesn't require the boolean.	2023-10-24 13:33:36 +02:00
Kamil Braun	3976808b12	schema_tables: turn view schema fixing code into a sanity check The purpose of `maybe_fix_legacy_secondary_index_mv_schema` was to deal with legacy materialized view schemas used for secondary indexes, schemas which were created before the notion of "computed columns" was introduced. Back then, secondary index schemas would use a regular "token" column. Later it became a computed column and old schemas would be migrated during rolling upgrade. The migration code was introduced in 2019 (`db8d4a0cc6`) and then fixed in 2020 (`d473bc9b06`). The fix was present in Enterprise 2022.1 and in OSS 4.5. So, assuming that users don't try crazy things like upgrading from 2021.X to 2023.X (which we do not support), all clusters will have already executed the migration code once they upgrade to 2023.X, meaning we can get rid of it. The main motivation of this patch is to get rid of the `db::schema_tables::merge_schema` call in `parse_schema_tables`. In Raft mode this was the only call to `merge_schema` outside "group 0 code" and in fact it is unsafe -- it uses locally generated mutations with locally generated timestamp (`api::new_timestamp()`), so if we actually did it, we would permanently diverge the group 0 state machine across nodes (the schema pulling code is disabled in Raft mode). Fortunately, this should be dead code by now, as explained in the previous paragraph. The migration code is now turned into a sanity check, if the users try something crazy, they will get an error instead of silent data corruption.	2023-10-24 13:33:35 +02:00
Jan Ciolek	940e44f887	db/view: change log level of failed view updates to WARN When a remote view update doesn't succeed there's a log message saying "Error applying view update...". This message had log level ERROR, but it's not really a hard error. View updates can fail for a multitude of reasons, even during normal operation. A failing view update isn't fatal, it will be saved as a view hint a retried later. Let's change the log level to WARN. It's something that shouldn't happen too much, but it's not a disaster either. ERROR log level causes trouble in tests which assume that an ERROR level message means that the test has failed. Refs: https://github.com/scylladb/scylladb/issues/15046#issuecomment-1712748784 For local view updates the log level stays at "ERROR", local view updates shouldn't fail. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com> Closes scylladb/scylladb#15640	2023-10-11 18:19:23 +03:00
Pavel Emelyanov	becd960ae8	view_update_generator: Add logging to do_abort() Just tell the logs that the guy is aborting refs: #10941 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-09-21 13:34:21 +03:00
Pavel Emelyanov	967ebacaa4	view_update_generator: Move abort kicking to do_abort() When v.u.g. stops is first aborts the generation background fiber by requesting abort on the internal abort source and signalling the fiber in case it's waiting. Right now v.u.g.::stop() is defer-scheduled last in main(), so this move doesn't change much -- when stop_signal fires, it will kick the v.u.g.::do_abort() just a bit earlier, there's nothing that would happen after it before real ::stop() is called that depends on it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-09-21 13:32:45 +03:00
Pavel Emelyanov	e34220ebb7	view_update_generator: Add early abort subscription Subscribe v.u.g. to the main's stop_signal. For now a no-op callback. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-09-21 13:32:45 +03:00
Benny Halevy	2c54d7a35a	view, storage_proxy: carry effective_replication_map along with endpoints When sending mutation to remote endpoint, the selected endpoints must be in sync with the current effective_replication_map. Currently, the endpoints are sent down the storage_proxy stack, and later on an effective_replication_map is retrieved again, and it might not match the target or pending endpoints, similar to the case seen in https://github.com/scylladb/scylladb/issues/15138 The correct way is to carry the same effective replication map used to select said endpoints and pass it down the stack. See also https://github.com/scylladb/scylladb/pull/15141 Fixes scylladb/scylladb#15144 Fixes scylladb/scylladb#14730 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #15142	2023-08-29 09:08:42 +03:00
Benny Halevy	8fbcf1ab9f	view: start: ignore also abort_requested_exception We see the abort_requested_exception error from time to time, instead of sleep_aborted that was expected and quietly ignored (in debug log level). Treat abort_requested_exception the same way since the error is expected on shutdown and to reduce test flakiness, as seen for example, in https://jenkins.scylladb.com/job/scylla-master/job/scylla-ci/3033/artifact/logs-full.release.010/1691896356104_repair_additional_test.py%3A%3ATestRepairAdditional%3A%3Atest_repair_schema/node2.log ``` INFO 2023-08-13 03:12:29,151 [shard 0] compaction_manager - Asked to stop WARN 2023-08-13 03:12:29,152 [shard 0] gossip - failure_detector_loop: Got error in the loop, live_nodes={}: seastar::sleep_aborted (Sleep is aborted) INFO 2023-08-13 03:12:29,152 [shard 0] gossip - failure_detector_loop: Finished main loop WARN 2023-08-13 03:12:29,152 [shard 0] cdc - Aborted update CDC description table with generation (2023/08/13 03:12:17, d74aad4b-6d30-4f22-947b-282a6e7c9892) INFO 2023-08-13 03:12:29,152 [shard 1] compaction_manager - Asked to stop INFO 2023-08-13 03:12:29,152 [shard 1] compaction_manager - Stopped INFO 2023-08-13 03:12:29,153 [shard 0] init - Signal received; shutting down INFO 2023-08-13 03:12:29,153 [shard 0] init - Shutting down view builder ops INFO 2023-08-13 03:12:29,153 [shard 0] view - Draining view builder INFO 2023-08-13 03:12:29,153 [shard 1] view - Draining view builder INFO 2023-08-13 03:12:29,153 [shard 0] compaction_manager - Stopped ERROR 2023-08-13 03:12:29,153 [shard 0] view - start failed: seastar::abort_requested_exception (abort requested) ERROR 2023-08-13 03:12:29,153 [shard 1] view - start failed: seastar::abort_requested_exception (abort requested) ``` Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #15029	2023-08-13 18:39:09 +03:00
Botond Dénes	4a02865ea1	Merge 'Prevent invalidation of iterators over database::_column_families' from Aleksandra Martyniuk Maps related to column families in database are extracted to a column_families_data class. Access to them is possible only through methods. All methods which may preempt hold rwlock in relevant mode, so that the iterators can't become invalid. Fixes: #13290 Closes #13349 * github.com:scylladb/scylladb: replica: make tables_metadata's attributes private replica: add methods to get a filtered copy of tables map replica: add methods to check if given table exists replica: add methods to get table or table id replica: api: return table_id instead of const table_id& replica: iterate safely over tables related maps replica: pass tables_metadata to phased_barrier_top_10_counts replica: add methods to safely add and remove table replica: wrap column families related maps into tables_metadata replica: futurize database::add_column_family and database::remove	2023-07-31 15:31:59 +03:00
Botond Dénes	d66b07823b	db/view/view_updating_consumer: account for the size of mutations All partitions will have a corresponding mutation object in the buffer. These objects have non-negligible sizes, yet the consumer did not bump the _buffer_size when a new partition was consumer. This resulted in empty partitions not moving the _buffer_size at all, and thus they could accumulate without bounds in the buffer, never triggering a flush just by themselves. We have recently seen this causing OOM. This patch fixes that by bumping the _buffer_size with the size of the freshly created mutation object.	2023-07-26 03:07:25 -04:00
Aleksandra Martyniuk	cdbfa0b2f5	replica: iterate safely over tables related maps Loops over _column_families and _ks_cf_to_uuid which may preempt are protected by reader mode of rwlock so that iterators won't get invalid.	2023-07-25 17:13:04 +02:00
Aleksandra Martyniuk	52afd9d42d	replica: wrap column families related maps into tables_metadata As a preparation for ensuring access safety for column families related maps, add tables_metadata, access to members of which would be protected by rwlock.	2023-07-25 16:13:00 +02:00
Asias He	c29e7e4644	Revert "Revert "view_update_generator: Increase the registration_queue_size"" This reverts commit `4cee8206f8`. The test is fixed. Closes #14750	2023-07-19 11:46:28 +03:00
Botond Dénes	21ff6efd74	test/boost/view_build_test: improve test_view_update_generator_register_semaphore_unit_leak By making it independent of the number of units the view update generator's registration semaphore is created with. We want to increase this number significantly and that would destabilize this test significantly. To prevent this, detach the test from the number of units completely, while stil preserving the original intent behind it, as best as it could be determined. Closes #14727	2023-07-18 09:18:28 +03:00
Botond Dénes	4cee8206f8	Revert "view_update_generator: Increase the registration_queue_size" This reverts commit `d3034e0fab`. The test modified by this commit (view_build_test.test_view_update_generator_register_semaphore_unit_leak) often fails, breaking build jobs.	2023-07-13 16:48:50 +03:00
Asias He	d3034e0fab	view_update_generator: Increase the registration_queue_size When repair writes a sstable to disk, we check if the sstable needs view update processing. If yes, the sstable will be placed into the staging dir for processing, with the _registration_sem semaphore to prevent too many pending unprocessed sstables. We have seen multiple cases in the field where view update processing is inefficient and way too slow which blocks the base table repair to finish on time. This patch increases the registration_queue_size to a bigger number to mitigate the problem that slow view update processing blocks repair. It is better to have a consistent base table + inconsistent view table than inconsistent base table + inconsistent view table. Currently, sstables in staging dir are not compacted. So we could not increase the _registration_sem with too big number to avoid accumulate too many sstables. The view_build_test.cc is updated to make the test pass. Closes #14241	2023-07-12 15:51:35 +03:00
Michał Chojnowski	ac29b6f198	view_updating_consumer: make buffer limit a variable The limit doesn't change at runtime, but we this patch makes it variable for unit testing purposes.	2023-07-05 17:33:47 +02:00
Michał Chojnowski	5ad0846bff	view: fix range tombstone handling on flushes in view_updating_consumer View update routines accept `mutation` objects. But what comes out of staging sstable readers is a stream of mutation_fragment_v2 objects. To build view updates after a repair/streaming, we have to convert the fragment stream into `mutation`s. This is done by piping the stream to mutation_rebuilder_v2. To keep memory usage limited, the stream for a single partition might have to be split into multiple partial `mutation` objects. view_update_consumer does that, but in improper way -- when the split/flush happens inside an active range tombstone, the range tombstone isn't closed properly. This is illegal, and triggers an internal error. This patch fixes the problem by closing the active range tombstone (and reopening in the same position in the next `mutation` object). The tombstone is closed just after the last seen clustered position. This is not necessary for correctness -- for example we could delay all processing of the range tombstone until we see its end bound -- but it seems like the most natural semantic. Fixes #14503	2023-07-04 20:33:21 +02:00
Raphael S. Carvalho	1ff8645eaa	view_update_generator: Dump throughput and duration for view update from staging Very helpful for user to understand how fast view update generation is processing the staging sstables. Today, logs are completely silent on that. It's not uncommon for operators to peek into staging dir and deduce the throughput based on removal of files, which is terrible. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-06-26 21:58:23 -03:00
Avi Kivity	b858a4669d	cql3: expr: break up expression.hh header Adding a function declaration to expression.hh causes many recompilations. Reduce that by: - moving some restrictions-related definitions to the existing expr/restrictions.hh - moving evaluation related names to a new header expr/evaluate.hh - move utilities to a new header expr/expr-utilities.hh expression.hh contains only expression definitions and the most basic and common helpers, like printing.	2023-06-22 14:21:03 +03:00
Avi Kivity	32b27d6a08	cql3: expr: change evaluation_input vector components to take spans Spans are slightly cleaner, slightly faster (as they avoid an indirection), and allow for replacing some of the arguments with small_vector:s. Closes #14313	2023-06-22 11:28:01 +02:00
Pavel Emelyanov	c68c154fb6	code: Reduce tracing/hh fanout There are some headers that include tracing/.hh ones despite all they need is forward-declared trace_state_ptr Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #14155	2023-06-07 19:19:22 +03:00
Pavel Emelyanov	66e43912d6	code: Switch to seastar API level 7 In that level no io_priority_class-es exist. Instead, all the IO happens in the context of current sched-group. File API no longer accepts prio class argument (and makes io_intent arg mandatory to impls). So the change consists of - removing all usage of io_priority_class - patching file_impl's inheritants to updated API - priority manager goes away altogether - IO bandwidth update is performed on respective sched group - tune-up scylla-gdb.py io_queues command The first change is huge and was made semi-autimatically by: - grep io_priority_class \| default_priority_class - remove all calls, found methods' args and class' fields Patching file_impl-s is smaller, but also mechanical: - replace io_priority_class& argument with io_intent* one - pass intent to lower file (if applicatble) Dropping the priority manager is: - git-rm .cc and .hh - sed out all the #include-s - fix configure.py and cmakefile The scylla-gdb.py update is a bit hairry -- it needs to use task queues list for IO classes names and shares, but to detect it should it checks for the "commitlog" group is present. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #13963	2023-06-06 13:29:16 +03:00
Avi Kivity	ffce6d94fc	Merge 'service: storage_proxy: make hint write handlers cancellable' from Kamil Braun The `view_update_write_response_handler` class, which is a subclass of `abstract_write_response_handler`, was created for a single purpose: to make it possible to cancel a handler for a view update write, which means we stop waiting for a response to the write, timing out the handler immediately. This was done to solve issue with node shutdown hanging because it was waiting for a view update to finish; view updates were configured with 5 minute timeout. See #3966, #4028. Now we're having a similar problem with hint updates causing shutdown to hang in tests (#8079). `view_update_write_response_handler` implements cancelling by adding itself to an intrusive list which we then iterate over to timeout each handler when we shutdown or when gossiper notifies `storage_proxy` that a node is down. To make it possible to reuse this algorithm for other handlers, move the functionality into `abstract_write_response_handler`. We inherit from `bi::list_base_hook` so it introduces small memory overhead to each write handler (2 pointers) which was only present for view update handlers before. But those handlers are already quite large, the overhead is small compared to their size. Use this new functionality to also cancel hint write handlers when we shutdown. This fixes #8079. Closes #14047 * github.com:scylladb/scylladb: test: reproducer for hints manager shutdown hang test: pylib: ScyllaCluster: generalize config type for `server_add` test: pylib: scylla_cluster: add explicit timeout for graceful server stop service: storage_proxy: make hint write handlers cancellable service: storage_proxy: rename `view_update_handlers_list` service: storage_proxy: make it possible to cancel all write handler types	2023-05-30 01:36:50 +03:00
Kamil Braun	0ef35ceed4	service: storage_proxy: make hint write handlers cancellable Whether a write handler should be cancellable is now controlled by a parameter passed to `create_write_response_handler`. We plumb it down from `send_to_endpoint` which is called by hints manager. This will cause hint write handlers to immediately timeout when we shutdown or when a destination node is marked as dead. Fixes #8079	2023-05-29 11:03:18 +02:00
Pavel Emelyanov	5861d15912	Merge 'Small gossiper and migration_manager cleanups' from Gleb Some assorted cleanups here: consolidation of schema agreement waiting into a single place and removing unused code from the gossiper. CI: https://jenkins.scylladb.com/job/scylla-master/job/scylla-ci/1458/ Reviewed-by: Konstantin Osipov <kostja@scylladb.com> * gleb/gossiper-cleanups of github.com:scylladb/scylla-dev: storage_service: avoid unneeded copies in on_change storage_service: remove check that is always true storage_service: rename handle_state_removing to handle_state_removed storage_service: avoid string copy storage_service: delete code that handled REMOVING_TOKENS state gossiper: remove code related to advertising REMOVING_TOKEN state migration_manager: add wait_for_schema_agreement() function	2023-05-27 10:49:54 +03:00
Gleb Natapov	a429018a8a	migration_manager: add wait_for_schema_agreement() function Several subsystems re-implement the same logic for waiting for schema agreement. Provide the function in the migration_manager and use it instead.	2023-05-25 14:44:53 +03:00
Tomasz Grabiec	51e3b9321b	Merge ' mvcc: make schema upgrades gentle' from Michał Chojnowski After a schema change, memtable and cache have to be upgraded to the new schema. Currently, they are upgraded (on the first access after a schema change) atomically, i.e. all rows of the entry are upgraded with one non-preemptible call. This is a one of the last vestiges of the times when partition were treated atomically, and it is a well known source of numerous large stalls. This series makes schema upgrades gentle (preemptible). This is done by co-opting the existing MVCC machinery. Before the series, all partition_versions in the partition_entry chain have the same schema, and an entry upgrade replaces the entire chain with a single squashed and upgraded version. After the series, each partition_version has its own schema. A partition entry upgrade happens simply by adding an empty version with the new schema to the head of the chain. Row entries are upgraded to the current schema on-the-fly by the cursor during reads, and by the MVCC version merge ongoing in the background after the upgrade. The series: 1. Does some code cleanup in the mutation_partition area. 2. Adds a schema field to partition_version and removes it from its containers (partition_snapshot, cache_entry, memtable_entry). 3. Adds upgrading variants of constructors and apply() for `row` and its wrappers. 4. Prepares partition_snapshot_row_cursor, mutation_partition_v2::apply_monotonically and partition_snapshot::merge_partition_versions for dealing with heterogeneous version chains. 5. Modifies partition_entry::upgrade to perform upgrades by extending the version chain with a new schema instead of squashing it to a single upgraded version. Fixes #2577 Closes #13761 * github.com:scylladb/scylladb: test: mvcc_test: add a test for gentle schema upgrades partition_version: make partition_entry::upgrade() gentle partition_version: handle multi-schema snapshots in merge_partition_versions mutation_partition_v2: handle schema upgrades in apply_monotonically() partition_version: remove the unused "from" argument in partition_entry::upgrade() row_cache_test: prepare test_eviction_after_schema_change for gentle schema upgrades partition_version: handle multi-schema entries in partition_entry::squashed partition_snapshot_row_cursor: handle multi-schema snapshots partiton_version: prepare partition_snapshot::squashed() for multi-schema snapshots partition_version: prepare partition_snapshot::static_row() for multi-schema snapshots partition_version: add a logalloc::region argument to partition_entry::upgrade() memtable: propagate the region to memtable_entry::upgrade_schema() mutation_partition: add an upgrading variant of lazy_row::apply() mutation_partition: add an upgrading variant of rows_entry::rows_entry mutation_partition: switch an apply() call to apply_monotonically() mutation_partition: add an upgrading variant of rows_entry::apply_monotonically() mutation_fragment: add an upgrading variant of clustering_row::apply() mutation_partition: add an upgrading variant of row::row partition_version: remove _schema from partition_entry::operator<< partition_version: remove the schema argument from partition_entry::read() memtable: remove _schema from memtable_entry row_cache: remove _schema from cache_entry partition_version: remove the _schema field from partition_snapshot partition_version: add a _schema field to partition_version mutation_partition: change schema_ptr to schema& in mutation_partition::difference mutation_partition: change schema_ptr to schema& in mutation_partition constructor mutation_partition_v2: change schema_ptr to schema& in mutation_partition_v2 constructor mutation_partition: add upgrading variants of row::apply() partition_version: update the comment to apply_to_incomplete() mutation_partition_v2: clean up variants of apply() mutation_partition: remove apply_weak() mutation_partition_v2: remove a misleading comment in apply_monotonically() row_cache_test: add schema changes to test_concurrent_reads_and_eviction mutation_partition: fix mixed-schema apply()	2023-05-24 22:58:43 +02:00
Petr Gusev	87307781c4	effective_replication_map: use new get_pending_endpoints and get_endpoints_for_reading We already use the new pending_endpoints from erm though the get_pending_ranges virtual function, in this commit we update all the remaining places to use the new implementation in erm, as well as remove the old implementation in token_metadata.	2023-05-21 13:17:42 +04:00
Michał Chojnowski	a70c5704df	mutation_partition: change schema_ptr to schema& in mutation_partition constructor Cosmetic change. See the preceding commit for details.	2023-05-04 02:37:29 +02:00
Pavel Emelyanov	5e201b9120	database: Remove compaction_manager.hh inclusion into database.hh The only reason why it's there (right next to compaction_fwd.hh) is because the database::table_truncate_state subclass needs the definition of compaction_manager::compaction_reenabler subclass. However, the former sub is not used outside of database.cc and can be defined in .cc. Keeping it outside of the header allows dropping the compaction_manager.hh from database.hh thus greatly reducing its fanout over the code (from ~180 indirect inclusions down to ~20). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #13622	2023-04-23 16:27:11 +03:00
Avi Kivity	1cd6d59578	Merge 'Remove global proxy usage from view_info::select_statement()' from Pavel Emelyanov The method needs proxy to get data_dictionary::database from to pass down to select_statement::prepare(). And a legacy bit that can come with data_dictionary::database as well. Fortunately, all the call traces that end up at select_statement() start inside table:: methods that have view_update_generator, or at view_builder::consumer that has reference to view_builder. Both services can share the database reference. However, the call traces in question pass through several code layers, so the PR adds data_dictionary::database to those layers one by one. Closes #13591 * github.com:scylladb/scylladb: view_info: Drop calls to get_local_storage_proxy() view_info: Add data_dictionary argument to select_statement() view_info: Add data_dictionary argument to partition_slice() method view_filter_checking_visitor: Construct with data_dictionary view: Carry data_dictionary arg through standalone helpers view_updates: Carry data_dictionary argument throug methods view_update_builder: Construct with data dictionary table: Push view_update_generator arg to affected_views() view: Add database getters to v._update_generator and v._builder	2023-04-20 16:40:06 +03:00
Pavel Emelyanov	bda2aea5be	view: Get topology via database tokens The view_builder::view_build_statuses() needs topology to walk its nodes. Now it gets one from global proxy via its token metadata, but database also has tokens and view_builder has reference to database. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-20 13:18:14 +03:00
Pavel Emelyanov	403463d7eb	view: Indentation fix after previous patch Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-20 13:18:14 +03:00
Pavel Emelyanov	257814f443	view: Coroutinuze view_builder::view_build_statuses() Easier to patch it this way further. Indentation is deliberately left broken until next patch. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-20 13:17:59 +03:00
Pavel Emelyanov	edcce7d8dd	view_info: Drop calls to get_local_storage_proxy() In both cases the proxy is called to get data_dictionary from. Now its available as the call argument. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-20 11:17:46 +03:00
Pavel Emelyanov	3e4fb7cad6	view_info: Add data_dictionary argument to select_statement() This method needs data_dictionary to work. Fortunately, all callers of it already have the dictionary at hand and can just pass it as argument. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-20 11:17:46 +03:00
Pavel Emelyanov	4375835cdd	view_info: Add data_dictionary argument to partition_slice() method The caller is calculate_affected_clustering_ranges() with dictionary arg, the method needs dictionary to call view_info::select_statement() later. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-20 11:17:46 +03:00

1 2 3 4 5 ...

488 Commits