scylla

Author	SHA1	Message	Date
Benny Halevy	2c54d7a35a	view, storage_proxy: carry effective_replication_map along with endpoints When sending a mutation to a remote endpoint, the selected endpoints must be in sync with the current effective_replication_map. Currently, the endpoints are sent down the storage_proxy stack, and later on an effective_replication_map is retrieved again, which might not match the target or pending endpoints, similar to the case seen in https://github.com/scylladb/scylladb/issues/15138 The correct way is to carry the same effective replication map used to select said endpoints and pass it down the stack. See also https://github.com/scylladb/scylladb/pull/15141 Fixes scylladb/scylladb#15144 Fixes scylladb/scylladb#14730 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #15142	2023-08-29 09:08:42 +03:00
Botond Dénes	47ce69e9bf	Merge 'paxos_response_handler: carry effective replication map' from Benny Halevy `create_write_response_handler` on this path accepts an `inet_address_vector_replica_set` that corresponds to the effective_replication_map_ptr in the paxos_response_handler, but currently the function retrieves a new effective_replication_map_ptr that may not hold all of those endpoints. Fixes scylladb/scylladb#15138 Closes #15141 * github.com:scylladb/scylladb: storage_proxy: create_write_response_handler: carry effective_replication_map_ptr from paxos_response_handler storage_proxy: send_to_live_endpoints: throw on_internal_error if node not found	2023-08-28 11:42:38 +03:00
Benny Halevy	4a2e367e92	storage_proxy: create_write_response_handler: carry effective_replication_map_ptr from paxos_response_handler `create_write_response_handler` on this path accepts an `inet_address_vector_replica_set` that corresponds to the effective_replication_map_ptr in the paxos_response_handler, but currently the function retrieves a new effective_replication_map_ptr that may not hold all of those endpoints. Fixes scylladb/scylladb#15138 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-08-24 11:45:13 +03:00
Benny Halevy	6af0b281a6	storage_proxy: mutate_atomically_result: carry effective_replication_map down to create_write_response_handler The effective_replication_map_ptr passed to `create_write_response_handler` by `send_batchlog_mutation` must be synchronized with the one used to calculate _batchlog_endpoints to ensure they use the same topology. Fixes scylladb/scylladb#15147 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-08-24 10:43:40 +03:00
Benny Halevy	098dd5021a	storage_proxy: mutate_atomically_result: keep schema of batchlog mutation in context The batchlog mutation is for system.batchlog. Rather than looking the schema up in multiple places do that once and keep it in the context object. It will be used in the next patch to get a respective effective_replication_map_ptr. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-08-24 10:43:23 +03:00
Benny Halevy	27c33015a5	storage_proxy: send_to_live_endpoints: throw on_internal_error if node not found Return an error in production rather than crashing, as in https://github.com/scylladb/scylladb/issues/15138 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-08-24 08:59:38 +03:00
Tomasz Grabiec	0239ba4527	Merge 'fencing: handle counter_mutations' from Gusev Petr In this PR we add proper fencing handling to the `counter_mutation` verb. As for regular mutations, we do the check twice in `handle_counter_mutation`, before and after applying the mutations. The second check is important in case the fence was moved while we were handling the request - some post-fence actions might have already happened at this time, so we can't treat the request as successful. For example, if the topology change coordinator was switching to `write_both_read_new`, streaming might have already started and missed this update. In `mutate_counters` we can use a single `fencing_token` for all leaders, since all the erms are processed without yields and should underneath share the same `token_metadata`. We don't pass the fencing token for replication explicitly in `replicate_counter_from_leader`, since `mutate_counter_on_leader_and_replicate` doesn't capture the erm; if the drain on the coordinator timed out, the erm for replication might be different, and we should use the corresponding (possibly new) topology version for outgoing write replication requests. This delayed replication is similar to any other background activity (e.g. writing hints) - it takes the current erm and the current `token_metadata` version for outgoing requests. Closes #14564 * github.com:scylladb/scylladb: counter_mutation: add fencing encode_replica_exception_for_rpc: handle the case when result type is a single exception_variant counter_mutation: add replica::exception_variant to signature	2023-08-01 12:41:22 +02:00
Benny Halevy	2902a4136f	storage_proxy: query_partition_key_range_concurrent: maybe_yield in loop Add calls to `maybe_yield` in the per-range loops to prevent stalls if the loop never yields. Note: originally the stalls were detected in nested calls to `query_partition_key_range_concurrent` (see #14008). This series turned the tail recursion into iteration, but the inner loop(s) still never yield and do quite a lot of computation, so they might stall when called with a large number of ranges. Fixes #14008 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-07-31 09:54:34 +03:00
Benny Halevy	8d5020b8f6	storage_proxy: query_partition_key_range_concurrent: fixup indentation Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-07-31 09:43:33 +03:00
Benny Halevy	3c122a87b5	storage_proxy: query_partition_key_range_concurrent: turn tail recursion to iteration Update the function state and loop over the next ranges instead of calling the function recursively. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-07-31 09:43:33 +03:00
Benny Halevy	fd119469d8	storage_proxy: coroutinize query_partition_key_range Prepare for coroutinizing query_partition_key_range_concurrent. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-07-31 08:22:24 +03:00
Avi Kivity	ff1f461a42	Merge 'Introduce tablet load balancer' from Tomasz Grabiec After this series, tablet replication can handle the scenario of bootstrapping new nodes. The ownership is distributed indirectly by means of a load balancer which moves tablets around in the background. See docs/dev/topology-over-raft.md for details. The implementation is by no means meant to be perfect, especially in terms of performance, and will be improved incrementally. The load balancer will also be kicked by schema changes, so that allocation/deallocation done during table creation/drop will be rebalanced. Tablet data is streamed using the existing `range_streamer`, which is the infrastructure for "the old streaming". This will later be replaced by sstable transfer once integration of tablets with compaction groups is finished. Cleanup is not wired yet either; it is also blocked by compaction group integration. Closes #14601 * github.com:scylladb/scylladb: tests: test_tablets: Add test for bootstraping a node storage_service: topology_coordinator: Implement tablet migration state machine tablets: Introduce tablet_mutation_builder service: tablet_allocator: Introduce tablet load balancer tablets: Introduce tablet_map::for_each_tablet() topology: Introduce get_node() token_metadata: Add non-const getter of tablet_metadata storage_service: Notify topology state machine after applying schema change storage_service: Implement stream_tablet RPC tablets: Introduce global_tablet_id stream_transfer_task, multishard_writer: Work with table sharder tablets: Turn tablet_id into a struct db: Do not create per-keyspace erm for tablet-based tables tablets: effective_replication_map: Take transition stage into account when computing replicas tablets: Store "stage" in transition info doc: Document tablet migration state machine and load balancer locator: erm: Make get_endpoints_for_reading() always return read replicas storage_service: topology_coordinator: Sleep on failure between retries storage_service: topology_coordinator: Simplify coordinator loop main: Require experimental raft to enable tablets	2023-07-26 12:30:29 +03:00
Botond Dénes	ad2ddffb22	Merge 'Remove qctx from system_keyspace::save_truncation_record()' from Pavel Emelyanov The method is called by db::truncate_table_on_all_shards(), whose call chain, in turn, starts from - proxy::remote::handle_truncate() - schema_tables::merge_schema() - legacy_schema_migrator - tests A system_keyspace reference is easy to obtain in all of the above. This, in turn, allows making the method non-static and using the query_processor reference from the system_keyspace object instead of the global qctx. Closes #14778 * github.com:scylladb/scylladb: system_keyspace: Make save_truncation_record() non-static code: Pass sharded<db::system_keyspace>& to database::truncate() db: Add sharded<system_keyspace>& to legacy_schema_migrator	2023-07-26 08:48:49 +03:00
Tomasz Grabiec	6d545b2f9e	storage_service: Implement stream_tablet RPC Performs streaming of data for a single tablet between two tablet replicas. The node which gets the RPC is the receiving replica.	2023-07-25 21:08:51 +02:00
Tomasz Grabiec	7851694eaa	locator: erm: Make get_endpoints_for_reading() always return read replicas Just a simplification. Drop the test case from token_metadata which creates pending endpoints without normal tokens. It fails after this change with the exception "sorted_tokens is empty in first_token_index!" thrown from token_metadata::first_token_index(), which is used when calculating normal endpoints. This test case is not valid: the first node inserts its tokens as normal without going through the bootstrap procedure.	2023-07-25 21:08:01 +02:00
Petr Gusev	116444a01b	counter_mutation: add fencing As for regular mutations, we do the check twice in handle_counter_mutation, before and after applying the mutations. The second check is important in case the fence was moved while we were handling the request - some post-fence actions might have already happened at this time, so we can't treat the request as successful. For example, if the topology change coordinator was switching to write_both_read_new, streaming might have already started and missed this update. In mutate_counters we can use a single fencing_token for all leaders, since all the erms are processed without yields and should underneath share the same token_metadata. We don't pass the fencing token for replication explicitly in replicate_counter_from_leader, since mutate_counter_on_leader_and_replicate doesn't capture the erm; if the drain on the coordinator timed out, the erm for replication might be different, and we should use the corresponding (possibly new) topology version for outgoing write replication requests. This delayed replication is similar to any other background activity (e.g. writing hints) - it takes the current erm and the current token_metadata version for outgoing requests.	2023-07-25 12:10:03 +04:00
Petr Gusev	edbb5cbb5f	encode_replica_exception_for_rpc: handle the case when result type is a single exception_variant We will need it in later commit to return exceptions from handle_counter_mutation. We also add utils::Tuple concept restriction for add_replica_exception_to_query_result since its type parameters are always tuples.	2023-07-25 12:09:21 +04:00
Petr Gusev	f2cbdc7f18	counter_mutation: add replica::exception_variant to signature We are going to add fencing for counter mutations, which means handle_counter_mutation will sometimes throw stale_topology_exception. RPC doesn't marshal exceptions transparently: exceptions thrown by the server are delivered to the client as a general remote_verb_error, which is not very helpful. The common practice is to embed exceptions into the handler result type. In this commit we use the already existing exception_variant as an exception container. We mark exception_variant with the [[version]] attribute in the idl file; this should handle the case when an old replica (without exception_variant in the signature) is replying to a new one.	2023-07-25 12:09:19 +04:00
Petr Gusev	5fb8da4181	hints: add fencing In this commit we just pass a fencing_token through the hint_mutation RPC verb. The hints manager uses either storage_proxy::send_hint_to_all_replicas or storage_proxy::send_hint_to_endpoint to send a hint. Both methods capture the current erm and use the corresponding fencing token from it in the mutation or hint_mutation RPC verb. If these verbs are fenced out, the server-side stale_topology_exception is translated to a mutation_write_failure_exception on the client with an appropriate error message. The hint manager will attempt to resend the failed hint from the commitlog segment after a delay. However, if delivery is unsuccessful, the hint will be discarded after gc_grace_seconds. Closes #14580	2023-07-24 18:12:48 +02:00
Pavel Emelyanov	eaeffcdb81	code: Pass sharded<db::system_keyspace>& to database::truncate() The argument goes through the db::(drop|truncate)_table_on_all_shards() pair of calls, which start from - storage_proxy::remote: has its sys.ks reference already - schema_tables::merge_schema: has sys.ks argument already - legacy_schema_migrator: the reference was added by previous patch - tests: run in cql_test_env with sys.ks on board Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-07-21 13:11:59 +03:00
Pavel Emelyanov	d9ba8eb8df	service/paxos: Add db::system_keyspace& argument to some methods The paxos_state's .prepare(), .accept(), .learn() and .prune() methods access system keyspace via its static methods. The only caller of those (storage_proxy::remote) already has the sharded system k.s. reference and can pass its .local() one as argument Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-07-19 19:32:10 +03:00
Pavel Emelyanov	b0b91bf5ec	proxy/remote: Keep sharded<db::system_keyspace>& dependency This dependency will be needed for service::paxos_state:: calls, and all of them are made in storage_proxy::remote methods only. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-07-19 17:36:42 +03:00
Patryk Jędrzejczak	56bd9b5db3	service: storage_proxy: do not report abort requests in handle_write We don't want to report aborts in storage_proxy::handle_write, because they can only be triggered by shutdowns and timeouts. Before this change, such reports flooded the logs when a drained node still received write RPCs.	2023-07-17 12:27:36 +02:00
Patryk Jędrzejczak	f9db9f5943	service: storage_proxy: encode abort_requested_exception in handle_read storage_proxy::handle_read now makes sure that abort_requested_exception is encoded in a way that preserves its type information. This allows the coordinator to properly deserialize and handle it. Before this change, if a drained replica was still receiving the read RPCs, it would flood the coordinator's logs with std::runtime_error reports.	2023-07-17 12:27:36 +02:00
Patryk Jędrzejczak	68bd0424c2	service: storage_proxy: refactor encode_replica_exception_for_rpc To properly handle abort_requested_exception thrown from migration_manager::get_schema_for_read in storage_proxy::handle_read (done in the next commit), we have to somehow encode and return it. The encode_replica_exception_for_rpc function is not suitable for that because it requires the SourceTuple type (of a value returned by do_query()), which we don't know when calling get_schema_for_read. We move the part of encode_replica_exception_for_rpc responsible for handling exceptions to a new function and rewrite it in a way that doesn't require the SourceTuple type. As this function fits the name encode_replica_exception_for_rpc better, we give it that name and rename the previous encode_replica_exception_for_rpc.	2023-07-17 12:27:33 +02:00
Patryk Jędrzejczak	a21c4abad7	replica: add abort_requested_exception to exception_variant If migration_manager::get_schema_for_write is called after migration_manager::drain, it throws abort_requested_exception. This exception is not present in replica::exception_variant, which means that RPC doesn't preserve information about its type. If it is thrown on the replica side, it is deserialized as std::runtime_error on the coordinator. Therefore, abstract_read_resolver::error logs information about this exception, even though we don't want it (aborts are triggered on shutdown and timeouts). To solve this issue, we add abort_requested_exception to replica::exception_variant and, in the next commits, refactor storage_proxy::handle_read so that abort_requested_exception thrown in migration_manager::get_schema_for_write is properly serialized. Thanks to this change, unchanged abstract_read_resolver::error correctly handles abort_requested_exception thrown on the replica side by not reporting it.	2023-07-13 16:57:10 +02:00
Kefu Chai	5443bf69f7	storage_proxy: print the expected ex.what() Before this change, the format string contained two placeholders, but only one extra argument was passed in. If we actually formatted this logging message, fmtlib would throw. After this change, we pass the exception's error message as another argument. This logging message is printed at "trace" level, which is probably why we haven't seen fmtlib throw. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #14628	2023-07-12 12:34:51 +03:00
Petr Gusev	1e851262f2	storage_proxy: handler responses, use pointers to default constructed values instead of nulls The current Seastar RPC infrastructure lacks support for null values in tuples in handler responses. In this commit we add the make_default_rpc_tuple function, which solves the problem by returning pointers to default-constructed values for smart pointer types rather than nulls. The problem was introduced in commit `2d791a5ed4`. The function `encode_replica_exception_for_rpc` used a `default_tuple_maker` callback to create tuples containing exceptions. Callers returned pointers to default-constructed values in this callback, e.g. `foreign_ptr(make_lw_shared<reconcilable_result>())`. That commit changed this to just `SourceTuple{}`, which means nullptr for pointer types. Fixes: #14282 Closes #14352	2023-06-26 11:10:38 +03:00
Gleb Natapov	94fcba5662	storage_proxy: remove unused variable	2023-06-22 15:26:20 +03:00
Avi Kivity	e233f471b8	Merge 'Respect tablet shard assignment' from Tomasz Grabiec This PR changes the system to respect shard assignment to tablets in tablet metadata (system.tablets): 1. The tablet allocator is changed to distribute tablets evenly across shards taking into account currently allocated tablets in the system. Each tablet has equal weight. vnode load is ignored. 2. CDC subsystem was not adjusted (not supported yet) 3. sstable sharding metadata reflects tablet boundaries 5. resharding is NOT supported yet (the node will abort on boot if there is a need to reshard tablet-based tables) 6. The system is NOT prepared to handle tablet migration / topology changes in a safe way. 7. Sstable cleanup is not wired properly yet After this PR, dht::shard_of() and schema::get_sharder() are deprecated. One should use table::shard_of() and effective_replication_map::get_sharder() instead. To make life easier, support was added to obtain a table pointer from the schema pointer: `schema_ptr s; s->table().shard_of(...)` Closes #13939 * github.com:scylladb/scylladb: locator: network_topology_startegy: Allocate shards to tablets locator: Store node shard count in topology service: topology: Extract topology updating to a lambda test: Move test_tablets under topology_experimental sstables: Add trace-level logging related to shard calculation schema: Catch incorrect uses of schema::get_sharder() dht: Rename dht::shard_of() to dht::static_shard_of() treewide: Replace dht::shard_of() uses with table::shard_of() / erm::shard_of() storage_proxy: Avoid multishard reader for tablets storage_proxy: Obtain shard from erm in the read path db, storage_proxy: Drop mutation/frozen_mutation ::shard_of() forward_service: Use table sharder alternator: Use table sharder db: multishard: Obtain sharder from erm sstable_directory: Improve trace-level logging db: table: Introduce shard_of() helper db: Use table sharder in compaction sstables: Compute sstable shards using sharder from erm when loading sstables: Generate sharding metadata using sharder from erm when writing test: partitioner: Test split_range_to_single_shard() on tablet-like sharder dht: Make split_range_to_single_shard() prepared for tablet sharder sstables: Move compute_shards_for_this_sstable() to load() dht: Take sharder externally in splitting functions locator: Make sharder accessible through effective_replication_map dht: sharder: Document guarantees about mapping stability tablets: Implement tablet sharder tablets: Include pending replica in get_shard() dht: sharder: Introduce next_shard() db: token_ring_table: Filter out tablet-based keyspaces db: schema: Attach table pointer to schema schema_registry: Fix SIGSEGV in learn() when concurrent with get_or_load() schema_registry: Make learn(schema_ptr) attach entry to the target schema test: lib: cql_test_env: Expose feature_service test: Extract throttle object to separate header	2023-06-21 10:20:41 +03:00
Calle Wilund	f18e967939	storage_proxy: Make split_stats resilient to being called from different scheduling group Fixes #11017 When doing writes, storage proxy creates objects of types deriving from abstract_write_response_handler. These are created in the various scheduling groups executing the write-inducing code. They pick up a group-local reference to the various metrics used by SP. Normally all code using (and esp. modifying) these metrics is executed in the same scheduling group. However, if gossip sees a node go down, it will notify listeners, which eventually calls get_ep_stat and register_metrics. This code (before this patch) uses the _active_ scheduling group to eventually add metrics, using a local dict as a guard against double registrations. If, as described above, we're called in a different scheduling group than the original one, however, this can cause double registrations. Fixed here by keeping a reference to the creating scheduling group and using it, not the active one, when/if creating new metrics. Closes #14294	2023-06-21 10:08:27 +03:00
Tomasz Grabiec	21198e8470	treewide: Replace dht::shard_of() uses with table::shard_of() / erm::shard_of() dht::shard_of() does not use the correct sharder for tablet-based tables. Code which is supposed to work with all kinds of tables should use erm::get_sharder().	2023-06-21 00:58:24 +02:00
Tomasz Grabiec	fb0bdcec0c	storage_proxy: Avoid multishard reader for tablets Currently, the coordinator splits the partition range at vnode (or tablet) boundaries and then tries to merge adjacent ranges which target the same replica. This is an optimization which makes less sense with tablets, which are supposed to be of substantial size. If we don't merge the ranges, then with tablets we can avoid using the multishard reader on the replica side, since each tablet lives on a single shard. The main reason to avoid a multishard reader is avoiding its complexity, and avoiding adapting it to work with tablet sharding. Currently, the multishard reader implementation makes several assumptions about shard assignment which do not hold with tablets. It assumes that shards are assigned in a round-robin fashion.	2023-06-21 00:58:24 +02:00
Tomasz Grabiec	10e05eec66	storage_proxy: Obtain shard from erm in the read path dht::shard_of() does not use the correct sharder for tablet-based tables. Code which is supposed to work with all kinds of tables should use erm::get_sharder().	2023-06-21 00:58:24 +02:00
Tomasz Grabiec	e48ec6fed3	db, storage_proxy: Drop mutation/frozen_mutation ::shard_of() dht::shard_of() does not use the correct sharder for tablet-based tables. Code which is supposed to work with all kinds of tables should use erm::get_sharder().	2023-06-21 00:58:24 +02:00
Kamil Braun	732feca115	storage_proxy: query_partition_key_range_concurrent: don't access empty range `query_partition_key_range_concurrent` implements an optimization when querying a token range that intersects multiple vnodes. Instead of sending a query for each vnode separately, it sometimes sends a single query to cover multiple vnodes - if the intersection of replica sets for those vnodes is large enough to satisfy the CL and good enough in terms of the heat metric. To check the latter condition, the code would take the smallest heat metric of the intersected replica set and compare it to the smallest heat metrics of the replica sets calculated separately for each vnode. Unfortunately, there was an edge case that the code didn't handle: the intersected replica set might be empty, and the code would then access an empty range. The dtest `test_query_dc_with_rf_0_does_not_crash_db` caught this thanks to an assertion added in `8db1d75c6c`. The fix is simple: check if the intersected set is empty - if so, don't calculate the heat metrics, because we can decide early that the optimization doesn't apply. Also change the `assert` to `on_internal_error`. Fixes #14284 Closes #14300	2023-06-20 07:56:40 +03:00
Petr Gusev	94605e4839	storage_proxy.cc: add fencing to read RPCs At the call site we use the version captured in read_executor/erm/token_metadata. In the handlers we use apply_fence twice, just like in the mutation RPC. Fencing was also added to local query calls, such as query_result_local in make_data_request. This is for the case when the query coordinator was isolated from the topology change coordinator and didn't receive barrier_and_drain.	2023-06-15 15:52:50 +04:00
Petr Gusev	4004ce1f44	storage_proxy.cc: extract handle_read We continue the refactoring by introducing the common implementation for all read methods.	2023-06-15 15:52:50 +04:00
Petr Gusev	2d791a5ed4	storage_proxy.cc: refactor encode_replica_exception_for_rpc We are going to add fencing to read RPCs, and it would be easier to do it once for all three of them. This refactoring enables that, since it allows using encode_replica_exception_for_rpc for handle_read_digest.	2023-06-15 15:52:50 +04:00
Petr Gusev	6b115e902b	storage_proxy: fix indentation	2023-06-15 15:52:50 +04:00
Petr Gusev	46f73fcaa6	storage_proxy: add fencing for mutation At the call site, we use the version captured in erm/token_metadata. In the handler, we check twice: apply_fence after the local write guarantees that no mutations succeed on coordinators if the fence version has been updated on the replica during the write. Fencing was also added to mutate_locally calls on the request coordinator, in case this coordinator was isolated from the topology change coordinator and missed the barrier_and_drain command.	2023-06-15 15:52:49 +04:00
Petr Gusev	d34da12240	storage_proxy: add fencing_token and related infrastructure A new stale_topology_exception was introduced; it's raised in apply_fence when an RPC comes with a stale fencing_token. An overload of apply_fence taking a future will be used to wrap the storage_proxy methods that need to be fenced.	2023-06-15 15:48:00 +04:00
Kamil Braun	a740fbf58a	storage_proxy: rename `init_messaging_service` to `start_remote` The function now has more responsibilities than before, rename it and add a comment to better illustrate this.	2023-06-14 11:41:36 +02:00
Kamil Braun	f26e98c3be	storage_proxy: don't pass `gossiper&` and `messaging_service&` during initialization These services are now passed during `init_messaging_service`, and that's when the `remote` object is constructed. The `remote` object is then destroyed in `uninit_messaging_service`. Also, `migration_manager*` became `migration_manager&` in `init_messaging_service`.	2023-06-14 11:41:36 +02:00
Kamil Braun	10f11b89ea	storage_proxy: prepare for missing `remote` Prepare the users of `remote` for the possibility that it's gone. The `remote()` accessor throws an error if it's gone. Observe that `remote()` is only used in places where it's verified that we really want to send a message to a remote node, with a small exception: `truncate_blocking`, which truncates locally by sending an RPC to ourselves (and truncate always sends RPC to the whole cluster; we might want to change this behavior in the future, see #11087). Other places are easy to check (it's either implementations of `apply_remotely` which is only called for remote nodes, or there's an `if` that checks we don't apply the operation to ourselves). There is one direct access to `_remote` which checks first if `_remote` is available: `storage_proxy::is_alive`. If `_remote` is unavailable, we consider nodes other than us dead. Indeed, if `gossiper` is unavailable, we didn't have a chance to gossip with other nodes and mark them alive.	2023-06-14 11:41:36 +02:00
Kamil Braun	8db1d75c6c	storage_proxy: don't access `remote` during local queries in `query_partition_key_range_concurrent` In `query_partition_key_range_concurrent` there's a calculation of cache hit rates which requires accessing `gossiper` through `remote`. We want to support local queries when `remote` is unavailable. Check if it's a local query and only if not, fetch `gossiper` from `remote`.	2023-06-14 11:41:36 +02:00
Kamil Braun	ddcbade919	storage_proxy: don't access `remote` when calculating target replicas for local queries We only want to access `remote` when it's necessary - when we're performing a query that involves remote nodes. We want to support local queries when `remote` (in particular, `gossiper&`) is unavailable. Add a helper, `storage_proxy::filter_replicas_for_read`, which will check if it's a local query and return early in that case without accessing `remote`.	2023-06-14 11:41:34 +02:00
Kamil Braun	ff8d88a228	storage_proxy: introduce const version of `remote()` One version is implemented using the other (with `const_cast`) because some additional safety checks will be added in a later commit.	2023-06-13 12:44:03 +02:00
Kamil Braun	c5c78a7922	storage_proxy: `endpoint_filter`: remove gossiper dependency The function used `gossiper&` to check whether an endpoint is considered alive. Abstract this out through `noncopyable_function`. This will allow us to use `endpoint_filter` during local queries when `remote` (which contains the `gossiper` reference) is unavailable.	2023-06-12 15:23:48 +02:00
Petr Gusev	3a88c7769f	tracing::trace_info: pass by ref sizeof(std::optional<tracing::trace_info>) == 64 bytes, so passing it by reference should be more efficient than copying it by value.	2023-05-30 14:32:10 +04:00
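
The idea behind 2c54d7a35a, 4a2e367e92 and 6af0b281a6 can be sketched as follows. Endpoints selected from a replication map travel down the stack together with that map, so a lower layer never fetches a newer map that might not contain them. This is a minimal, hypothetical model and not Scylla's API. `replication_map`, `write_targets`, `select_targets` and `version_for_send` are illustrative names, and a `std::shared_ptr<const ...>` stands in for `effective_replication_map_ptr`.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an effective replication map: an immutable
// snapshot of token -> replica endpoints at one topology version.
struct replication_map {
    long version = 0;
    std::map<long, std::vector<std::string>> replicas_by_token;
};
using replication_map_ptr = std::shared_ptr<const replication_map>;

// The endpoints are only meaningful together with the snapshot they were
// computed from, so both are passed down the stack as one value.
struct write_targets {
    replication_map_ptr erm;
    std::vector<std::string> endpoints;
};

write_targets select_targets(replication_map_ptr erm, long token) {
    auto it = erm->replicas_by_token.find(token);
    if (it == erm->replicas_by_token.end()) {
        throw std::out_of_range("no replicas for token");
    }
    std::vector<std::string> endpoints = it->second; // copy before moving erm
    return write_targets{std::move(erm), std::move(endpoints)};
}

// A lower layer reads the version from the carried snapshot instead of
// looking up the "current" map again, which may have changed meanwhile.
long version_for_send(const write_targets& targets) {
    return targets.erm->version;
}
```

In this sketch, holding the `shared_ptr` in `write_targets` also keeps the old snapshot alive until the write that uses it is finished, even after a newer map has been published.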
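
Commits 3c122a87b5 and 2902a4136f change a function that calls itself on the remaining ranges into a loop that calls `maybe_yield` in the per-range work. The sketch below shows both shapes on a toy workload. It assumes a plain callback in place of the cooperative yield that the real code performs. `process_recursive`, `process_iterative` and the summing "work" are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Recursive shape: handle one batch of ranges, then call itself on the rest.
// Every batch adds another level of nesting.
long process_recursive(const std::vector<long>& ranges, std::size_t pos, std::size_t batch, long acc) {
    assert(batch > 0);
    if (pos >= ranges.size()) {
        return acc;
    }
    std::size_t end = std::min(pos + batch, ranges.size());
    for (std::size_t i = pos; i < end; ++i) {
        acc += ranges[i];
    }
    return process_recursive(ranges, end, batch, acc);
}

// Iterative shape: the same state (pos, acc) is updated in a loop, and the
// inner per-range loop calls a yield hook so a long run cannot stall.
long process_iterative(const std::vector<long>& ranges, std::size_t batch,
                       const std::function<void()>& maybe_yield) {
    assert(batch > 0);
    std::size_t pos = 0;
    long acc = 0;
    while (pos < ranges.size()) {
        std::size_t end = std::min(pos + batch, ranges.size());
        for (; pos < end; ++pos) {
            acc += ranges[pos];
            maybe_yield(); // stand-in for a cooperative yield point
        }
    }
    return acc;
}
```

Both functions compute the same result. Only the iterative one offers a yield point for every range, however many ranges it is given.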
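
Several commits (d34da12240, 46f73fcaa6 and 116444a01b) describe the replica-side fencing check: reject a request whose fencing token is older than the replica's fence, both before and after applying the write. Here is a minimal sketch of that double check. It assumes a token is just a topology version number. `fencing_token`, `replica_state`, `handle_write` and this `stale_topology_exception` are simplified, hypothetical stand-ins for the real types.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <stdexcept>

// Hypothetical simplification: the coordinator sends the topology version it used.
struct fencing_token {
    std::int64_t topology_version;
};

struct stale_topology_exception : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// The replica remembers the highest version it has been fenced at.
struct replica_state {
    std::int64_t fence_version = 0;

    void apply_fence(const fencing_token& token) const {
        if (token.topology_version < fence_version) {
            throw stale_topology_exception("stale fencing token");
        }
    }
};

// Check, write, check again: if the fence moved while the write was in
// flight, the request must not be reported as successful.
void handle_write(const replica_state& replica, const fencing_token& token,
                  const std::function<void()>& apply_locally) {
    replica.apply_fence(token);
    apply_locally();
    replica.apply_fence(token);
}
```

The second check covers the case described in 116444a01b: the write has already been applied locally, but a fence advance in the meantime means that post-fence actions (such as streaming) may have missed it, so the coordinator must see a failure.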
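
Commits f2cbdc7f18 and a21c4abad7 rely on embedding exceptions in the handler's result type, because RPC delivers thrown exceptions only as a generic error. The sketch below is a hypothetical illustration built on `std::variant`. `stale_topology_error`, `abort_requested_error`, `reply`, `handle_request` and `should_report` are illustrative names, not the real `replica::exception_variant`. It shows how a coordinator that knows the concrete type can skip reporting aborts, as described in 56bd9b5db3 and f9db9f5943.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <variant>

// Hypothetical error types that a replica embeds in its reply.
struct stale_topology_error { std::string message; };
struct abort_requested_error {};
using error_variant = std::variant<stale_topology_error, abort_requested_error>;

// The reply carries either a value or an error whose concrete type survives
// the round trip, unlike a thrown exception that arrives as a generic error.
struct reply {
    long value = 0;
    std::optional<error_variant> error;
};

reply handle_request(bool stale, bool shutting_down) {
    reply r;
    if (shutting_down) {
        r.error = abort_requested_error{};
    } else if (stale) {
        r.error = stale_topology_error{"stale fencing token"};
    } else {
        r.value = 42;
    }
    return r;
}

// Coordinator side: aborts (shutdown, timeouts) are handled quietly,
// while other errors are still reported.
bool should_report(const reply& r) {
    return r.error.has_value() && !std::holds_alternative<abort_requested_error>(*r.error);
}
```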
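
Commit 1e851262f2 replaces `SourceTuple{}`, which leaves smart pointers null, with tuples whose pointers refer to default-constructed values. The following C++17 sketch shows the technique. It assumes `std::shared_ptr` in place of the `foreign_ptr`/`lw_shared_ptr` combination mentioned in the commit. `is_shared_ptr`, `make_default_element` and `make_default_tuple` are hypothetical names, not the real `make_default_rpc_tuple`.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <tuple>
#include <type_traits>

// Detects std::shared_ptr<T>.
template <typename T>
struct is_shared_ptr : std::false_type {};
template <typename T>
struct is_shared_ptr<std::shared_ptr<T>> : std::true_type {};

// One element: smart pointers point at a default-constructed object instead
// of being null; all other types are value-initialized.
template <typename T>
T make_default_element() {
    if constexpr (is_shared_ptr<T>::value) {
        return std::make_shared<typename T::element_type>();
    } else {
        return T{};
    }
}

// Builds the whole tuple element by element.
template <typename... Ts>
std::tuple<Ts...> make_default_tuple() {
    return std::tuple<Ts...>(make_default_element<Ts>()...);
}
```

For example, `std::tuple<std::shared_ptr<std::string>, int>{}` holds a null pointer, while `make_default_tuple<std::shared_ptr<std::string>, int>()` holds a pointer to an empty string and a zero.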
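
Item 1 of e233f471b8 says new tablets are distributed evenly across shards, accounting for tablets already allocated, with every tablet having equal weight. One simple way to meet that description is a greedy "least-loaded shard first" rule. This sketch assumes that rule. It is an illustration and not the actual allocator in network_topology_strategy. `allocate_tablets` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// load[i] is the number of tablets already on shard i. Each new tablet has
// weight 1 and goes to the least-loaded shard (lowest index on ties).
// Returns the chosen shard for each new tablet and updates load in place.
std::vector<std::size_t> allocate_tablets(std::vector<std::size_t>& load, std::size_t new_tablets) {
    assert(!load.empty());
    std::vector<std::size_t> assignment;
    assignment.reserve(new_tablets);
    for (std::size_t t = 0; t < new_tablets; ++t) {
        auto it = std::min_element(load.begin(), load.end());
        ++*it;
        assignment.push_back(static_cast<std::size_t>(it - load.begin()));
    }
    return assignment;
}
```

With an existing load of `{2, 0, 1}`, three new tablets go to shards 1, 1 and 2, which leaves every shard with two tablets.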
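
Commit f18e967939 fixes double metric registrations by labelling metrics with the scheduling group that created the stats object, not the group that happens to be active. This is a hypothetical model in which scheduling groups are plain strings. `metrics_registry` and `group_stats` are illustrative names. A global registry rejects duplicate (group, endpoint) names, and each stats object has a local guard.

```cpp
#include <cassert>
#include <set>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical global registry: each (group, endpoint) metric may be added once.
struct metrics_registry {
    std::set<std::pair<std::string, std::string>> names;

    void add(const std::string& group, const std::string& endpoint) {
        if (!names.insert({group, endpoint}).second) {
            throw std::runtime_error("duplicate metric registration");
        }
    }
};

// Hypothetical per-scheduling-group stats object.
struct group_stats {
    std::string creating_group;  // captured when the object is created
    std::set<std::string> seen;  // local guard against double registration

    // active_group is whatever group the caller runs in (e.g. a gossip
    // notification). It is deliberately ignored: labelling with it could
    // collide with metrics registered by that group's own stats object,
    // which the local guard above cannot see.
    void register_endpoint(metrics_registry& reg, const std::string& endpoint,
                           const std::string& active_group) {
        (void)active_group;
        if (seen.insert(endpoint).second) {
            reg.add(creating_group, endpoint);
        }
    }
};
```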
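
The fix in 732feca115 returns early when the intersected replica set is empty, before any minimum is taken over it. Here is a small hypothetical sketch of that guard. It assumes replicas are sorted integer IDs with a per-replica heat value. `min_heat_of_intersection` is an illustrative name, and the consistency-level check is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <optional>
#include <vector>

// a and b are sorted replica IDs of two vnodes; heat[id] is a per-replica metric.
// Returns the smallest heat among the common replicas, or nullopt if there are none.
std::optional<double> min_heat_of_intersection(const std::vector<int>& a, const std::vector<int>& b,
                                               const std::vector<double>& heat) {
    std::vector<int> common;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(common));
    if (common.empty()) {
        // The merge optimization cannot apply, and min_element over an
        // empty range would return end(), which must not be dereferenced.
        return std::nullopt;
    }
    auto it = std::min_element(common.begin(), common.end(),
                               [&](int x, int y) { return heat[x] < heat[y]; });
    return heat[*it];
}
```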
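
Commits 10f11b89ea and ff8d88a228 describe a `remote()` accessor that throws when the remote services are gone. A const overload reuses the non-const one through `const_cast`, and `is_alive` treats every other node as dead while remote services are unavailable. Below is a hypothetical sketch of that pattern. `remote_services`, `proxy` and `set_remote` are illustrative names, and a set of live IDs stands in for the gossiper.

```cpp
#include <cassert>
#include <set>
#include <stdexcept>

// Hypothetical stand-in for the services held by `remote` (e.g. the gossiper).
struct remote_services {
    std::set<int> live_endpoints;
};

class proxy {
    int _self;
    remote_services* _remote = nullptr; // null while remote services are unavailable
public:
    explicit proxy(int self) : _self(self) {}

    void set_remote(remote_services& r) { _remote = &r; }
    void clear_remote() { _remote = nullptr; }

    // All checks live in the non-const accessor: it throws if remote is gone.
    remote_services& remote() {
        if (!_remote) {
            throw std::runtime_error("remote services are unavailable");
        }
        return *_remote;
    }

    // The const accessor delegates to the non-const one so the checks are
    // written once. This is safe because the non-const version modifies nothing.
    const remote_services& remote() const {
        return const_cast<proxy&>(*this).remote();
    }

    // Without remote services we have not gossiped with anyone,
    // so only this node counts as alive.
    bool is_alive(int endpoint) const {
        if (endpoint == _self) {
            return true;
        }
        if (!_remote) {
            return false;
        }
        return remote().live_endpoints.count(endpoint) > 0;
    }
};
```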
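
Commit c5c78a7922 removes the gossiper dependency from `endpoint_filter` by passing liveness in as a function. The sketch below shows only that dependency inversion: the real function does more than filter by liveness. It assumes `std::function` in place of Seastar's `noncopyable_function`, and this `endpoint_filter` signature and `is_alive_fn` are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Liveness is injected as a predicate instead of being read from a gossiper.
using is_alive_fn = std::function<bool(const std::string&)>;

// Hypothetical, simplified filter: keep only candidates the predicate reports alive.
std::vector<std::string> endpoint_filter(const std::vector<std::string>& candidates,
                                         const is_alive_fn& is_alive) {
    std::vector<std::string> selected;
    for (const auto& endpoint : candidates) {
        if (is_alive(endpoint)) {
            selected.push_back(endpoint);
        }
    }
    return selected;
}
```

A caller that has a gossiper can pass a predicate backed by it. A local query without `remote` can instead pass a predicate that accepts only the local node.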


1063 Commits