There is no need to call get_natural_endpoints for every token
in sorted_tokens order, since we can just get the precalculated
per-token endpoints already in the _replication_map member.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Reduce large allocations and reactor stalls seen in #11005
by open coding `get_address_ranges` and using std::vector::insert
to efficiently appending the ranges returned by `get_primary_ranges_for`
onto the returned token_range_vector in contrast to building
an unordered_multimap<inet_address, dht::token_range> first in
`get_address_ranges` and traversing it and adding one token_range
at a time.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
And use that in sights using the endpoint set
returned by abstract_replication_strategy::calculate_natural_endpoints.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
"
There are several helpers in this .cc file that need to get datacenter
for endpoints. For it they use global snitch, because there's no other
place out there to get that data from.
The whole dc/rack info is now moving to topology, so this set patches
the consistency_level.cc to get the topology. This is done two ways.
First, the helpers that have keyspace at hand may get the topology via
ks's effective_replication_map.
Two difficult cases are db::is_local() and db.count_local_endpoints()
because both have just inet_address at hand. Those are patched to be
methods of topology itself and all their callers already mess with
token metadata and can get topology from it.
"
* 'br-consistency-level-over-topology' of https://github.com/xemul/scylla:
consistency_level: Remove is_local() and count_local_endpoints()
storage_proxy: Use topology::local_endpoints_count()
storage_proxy: Use proxy's topology for DC checks
storage_proxy: Keep shared_ptr<proxy> on digest_read_resolver
storage_proxy: Use topology local_dc_filter in its methods
storage_proxy: Mark some digest_read_resolver methods private
forwarding_service: Use topology local_dc_filter
storage_service: Use topology local_dc_filter
consistency_level: Use topology local_dc_filter
consitency-level: Call count_local_endpoints from topology
consistency_level: Get datacenter from topology
replication_strategy: Remove hold snitch reference
effective_replication_map: Get datacenter from topology
topology: Add local-dc detection shugar
In some of db/consistency_level.cc helpers the topology can be
obtained from keyspace's effective replication map
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
When the strategy is constructed there's no place to get snitch from
so the global instance is used. However, after previous patch the
replication strategy no longer needs snitch, so this dependency can
be dropped
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Now it gets it from snitch, but the dc/rack info is being relocated
onto topology. The topology is in turn already there
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
"
The helper is in charge of receiving INTERNAL_IP app state from
gossiper join/change notifications, updating system.peers with it
and kicking messaging service to update its preferred ip cache
along with initiating clients reconnection.
Effectively this helper duplicates the topology tracking code in
storage-service notifiers. Removing it makes less code and drops
a bunch of unwanted cross-components dependencies, in particular:
- one qctx call is gone
- snitch (almost) no longer needs to get messaging from gossiper
- public:private IP cache becomes local to messaging and can be
moved to topology at low cost
Some nice minor side effect -- this helper was left unsubscribed
from gossiper on stop and snitch rename. Now its all gone.
"
* 'br-remove-reconnectible-snitch-helper-2' of https://github.com/xemul/scylla:
snitch: Remove reconnectable snitch helper
snitch, storage_service: Move reconnect to internal_ip kick
snitch, storage_service: Move system.peers preferred_ip update
snitch: Export prefer-local
Given #11146, we see a 10ms stall when calculate_natural_endpoints
calls get_all_endpoints that up until this patch performed a
similar loop on the `_token_to_endpoint_map`, so to prevent such
a stall with large number of tokens, turn update_normal_token_owners
async, and allow yielding in the per-token tight loop.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
We don't need to recalculate the unique set of normal token
everytime we change `_token_to_endpoint_map`.
Similarly, this doesn't have to be done in `get_all_endpoints`.
Instead we can maintain it inexpensively in
`remove_endpoint`, and let `count_normal_token_owners`
just return its size and `get_all_endpoints` just return
the saved set.
Note that currently topology is not updated accurately
in update_normal_token() and it may contain endpoint
that do no longer own any tokens.
If we did update topology accurately there, we
could use its locations map instead as its keys are equivalent
to the unordered_set<inet_address> we implement here.
Closes#11128
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
It's often needed to check if an endpoint sits in the same DC as the
current node. It can be done by
topo.get_datacenter() == topo.get_datacenter(endpoint)
but in some cases a RAII filter function can be helpful.
Also there's a db::count_local_endpoints() that is surprisingly in use,
so add it to topology as well. Next patches will make use of both.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Currently, if token_metadata_impl::update_normal_tokens
throws an exception before it's done, it leaves the
token_metadata_impl members partially updated
and we have no way of recovering from that.
The existing use cases take that into account
and always call it on a cloned, temporary copy of the token
metadata, so if it throws, the temporary copy is tossed away
without being applied back.
So just cement this, by adding cautions in the token_metadata
class declaration.
Closes#11127
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20220728144821.130518-1-bhalevy@scylladb.com>
To get the list of tokens for a given node, we loop through all the
tokens and calculate the nodes that are responsible for the token.
In case of the everywhere_topology, we know any node that is part of the
the ring will be responsible for all tokens.
This patch adds a fast path for everywhere_topology to avoid calculating
natural endpoints.
Refs #10337
Refs #10817
Refs #10836
Refs #10837
If the number of nodes in the cluster is smaller than the desired
replication factor we should return the loop when endpoints already
contains all the nodes in the cluster because no more nodes could be
added to endpoints lists
Refs #10337
Refs #10817
Refs #10836
Refs #10837
Currently, a set of nodes is built from _token_to_endpoint_map to get
the number of nodes in _token_to_endpoint_map.
To make it faster so we can call it on a fast path in the following
patch, a _nr_normal_token_owners member is introduced to track the
number.
Refs #10337
Refs #10817
Refs #10836
Refs #10837
The same thing as in previous patch -- when gossiper issues
on_join/_change notification, storage service can kick messaging
service to update its internal_ip cache and reconnect to the peer.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Currently the INTERNAL_IP state is updated using reconnectable helper
by subscribing on on_join/on_change events from gossiper. The same
subscription exists in storage service (it's a bit more elaborated by
checking if the node is the part of the ring which is OK).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The boolean bit says whether "the system" should prefer connecting to
the address gossiper around via INTERNAL_IP. Currently only gossiping
property file snitch allows to tune it and ec2-multiregion snitch
prefers internal IP unconditionally.
So exporting consists of 2 pieces:
- add prefer_local() snitch method that's false by default or returns
the (existing) _prefer_local bit for production snitch base
- set the _prefer_local to true by ec2-multiregion snitch
While at it the _prefer_local is moved to production_snitch_base for
uniformity with the new prefer_local() call
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
For now they just forward the request to snitch. Once topology is
properly updated boot-time dc/rack info and knows internal IP
it will be able to serve request on its own.
For convenience overloads without arguments return dc/rack for
current node.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
If there are zero leaving nodes, no need to calculate anything. This
saves time for calculating pending ranges in large clusters
significantly to avoid unnecessary calculation.
Refs #10337Closes#10822
The replication happens on all shards but current one. There's a special
helper in seastar for such cases
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
All snitch drivers are supposed to snitch info on some shard and
replicate the dc/rack info across others. All, but azure really do so.
The azure one gets dc/rack on all shards, which's excessive but not
terrible, but when all shards start to replicate their data to all the
others, this may lead to use-after-frees.
fixes: #10494
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
After fcb8d040 ("treewide: use Software Package Data Exchange
(SPDX) license identifiers"), many dual-licensed files were
left with empty comments on top. Remove them to avoid visual
noise.
Closes#10562
Each driver has a pointer to this shard snitch_ptr which, in turn, has
the reference on gossiper. This lets drivers stop using the global
gossiper instance.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The reference is put on the snitch_ptr because this is the sharded<>
thing and because gossiper reference is the same for different snitch
drivers. Also, getting gossiper from snitch_ptr by driver will look
simpler than getting it from any base class.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It is typical in switch statements to select on an enum type and
rely on the compliler to complain if an enum value was missed. But
gcc isn't satisified since the enum could have a value outside the
declared list. Call abort() in this impossible situation to pacify
it.
After previous patches both, create_snitch() and stop_snitch() no look
like the classica sharded service start/stop sequence. Finally both
helpers can be removed and the rest of the user can just call start/stop
on locally obtained sharded references.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Both first stop/pause snitch driver on io-ing shard, then proceed with
the rest. This sequence is pretty pointless and here's why.
The only non-trivial stop()/pause_io() method out there is in the
property-file snitch driver. In it, both methods check if the current
shard is the io-ing one, if no -- return back the resolved future, if
yes -- go ahead and stop/pause some IO. With this, for all shards but
io-ing one there's no point in starting after io-ing one is stopped,
they all can start (and finish) in parallel.
So what this patch does is just removes the pre-stop/pause kicking of
the io-ing shard.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Current API to create snitch is not like other services -- there's a
dedicated helper that does sharded<>.start() + invoke_on_all(&start)
calls. These helpers complicate do-globalization of snitch and rework
of services start-stop sequence, things get simpler if snitch uses
the same start-stop API as all the others. The first step towards this
change is moving the non-waiting parts of snitch initialization code
from init_snitch_obj() into snitch_ptr constructor.
A note on this change: after patch #2 the snitch_ptr<->driver linkage
connects local objects with each other, not container() of any. This
is important, because connecting container() would be impossible inside
constructor, as the container pointer is initialized by seastar _after_
the service constructor itself.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Currently snitch drivers register themselves in class-registry with all
sorts of construction options possible. All those different constuctors
are in fact "config options".
When later snitch will declare its dependencies (gossiper and system
keyspace), it will require patching all this registrations, which's very
inconvenient.
This patch introduces the snitch_config struct and replaces all the
snitch constructors with the snitch_driver(snitch_config cfg) one.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This patch makes the snitch base class reference local snitch_ptr, not
its sharded<> container and, respectively, makes the base container()
method return _backreference->container() instead.
The motivation of this change is, again, in the next patch, which will
move snitch_ptr<->driver_object linkage into snitch_ptr constructor.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Some snitch drivers want the peering_sharded_service::container()
functionality, but they can't directly use it, because the driver
class is in fact the pimplification behind the sharded<snitch_ptr>
service. To overcome this there's a _my_distributed pointer on the
driver base class that points back to sharded<snitch_ptr> object.
This patch replaces the direct _my_distributed usage with the
container() method that does it and also asserts that the pointer
in question is initialized (some drivers already do it, some don't).
Other than making the code more peering_sharded_service-like, this
patch allows changing _my_distributed into _backreference that
points to this shard's snitch_ptr, see next patch.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This is the last place that still uses gossip_snitch_info(). It
can be reworked to use the get_app_states(), then the former
helper can be removed.
Another motivation for this is to stop using the _gossiper_started
boolean from the base class. This, in turn, will allow to remove
the whole gossiper_starting() notification altogether.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
In its .start() helper the property-file driver does everything but
registers the reconnectable helper (like the ec2 m.r. one from the
previous patch did). Similarly to ec2 m.r. snitch this one can also
register its helper in .start(), before gossiper_starting() is called.
One thing to care about in this driver is that some tests start this
snitch without starting gossiper, thus an extra protection against
not initialized gossiper is needed.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This driver registers reconnectable helper in it gossiper_starting()
callback. It can be done earlier -- in the snitch .start() one, as
gossiper doesn't notify listeners until its started for real (event
its shardow round doesn't kick them).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Nowadays snitch states are put into gossiper via .gossiper_starting()
call by gossiper. This, in turn, happens in two places -- on node
ring join code and on re-enabling gossiper via the API call.
The former can be performed by the ring joining code with the help of
recently introduced snitch.get_app_states() helper.
The latter call is in fact not needed. Re-gossiped are DC, RACK and
for some drivers the INTERNAL_IP states that don't change throughout
snitch lifetime and are preserved in the gossiper pre-loaded states.
Thus, once the snitch states are applied by storage service ring join
code, the respective states udpate can be removed from the snitch
gossiper_starting() implementations.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This virtual method returns back the list of app states that snitch
drivers need to gossip around. The exact implementation copies the
gossip_snitch_info() logic of the respective drivers and is unused.
Next patches will make use of it (spoiler: the latter method will be
removed after that).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The driver in question wants to execute some of its actions on shard 0
and it calls smp::invoke(0, ...) for this. The invoked lambda thus needs
to refer to global snitch instance.
There's nicer and shorter way of re-sharding for snith drivers -- the
sharded<snith_ptr>* _my_distributed field on the base class.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It's snitch code that needs it. It now takes messaging service
from gossiper, so it can do the same with system keyspace. This
change removes one user of the global sys.ks. cache instance.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>