40965 Commits

Author SHA1 Message Date
Amnon Heiman
fc9bd2de03 config: Add prometheus_allow_protobuf flag
Native histograms (also known as sparse histograms) are an experimental
Prometheus feature. They use protobuf as the reporting layer. The
prometheus_allow_protobuf flag lets the user enable the protobuf
protocol. When this flag is set to true and the Prometheus server
indicates in its request that it accepts protobuf, the response is sent
in protobuf format.
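
A minimal sketch of the decision described above, not the actual ScyllaDB code. The function name, the return values and the assumption that the scraper advertises protobuf through an `Accept` header containing `application/vnd.google.protobuf` are illustrative only.

```python
# Hypothetical illustration; names are not taken from the ScyllaDB sources.
PROTOBUF_MEDIA_TYPE = "application/vnd.google.protobuf"  # assumed media type


def choose_metrics_format(allow_protobuf: bool, accept_header: str) -> str:
    """Return "protobuf" only if the flag is on AND the scraper accepts it."""
    # Split the Accept header into media types, dropping parameters like q=0.5.
    offered = [part.split(";")[0].strip().lower()
               for part in accept_header.split(",")]
    if allow_protobuf and PROTOBUF_MEDIA_TYPE in offered:
        return "protobuf"
    # In every other case fall back to the text exposition format.
    return "text"
```

Both conditions have to hold: with the flag off, a request that accepts protobuf still gets the text format.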

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2024-01-23 13:12:07 +02:00
Piotr Dulikowski
79c3ed7fdb service: move topology mutation builder out of storage_service
The topology_mutation_builder, topology_node_mutation_builder and
topology_request_tracking_mutation_builder are currently used by
storage service - mainly, but not exclusively, by the topology
coordinator logic. As we are going to extract the topology coordinator
to a separate file, we need to move the builders to their own file as
well so that they will be accessible both by the topology coordinator
and the storage service.
2024-01-23 11:17:46 +01:00
Piotr Dulikowski
6f11651222 storage_service: detemplate topology_node_mutation_builder::set
One of the overloads of `topology_node_mutation_builder::set` is a
template which takes a std::set of things that convert to a sstring.
This was done to support sets of strings of different types (e.g.
sstring, string_view) but it turns out that only sstring is used at the
moment.

De-template the method as it is unnecessary for it to be a template.
Moreover, the `topology_node_mutation_builder` is going to be moved in
the next commit of the PR to a separate file, so not having template
methods makes the task simpler.
2024-01-23 11:17:46 +01:00
Nadav Har'El
830e52008d test/alternator: add more tests for TagResource
Issue #16904 discovered that Alternator refuses to allow an empty tag
value while it's useful (and DynamoDB allows it). This brought to my
attention that our test coverage of the TagResource operation was lacking.
So this patch adds more tests for some corner cases of TagResource which
we missed, including the allowed lengths of tag keys and values.

These tests reproduce #16904 (the case of empty tag value) and also #16908
(allowing and correctly counting unicode letters), and also add
regression testing to cases which we already handled correctly.

As usual, all the new tests also pass on DynamoDB.

Refs #16904
Refs #16908

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-01-23 11:55:22 +02:00
Nadav Har'El
08b26269d8 alternator: allow empty tag value
The existing code incorrectly forbade setting a tag on a table to an empty
string value, but this is allowed by DynamoDB and is useful, so we fix it
in this patch.

While at it, improve the error-checking code for tag parameters to
cleanly detect more cases (like missing or non-string keys or values).
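
The kind of checks described above could look roughly like the following sketch. It is not Alternator's implementation: the function and exception names are hypothetical, and the length limits (key of 1 to 128 characters, value of 0 to 256 characters) are assumptions based on DynamoDB's documented tag limits.

```python
# Hypothetical sketch of tag parameter validation; not Alternator's code.
MAX_TAG_KEY_LEN = 128    # assumed limit, in Unicode characters
MAX_TAG_VALUE_LEN = 256  # assumed limit, in Unicode characters


class ValidationError(Exception):
    pass


def validate_tags(tags):
    """Raise ValidationError unless every tag is a valid Key/Value pair."""
    for tag in tags:
        if not isinstance(tag, dict):
            raise ValidationError("each tag must be an object")
        key, value = tag.get("Key"), tag.get("Value")
        # Catches both missing and non-string keys or values.
        if not isinstance(key, str) or not isinstance(value, str):
            raise ValidationError("tag Key and Value must be strings")
        # len() counts Unicode code points, not encoded bytes.
        if not 1 <= len(key) <= MAX_TAG_KEY_LEN:
            raise ValidationError("tag Key has an invalid length")
        # An empty value is allowed; only the upper bound is checked.
        if len(value) > MAX_TAG_VALUE_LEN:
            raise ValidationError("tag Value is too long")
```

With these checks, `validate_tags([{"Key": "k", "Value": ""}])` passes, while a tag with a missing `Value` is rejected.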

The following patch is a test that fails before this patch (because
it fails to insert a tag with an empty value) and passes after it.

Fixes #16904.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-01-23 11:26:08 +02:00
Michał Jadwiszczak
49544c47a1 reader_concurrency_semaphore: add name of semaphore in tracing messages
2024-01-23 10:25:34 +01:00
Michał Jadwiszczak
aac90c1f92 cql3:query_processor: add logged user to query tracing info
2024-01-23 10:25:34 +01:00
Nadav Har'El
4d6b286345 test/alternator: add "--vnodes" option to run script
test/cql-pytest/run.py was recently modified to add the "tablets"
experimental feature, so test/alternator/run now also runs Scylla by
default with tablets enabled.

This is the correct default going forward, but in the short term it
would be nice to also have an option to easily do a manual test run
*without* tablets.

So this patch adds a "--vnodes" option to the test/alternator/run script.
This option causes "run" to run Scylla without enabling the "tablets"
experimental feature.
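
A rough sketch of how such an option can be wired into a runner script, assuming argparse is used. The helper name and the exact Scylla command-line spelling (`--experimental-features tablets`) are assumptions for illustration, not a copy of the real "run" script.

```python
import argparse


def build_scylla_args(argv):
    """Split runner options from test arguments (hypothetical helper)."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--vnodes", action="store_true",
                        help="run Scylla without the tablets feature")
    # Unknown arguments are passed through untouched (e.g. to pytest).
    opts, passthrough = parser.parse_known_args(argv)
    scylla_args = []
    if not opts.vnodes:
        # Default: enable tablets. The flag spelling here is an assumption.
        scylla_args += ["--experimental-features", "tablets"]
    return scylla_args, passthrough
```

For example, `build_scylla_args(["--vnodes", "test_tag.py"])` yields no extra Scylla arguments and passes `test_tag.py` through.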

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-01-23 10:53:23 +02:00
Nadav Har'El
c496d60716 alternator: use tablets by default, if available
Before this patch, Alternator tables did not use tablets even if this
feature was available - tablets had to be manually enabled per table
by using a tag. But recently we changed CQL to enable tablets by default
on all keyspaces (when the experimental "tablets" option is turned on),
so this patch does the same for Alternator tables:

1. When the "tablets" experimental feature is on, new Alternator tables
   will use tablets instead of vnodes. They will use the default choice
   of initial_tablets.

2. The same tag that in the past could be used to enable tablets on a
   specific table, now can be used to disable tablets or change the
   default initial_tablets for a specific table at creation time.
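
The two rules above can be summarized by the following sketch. The tag name is a placeholder because the message does not name it, and the value semantics (an integer overrides initial_tablets, any other value disables tablets) are an assumption made for the example.

```python
# Hypothetical sketch; the tag name and value semantics are assumptions.
PLACEHOLDER_TAG = "example:initial_tablets"  # not the real tag name


def choose_replication(tablets_feature_on: bool, tags: dict) -> dict:
    """Decide whether a new table uses tablets and with which initial count."""
    if not tablets_feature_on:
        return {"tablets": False}
    raw = tags.get(PLACEHOLDER_TAG)
    if raw is None:
        # Rule 1: no tag means tablets with the default initial_tablets.
        return {"tablets": True, "initial_tablets": None}
    try:
        # Rule 2a: a numeric tag value overrides the default count.
        return {"tablets": True, "initial_tablets": int(raw)}
    except ValueError:
        # Rule 2b: any other value opts this table out of tablets.
        return {"tablets": False}
```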

Fixes #16355

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-01-23 10:53:23 +02:00
Nadav Har'El
36f14f89df test/alternator: run some tests without tablets
If an Alternator table uses tablets (we'll turn this on in a following
patch), some tests are known to fail because of features not yet
supported with tablets, namely:

  Refs #16317 - Support Alternator Streams with tablets (CDC)
  Refs #16567 - Support Alternator TTL with tablets

This patch changes all tests failing on tablets due to one of these two
known issues to explicitly ask to disable tablets when creating their
test table. This means that at least we continue to test these two
features (Streams and TTL) even if they don't yet work with tablets.

We'll need to remember to remove this override when tablet support
for CDC and Alternator TTL arrives. I left a comment in the right
places in the code with the relevant issue numbers, to remind us what
to change when we fix those issues.

This patch also adds xfail_tablets and skip_tablets fixtures that can
be used to xfail or skip tests when running with tablets - but we
don't use them yet - and may never use them, but since I already wrote
this code it won't hurt having it, just in case. When running without
tablets, or against an older Scylla or on DynamoDB, the tests with
these marks are run normally.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-01-23 10:46:48 +02:00
Botond Dénes
08cf5ccd23 Merge 'Fix test_tablet_missing_data_repair' from Asias He
This PR fixes test_tablet_missing_data_repair and enables the test again.

If a node is not UP yet, repair in the test will be a partial repair. A partial repair does not repair all the data, which causes the check of rows after repair to fail. Check that nodes see each other as UP before repair.

Closes scylladb/scylladb#16930

* github.com:scylladb/scylladb:
  test: Enable test_tablet_missing_data_repair again
  test: Wait for nodes to be up when repair
  test: Check repair status in ScyllaRESTAPIClient
2024-01-23 10:38:13 +02:00
Anna Stuchlik
9076a944c5 doc: improve the ScyllaDB for Developers page
This commit improves the developer-oriented section
of the core documentation:

- Added links to the developer sections in the new
  Get Started guide (Develop with ScyllaDB and
  Tutorials and Example Projects) for ease of access.

- Replaced the outdated Learn to Use ScyllaDB page with
  a link to the up-to-date page in the Get Started guide.
  This involves removing the learn.rst file and adding
  an appropriate redirection.

- Removed the Apache Copyrights, as this page does not
  need it.

- Removed the Features panel box as there was only one
  feature listed, which looked weird. Also, we are in
  the process of removing the Features section.

Closes scylladb/scylladb#16800
2024-01-23 10:06:31 +02:00
Kefu Chai
ac473eca91 utils: add formatter for enum_option
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for enum_option<>. since its
operator<<() is still used by the homebrew generic formatter for
formatting vector<>, operator<<() is preserved.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16917
2024-01-23 10:03:51 +02:00
Kefu Chai
91a93b125b utils: add formatter for cql3::authorized_prepared_statements_cache_key
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for
cql3::authorized_prepared_statements_cache_key, and remove its
operator<<().

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16924
2024-01-23 09:13:14 +02:00
Kefu Chai
76b9e4f4f4 locator: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16914
2024-01-23 09:12:23 +02:00
Asias He
99e3d2ce72 test: Enable test_tablet_missing_data_repair again
Fixes #16859
2024-01-23 15:02:02 +08:00
Kefu Chai
db77587309 tracing: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16925
2024-01-23 08:57:11 +02:00
Kefu Chai
26004071b3 configure.py: reenable -Wnarrowing
it seems that the tree builds just fine with this warning enabled.
and narrowing is a potentially unsafe numeric conversion. so let's
enable this warning option.

this change also helps to reduce the difference between the rules
generated by configure.py and those generated by CMake.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16929
2024-01-23 08:49:25 +02:00
Kefu Chai
5005e0a156 configure.py: s/--std=/-std=/
neither clang nor gcc supports the --std flag, they support -std=
though. see https://clang.llvm.org/cxx_status.html and
https://gcc.gnu.org/onlinedocs/gcc/C-Dialect-Options.html
so, let's use the -std=gnu++20 for the C++20 standard with GNU
extensions.

this change also helps to reduce the difference between the rules
generated by `configure.py` and those generated by CMake.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16928
2024-01-23 08:48:05 +02:00
Asias He
7c230f17cc test: Wait for nodes to be up when repair
If a node is not UP yet, repair in the test will be a partial repair.
Check nodes see each other as UP before repair.
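
A minimal sketch of such a pre-repair check, assuming the test can obtain, for each node, the set of peers it currently reports as UP. The helper names and the polling parameters are hypothetical and do not come from the test framework.

```python
import time


def all_nodes_see_each_other_up(live_views: dict) -> bool:
    """live_views maps each node to the set of nodes it reports as UP."""
    everyone = set(live_views)
    # Every node must see all the other nodes as UP.
    return all(everyone - {node} <= seen for node, seen in live_views.items())


def wait_until(predicate, timeout=60.0, period=0.5,
               sleep=time.sleep, clock=time.monotonic):
    """Poll predicate until it returns True or the timeout expires."""
    deadline = clock() + timeout
    while True:
        if predicate():
            return
        if clock() >= deadline:
            raise TimeoutError("condition not reached before the deadline")
        sleep(period)
```

A test would call `wait_until` with a predicate that refreshes the views and applies `all_nodes_see_each_other_up`, and only then start the repair.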

Fixes #16859
2024-01-23 11:10:08 +08:00
Asias He
57a4e5594d test: Check repair status in ScyllaRESTAPIClient
Raise an exception in case the repair is not successful.
2024-01-23 11:10:08 +08:00
Tomasz Grabiec
06c42681bd tests: tablets: Add tests for removenode and replace
2024-01-23 01:19:42 +01:00
Tomasz Grabiec
e5dcf03b88 tablets: Add support for removenode and replace handling
New tablet replicas are allocated synchronously with node
operations. They are safely rebuilt from all existing replicas.
The list of ignored nodes passed to node operations is respected.

Tablet scheduler is responsible for scheduling tablet transition which
changes the replicas set. The infrastructure for handling decommission
in tablet scheduler is reused for this.

Scheduling is done incrementally, respecting per-shard load
limits. Rebuilding transitions are recognized by load calculation to
affect all tablet replicas.

New kind of tablet transition is introduced called "rebuild" which
adds new tablet replica and rebuilds it from existing replicas. Other
than that, the transition goes through the same stages as regular
migration to ensure safe synchronization with request coordinators.

In this PR we simply stream from all tablet replicas. Later we should
switch to calling repair to avoid sending excessive amounts of data.

Fixes #16690.
2024-01-23 01:19:42 +01:00
Tomasz Grabiec
bdd5bdae14 topology_coordinator: tablets: Do not fail in a tight loop
If streaming or cleanup RPC fails, we would retry immediately. That
fills the logs with errors. Throttle them by sleeping on error before
the same action is retried.
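
The throttling idea can be sketched as follows. This is a generic illustration rather than the coordinator's code: the function name, the fixed delay and the attempt limit are assumptions added for the example.

```python
import time


def run_with_throttled_retries(action, retry_delay=1.0, max_attempts=5,
                               sleep=time.sleep, log=print):
    """Retry action, sleeping after each failure instead of spinning."""
    for attempt in range(1, max_attempts + 1):
        try:
            return action()
        except Exception as exc:
            log(f"attempt {attempt} failed: {exc}")
            if attempt == max_attempts:
                raise
            # Sleeping here bounds the rate of retries and of error logs.
            sleep(retry_delay)
```

Without the sleep, a persistently failing RPC would be retried and logged as fast as the loop can run.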
2024-01-23 01:19:42 +01:00
Tomasz Grabiec
a3f6682ba2 topology_coordinator: tablets: Avoid warnings about ignored failed future
2024-01-23 01:18:10 +01:00
Tomasz Grabiec
5fccee3a13 storage_service, topology: Track excluded state in locator::topology
Will be used by tablet load balancer to avoid excluded nodes in
scheduling.
2024-01-23 01:12:58 +01:00
Tomasz Grabiec
d59db94f3c raft topology: Introduce param-less topology::get_excluded_nodes()
Picks up currently excluded nodes. Will be used during tablet rebuild
on removenode.
2024-01-23 01:12:58 +01:00
Tomasz Grabiec
d053c5ef1e raft topology: Move get_excluded_nodes() to topology
Will be accessed outside topology coordinator from tablet rebuild handler.
2024-01-23 01:12:58 +01:00
Tomasz Grabiec
92f01674f2 tablets: load_balancer: Generalize load tracking
This patch removes some duplication of logic and implicit assumptions
by creating clear algebra for load impact calculation and its
application to state of the load balancer.

Will make adding new kinds of tablet transitions with different impact
on load much easier.
2024-01-23 01:12:57 +01:00
Tomasz Grabiec
649ca0e46c tablets: Introduce get_migration_streaming_info() which works on migration request
Will be used by tablet load balancer to compute impact on load of
planned migrations. Currently, the logic is hard coded in the load
balancer and may get out of sync with the logic we have in
get_migration_streaming_info() for already running tablet transitions.

The logic will become more complex for rebuild transition, so use
shared code to compute it.
2024-01-23 01:12:57 +01:00
Tomasz Grabiec
6dc56fd80b tablets: Move migration_to_transition_info() to tablets.hh
2024-01-23 01:12:57 +01:00
Tomasz Grabiec
1df256221c tablets: Extract get_new_replicas() which works on migration request
Now we have a single place which translates tablet migration request to new
replicas.

Will be reused in other places.
2024-01-23 01:12:57 +01:00
Tomasz Grabiec
ae382196f1 tablets: Move tablet_migration_info to tablets.hh
Will add methods which operate on it to tablets.hh where they belong.
2024-01-23 01:12:57 +01:00
Tomasz Grabiec
4a06ffb43c tablets: Store transition kind per tablet
Will be used to distinguish regular migration from rebuild, repair and
RF change.
2024-01-23 01:12:57 +01:00
Pavel Emelyanov
d1d4620af8 config: Add --tablets-initial-scale-factor
The previous patch taught the tablet allocator to multiply the initial
tablet count by some value. This patch makes this factor configurable.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-01-22 19:18:18 +03:00
Pavel Emelyanov
eb3b237e05 tablet_allocator: Add initial tablets scale to config
When allocating tablets for a table for the first time, their initial count
is calculated so that each shard in the cluster gets one tablet. More than
one initial tablet per shard may be preferable in some cases, e.g. perf
tests typically rely on that.

It's possible to specify the initial tablet count when creating a
keyspace, but this number doesn't take the cluster topology into
consideration, so it is not a convenient solution either.

As a temporary solution (e.g. for perf tests), add a configuration option
that scales the calculated initial number of tablets by some factor.
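
As a worked illustration of the calculation described above, the sketch below uses one tablet per shard as the baseline and multiplies it by the factor. The function name is hypothetical, and the real allocator may apply further adjustments (for example for replication or rounding) that are ignored here.

```python
import math


def initial_tablet_count(shards_per_node, scale_factor=1.0):
    """One tablet per shard in the cluster, multiplied by the scale factor."""
    total_shards = sum(shards_per_node)
    # Never return fewer than one tablet, even for tiny factors.
    return max(1, math.ceil(total_shards * scale_factor))
```

For a cluster of three nodes with 8 shards each, a factor of 1 gives 24 initial tablets and a factor of 2 gives 48.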

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-01-22 19:14:45 +03:00
Pavel Emelyanov
f57b194db0 tablet_allocator: Add config
The tablet allocator is a sharded service that starts in main, so it's worth
equipping it with a config. The next patches will fill it with some payload.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-01-22 19:13:58 +03:00
Kamil Braun
3268be3860 raft: server: track last persisted snapshot descriptor index
Also introduce a condition variable notified whenever this index is
updated.
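
The pattern of tracking an index and notifying waiters on every update can be sketched as below. This uses Python threads purely for illustration; the class and method names are hypothetical and the actual raft server is written in C++ on top of Seastar.

```python
import threading


class SnapshotIndexTracker:
    """Tracks the last persisted snapshot index and wakes up waiters."""

    def __init__(self):
        self._cond = threading.Condition()
        self.last_persisted_index = 0

    def update(self, index):
        with self._cond:
            self.last_persisted_index = index
            # Notify on every update so waiters can re-check their condition.
            self._cond.notify_all()

    def wait_for_index(self, index, timeout=None):
        """Block until the tracked index reaches index; False on timeout."""
        with self._cond:
            return self._cond.wait_for(
                lambda: self.last_persisted_index >= index, timeout)
```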

Will be used in following commits.
2024-01-22 16:48:08 +01:00
Kamil Braun
1e786d9d64 raft: server: framework for handling server requests
Add data structures and modify `io_fiber` code to prepare it for
handling requests generated by the `server`, not just `fsm`.
Used in later commits.
2024-01-22 16:47:34 +01:00
Kefu Chai
33794eca19 database: wait until commitlog are reclaimed in flush_all_tables()
this change addresses the possible data resurrection after
"nodetool compact" and "nodetool flush" commands, and prepares for
the fix of a similar data resurrection issue after "nodetool cleanup".

active commitlog segments are recycled in the background once they are
discarded.

and there is a chance that we could have data resurrection even after
"nodetool cleanup", because the mutations in commitlog's active segments
could change the tables which are supposed to be removed by
"nodetool cleanup", so as a solution to address this problem in the
pre-tablets era, we force new active segments of commitlog, and flush the
involved memtables. since the active segments are discarded in the
background, the completion of the "nodetool cleanup" does not guarantee
that these mutations won't be applied to memtable when server restarts,
if it is killed right away.

the same applies to "force_flush", "force_compaction" and
"force_keyspace_compaction" API calls which are used by nodetool as
well. quote from Benny's comment

> If major compaction doesn't wait for the commitlog deletion it is
> also exposed to data resurrection since theoretically it could purge
> tombstones based on the assumption that commitlog would not resurrect
> data that they might shadow, BUT on a crash/restart scenario commitlog
> replay would happen since the commitlog segments weren't deleted -
> breaking the contract with compaction.

so to ensure that the active segments are reclaimed upon completion of
"nodetool cleanup", "nodetool compact" and "nodetool flush" commands,
let's wait for pending deletes in `database::flush_all_tables()`, so the
caller waits until the reclamation of deleted active segments completes.

Refs #4734
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16915
2024-01-22 17:31:57 +02:00
David Garcia
f3eeba8cc6 docs: parse config.cc properties as rst text
This enhancement formats descriptions in config.cc using the standard markup language reStructuredText (RST).

By doing so, it improves the rendering of these descriptions in the documentation, allowing you to use various directives like admonitions, code blocks, ordered lists, and more.

Closes scylladb/scylladb#16311
2024-01-22 16:40:18 +02:00
Botond Dénes
a48881801a replica/tablets: drop keyspace_name from system.tablets partition-key
The name of the keyspace being part of the partition key is not useful,
the table_id already uniquely identifies the table. The keyspace name
being part of the key means that code wanting to interact with this
table often has to resolve the table id just to be able to provide the
keyspace name. This is counterproductive, so make the keyspace_name
just a static column instead, just like table_name already is.

Fixes: #16377

Closes scylladb/scylladb#16881
2024-01-22 13:12:02 +01:00
Petr Gusev
6a4176c84f Update seastar submodule
* seastar 8b9ae36b...85359b28 (4):
  > rpc: extend the use_gate until request processing is finished

Fixes scylladb/scylladb#16382

  > scripts: Remove build.sh
  > build: do not install FindProtobuf.cmake
  > net: add missing include

Closes scylladb/scylladb#16883
2024-01-22 11:29:50 +01:00
Kamil Braun
1007ac4956 Merge 'sync_raft_topology_nodes: force_remove_endpoint for left nodes only if an IP is not used by other nodes' from Petr Gusev
Before the patch we called `gossiper.remove_endpoint` for IP-s of the
left nodes. The problem is that in replace-with-same-ip scenario we
called `gossiper.remove_endpoint` for IP which is used by the new,
replacing node. The `gossiper.remove_endpoint` method puts the IP into
quarantine, which means gossiper will ignore all events about this IP
for `quarantine_delay` (one minute by default). If we immediately
replace the just-replaced node with the same IP again, the bootstrap will
fail since the gossiper events are blocked for this IP, and we won't be
able to resolve an IP for the new host_id.

Another problem was that we called gossiper.remove_endpoint method,
which doesn't remove an endpoint from `_endpoint_state_map`, only from
live and unreachable lists. This means the IP will keep circulating in
the gossiper message exchange between cluster nodes until full cluster
restart.

This patch fixes both of these problems. First, we rely on the fact that
when topology coordinator moves the `being_replaced` node to the left
state, the IP of the `replacing` node is known to all nodes. This means
before removing an IP from the gossiper we can check if this IP is
currently used by another node in the current raft topology. This is
done by constructing the `used_ips` map based on normal and transition
nodes. This map is cached to avoid quadratic behaviour.

Second, we call `gossiper.force_remove_endpoint`, not
`gossiper.remove_endpoint`. This function removes an IP from
`_endpoint_state_map`, as well as from live and unreachable lists.
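
The first part of the fix can be illustrated with the sketch below. It is a simplification with hypothetical names: nodes are plain host_id to IP dictionaries, and `used_ips` is a set rather than the map mentioned above.

```python
def ips_safe_to_remove(left_nodes, normal_nodes, transition_nodes):
    """Return IPs of left nodes that no normal or transition node uses.

    Each argument maps host_id -> IP address.
    """
    # Built once, so each left node is checked in O(1) instead of rescanning.
    used_ips = set(normal_nodes.values()) | set(transition_nodes.values())
    return [ip for ip in left_nodes.values() if ip not in used_ips]
```

In a replace-with-same-IP scenario the old node's IP is still in `used_ips` because the replacing node owns it, so it is not removed from the gossiper and not quarantined.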

Closes scylladb/scylladb#16820

* github.com:scylladb/scylladb:
  get_peer_info_for_update: update only required fields in raft topology mode
  get_peer_info_for_update: introduce set_field lambda
  storage_service::on_change: fix indent
  storage_service::on_change: skip handle_state functions in raft topology mode
  test_replace_different_ip: check old IP is removed from gossiper
  test_replace: check two replace with same IP one after another
  storage_service: sync_raft_topology_nodes: force_remove_endpoint for left nodes only if an IP is not used by other nodes
2024-01-22 11:25:55 +01:00
Botond Dénes
742bc1bd11 test/topology_experimental_raft: test_tablet.py: disable flaky test
Skip test_tablet_missing_data_repair: it is failing a lot, breaking
promotion and CI. It can't be reverted because other changes were already
piled on top of the PR introducing it, so disable it while the failure is
investigated.

Refs: #16859

Closes scylladb/scylladb#16879
2024-01-22 11:49:05 +02:00
Avi Kivity
9e8b65f587 chunked_vector: remove range constructor
Standard containers don't have constructors that take ranges;
instead people use boost::copy_range or C++23 std::ranges::to.

Make the API more uniform by removing this special constructor.

The only caller, in a test, is adjusted.

Closes scylladb/scylladb#16905
2024-01-22 10:26:15 +02:00
Lakshmi Narayanan Sreethar
a1867986e7 test.py: deduce correct path for unit tests when built with cmake
Fix the path deduction for unit test executables when the source code is
built with cmake.

Fixes #16906

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>

Closes scylladb/scylladb#16907
2024-01-22 10:03:44 +02:00
Nadav Har'El
0bef50ef0c cql-pytest: add "--vnodes" option to "run" script
Running test/cql-pytest/run now defaults to enabling the "tablets"
experimental feature when running Scylla - and tests detect this and
use this feature as appropriate. This is the correct default going
forward, but in the short term it would be nice to also have an
option to easily do a manual test run *without* tablets.

So this patch adds a "--vnodes" option to the test/cql-pytest/run
script. This option causes "run" to run Scylla without enabling the
"tablets" experimental feature.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#16896
2024-01-22 09:35:11 +02:00
Anna Stuchlik
a462b914cb doc: add 2024.1 to the OSS vs. Enterprise matrix
This commit adds the information that
ScyllaDB Enterprise 2024.1 is based
on ScyllaDB Open Source 5.4
to the OSS vs. Enterprise matrix.

Closes scylladb/scylladb#16880
2024-01-22 09:25:08 +02:00
Kefu Chai
9550f29d22 cql3: add formatter for cql3::prepared_cache_key_type
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for cql3::prepared_cache_key_type
and cql3::prepared_cache_key_type::cache_key_type, and remove
their operator<<().

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16901
2024-01-21 19:12:59 +02:00