Currently, rows in system.commitlog_cleanups are only dropped on node restart,
so the table can accumulate an unbounded number of records.
This probably isn't a problem in practice, because tablet cleanups aren't that
frequent, but this patch adds a countermeasure anyway.
This patch makes the choice to delete the unneeded records right when new records
are added. This isn't ideal -- it would be more natural if the unneeded records
were deleted as soon as they become unneeded -- but it does the job with a
minimal amount of code.
Add a helper function which returns the minimum replay position
across all existing or future commitlog segments.
Only positions greater or equal to it can be replayed on the next reboot.
We will use this helper in a future patch to garbage collect some cleanup
metadata which refers to replay positions.
To avoid data resurrection after cleanup, we have to filter out the
cleaned mutations during commitlog replay.
In this patch, we get tablet cleanup to record the affected set of mutations
to system.commitlog_cleanups. In a later patch, we will use these records
for filtering during commitlog replay.
The `topology_coordinator` is a large class (>1000 loc) which resides in
an even larger source file (storage_service.cc, ~7800 loc). This PR
moves the topology_coordinator class out of the storage_service.cc file
in order to improve modularity and recompilation times during
development.
As a first step, the `topology_mutation_builder` and
`topology_node_mutation_builder` classes are also moved from
storage_service.cc to their own, new header/source files as they are an
important abstraction used both by the topology coordinator code and
some other code in storage_service.cc that won't be moved.
Then, the `topology_coordinator` is moved out. The
`topology_coordinator` class is completely hidden in the new
topology_coordinator.cc file and can only be started and waited on to
finish via the new `run_topology_coordinator` function.
Fixes: scylladb/scylladb#16605Closesscylladb/scylladb#16609
* github.com:scylladb/scylladb:
service: move topology coordinator to a separate file
storage_service: introduce run_topology_coordinator function
service: move topology mutation builder out of storage_service
storage_service: detemplate topology_node_mutation_builder::set
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
cql3::statements::statement_type. and its operator<<() is dropped.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#16948
The topology coordinator is a large class that sits in an even larger
storage_service.cc file. For the sake of code modularization and
reducing recompilation time, move the topology coordinator outside
storage_service.cc.
The topology_coordinator class is moved to the new
topology_coordinator.cc unchanged. Along with it, the following items
are moved:
- wait_for_ip function - it's used both by storage_service and
topology_coordinator, so in order for the new topology_coordinator.cc
not to depend on storage service, it is moved to the new file,
- raft_topology logger - for the same reason as wait_for_ip,
- run_topology_coordinator - serves as the main interface for the
topology coordinator. The topology coordinator class is not exposed at
all, it's only possible to start the coordinator and wait until it
shuts down itself via that function.
Extracts a part of the logic of the raft_state_monitor_fiber method into
a separate function. It will be moved to a separate file in the next
commit along with the topology coordinator, and will serve as the only
way of interaction with the topology coordinator while the class itself
will remain hidden.
The topology_coordinator class is now directly constructed on the stack
(or rather in the coroutine frame), the indirection via shared_ptr is no
longer needed.
Before introduction of PR#15524 the removal had always been invoked
via finally() continuation. In spite of making flush() noexcept, the
mentioned PR modified the logic. If flush() returns exceptional future,
then the removal is not performed.
This change restores the old behavior - removal operation is always called.
Since now, the logic of compaction_group::stop() is as follows:
- firstly, it waits for completion of flush() via
seastar::coroutine::as_future() to avoid premature exception
- then it executes compaction_manager.remove()
- in the end it inspects the future returned from flush()
to re-throw the exception if the operation failed
Fixed: scylladb#16751
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
Closesscylladb/scylladb#16940
Alternator incorrectly refuses an empty tag value for TagResource, but DynamoDB does allow this case and it's useful (note that an empty tag key is rightly forbidden). So this short series fixes this case, and adds additional tests for TagResource which covers this case and other cases we forgot to cover in tests.
Fixes#16904.
Closesscylladb/scylladb#16910
* github.com:scylladb/scylladb:
test/alternator: add more tests for TagResource
alternator: allow empty tag value
There are currently two options how to "request" the number of initial tables for a table
1. specify it explicitly when creating a keyspace
2. let scylla calculate it on its own
Both are not very nice. The former doesn't take cluster layout into consideration. The latter does, but starts with one tablet per shard, which can be too low if the amount of data grows rapidly.
Here's a (maybe temporary) proposal to facilitate at least perf tests -- the --tablets-initial-scale-factor option that enhances the option number two above by multiplying the calculated number of tablets by the configured number. This is what we currently do to run perf tests by patching scylla, with the option it going to be more convenient.
Closesscylladb/scylladb#16919
* github.com:scylladb/scylladb:
config: Add --tablets-initial-scale-factor
tablet_allocator: Add initial tablets scale to config
tablet_allocator: Add config
The topology_mutation_builder, topology_node_mutation_builder and
topology_request_tracking_mutation_builder are currently used by
storage service - mainly, but not exclusively, by the topology
coordinator logic. As we are going to extract the topology coordinator
to a separate file, we need to move the builders to their own file as
well so that they will be accessible both by the topology coordinator
and the storage service.
One of the overloads of `topology_node_mutation_builder::set` is a
template which takes a std::set of things that convert to a sstring.
This was done to support sets of strings of different types (e.g.
sstring, string_view) but it turns out that only sstring is used at the
moment.
De-template the method as it is unnecessary for it to be a template.
Moreover, the `topology_node_mutation_builder` is going to be moved in
the next commit of the PR to a separate file, so not having template
methods makes the task simpler.
Issue #16904 discovered that Alternator refuses to allow an empty tag
value while it's useful (and DynamoDB allows it). This brought to my
attention that our test coverage of the TagResource operation was lacking.
So this patch adds more tests for some corner cases of TagResource which
we missed, including the allowed lengths of tag keys and values.
These tests reproduce #16904 (the case of empty tag value) and also #16908
(allowing and correctly counting unicode letters), and also add
regression testing to cases which we already handled correctly.
As usual, all the new tests also pass on DynamoDB.
Refs #16904
Refs #16908
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
The existing code incorrectly forbid setting a tag on a table to an empty
string value, but this is allowed by DynamoDB and is useful, so we fix it
in this patch.
While at it, improve the error-checking code for tag parameters to
cleanly detect more cases (like missing or non-string keys or values).
The following patch is a test that fails before this patch (because
it fails to insert a tag with an empty value) and passes after it.
Fixes#16904.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
This PR fixes test_tablet_missing_data_repair and enable the test again.
If a node is not UP yet, repair in the test will be a partial repair. The partial repair will not repair all the data which cause the check of rows after repair to fail. Check nodes see each other as UP before repair.
Closesscylladb/scylladb#16930
* github.com:scylladb/scylladb:
test: Enable test_tablet_missing_data_repair again
test: Wait for nodes to be up when repair
test: Check repair status in ScyllaRESTAPIClient
This commit improves the developer-oriented section
of the core documentation:
- Added links to the developer sections in the new
Get Started guide (Develop with ScyllaDB and
Tutorials and Example Projects) for ease of access.
- Replaced the outdated Learn to Use ScyllaDB page with
a link to the up-to-date page in the Get Started guide.
This involves removing the learn.rst file and adding
an appropriate redirection.
- Removed the Apache Copyrights, as this page does not
need it.
- Removed the Features panel box as there was only one
feature listed, which looked weird. Also, we are in
the process of removing the Features section.
Closesscylladb/scylladb#16800
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for enum_option<>. since its
operator<<() is still used by the homebrew generic formatter for
formatting vector<>, operator<<() is preserved.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#16917
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
cql3::authorized_prepared_statements_cache_key, and remove its
operator<<().
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#16924
it seems that the tree builds just fine with this warning enabled.
and narrowing is a potentially unsafe numeric conversion. so let's
enable this warning option.
this change also helps to reduce the difference between the rules
generated by configure.py and those generated by CMake.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#16929
Previous patch taught tablets allocator to multiply the initial tablets
count by some value. This patch makes this factor configurable
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
When allocating tablets for table for the frist time their initial count
is calculated so that each shard in a cluster gets one tablet. It may
happen that more than one initial tablet per shard is better, e.g. perf
tests typically rely on that.
It's possible to specify the initial tablets count when creating a
keyspace, this number doesn't take the cluster topology into
consideration and may also be not very nice.
As a temporary solution (e.g. for perf tests) we may add a configurable
that scales the initial number of calculated tablets by some factor
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Tablet allocator is a sharded service, that starts in main, it's worth
equipping it with a config. Next patches will fill it with some payload
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
this change addresses the possible data resurrection after
"nodetool compact" and "nodetool flush" commands. and prepare for
the fix of a similar data resurrection issue after "nodetool cleanup".
active commitlog segments are recycled in the background once they are
discarded.
and there is a chance that we could have data resurrection even after
"nodetool cleanup", because the mutations in commitlog's active segments
could change the tables which are supposed to be removed by
"nodetool cleanup", so as a solution to address this problem in the
pre-tablets era, we force new active segments of commitlog, and flush the
involved memtables. since the active segments are discarded in the
background, the completion of the "nodetool cleanup" does not guarantee
that these mutation won't be applied to memtable when server restarts,
if it is killed right away.
the same applies to "force_flush", "force_compaction" and
"force_keyspace_compaction" API calls which are used by nodetool as
well. quote from Benny's comment
> If major comapction doesn't wait for the commitlog deletion it is
> also exposed to data resurrection since theoretically it could purge
> tombstones based on the assumption that commitlog would not resurrect
> data that they might shadow, BUT on a crash/restart scenario commitlog
> replay would happen since the commitlog segments weren't deleted -
> breaking the contract with compaction.
so to ensure that the active segments are reclaimed upon completion of
"nodetool cleanup", "nodetool compact" and "nodetool flush" commands,
let's wait for pending deletes in `database::flush_all_tables()`, so the
caller wait until the reclamation of deleted active segments completes.
Refs #4734
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#16915
This enhancement formats descriptions in config.cc using the standard markup language reStructuredText (RST).
By doing so, it improves the rendering of these descriptions in the documentation, allowing you to use various directives like admonitions, code blocks, ordered lists, and more.
Closesscylladb/scylladb#16311
The name of the keyspace being part of the partition key is not useful,
the table_id already uniquely identifies the table. The keyspace name
being part of the key, means that code wanting to interact with this
table, often has to resolve the table id, just to be able to provide the
keyspace name. This is counter productive, so make the keyspace_name
just a static column instead, just like table_name already is.
Fixes: #16377Closesscylladb/scylladb#16881
Before the patch we called `gossiper.remove_endpoint` for IP-s of the
left nodes. The problem is that in replace-with-same-ip scenario we
called `gossiper.remove_endpoint` for IP which is used by the new,
replacing node. The `gossiper.remove_endpoint` method puts the IP into
quarantine, which means gossiper will ignore all events about this IP
for `quarantine_delay` (one minute by default). If we immediately
replace just replaced node with the same IP again, the bootstrap will
fail since the gossiper events are blocked for this IP, and we won't be
able to resolve an IP for the new host_id.
Another problem was that we called gossiper.remove_endpoint method,
which doesn't remove an endpoint from `_endpoint_state_map`, only from
live and unreachable lists. This means the IP will keep circulating in
the gossiper message exchange between cluster nodes until full cluster
restart.
This patch fixes both of these problems. First, we rely on the fact that
when topology coordinator moves the `being_replaced` node to the left
state, the IP of the `replacing` node is known to all nodes. This means
before removing an IP from the gossiper we can check if this IP is
currently used by another node in the current raft topology. This is
done by constructing the `used_ips` map based on normal and transition
nodes. This map is cached to avoid quadratic behaviour.
Second, we call `gossiper.force_remove_endpoint`, not
`gossiper.remove_endpoint`. This function removes and IP from
`_endpoint_state_map`, as well as from live and unreachable lists.
Closesscylladb/scylladb#16820
* github.com:scylladb/scylladb:
get_peer_info_for_update: update only required fields in raft topology mode
get_peer_info_for_update: introduce set_field lambda
storage_service::on_change: fix indent
storage_service::on_change: skip handle_state functions in raft topology mode
test_replace_different_ip: check old IP is removed from gossiper
test_replace: check two replace with same IP one after another
storage_service: sync_raft_topology_nodes: force_remove_endpoint for left nodes only if an IP is not used by other nodes
Skip test_tablet_missing_data_repair, it is failing a lot breaking
promotion and CI. Can't revert because the PR introducing it was already
piled on. So disable while investigated.
Refs: #16859Closesscylladb/scylladb#16879
Standard containers don't have constructors that take ranges;
instead people use boost::copy_range or C++23 std::ranges::to.
Make the API more uniform by removing this special constructor.
The only caller, in a test, is adjusted.
Closesscylladb/scylladb#16905
Running test/cql-pytest/run now defaults to enabling the "tablets"
experimental feature when running Scylla - and tests detect this and
use this feature as appropriate. This is the correct default going
forward, but in the short term it would be nice to also have an
option to easily do a manual test run *without* tablets.
So this patch adds a "--vnodes" option to the test/cql-pytest/run
script. This option causes "run" to run Scylla without enabling the
"tablets" experimental feature.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#16896
This commit adds the information that
ScyllaDB Enterprise 2024.1 is based
on ScyllaDB Open Source 5.4
to the OSS vs. Enterprise matrix.
Closesscylladb/scylladb#16880
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for cql3::prepared_cache_key_type
and cql3::prepared_cache_key_type::cache_key_type, and remove
their operator<<().
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#16901
This PR:
- Removes the redundant information about previous versions from the Create Cluster page.
- Fixes language mistakes on that page, and replaces "Scylla" with "ScyllaDB".
(nobackport)
Closesscylladb/scylladb#16885
* github.com:scylladb/scylladb:
doc: fix the language on the Create Cluster page
doc: remove reduntant info about old versions
When a base table changes and altered, so does the views that might
refer to the added column (which includes "SELECT *" views and also
views that might need to use this column for rows lifetime (virtual
columns).
However the query processor implementation for views change notification
was an empty function.
Since views are tables, the query processor needs to at least treat them
as such (and maybe in the future, do also some MV specific stuff).
This commit adds a call to `on_update_column_family` from within
`on_update_view`.
The side effect true to this date is that prepared statements for views
which changed due to a base table change will be invalidated.
Fixes https://github.com/scylladb/scylladb/issues/16392
This series also adds a test which fails without this fix and passes when the fix is applied.
Closesscylladb/scylladb#16897
* github.com:scylladb/scylladb:
Add test for mv prepared statements invalidation on base alter
query processor: treat view changes at least as table changes