This PR:
- Removes the redundant information about previous versions from the Create Cluster page.
- Fixes language mistakes on that page, and replaces "Scylla" with "ScyllaDB".
(nobackport)
Closesscylladb/scylladb#16885
* github.com:scylladb/scylladb:
doc: fix the language on the Create Cluster page
doc: remove reduntant info about old versions
When a base table changes and altered, so does the views that might
refer to the added column (which includes "SELECT *" views and also
views that might need to use this column for rows lifetime (virtual
columns).
However the query processor implementation for views change notification
was an empty function.
Since views are tables, the query processor needs to at least treat them
as such (and maybe in the future, do also some MV specific stuff).
This commit adds a call to `on_update_column_family` from within
`on_update_view`.
The side effect true to this date is that prepared statements for views
which changed due to a base table change will be invalidated.
Fixes https://github.com/scylladb/scylladb/issues/16392
This series also adds a test which fails without this fix and passes when the fix is applied.
Closesscylladb/scylladb#16897
* github.com:scylladb/scylladb:
Add test for mv prepared statements invalidation on base alter
query processor: treat view changes at least as table changes
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define a formatter for cql3::ut_name, and remove
their operator<<().
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#16890
Issue #16392 describes a bug where when a base table is altered, it's
materialized views prepared statements are not invalidated which in turn
causes them to return missing data.
This test reproduces this bug and serves as a regression test for this
problem.
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
When a base table changes and altered, so does the views that might
refer to the added column (which includes "SELECT *" views and also
views that might need to use this column for rows lifetime (virtual
columns).
However the query processor implementation for views change notification
was an empty function.
Since views are tables, the query processor needs to at least treat them
as such (and maybe in the future, do also some MV specific stuff).
This commit adds a call to `on_update_column_family` from within
`on_update_view`.
The side effect true to this date is that prepared statements for views
which changed due to a base table change will be invalidated.
Fixes#16392
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Some fields of system.peers table are updated
through raft, we don't need to peek them from gossiper.
The goal of the patch is to declare explicitly
which code is responsible for which fields.
In particular, in raft topology mode we don't
need to update raft-managed fields since
it's done in topology_state_load and
raft_ip_address_updater.
This is a refactoring commit. In the next commit
we'll add a parameter to this unified lambda and
this is easy to do if we have only one lambda and
not three.
We don't need them in raft topology mode since the token_metadata
update happens in topology_state_load function. We lift the
_raft_topology_change_enabled checks from those functions to on_change.
In this commit we modify the existing
test_replace_different_ip. We add the check that the old
IP is not contained in alive or down lists, which
means it's completely wiped from gossiper. This test is failing
without the force_remove_endpoint fix from
a previous commit. We also check that the state of
local system.peers table is correct.
This commit removes the upgrade guides
from ScyllaDB Open Source to Enterprise
for versions we no longer support.
In addition, it removes a link to
one of the removed pages from
the Troubleshooting section (the link is
redundant).
Closesscylladb/scylladb#16249
This mini-set includes code coverage support for ScyllaDB, it provides:
1. Support for building ScyllaDB with coverage support.
2. Utilities for processing coverage profiling data
3. test.py support for generation and processing of coverage profiling into an lcov trace files which can later be used to produce HTML or textual coverage reports.
Refs #16323Closesscylladb/scylladb#16784
* github.com:scylladb/scylladb:
Add code coverage documentation
test.py: support code coverage
code coverage: Add libraries for coverage handling
test.py: support --coverage and --coverage-mode
configure.py support coverage profiles on standrad build modes
Currently if topology coordinator gets stuck in a CI test run it's hard to debug this (e.g. scylladb/scylladb#16708). We can add a lot of logging inside topology coordinator code to aid debugging, without spamming the logs -- these are relatively rare control plane events.
Closesscylladb/scylladb#16749
* github.com:scylladb/scylladb:
test/pylib: scylla_cluster: enable raft_topology=debug level by default
raft topology: increase level of some TRACE messages
raft topology: log when entering transition states
raft topology: don't include null ID in exclude_nodes
raft topology: INFO log when executing global commands and updating topology state
storage_service: separate logger for raft topology
Add `--experimental-features=tablets` to both `test/cql-pytest/suite.yaml` and `test/cql-pytest/run.py`, so tablets are enabled. Detect tablet support in `contest.py` and add an xfail and skip marker to mark tests that fail/crash with tablets. These are expected to be fixed soon.
Some tests checking things around alter-keyspace, had to force-disable tablets on the created keyspace, because tablets interfere with the test (a keyspace with tablets cannot have simple strategy for example).
Tablets were also interfering with `test_keyspace.py:test_storage_options_local`, because it is expecting `system_schema.scylla_keyspaces` to not have any entries for local storage keyspace, but they have it if tablets are enabled. Adjust the test to account for this.
Closesscylladb/scylladb#16840
* github.com:scylladb/scylladb:
test/cql-pytest: run.py,suite.yaml: enable tablets by default
test/cql-pytest: sprinkle xfail_tablets and skip_with_tablets as needed
test/cql-pytest: disable tablets for some keyspace-altering tests
test/cql-pytest: test_keyspace.py: test_storage_options_local(): fix for tablets
test/cql-pytest: fix test_tablets.py to set initial_tablets correctly
test/cql-pytest: add tablet detection logic and fixtures
test/cql-pytest: extract is_scylla check into util.py
* tools/cqlsh 426fa0ea...b8d86b76 (8):
> Make cqlsh work with unix domain sockets
Fixesscylladb/scylladb#16489
> Bump python-driver version
> dist/debian: add trailer line
> dist/debian: wrap long line
> Draft: explicit build-time packge dependencies
> stop retruning status_code=2 on schema disagreement
> Fix minor typos in the code
> Dockerfile: apt-get update and apt-get upgrade to get latest OS packages
For tests that cover functionality, which doesn't yet work with tablets.
These tests and the respective functionality they test, are expected to
be fixed soon, and then these fixtures will be removed.
When tablets are enabled on a keyspace, they cannot be altered to simple
replication strategy anymore.
These keyspaces are testing exactly that, so disable tablets on the
initial keyspace create statements.
This test expects a keyspace with local storage option, to not have a
row in system_schema.scylla_keyspace. With tablets enabled by default,
this won't be the case. Adjust the test to check for the specific
storage-related columns instead.
Recently, in commit 49026dc319, the
way to choose the number of tablets in a new keyspace changed.
This broke the test we had for a memory leak when many tablets were
used, which saw the old syntax wasn't recognized and assumed Scylla
is running without tablet support - so the test was skipped.
Let's fix the syntax. After this patch the test passes if the tablets
experimental feature is enabled, and only skipped if it isn't.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Add keyspace_has_tablets() utility function, which, given a keyspace,
returns whether it is using tablets or not.
In addition, 3 new fixtures are added:
* has_tablets - does scylla has tablets by default?
* xfail_tablets - the test is marked xfail, when tablets are enabled by
default.
* skip_with_tablets - the test is skipped when tablets are enabled by
default, because it might crash with tablets.
We expect the latter two to be removed soon(ish), as we make all test,
and the functionality they test work with tablets.
This is a test case for the problem, described in the
previous commit. Before that fix the second replace
failed since it couldn't resolve an IP for the new host_id.
Before the patch we called gossiper.remove_endpoint for IP-s
of the left nodes. The problem is that in replace-with-same-ip
scenario we called gossiper.remove_endpoint for IP which is
used by the new, replacing node. The gossiper.remove_endpoint
method puts the IP into quarantine, which means gossiper will
ignore all events about this IP for quarantine_delay (one minute by
default). If we immediately replace just replaced node with
the same IP again, the bootstrap will fail since the gossiper
events are blocked for this IP, and we won't be able to
resolve an IP for the new host_id.
Another problem was that we called gossiper.remove_endpoint
method, which doesn't remove an endpoint from _endpoint_state_map,
only from live and unreachable lists. This means the IP
will keep circulating in the gossiper message exchange between cluster
nodes until full cluster restart.
This patch fixes both of these problems. First, we rely on
the fact that when topology coordinator moves the being_replaced
node to the left state, the IP of the replacing node is known to all nodes.
This means before removing an IP from the gossiper we can check if
this IP is currently used by another node in the current raft topology.
This is done by constructing the used_ips map based on normal and
transition nodes. This map is cached to avoid quadratic behaviour.
Second, we call gossiper.force_remove_endpoint, not
gossiper.remove_endpoint. This function removes and IP from
_endpoint_state_map, as well as from live and unreachable lists.
The tests for both of these improvements will be added in subsequent
commits.
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define a formatter for db::operation_type, and
remove their operator<<().
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#16832
this change should silence the warning like
```
/home/kefu/dev/scylladb/repair/repair.cc:222:23: error: comparison of integers of different signs: 'int' and 'size_type' (aka 'unsigned long') [-Werror,-Wsign-compare]
222 | for (int i = 0; i < all.size(); i++) {
| ~ ^ ~~~~~~~~~~
```
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#16867
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we replace operator<< with format_as() for
unimplemented::cause, so that we don't rely on the deprecated behavior,
and neither do we create a fully blown fmt::formatter. as in
fmt v10, format_as() can be used in place of fmt::formatter,
while in fmt v9, format_as() is only allowed to return a integer.
so, to be future-proof, and to be simpler, format_as() is used.
we can even replace `format_as(c)` with `c`, once fmt v10 is
available in future.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#16866
these unused includes were identified by clangd. see https://clangd.llvm.org/guides/include-cleaner#unused-include-warning for more details on the "Unused include" warning.
Closesscylladb/scylladb#16868
* github.com:scylladb/scylladb:
auth: do not include unused headers
locator: Handle replication factor of 0 for initial_tablets calculations
table: add_sstable_and_update_cache: trigger compaction only in compaction group
compaction_manager: perform_task_on_all_files: return early when there are no sstables to compact
compaction_manager: perform_cleanup: use compaction_manager::eligible_for_compaction
This short series prevents the creation of compaction tasks when we know in advance that they have nothing to do.
This is possible in the clean path by:
- improve the detection of candidates for cleanup by skipping sstables that require cleanup but are already being compacted
- checking that list of sstables selected for cleanup isn't empty before creating the cleanup task
For upgrade sstables, and generally when rewriting all sstable: launch the task only if the list off candidate sstables isn't empty.
For regular compaction, when triggered via `table::add_sstable_and_update_cache`, we currently trigger compaction (by calling `submit`) on all compaction groups while the sstable is added only to one of them.
Also, it is typically called for maintenance sstables that are awaiting offstrategy compaction, in which case we can skip calling `submit` entirely since the caller triggers offstrategy compaction at a later stage.
Refs scylladb/scylladb#15673
Refs scylladb/scylladb#16694Fixesscylladb/scylladb#16803Closesscylladb/scylladb#16808
* github.com:scylladb/scylladb:
table: add_sstable_and_update_cache: trigger compaction only in compaction group
compaction_manager: perform_task_on_all_files: return early when there are no sstables to compact
compaction_manager: perform_cleanup: use compaction_manager::eligible_for_compaction
When calculating per-DC tablets the formula is shards_in_dc / rf_in_dc,
but the denominator in it can be configured to be literally zero and the
division doesn't work.
Fix by assuming zero tablets for dcs with zero rf
fixes: #16844
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#16861
`server` was the only user of this function and it can now be
implemented using `fsm`'s public interface.
In later commits we'll extend the logic of `io_fiber` to also subscribe
to other events, triggered by `server` API calls, not only to outputs
from `fsm`.
In later commits we will use it to wake up `io_fiber` directly from
`raft::server` based on events generated by `raft::server` itself -- not
only from events generated by `raft::fsm`.
`raft::fsm` still obtains a reference to the condition variable so it
can keep signaling it.
This constructor does not provide persisted commit index. It was only
used in tests, so move it there, to the helper `fsm_debug` which
inherits from `fsm`.
Test cases which used `fsm` directly instead of `fsm_debug` were
modified to use `fsm_debug` so they can access the constructor.
`fsm_debug` doesn't change the behavior of `fsm`, only adds some helper
members. This will be useful in following commits too.
In a later commit we'll move `poll_output` out of `fsm` and it won't
have access to internals logged by this message (`_log.stable_idx()`).
Besides, having it in `has_output` gives a more detailed trace. In
particular we can now see values such as `stable_idx` and `last_idx`
from the moment of returning a new fsm output, not only when poll
started waiting for it (a lot of time can pass between these two
events).
This parameter says how many entries at most should be left trailing
before the snapshot index. There are multiple places where this
decision is made:
- in `applier_fiber` when the server locally decides to take a snapshot
due to log size pressure; this applies to the in-memory log
- in `fsm::step` when the server received an `install_snapshot` message
from the leader; this also applies to the in-memory log
- and in `io_fiber` when calling `store_snapshot_descriptor`; this
applies to the on-disk log.
The logic of how many entries should be left trailing is calculated
twice:
- first, in `applier_fiber` or in `fsm::step` when truncating the
in-memory log
- and then again as the snapshot descriptor is being persisted.
The logic is to take `_config.snapshot_trailing` for locally generated
snapshots (coming from `applier_fiber`) and `0` for remote snapshots
(from `fsm::step`).
But there is already an error injection that changes the behavior of
`applier_fiber` to leave `0` trailing entries. However, this doesn't
affect the following `store_snapshot_descriptor` call which still uses
`_config.snapshot_trailing`. So if the server got restarted, the entries
which were truncated in-memory would get "revived" from disk.
Fortunately, this is test-only code.
However in future commits we'd like to change the logic of
`applier_fiber` even further. So instead of having a separate
calculation of trailing entries inside `io_fiber`, it's better for it to
use the number that was already calculated once. This number is passed to
`fsm::apply_snapshot` (by `applier_fiber` or `fsm::step`) and can then
be received by `io_fiber` from `fsm_output` to use it inside
`store_snapshot_descriptor`.
This looks like a minor oversight, in `server_impl::abort` there are
multiple calls to `set_exception` on the different promises, only one of
them would not receive `*_aborted`.
before this change, we always reference the return value of
`make_reader()`, and the return value's type `flat_mutation_reader_v2`
is movable, so we can just pass it by moving away from it.
in this change, instead of using a lambda, let's just have the
return value of it. simpler this way.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#16835