Fixes some typos as found by codespell run on the code.
In this commit, I was hoping to fix only comments, not user-visible alerts, output, etc.
Follow-up commits will take care of them.
Refs: https://github.com/scylladb/scylladb/issues/16255
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
This situation before this patch is that when tablets are enabled for
a keyspace, we can create a materialized view but later any write to
the base table fails with an on_internal_error(), saying that:
"Tried to obtain per-keyspace effective replication map of test
but it's per-table."
Indeed, with tablets, the replication is different for each table - it's
not the same for the entire keyspace.
So this patch changes the view update code to take the replication
map from the specific base table, not the keyspace.
This is good enough to get materialized-views reads and writes working
in a simple single-node case, as the included test demonstrates (the
test fails with on_internal_error() before this patch, and passes
afterwards).
But this fix is not perfect - the base-view pairing code really needs
to consider not only the base table's replication map, but also the
view table's replication map - as those can be different. We'll fix
this remaining problem as a followup in a separate patch - it will
require a substantially more elaborate test to reproduce the need
for the different mapping and to verify that fix.
Fixes#16209.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#16211
The purpose of `maybe_fix_legacy_secondary_index_mv_schema` was to deal
with legacy materialized view schemas used for secondary indexes,
schemas which were created before the notion of "computed columns" was
introduced. Back then, secondary index schemas would use a regular
"token" column. Later it became a computed column and old schemas would
be migrated during rolling upgrade.
The migration code was introduced in 2019
(db8d4a0cc6) and then fixed in 2020
(d473bc9b06).
The fix was present in Enterprise 2022.1 and in OSS 4.5. So, assuming
that users don't try crazy things like upgrading from 2021.X to 2023.X
(which we do not support), all clusters will have already executed the
migration code once they upgrade to 2023.X, meaning we can get rid of
it.
The main motivation of this PR is to get rid of the
`db::schema_tables::merge_schema` call in `parse_schema_tables`. In Raft
mode this was the only call to `merge_schema` outside "group 0 code" and
in fact it is unsafe -- it uses locally generated mutations with locally
generated timestamp (`api::new_timestamp()`), so if we actually did it,
we would permanently diverge the group 0 state machine across nodes
(the schema pulling code is disabled in Raft mode). Fortunately, this
should be dead code by now, as explained in the previous paragraph.
The migration code is now turned into a sanity check, if the users
try something crazy, they will get an error instead of silent data
corruption.
Closesscylladb/scylladb#15695
* github.com:scylladb/scylladb:
view: remove unused `_backing_secondary_index`
schema_tables: turn view schema fixing code into a sanity check
schema_tables: make comment more precise
feature_service: make COMPUTED_COLUMNS feature unconditionally true
When base write triggers mv write and it needs to be send to another
shard it used the same service group and we could end up with a
deadlock.
This fix affects also alternator's secondary indexes.
Testing was done using (yet) not committed framework for easy alternator
performance testing: https://github.com/scylladb/scylladb/pull/13121.
I've changed hardcoded max_nonlocal_requests config in scylla from 5000 to 500 and
then ran:
./build/release/scylla perf-alternator-workloads --workdir /tmp/scylla-workdir/ --smp 2 \
--developer-mode 1 --alternator-port 8000 --alternator-write-isolation forbid --workload write_gsi \
--duration 60 --ring-delay-ms 0 --skip-wait-for-gossip-to-settle 0 --continue-after-error true --concurrency 2000
Without the patch when scylla is overloaded (i.e. number of scheduled futures being close to max_nonlocal_requests) after couple seconds
scylla hangs, cpu usage drops to zero, no progress is made. We can confirm we're hitting this issue by seeing under gdb:
p seastar::get_smp_service_groups_semaphore(2,0)._count
$1 = 0
With the patch I wasn't able to observe the problem, even with 2x
concurrency. I was able to make the process hang with 10x concurrency
but I think it's hitting different limit as there wasn't any depleted
smp service group semaphore and it was happening also on non mv loads.
Fixes https://github.com/scylladb/scylladb/issues/15844Closesscylladb/scylladb#15845
The purpose of `maybe_fix_legacy_secondary_index_mv_schema` was to deal
with legacy materialized view schemas used for secondary indexes,
schemas which were created before the notion of "computed columns" was
introduced. Back then, secondary index schemas would use a regular
"token" column. Later it became a computed column and old schemas would
be migrated during rolling upgrade.
The migration code was introduced in 2019
(db8d4a0cc6) and then fixed in 2020
(d473bc9b06).
The fix was present in Enterprise 2022.1 and in OSS 4.5. So, assuming
that users don't try crazy things like upgrading from 2021.X to 2023.X
(which we do not support), all clusters will have already executed the
migration code once they upgrade to 2023.X, meaning we can get rid of
it.
The main motivation of this patch is to get rid of the
`db::schema_tables::merge_schema` call in `parse_schema_tables`. In Raft
mode this was the only call to `merge_schema` outside "group 0 code" and
in fact it is unsafe -- it uses locally generated mutations with locally
generated timestamp (`api::new_timestamp()`), so if we actually did it,
we would permanently diverge the group 0 state machine across nodes
(the schema pulling code is disabled in Raft mode). Fortunately, this
should be dead code by now, as explained in the previous paragraph.
The migration code is now turned into a sanity check, if the users
try something crazy, they will get an error instead of silent data
corruption.
When a remote view update doesn't succeed there's a log message
saying "Error applying view update...".
This message had log level ERROR, but it's not really a hard error.
View updates can fail for a multitude of reasons, even during normal operation.
A failing view update isn't fatal, it will be saved as a view hint a retried later.
Let's change the log level to WARN. It's something that shouldn't happen too much,
but it's not a disaster either.
ERROR log level causes trouble in tests which assume that an ERROR level message
means that the test has failed.
Refs: https://github.com/scylladb/scylladb/issues/15046#issuecomment-1712748784
For local view updates the log level stays at "ERROR", local view updates shouldn't fail.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Closesscylladb/scylladb#15640
When sending mutation to remote endpoint,
the selected endpoints must be in sync with
the current effective_replication_map.
Currently, the endpoints are sent down the storage_proxy
stack, and later on an effective_replication_map is retrieved
again, and it might not match the target or pending endpoints,
similar to the case seen in https://github.com/scylladb/scylladb/issues/15138
The correct way is to carry the same effective replication map
used to select said endpoints and pass it down the stack.
See also https://github.com/scylladb/scylladb/pull/15141
Fixes scylladb/scylladb#15144
Fixes scylladb/scylladb#14730
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes#15142
We see the abort_requested_exception error from time
to time, instead of sleep_aborted that was expected
and quietly ignored (in debug log level).
Treat abort_requested_exception the same way since
the error is expected on shutdown and to reduce
test flakiness, as seen for example, in
https://jenkins.scylladb.com/job/scylla-master/job/scylla-ci/3033/artifact/logs-full.release.010/1691896356104_repair_additional_test.py%3A%3ATestRepairAdditional%3A%3Atest_repair_schema/node2.log
```
INFO 2023-08-13 03:12:29,151 [shard 0] compaction_manager - Asked to stop
WARN 2023-08-13 03:12:29,152 [shard 0] gossip - failure_detector_loop: Got error in the loop, live_nodes={}: seastar::sleep_aborted (Sleep is aborted)
INFO 2023-08-13 03:12:29,152 [shard 0] gossip - failure_detector_loop: Finished main loop
WARN 2023-08-13 03:12:29,152 [shard 0] cdc - Aborted update CDC description table with generation (2023/08/13 03:12:17, d74aad4b-6d30-4f22-947b-282a6e7c9892)
INFO 2023-08-13 03:12:29,152 [shard 1] compaction_manager - Asked to stop
INFO 2023-08-13 03:12:29,152 [shard 1] compaction_manager - Stopped
INFO 2023-08-13 03:12:29,153 [shard 0] init - Signal received; shutting down
INFO 2023-08-13 03:12:29,153 [shard 0] init - Shutting down view builder ops
INFO 2023-08-13 03:12:29,153 [shard 0] view - Draining view builder
INFO 2023-08-13 03:12:29,153 [shard 1] view - Draining view builder
INFO 2023-08-13 03:12:29,153 [shard 0] compaction_manager - Stopped
ERROR 2023-08-13 03:12:29,153 [shard 0] view - start failed: seastar::abort_requested_exception (abort requested)
ERROR 2023-08-13 03:12:29,153 [shard 1] view - start failed: seastar::abort_requested_exception (abort requested)
```
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes#15029
View update routines accept `mutation` objects.
But what comes out of staging sstable readers is a stream of
mutation_fragment_v2 objects.
To build view updates after a repair/streaming, we have to
convert the fragment stream into `mutation`s. This is done by piping
the stream to mutation_rebuilder_v2.
To keep memory usage limited, the stream for a single partition might
have to be split into multiple partial `mutation` objects.
view_update_consumer does that, but in improper way -- when the
split/flush happens inside an active range tombstone, the range
tombstone isn't closed properly. This is illegal, and triggers an
internal error.
This patch fixes the problem by closing the active range tombstone
(and reopening in the same position in the next `mutation` object).
The tombstone is closed just after the last seen clustered position.
This is not necessary for correctness -- for example we could delay
all processing of the range tombstone until we see its end
bound -- but it seems like the most natural semantic.
Fixes#14503
Adding a function declaration to expression.hh causes many
recompilations. Reduce that by:
- moving some restrictions-related definitions to
the existing expr/restrictions.hh
- moving evaluation related names to a new header
expr/evaluate.hh
- move utilities to a new header
expr/expr-utilities.hh
expression.hh contains only expression definitions and the most
basic and common helpers, like printing.
Spans are slightly cleaner, slightly faster (as they avoid an indirection),
and allow for replacing some of the arguments with small_vector:s.
Closes#14313
In that level no io_priority_class-es exist. Instead, all the IO happens
in the context of current sched-group. File API no longer accepts prio
class argument (and makes io_intent arg mandatory to impls).
So the change consists of
- removing all usage of io_priority_class
- patching file_impl's inheritants to updated API
- priority manager goes away altogether
- IO bandwidth update is performed on respective sched group
- tune-up scylla-gdb.py io_queues command
The first change is huge and was made semi-autimatically by:
- grep io_priority_class | default_priority_class
- remove all calls, found methods' args and class' fields
Patching file_impl-s is smaller, but also mechanical:
- replace io_priority_class& argument with io_intent* one
- pass intent to lower file (if applicatble)
Dropping the priority manager is:
- git-rm .cc and .hh
- sed out all the #include-s
- fix configure.py and cmakefile
The scylla-gdb.py update is a bit hairry -- it needs to use task queues
list for IO classes names and shares, but to detect it should it checks
for the "commitlog" group is present.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes#13963
The `view_update_write_response_handler` class, which is a subclass of
`abstract_write_response_handler`, was created for a single purpose:
to make it possible to cancel a handler for a view update write,
which means we stop waiting for a response to the write, timing out
the handler immediately. This was done to solve issue with node
shutdown hanging because it was waiting for a view update to finish;
view updates were configured with 5 minute timeout. See #3966, #4028.
Now we're having a similar problem with hint updates causing shutdown
to hang in tests (#8079).
`view_update_write_response_handler` implements cancelling by adding
itself to an intrusive list which we then iterate over to timeout each
handler when we shutdown or when gossiper notifies `storage_proxy`
that a node is down.
To make it possible to reuse this algorithm for other handlers, move
the functionality into `abstract_write_response_handler`. We inherit
from `bi::list_base_hook` so it introduces small memory overhead to
each write handler (2 pointers) which was only present for view update
handlers before. But those handlers are already quite large, the
overhead is small compared to their size.
Use this new functionality to also cancel hint write handlers when we
shutdown. This fixes#8079.
Closes#14047
* github.com:scylladb/scylladb:
test: reproducer for hints manager shutdown hang
test: pylib: ScyllaCluster: generalize config type for `server_add`
test: pylib: scylla_cluster: add explicit timeout for graceful server stop
service: storage_proxy: make hint write handlers cancellable
service: storage_proxy: rename `view_update_handlers_list`
service: storage_proxy: make it possible to cancel all write handler types
Whether a write handler should be cancellable is now controlled by a
parameter passed to `create_write_response_handler`. We plumb it down
from `send_to_endpoint` which is called by hints manager.
This will cause hint write handlers to immediately timeout when we
shutdown or when a destination node is marked as dead.
Fixes#8079
Some assorted cleanups here: consolidation of schema agreement waiting
into a single place and removing unused code from the gossiper.
CI: https://jenkins.scylladb.com/job/scylla-master/job/scylla-ci/1458/
Reviewed-by: Konstantin Osipov <kostja@scylladb.com>
* gleb/gossiper-cleanups of github.com:scylladb/scylla-dev:
storage_service: avoid unneeded copies in on_change
storage_service: remove check that is always true
storage_service: rename handle_state_removing to handle_state_removed
storage_service: avoid string copy
storage_service: delete code that handled REMOVING_TOKENS state
gossiper: remove code related to advertising REMOVING_TOKEN state
migration_manager: add wait_for_schema_agreement() function
After a schema change, memtable and cache have to be upgraded to the new schema. Currently, they are upgraded (on the first access after a schema change) atomically, i.e. all rows of the entry are upgraded with one non-preemptible call. This is a one of the last vestiges of the times when partition were treated atomically, and it is a well known source of numerous large stalls.
This series makes schema upgrades gentle (preemptible). This is done by co-opting the existing MVCC machinery.
Before the series, all partition_versions in the partition_entry chain have the same schema, and an entry upgrade replaces the entire chain with a single squashed and upgraded version.
After the series, each partition_version has its own schema. A partition entry upgrade happens simply by adding an empty version with the new schema to the head of the chain. Row entries are upgraded to the current schema on-the-fly by the cursor during reads, and by the MVCC version merge ongoing in the background after the upgrade.
The series:
1. Does some code cleanup in the mutation_partition area.
2. Adds a schema field to partition_version and removes it from its containers (partition_snapshot, cache_entry, memtable_entry).
3. Adds upgrading variants of constructors and apply() for `row` and its wrappers.
4. Prepares partition_snapshot_row_cursor, mutation_partition_v2::apply_monotonically and partition_snapshot::merge_partition_versions for dealing with heterogeneous version chains.
5. Modifies partition_entry::upgrade to perform upgrades by extending the version chain with a new schema instead of squashing it to a single upgraded version.
Fixes#2577Closes#13761
* github.com:scylladb/scylladb:
test: mvcc_test: add a test for gentle schema upgrades
partition_version: make partition_entry::upgrade() gentle
partition_version: handle multi-schema snapshots in merge_partition_versions
mutation_partition_v2: handle schema upgrades in apply_monotonically()
partition_version: remove the unused "from" argument in partition_entry::upgrade()
row_cache_test: prepare test_eviction_after_schema_change for gentle schema upgrades
partition_version: handle multi-schema entries in partition_entry::squashed
partition_snapshot_row_cursor: handle multi-schema snapshots
partiton_version: prepare partition_snapshot::squashed() for multi-schema snapshots
partition_version: prepare partition_snapshot::static_row() for multi-schema snapshots
partition_version: add a logalloc::region argument to partition_entry::upgrade()
memtable: propagate the region to memtable_entry::upgrade_schema()
mutation_partition: add an upgrading variant of lazy_row::apply()
mutation_partition: add an upgrading variant of rows_entry::rows_entry
mutation_partition: switch an apply() call to apply_monotonically()
mutation_partition: add an upgrading variant of rows_entry::apply_monotonically()
mutation_fragment: add an upgrading variant of clustering_row::apply()
mutation_partition: add an upgrading variant of row::row
partition_version: remove _schema from partition_entry::operator<<
partition_version: remove the schema argument from partition_entry::read()
memtable: remove _schema from memtable_entry
row_cache: remove _schema from cache_entry
partition_version: remove the _schema field from partition_snapshot
partition_version: add a _schema field to partition_version
mutation_partition: change schema_ptr to schema& in mutation_partition::difference
mutation_partition: change schema_ptr to schema& in mutation_partition constructor
mutation_partition_v2: change schema_ptr to schema& in mutation_partition_v2 constructor
mutation_partition: add upgrading variants of row::apply()
partition_version: update the comment to apply_to_incomplete()
mutation_partition_v2: clean up variants of apply()
mutation_partition: remove apply_weak()
mutation_partition_v2: remove a misleading comment in apply_monotonically()
row_cache_test: add schema changes to test_concurrent_reads_and_eviction
mutation_partition: fix mixed-schema apply()
We already use the new pending_endpoints from erm though
the get_pending_ranges virtual function, in this commit
we update all the remaining places to use the new
implementation in erm, as well as remove the old implementation
in token_metadata.
The only reason why it's there (right next to compaction_fwd.hh) is
because the database::table_truncate_state subclass needs the definition
of compaction_manager::compaction_reenabler subclass.
However, the former sub is not used outside of database.cc and can be
defined in .cc. Keeping it outside of the header allows dropping the
compaction_manager.hh from database.hh thus greatly reducing its fanout
over the code (from ~180 indirect inclusions down to ~20).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes#13622
The method needs proxy to get data_dictionary::database from to pass down to select_statement::prepare(). And a legacy bit that can come with data_dictionary::database as well. Fortunately, all the call traces that end up at select_statement() start inside table:: methods that have view_update_generator, or at view_builder::consumer that has reference to view_builder. Both services can share the database reference. However, the call traces in question pass through several code layers, so the PR adds data_dictionary::database to those layers one by one.
Closes#13591
* github.com:scylladb/scylladb:
view_info: Drop calls to get_local_storage_proxy()
view_info: Add data_dictionary argument to select_statement()
view_info: Add data_dictionary argument to partition_slice() method
view_filter_checking_visitor: Construct with data_dictionary
view: Carry data_dictionary arg through standalone helpers
view_updates: Carry data_dictionary argument throug methods
view_update_builder: Construct with data dictionary
table: Push view_update_generator arg to affected_views()
view: Add database getters to v._update_generator and v._builder
The view_builder::view_build_statuses() needs topology to walk its
nodes. Now it gets one from global proxy via its token metadata, but
database also has tokens and view_builder has reference to database.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
In both cases the proxy is called to get data_dictionary from. Now its
available as the call argument.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This method needs data_dictionary to work. Fortunately, all callers of
it already have the dictionary at hand and can just pass it as argument.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The caller is calculate_affected_clustering_ranges() with dictionary
arg, the method needs dictionary to call view_info::select_statement()
later.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The visitor is wait-free helper for matches_view_filter() that has
dictionary as its argument. Later the visitor will pass the dictionary
to view_info::select_statement().
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There's a bunch of functions in view.{hh|cc} that don't belong to any
class and perform view-related claculations for view updates. Lots of
them eventually call view_info::select_statement() which will later need
the dictionary.
By now all those methods' callers have data dictionary at hand and can
share it via argument.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The goal is to have the dictionary at places that later wrap calls to
view_info::select_statement(). This graph of calls starts at the only
public view_updates::generate_update() method which, in turn, is called
from view_update_builder that already has data dictionary at hand.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The caller is table with view-update-generator at hand (it calls
mutate_MV on). Builder here is used as a temporary object that destroys
once the caller coroutine co_return-s, so keeping the database obtained
from the view-update-generator is safe.
Later the v.u.b. object will propagate its data dictionary down the
callstacks.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
When view builder constructs it populates itself with view updates.
Later the updates may instantiate the value_getter-s which, in turn,
would need to check if the view is backing secondary index.
Good news is that when view builder constructs it has all the
information at hand needed to evaluate this "backing" bit. It's then
propagated down to value_getter via corresponding view_updates.
The getter's _view field becomes unused after this change and is
(void)-ed to make this patch compile.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The getter needs to check if the view is backing a secondary index.
Currentl it's done inside the handle_computed_column() method, but it's
more convenient if this bit is known during construction, so move it
there. There are no places that can change this property between
view_getter is created and the method in question is called.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Don't maintain a "shadow" endpoint_to_host_id_map in token_metadata_impl.
Instead, get the nodes_by_endpoint map from topology
and use it to build the endpoint_to_host_id_map.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
This is not really an error, so print it in debug log_level
rather than error log_level.
Fixes#13374
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes#13462
Now the mutate_MV is the method of v.u.generator which has reference to
the sharded<storage_proxy>. Few helper static wrappers are patched to
get the needed proxy or database reference from the mutate_MV call.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Nowadays its a static helper, but internally it depends on storage
proxy, so it grabs its global instance. Making it a method of view
update generator makes it possible to use the proxy dependency from the
generator.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The method is called by view_builder::consumer when building a view and
the consumer already has stable dependency reference on the view updates
generator.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The latter is the place where mutate_MV is called and it needs the
view updates generator nearby.
The call-stack starts at database::do_apply(). As was described in one
of the previous patches, applying mutations that need updating views
happen late enough, so if the view updates generator is not plugged to
the database yet, it's OK to bail out with exception. If it's plugged,
it's carried over thus keeping the generator instance alive and waited
for on its stop.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This is another mutations consumer that pushes view updates forward and
thus also needs the view updates generator pointer. It gets one from the
view builder that already has the dependency on generator.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The consumer is in fact pushing the updates and _that_'s the component
that would really need the view_update_generator at hand. The consumer
is created from the generator itself so no troubles getting the pointer.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The builder will need generator for view_builder::consumer in one of the
next patches.
The builder is a standalone service that starts one of the latest and no
other services need builder as their dependency.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
When propagating a view update to a paired view
replica fails, there is an error message.
This message is printed for every mutation,
which causes log spam when some node goes down.
This isn't a fatal error - it's normal that
a remote view replica goes down, it'll hopefully
receive the updates later through hints.
I'm unsure if the error message should
be printed at all, but for now we can
just rate limit it and that will improve
the situation with log spamming.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Closes#13175
And propagate it down to where it is created. This will be used to add
trace points for semaphore related events, but this will come in the
next patches.
these warnings are found by Clang-17 after removing
`-Wno-unused-lambda-capture` and '-Wno-unused-variable' from
the list of disabled warnings in `configure.py`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Move mutation-related files to a new mutation/ directory. The names
are kept in the global namespace to reduce churn; the names are
unambiguous in any case.
mutation_reader remains in the readers/ module.
mutation_partition_v2.cc was missing from CMakeLists.txt; it's added in this
patch.
This is a step forward towards librarization or modularization of the
source base.
Closes#12788