Commit Graph

37096 Commits

Author SHA1 Message Date
Avi Kivity
ffce6d94fc Merge 'service: storage_proxy: make hint write handlers cancellable' from Kamil Braun
The `view_update_write_response_handler` class, which is a subclass of
`abstract_write_response_handler`, was created for a single purpose:
to make it possible to cancel a handler for a view update write,
which means we stop waiting for a response to the write, timing out
the handler immediately. This was done to solve an issue with node
shutdown hanging because it was waiting for a view update to finish;
view updates were configured with 5 minute timeout. See #3966, #4028.

Now we're having a similar problem with hint updates causing shutdown
to hang in tests (#8079).

`view_update_write_response_handler` implements cancelling by adding
itself to an intrusive list which we then iterate over to timeout each
handler when we shutdown or when gossiper notifies `storage_proxy`
that a node is down.

To make it possible to reuse this algorithm for other handlers, move
the functionality into `abstract_write_response_handler`. We inherit
from `bi::list_base_hook`, which introduces a small memory overhead to
each write handler (2 pointers) that was only present for view update
handlers before. But those handlers are already quite large, so the
overhead is small compared to their size.

Use this new functionality to also cancel hint write handlers when we
shutdown. This fixes #8079.

Closes #14047

* github.com:scylladb/scylladb:
  test: reproducer for hints manager shutdown hang
  test: pylib: ScyllaCluster: generalize config type for `server_add`
  test: pylib: scylla_cluster: add explicit timeout for graceful server stop
  service: storage_proxy: make hint write handlers cancellable
  service: storage_proxy: rename `view_update_handlers_list`
  service: storage_proxy: make it possible to cancel all write handler types
2023-05-30 01:36:50 +03:00
Avi Kivity
27f7cc4032 Revert "Merge 'cql: update permissions when creating/altering a function/keyspace' from Wojciech Mitros"
This reverts commit 52e4edfd5e, reversing
changes made to d2d53fc1db. The associated test
fails with about 10% probability, which blocks other work.

Fixes #13919
Reopens #13747
2023-05-29 23:03:25 +03:00
Botond Dénes
a35758607a Update tools/java submodule
* tools/java eb3c43f8...0cbfeb03 (1):
  > nodetool: add `--primary-replica-only` option to `refresh`
2023-05-29 23:03:25 +03:00
Botond Dénes
fc24685b4d Update tools/jmx submodule
* tools/jmx 1fd23b60...d1077582 (1):
  > Support `--primary-replica-only` option from `nodetool refresh`
2023-05-29 23:03:25 +03:00
Pavel Emelyanov
b0525e20d5 main: Ignore sleep_aborted exception in main
When scylla starts it may go to sleep along the way before the "serving"
message appears. If SIGINT is sent at that time the whole thing unrolls
and the main code ends up catching the sleep_aborted exception, printing
the error in logs and exiting with non-zero code. However, that's not an
error, just the start was interrupted earlier than it was expected by
the stop_signal thing.

fixes: #12898

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #14034
2023-05-29 23:03:25 +03:00
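The behaviour described in b0525e20d5 can be sketched without Seastar as follows. This is only an illustration under assumptions, not the actual `main.cc` code: the local `sleep_aborted` type stands in for the Seastar exception of the same name, and `run_startup` is a hypothetical helper introduced for this sketch.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <stdexcept>

// Local stand-in for the Seastar exception of the same name (assumption:
// the real type is avoided here so that the sketch stays self-contained).
struct sleep_aborted : std::exception {
    const char* what() const noexcept override { return "Sleep is aborted"; }
};

// Hypothetical helper: runs the startup sequence and maps the outcome to a
// process exit code.
int run_startup(const std::function<void()>& startup) {
    assert(startup && "startup callback must not be empty");
    try {
        startup();
    } catch (const sleep_aborted&) {
        // Startup was interrupted by a stop signal while sleeping: this is a
        // normal early exit, not an error.
        return 0;
    } catch (const std::exception&) {
        // Anything else is a genuine startup failure.
        return 1;
    }
    return 0;
}
```

With this shape, `run_startup([] { throw sleep_aborted{}; })` yields exit code 0, while a callback throwing `std::runtime_error` yields 1.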
Avi Kivity
2303f08eea utils: logalloc: correct asan_interface.h location
It's a system header, so it deserves angle brackets.

Closes #14036
2023-05-29 23:03:25 +03:00
Benny Halevy
c685ef9e71 partitioned_sstable_set: insert: return early if sst is already in the set
Currently, partitioned_sstable_set::insert may erase a sstable
from the set inadvertently, if an exception is thrown while
(re-)inserting it.

To prevent that, simply return early after detecting that
insertion didn't take place, based on the unordered_set::insert
result.

This issue is theoretical, as there are no known cases
of re-inserting sstables into the partitioned sstable set.

Fixes #14060

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #14061
2023-05-29 23:03:25 +03:00
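The early-return idea from c685ef9e71 can be illustrated with a small standalone sketch. The class below is hypothetical and much simpler than `partitioned_sstable_set`: `by_range` merely stands in for any secondary structure whose update may throw. The point is that the rollback path must only undo what the current call added.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical, simplified set with a primary container and a secondary index.
struct sstable_set_sketch {
    std::unordered_set<std::string> all;
    std::vector<std::string> by_range;  // stand-in for a secondary index

    // Returns true if the element was newly inserted.
    bool insert(const std::string& name) {
        auto [it, inserted] = all.insert(name);
        if (!inserted) {
            // Already present: return early so that the rollback below can
            // never erase an element this call did not add.
            return false;
        }
        try {
            by_range.push_back(name);  // may throw, e.g. std::bad_alloc
        } catch (...) {
            all.erase(it);  // undo only this call's insertion
            throw;
        }
        return true;
    }
};
```

Without the early return, a failure in the second step of a re-insertion would run the rollback and remove an element that was already in the set before the call.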
Aleksandra Martyniuk
24864e39dd compaction: delete unnecessary sequence number incrementations
Task manager tasks that have a parent task inherit the sequence number
from their parent. Thus they do not need to have a new sequence number
generated, as it would be overwritten anyway.

Closes #14045
2023-05-29 23:03:25 +03:00
Kefu Chai
c00f4af5d4 build: cmake: link auth against libcrypt
libxcrypt is used by the auth subsystem; for instance, `crypt_r()` provided
by this library is used by passwords.cc. so let's link against it.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #14030
2023-05-29 23:03:24 +03:00
Benny Halevy
774a10017c backlog_controller: destroy _update_timer before _current_backlog
The _update_timer callback calls adjust() that
depends on _current_backlog and currently, _current_backlog is
destroyed before _update_timer.

This is benign since there are no preemption points in
the destructor, but it's more correct and elegant
to destroy the timer first, before other members it depends on.

Fixes #14056

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #14057
2023-05-29 23:03:24 +03:00
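The ordering rule behind 774a10017c is that C++ destroys non-static data members in reverse order of declaration. A minimal sketch, assuming hypothetical `tracked` and `controller_sketch` types that only record the order in which members are destroyed:

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Global log of destructor calls, used only to observe the order.
std::vector<std::string>& destruction_log() {
    static std::vector<std::string> log;
    return log;
}

// Hypothetical member type that records its own destruction.
struct tracked {
    std::string name;
    explicit tracked(std::string n) : name(std::move(n)) {}
    ~tracked() { destruction_log().push_back(name); }
};

// Members are destroyed in reverse declaration order, so declaring the
// timer last guarantees it goes away before the state it depends on.
struct controller_sketch {
    tracked current_backlog{"_current_backlog"};
    tracked update_timer{"_update_timer"};
};

// Creates and destroys one controller, returning the observed order.
std::vector<std::string> destroy_one_controller() {
    destruction_log().clear();
    { controller_sketch c; }
    return destruction_log();
}
```

Here `destroy_one_controller()` reports `_update_timer` first and `_current_backlog` second, which is the order the commit message asks for.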
Kefu Chai
a0b8aa9b13 create-relocatable-package.py: raise if rmtree fails
occasionally, we are observing build failures like:
```
17:20:54  FAILED: build/release/dist/tar/scylla-debuginfo-5.4.0~dev-0.20230522.5b2687e11800.x86_64.tar.gz
17:20:54  dist/debuginfo/scripts/create-relocatable-package.py --mode release 'build/release/dist/tar/scylla-debuginfo-5.4.0~dev-0.20230522.5b2687e11800.x86_64.tar.gz'
17:20:54  Traceback (most recent call last):
17:20:54    File "/jenkins/workspace/scylla-master/scylla-ci/scylla/dist/debuginfo/scripts/create-relocatable-package.py", line 60, in <module>
17:20:54      os.makedirs(f'build/{SCYLLA_DIR}')
17:20:54    File "<frozen os>", line 225, in makedirs
17:20:54  FileExistsError: [Errno 17] File exists: 'build/scylla-debuginfo-package'
```

to understand the root cause better, instead of swallowing the error,
let's raise the exception if it is not caused by a non-existing directory.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13978
2023-05-29 23:03:24 +03:00
Avi Kivity
2cef3350af Merge 'Initialize/destroy ks/cf directories with explicit class methods' from Pavel Emelyanov
This set encapsulates ks/cf directories creation and deletion into keyspace and table classes methods. This is needed to facilitate making the storage initialization storage-type aware in the future. Also this makes the replica/ code less involved in formatting sstables' directory path by hand.

refs: #13020
refs: #12707

Closes #14048

* github.com:scylladb/scylladb:
  keyspace: Introduce init_storage()
  keyspace: Remove column_family_directory()
  table: Introduce destroy_storage()
  table: Simplify init_storage()
  table: Coroutinize init_storage()
  table: Relocate ks.make_directory_for_column_family()
  distributed_loader: Use cf.dir() instead of ks.column_family_directory()
  test: Don't create directory for system tables in cql_test_env
2023-05-29 23:03:24 +03:00
Kefu Chai
55ee0e2724 build: preserve $libs when linking a single testing executable
if we just want to build a single test and scylla executables, we
might want to use `configure.py` like:

./configure.py --mode debug --compiler clang++ --with scylla --with test/boost/database_test

which generates `build.ninja` for us, with following rules:

build $builddir/debug/test/boost/database_test_g: link.debug ... | $builddir/debug/seastar/libseastar.so
$builddir/debug/seastar/libseastar_testing.so
   libs = $seastar_libs_debug $libs -lthrift -lboost_system $seastar_testing_libs_debug
   libs = $seastar_libs_debug

but the last line prevents database_test_g from linking against
the third-party libraries like libabsl, which could have been
pulled in by $libs: the second assignment just makes the value
of `libs` identical to that of `seastar_libs_debug`, which does
not include the libraries that are only used by scylla. so we
could run into a link failure with the `build.ninja` generated
by this command line, like:
```
FAILED: build/debug/test/boost/database_test_g
...
ld.lld: error: undefined symbol: seastar::testing::entry_point(int, char**)
>>> referenced by scylla_test_case.hh:22 (./test/lib/scylla_test_case.hh:22)
>>>               build/debug/test/boost/database_test.o:(main)
...
ld.lld: error: undefined symbol: boost::unit_test::unit_test_log_t::set_checkpoint(boost::unit_test::basic_cstring<char const>, unsigned long, boost::unit_tes
t::basic_cstring<char const>)
>>> referenced by database_test.cc:298 (test/boost/database_test.cc:298)
>>>               build/debug/test/boost/database_test.o:(require_exist(seastar::basic_sstring<char, unsigned int, 15u, true> const&, bool))
...
```

with this change, the extra assignment expression is dropped. this
should not cause any regression, as f'$seastar_libs_{mode}' has
been included as a part of `local_libs` before the grand if-then-else
block in the for loop before this `f.write()` statement.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #14041
2023-05-29 23:03:24 +03:00
Kefu Chai
74dd6dc185 Revert "test: string_format_test: don't compare std::string with sstring"
This reverts commit 3c54d5ec5e.

The reverted change fixed the FTBFS of the test in question with Clang 16,
which rightly stopped converting the LHS of `"hello" == sstring{"hello"}` to
the type accepted by the member operator, even though we have a
constructor for this conversion, like

class sstring {
public:
  sstring(const char*);
  bool operator==(const sstring&) const;
  bool operator!=(const sstring&) const;
};

because we have an operator!=, as per the draft of C++ standard
https://eel.is/c++draft/over.match.oper#4 :

> A non-template function or function template F named operator==
> is a rewrite target with first operand o unless a search for the
> name operator!= in the scope S from the instantiation context of
> the operator expression finds a function or function template
> that would correspond ([basic.scope.scope]) to F if its name were
> operator==, where S is the scope of the class type of o if F is a
> class member, and the namespace scope of which F is a member
> otherwise.

in 397f4b51c3, the seastar submodule was updated, and it now has a
dedicated overload for the `const char*` case. so the compiler is
able to compile expressions like `"hello" == sstring{"hello"}` in
C++20.

so, in this change, the workaround is reverted.

Closes #14040
2023-05-29 23:03:24 +03:00
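The situation discussed in 74dd6dc185 can be reproduced in miniature. The `mini_string` class below is a hypothetical stand-in, not seastar's `sstring`. With only the two member operators, a literal on the left-hand side would rely on the reversed `operator==` candidate, which the quoted rule disables because a matching `operator!=` exists. A dedicated overload for `const char*` on the left, similar in spirit to what the message says seastar gained, avoids the problem:

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a string class with member comparison operators.
class mini_string {
    std::string _data;
public:
    mini_string(const char* s) : _data(s) {}

    bool operator==(const mini_string& o) const { return _data == o._data; }
    bool operator!=(const mini_string& o) const { return !(*this == o); }

    // Dedicated overload for a C string on the left-hand side, so the
    // comparison no longer depends on a rewritten (reversed) candidate.
    friend bool operator==(const char* lhs, const mini_string& rhs) {
        return rhs._data == lhs;
    }
};

// The expression shape from the commit message, with the literal on the left.
inline bool literal_on_left_matches() {
    return "hello" == mini_string{"hello"};
}
```

Because the friend overload is an exact match for `(const char*, const mini_string&)`, the expression compiles regardless of whether the compiler considers the member `operator==` a rewrite target.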
Benny Halevy
26705ba6af partitioned_sstable_set: erase empty runs
When erasing a sstable, first check if its run_id
exists in _all_runs; if it does not, do nothing in
that respect. If the run becomes empty
when erasing the last sstable (and it could have been
a single-sstable run from the get-go), erase the run
from `_all_runs`.

Fixes #14052

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #14054
2023-05-29 23:03:24 +03:00
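The bookkeeping described in 26705ba6af amounts to the following pattern. This is a simplified sketch: `run_index_sketch`, the `int` run ids and the string sstable names are assumptions made for illustration, not the real types.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <unordered_set>

// Hypothetical index from run id to the sstables that belong to the run.
class run_index_sketch {
    std::unordered_map<int, std::unordered_set<std::string>> _all_runs;
public:
    void insert(int run_id, const std::string& sst) {
        _all_runs[run_id].insert(sst);
    }

    void erase(int run_id, const std::string& sst) {
        auto it = _all_runs.find(run_id);
        if (it == _all_runs.end()) {
            return;  // unknown run: nothing to do for the run index
        }
        it->second.erase(sst);
        if (it->second.empty()) {
            _all_runs.erase(it);  // drop the run once its last sstable is gone
        }
    }

    std::size_t run_count() const { return _all_runs.size(); }
};
```

Looking the run up with `find()` rather than `operator[]` matters here: `operator[]` would create an empty run as a side effect of erasing from an unknown one.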
Alejo Sanchez
2050a1a125 test.py: warn and skip for missing unit/boost tests
If the executable of a matching unit or boost test is not executable,
warn to console and skip.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>

Closes #13982
2023-05-29 23:03:24 +03:00
Kamil Braun
beabb61566 test: reproducer for hints manager shutdown hang 2023-05-29 11:03:39 +02:00
Kamil Braun
7e56388721 test: pylib: ScyllaCluster: generalize config type for server_add
Generalize from `dict[str, str]` to `dict[str, Any]`.
2023-05-29 11:03:36 +02:00
Kamil Braun
ce13395ce4 test: pylib: scylla_cluster: add explicit timeout for graceful server stop
If server shutdown hangs, the `manager.server_stop_gracefully` call
would eventually (after 5 minutes) timeout with a cryptic
`TimeoutError`; it's a generic timeout for performing requests by the
tests to `ScyllaClusterManager`. It was non-obvious how to find what
actually caused the timeout - you'd have to browse multiple logs.

Introduce an explicit timeout in `ScyllaServer.stop_gracefully`. Set it
to 1 minute. Whether this is a good value may be arguable, but shutdown
taking longer than that probably indicates problems. The important thing
is that this timeout is shorter than the generic request timeout.

If this times out we get a nice error in the test:
```
E               test.pylib.rest_client.HTTPError: HTTP error 500, uri: http+unix://api/cluster/server/1/stop_gracefully, params: None, json: None, body:
E               Stopping server ScyllaServer(1, 127.162.40.1, 826d5884-4696-4a22-80a7-cc872aa43102) gracefully took longer than 60s
```
2023-05-29 11:03:30 +02:00
Kamil Braun
0ef35ceed4 service: storage_proxy: make hint write handlers cancellable
Whether a write handler should be cancellable is now controlled by a
parameter passed to `create_write_response_handler`. We plumb it down
from `send_to_endpoint` which is called by hints manager.

This will cause hint write handlers to immediately timeout when we
shutdown or when a destination node is marked as dead.

Fixes #8079
2023-05-29 11:03:18 +02:00
Kamil Braun
eddb7406b4 service: storage_proxy: rename view_update_handlers_list
The list will be used for non-view-update write handlers as well, so
generalize the name. Also generalize some variable names used in the
implementation.

This commit only renames things + some comments were added,
there are no logical changes.
2023-05-29 10:59:50 +02:00
Kamil Braun
c7ef9a12ee service: storage_proxy: make it possible to cancel all write handler types
The `view_update_write_response_handler` class, which is a subclass of
`abstract_write_response_handler`, was created for a single purpose: to
make it possible to cancel a handler for a view update write, which
means we stop waiting for a response to the write, timing out the
handler immediately. This was done to solve an issue with node shutdown
hanging because it was waiting for a view update to finish; view updates
were configured with 5 minute timeout. See #3966, #4028.

Now we're having a similar problem with hint updates causing shutdown to
hang in tests (#8079).

`view_update_write_response_handler` implements cancelling by adding
itself to an intrusive list which we then iterate over to timeout each
handler when we shutdown or when gossiper notifies `storage_proxy` that
a node is down.

To make it possible to reuse this algorithm for other handlers, move the
functionality into `abstract_write_response_handler`. We inherit from
`bi::list_base_hook`, which introduces a small memory overhead to each
write handler (2 pointers) that was only present for view update
handlers before. But those handlers are already quite large, so the
overhead is small compared to their size.

Not all handlers are added to the cancelling list, this is controlled by
the `cancellable` parameter passed to the constructor. For now we're
only cancelling view handlers as before. In following commits we'll also
cancel hint handlers.
2023-05-29 10:42:57 +02:00
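The cancellation scheme described in c7ef9a12ee can be sketched with a hand-rolled intrusive list. Everything below is an assumption made for illustration: `list_hook` is a simplified stand-in for `bi::list_base_hook` (it also costs two pointers per object), and `write_handler` and `cancel_all` are hypothetical names, not the `storage_proxy` API.

```cpp
#include <cassert>
#include <cstddef>

// Simplified stand-in for an intrusive list hook: two pointers per object.
// An unlinked hook points at itself.
struct list_hook {
    list_hook* prev = this;
    list_hook* next = this;

    list_hook() = default;
    list_hook(const list_hook&) = delete;
    list_hook& operator=(const list_hook&) = delete;
    ~list_hook() { unlink(); }

    bool linked() const { return next != this; }
    void unlink() {
        prev->next = next;
        next->prev = prev;
        prev = next = this;
    }
};

// Hypothetical write handler: only cancellable handlers join the list.
class write_handler : public list_hook {
public:
    enum class state { pending, timed_out };

    write_handler(list_hook& cancellable_list, bool cancellable) {
        if (cancellable) {
            // Link this handler at the tail of the list.
            prev = cancellable_list.prev;
            next = &cancellable_list;
            prev->next = this;
            next->prev = this;
        }
    }

    // Stop waiting for responses and leave the list.
    void timeout_now() {
        _state = state::timed_out;
        unlink();
    }
    state current_state() const { return _state; }

private:
    state _state = state::pending;
};

// Times out every handler still linked into the list (e.g. on shutdown)
// and returns how many were cancelled. `head` is a sentinel node.
inline std::size_t cancel_all(list_hook& head) {
    std::size_t cancelled = 0;
    while (head.linked()) {
        static_cast<write_handler*>(head.next)->timeout_now();
        ++cancelled;
    }
    return cancelled;
}
```

Handlers constructed with `cancellable == false` never enter the list, so `cancel_all` leaves them pending. A handler that completes normally unlinks itself in its destructor, which keeps the list free of dangling entries.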
Kefu Chai
af65d5a1e8 test: sstable: use BOOST_REQUIRE_*() when appropriate
instead of using BOOST_REQUIRE(), use, for instance,
BOOST_REQUIRE_NE() and BOOST_REQUIRE_EQUAL() for better
error messages when the test fails, as Boost.Test would
print out the LHS and RHS of the comparison expression
if it fails.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #14050
2023-05-27 11:10:47 +03:00
Pavel Emelyanov
5861d15912 Merge 'Small gossiper and migration_manager cleanups' from Gleb
Some assorted cleanups here: consolidation of schema agreement waiting
into a single place and removing unused code from the gossiper.

CI: https://jenkins.scylladb.com/job/scylla-master/job/scylla-ci/1458/

Reviewed-by: Konstantin Osipov <kostja@scylladb.com>

* gleb/gossiper-cleanups of github.com:scylladb/scylla-dev:
  storage_service: avoid unneeded copies in on_change
  storage_service: remove check that is always true
  storage_service: rename handle_state_removing to handle_state_removed
  storage_service: avoid string copy
  storage_service: delete code that handled REMOVING_TOKENS state
  gossiper: remove code related to advertising REMOVING_TOKEN state
  migration_manager: add wait_for_schema_agreement() function
2023-05-27 10:49:54 +03:00
Avi Kivity
e4d6ed7a70 Merge 'Coroutinize utils::verify_owner_and_mode()' from Pavel Emelyanov
Closes #14049

* github.com:scylladb/scylladb:
  utils: Restore indentation after previous patch
  utils: Coroutinize verify_owner_and_mode()
2023-05-26 23:20:30 +03:00
Pavel Emelyanov
2eb88945ea utils: Restore indentation after previous patch
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-26 18:53:14 +03:00
Pavel Emelyanov
4ebb812df0 utils: Coroutinize verify_owner_and_mode()
There's a helper verification_error() that prints a warning and returns
an exceptional future. It is converted into a void function that throws.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-26 18:52:15 +03:00
Pavel Emelyanov
29d80d1fe9 keyspace: Introduce init_storage()
Similarly to class table, the keyspace class also needs to create
directory for itself for some reason. It looks excessive as table
creation would call recursive_touch_directory() and would create the ks
directory too, but this call is there

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-26 18:15:46 +03:00
Pavel Emelyanov
93d8240bfb keyspace: Remove column_family_directory()
It's no longer used outside of make_column_family_config(). Not to
encourage people to use it -- drop it and open-code into that single
caller

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-26 18:15:43 +03:00
Pavel Emelyanov
0e50fc609c table: Introduce destroy_storage()
When table is DROP-ed the directory with all its sstables is removed
(unless it contains snapshots). Wrap this into table.destroy_storage()
method, later it will need to become sstable::storage-specific

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-26 18:15:43 +03:00
Pavel Emelyanov
7ae49f513e table: Simplify init_storage()
There's no need to copy the datadirs vector to call parallel_for_each
on it. The datadirs[0] is in fact the datadir field.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-26 18:15:43 +03:00
Pavel Emelyanov
99dfade020 table: Coroutinize init_storage()
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-26 18:15:43 +03:00
Pavel Emelyanov
a19b8af187 table: Relocate ks.make_directory_for_column_family()
This method initializes storage for a table, so it naturally belongs to
that class. So rename it while moving. Also, there's no longer a need to
carry table name and uuid as arguments; being a table method, it can just
get the paths to work on from config

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-26 18:15:41 +03:00
Pavel Emelyanov
6db5f08eab distributed_loader: Use cf.dir() instead of ks.column_family_directory()
These two return the same, but the latter makes it the harder way

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-26 17:59:47 +03:00
Pavel Emelyanov
44b811ce19 test: Don't create directory for system tables in cql_test_env
The distributed_loader::init_system_keyspaces() does it when called few
lines above this place

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-26 17:58:46 +03:00
Kamil Braun
a58beb8ce4 Merge 'Fix flakiness of test_tablets.py' from Tomasz Grabiec
We've observed sporadic failures of this test in CI related to driver reconnection after server restart.

Fixes #14032

Closes #14027

* github.com:scylladb/scylladb:
  test: test_tablets.py: Wait for driver to see the hosts after restart
  test: test_tablets.py: Pass server id to server_restart()
  test: test_tablets.py: Add missing await on server_restart()
2023-05-25 14:38:37 +02:00
Gleb Natapov
0e80c5162a storage_service: avoid unneeded copies in on_change
Move array of strings instead of copying.
2023-05-25 14:51:14 +03:00
Gleb Natapov
3a201c25c8 storage_service: remove check that is always true
The array cannot be empty since we access the first element of the array
before we call this function.
2023-05-25 14:50:23 +03:00
Gleb Natapov
715897ff31 storage_service: rename handle_state_removing to handle_state_removed
The function no longer handles the REMOVING_TOKENS state, so rename the
function and drop the no longer needed checks for the non-existent state.
2023-05-25 14:48:58 +03:00
Gleb Natapov
4103281648 storage_service: avoid string copy 2023-05-25 14:48:39 +03:00
Gleb Natapov
05aa07835d storage_service: delete code that handled REMOVING_TOKENS state
The state is never advertised so the code is never used.
2023-05-25 14:48:09 +03:00
Gleb Natapov
66ff072540 gossiper: remove code related to advertising REMOVING_TOKEN state
Apparently it was needed for removetoken support which was deprecated in
the ORIGIN already.
2023-05-25 14:47:16 +03:00
Gleb Natapov
a429018a8a migration_manager: add wait_for_schema_agreement() function
Several subsystems re-implement the same logic for waiting for schema
agreement. Provide the function in the migration_manager and use it
instead.
2023-05-25 14:44:53 +03:00
Tomasz Grabiec
9d3d9be29e test: test_tablets.py: Wait for driver to see the hosts after restart
Apparently, the driver may be still establishing connections in the
background after connecting to the cluster and queries may fail with:

  cassandra.cluster.NoHostAvailable

Replace reconnection with wait_for_cql_and_get_hosts(), which ensures
that the driver sees the host.
2023-05-25 11:38:40 +02:00
Botond Dénes
5a14c3311a Merge 'Break S3 upload 50Gb file limit' from Pavel Emelyanov
The current S3 uploading sink has an implicit limit on the final file size that comes from two places. First, the S3 protocol declares that uploaded parts are numbered from 1 to 10000 (inclusive). Second, the uploading sink sends out parts once they grow above the S3 minimal part size, which is 5 MB. Since sstables put data in 128 kB (or smaller) portions, parts are almost exactly 5 MB in size, so the total upload size cannot grow above ~50 GB. That's too low.

To break the limit, the new sink (called jumbo sink) uses the UploadPartCopy S3 call, which helps splice several objects into one right on the server. The jumbo sink starts uploading parts into an intermediate temporary object called a piece and named ${original_object}_${piece_number}. When the number of parts in the current piece grows above the configured limit, the piece is finalized and upload-copied into the object as its next part, then deleted. This happens in the background; meanwhile the new piece is created and subsequent data is put into it. When the sink is flushed, the current piece is flushed as is and also squashed into the object.

The new jumbo sink is capable of uploading ~500 TB of data, which should be enough.

fixes: #13019

Closes #13577

* github.com:scylladb/scylladb:
  sstables: Switch data and index sink to use jumbo uploader
  s3/test: Tune-up multipart upload test alignment
  s3/test: Add jumbo upload test
  s3/client: Wait for background upload fiber on close-abort
  c3/client: Implement jumbo upload sink
  s3/client: Move memory buffers to upload_sink from base
  s3/client: Move last part upload out of finalize_upload()
  s3/client: Merge do_flush() with upload_part()
  s3/client: Rename upload_sink -> upload_sink_base
2023-05-25 11:44:06 +03:00
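The two size figures quoted in 5a14c3311a follow from simple multiplication. The sketch below reproduces that arithmetic under stated assumptions: the minimal part size is taken as 5 MiB, and each jumbo piece is assumed to be allowed the full 10000 parts (the message only says the per-piece limit is configurable). It restates the message's numbers and is not a statement about S3's own object size limits; all names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Figures taken from the commit message above.
constexpr std::uint64_t max_parts = 10'000;          // part numbers 1..10000
constexpr std::uint64_t min_part_size = 5ull << 20;  // 5 MiB (assumed unit)

// Upper bound on an upload made of `parts` parts of `part_size` bytes each.
constexpr std::uint64_t max_upload_size(std::uint64_t parts,
                                        std::uint64_t part_size) {
    assert(parts <= max_parts);
    return parts * part_size;
}

// Plain sink: every part is roughly the minimal part size.
constexpr std::uint64_t plain_limit = max_upload_size(max_parts, min_part_size);

// Jumbo sink: every part of the final object is a whole piece, and each
// piece is assumed to reach the plain limit.
constexpr std::uint64_t jumbo_limit = max_upload_size(max_parts, plain_limit);
```

Under these assumptions `plain_limit` is 52,428,800,000 bytes (about 52 GB) and `jumbo_limit` is 524,288,000,000,000 bytes (about 524 TB), in line with the "~50" and "~500" figures in the message.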
Kamil Braun
1339ae141a Merge 'Small improvements after pending_ranges, endpoints_for_reading -> erm PR' from Gusev Petr
This is a small follow-up for [this PR](https://github.com/scylladb/scylladb/pull/13715), it resolves some comments in the initial PR that didn't make their way into it.
* remove `noexcept` from `clear_gently`, since exceptions can be raised from move constructor;
* an optimisation for `vnode_effective_replication_map::get_range_addresses`, avoid redundant binary search.

Closes #14015

* github.com:scylladb/scylladb:
  vnode_erm: optimize get_range_addresses
  clear_gently: remove noexcept for rvalue references overload
2023-05-25 10:37:27 +02:00
Pavel Emelyanov
222f21d180 messaging_service: Remove unused headers from m.s..hh
The tracing.hh header is large enough to be worth dropping.
The other one is removed while at it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #14024
2023-05-25 08:38:49 +03:00
Kefu Chai
8e7c7e1079 docs/dev/repair_based_node_ops: better formatting
* indent the nested paragraphs of list items
* use table to format the time sequence for better
  readability

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #14016
2023-05-25 08:31:43 +03:00
Kefu Chai
8e6fbb99c7 docs/operating-scylla: lowercase the name of an option
"Enable_repair_based_node_ops" is the name of an option, and the leading
character should be lowercase "e". so fix it.

Fixes #14017
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #14018
2023-05-25 08:21:59 +03:00
Tomasz Grabiec
51e3b9321b Merge ' mvcc: make schema upgrades gentle' from Michał Chojnowski
After a schema change, memtable and cache have to be upgraded to the new schema. Currently, they are upgraded (on the first access after a schema change) atomically, i.e. all rows of the entry are upgraded with one non-preemptible call. This is one of the last vestiges of the times when partitions were treated atomically, and it is a well-known source of numerous large stalls.

This series makes schema upgrades gentle (preemptible). This is done by co-opting the existing MVCC machinery.
Before the series, all partition_versions in the partition_entry chain have the same schema, and an entry upgrade replaces the entire chain with a single squashed and upgraded version.
After the series, each partition_version has its own schema. A partition entry upgrade happens simply by adding an empty version with the new schema to the head of the chain. Row entries are upgraded to the current schema on-the-fly by the cursor during reads, and by the MVCC version merge ongoing in the background after the upgrade.

The series:
1. Does some code cleanup in the mutation_partition area.
2. Adds a schema field to partition_version and removes it from its containers (partition_snapshot, cache_entry, memtable_entry).
3. Adds upgrading variants of constructors and apply() for `row` and its wrappers.
4. Prepares partition_snapshot_row_cursor, mutation_partition_v2::apply_monotonically and partition_snapshot::merge_partition_versions for dealing with heterogeneous version chains.
5. Modifies partition_entry::upgrade to perform upgrades by extending the version chain with a new schema instead of squashing it to a single upgraded version.

Fixes #2577

Closes #13761

* github.com:scylladb/scylladb:
  test: mvcc_test: add a test for gentle schema upgrades
  partition_version: make partition_entry::upgrade() gentle
  partition_version: handle multi-schema snapshots in merge_partition_versions
  mutation_partition_v2: handle schema upgrades in apply_monotonically()
  partition_version: remove the unused "from" argument in partition_entry::upgrade()
  row_cache_test: prepare test_eviction_after_schema_change for gentle schema upgrades
  partition_version: handle multi-schema entries in partition_entry::squashed
  partition_snapshot_row_cursor: handle multi-schema snapshots
  partiton_version: prepare partition_snapshot::squashed() for multi-schema snapshots
  partition_version: prepare partition_snapshot::static_row() for multi-schema snapshots
  partition_version: add a logalloc::region argument to partition_entry::upgrade()
  memtable: propagate the region to memtable_entry::upgrade_schema()
  mutation_partition: add an upgrading variant of lazy_row::apply()
  mutation_partition: add an upgrading variant of rows_entry::rows_entry
  mutation_partition: switch an apply() call to apply_monotonically()
  mutation_partition: add an upgrading variant of rows_entry::apply_monotonically()
  mutation_fragment: add an upgrading variant of clustering_row::apply()
  mutation_partition: add an upgrading variant of row::row
  partition_version: remove _schema from partition_entry::operator<<
  partition_version: remove the schema argument from partition_entry::read()
  memtable: remove _schema from memtable_entry
  row_cache: remove _schema from cache_entry
  partition_version: remove the _schema field from partition_snapshot
  partition_version: add a _schema field to partition_version
  mutation_partition: change schema_ptr to schema& in mutation_partition::difference
  mutation_partition: change schema_ptr to schema& in mutation_partition constructor
  mutation_partition_v2: change schema_ptr to schema& in mutation_partition_v2 constructor
  mutation_partition: add upgrading variants of row::apply()
  partition_version: update the comment to apply_to_incomplete()
  mutation_partition_v2: clean up variants of apply()
  mutation_partition: remove apply_weak()
  mutation_partition_v2: remove a misleading comment in apply_monotonically()
  row_cache_test: add schema changes to test_concurrent_reads_and_eviction
  mutation_partition: fix mixed-schema apply()
2023-05-24 22:58:43 +02:00