Compare commits

..

119 Commits

Author SHA1 Message Date
Yaniv Kaul
953fee7fb0 Update cql-extensions.md 2023-10-25 15:18:55 +03:00
Botond Dénes
6c90d166cc Merge 'build: cmake: avoid using large amount stack of when compiling parser ' from Kefu Chai
this mirrors what we have in `configure.py`, to build the CqlParser with `-O1`
and disable `-fsanitize-address-use-after-scope` when compiling CqlParser.cc
in order to prevent the compiler from emitting code which uses large amount of stack
space at the runtime.

Closes scylladb/scylladb#15819

* github.com:scylladb/scylladb:
  build: cmake: avoid using large amount stack of when compiling parser
  build: cmake: s/COMPILE_FLAGS/COMPILE_OPTIONS/
2023-10-24 16:19:51 +03:00
Nadav Har'El
4b80130b0b Merge 'reduce announcements of the automatic schema changes ' from Patryk Jędrzejczak
There are some schema modifications performed automatically (during bootstrap, upgrade etc.) by Scylla that are announced by multiple calls to `migration_manager::announce` even though they are logically one change. Precisely, they appear in:
- `system_distributed_keyspace::start`,
- `redis:create_keyspace_if_not_exists_impl`,
- `table_helper::setup_keyspace` (for the `system_traces` keyspace).

All these places contain a FIXME telling us to `announce` only once. There are a few reasons for this:
- calling `migration_manager::announce` with Raft is quite expensive -- taking a `read_barrier` is necessary, and that requires contacting a leader, which then must contact a quorum,
- we must implement a retrying mechanism for every automatic `announce` if `group0_concurrent_modification` occurs to enable support for concurrent bootstrap in Raft-based topology. Doing it before the FIXMEs mentioned above would be harder, and fixing the FIXMEs later would also be harder.

This PR fixes the first two FIXMEs and improves the situation with the last one by reducing the number of the `announce` calls to two. Unfortunately, reducing this number to one requires a big refactor. We can do it as a follow-up to a new, more specific issue. Also, we leave a new FIXME.

Fixing the first two FIXMEs required enabling the announcement of a keyspace together with its tables. Until now, the code responsible for preparing mutations for a new table could assume the existence of the keyspace. This assumption wasn't necessary, but removing it required some refactoring.

Fixes #15437

Closes scylladb/scylladb#15594

* github.com:scylladb/scylladb:
  table_helper: announce twice in setup_keyspace
  table_helper: refactor setup_table
  redis: create_keyspace_if_not_exists_impl: fix indentation
  redis: announce once in create_keyspace_if_not_exists_impl
  db: system_distributed_keyspace: fix indentation
  db: system_distributed_keyspace: announce once in start
  tablet_allocator: update on_before_create_column_family
  migration_listener: add parameter to on_before_create_column_family
  alternator: executor: use new prepare_new_column_family_announcement
  alternator: executor: introduce create_keyspace_metadata
  migration_manager: add new prepare_new_column_family_announcement
2023-10-24 15:42:48 +03:00
David Garcia
a5519c7c1f docs: update cofig params design
Closes scylladb/scylladb#15827
2023-10-24 15:41:56 +03:00
Kefu Chai
f8104b92f8 build: cmake: detect rapidxml
we use rapidxml for parsing XML, so let's detect it before using it.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15813
2023-10-24 15:12:04 +03:00
Kamil Braun
2a21029ff5 Merge 'make topology_coordinator::run noexcept' from Gleb
Topology coordinator should handle failures internally as long as it
remains to be the coordinator. The raft state monitor is not in better
position to handle any errors thrown by it, all it can do it to restart
the coordinator. The series makes topology_coordinator::run handle all
the errors internally and mark the function as noexcept to not leak
error handling complexity into the raft state monitor.

* 'gleb/15728-fix' of github.com:scylladb/scylla-dev:
  storage_service: raft topology: mark topology_coordinator::run function as noexcept
  storage_service: raft topology: do not throw error from fence_previous_coordinator()
2023-10-24 12:16:36 +02:00
Kefu Chai
4abcec9296 test: add __repr__ for MinIoServer and S3_Server
it is printed when pytest passes it down as a fixture as part of
the logging message. it would help with debugging a object_store test.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15817
2023-10-24 12:35:49 +03:00
Gleb Natapov
dcaaa74cd4 storage_service: raft topology: mark topology_coordinator::run function as noexcept
The function handled all exceptions internally. By making it noexcept we
make sure that the caller (raft_state_monitor_fiber) does not need
handle any exceptions from the topology coordinator fiber.
2023-10-24 10:58:45 +03:00
Gleb Natapov
65bf5877e7 storage_service: raft topology: do not throw error from fence_previous_coordinator()
Throwing error kills the topology coordinator monitor fiber. Instead we
retry the operation until it succeeds or the node looses its leadership.
This is fine before for the operation to succeed quorum is needed and if
the quorum is not available the node should relinquish its leadership.

Fixes #15728
2023-10-24 10:57:48 +03:00
Botond Dénes
0cba973972 Update tools/java submodule
* tools/java 3c09ab97...86a200e3 (1):
  > cassandra-stress: add storage options
2023-10-24 09:41:36 +03:00
Kefu Chai
9347b61d3b build: cmake: avoid using large amount stack of when compiling parser
this mirrors what we have in `configure.py`, to build the CqlParser with -O1
and disable sanitize-address-use-after-scope when compiling CqlParser.cc
in order to prevent the compiler from emitting code which uses large amount of stack
at the runtime.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-10-24 12:40:20 +08:00
Kefu Chai
3da02e1bf4 build: cmake: s/COMPILE_FLAGS/COMPILE_OPTIONS/
according to
https://cmake.org/cmake/help/latest/prop_sf/COMPILE_FLAGS.html,
COMPILE_FLAGS has been superseded by COMPILE_OPTIONS. so let's
replace the former with the latter.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-10-24 12:40:20 +08:00
Pavel Emelyanov
7c580b4bd4 Merge 'sstable: switch to uuid identifier for naming S3 sstable objects' from Kefu Chai
before this change, we create a new UUID for a new sstable managed by the s3_storage, and we use the string representation of UUID defined by RFC4122 like "0aa490de-7a85-46e2-8f90-38b8f496d53b" for naming the objects stored on s3_storage. but this representation is not what we are using for storing sstables on local filesystem when the option of "uuid_sstable_identifiers_enabled" is enabled. instead, we are using a base36-based representation which is shorter.

to be consistent with the naming of the sstables created for local filesystem, and more importantly, to simplify the interaction between the local copy of sstables and those stored on object storage, we should use the same string representation of the sstable identifier.

so, in this change:

1. instead of creating a new UUID, just reuse the generation of the sstable for the object's key.
2. do not store the uuid in the sstable_registry system table. As we already have the generation of the sstable for the same purpose.
3. switch the sstable identifier representation from the one defined by the RFC4122 (implemented by fmt::formatter<utils::UUID>) to the base36-based one (implemented by fmt::formatter<sstables::generation_type>)

Fixes #14175
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#14406

* github.com:scylladb/scylladb:
  sstable: remove _remote_prefix from s3_storage
  sstable: switch to uuid identifier for naming S3 sstable objects
2023-10-23 21:05:13 +03:00
Pavel Emelyanov
d7031de538 Merge 'test/pylib: extract the env variable related functions out' from Kefu Chai
this series extracts the the env variables related functions out and remove unused `import`s for better readability.

Closes scylladb/scylladb#15796

* github.com:scylladb/scylladb:
  test/pylib: remove duplicated imports
  test/pylib: extract the env variable printing into MinIoServer
  test/pylib: extract _set_environ() out
2023-10-23 21:03:03 +03:00
Aleksandra Martyniuk
0c6a3f568a compaction: delete default_compaction_progress_monitor
default_compaction_progress_monitor returns a reference to a static
object. So, it should be read-only, but its users need to modify it.

Delete default_compaction_progress_monitor and use one's own
compaction_progress_monitor instance where it's needed.

Closes scylladb/scylladb#15800
2023-10-23 16:03:34 +03:00
Anna Stuchlik
55ee999f89 doc: enable publishing docs for branch-5.4
This commit enables publishing documentation
from branch-5.4. The docs will be published
as UNSTABLE (the warning about version 5.4
being unstable will be displayed).

Closes scylladb/scylladb#15762
2023-10-23 15:47:01 +03:00
Avi Kivity
ee9cc450d4 logalloc: report increases of reserves
The log-structured allocator maintains memory reserves to so that
operations using log-strucutured allocator memory can have some
working memory and can allocate. The reserves start small and are
increased if allocation failures are encountered. Before starting
an operation, the allocator first frees memory to satisfy the reserves.

One problem is that if the reserves are set to a high value and
we encounter a stall, then, first, we have no idea what value
the reserves are set to, and second, we have no idea what operation
caused the reserves to be increased.

We fix this problem by promoting the log reports of reserve increases
from DEBUG level to INFO level and by attaching a stack trace to
those reports. This isn't optimal since the messages are used
for debugging, not for informing the user about anything important
for the operation of the node, but I see no other way to obtain
the information.

Ref #13930.

Closes scylladb/scylladb#15153
2023-10-23 13:37:50 +02:00
Tomasz Grabiec
4af585ec0e Merge 'row_cache: make_reader_opt(): make make_context() reentrant ' from Botond Dénes
Said method is called in an allocating section, which will re-try the enclosed lambda on allocation failure. `read_context()` however moves the permit parameter so on the second and later calls, the permit will be in a moved-from state, triggering a `nullptr` dereference and therefore a segfault.

We already have a unit test (`test_exception_safety_of_reads` in `row_cache_test.cc`) which was supposed to cover this, but:
* It only tests range scans, not single partition reads, which is a separate path.
* Turns out allocation failure tests are again silently broken (no error is injected at all). This is because `test/lib/memtable_snapshot_source.hh` creates a critical alloc section which accidentally covers the entire duration of tests using it.

Fixes: #15578

Closes scylladb/scylladb#15614

* github.com:scylladb/scylladb:
  test/boost/row_cache_test: test_exception_safety_of_reads: also cover single-partition reads
  test/lib/memtable_snapshot_source: disable critical alloc section while waiting
  row_cache: make_reader_opt(): make make_context() reentrant
2023-10-23 11:22:13 +02:00
Raphael S. Carvalho
ea6c281b9f replica: Fix major compaction semantics by performing off-strategy first
Major compaction semantics is that all data of a table will be compacted
together, so user can expect e.g. a recently introduced tombstone to be
compacted with the data it shadows.
Today, it can happen that all data in maintenance set won't be included
for major, until they're promoted into main set by off-strategy.
So user might be left wondering why major is not having the expected
effect.
To fix this, let's perform off-strategy first, so data in maintenance
set will be made available by major. A similar approach is done for
data in memtable, so flush is performed before major starts.
The only exception will be data in staging, which cannot be compacted
until view building is done with it, to avoid inconsistency in view
replicas.
The serialization in comapaction manager of reshape jobs guarantee
correctness if there's an ongoing off-strategy on behalf of the
table.

Fixes #11915.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#15792
2023-10-23 11:32:03 +03:00
Nadav Har'El
e7dd0ec033 test/cql-pytest: reproduce incompatibility with same-name bind marks
This patch adds a reproducer for a minor compatibility between Scylla's
and Cassandra's handling of a prepared statement when a bind marker with
the same name is used more than once, e.g.,
```
SELECT * FROM tbl WHERE p=:x AND c=:x
```
It turns out that Scylla tells the driver that there is only one bind
marker, :x, whereas Cassandra tells the driver that there are two bind
markers, both named :x. This makes no different if the user passes
a map `{'x': 3}`, but if the user passes a tuple, Scylla accepts only
`(3,)` (assigning both bind markers the same value) and Cassandra
accepts only `(3,3)`.

The test added in this patch demonstrates this incompatibility.
It fails on Scylla, passes on Cassandra, and is marked "xfail".

Refs #15559

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#15564
2023-10-23 11:19:15 +03:00
Aleksandra Martyniuk
a1271d2d5c repair: throw more detailed exception
Exception thrown from row_level_repair::run does not show the root
cause of a failure making it harder to debug.

Add the internal exception contents to runtime_error message.

After the change the log will mention the real cause (last line), e.g.:

repair - repair[92db0739-584b-4097-b6e2-e71a66e40325]: 33 out of 132 ranges failed,
keyspace=system_distributed, tables={cdc_streams_descriptions_v2, cdc_generation_timestamps,
view_build_status, service_levels}, repair_reason=bootstrap, nodes_down_during_repair={}, aborted_by_user=false,
failed_because=seastar::nested_exception: std::runtime_error (Failed to repair for keyspace=system_distributed,
cf=cdc_streams_descriptions_v2, range=(8720988750842579417,+inf))
(while cleaning up after seastar::abort_requested_exception (abort requested))

Closes scylladb/scylladb#15770
2023-10-23 11:15:25 +03:00
Botond Dénes
950a1ff22c Merge 'doc: improve the docs for handling failures' from Anna Stuchlik
This PR improves the way of how handling failures is documented and accessible to the user.
- The Handling Failures section is moved from Raft to Troubleshooting.
- Two new topics about failure are added to Troubleshooting with a link to the Handling Failures page (Failure to Add, Remove, or Replace a Node, Failure to Update the Schema).
- A note is added to the add/remove/replace node procedures to indicate that a quorum is required.

See individual commits for more details.

Fixes https://github.com/scylladb/scylladb/issues/13149

Closes scylladb/scylladb#15628

* github.com:scylladb/scylladb:
  doc: add a note about Raft
  doc: add the quorum requirement to procedures
  doc: add more failure info to Troubleshooting
  doc: move Handling Failures to Troubleshooting
2023-10-23 11:09:28 +03:00
Kefu Chai
5a17a02abb build: cmake: add -ffile-prefix-map option
this mirrors what we already have in configure.py.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15798
2023-10-23 10:26:21 +03:00
Botond Dénes
940c2d1138 Merge 'build: cmake: use add_compile_options() and add_link_options() when appropriate ' from Kefu Chai
instead of appending the options to the CMake variables, use the command to do this. simpler this way. and the bonus is that the options are de-duplicated.

Closes scylladb/scylladb#15797

* github.com:scylladb/scylladb:
  build: cmake: use add_link_options() when appropriate
  build: cmake: use add_compile_options() when appropriate
2023-10-23 09:58:10 +03:00
Botond Dénes
c960c2cdbf Merge 'build: extract code fragments into functions' from Kefu Chai
this series is one of the steps to remove global statements in `configure.py`.

not only the script is more structured this way, this also allows us to quickly identify the part which should/can be reused when migrating to CMake based building system.

Refs #15379

Closes scylladb/scylladb#15780

* github.com:scylladb/scylladb:
  build: update modeval using a dict
  build: pass args.test_repeat and args.test_timeout explicitly
  build: pull in jsoncpp using "pkgs"
  build: build: extract code fragments into functions
2023-10-23 09:42:37 +03:00
Kefu Chai
0080b15939 build: cmake: use add_link_options() when appropriate
instead of appending to CMAKE_EXE_LINKER_FLAGS*, use
add_link_options() to add more options. as CMAKE_EXE_LINKER_FLAGS*
is a string, and typically set by user, let's use add_link_options()
instead.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-10-23 12:06:42 +08:00
Kefu Chai
686adec52e build: cmake: use add_compile_options() when appropriate
instead of appending to CMAKE_CXX_FLAGS, use add_compile_options()
to add more options. as CMAKE_CXX_FLAGS is a string, and typically
set by user, let's use add_compile_options() instead, the options
added by this command will be added before CMAKE_CXX_FLAGS, and
will have lower priority.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-10-23 12:06:42 +08:00
Kefu Chai
8756838b16 test/pylib: remove duplicated imports
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-10-23 10:36:05 +08:00
Kefu Chai
6b84bc50c3 test/pylib: extract the env variable printing into MinIoServer
less repeatings this way.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-10-23 10:36:05 +08:00
Kefu Chai
02cad8f85b test/pylib: extract _set_environ() out
will add _unset_environ() later. extracting this helper out helps
with the readability.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-10-23 10:36:05 +08:00
Kefu Chai
b36cef6f1a sstable: remove _remote_prefix from s3_storage
since we use the sstable.generation() for the remote prefix of
the key of the object for storing the sstable component, there is
no need to set remote_prefix beforehand.

since `s3_storage::ensure_remote_prefix()` and
`system_kesypace::sstables_registry_lookup_entry()` are not used
anymore, they are removed.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-10-23 10:08:22 +08:00
Kefu Chai
af8bc8ba63 sstable: switch to uuid identifier for naming S3 sstable objects
before this change, we create a new UUID for a new sstable managed
by the s3_storage, and we use the string representation of UUID
defined by RFC4122 like "0aa490de-7a85-46e2-8f90-38b8f496d53b" for
naming the objects stored on s3_storage. but this representation is
not what we are using for storing sstables on local filesystem when
the option of "uuid_sstable_identifiers_enabled" is enabled. instead,
we are using a base36-based representation which is shorter.

to be consistent with the naming of the sstables created for local
filesystem, and more importantly, to simplify the interaction between
the local copy of sstables and those stored on object storage, we should
use the same string representation of the sstable identifier.

so, in this change:

1. instead of creating a new UUID, just reuse the generation of the
   sstable for the object's key.
2. do not store the uuid in the sstable_registry system table. As
   we already have the generation of the sstable for the same purpose.
3. switch the sstable identifier representation from the one defined
   by the RFC4122 (implemented by fmt::formatter<utils::UUID>) to the
   base36-based one (implemented by
   fmt::formatter<sstables::generation_type>)
4. enable the `uuid_sstable_identifers` cluster feature if it is
   enabled in the `test_env_config`, so that it the sstable manager
   can enable the uuid-based uuid when creating a new uuid for
   sstable.
5. throw if the generation of sstable is not UUID-based when
   accessing / manipulating an sstable with S3 storage backend. as
   the S3 storage backend now relies on this option. as, otherwise
   we'd have sstables with key like s3://bucket/number/basename, which
   is just unable to serve as a unique id for sstable if the bucket is
   shared across multiple tables.

Fixes #14175
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-10-23 10:08:22 +08:00
Avi Kivity
f181ac033a Merge 'tools/nodetool: implement additional commands, part 2/N' from Botond Dénes
The following new commands are implemented:
* stop
* compactionhistory

All are associated with tests. All tests (both old and new) pass with both the scylla-native and the cassandra nodetool implementation.

Refs: https://github.com/scylladb/scylladb/issues/15588

Closes scylladb/scylladb#15649

* github.com:scylladb/scylladb:
  tools/scylla-nodetool: implement compactionhistory command
  tools/scylla-nodetool: implement stop command
  mutation/json: extract generic streaming writer into utils/rjson.hh
  test/nodetool: rest_api_mock.py: add support for error responses
2023-10-21 00:11:42 +03:00
Botond Dénes
19fc01be23 Merge 'Sanitize API -> task_manager dependency' from Pavel Emelyanov
This is the continuation of 8c03eeb85d

Registering API handlers for services need to

* get the service to handle requests via argument, not from http context (http context, in turn, is going not to depend on anything)
* unset the handlers on stop so that the service is not used after it's stopped (and before API server is stopped)

This makes task manager handlers work this way

Closes scylladb/scylladb#15764

* github.com:scylladb/scylladb:
  api: Unset task_manager test API handlers
  api: Unset task_manager API handlers
  api: Remove ctx->task_manager dependency
  api: Use task_manager& argument in test API handlers
  api: Push sharded<task_manager>& down the test API set calls
  api: Use task_manager& argument in API handlers
  api: Push sharded<task_manager>& down the API set calls
2023-10-20 18:07:20 +03:00
Botond Dénes
4b57c2bf18 tools/scylla-nodetool: implement compactionhistory command 2023-10-20 10:55:38 -04:00
Botond Dénes
a212ddc5b1 tools/scylla-nodetool: implement stop command 2023-10-20 10:04:56 -04:00
Botond Dénes
9231454acd mutation/json: extract generic streaming writer into utils/rjson.hh
This writer is generally useful, not just for writing mutations as json.
Make it generally available as well.
2023-10-20 10:04:56 -04:00
Botond Dénes
6db2698786 test/nodetool: rest_api_mock.py: add support for error responses 2023-10-20 10:04:56 -04:00
Kefu Chai
9f62bfa961 build: update modeval using a dict
instead of updating `modes` in with global statements, update it in
a function. for better readablity. and to reduce the statements in
global scope.

Refs #15379
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-10-20 21:37:07 +08:00
Botond Dénes
ad90bb8d87 replica/database: remove "streaming" from dirty memory metric description
We don't have streaming memtables for a while now.

Closes scylladb/scylladb#15638
2023-10-20 13:09:57 +03:00
Kefu Chai
c240c70278 build: pass args.test_repeat and args.test_timeout explicitly
for better readability.

Refs #15379
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-10-20 16:53:16 +08:00
Kefu Chai
c2cd11a8b3 build: pull in jsoncpp using "pkgs"
this change adds "jsoncpp" dependency using "pkgs". simpler this
way. it also helps to remove more global statements.

Refs #15379
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-10-20 16:53:16 +08:00
Kefu Chai
890113a9cf build: build: extract code fragments into functions
this change extract `get_warnings_options()` out. it helps to
remove more global statements.

Refs #15379
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-10-20 16:53:16 +08:00
Patryk Jędrzejczak
fbcd667030 replica: keyspace::create_replication_strategy: remove a redundant parameter
The options parameter is redundant. We always use
`_metadata->strategy_options()` and
`keyspace::create_replication_strategy` already assumes that
`_metadata` is set by using its other fields.

Closes scylladb/scylladb#15776
2023-10-20 10:20:49 +03:00
Botond Dénes
460bc7d8e1 test/boost/row_cache_test: test_exception_safety_of_reads: also cover single-partition reads
The test currently only covers scans. Single partition reads have a
different code-path, make sure it is also covered.
2023-10-20 03:16:57 -04:00
Botond Dénes
ffefa623f4 test/lib/memtable_snapshot_source: disable critical alloc section while waiting
memtable_snapshot_source starts a background fiber in its constructor,
which compacts LSA memory in a loop. The loop's inside is covered with a
critical alloc section. It also contains a wait on a condition variable
and in its present form the critical section also covers the wait,
effectively turning off allocation failure injection for any test using
the memtable_snapshot_source.
This patch disables the critical alloc section while the loop waits on
the condition variable.
2023-10-20 03:16:57 -04:00
Botond Dénes
92966d935a row_cache: make_reader_opt(): make make_context() reentrant
Said lambda currently moves the permit parameter, so on the second and
later calls it will possibly run into use-after-move. This can happen if
the allocating section below fails and is re-tried.
2023-10-20 03:16:57 -04:00
Kefu Chai
11d7cadf0d install-dependencies.sh: drop java deps
the java related build dependencies are installed by

* tools/java/install-dependencies.sh
* tools/jmx/install-dependencies.sh

respectively. and the parent `install-dependencies.sh` always
invoke these scripts, so there is no need to repeat them in the
parent `install-dependenceies.sh` anymore.

in addition to dedup the build deps, this change also helps to
reduce the size of build dependencies. as by default, `dnf`
install the weak deps, unless `-setopt=install_weak_deps=False`
is passed to it, so this change also helps to reduce the traffic
and foot print of the installed packages for building scylla.

see also 9dddad27bf

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15473
2023-10-20 09:43:28 +03:00
Kamil Braun
059d647ee5 test/pylib: scylla_cluster: protect ScyllaCluster.stop with a lock
test.py calls `uninstall()` and `stop()` concurrently from exit
artifacts, and `uninstall()` internally calls `stop()`. This leads to
premature releasing of IP addresses from `uninstall()` (returning IPs to
the pool) while the servers using those IPs are still stopping. Then a
server might obtain that IP from the pool and fail to start due to
"Address already in use".

Put a lock around the body of `stop()` to prevent that.

Fixes: scylladb/scylladb#15755

Closes scylladb/scylladb#15763
2023-10-20 09:30:37 +03:00
Kefu Chai
80c656a08b types: use more readable error message when serializing non-ASCII string
before this change, we print

marshaling error: Value not compatible with type org.apache.cassandra.db.marshal.AsciiType: '...'

but the wording is not quite user friendly, it is a mapping of the
underlying implementation, user would have difficulty understanding
"marshaling" and/or "org.apache.cassandra.db.marshal.AsciiType"
when reading this error message.

so, in this change

1. change the error message to:
     Invalid ASCII character in string literal: '...'
   which should be more straightforward, and easier to digest.
2. update the test accordingly

please note, the quoted non-ASCII string is preserved instead of
being printed in hex, as otherwise user would not be able to map it
with his/her input.

Refs #14320
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15678
2023-10-20 09:25:44 +03:00
Pavel Emelyanov
0c69a312db Update seastar submodule
* seastar bab1625c...17183ed4 (73):
  > thread_pool: Reference reactor, not point to
  > sstring: inherit publicly from string_view formatter
  > circleci: use conditional steps
  > weak_ptr: include used header
  > build: disable the -Wunused-* warnings for checkheaders
  > resource: move variable into smaller lexical scope
  > resource: use structured binding when appropriate
  > httpd: Added server and client addresses to request structure
  > io_queue: do not dereference moved-away shared pointer
  > treewide: explicitly define ctor and assignment operator
  > memory: use `err` for the error string
  > doc: Add document describing all the math behind IO scheduler
  > io_queue: Add flow-rate based self slowdown backlink
  > io_queue: Make main throttler uncapped
  > io_queue: Add queue-wide metrics
  > io_queue: Introduce "flow monitor"
  > io_queue: Count total number of dispatched and completed requests so far
  > io_queue: Introduce io_group::io_latency_goal()
  > tests: test the vector overload for when_all_succeed
  > core: add a vector overload to when_all_succeed
  > loop: Fix iterator_range_estimate_vector_capacity for random iters
  > loop: Add test for iterator_range_estimate_vector_capacity
  > core/posix return old behaviour using non-portable pthread_attr_setaffinity_np when present
  > memory: s/throw()/noexcept/
  > build: enable -Wdeprecated compiler option
  > reactor: mark kernel_completion's dtor protected
  > tests: always wait for promise
  > http, json, net: define-generated copy ctor for polymorphic types
  > treewide: do not define constexpr static out-of-line
  > reactor: do not define dtor of kernel_completion
  > http/exception: stop using dynamic exception specification
  > metrics: replace vector with deque
  > metrics: change metadata vector to deque
  > utils/backtrace.hh: make simple_backtrace formattable
  > reactor: Unfriend disk_config_params
  > reactor: Move add_to_flush_poller() to internal namespace
  > reactor: Unfriend a bunch of sched group template calls
  > rpc_test: Test rpc send glitches
  > net: Implement batch flush support for existing sockets
  > iostream: Configure batch flushes if sink can do it
  > net: Added remote address accessors
  > circleci: update the image to CircleCI "standard" image
  > build: do not add header check target if no headers to check
  > build: pass target name to seastar_check_self_contained
  > build: detect glibc features using CMake
  > build: extract bits checking libc into CheckLibc.cmake
  > http/exception: add formatter for httpd::base_exception
  > http/client: Mark write_body() const
  > http/client: Introduce request::_bytes_written
  > http/client: Mark maybe_wait_for_continue() const
  > http/client: Mark send_request_head() const
  > http/client: Detach setup_request()
  > http/api_docs: copy in api_docs's copy constructor
  > script: do not inherit from object
  > scripts: addr2line: change StdinBacktraceIterator to a function
  > scripts: addr2line: use yield instead defining a class
  > tests: skip tests that require backtrace if execinfo.h is not found
  > backtrace: check for existence of execinfo.h
  > core: use ino_t and off_t as glibc sets these to 64bit if 64bit api is used
  > core: add sleep_abortable instantiation for manual_clock
  > tls: Return EPIPE exception when writing to shutdown socket
  > http/client: Don't cache connection if server advertises it
  > http/client: Mark connection as "keep in cache"
  > core: fix strerror_r usage from glibc extension
  > reactor: access sigevent.sigev_notify_thread_id with a macro
  > posix: use pthread_setaffinity_np instead of pthread_attr_setaffinity_np
  > reactor: replace __mode_t with mode_t
  > reactor: change sys/poll.h to posix poll.h
  > rpc: Add unit test for per-domain metrics
  > rpc: Report client connections metrics
  > rpc: Count dead client stats
  > rpc: Add seastar::rpc::metrics
  > rpc: Make public queues length getters

io-scheduler fixes
refs: #15312
refs: #11805

http client fixes
refs: #13736
refs: #15509

rpc fixes
refs: #15462

Closes scylladb/scylladb#15774
2023-10-19 20:52:37 +03:00
Tomasz Grabiec
899ecaffcd test: tablets: Enable for verbose logging in test_tablet_metadata_propagates_with_schema_changes_in_snapshot_mode
To help diagnose #14746 where we experience timeouts due to connection
dropping.

Closes scylladb/scylladb#15773
2023-10-19 16:58:53 +03:00
Raphael S. Carvalho
fded314e46 sstables: Fix update of tombstone GC settings to have immediate effect
After "repair: Get rid of the gc_grace_seconds", the sstable's schema (mode,
gc period if applicable, etc) is used to estimate the amount of droppable
data (or determine full expiration = max_deletion_time < gc_before).
It could happen that the user switched from timeout to repair mode, but
sstables will still use the old mode, despite the user asked for a new one.
Another example is when you play with value of grace period, to prevent
data resurrection if repair won't be able to run in a timely manner.
The problem persists until all sstables using old GC settings are recompacted
or node is restarted.
To fix this, we have to feed latest schema into sstable procedures used
for expiration purposes.

Fixes #15643.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#15746
2023-10-19 16:27:59 +03:00
Kefu Chai
a6e68d8309 build: cmake: move message/* into message/CMakeLists.txt
messaging_service.cc depends on idl, but many source files in
scylla-main do no depend on idl, so let's

* move "message/*" into its own directory and add an inter-library
  dependency between it and the "idl" library.
* rename the target of "message" under test/manual to "message_test"
  to avoid the name collision

this should address the compilation failure of
```
FAILED: CMakeFiles/scylla-main.dir/message/messaging_service.cc.o
/usr/bin/clang++ -DBOOST_NO_CXX98_FUNCTION_BASE -DDEBUG -DDEBUG_LSA_SANITIZER -DFMT_DEPRECATED_OSTREAM -DFMT_SHARED -DSANITIZE -DSCYLLA_BUILD_MODE=debug -DSCYLLA_ENABLE_ERROR_INJECTION -DSEASTAR_API_LEVEL=7 -DSEASTAR_BROKEN_SOURCE_LOCATION -DSEASTAR_DEBUG -DSEASTAR_DEBUG_SHARED_PTR -DSEASTAR_DEFAULT_ALLOCATOR -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SHUFFLE_TASK_QUEUE -DSEASTAR_SSTRING -DSEASTAR_TYPE_ERASE_MORE -DXXH_PRIVATE_API -I/home/kefu/dev/scylladb -I/home/kefu/dev/scylladb/build/cmake/gen -I/home/kefu/dev/scylladb/seastar/include -I/home/kefu/dev/scylladb/build/cmake/seastar/gen/include -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-mismatched-tags -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-unused-parameter -Wno-missing-field-initializers -Wno-deprecated-copy -Wno-ignored-qualifiers -march=westmere  -Og -g -gz -std=gnu++20 -fvisibility=hidden -U_FORTIFY_SOURCE -Wno-error=unused-result "-Wno-error=#warnings" -fstack-clash-protection -fsanitize=address -fsanitize=undefined -fno-sanitize=vptr -MD -MT CMakeFiles/scylla-main.dir/message/messaging_service.cc.o -MF CMakeFiles/scylla-main.dir/message/messaging_service.cc.o.d -o CMakeFiles/scylla-main.dir/message/messaging_service.cc.o -c /home/kefu/dev/scylladb/message/messaging_service.cc
/home/kefu/dev/scylladb/message/messaging_service.cc:81:10: fatal error: 'idl/join_node.dist.hh' file not found
         ^~~~~~~~~~~~~~~~~~~~~~~
```
where the compiler failed to find the included `idl/join_node.dist.hh`,
which is exposed by the idl library as part of its public interface.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15657
2023-10-19 13:33:29 +03:00
Botond Dénes
60145d9526 Merge 'build: extract code fragments into functions' from Kefu Chai
this series is one of the steps to remove global statements in `configure.py`.

not only the script is more structured this way, this also allows us to quickly identify the part which should/can be reused when migrating to CMake based building system.

Refs #15379

Closes scylladb/scylladb#15668

* github.com:scylladb/scylladb:
  build: move check for NIX_CC into dynamic_linker_option()
  build: extract dynamic_linker_option(): out
  build: move `headers` into write_build_file()
2023-10-19 13:31:33 +03:00
Avi Kivity
39966e0eb1 Merge 'build: cmake: pass -dynamic-linker to ld' from Kefu Chai
to match the behavior of `configure.py`.

Closes scylladb/scylladb#15667

* github.com:scylladb/scylladb:
  build: cmake: pass -dynamic-linker to ld
  build: cmake: set CMAKE_EXE_LINKER_FLAGS in mode.common.cmake
2023-10-19 13:15:47 +03:00
Jan Ciolek
c256cca6f1 cql3/expr: add more comments in expression.hh
`expression` is a std::variant with 16 different variants
that represent different types of AST nodes.

Let's add documentation that explains what each of these
16 types represents. For people who are not familiar with expression
code it might not be clear what each of them does, so let's add
clear descriptions for all of them.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>

Closes scylladb/scylladb#15767
2023-10-19 10:56:38 +03:00
Kefu Chai
b105be220b build: cmake: add join_node.idl.hh to CMake
we add a new verb in 7cbe5e3af8, so
let's update the CMake-based building system accordingly.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15658
2023-10-19 10:19:16 +03:00
Nikita Kurashkin
2a7932efa1 alternator: fix DeleteTable return values to match DynamoDB's
It seems that Scylla has more values returned by DeleteTable operation than DynamoDB.
In this patch I added a table status check when generating output.
If we delete the table, values KeySchema, AttributeDefinitions and CreationDateTime won't be returned.
The test has also been modified to check that these attributes are not returned.

Fixes scylladb#14132

Closes scylladb/scylladb#15707
2023-10-19 09:34:16 +03:00
Pavel Emelyanov
ec94cc9538 Merge 'test: set use_uuid to true by default in sstables::test_env ' from Kefu Chai
this series

1. let sstable tests using test_env to use uuid-based sstable identifiers by default
2. let the test who requires integer-based identifier keep using it

this should enable us to perform the s3 related test after enforcing the uuid-based identifier for s3 backend, otherwise the s3 related test would fail as it also utilize `test_env`.

Closes scylladb/scylladb#14553

* github.com:scylladb/scylladb:
  test: set use_uuid to true by default in sstables::test_env
  test: enable test to set uuid_sstable_identifiers
2023-10-19 09:09:38 +03:00
Pavel Emelyanov
0981661f8b api: Unset task_manager test API handlers
So that the task_manager reference is not used when it shouldn't on stop

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-18 18:56:24 +03:00
Pavel Emelyanov
2d543af78e api: Unset task_manager API handlers
So that the task_manager reference is not used when it shouldn't on stop

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-18 18:56:01 +03:00
Pavel Emelyanov
0632ad50f3 api: Remove ctx->task_manager dependency
Now the task manager's API (and test API) use the argument and this
explicit dependency is no longer required

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-18 18:55:27 +03:00
Pavel Emelyanov
572c880d97 api: Use task_manager& argument in test API handlers
Now it's there and can be used. This will allow removing the
ctx->task_manager dependency soon

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-18 18:55:13 +03:00
Pavel Emelyanov
0396ce7977 api: Push sharded<task_manager>& down the test API set calls
This is to make it possible to use this reference instead of the ctx.tm
one by the next patch

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-18 18:54:53 +03:00
Pavel Emelyanov
ef1d2b2c86 api: Use task_manager& argument in API handlers
Now it's there and can be used. This will allow removing the
ctx->task_manager dependency soon

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-18 18:54:24 +03:00
Pavel Emelyanov
14e10e7db4 api: Push sharded<task_manager>& down the API set calls
This is to make it possible to use this reference instead of the ctx.tm
one by the next patch

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-18 18:52:46 +03:00
Avi Kivity
7d5e22b43b replica: memtable: don't forget memtable memory allocation statistics
A memtable object contains two logalloc::allocating_section members
that track memory allocation requirements during reads and writes.
Because these are local to the memtable, each time we seal a memtable
and create a new one, these statistics are forgotten. As a result
we may have to re-learn the typical size of reads and writes, incurring
a small performance penalty.

The solution is to move the allocating_section object to the memtable_list
container. The workload is the same across all memtables of the same
table, so we don't lose discrimination here.

The performance penalty may be increased later if log changes to
memory reserve thresholds including a backtrace, so this reduces the
odds of incurring such a penalty.

Closes scylladb/scylladb#15737
2023-10-18 17:43:33 +02:00
Kefu Chai
c8cb70918b sstable: drop unused parse() overload for deletion_time
`deletion_time` is a part of the `partition_header`, which is in turn
a part of `partition`. and `data_file` is a sequence of `partition`.
`data_file` represents *-Data.db component of an SSTable.
see docs/architecture/sstable3/sstables-3-data-file-format.rst.
we always parse the data component via `flat_mutation_reader_v2`, which is in turn
implemented with mx/reader.cc or kl/reader.cc depending on
the version of SSTable to be read.

in other words, we decode `deletion_time` in mx/reader.cc or
kl/reader.cc, not in sstable.cc. so let's drop the overload
parse() for deletion_time. it's not necessary and more importantly,
confusing.

Refs #15116
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15756
2023-10-18 18:41:56 +03:00
Avi Kivity
f3dc01c85e Merge 'Enlight sstable_directory construction' from Pavel Emelyanov
Currently distributed_loader starts sharded<sstable_directory> with four sharded parameters. That's quite bulky and can be made much shorter.

Closes scylladb/scylladb#15653

* github.com:scylladb/scylladb:
  distributed_loader: Remove explicit sharded<erms>
  distributed_loader: Brush up start_subdir()
  sstable_directory: Add enlightened construction
  table: Add global_table_ptr::as_sharded_parameter()
2023-10-18 16:42:04 +03:00
Anna Stuchlik
274cf7a93a doc:remove upgrade guides for unsupported versions
This commit:
- Removes upgrade guides for versions older than 5.0.
  The oldest one is from version 4.6 to 5.0.
- Adds the redirections for the removed pages.

Closes scylladb/scylladb#15709
2023-10-18 15:12:26 +03:00
Kefu Chai
f69a44bb37 test/object_store: redirect to STDOUT and STDERR
pytest changes the test's sys.stdout and sys.stderr to the
captured fds when it captures the outputs of the test. so we
are not able to get the STDOUT_FILENO and STDERR_FILENO in C
by querying `sys.stdout.fileno()` and `sys.stderr.fileno()`.
their return values are not 1 and 2 anymore, unless pytest
is started with "-s".

so, to ensure that we always redirect the child process's
outputs to the log file. we need to use 1 and 2 for accessing
the well-known fds, which are the ones used by the child
process, when it writes to stdout and stderr.

this change should address the problem that the log file is
always empty, unless "-s" is specified.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15560
2023-10-18 14:54:01 +03:00
Yaron Kaikov
b340bd6d9e release: prepare for 5.5.0-dev 2023-10-18 14:40:06 +03:00
Botond Dénes
f7e269ccb8 Merge 'Progress of compaction executors' from Aleksandra Martyniuk
compaction_read_monitor_generator is an existing mechanism
for monitoring progress of sstables reading during compaction.
In this change information gathered by compaction_read_monitor_generator
is utilized by task manager compaction tasks of the lowest level,
i.e. compaction executors, to calculate task progress.

compaction_read_monitor_generator has a flag, which decides whether
monitored changes will be registered by compaction_backlog_tracker.
This allows us to pass the generator to all compaction readers without
impacting the backlog.

Task executors have access to compaction_read_monitor_generator_wrapper,
which protects the internals of  compaction_read_monitor_generator
and provides only the necessary functionality.

Closes scylladb/scylladb#14878

* github.com:scylladb/scylladb:
  compaction: add get_progress method to compaction_task_impl
  compaction: find total compaction size
  compaction: sstables: monitor validation scrub with compaction_read_generator
  compaction: keep compaction_progress_monitor in compaction_task_executor
  compaction: use read monitor generator for all compactions
  compaction: add compaction_progress_monitor
  compaction: add flag to compaction_read_monitor_generator
2023-10-18 12:19:51 +03:00
Kamil Braun
c1486fee40 Merge 'commitlog: drop truncation_records after replay' from Petr Gusev
This is a follow-up for #15279 and it fixes two problems.

First, we restore flushes on writes for the tables that were switched to the schema commitlog if `SCHEMA_COMMITLOG` feature is not yet enabled. Otherwise durability is not guaranteed.

Second, we address the problem with truncation records, which could refer to the old commitlog if any of the switched tables were truncated in the past. If the node crashes later, and we replay schema commitlog, we may skip some mutations since their `replay_position`s will be smaller than the `replay_position`s stored for the old commitlog in the `truncated` table.

It turned out that this problem exists even if we don't switch commitlogs for tables. If the node was rebooted the segment ids will start from some small number - they use `steady_clock` which is usually bound to boot time. This means that if the node crashed we may skip the mutations because their RPs will be smaller than the last truncation record RP.

To address this problem we delete truncation records as soon as commitlog is replayed. We also include a test which demonstrates the problem.

Fixes #15354

Closes scylladb/scylladb#15532

* github.com:scylladb/scylladb:
  add test_commitlog
  system.truncated: Remove replay_position data from truncated on start
  main.cc: flush only local memtables when replaying schema commitlog
  main.cc: drop redundant supervisor::notify
  system_keyspace: flush if schema commitlog is not available
2023-10-18 11:14:31 +02:00
Gleb Natapov
f80fff3484 gossip: remove unused STATUS_LEAVING gossiper status
The status is no longer used. The function that referenced it was
removed by 5a96751534 and it was unused
back then for awhile already.

Message-Id: <ZS92mcGE9Ke5DfXB@scylladb.com>
2023-10-18 11:13:14 +02:00
Botond Dénes
7f81957437 Merge 'Initialize datadir for system and non-system keyspaces the same way' from Pavel Emelyanov
When populating system keyspace the sstable_directory forgets to create upload/ subdir in the tables' datadir because of the way it's invoked from distributed loader. For non-system keyspaces directories are created in table::init_storage() which is self-contained and just creates the whole layout regardless of what.

This PR makes system keyspace's tables use table::init_storage() as well so that the datadir layout is the same for all on-disk tables.

Test included.

fixes: #15708
closes: scylladb/scylla-manager#3603

Closes scylladb/scylladb#15723

* github.com:scylladb/scylladb:
  test: Add test for datadir/ layout
  sstable_directory: Indentation fix after previous patch
  db,sstables: Move storage init for system keyspace to table creation
2023-10-18 12:12:19 +03:00
David Garcia
51466dcb23 docs: add latest option to aws_images extension
rollback only latest

Closes scylladb/scylladb#15651
2023-10-18 11:43:21 +03:00
Petr Gusev
a0aee54f2c add test_commitlog
Check that commitlog provides durability in case
of a node reboot:
* truncate table T, truncation_record RP=1000;
* clean shutdown node/reboot machine/restart node, now RP=~0
since segment ids count from boot time;
* write some data to T; crash/restart
* check data is retained
2023-10-17 18:16:50 +04:00
Calle Wilund
6fbd210679 system.truncated: Remove replay_position data from truncated on start
Once we've started clean, and all replaying is done, truncation logs
commit log regarding replay positions are invalid. We should exorcise
them as soon as possible. Note that we cannot remove truncation data
completely though, since the time stamps stored are used by things like
batch log to determine if it should use or discard old batch data.
2023-10-17 18:16:48 +04:00
Petr Gusev
dde36b5d9d main.cc: flush only local memtables when replaying schema commitlog
Schema commitlog can be used only on shard 0, so it's redundant
to flush any other memtables.
2023-10-17 18:15:51 +04:00
Petr Gusev
54dd7cf1da main.cc: drop redundant supervisor::notify
Later in the code we have 'replaying schema commit log',
which duplicates this one. Also,
maybe_init_schema_commitlog may skip schema commitlog
initialization if the SCHEMA_COMMITLOG feature is
not yet supported by the cluster, so this notification
can be misleading.
2023-10-17 18:15:49 +04:00
Petr Gusev
c89ead55ff system_keyspace: flush if schema commitlog is not available
In PR #15279 we removed flushes when writing to a number
of tables from the system keyspace. This was made possible
by switching these tables to the schema commitlog.
Schema commitlog is enabled only when the SCHEMA_COMMITLOG
feature is supported by all nodes in the cluster. Before that
these tables will use the regular commitlog, which is not
durable because it uses db::commitlog::sync_mode::PERIODIC. This
means that we may lose data if a node crashes during upgrade
to the version with schema commitlog.

In this commit we fix this problem by restoring flushes
after writes to the tables if the schema commitlog
is not enabled yet.

The patch also contains a test that demonstrates the
problem. We need flush_schema_tables_after_modification
option since otherwise schema changes are not durable
and node fails after restart.
2023-10-17 18:14:27 +04:00
Pavel Emelyanov
d59cd662f8 test: Add test for datadir/ layout
The test checks that

- for non-system keyspace datadir and its staging/ and upload/ subdirs
  are created when the table is created _and_ that the directory is
  re-populated on boot in case it was explicitly removed

- for system non-virtual tables it checks that the same directory layout
  is created on boot

- for system virtual tables it checks that the directory layout doesn't
  exist

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-16 16:26:48 +03:00
Pavel Emelyanov
c3b3e5b107 sstable_directory: Indentation fix after previous patch
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-16 16:26:37 +03:00
Pavel Emelyanov
059d7c795e db,sstables: Move storage init for system keyspace to table creation
User and system keyspaces are created and populated slightly
differently.

System keyspace is created via system_keyspace::make() which eventually
calls calls add_column_family(). Then it's populated via
init_system_keyspace() which calls sstable_directory::prepare() which,
in turn, optionally creates directories in datadir/ or checks the
directory permissions if it exists

User keyspaces are created with the help of
add_column_family_and_make_directory() call which calls the
add_column_family() mentioned above _and_ calls table::init_storage() to
create directories. When it's populated with init_non_system_keyspaces()
it also calls sstable_directory::prepare() which notices that the
directory exists and then checks the permissions.

As a result, sstable_directory::prepare() initializes storage for system
keyspace only and there's a BUG (#15708) that the upload/ subdir is not
created.

This patch makes the directories creation for _all_ keyspaces with the
table::init_storage(). The change only touches system keyspace by moving
the creation of directories from sstable_directory::prepare() into
system_keyspace::make().

Indentation is deliberately left broken.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-16 16:19:25 +03:00
Patryk Jędrzejczak
7810e8d860 table_helper: announce twice in setup_keyspace
We refactor table_helper::setup_keyspace so that it calls
migration_manager::announce at most twice. We achieve it by
announcing all tables at once.

The number of announcements should further be reduced to one, but
it requires a big refactor. The CQL code used in
parse_new_cf_statement assumes the keyspace has already been
created. We cannot have such an assumption if we want to announce
a keyspace and its tables together. However, we shouldn't touch
the CQL code as it would impact user requests, too.

One solution is using schema_builder instead of the CQL statements
to create tables in table_helper.

Another approach is removing table_helper completely. It is used
only for the system_traces keyspace, which Scylla creates
automatically. We could refactor the way Scylla handles this
keyspace and make table_helper unneeded.
2023-10-16 14:59:53 +02:00
Patryk Jędrzejczak
2b4e1e0f9c table_helper: refactor setup_table
In the following commit, we reduce migration_manager::announce
calls in table_helper::setup_keyspace by announcing all tables
together. To do it, we cannot use table_helper::setup_table
anymore, which announces a single table itself. However, the new
code still has to translate CQL statements, so we extract it to the
new parse_new_cf_statement function to avoid duplication.
2023-10-16 14:59:53 +02:00
Patryk Jędrzejczak
fad71029f0 redis: create_keyspace_if_not_exists_impl: fix indentation
Broken in the previous commit.
2023-10-16 14:59:53 +02:00
Patryk Jędrzejczak
a3044d1f46 redis: announce once in create_keyspace_if_not_exists_impl
We refactor create_keyspace_if_not_exists_impl so that it takes at
most one group 0 guard and calls migration_manager::announce at
most once.
2023-10-16 14:59:53 +02:00
Patryk Jędrzejczak
98d067e77d db: system_distributed_keyspace: fix indentation
Broken in the previous commit.
2023-10-16 14:59:53 +02:00
Patryk Jędrzejczak
5ebc0e8617 db: system_distributed_keyspace: announce once in start
We refactor system_distributed_keyspace::start so that it takes at
most one group 0 guard and calls migration_manager::announce at
most once.

We remove a catch expression together with the FIXME from
get_updated_service_levels (add_new_columns_if_missing before the
patch) because we cannot treat the service_levels update
differently anymore.
2023-10-16 14:59:53 +02:00
Patryk Jędrzejczak
449b4c79c2 tablet_allocator: update on_before_create_column_family
After adding the keyspace_metadata parameter to
migration_listener::on_before_create_column_family,
tablet_allocator doesn't need to load it from the database.

This change is necessary before merging migration_manager::announce
calls in the following commit.
2023-10-16 14:59:53 +02:00
Patryk Jędrzejczak
7653059369 migration_listener: add parameter to on_before_create_column_family
After adding the new prepare_new_column_family_announcement that
doesn't assume the existence of a keyspace, we also need to get
rid of the same assumption in all on_before_create_column_family
calls. After all, they may be initiated before creating the
keyspace. However, some listeners require keyspace_metadata, so we
pass it as a new parameter.
2023-10-16 14:59:53 +02:00
Patryk Jędrzejczak
96d9e768c4 alternator: executor: use new prepare_new_column_family_announcement
We can use the new prepare_new_column_family_announcement function
that doesn't assume the existence of the keyspace instead of the
previous work-around.
2023-10-16 14:59:53 +02:00
Patryk Jędrzejczak
fcd092473c alternator: executor: introduce create_keyspace_metadata
We need to store a new keyspace's keyspace_metadata as a local
variable in create_table_on_shard0. In the following commit, we
use it to call the new prepare_new_column_family_announcement
function.
2023-10-16 14:59:53 +02:00
Patryk Jędrzejczak
7e6017d62d migration_manager: add new prepare_new_column_family_announcement
In the following commits, we reduce the number of the
migration_manager::anounce calls by merging some of them in a way
that logically makes sense. Some of these merges are similar --
we announce a new keyspace and its tables together. However,
we cannot use the current prepare_new_column_family_announcement
there because it assumes that the keyspace has already been created
(when it loads the keyspace from the database). Luckily, this
assumption is not necessary as this function only needs
keyspace_metadata. Instead of loading it from the database, we can
pass it as a parameter.
2023-10-16 14:59:53 +02:00
Aleksandra Martyniuk
198119f737 compaction: add get_progress method to compaction_task_impl
compaction_task_impl::get_progress is used by the lowest level
compaction tasks which progress can be taken from
compaction_progress_monitor.
2023-10-12 17:16:05 +02:00
Aleksandra Martyniuk
39e96c6521 compaction: find total compaction size 2023-10-12 17:03:46 +02:00
Aleksandra Martyniuk
7b3e0ab1f2 compaction: sstables: monitor validation scrub with compaction_read_generator
Validation scrub bypasses the usual compaction machinery, though it
still needs to be tracked with compaction_progress_monitor so that
we could reach its progress from compaction task executor.

Track sstable scrub in validate mode with read monitors.
2023-10-12 17:03:46 +02:00
Aleksandra Martyniuk
3553556708 compaction: keep compaction_progress_monitor in compaction_task_executor
Keep compaction_progress_monitor in compaction_task_executor and pass a reference
to it further, so that the compaction progress could be retrieved out of it.
2023-10-12 17:03:46 +02:00
Aleksandra Martyniuk
37da5a0638 compaction: use read monitor generator for all compactions
Compaction read monitor generators are used in all compaction types.
Classes which did not use _monitor_generator so far, create it with
_use_backlog_tracker set to no, not to impact backlog tracker.
2023-10-12 17:03:46 +02:00
Aleksandra Martyniuk
22bf3c03df compaction: add compaction_progress_monitor
In the following patches compaction_read_monitor_generator will be used
to find progress of compaction_task_executor's. To avoid unnecessary life
prolongation and exposing internals of the class out of compaction.cc,
compaction_progress_monitor is created.

Compaction class keeps a reference to the compaction_progress_monitor.
Inheriting classes which actually use compaction_read_monitor_generator,
need to set it with set_generator method.
2023-10-12 17:03:46 +02:00
Aleksandra Martyniuk
b852ad25bf compaction: add flag to compaction_read_monitor_generator
Following patches will use compaction_read_monitor_generator
to track progress of all types of compaction. Some of them should
not be registered in compaction_backlog_tracker.

_use_backlog_tracker flag, which is by default set to true, is
added to compaction_read_monitor_generator and passed to all
compaction_read_monitors created by this generator.
2023-10-12 17:03:46 +02:00
Kefu Chai
e76a02abc5 build: move check for NIX_CC into dynamic_linker_option()
`employ_ld_trickery` is only used by `dynamic_linker_option()`, so
move it into this function.

Refs #15379
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-10-09 11:11:57 +08:00
Kefu Chai
e85fc9f8be build: extract dynamic_linker_option(): out
this change helps to remove more global statements.

Refs #15379
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-10-09 11:11:57 +08:00
Kefu Chai
21b61e8f0a build: move headers into write_build_file()
`headers` is only used in this function, so move it closer to where
it is used.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-10-09 11:11:57 +08:00
Kefu Chai
b3e5c8c348 build: cmake: pass -dynamic-linker to ld
to match the behavior of `configure.py`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-10-09 11:07:13 +08:00
Kefu Chai
ce46f7b91b build: cmake: set CMAKE_EXE_LINKER_FLAGS in mode.common.cmake
so that CMakeLists.txt is less cluttered. as we will append
`--dynamic-linker` option to the LDFLAGS.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-10-09 11:07:13 +08:00
Kefu Chai
1efd0d9a92 test: set use_uuid to true by default in sstables::test_env
for better coverage of uuid-based sstable identifier. since this
option is enabled by default, this also match our tests with the
default behavior of scylladb.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-10-07 18:56:47 +08:00
Kefu Chai
50c8619ed9 test: enable test to set uuid_sstable_identifiers
some of the tests are still relying on the integer-based sstable
identifier, so let's add a method to test_env, so that the tests
relying on this can opt-out. we will change the default setting
of sstables::test_env to use uuid-base sstable identifier in the
next commit. this change does not change the existing behavior.
it just adds a new knob to test_env_config. and let the tests
relying on this to customize the test_env_config to disable
use_uuid.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-10-07 18:56:47 +08:00
Anna Stuchlik
5d3584faa5 doc: add a note about Raft
This commit adds a note to specify
that the information on the Handling
Failures page only refers to clusters
with Raft enabled.
Also, the comment is included to remove
the note in future versions.
2023-10-06 16:04:43 +02:00
Pavel Emelyanov
e485c854b2 distributed_loader: Remove explicit sharded<erms>
The sharded replication map was needed to provide sharded for sstable
directory. Now it gets sharded via table reference and thus the erms
thing becomes unused

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-06 15:57:45 +03:00
Pavel Emelyanov
c2eb1ae543 distributed_loader: Brush up start_subdir()
Drop some local references to class members and line-up arguments to
starting distributed sstable directory. Purely a clean up patch

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-06 15:57:03 +03:00
Pavel Emelyanov
795dcf2ead sstable_directory: Add enlightened construction
The existing constructor is pretty heavyweight for the distributed
loader to use -- it needs to pass it 4 sharded parameters which looks
pretty bulky in the text editor. However, 5 constructor arguments are
obtained directly from the table, so the dist. loader code with global
table pointer at hand can pass _it_ as sharded parameter and let the
sstable directory extract what it needs.

Sad news is that sstable_directory cannot be switched to just use table
reference. Tools code doesn't have table at hand, but needs the
facilities sstable_directory provides

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-06 15:54:51 +03:00
Pavel Emelyanov
e004469827 table: Add global_table_ptr::as_sharded_parameter()
The method returns seastar::sharded_parameter<> for the global table
that evaluates into local table reference

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-06 15:53:57 +03:00
Anna Stuchlik
eb5a9c535a doc: add the quorum requirement to procedures
This commit adds a note to the docs for
cluster management that a quorum is
required to add, remove, or replace a node,
and update the schema.
2023-10-04 13:16:21 +02:00
Anna Stuchlik
bf25b5fe76 doc: add more failure info to Troubleshooting
This commit adds new pages with reference to
Handling Node Failures to Troubleshooting.
The pages are:
- Failure to Add, Remove, or Replace a Node
  (in the Cluster section)
- Failure to Update the Schema
  (in the Data Modeling section)
2023-10-04 12:44:26 +02:00
Anna Stuchlik
8c4f9379d5 doc: move Handling Failures to Troubleshooting
This commit moves the content of the Handling
Failures section on the Raft page to the new
Handling Node Failures page in the Troubleshooting
section.

Background:
When Raft was experimental, the Handling Failures
section was only applicable to clusters
where Raft was explicitly enabled.
Now that Raft is the default, the information
about handling failures is relevant to
all users.
2023-10-04 12:23:33 +02:00
427 changed files with 6931 additions and 9419 deletions

View File

@@ -1,87 +0,0 @@
from github import Github
import argparse
import re
import sys
import os
try:
github_token = os.environ["GITHUB_TOKEN"]
except KeyError:
print("Please set the 'GITHUB_TOKEN' environment variable")
sys.exit(1)
def parser():
parser = argparse.ArgumentParser()
parser.add_argument('--repository', type=str, required=True,
help='Github repository name (e.g., scylladb/scylladb)')
parser.add_argument('--commit_before_merge', type=str, required=True, help='Git commit ID to start labeling from ('
'newest commit).')
parser.add_argument('--commit_after_merge', type=str, required=True,
help='Git commit ID to end labeling at (oldest '
'commit, exclusive).')
parser.add_argument('--update_issue', type=bool, default=False, help='Set True to update issues when backport was '
'done')
parser.add_argument('--ref', type=str, required=True, help='PR target branch')
return parser.parse_args()
def add_comment_and_close_pr(pr, comment):
if pr.state == 'open':
pr.create_issue_comment(comment)
pr.edit(state="closed")
def mark_backport_done(repo, ref_pr_number, branch):
pr = repo.get_pull(int(ref_pr_number))
label_to_remove = f'backport/{branch}'
label_to_add = f'{label_to_remove}-done'
current_labels = [label.name for label in pr.get_labels()]
if label_to_remove in current_labels:
pr.remove_from_labels(label_to_remove)
if label_to_add not in current_labels:
pr.add_to_labels(label_to_add)
def main():
# This script is triggered by a push event to either the master branch or a branch named branch-x.y (where x and y represent version numbers). Based on the pushed branch, the script performs the following actions:
# - When ref branch is `master`, it will add the `promoted-to-master` label, which we need later for the auto backport process
# - When ref branch is `branch-x.y` (which means we backported a patch), it will replace in the original PR the `backport/x.y` label with `backport/x.y-done` and will close the backport PR (Since GitHub close only the one referring to default branch)
args = parser()
pr_pattern = re.compile(r'Closes .*#([0-9]+)')
target_branch = re.search(r'branch-(\d+\.\d+)', args.ref)
g = Github(github_token)
repo = g.get_repo(args.repository, lazy=False)
commits = repo.compare(head=args.commit_after_merge, base=args.commit_before_merge)
processed_prs = set()
# Print commit information
for commit in commits.commits:
print(f'Commit sha is: {commit.sha}')
match = pr_pattern.search(commit.commit.message)
if match:
pr_number = int(match.group(1))
if pr_number in processed_prs:
continue
if target_branch:
pr = repo.get_pull(pr_number)
branch_name = target_branch[1]
refs_pr = re.findall(r'Refs (?:#|https.*?)(\d+)', pr.body)
if refs_pr:
print(f'branch-{target_branch.group(1)}, pr number is: {pr_number}')
# 1. change the backport label of the parent PR to note that
# we've merge the corresponding backport PR
# 2. close the backport PR and leave a comment on it to note
# that it has been merged with a certain git commit,
ref_pr_number = refs_pr[0]
mark_backport_done(repo, ref_pr_number, branch_name)
comment = f'Closed via {commit.sha}'
add_comment_and_close_pr(pr, comment)
else:
print(f'master branch, pr number is: {pr_number}')
pr = repo.get_pull(pr_number)
pr.add_to_labels('promoted-to-master')
processed_prs.add(pr_number)
if __name__ == "__main__":
main()

View File

@@ -1,36 +0,0 @@
name: Check if commits are promoted
on:
push:
branches:
- master
- branch-*.*
env:
DEFAULT_BRANCH: 'master'
jobs:
check-commit:
runs-on: ubuntu-latest
permissions:
pull-requests: write
issues: write
steps:
- name: Dump GitHub context
env:
GITHUB_CONTEXT: ${{ toJson(github) }}
run: echo "$GITHUB_CONTEXT"
- name: Checkout repository
uses: actions/checkout@v4
with:
repository: ${{ github.repository }}
ref: ${{ env.DEFAULT_BRANCH }}
fetch-depth: 0 # Fetch all history for all tags and branches
- name: Install dependencies
run: sudo apt-get install -y python3-github
- name: Run python script
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: python .github/scripts/label_promoted_commits.py --commit_before_merge ${{ github.event.before }} --commit_after_merge ${{ github.event.after }} --repository ${{ github.repository }} --ref ${{ github.ref }}

2
.gitmodules vendored
View File

@@ -1,6 +1,6 @@
[submodule "seastar"]
path = seastar
url = ../scylla-seastar
url = ../seastar
ignore = dirty
[submodule "swagger-ui"]
path = swagger-ui

View File

@@ -78,7 +78,6 @@ target_sources(scylla-main
debug.cc
init.cc
keys.cc
message/messaging_service.cc
multishard_mutation_query.cc
mutation_query.cc
partition_slice_builder.cc
@@ -124,6 +123,7 @@ add_subdirectory(index)
add_subdirectory(interface)
add_subdirectory(lang)
add_subdirectory(locator)
add_subdirectory(message)
add_subdirectory(mutation)
add_subdirectory(mutation_writer)
add_subdirectory(node_ops)
@@ -165,6 +165,7 @@ target_link_libraries(scylla PRIVATE
index
lang
locator
message
mutation
mutation_writer
raft
@@ -193,35 +194,6 @@ target_link_libraries(scylla PRIVATE
seastar
Boost::program_options)
# Force SHA1 build-id generation
set(default_linker_flags "-Wl,--build-id=sha1")
include(CheckLinkerFlag)
set(Scylla_USE_LINKER
""
CACHE
STRING
"Use specified linker instead of the default one")
if(Scylla_USE_LINKER)
set(linkers "${Scylla_USE_LINKER}")
else()
set(linkers "lld" "gold")
endif()
foreach(linker ${linkers})
set(linker_flag "-fuse-ld=${linker}")
check_linker_flag(CXX ${linker_flag} "CXX_LINKER_HAVE_${linker}")
if(CXX_LINKER_HAVE_${linker})
string(APPEND default_linker_flags " ${linker_flag}")
break()
elseif(Scylla_USE_LINKER)
message(FATAL_ERROR "${Scylla_USE_LINKER} is not supported.")
endif()
endforeach()
set(CMAKE_EXE_LINKER_FLAGS "${default_linker_flags}" CACHE INTERNAL "")
# TODO: patch dynamic linker to match configure.py behavior
target_include_directories(scylla PRIVATE
"${CMAKE_CURRENT_SOURCE_DIR}"
"${scylla_gen_build_dir}")

View File

@@ -78,7 +78,7 @@ fi
# Default scylla product/version tags
PRODUCT=scylla
VERSION=5.4.10
VERSION=5.5.0-dev
if test -f version
then

View File

@@ -80,7 +80,7 @@ static sstring_view table_status_to_sstring(table_status tbl_status) {
return "UKNOWN";
}
static future<std::vector<mutation>> create_keyspace(std::string_view keyspace_name, service::storage_proxy& sp, gms::gossiper& gossiper, api::timestamp_type);
static lw_shared_ptr<keyspace_metadata> create_keyspace_metadata(std::string_view keyspace_name, service::storage_proxy& sp, gms::gossiper& gossiper, api::timestamp_type);
static map_type attrs_type() {
static thread_local auto t = map_type_impl::get_instance(utf8_type, bytes_type, true);
@@ -448,7 +448,6 @@ static rjson::value fill_table_description(schema_ptr schema, table_status tbl_s
rjson::add(table_description, "TableName", rjson::from_string(schema->cf_name()));
// FIXME: take the tables creation time, not the current time!
size_t creation_date_seconds = std::chrono::duration_cast<std::chrono::seconds>(gc_clock::now().time_since_epoch()).count();
rjson::add(table_description, "CreationDateTime", rjson::value(creation_date_seconds));
// FIXME: In DynamoDB the CreateTable implementation is asynchronous, and
// the table may be in "Creating" state until creating is finished.
// We don't currently do this in Alternator - instead CreateTable waits
@@ -470,54 +469,58 @@ static rjson::value fill_table_description(schema_ptr schema, table_status tbl_s
rjson::add(table_description["ProvisionedThroughput"], "WriteCapacityUnits", 0);
rjson::add(table_description["ProvisionedThroughput"], "NumberOfDecreasesToday", 0);
std::unordered_map<std::string,std::string> key_attribute_types;
// Add base table's KeySchema and collect types for AttributeDefinitions:
executor::describe_key_schema(table_description, *schema, key_attribute_types);
data_dictionary::table t = proxy.data_dictionary().find_column_family(schema);
if (!t.views().empty()) {
rjson::value gsi_array = rjson::empty_array();
rjson::value lsi_array = rjson::empty_array();
for (const view_ptr& vptr : t.views()) {
rjson::value view_entry = rjson::empty_object();
const sstring& cf_name = vptr->cf_name();
size_t delim_it = cf_name.find(':');
if (delim_it == sstring::npos) {
elogger.error("Invalid internal index table name: {}", cf_name);
continue;
if (tbl_status != table_status::deleting) {
rjson::add(table_description, "CreationDateTime", rjson::value(creation_date_seconds));
std::unordered_map<std::string,std::string> key_attribute_types;
// Add base table's KeySchema and collect types for AttributeDefinitions:
executor::describe_key_schema(table_description, *schema, key_attribute_types);
if (!t.views().empty()) {
rjson::value gsi_array = rjson::empty_array();
rjson::value lsi_array = rjson::empty_array();
for (const view_ptr& vptr : t.views()) {
rjson::value view_entry = rjson::empty_object();
const sstring& cf_name = vptr->cf_name();
size_t delim_it = cf_name.find(':');
if (delim_it == sstring::npos) {
elogger.error("Invalid internal index table name: {}", cf_name);
continue;
}
sstring index_name = cf_name.substr(delim_it + 1);
rjson::add(view_entry, "IndexName", rjson::from_string(index_name));
rjson::add(view_entry, "IndexArn", generate_arn_for_index(*schema, index_name));
// Add indexes's KeySchema and collect types for AttributeDefinitions:
executor::describe_key_schema(view_entry, *vptr, key_attribute_types);
// Add projection type
rjson::value projection = rjson::empty_object();
rjson::add(projection, "ProjectionType", "ALL");
// FIXME: we have to get ProjectionType from the schema when it is added
rjson::add(view_entry, "Projection", std::move(projection));
// Local secondary indexes are marked by an extra '!' sign occurring before the ':' delimiter
rjson::value& index_array = (delim_it > 1 && cf_name[delim_it-1] == '!') ? lsi_array : gsi_array;
rjson::push_back(index_array, std::move(view_entry));
}
if (!lsi_array.Empty()) {
rjson::add(table_description, "LocalSecondaryIndexes", std::move(lsi_array));
}
if (!gsi_array.Empty()) {
rjson::add(table_description, "GlobalSecondaryIndexes", std::move(gsi_array));
}
sstring index_name = cf_name.substr(delim_it + 1);
rjson::add(view_entry, "IndexName", rjson::from_string(index_name));
rjson::add(view_entry, "IndexArn", generate_arn_for_index(*schema, index_name));
// Add indexes's KeySchema and collect types for AttributeDefinitions:
executor::describe_key_schema(view_entry, *vptr, key_attribute_types);
// Add projection type
rjson::value projection = rjson::empty_object();
rjson::add(projection, "ProjectionType", "ALL");
// FIXME: we have to get ProjectionType from the schema when it is added
rjson::add(view_entry, "Projection", std::move(projection));
// Local secondary indexes are marked by an extra '!' sign occurring before the ':' delimiter
rjson::value& index_array = (delim_it > 1 && cf_name[delim_it-1] == '!') ? lsi_array : gsi_array;
rjson::push_back(index_array, std::move(view_entry));
}
if (!lsi_array.Empty()) {
rjson::add(table_description, "LocalSecondaryIndexes", std::move(lsi_array));
}
if (!gsi_array.Empty()) {
rjson::add(table_description, "GlobalSecondaryIndexes", std::move(gsi_array));
// Use map built by describe_key_schema() for base and indexes to produce
// AttributeDefinitions for all key columns:
rjson::value attribute_definitions = rjson::empty_array();
for (auto& type : key_attribute_types) {
rjson::value key = rjson::empty_object();
rjson::add(key, "AttributeName", rjson::from_string(type.first));
rjson::add(key, "AttributeType", rjson::from_string(type.second));
rjson::push_back(attribute_definitions, std::move(key));
}
rjson::add(table_description, "AttributeDefinitions", std::move(attribute_definitions));
}
// Use map built by describe_key_schema() for base and indexes to produce
// AttributeDefinitions for all key columns:
rjson::value attribute_definitions = rjson::empty_array();
for (auto& type : key_attribute_types) {
rjson::value key = rjson::empty_object();
rjson::add(key, "AttributeName", rjson::from_string(type.first));
rjson::add(key, "AttributeType", rjson::from_string(type.second));
rjson::push_back(attribute_definitions, std::move(key));
}
rjson::add(table_description, "AttributeDefinitions", std::move(attribute_definitions));
executor::supplement_table_stream_info(table_description, *schema, proxy);
// FIXME: still missing some response fields (issue #5026)
@@ -1118,8 +1121,9 @@ static future<executor::request_return_type> create_table_on_shard0(tracing::tra
auto group0_guard = co_await mm.start_group0_operation();
auto ts = group0_guard.write_timestamp();
std::vector<mutation> schema_mutations;
auto ksm = create_keyspace_metadata(keyspace_name, sp, gossiper, ts);
try {
schema_mutations = co_await create_keyspace(keyspace_name, sp, gossiper, ts);
schema_mutations = service::prepare_new_keyspace_announcement(sp.local_db(), ksm, ts);
} catch (exceptions::already_exists_exception&) {
if (sp.data_dictionary().has_schema(keyspace_name, table_name)) {
co_return api_error::resource_in_use(format("Table {} already exists", table_name));
@@ -1129,15 +1133,7 @@ static future<executor::request_return_type> create_table_on_shard0(tracing::tra
// This should never happen, the ID is supposed to be unique
co_return api_error::internal(format("Table with ID {} already exists", schema->id()));
}
db::schema_tables::add_table_or_view_to_schema_mutation(schema, ts, true, schema_mutations);
// we must call before_create_column_family callbacks - which allow
// listeners to modify our schema_mutations. For example, CDC may add
// another table (the CDC log table) to the same keyspace.
// Unfortunately the convention is that this callback must be run in
// a Seastar thread.
co_await seastar::async([&] {
mm.get_notifier().before_create_column_family(*schema, schema_mutations, ts);
});
co_await service::prepare_new_column_family_announcement(schema_mutations, sp, *ksm, schema, ts);
for (schema_builder& view_builder : view_builders) {
db::schema_tables::add_table_or_view_to_schema_mutation(
view_ptr(view_builder.build()), ts, true, schema_mutations);
@@ -4461,25 +4457,23 @@ future<executor::request_return_type> executor::describe_continuous_backups(clie
co_return make_jsonable(std::move(response));
}
// Create the keyspace in which we put the alternator table, if it doesn't
// already exist.
// Create the metadata for the keyspace in which we put the alternator
// table if it doesn't already exist.
// Currently, we automatically configure the keyspace based on the number
// of nodes in the cluster: A cluster with 3 or more live nodes, gets RF=3.
// A smaller cluster (presumably, a test only), gets RF=1. The user may
// manually create the keyspace to override this predefined behavior.
static future<std::vector<mutation>> create_keyspace(std::string_view keyspace_name, service::storage_proxy& sp, gms::gossiper& gossiper, api::timestamp_type ts) {
sstring keyspace_name_str(keyspace_name);
static lw_shared_ptr<keyspace_metadata> create_keyspace_metadata(std::string_view keyspace_name, service::storage_proxy& sp, gms::gossiper& gossiper, api::timestamp_type ts) {
int endpoint_count = gossiper.num_endpoints();
int rf = 3;
if (endpoint_count < rf) {
rf = 1;
elogger.warn("Creating keyspace '{}' for Alternator with unsafe RF={} because cluster only has {} nodes.",
keyspace_name_str, rf, endpoint_count);
keyspace_name, rf, endpoint_count);
}
auto opts = get_network_topology_options(sp, gossiper, rf);
auto ksm = keyspace_metadata::new_keyspace(keyspace_name_str, "org.apache.cassandra.locator.NetworkTopologyStrategy", std::move(opts), true);
co_return service::prepare_new_keyspace_announcement(sp.local_db(), ksm, ts);
return keyspace_metadata::new_keyspace(keyspace_name, "org.apache.cassandra.locator.NetworkTopologyStrategy", std::move(opts), true);
}
future<> executor::start() {

View File

@@ -208,10 +208,7 @@ protected:
sstring local_dc = topology.get_datacenter();
std::unordered_set<gms::inet_address> local_dc_nodes = topology.get_datacenter_endpoints().at(local_dc);
for (auto& ip : local_dc_nodes) {
// Note that it's not enough for the node to be is_alive() - a
// node joining the cluster is also "alive" but not responsive to
// requests. We need the node to be in normal state. See #19694.
if (_gossiper.is_normal(ip)) {
if (_gossiper.is_alive(ip)) {
rjson::push_back(results, rjson::from_string(ip.to_sstring()));
}
}

View File

@@ -84,14 +84,6 @@
"type":"string",
"paramType":"path"
},
{
"name":"flush_memtables",
"description":"Controls flushing of memtables before compaction (true by default). Set to \"false\" to skip automatic flushing of memtables before compaction, e.g. when the table is flushed explicitly before invoking the compaction api.",
"required":false,
"allowMultiple":false,
"type":"boolean",
"paramType":"query"
},
{
"name":"split_output",
"description":"true if the output of the major compaction should be split in several sstables",

View File

@@ -1,43 +0,0 @@
{
"apiVersion":"0.0.1",
"swaggerVersion":"1.2",
"basePath":"{{Protocol}}://{{Host}}",
"resourcePath":"/raft",
"produces":[
"application/json"
],
"apis":[
{
"path":"/raft/trigger_snapshot/{group_id}",
"operations":[
{
"method":"POST",
"summary":"Triggers snapshot creation and log truncation for the given Raft group",
"type":"string",
"nickname":"trigger_snapshot",
"produces":[
"application/json"
],
"parameters":[
{
"name":"group_id",
"description":"The ID of the group which should get snapshotted",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"path"
},
{
"name":"timeout",
"description":"Timeout in seconds after which the endpoint returns a failure. If not provided, 60s is used.",
"required":false,
"allowMultiple":false,
"type":"long",
"paramType":"query"
}
]
}
]
}
]
}

View File

@@ -701,30 +701,6 @@
}
]
},
{
"path":"/storage_service/compact",
"operations":[
{
"method":"POST",
"summary":"Forces major compaction in all keyspaces",
"type":"void",
"nickname":"force_compaction",
"produces":[
"application/json"
],
"parameters":[
{
"name":"flush_memtables",
"description":"Controls flushing of memtables before compaction (true by default). Set to \"false\" to skip automatic flushing of memtables before compaction, e.g. when tables were flushed explicitly before invoking the compaction api.",
"required":false,
"allowMultiple":false,
"type":"boolean",
"paramType":"query"
}
]
}
]
},
{
"path":"/storage_service/keyspace_compaction/{keyspace}",
"operations":[
@@ -752,14 +728,6 @@
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"flush_memtables",
"description":"Controls flushing of memtables before compaction (true by default). Set to \"false\" to skip automatic flushing of memtables before compaction, e.g. when tables were flushed explicitly before invoking the compaction api.",
"required":false,
"allowMultiple":false,
"type":"boolean",
"paramType":"query"
}
]
}
@@ -944,21 +912,6 @@
}
]
},
{
"path":"/storage_service/flush",
"operations":[
{
"method":"POST",
"summary":"Flush all memtables in all keyspaces.",
"type":"void",
"nickname":"force_flush",
"produces":[
"application/json"
],
"parameters":[]
}
]
},
{
"path":"/storage_service/keyspace_flush/{keyspace}",
"operations":[

View File

@@ -31,7 +31,6 @@
#include "api/config.hh"
#include "task_manager.hh"
#include "task_manager_test.hh"
#include "raft.hh"
logging::logger apilog("api");
@@ -271,42 +270,38 @@ future<> set_server_done(http_context& ctx) {
});
}
future<> set_server_task_manager(http_context& ctx, lw_shared_ptr<db::config> cfg) {
future<> set_server_task_manager(http_context& ctx, sharded<tasks::task_manager>& tm, lw_shared_ptr<db::config> cfg) {
auto rb = std::make_shared < api_registry_builder > (ctx.api_doc);
return ctx.http_server.set_routes([rb, &ctx, &cfg = *cfg](routes& r) {
return ctx.http_server.set_routes([rb, &ctx, &tm, &cfg = *cfg](routes& r) {
rb->register_function(r, "task_manager",
"The task manager API");
set_task_manager(ctx, r, cfg);
set_task_manager(ctx, r, tm, cfg);
});
}
future<> unset_server_task_manager(http_context& ctx) {
return ctx.http_server.set_routes([&ctx] (routes& r) { unset_task_manager(ctx, r); });
}
#ifndef SCYLLA_BUILD_MODE_RELEASE
future<> set_server_task_manager_test(http_context& ctx) {
future<> set_server_task_manager_test(http_context& ctx, sharded<tasks::task_manager>& tm) {
auto rb = std::make_shared < api_registry_builder > (ctx.api_doc);
return ctx.http_server.set_routes([rb, &ctx](routes& r) mutable {
return ctx.http_server.set_routes([rb, &ctx, &tm](routes& r) mutable {
rb->register_function(r, "task_manager_test",
"The task manager test API");
set_task_manager_test(ctx, r);
set_task_manager_test(ctx, r, tm);
});
}
future<> unset_server_task_manager_test(http_context& ctx) {
return ctx.http_server.set_routes([&ctx] (routes& r) { unset_task_manager_test(ctx, r); });
}
#endif
future<> set_server_raft(http_context& ctx, sharded<service::raft_group_registry>& raft_gr) {
auto rb = std::make_shared<api_registry_builder>(ctx.api_doc);
return ctx.http_server.set_routes([rb, &ctx, &raft_gr] (routes& r) {
rb->register_function(r, "raft", "The Raft API");
set_raft(ctx, r, raft_gr);
});
}
future<> unset_server_raft(http_context& ctx) {
return ctx.http_server.set_routes([&ctx] (routes& r) { unset_raft(ctx, r); });
}
void req_params::process(const request& req) {
// Process mandatory parameters
for (auto& [name, ent] : params) {
@@ -314,7 +309,7 @@ void req_params::process(const request& req) {
continue;
}
try {
ent.value = req.get_path_param(name);
ent.value = req.param[name];
} catch (std::out_of_range&) {
throw httpd::bad_param_exception(fmt::format("Mandatory parameter '{}' was not provided", name));
}

View File

@@ -23,7 +23,6 @@ class load_meter;
class storage_proxy;
class storage_service;
class raft_group0_client;
class raft_group_registry;
} // namespace service
@@ -62,6 +61,10 @@ class gossiper;
namespace auth { class service; }
namespace tasks {
class task_manager;
}
namespace api {
struct http_context {
@@ -71,11 +74,10 @@ struct http_context {
distributed<replica::database>& db;
service::load_meter& lmeter;
const sharded<locator::shared_token_metadata>& shared_token_metadata;
sharded<tasks::task_manager>& tm;
http_context(distributed<replica::database>& _db,
service::load_meter& _lm, const sharded<locator::shared_token_metadata>& _stm, sharded<tasks::task_manager>& _tm)
: db(_db), lmeter(_lm), shared_token_metadata(_stm), tm(_tm) {
service::load_meter& _lm, const sharded<locator::shared_token_metadata>& _stm)
: db(_db), lmeter(_lm), shared_token_metadata(_stm) {
}
const locator::token_metadata& get_token_metadata();
@@ -116,9 +118,9 @@ future<> set_server_gossip_settle(http_context& ctx, sharded<gms::gossiper>& g);
future<> set_server_cache(http_context& ctx);
future<> set_server_compaction_manager(http_context& ctx);
future<> set_server_done(http_context& ctx);
future<> set_server_task_manager(http_context& ctx, lw_shared_ptr<db::config> cfg);
future<> set_server_task_manager_test(http_context& ctx);
future<> set_server_raft(http_context&, sharded<service::raft_group_registry>&);
future<> unset_server_raft(http_context&);
future<> set_server_task_manager(http_context& ctx, sharded<tasks::task_manager>& tm, lw_shared_ptr<db::config> cfg);
future<> unset_server_task_manager(http_context& ctx);
future<> set_server_task_manager_test(http_context& ctx, sharded<tasks::task_manager>& tm);
future<> unset_server_task_manager_test(http_context& ctx);
}

View File

@@ -54,7 +54,7 @@ static const char* str_to_regex(const sstring& v) {
void set_collectd(http_context& ctx, routes& r) {
cd::get_collectd.set(r, [](std::unique_ptr<request> req) {
auto id = ::make_shared<scollectd::type_instance_id>(req->get_path_param("pluginid"),
auto id = ::make_shared<scollectd::type_instance_id>(req->param["pluginid"],
req->get_query_param("instance"), req->get_query_param("type"),
req->get_query_param("type_instance"));
@@ -91,7 +91,7 @@ void set_collectd(http_context& ctx, routes& r) {
});
cd::enable_collectd.set(r, [](std::unique_ptr<request> req) -> future<json::json_return_type> {
std::regex plugin(req->get_path_param("pluginid").c_str());
std::regex plugin(req->param["pluginid"].c_str());
std::regex instance(str_to_regex(req->get_query_param("instance")));
std::regex type(str_to_regex(req->get_query_param("type")));
std::regex type_instance(str_to_regex(req->get_query_param("type_instance")));

View File

@@ -333,7 +333,7 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace
});
cf::get_memtable_columns_count.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return map_reduce_cf(ctx, req->get_path_param("name"), uint64_t{0}, [](replica::column_family& cf) {
return map_reduce_cf(ctx, req->param["name"], uint64_t{0}, [](replica::column_family& cf) {
return boost::accumulate(cf.active_memtables() | boost::adaptors::transformed(std::mem_fn(&replica::memtable::partition_count)), uint64_t(0));
}, std::plus<>());
});
@@ -353,7 +353,7 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace
});
cf::get_memtable_off_heap_size.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return map_reduce_cf(ctx, req->get_path_param("name"), int64_t(0), [](replica::column_family& cf) {
return map_reduce_cf(ctx, req->param["name"], int64_t(0), [](replica::column_family& cf) {
return boost::accumulate(cf.active_memtables() | boost::adaptors::transformed([] (replica::memtable* active_memtable) {
return active_memtable->region().occupancy().total_space();
}), uint64_t(0));
@@ -369,7 +369,7 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace
});
cf::get_memtable_live_data_size.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return map_reduce_cf(ctx, req->get_path_param("name"), int64_t(0), [](replica::column_family& cf) {
return map_reduce_cf(ctx, req->param["name"], int64_t(0), [](replica::column_family& cf) {
return boost::accumulate(cf.active_memtables() | boost::adaptors::transformed([] (replica::memtable* active_memtable) {
return active_memtable->region().occupancy().used_space();
}), uint64_t(0));
@@ -394,7 +394,7 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace
cf::get_cf_all_memtables_off_heap_size.set(r, [&ctx] (std::unique_ptr<http::request> req) {
warn(unimplemented::cause::INDEXES);
return map_reduce_cf(ctx, req->get_path_param("name"), int64_t(0), [](replica::column_family& cf) {
return map_reduce_cf(ctx, req->param["name"], int64_t(0), [](replica::column_family& cf) {
return cf.occupancy().total_space();
}, std::plus<int64_t>());
});
@@ -410,7 +410,7 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace
cf::get_cf_all_memtables_live_data_size.set(r, [&ctx] (std::unique_ptr<http::request> req) {
warn(unimplemented::cause::INDEXES);
return map_reduce_cf(ctx, req->get_path_param("name"), int64_t(0), [](replica::column_family& cf) {
return map_reduce_cf(ctx, req->param["name"], int64_t(0), [](replica::column_family& cf) {
return cf.occupancy().used_space();
}, std::plus<int64_t>());
});
@@ -425,7 +425,7 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace
});
cf::get_memtable_switch_count.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return get_cf_stats(ctx,req->get_path_param("name") ,&replica::column_family_stats::memtable_switch_count);
return get_cf_stats(ctx,req->param["name"] ,&replica::column_family_stats::memtable_switch_count);
});
cf::get_all_memtable_switch_count.set(r, [&ctx] (std::unique_ptr<http::request> req) {
@@ -434,7 +434,7 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace
// FIXME: this refers to partitions, not rows.
cf::get_estimated_row_size_histogram.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return map_reduce_cf(ctx, req->get_path_param("name"), utils::estimated_histogram(0), [](replica::column_family& cf) {
return map_reduce_cf(ctx, req->param["name"], utils::estimated_histogram(0), [](replica::column_family& cf) {
utils::estimated_histogram res(0);
for (auto sstables = cf.get_sstables(); auto& i : *sstables) {
res.merge(i->get_stats_metadata().estimated_partition_size);
@@ -446,7 +446,7 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace
// FIXME: this refers to partitions, not rows.
cf::get_estimated_row_count.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return map_reduce_cf(ctx, req->get_path_param("name"), int64_t(0), [](replica::column_family& cf) {
return map_reduce_cf(ctx, req->param["name"], int64_t(0), [](replica::column_family& cf) {
uint64_t res = 0;
for (auto sstables = cf.get_sstables(); auto& i : *sstables) {
res += i->get_stats_metadata().estimated_partition_size.count();
@@ -457,7 +457,7 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace
});
cf::get_estimated_column_count_histogram.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return map_reduce_cf(ctx, req->get_path_param("name"), utils::estimated_histogram(0), [](replica::column_family& cf) {
return map_reduce_cf(ctx, req->param["name"], utils::estimated_histogram(0), [](replica::column_family& cf) {
utils::estimated_histogram res(0);
for (auto sstables = cf.get_sstables(); auto& i : *sstables) {
res.merge(i->get_stats_metadata().estimated_cells_count);
@@ -474,7 +474,7 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace
});
cf::get_pending_flushes.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return get_cf_stats(ctx,req->get_path_param("name") ,&replica::column_family_stats::pending_flushes);
return get_cf_stats(ctx,req->param["name"] ,&replica::column_family_stats::pending_flushes);
});
cf::get_all_pending_flushes.set(r, [&ctx] (std::unique_ptr<http::request> req) {
@@ -482,7 +482,7 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace
});
cf::get_read.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return get_cf_stats_count(ctx,req->get_path_param("name") ,&replica::column_family_stats::reads);
return get_cf_stats_count(ctx,req->param["name"] ,&replica::column_family_stats::reads);
});
cf::get_all_read.set(r, [&ctx] (std::unique_ptr<http::request> req) {
@@ -490,7 +490,7 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace
});
cf::get_write.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return get_cf_stats_count(ctx, req->get_path_param("name") ,&replica::column_family_stats::writes);
return get_cf_stats_count(ctx, req->param["name"] ,&replica::column_family_stats::writes);
});
cf::get_all_write.set(r, [&ctx] (std::unique_ptr<http::request> req) {
@@ -498,19 +498,19 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace
});
cf::get_read_latency_histogram_depricated.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return get_cf_histogram(ctx, req->get_path_param("name"), &replica::column_family_stats::reads);
return get_cf_histogram(ctx, req->param["name"], &replica::column_family_stats::reads);
});
cf::get_read_latency_histogram.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return get_cf_rate_and_histogram(ctx, req->get_path_param("name"), &replica::column_family_stats::reads);
return get_cf_rate_and_histogram(ctx, req->param["name"], &replica::column_family_stats::reads);
});
cf::get_read_latency.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return get_cf_stats_sum(ctx,req->get_path_param("name") ,&replica::column_family_stats::reads);
return get_cf_stats_sum(ctx,req->param["name"] ,&replica::column_family_stats::reads);
});
cf::get_write_latency.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return get_cf_stats_sum(ctx, req->get_path_param("name") ,&replica::column_family_stats::writes);
return get_cf_stats_sum(ctx, req->param["name"] ,&replica::column_family_stats::writes);
});
cf::get_all_read_latency_histogram_depricated.set(r, [&ctx] (std::unique_ptr<http::request> req) {
@@ -522,11 +522,11 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace
});
cf::get_write_latency_histogram_depricated.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return get_cf_histogram(ctx, req->get_path_param("name"), &replica::column_family_stats::writes);
return get_cf_histogram(ctx, req->param["name"], &replica::column_family_stats::writes);
});
cf::get_write_latency_histogram.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return get_cf_rate_and_histogram(ctx, req->get_path_param("name"), &replica::column_family_stats::writes);
return get_cf_rate_and_histogram(ctx, req->param["name"], &replica::column_family_stats::writes);
});
cf::get_all_write_latency_histogram_depricated.set(r, [&ctx] (std::unique_ptr<http::request> req) {
@@ -538,7 +538,7 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace
});
cf::get_pending_compactions.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return map_reduce_cf(ctx, req->get_path_param("name"), int64_t(0), [](replica::column_family& cf) {
return map_reduce_cf(ctx, req->param["name"], int64_t(0), [](replica::column_family& cf) {
return cf.estimate_pending_compactions();
}, std::plus<int64_t>());
});
@@ -550,7 +550,7 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace
});
cf::get_live_ss_table_count.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return get_cf_stats(ctx, req->get_path_param("name"), &replica::column_family_stats::live_sstable_count);
return get_cf_stats(ctx, req->param["name"], &replica::column_family_stats::live_sstable_count);
});
cf::get_all_live_ss_table_count.set(r, [&ctx] (std::unique_ptr<http::request> req) {
@@ -558,11 +558,11 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace
});
cf::get_unleveled_sstables.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return get_cf_unleveled_sstables(ctx, req->get_path_param("name"));
return get_cf_unleveled_sstables(ctx, req->param["name"]);
});
cf::get_live_disk_space_used.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return sum_sstable(ctx, req->get_path_param("name"), false);
return sum_sstable(ctx, req->param["name"], false);
});
cf::get_all_live_disk_space_used.set(r, [&ctx] (std::unique_ptr<http::request> req) {
@@ -570,7 +570,7 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace
});
cf::get_total_disk_space_used.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return sum_sstable(ctx, req->get_path_param("name"), true);
return sum_sstable(ctx, req->param["name"], true);
});
cf::get_all_total_disk_space_used.set(r, [&ctx] (std::unique_ptr<http::request> req) {
@@ -579,7 +579,7 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace
// FIXME: this refers to partitions, not rows.
cf::get_min_row_size.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return map_reduce_cf(ctx, req->get_path_param("name"), INT64_MAX, min_partition_size, min_int64);
return map_reduce_cf(ctx, req->param["name"], INT64_MAX, min_partition_size, min_int64);
});
// FIXME: this refers to partitions, not rows.
@@ -589,7 +589,7 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace
// FIXME: this refers to partitions, not rows.
cf::get_max_row_size.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return map_reduce_cf(ctx, req->get_path_param("name"), int64_t(0), max_partition_size, max_int64);
return map_reduce_cf(ctx, req->param["name"], int64_t(0), max_partition_size, max_int64);
});
// FIXME: this refers to partitions, not rows.
@@ -600,7 +600,7 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace
// FIXME: this refers to partitions, not rows.
cf::get_mean_row_size.set(r, [&ctx] (std::unique_ptr<http::request> req) {
// Cassandra 3.x mean values are truncated as integrals.
return map_reduce_cf(ctx, req->get_path_param("name"), integral_ratio_holder(), mean_partition_size, std::plus<integral_ratio_holder>());
return map_reduce_cf(ctx, req->param["name"], integral_ratio_holder(), mean_partition_size, std::plus<integral_ratio_holder>());
});
// FIXME: this refers to partitions, not rows.
@@ -610,7 +610,7 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace
});
cf::get_bloom_filter_false_positives.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return map_reduce_cf(ctx, req->get_path_param("name"), uint64_t(0), [] (replica::column_family& cf) {
return map_reduce_cf(ctx, req->param["name"], uint64_t(0), [] (replica::column_family& cf) {
auto sstables = cf.get_sstables();
return std::accumulate(sstables->begin(), sstables->end(), uint64_t(0), [](uint64_t s, auto& sst) {
return s + sst->filter_get_false_positive();
@@ -628,7 +628,7 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace
});
cf::get_recent_bloom_filter_false_positives.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return map_reduce_cf(ctx, req->get_path_param("name"), uint64_t(0), [] (replica::column_family& cf) {
return map_reduce_cf(ctx, req->param["name"], uint64_t(0), [] (replica::column_family& cf) {
auto sstables = cf.get_sstables();
return std::accumulate(sstables->begin(), sstables->end(), uint64_t(0), [](uint64_t s, auto& sst) {
return s + sst->filter_get_recent_false_positive();
@@ -646,7 +646,7 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace
});
cf::get_bloom_filter_false_ratio.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return map_reduce_cf(ctx, req->get_path_param("name"), ratio_holder(), [] (replica::column_family& cf) {
return map_reduce_cf(ctx, req->param["name"], ratio_holder(), [] (replica::column_family& cf) {
return boost::accumulate(*cf.get_sstables() | boost::adaptors::transformed(filter_false_positive_as_ratio_holder), ratio_holder());
}, std::plus<>());
});
@@ -658,7 +658,7 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace
});
cf::get_recent_bloom_filter_false_ratio.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return map_reduce_cf(ctx, req->get_path_param("name"), ratio_holder(), [] (replica::column_family& cf) {
return map_reduce_cf(ctx, req->param["name"], ratio_holder(), [] (replica::column_family& cf) {
return boost::accumulate(*cf.get_sstables() | boost::adaptors::transformed(filter_recent_false_positive_as_ratio_holder), ratio_holder());
}, std::plus<>());
});
@@ -670,7 +670,7 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace
});
cf::get_bloom_filter_disk_space_used.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return map_reduce_cf(ctx, req->get_path_param("name"), uint64_t(0), [] (replica::column_family& cf) {
return map_reduce_cf(ctx, req->param["name"], uint64_t(0), [] (replica::column_family& cf) {
auto sstables = cf.get_sstables();
return std::accumulate(sstables->begin(), sstables->end(), uint64_t(0), [](uint64_t s, auto& sst) {
return s + sst->filter_size();
@@ -688,7 +688,7 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace
});
cf::get_bloom_filter_off_heap_memory_used.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return map_reduce_cf(ctx, req->get_path_param("name"), uint64_t(0), [] (replica::column_family& cf) {
return map_reduce_cf(ctx, req->param["name"], uint64_t(0), [] (replica::column_family& cf) {
auto sstables = cf.get_sstables();
return std::accumulate(sstables->begin(), sstables->end(), uint64_t(0), [](uint64_t s, auto& sst) {
return s + sst->filter_memory_size();
@@ -706,7 +706,7 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace
});
cf::get_index_summary_off_heap_memory_used.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return map_reduce_cf(ctx, req->get_path_param("name"), uint64_t(0), [] (replica::column_family& cf) {
return map_reduce_cf(ctx, req->param["name"], uint64_t(0), [] (replica::column_family& cf) {
auto sstables = cf.get_sstables();
return std::accumulate(sstables->begin(), sstables->end(), uint64_t(0), [](uint64_t s, auto& sst) {
return s + sst->get_summary().memory_footprint();
@@ -729,7 +729,7 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace
// We are missing the off heap memory calculation
// Return 0 is the wrong value. It's a work around
// until the memory calculation will be available
//auto id = get_uuid(req->get_path_param("name"), ctx.db.local());
//auto id = get_uuid(req->param["name"], ctx.db.local());
return make_ready_future<json::json_return_type>(0);
});
@@ -742,7 +742,7 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace
cf::get_speculative_retries.set(r, [] (std::unique_ptr<http::request> req) {
//TBD
unimplemented();
//auto id = get_uuid(req->get_path_param("name"), ctx.db.local());
//auto id = get_uuid(req->param["name"], ctx.db.local());
return make_ready_future<json::json_return_type>(0);
});
@@ -755,7 +755,7 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace
cf::get_key_cache_hit_rate.set(r, [] (std::unique_ptr<http::request> req) {
//TBD
unimplemented();
//auto id = get_uuid(req->get_path_param("name"), ctx.db.local());
//auto id = get_uuid(req->param["name"], ctx.db.local());
return make_ready_future<json::json_return_type>(0);
});
@@ -780,7 +780,7 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace
cf::get_row_cache_hit_out_of_range.set(r, [] (std::unique_ptr<http::request> req) {
//TBD
unimplemented();
//auto id = get_uuid(req->get_path_param("name"), ctx.db.local());
//auto id = get_uuid(req->param["name"], ctx.db.local());
return make_ready_future<json::json_return_type>(0);
});
@@ -791,7 +791,7 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace
});
cf::get_row_cache_hit.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return map_reduce_cf_raw(ctx, req->get_path_param("name"), utils::rate_moving_average(), [](const replica::column_family& cf) {
return map_reduce_cf_raw(ctx, req->param["name"], utils::rate_moving_average(), [](const replica::column_family& cf) {
return cf.get_row_cache().stats().hits.rate();
}, std::plus<utils::rate_moving_average>()).then([](const utils::rate_moving_average& m) {
return make_ready_future<json::json_return_type>(meter_to_json(m));
@@ -807,7 +807,7 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace
});
cf::get_row_cache_miss.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return map_reduce_cf_raw(ctx, req->get_path_param("name"), utils::rate_moving_average(), [](const replica::column_family& cf) {
return map_reduce_cf_raw(ctx, req->param["name"], utils::rate_moving_average(), [](const replica::column_family& cf) {
return cf.get_row_cache().stats().misses.rate();
}, std::plus<utils::rate_moving_average>()).then([](const utils::rate_moving_average& m) {
return make_ready_future<json::json_return_type>(meter_to_json(m));
@@ -824,57 +824,57 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace
});
cf::get_cas_prepare.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return map_reduce_cf_time_histogram(ctx, req->get_path_param("name"), [](const replica::column_family& cf) {
return map_reduce_cf_time_histogram(ctx, req->param["name"], [](const replica::column_family& cf) {
return cf.get_stats().cas_prepare.histogram();
});
});
cf::get_cas_propose.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return map_reduce_cf_time_histogram(ctx, req->get_path_param("name"), [](const replica::column_family& cf) {
return map_reduce_cf_time_histogram(ctx, req->param["name"], [](const replica::column_family& cf) {
return cf.get_stats().cas_accept.histogram();
});
});
cf::get_cas_commit.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return map_reduce_cf_time_histogram(ctx, req->get_path_param("name"), [](const replica::column_family& cf) {
return map_reduce_cf_time_histogram(ctx, req->param["name"], [](const replica::column_family& cf) {
return cf.get_stats().cas_learn.histogram();
});
});
cf::get_sstables_per_read_histogram.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return map_reduce_cf(ctx, req->get_path_param("name"), utils::estimated_histogram(0), [](replica::column_family& cf) {
return map_reduce_cf(ctx, req->param["name"], utils::estimated_histogram(0), [](replica::column_family& cf) {
return cf.get_stats().estimated_sstable_per_read;
},
utils::estimated_histogram_merge, utils_json::estimated_histogram());
});
cf::get_tombstone_scanned_histogram.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return get_cf_histogram(ctx, req->get_path_param("name"), &replica::column_family_stats::tombstone_scanned);
return get_cf_histogram(ctx, req->param["name"], &replica::column_family_stats::tombstone_scanned);
});
cf::get_live_scanned_histogram.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return get_cf_histogram(ctx, req->get_path_param("name"), &replica::column_family_stats::live_scanned);
return get_cf_histogram(ctx, req->param["name"], &replica::column_family_stats::live_scanned);
});
cf::get_col_update_time_delta_histogram.set(r, [] (std::unique_ptr<http::request> req) {
//TBD
unimplemented();
//auto id = get_uuid(req->get_path_param("name"), ctx.db.local());
//auto id = get_uuid(req->param["name"], ctx.db.local());
std::vector<double> res;
return make_ready_future<json::json_return_type>(res);
});
cf::get_auto_compaction.set(r, [&ctx] (const_req req) {
auto uuid = get_uuid(req.get_path_param("name"), ctx.db.local());
auto uuid = get_uuid(req.param["name"], ctx.db.local());
replica::column_family& cf = ctx.db.local().find_column_family(uuid);
return !cf.is_auto_compaction_disabled_by_user();
});
cf::enable_auto_compaction.set(r, [&ctx](std::unique_ptr<http::request> req) {
apilog.info("column_family/enable_auto_compaction: name={}", req->get_path_param("name"));
apilog.info("column_family/enable_auto_compaction: name={}", req->param["name"]);
return ctx.db.invoke_on(0, [&ctx, req = std::move(req)] (replica::database& db) {
auto g = replica::database::autocompaction_toggle_guard(db);
return foreach_column_family(ctx, req->get_path_param("name"), [](replica::column_family &cf) {
return foreach_column_family(ctx, req->param["name"], [](replica::column_family &cf) {
cf.enable_auto_compaction();
}).then([g = std::move(g)] {
return make_ready_future<json::json_return_type>(json_void());
@@ -883,10 +883,10 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace
});
cf::disable_auto_compaction.set(r, [&ctx](std::unique_ptr<http::request> req) {
apilog.info("column_family/disable_auto_compaction: name={}", req->get_path_param("name"));
apilog.info("column_family/disable_auto_compaction: name={}", req->param["name"]);
return ctx.db.invoke_on(0, [&ctx, req = std::move(req)] (replica::database& db) {
auto g = replica::database::autocompaction_toggle_guard(db);
return foreach_column_family(ctx, req->get_path_param("name"), [](replica::column_family &cf) {
return foreach_column_family(ctx, req->param["name"], [](replica::column_family &cf) {
return cf.disable_auto_compaction();
}).then([g = std::move(g)] {
return make_ready_future<json::json_return_type>(json_void());
@@ -895,14 +895,14 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace
});
cf::get_tombstone_gc.set(r, [&ctx] (const_req req) {
auto uuid = get_uuid(req.get_path_param("name"), ctx.db.local());
auto uuid = get_uuid(req.param["name"], ctx.db.local());
replica::table& t = ctx.db.local().find_column_family(uuid);
return t.tombstone_gc_enabled();
});
cf::enable_tombstone_gc.set(r, [&ctx](std::unique_ptr<http::request> req) {
apilog.info("column_family/enable_tombstone_gc: name={}", req->get_path_param("name"));
return foreach_column_family(ctx, req->get_path_param("name"), [](replica::table& t) {
apilog.info("column_family/enable_tombstone_gc: name={}", req->param["name"]);
return foreach_column_family(ctx, req->param["name"], [](replica::table& t) {
t.set_tombstone_gc_enabled(true);
}).then([] {
return make_ready_future<json::json_return_type>(json_void());
@@ -910,8 +910,8 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace
});
cf::disable_tombstone_gc.set(r, [&ctx](std::unique_ptr<http::request> req) {
apilog.info("column_family/disable_tombstone_gc: name={}", req->get_path_param("name"));
return foreach_column_family(ctx, req->get_path_param("name"), [](replica::table& t) {
apilog.info("column_family/disable_tombstone_gc: name={}", req->param["name"]);
return foreach_column_family(ctx, req->param["name"], [](replica::table& t) {
t.set_tombstone_gc_enabled(false);
}).then([] {
return make_ready_future<json::json_return_type>(json_void());
@@ -919,7 +919,7 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace
});
cf::get_built_indexes.set(r, [&ctx, &sys_ks](std::unique_ptr<http::request> req) {
auto ks_cf = parse_fully_qualified_cf_name(req->get_path_param("name"));
auto ks_cf = parse_fully_qualified_cf_name(req->param["name"]);
auto&& ks = std::get<0>(ks_cf);
auto&& cf_name = std::get<1>(ks_cf);
return sys_ks.local().load_view_build_progress().then([ks, cf_name, &ctx](const std::vector<db::system_keyspace_view_build_progress>& vb) mutable {
@@ -957,7 +957,7 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace
});
cf::get_compression_ratio.set(r, [&ctx](std::unique_ptr<http::request> req) {
auto uuid = get_uuid(req->get_path_param("name"), ctx.db.local());
auto uuid = get_uuid(req->param["name"], ctx.db.local());
return ctx.db.map_reduce(sum_ratio<double>(), [uuid](replica::database& db) {
replica::column_family& cf = db.find_column_family(uuid);
@@ -968,21 +968,21 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace
});
cf::get_read_latency_estimated_histogram.set(r, [&ctx](std::unique_ptr<http::request> req) {
return map_reduce_cf_time_histogram(ctx, req->get_path_param("name"), [](const replica::column_family& cf) {
return map_reduce_cf_time_histogram(ctx, req->param["name"], [](const replica::column_family& cf) {
return cf.get_stats().reads.histogram();
});
});
cf::get_write_latency_estimated_histogram.set(r, [&ctx](std::unique_ptr<http::request> req) {
return map_reduce_cf_time_histogram(ctx, req->get_path_param("name"), [](const replica::column_family& cf) {
return map_reduce_cf_time_histogram(ctx, req->param["name"], [](const replica::column_family& cf) {
return cf.get_stats().writes.histogram();
});
});
cf::set_compaction_strategy_class.set(r, [&ctx](std::unique_ptr<http::request> req) {
sstring strategy = req->get_query_param("class_name");
apilog.info("column_family/set_compaction_strategy_class: name={} strategy={}", req->get_path_param("name"), strategy);
return foreach_column_family(ctx, req->get_path_param("name"), [strategy](replica::column_family& cf) {
apilog.info("column_family/set_compaction_strategy_class: name={} strategy={}", req->param["name"], strategy);
return foreach_column_family(ctx, req->param["name"], [strategy](replica::column_family& cf) {
cf.set_compaction_strategy(sstables::compaction_strategy::type(strategy));
}).then([] {
return make_ready_future<json::json_return_type>(json_void());
@@ -990,7 +990,7 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace
});
cf::get_compaction_strategy_class.set(r, [&ctx](const_req req) {
return ctx.db.local().find_column_family(get_uuid(req.get_path_param("name"), ctx.db.local())).get_compaction_strategy().name();
return ctx.db.local().find_column_family(get_uuid(req.param["name"], ctx.db.local())).get_compaction_strategy().name();
});
cf::set_compression_parameters.set(r, [](std::unique_ptr<http::request> req) {
@@ -1006,7 +1006,7 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace
});
cf::get_sstable_count_per_level.set(r, [&ctx](std::unique_ptr<http::request> req) {
return map_reduce_cf_raw(ctx, req->get_path_param("name"), std::vector<uint64_t>(), [](const replica::column_family& cf) {
return map_reduce_cf_raw(ctx, req->param["name"], std::vector<uint64_t>(), [](const replica::column_family& cf) {
return cf.sstable_count_per_level();
}, concat_sstable_count_per_level).then([](const std::vector<uint64_t>& res) {
return make_ready_future<json::json_return_type>(res);
@@ -1015,7 +1015,7 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace
cf::get_sstables_for_key.set(r, [&ctx](std::unique_ptr<http::request> req) {
auto key = req->get_query_param("key");
auto uuid = get_uuid(req->get_path_param("name"), ctx.db.local());
auto uuid = get_uuid(req->param["name"], ctx.db.local());
return ctx.db.map_reduce0([key, uuid] (replica::database& db) -> future<std::unordered_set<sstring>> {
auto sstables = co_await db.find_column_family(uuid).get_sstables_by_partition_key(key);
@@ -1031,7 +1031,7 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace
cf::toppartitions.set(r, [&ctx] (std::unique_ptr<http::request> req) {
auto name = req->get_path_param("name");
auto name = req->param["name"];
auto [ks, cf] = parse_fully_qualified_cf_name(name);
api::req_param<std::chrono::milliseconds, unsigned> duration{*req, "duration", 1000ms};
@@ -1047,19 +1047,12 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace
});
cf::force_major_compaction.set(r, [&ctx](std::unique_ptr<http::request> req) -> future<json::json_return_type> {
auto params = req_params({
std::pair("name", mandatory::yes),
std::pair("flush_memtables", mandatory::no),
std::pair("split_output", mandatory::no),
});
params.process(*req);
if (params.get("split_output")) {
if (req->get_query_param("split_output") != "") {
fail(unimplemented::cause::API);
}
auto [ks, cf] = parse_fully_qualified_cf_name(*params.get("name"));
auto flush = params.get_as<bool>("flush_memtables").value_or(true);
apilog.info("column_family/force_major_compaction: name={} flush={}", req->get_path_param("name"), flush);
apilog.info("column_family/force_major_compaction: name={}", req->param["name"]);
auto [ks, cf] = parse_fully_qualified_cf_name(req->param["name"]);
auto keyspace = validate_keyspace(ctx, ks);
std::vector<table_info> table_infos = {table_info{
.name = cf,
@@ -1067,11 +1060,7 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace
}};
auto& compaction_module = ctx.db.local().get_compaction_manager().get_task_manager_module();
std::optional<major_compaction_task_impl::flush_mode> fmopt;
if (!flush) {
fmopt = major_compaction_task_impl::flush_mode::skip;
}
auto task = co_await compaction_module.make_and_start_task<major_keyspace_compaction_task_impl>({}, std::move(keyspace), tasks::task_id::create_null_id(), ctx.db, std::move(table_infos), fmopt);
auto task = co_await compaction_module.make_and_start_task<major_keyspace_compaction_task_impl>({}, std::move(keyspace), ctx.db, std::move(table_infos));
co_await task->done();
co_return json_void();
});

View File

@@ -7,7 +7,6 @@
*/
#include <seastar/core/coroutine.hh>
#include <seastar/coroutine/exception.hh>
#include "compaction_manager.hh"
#include "compaction/compaction_manager.hh"
@@ -110,7 +109,7 @@ void set_compaction_manager(http_context& ctx, routes& r) {
});
cm::stop_keyspace_compaction.set(r, [&ctx] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {
auto ks_name = validate_keyspace(ctx, req);
auto ks_name = validate_keyspace(ctx, req->param);
auto table_names = parse_tables(ks_name, ctx, req->query_parameters, "tables");
if (table_names.empty()) {
table_names = map_keys(ctx.db.local().find_keyspace(ks_name).metadata().get()->cf_meta_data());
@@ -153,13 +152,10 @@ void set_compaction_manager(http_context& ctx, routes& r) {
});
cm::get_compaction_history.set(r, [&ctx] (std::unique_ptr<http::request> req) {
std::function<future<>(output_stream<char>&&)> f = [&ctx] (output_stream<char>&& out) -> future<> {
auto s = std::move(out);
bool first = true;
std::exception_ptr ex;
try {
co_await s.write("[");
co_await ctx.db.local().get_compaction_manager().get_compaction_history([&s, &first](const db::compaction_history_entry& entry) mutable -> future<> {
std::function<future<>(output_stream<char>&&)> f = [&ctx](output_stream<char>&& s) {
return do_with(output_stream<char>(std::move(s)), true, [&ctx] (output_stream<char>& s, bool& first){
return s.write("[").then([&ctx, &s, &first] {
return ctx.db.local().get_compaction_manager().get_compaction_history([&s, &first](const db::compaction_history_entry& entry) mutable {
cm::history h;
h.id = entry.id.to_sstring();
h.ks = std::move(entry.ks);
@@ -173,21 +169,18 @@ void set_compaction_manager(http_context& ctx, routes& r) {
e.value = it.second;
h.rows_merged.push(std::move(e));
}
if (!first) {
co_await s.write(", ");
}
auto fut = first ? make_ready_future<>() : s.write(", ");
first = false;
co_await formatter::write(s, h);
return fut.then([&s, h = std::move(h)] {
return formatter::write(s, h);
});
}).then([&s] {
return s.write("]").then([&s] {
return s.close();
});
});
co_await s.write("]");
co_await s.flush();
} catch (...) {
ex = std::current_exception();
}
co_await s.close();
if (ex) {
co_await coroutine::return_exception_ptr(std::move(ex));
}
});
});
};
return make_ready_future<json::json_return_type>(std::move(f));
});

View File

@@ -91,7 +91,7 @@ void set_config(std::shared_ptr < api_registry_builder20 > rb, http_context& ctx
});
cs::find_config_id.set(r, [&cfg] (const_req r) {
auto id = r.get_path_param("id");
auto id = r.param["id"];
for (auto&& cfg_ref : cfg.values()) {
auto&& cfg = cfg_ref.get();
if (id == cfg.name()) {

View File

@@ -24,7 +24,7 @@ namespace hf = httpd::error_injection_json;
void set_error_injection(http_context& ctx, routes& r) {
hf::enable_injection.set(r, [](std::unique_ptr<request> req) {
sstring injection = req->get_path_param("injection");
sstring injection = req->param["injection"];
bool one_shot = req->get_query_param("one_shot") == "True";
auto params = req->content;
@@ -56,7 +56,7 @@ void set_error_injection(http_context& ctx, routes& r) {
});
hf::disable_injection.set(r, [](std::unique_ptr<request> req) {
sstring injection = req->get_path_param("injection");
sstring injection = req->param["injection"];
auto& errinj = utils::get_local_injector();
return errinj.disable_on_all(injection).then([] {
@@ -72,7 +72,7 @@ void set_error_injection(http_context& ctx, routes& r) {
});
hf::message_injection.set(r, [](std::unique_ptr<request> req) {
sstring injection = req->get_path_param("injection");
sstring injection = req->param["injection"];
auto& errinj = utils::get_local_injector();
return errinj.receive_message_on_all(injection).then([] {
return make_ready_future<json::json_return_type>(json::json_void());

View File

@@ -18,43 +18,37 @@ namespace fd = httpd::failure_detector_json;
void set_failure_detector(http_context& ctx, routes& r, gms::gossiper& g) {
fd::get_all_endpoint_states.set(r, [&g](std::unique_ptr<request> req) {
return g.container().invoke_on(0, [] (gms::gossiper& g) {
std::vector<fd::endpoint_state> res;
res.reserve(g.num_endpoints());
g.for_each_endpoint_state([&] (const gms::inet_address& addr, const gms::endpoint_state& eps) {
fd::endpoint_state val;
val.addrs = fmt::to_string(addr);
val.is_alive = g.is_alive(addr);
val.generation = eps.get_heart_beat_state().get_generation().value();
val.version = eps.get_heart_beat_state().get_heart_beat_version().value();
val.update_time = eps.get_update_timestamp().time_since_epoch().count();
for (const auto& [as_type, app_state] : eps.get_application_state_map()) {
fd::version_value version_val;
// We return the enum index and not it's name to stay compatible to origin
// method that the state index are static but the name can be changed.
version_val.application_state = static_cast<std::underlying_type<gms::application_state>::type>(as_type);
version_val.value = app_state.value();
version_val.version = app_state.version().value();
val.application_state.push(version_val);
}
res.emplace_back(std::move(val));
});
return make_ready_future<json::json_return_type>(res);
std::vector<fd::endpoint_state> res;
res.reserve(g.num_endpoints());
g.for_each_endpoint_state([&] (const gms::inet_address& addr, const gms::endpoint_state& eps) {
fd::endpoint_state val;
val.addrs = fmt::to_string(addr);
val.is_alive = g.is_alive(addr);
val.generation = eps.get_heart_beat_state().get_generation().value();
val.version = eps.get_heart_beat_state().get_heart_beat_version().value();
val.update_time = eps.get_update_timestamp().time_since_epoch().count();
for (const auto& [as_type, app_state] : eps.get_application_state_map()) {
fd::version_value version_val;
// We return the enum index and not it's name to stay compatible to origin
// method that the state index are static but the name can be changed.
version_val.application_state = static_cast<std::underlying_type<gms::application_state>::type>(as_type);
version_val.value = app_state.value();
version_val.version = app_state.version().value();
val.application_state.push(version_val);
}
res.emplace_back(std::move(val));
});
return make_ready_future<json::json_return_type>(res);
});
fd::get_up_endpoint_count.set(r, [&g](std::unique_ptr<request> req) {
return g.container().invoke_on(0, [] (gms::gossiper& g) {
int res = g.get_up_endpoint_count();
return make_ready_future<json::json_return_type>(res);
});
int res = g.get_up_endpoint_count();
return make_ready_future<json::json_return_type>(res);
});
fd::get_down_endpoint_count.set(r, [&g](std::unique_ptr<request> req) {
return g.container().invoke_on(0, [] (gms::gossiper& g) {
int res = g.get_down_endpoint_count();
return make_ready_future<json::json_return_type>(res);
});
int res = g.get_down_endpoint_count();
return make_ready_future<json::json_return_type>(res);
});
fd::get_phi_convict_threshold.set(r, [] (std::unique_ptr<request> req) {
@@ -62,13 +56,11 @@ void set_failure_detector(http_context& ctx, routes& r, gms::gossiper& g) {
});
fd::get_simple_states.set(r, [&g] (std::unique_ptr<request> req) {
return g.container().invoke_on(0, [] (gms::gossiper& g) {
std::map<sstring, sstring> nodes_status;
g.for_each_endpoint_state([&] (const gms::inet_address& node, const gms::endpoint_state&) {
nodes_status.emplace(node.to_sstring(), g.is_alive(node) ? "UP" : "DOWN");
});
return make_ready_future<json::json_return_type>(map_to_key_value<fd::mapper>(nodes_status));
std::map<sstring, sstring> nodes_status;
g.for_each_endpoint_state([&] (const gms::inet_address& node, const gms::endpoint_state&) {
nodes_status.emplace(node.to_sstring(), g.is_alive(node) ? "UP" : "DOWN");
});
return make_ready_future<json::json_return_type>(map_to_key_value<fd::mapper>(nodes_status));
});
fd::set_phi_convict_threshold.set(r, [](std::unique_ptr<request> req) {
@@ -79,15 +71,13 @@ void set_failure_detector(http_context& ctx, routes& r, gms::gossiper& g) {
});
fd::get_endpoint_state.set(r, [&g] (std::unique_ptr<request> req) {
return g.container().invoke_on(0, [req = std::move(req)] (gms::gossiper& g) {
auto state = g.get_endpoint_state_ptr(gms::inet_address(req->get_path_param("addr")));
if (!state) {
return make_ready_future<json::json_return_type>(format("unknown endpoint {}", req->get_path_param("addr")));
}
std::stringstream ss;
g.append_endpoint_state(ss, *state);
return make_ready_future<json::json_return_type>(sstring(ss.str()));
});
auto state = g.get_endpoint_state_ptr(gms::inet_address(req->param["addr"]));
if (!state) {
return make_ready_future<json::json_return_type>(format("unknown endpoint {}", req->param["addr"]));
}
std::stringstream ss;
g.append_endpoint_state(ss, *state);
return make_ready_future<json::json_return_type>(sstring(ss.str()));
});
fd::get_endpoint_phi_values.set(r, [](std::unique_ptr<request> req) {

View File

@@ -31,21 +31,21 @@ void set_gossiper(http_context& ctx, routes& r, gms::gossiper& g) {
});
httpd::gossiper_json::get_endpoint_downtime.set(r, [&g] (std::unique_ptr<request> req) -> future<json::json_return_type> {
gms::inet_address ep(req->get_path_param("addr"));
gms::inet_address ep(req->param["addr"]);
// synchronize unreachable_members on all shards
co_await g.get_unreachable_members_synchronized();
co_return g.get_endpoint_downtime(ep);
});
httpd::gossiper_json::get_current_generation_number.set(r, [&g] (std::unique_ptr<http::request> req) {
gms::inet_address ep(req->get_path_param("addr"));
gms::inet_address ep(req->param["addr"]);
return g.get_current_generation_number(ep).then([] (gms::generation_type res) {
return make_ready_future<json::json_return_type>(res.value());
});
});
httpd::gossiper_json::get_current_heart_beat_version.set(r, [&g] (std::unique_ptr<http::request> req) {
gms::inet_address ep(req->get_path_param("addr"));
gms::inet_address ep(req->param["addr"]);
return g.get_current_heart_beat_version(ep).then([] (gms::version_type res) {
return make_ready_future<json::json_return_type>(res.value());
});
@@ -53,17 +53,17 @@ void set_gossiper(http_context& ctx, routes& r, gms::gossiper& g) {
httpd::gossiper_json::assassinate_endpoint.set(r, [&g](std::unique_ptr<http::request> req) {
if (req->get_query_param("unsafe") != "True") {
return g.assassinate_endpoint(req->get_path_param("addr")).then([] {
return g.assassinate_endpoint(req->param["addr"]).then([] {
return make_ready_future<json::json_return_type>(json_void());
});
}
return g.unsafe_assassinate_endpoint(req->get_path_param("addr")).then([] {
return g.unsafe_assassinate_endpoint(req->param["addr"]).then([] {
return make_ready_future<json::json_return_type>(json_void());
});
});
httpd::gossiper_json::force_remove_endpoint.set(r, [&g](std::unique_ptr<http::request> req) {
gms::inet_address ep(req->get_path_param("addr"));
gms::inet_address ep(req->param["addr"]);
return g.force_remove_endpoint(ep, gms::null_permit_id).then([] {
return make_ready_future<json::json_return_type>(json_void());
});

View File

@@ -1,70 +0,0 @@
/*
* Copyright (C) 2024-present ScyllaDB
*/
/*
* SPDX-License-Identifier: AGPL-3.0-or-later
*/
#include <seastar/core/coroutine.hh>
#include "api/api.hh"
#include "api/api-doc/raft.json.hh"
#include "service/raft/raft_group_registry.hh"
using namespace seastar::httpd;
extern logging::logger apilog;
namespace api {
namespace r = httpd::raft_json;
using namespace json;
void set_raft(http_context&, httpd::routes& r, sharded<service::raft_group_registry>& raft_gr) {
r::trigger_snapshot.set(r, [&raft_gr] (std::unique_ptr<http::request> req) -> future<json_return_type> {
raft::group_id gid{utils::UUID{req->get_path_param("group_id")}};
auto timeout_dur = std::invoke([timeout_str = req->get_query_param("timeout")] {
if (timeout_str.empty()) {
return std::chrono::seconds{60};
}
auto dur = std::stoll(timeout_str);
if (dur <= 0) {
throw std::runtime_error{"Timeout must be a positive number."};
}
return std::chrono::seconds{dur};
});
std::atomic<bool> found_srv{false};
co_await raft_gr.invoke_on_all([gid, timeout_dur, &found_srv] (service::raft_group_registry& raft_gr) -> future<> {
auto* srv = raft_gr.find_server(gid);
if (!srv) {
co_return;
}
found_srv = true;
abort_on_expiry aoe(lowres_clock::now() + timeout_dur);
apilog.info("Triggering Raft group {} snapshot", gid);
auto result = co_await srv->trigger_snapshot(&aoe.abort_source());
if (result) {
apilog.info("New snapshot for Raft group {} created", gid);
} else {
apilog.info("Could not create new snapshot for Raft group {}, no new entries applied", gid);
}
});
if (!found_srv) {
throw std::runtime_error{fmt::format("Server for group ID {} not found", gid)};
}
co_return json_void{};
});
}
void unset_raft(http_context&, httpd::routes& r) {
r::trigger_snapshot.unset(r);
}
}

View File

@@ -1,18 +0,0 @@
/*
* Copyright (C) 2023-present ScyllaDB
*/
/*
* SPDX-License-Identifier: AGPL-3.0-or-later
*/
#pragma once
#include "api_init.hh"
namespace api {
void set_raft(http_context& ctx, httpd::routes& r, sharded<service::raft_group_registry>& raft_gr);
void unset_raft(http_context& ctx, httpd::routes& r);
}

View File

@@ -58,19 +58,15 @@ namespace ss = httpd::storage_service_json;
namespace sp = httpd::storage_proxy_json;
using namespace json;
sstring validate_keyspace(const http_context& ctx, sstring ks_name) {
sstring validate_keyspace(http_context& ctx, sstring ks_name) {
if (ctx.db.local().has_keyspace(ks_name)) {
return ks_name;
}
throw bad_param_exception(replica::no_such_keyspace(ks_name).what());
}
sstring validate_keyspace(const http_context& ctx, const std::unique_ptr<http::request>& req) {
return validate_keyspace(ctx, req->get_path_param("keyspace"));
}
sstring validate_keyspace(const http_context& ctx, const http::request& req) {
return validate_keyspace(ctx, req.get_path_param("keyspace"));
sstring validate_keyspace(http_context& ctx, const parameters& param) {
return validate_keyspace(ctx, param["keyspace"]);
}
locator::host_id validate_host_id(const sstring& param) {
@@ -175,7 +171,7 @@ using ks_cf_func = std::function<future<json::json_return_type>(http_context&, s
static auto wrap_ks_cf(http_context &ctx, ks_cf_func f) {
return [&ctx, f = std::move(f)](std::unique_ptr<http::request> req) {
auto keyspace = validate_keyspace(ctx, req);
auto keyspace = validate_keyspace(ctx, req->param);
auto table_infos = parse_table_infos(keyspace, ctx, req->query_parameters, "cf");
return f(ctx, std::move(req), std::move(keyspace), std::move(table_infos));
};
@@ -254,21 +250,17 @@ future<json::json_return_type> set_tables_tombstone_gc(http_context& ctx, const
}
void set_transport_controller(http_context& ctx, routes& r, cql_transport::controller& ctl) {
ss::start_native_transport.set(r, [&ctx, &ctl](std::unique_ptr<http::request> req) {
ss::start_native_transport.set(r, [&ctl](std::unique_ptr<http::request> req) {
return smp::submit_to(0, [&] {
return with_scheduling_group(ctx.db.local().get_statement_scheduling_group(), [&ctl] {
return ctl.start_server();
});
return ctl.start_server();
}).then([] {
return make_ready_future<json::json_return_type>(json_void());
});
});
ss::stop_native_transport.set(r, [&ctx, &ctl](std::unique_ptr<http::request> req) {
ss::stop_native_transport.set(r, [&ctl](std::unique_ptr<http::request> req) {
return smp::submit_to(0, [&] {
return with_scheduling_group(ctx.db.local().get_statement_scheduling_group(), [&ctl] {
return ctl.request_stop_server();
});
return ctl.request_stop_server();
}).then([] {
return make_ready_future<json::json_return_type>(json_void());
});
@@ -290,21 +282,17 @@ void unset_transport_controller(http_context& ctx, routes& r) {
}
void set_rpc_controller(http_context& ctx, routes& r, thrift_controller& ctl) {
ss::stop_rpc_server.set(r, [&ctx, &ctl](std::unique_ptr<http::request> req) {
ss::stop_rpc_server.set(r, [&ctl](std::unique_ptr<http::request> req) {
return smp::submit_to(0, [&] {
return with_scheduling_group(ctx.db.local().get_statement_scheduling_group(), [&ctl] {
return ctl.request_stop_server();
});
return ctl.request_stop_server();
}).then([] {
return make_ready_future<json::json_return_type>(json_void());
});
});
ss::start_rpc_server.set(r, [&ctx, &ctl](std::unique_ptr<http::request> req) {
ss::start_rpc_server.set(r, [&ctl](std::unique_ptr<http::request> req) {
return smp::submit_to(0, [&] {
return with_scheduling_group(ctx.db.local().get_statement_scheduling_group(), [&ctl] {
return ctl.start_server();
});
return ctl.start_server();
}).then([] {
return make_ready_future<json::json_return_type>(json_void());
});
@@ -342,7 +330,7 @@ void set_repair(http_context& ctx, routes& r, sharded<repair_service>& repair) {
// returns immediately, not waiting for the repair to finish. The user
// then has other mechanisms to track the ongoing repair's progress,
// or stop it.
return repair_start(repair, validate_keyspace(ctx, req),
return repair_start(repair, validate_keyspace(ctx, req->param),
options_map).then([] (int i) {
return make_ready_future<json::json_return_type>(i);
});
@@ -425,7 +413,7 @@ void unset_repair(http_context& ctx, routes& r) {
void set_sstables_loader(http_context& ctx, routes& r, sharded<sstables_loader>& sst_loader) {
ss::load_new_ss_tables.set(r, [&ctx, &sst_loader](std::unique_ptr<http::request> req) {
auto ks = validate_keyspace(ctx, req);
auto ks = validate_keyspace(ctx, req->param);
auto cf = req->get_query_param("cf");
auto stream = req->get_query_param("load_and_stream");
auto primary_replica = req->get_query_param("primary_replica_only");
@@ -456,8 +444,8 @@ void unset_sstables_loader(http_context& ctx, routes& r) {
void set_view_builder(http_context& ctx, routes& r, sharded<db::view::view_builder>& vb) {
ss::view_build_statuses.set(r, [&ctx, &vb] (std::unique_ptr<http::request> req) {
auto keyspace = validate_keyspace(ctx, req);
auto view = req->get_path_param("view");
auto keyspace = validate_keyspace(ctx, req->param);
auto view = req->param["view"];
return vb.local().view_build_statuses(std::move(keyspace), std::move(view)).then([] (std::unordered_map<sstring, sstring> status) {
std::vector<storage_service_json::mapper> res;
return make_ready_future<json::json_return_type>(map_to_key_value(std::move(status), res));
@@ -594,7 +582,7 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_
});
ss::get_range_to_endpoint_map.set(r, [&ctx, &ss](std::unique_ptr<http::request> req) -> future<json::json_return_type> {
auto keyspace = validate_keyspace(ctx, req);
auto keyspace = validate_keyspace(ctx, req->param);
std::vector<ss::maplist_mapper> res;
co_return stream_range_as_array(co_await ss.local().get_range_to_address_map(keyspace),
[](const std::pair<dht::token_range, inet_address_vector_replica_set>& entry){
@@ -619,7 +607,7 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_
ss::get_pending_range_to_endpoint_map.set(r, [&ctx](std::unique_ptr<http::request> req) {
//TBD
unimplemented();
auto keyspace = validate_keyspace(ctx, req);
auto keyspace = validate_keyspace(ctx, req->param);
std::vector<ss::maplist_mapper> res;
return make_ready_future<json::json_return_type>(res);
});
@@ -635,7 +623,7 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_
});
ss::describe_ring.set(r, [&ctx, &ss](std::unique_ptr<http::request> req) {
return describe_ring_as_json(ss, validate_keyspace(ctx, req));
return describe_ring_as_json(ss, validate_keyspace(ctx, req->param));
});
ss::get_host_id_map.set(r, [&ss](const_req req) {
@@ -668,7 +656,7 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_
});
ss::get_natural_endpoints.set(r, [&ctx, &ss](const_req req) {
auto keyspace = validate_keyspace(ctx, req);
auto keyspace = validate_keyspace(ctx, req.param);
return container_to_vec(ss.local().get_natural_endpoints(keyspace, req.get_query_param("cf"),
req.get_query_param("key")));
});
@@ -681,50 +669,14 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_
});
});
ss::force_compaction.set(r, [&ctx](std::unique_ptr<http::request> req) -> future<json::json_return_type> {
auto& db = ctx.db;
auto params = req_params({
std::pair("flush_memtables", mandatory::no),
});
params.process(*req);
auto flush = params.get_as<bool>("flush_memtables").value_or(true);
apilog.info("force_compaction: flush={}", flush);
auto& compaction_module = db.local().get_compaction_manager().get_task_manager_module();
std::optional<major_compaction_task_impl::flush_mode> fmopt;
if (!flush) {
fmopt = major_compaction_task_impl::flush_mode::skip;
}
auto task = co_await compaction_module.make_and_start_task<global_major_compaction_task_impl>({}, db, fmopt);
try {
co_await task->done();
} catch (...) {
apilog.error("force_compaction failed: {}", std::current_exception());
throw;
}
co_return json_void();
});
ss::force_keyspace_compaction.set(r, [&ctx](std::unique_ptr<http::request> req) -> future<json::json_return_type> {
auto& db = ctx.db;
auto params = req_params({
std::pair("keyspace", mandatory::yes),
std::pair("cf", mandatory::no),
std::pair("flush_memtables", mandatory::no),
});
params.process(*req);
auto keyspace = validate_keyspace(ctx, *params.get("keyspace"));
auto table_infos = parse_table_infos(keyspace, ctx, params.get("cf").value_or(""));
auto flush = params.get_as<bool>("flush_memtables").value_or(true);
apilog.debug("force_keyspace_compaction: keyspace={} tables={}, flush={}", keyspace, table_infos, flush);
auto keyspace = validate_keyspace(ctx, req->param);
auto table_infos = parse_table_infos(keyspace, ctx, req->query_parameters, "cf");
apilog.debug("force_keyspace_compaction: keyspace={} tables={}", keyspace, table_infos);
auto& compaction_module = db.local().get_compaction_manager().get_task_manager_module();
std::optional<major_compaction_task_impl::flush_mode> fmopt;
if (!flush) {
fmopt = major_compaction_task_impl::flush_mode::skip;
}
auto task = co_await compaction_module.make_and_start_task<major_keyspace_compaction_task_impl>({}, std::move(keyspace), tasks::task_id::create_null_id(), db, table_infos, fmopt);
auto task = co_await compaction_module.make_and_start_task<major_keyspace_compaction_task_impl>({}, std::move(keyspace), db, table_infos);
try {
co_await task->done();
} catch (...) {
@@ -737,7 +689,7 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_
ss::force_keyspace_cleanup.set(r, [&ctx, &ss](std::unique_ptr<http::request> req) -> future<json::json_return_type> {
auto& db = ctx.db;
auto keyspace = validate_keyspace(ctx, req);
auto keyspace = validate_keyspace(ctx, req->param);
auto table_infos = parse_table_infos(keyspace, ctx, req->query_parameters, "cf");
apilog.info("force_keyspace_cleanup: keyspace={} tables={}", keyspace, table_infos);
if (!co_await ss.local().is_cleanup_allowed(keyspace)) {
@@ -791,16 +743,8 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_
co_return json::json_return_type(0);
}));
ss::force_flush.set(r, [&ctx](std::unique_ptr<http::request> req) -> future<json::json_return_type> {
apilog.info("flush all tables");
co_await ctx.db.invoke_on_all([] (replica::database& db) {
return db.flush_all_tables();
});
co_return json_void();
});
ss::force_keyspace_flush.set(r, [&ctx](std::unique_ptr<http::request> req) -> future<json::json_return_type> {
auto keyspace = validate_keyspace(ctx, req);
auto keyspace = validate_keyspace(ctx, req->param);
auto column_families = parse_tables(keyspace, ctx, req->query_parameters, "cf");
apilog.info("perform_keyspace_flush: keyspace={} tables={}", keyspace, column_families);
auto& db = ctx.db;
@@ -909,7 +853,7 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_
ss::truncate.set(r, [&ctx](std::unique_ptr<http::request> req) {
//TBD
unimplemented();
auto keyspace = validate_keyspace(ctx, req);
auto keyspace = validate_keyspace(ctx, req->param);
auto column_family = req->get_query_param("cf");
return make_ready_future<json::json_return_type>(json_void());
});
@@ -1043,14 +987,14 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_
ss::bulk_load.set(r, [](std::unique_ptr<http::request> req) {
//TBD
unimplemented();
auto path = req->get_path_param("path");
auto path = req->param["path"];
return make_ready_future<json::json_return_type>(json_void());
});
ss::bulk_load_async.set(r, [](std::unique_ptr<http::request> req) {
//TBD
unimplemented();
auto path = req->get_path_param("path");
auto path = req->param["path"];
return make_ready_future<json::json_return_type>(json_void());
});
@@ -1138,7 +1082,7 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_
});
ss::enable_auto_compaction.set(r, [&ctx](std::unique_ptr<http::request> req) {
auto keyspace = validate_keyspace(ctx, req);
auto keyspace = validate_keyspace(ctx, req->param);
auto tables = parse_tables(keyspace, ctx, req->query_parameters, "cf");
apilog.info("enable_auto_compaction: keyspace={} tables={}", keyspace, tables);
@@ -1146,7 +1090,7 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_
});
ss::disable_auto_compaction.set(r, [&ctx](std::unique_ptr<http::request> req) {
auto keyspace = validate_keyspace(ctx, req);
auto keyspace = validate_keyspace(ctx, req->param);
auto tables = parse_tables(keyspace, ctx, req->query_parameters, "cf");
apilog.info("disable_auto_compaction: keyspace={} tables={}", keyspace, tables);
@@ -1154,7 +1098,7 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_
});
ss::enable_tombstone_gc.set(r, [&ctx](std::unique_ptr<http::request> req) {
auto keyspace = validate_keyspace(ctx, req);
auto keyspace = validate_keyspace(ctx, req->param);
auto tables = parse_tables(keyspace, ctx, req->query_parameters, "cf");
apilog.info("enable_tombstone_gc: keyspace={} tables={}", keyspace, tables);
@@ -1162,7 +1106,7 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_
});
ss::disable_tombstone_gc.set(r, [&ctx](std::unique_ptr<http::request> req) {
auto keyspace = validate_keyspace(ctx, req);
auto keyspace = validate_keyspace(ctx, req->param);
auto tables = parse_tables(keyspace, ctx, req->query_parameters, "cf");
apilog.info("disable_tombstone_gc: keyspace={} tables={}", keyspace, tables);
@@ -1258,7 +1202,7 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_
});
ss::get_effective_ownership.set(r, [&ctx, &ss] (std::unique_ptr<http::request> req) {
auto keyspace_name = req->get_path_param("keyspace") == "null" ? "" : validate_keyspace(ctx, req);
auto keyspace_name = req->param["keyspace"] == "null" ? "" : validate_keyspace(ctx, req->param);
return ss.local().effective_ownership(keyspace_name).then([] (auto&& ownership) {
std::vector<storage_service_json::mapper> res;
return make_ready_future<json::json_return_type>(map_to_key_value(ownership, res));
@@ -1443,12 +1387,10 @@ void unset_storage_service(http_context& ctx, routes& r) {
ss::get_current_generation_number.unset(r);
ss::get_natural_endpoints.unset(r);
ss::cdc_streams_check_and_repair.unset(r);
ss::force_compaction.unset(r);
ss::force_keyspace_compaction.unset(r);
ss::force_keyspace_cleanup.unset(r);
ss::perform_keyspace_offstrategy_compaction.unset(r);
ss::upgrade_sstables.unset(r);
ss::force_flush.unset(r);
ss::force_keyspace_flush.unset(r);
ss::decommission.unset(r);
ss::move.unset(r);
@@ -1546,10 +1488,8 @@ void set_snapshot(http_context& ctx, routes& r, sharded<db::snapshot_ctl>& snap_
});
}).then([&s] {
return s.write("]").then([&s] {
return s.flush();
return s.close();
});
}).finally([&s] {
return s.close();
});
});
};

View File

@@ -37,11 +37,11 @@ namespace api {
// verify that the keyspace is found, otherwise a bad_param_exception exception is thrown
// containing the description of the respective keyspace error.
sstring validate_keyspace(const http_context& ctx, sstring ks_name);
sstring validate_keyspace(http_context& ctx, sstring ks_name);
// verify that the keyspace parameter is found, otherwise a bad_param_exception exception is thrown
// containing the description of the respective keyspace error.
sstring validate_keyspace(const http_context& ctx, const std::unique_ptr<http::request>& req);
sstring validate_keyspace(http_context& ctx, const httpd::parameters& param);
// splits a request parameter assumed to hold a comma-separated list of table names
// verify that the tables are found, otherwise a bad_param_exception exception is thrown

View File

@@ -106,7 +106,7 @@ void set_stream_manager(http_context& ctx, routes& r, sharded<streaming::stream_
});
hs::get_total_incoming_bytes.set(r, [&sm](std::unique_ptr<request> req) {
gms::inet_address peer(req->get_path_param("peer"));
gms::inet_address peer(req->param["peer"]);
return sm.map_reduce0([peer](streaming::stream_manager& sm) {
return sm.get_progress_on_all_shards(peer).then([] (auto sbytes) {
return sbytes.bytes_received;
@@ -127,7 +127,7 @@ void set_stream_manager(http_context& ctx, routes& r, sharded<streaming::stream_
});
hs::get_total_outgoing_bytes.set(r, [&sm](std::unique_ptr<request> req) {
gms::inet_address peer(req->get_path_param("peer"));
gms::inet_address peer(req->param["peer"]);
return sm.map_reduce0([peer] (streaming::stream_manager& sm) {
return sm.get_progress_on_all_shards(peer).then([] (auto sbytes) {
return sbytes.bytes_sent;

View File

@@ -119,9 +119,9 @@ void set_system(http_context& ctx, routes& r) {
hs::get_logger_level.set(r, [](const_req req) {
try {
return logging::level_name(logging::logger_registry().get_logger_level(req.get_path_param("name")));
return logging::level_name(logging::logger_registry().get_logger_level(req.param["name"]));
} catch (std::out_of_range& e) {
throw bad_param_exception("Unknown logger name " + req.get_path_param("name"));
throw bad_param_exception("Unknown logger name " + req.param["name"]);
}
// just to keep the compiler happy
return sstring();
@@ -130,9 +130,9 @@ void set_system(http_context& ctx, routes& r) {
hs::set_logger_level.set(r, [](const_req req) {
try {
logging::log_level level = boost::lexical_cast<logging::log_level>(std::string(req.get_query_param("level")));
logging::logger_registry().set_logger_level(req.get_path_param("name"), level);
logging::logger_registry().set_logger_level(req.param["name"], level);
} catch (std::out_of_range& e) {
throw bad_param_exception("Unknown logger name " + req.get_path_param("name"));
throw bad_param_exception("Unknown logger name " + req.param["name"]);
} catch (boost::bad_lexical_cast& e) {
throw bad_param_exception("Unknown logging level " + req.get_query_param("level"));
}

View File

@@ -7,7 +7,6 @@
*/
#include <seastar/core/coroutine.hh>
#include <seastar/coroutine/exception.hh>
#include "task_manager.hh"
#include "api/api-doc/task_manager.json.hh"
@@ -112,20 +111,20 @@ future<full_task_status> retrieve_status(const tasks::task_manager::foreign_task
co_return s;
}
void set_task_manager(http_context& ctx, routes& r, db::config& cfg) {
tm::get_modules.set(r, [&ctx] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {
std::vector<std::string> v = boost::copy_range<std::vector<std::string>>(ctx.tm.local().get_modules() | boost::adaptors::map_keys);
void set_task_manager(http_context& ctx, routes& r, sharded<tasks::task_manager>& tm, db::config& cfg) {
tm::get_modules.set(r, [&tm] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {
std::vector<std::string> v = boost::copy_range<std::vector<std::string>>(tm.local().get_modules() | boost::adaptors::map_keys);
co_return v;
});
tm::get_tasks.set(r, [&ctx] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {
tm::get_tasks.set(r, [&tm] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {
using chunked_stats = utils::chunked_vector<task_stats>;
auto internal = tasks::is_internal{req_param<bool>(*req, "internal", false)};
std::vector<chunked_stats> res = co_await ctx.tm.map([&req, internal] (tasks::task_manager& tm) {
std::vector<chunked_stats> res = co_await tm.map([&req, internal] (tasks::task_manager& tm) {
chunked_stats local_res;
tasks::task_manager::module_ptr module;
try {
module = tm.find_module(req->get_path_param("module"));
module = tm.find_module(req->param["module"]);
} catch (...) {
throw bad_param_exception(fmt::format("{}", std::current_exception()));
}
@@ -140,37 +139,28 @@ void set_task_manager(http_context& ctx, routes& r, db::config& cfg) {
std::function<future<>(output_stream<char>&&)> f = [r = std::move(res)] (output_stream<char>&& os) -> future<> {
auto s = std::move(os);
std::exception_ptr ex;
try {
auto res = std::move(r);
co_await s.write("[");
std::string delim = "";
for (auto& v: res) {
for (auto& stats: v) {
co_await s.write(std::exchange(delim, ", "));
tm::task_stats ts;
ts = stats;
co_await formatter::write(s, ts);
}
auto res = std::move(r);
co_await s.write("[");
std::string delim = "";
for (auto& v: res) {
for (auto& stats: v) {
co_await s.write(std::exchange(delim, ", "));
tm::task_stats ts;
ts = stats;
co_await formatter::write(s, ts);
}
co_await s.write("]");
co_await s.flush();
} catch (...) {
ex = std::current_exception();
}
co_await s.write("]");
co_await s.close();
if (ex) {
co_await coroutine::return_exception_ptr(std::move(ex));
}
};
co_return std::move(f);
});
tm::get_task_status.set(r, [&ctx] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {
auto id = tasks::task_id{utils::UUID{req->get_path_param("task_id")}};
tm::get_task_status.set(r, [&tm] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {
auto id = tasks::task_id{utils::UUID{req->param["task_id"]}};
tasks::task_manager::foreign_task_ptr task;
try {
task = co_await tasks::task_manager::invoke_on_task(ctx.tm, id, std::function([] (tasks::task_manager::task_ptr task) -> future<tasks::task_manager::foreign_task_ptr> {
task = co_await tasks::task_manager::invoke_on_task(tm, id, std::function([] (tasks::task_manager::task_ptr task) -> future<tasks::task_manager::foreign_task_ptr> {
if (task->is_complete()) {
task->unregister_task();
}
@@ -183,10 +173,10 @@ void set_task_manager(http_context& ctx, routes& r, db::config& cfg) {
co_return make_status(s);
});
tm::abort_task.set(r, [&ctx] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {
auto id = tasks::task_id{utils::UUID{req->get_path_param("task_id")}};
tm::abort_task.set(r, [&tm] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {
auto id = tasks::task_id{utils::UUID{req->param["task_id"]}};
try {
co_await tasks::task_manager::invoke_on_task(ctx.tm, id, [] (tasks::task_manager::task_ptr task) -> future<> {
co_await tasks::task_manager::invoke_on_task(tm, id, [] (tasks::task_manager::task_ptr task) -> future<> {
if (!task->is_abortable()) {
co_await coroutine::return_exception(std::runtime_error("Requested task cannot be aborted"));
}
@@ -198,11 +188,11 @@ void set_task_manager(http_context& ctx, routes& r, db::config& cfg) {
co_return json_void();
});
tm::wait_task.set(r, [&ctx] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {
auto id = tasks::task_id{utils::UUID{req->get_path_param("task_id")}};
tm::wait_task.set(r, [&tm] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {
auto id = tasks::task_id{utils::UUID{req->param["task_id"]}};
tasks::task_manager::foreign_task_ptr task;
try {
task = co_await tasks::task_manager::invoke_on_task(ctx.tm, id, std::function([] (tasks::task_manager::task_ptr task) {
task = co_await tasks::task_manager::invoke_on_task(tm, id, std::function([] (tasks::task_manager::task_ptr task) {
return task->done().then_wrapped([task] (auto f) {
task->unregister_task();
// done() is called only because we want the task to be complete before getting its status.
@@ -218,16 +208,16 @@ void set_task_manager(http_context& ctx, routes& r, db::config& cfg) {
co_return make_status(s);
});
tm::get_task_status_recursively.set(r, [&ctx] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {
auto& _ctx = ctx;
auto id = tasks::task_id{utils::UUID{req->get_path_param("task_id")}};
tm::get_task_status_recursively.set(r, [&_tm = tm] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {
auto& tm = _tm;
auto id = tasks::task_id{utils::UUID{req->param["task_id"]}};
std::queue<tasks::task_manager::foreign_task_ptr> q;
utils::chunked_vector<full_task_status> res;
tasks::task_manager::foreign_task_ptr task;
try {
// Get requested task.
task = co_await tasks::task_manager::invoke_on_task(_ctx.tm, id, std::function([] (tasks::task_manager::task_ptr task) -> future<tasks::task_manager::foreign_task_ptr> {
task = co_await tasks::task_manager::invoke_on_task(tm, id, std::function([] (tasks::task_manager::task_ptr task) -> future<tasks::task_manager::foreign_task_ptr> {
if (task->is_complete()) {
task->unregister_task();
}
@@ -242,8 +232,8 @@ void set_task_manager(http_context& ctx, routes& r, db::config& cfg) {
while (!q.empty()) {
auto& current = q.front();
res.push_back(co_await retrieve_status(current));
for (auto& child: current->get_children()) {
q.push(co_await child.copy());
for (size_t i = 0; i < current->get_children().size(); ++i) {
q.push(co_await current->get_children()[i].copy());
}
q.pop();
}
@@ -274,4 +264,14 @@ void set_task_manager(http_context& ctx, routes& r, db::config& cfg) {
});
}
void unset_task_manager(http_context& ctx, routes& r) {
tm::get_modules.unset(r);
tm::get_tasks.unset(r);
tm::get_task_status.unset(r);
tm::abort_task.unset(r);
tm::wait_task.unset(r);
tm::get_task_status_recursively.unset(r);
tm::get_and_update_ttl.unset(r);
}
}

View File

@@ -8,11 +8,17 @@
#pragma once
#include <seastar/core/sharded.hh>
#include "api.hh"
#include "db/config.hh"
namespace tasks {
class task_manager;
}
namespace api {
void set_task_manager(http_context& ctx, httpd::routes& r, db::config& cfg);
void set_task_manager(http_context& ctx, httpd::routes& r, sharded<tasks::task_manager>& tm, db::config& cfg);
void unset_task_manager(http_context& ctx, httpd::routes& r);
}

View File

@@ -20,17 +20,17 @@ namespace tmt = httpd::task_manager_test_json;
using namespace json;
using namespace seastar::httpd;
void set_task_manager_test(http_context& ctx, routes& r) {
tmt::register_test_module.set(r, [&ctx] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {
co_await ctx.tm.invoke_on_all([] (tasks::task_manager& tm) {
void set_task_manager_test(http_context& ctx, routes& r, sharded<tasks::task_manager>& tm) {
tmt::register_test_module.set(r, [&tm] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {
co_await tm.invoke_on_all([] (tasks::task_manager& tm) {
auto m = make_shared<tasks::test_module>(tm);
tm.register_module("test", m);
});
co_return json_void();
});
tmt::unregister_test_module.set(r, [&ctx] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {
co_await ctx.tm.invoke_on_all([] (tasks::task_manager& tm) -> future<> {
tmt::unregister_test_module.set(r, [&tm] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {
co_await tm.invoke_on_all([] (tasks::task_manager& tm) -> future<> {
auto module_name = "test";
auto module = tm.find_module(module_name);
co_await module->stop();
@@ -38,8 +38,8 @@ void set_task_manager_test(http_context& ctx, routes& r) {
co_return json_void();
});
tmt::register_test_task.set(r, [&ctx] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {
sharded<tasks::task_manager>& tms = ctx.tm;
tmt::register_test_task.set(r, [&tm] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {
sharded<tasks::task_manager>& tms = tm;
auto it = req->query_parameters.find("task_id");
auto id = it != req->query_parameters.end() ? tasks::task_id{utils::UUID{it->second}} : tasks::task_id::create_null_id();
it = req->query_parameters.find("shard");
@@ -54,7 +54,7 @@ void set_task_manager_test(http_context& ctx, routes& r) {
tasks::task_info data;
if (it != req->query_parameters.end()) {
data.id = tasks::task_id{utils::UUID{it->second}};
auto parent_ptr = co_await tasks::task_manager::lookup_task_on_all_shards(ctx.tm, data.id);
auto parent_ptr = co_await tasks::task_manager::lookup_task_on_all_shards(tm, data.id);
data.shard = parent_ptr->get_status().shard;
}
@@ -69,10 +69,10 @@ void set_task_manager_test(http_context& ctx, routes& r) {
co_return id.to_sstring();
});
tmt::unregister_test_task.set(r, [&ctx] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {
tmt::unregister_test_task.set(r, [&tm] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {
auto id = tasks::task_id{utils::UUID{req->query_parameters["task_id"]}};
try {
co_await tasks::task_manager::invoke_on_task(ctx.tm, id, [] (tasks::task_manager::task_ptr task) -> future<> {
co_await tasks::task_manager::invoke_on_task(tm, id, [] (tasks::task_manager::task_ptr task) -> future<> {
tasks::test_task test_task{task};
co_await test_task.unregister_task();
});
@@ -82,14 +82,14 @@ void set_task_manager_test(http_context& ctx, routes& r) {
co_return json_void();
});
tmt::finish_test_task.set(r, [&ctx] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {
auto id = tasks::task_id{utils::UUID{req->get_path_param("task_id")}};
tmt::finish_test_task.set(r, [&tm] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {
auto id = tasks::task_id{utils::UUID{req->param["task_id"]}};
auto it = req->query_parameters.find("error");
bool fail = it != req->query_parameters.end();
std::string error = fail ? it->second : "";
try {
co_await tasks::task_manager::invoke_on_task(ctx.tm, id, [fail, error = std::move(error)] (tasks::task_manager::task_ptr task) {
co_await tasks::task_manager::invoke_on_task(tm, id, [fail, error = std::move(error)] (tasks::task_manager::task_ptr task) {
tasks::test_task test_task{task};
if (fail) {
test_task.finish_failed(std::make_exception_ptr(std::runtime_error(error)));
@@ -105,6 +105,14 @@ void set_task_manager_test(http_context& ctx, routes& r) {
});
}
void unset_task_manager_test(http_context& ctx, routes& r) {
tmt::register_test_module.unset(r);
tmt::unregister_test_module.unset(r);
tmt::register_test_task.unset(r);
tmt::unregister_test_task.unset(r);
tmt::finish_test_task.unset(r);
}
}
#endif

View File

@@ -10,11 +10,17 @@
#pragma once
#include <seastar/core/sharded.hh>
#include "api.hh"
namespace tasks {
class task_manager;
}
namespace api {
void set_task_manager_test(http_context& ctx, httpd::routes& r);
void set_task_manager_test(http_context& ctx, httpd::routes& r, sharded<tasks::task_manager>& tm);
void unset_task_manager_test(http_context& ctx, httpd::routes& r);
}

View File

@@ -245,8 +245,6 @@ future<authenticated_user> password_authenticator::authenticate(
std::throw_with_nested(exceptions::authentication_exception(e.what()));
} catch (exceptions::authentication_exception& e) {
std::throw_with_nested(e);
} catch (exceptions::unavailable_exception& e) {
std::throw_with_nested(exceptions::authentication_exception(e.get_message()));
} catch (...) {
std::throw_with_nested(exceptions::authentication_exception("authentication failed"));
}

View File

@@ -89,7 +89,7 @@ public:
// get the delimeter if any
auto it = ctx.begin();
auto end = ctx.end();
if (it != end && *it != '}') {
if (it != end) {
int group_size = *it++ - '0';
if (group_size < 0 ||
static_cast<size_t>(group_size) > sizeof(uint64_t)) {

View File

@@ -98,16 +98,7 @@ class cache_flat_mutation_reader final : public flat_mutation_reader_v2::impl {
bool _next_row_in_range = false;
bool _has_rt = false;
// True iff current population interval starts at before_all_clustered_rows
// and _last_row is unset. (And the read isn't reverse).
//
// Rationale: in the "most general" step of cache population,
// we mark the `(_last_row, ...] `range as continuous, which can involve doing something to `_last_row`.
// But when populating the range `(before_all_clustered_rows, ...)`,
// a rows_entry at `before_all_clustered_rows` needn't exist.
// Thus this case needs a special treatment which doesn't involve `_last_row`.
// And for that, this case it has to be recognized (via this flag).
//
// True iff current population interval, since the previous clustering row, starts before all clustered rows.
// We cannot just look at _lower_bound, because emission of range tombstones changes _lower_bound and
// because we mark clustering intervals as continuous when consuming a clustering_row, it would prevent
// us from marking the interval as continuous.
@@ -156,8 +147,6 @@ class cache_flat_mutation_reader final : public flat_mutation_reader_v2::impl {
bool maybe_add_to_cache(const range_tombstone_change& rtc);
void maybe_add_to_cache(const static_row& sr);
void maybe_set_static_row_continuous();
void set_rows_entry_continuous(rows_entry& e);
void restore_continuity_after_insertion(const mutation_partition::rows_type::iterator&);
void finish_reader() {
push_mutation_fragment(*_schema, _permit, partition_end());
_end_of_stream = true;
@@ -352,7 +341,7 @@ future<> cache_flat_mutation_reader::do_fill_buffer() {
});
}
_state = state::reading_from_underlying;
_population_range_starts_before_all_rows = _lower_bound.is_before_all_clustered_rows(*_schema) && !_read_context.is_reversed() && !_last_row;
_population_range_starts_before_all_rows = _lower_bound.is_before_all_clustered_rows(*_schema) && !_read_context.is_reversed();
_underlying_upper_bound = _next_row_in_range ? position_in_partition::before_key(_next_row.position())
: position_in_partition(_upper_bound);
if (!_read_context.partition_exists()) {
@@ -453,10 +442,7 @@ future<> cache_flat_mutation_reader::read_from_underlying() {
auto e = alloc_strategy_unique_ptr<rows_entry>(
current_allocator().construct<rows_entry>(_ck_ranges_curr->start()->value()));
// Use _next_row iterator only as a hint, because there could be insertions after _upper_bound.
auto insert_result = rows.insert_before_hint(
_next_row.at_a_row() ? _next_row.get_iterator_in_latest_version() : rows.begin(),
std::move(e),
cmp);
auto insert_result = rows.insert_before_hint(_next_row.get_iterator_in_latest_version(), std::move(e), cmp);
if (insert_result.second) {
auto it = insert_result.first;
_snp->tracker()->insert(*it);
@@ -473,22 +459,18 @@ future<> cache_flat_mutation_reader::read_from_underlying() {
auto e = alloc_strategy_unique_ptr<rows_entry>(
current_allocator().construct<rows_entry>(table_s, to_table_domain(_upper_bound), is_dummy::yes, is_continuous::no));
// Use _next_row iterator only as a hint, because there could be insertions after _upper_bound.
auto insert_result = rows.insert_before_hint(
_next_row.at_a_row() ? _next_row.get_iterator_in_latest_version() : rows.begin(),
std::move(e),
cmp);
auto insert_result = rows.insert_before_hint(_next_row.get_iterator_in_latest_version(), std::move(e), cmp);
if (insert_result.second) {
clogger.trace("csm {}: L{}: inserted dummy at {}", fmt::ptr(this), __LINE__, _upper_bound);
_snp->tracker()->insert(*insert_result.first);
restore_continuity_after_insertion(insert_result.first);
}
if (_read_context.is_reversed()) [[unlikely]] {
clogger.trace("csm {}: set_continuous({}), prev={}, rt={}", fmt::ptr(this), _last_row.position(), insert_result.first->position(), _current_tombstone);
set_rows_entry_continuous(*_last_row);
_last_row->set_continuous(true);
_last_row->set_range_tombstone(_current_tombstone);
} else {
clogger.trace("csm {}: set_continuous({}), prev={}, rt={}", fmt::ptr(this), insert_result.first->position(), _last_row.position(), _current_tombstone);
set_rows_entry_continuous(*insert_result.first);
insert_result.first->set_continuous(true);
insert_result.first->set_range_tombstone(_current_tombstone);
}
maybe_drop_last_entry(_current_tombstone);
@@ -523,11 +505,11 @@ bool cache_flat_mutation_reader::ensure_population_lower_bound() {
rows_entry::tri_compare cmp(*_schema);
partition_snapshot_row_cursor cur(*_schema, *_snp, false, _read_context.is_reversed());
if (!cur.advance_to(to_query_domain(_last_row.position()))) {
if (!cur.advance_to(_last_row.position())) {
return false;
}
if (cmp(cur.table_position(), _last_row.position()) != 0) {
if (cmp(cur.position(), _last_row.position()) != 0) {
return false;
}
@@ -549,7 +531,7 @@ void cache_flat_mutation_reader::maybe_update_continuity() {
position_in_partition::equal_compare eq(*_schema);
if (can_populate()
&& ensure_population_lower_bound()
&& !eq(_last_row.position(), _next_row.table_position())) {
&& !eq(_last_row.position(), _next_row.position())) {
with_allocator(_snp->region().allocator(), [&] {
rows_entry& e = _next_row.ensure_entry_in_latest().row;
auto& rows = _snp->version()->partition().mutable_clustered_rows();
@@ -571,14 +553,14 @@ void cache_flat_mutation_reader::maybe_update_continuity() {
}
clogger.trace("csm {}: set_continuous({}), prev={}, rt={}", fmt::ptr(this), insert_result.first->position(),
_last_row.position(), _current_tombstone);
set_rows_entry_continuous(*insert_result.first);
insert_result.first->set_continuous(true);
insert_result.first->set_range_tombstone(_current_tombstone);
clogger.trace("csm {}: set_continuous({})", fmt::ptr(this), _last_row.position());
set_rows_entry_continuous(*_last_row);
_last_row->set_continuous(true);
});
} else {
clogger.trace("csm {}: set_continuous({}), rt={}", fmt::ptr(this), _last_row.position(), _current_tombstone);
set_rows_entry_continuous(*_last_row);
_last_row->set_continuous(true);
_last_row->set_range_tombstone(_current_tombstone);
}
} else {
@@ -596,18 +578,18 @@ void cache_flat_mutation_reader::maybe_update_continuity() {
if (insert_result.second) {
clogger.trace("csm {}: L{}: inserted dummy at {}", fmt::ptr(this), __LINE__, insert_result.first->position());
_snp->tracker()->insert(*insert_result.first);
clogger.trace("csm {}: set_continuous({}), prev={}, rt={}", fmt::ptr(this), insert_result.first->position(),
_last_row.position(), _current_tombstone);
set_rows_entry_continuous(*insert_result.first);
insert_result.first->set_range_tombstone(_current_tombstone);
}
clogger.trace("csm {}: set_continuous({}), prev={}, rt={}", fmt::ptr(this), insert_result.first->position(),
_last_row.position(), _current_tombstone);
insert_result.first->set_continuous(true);
insert_result.first->set_range_tombstone(_current_tombstone);
clogger.trace("csm {}: set_continuous({})", fmt::ptr(this), e.position());
set_rows_entry_continuous(e);
e.set_continuous(true);
});
} else {
clogger.trace("csm {}: set_continuous({}), rt={}", fmt::ptr(this), e.position(), _current_tombstone);
e.set_range_tombstone(_current_tombstone);
set_rows_entry_continuous(e);
e.set_continuous(true);
}
}
maybe_drop_last_entry(_current_tombstone);
@@ -637,27 +619,26 @@ void cache_flat_mutation_reader::maybe_add_to_cache(const clustering_row& cr) {
current_allocator().construct<rows_entry>(table_schema(), cr.key(), cr.as_deletable_row()));
new_entry->set_continuous(false);
new_entry->set_range_tombstone(_current_tombstone);
auto it = _next_row.iterators_valid() && _next_row.at_a_row() ? _next_row.get_iterator_in_latest_version()
auto it = _next_row.iterators_valid() ? _next_row.get_iterator_in_latest_version()
: mp.clustered_rows().lower_bound(cr.key(), cmp);
auto insert_result = mp.mutable_clustered_rows().insert_before_hint(it, std::move(new_entry), cmp);
it = insert_result.first;
if (insert_result.second) {
_snp->tracker()->insert(*it);
restore_continuity_after_insertion(it);
}
rows_entry& e = *it;
if (ensure_population_lower_bound()) {
if (_read_context.is_reversed()) [[unlikely]] {
clogger.trace("csm {}: set_continuous({})", fmt::ptr(this), _last_row.position());
set_rows_entry_continuous(*_last_row);
_last_row->set_continuous(true);
// _current_tombstone must also apply to _last_row itself (if it's non-dummy)
// because otherwise there would be a rtc after it, either creating a different entry,
// or clearing _last_row if population did not happen.
_last_row->set_range_tombstone(_current_tombstone);
} else {
clogger.trace("csm {}: set_continuous({})", fmt::ptr(this), e.position());
set_rows_entry_continuous(e);
e.set_continuous(true);
e.set_range_tombstone(_current_tombstone);
}
} else {
@@ -702,31 +683,26 @@ bool cache_flat_mutation_reader::maybe_add_to_cache(const range_tombstone_change
auto new_entry = alloc_strategy_unique_ptr<rows_entry>(
current_allocator().construct<rows_entry>(table_schema(), to_table_domain(rtc.position()), is_dummy::yes, is_continuous::no));
auto it = _next_row.iterators_valid() && _next_row.at_a_row() ? _next_row.get_iterator_in_latest_version()
auto it = _next_row.iterators_valid() ? _next_row.get_iterator_in_latest_version()
: mp.clustered_rows().lower_bound(to_table_domain(rtc.position()), cmp);
auto insert_result = mp.mutable_clustered_rows().insert_before_hint(it, std::move(new_entry), cmp);
it = insert_result.first;
if (insert_result.second) {
_snp->tracker()->insert(*it);
restore_continuity_after_insertion(it);
}
rows_entry& e = *it;
if (ensure_population_lower_bound()) {
// underlying may emit range_tombstone_change fragments with the same position.
// In such case, the range to which the tombstone from the first fragment applies is empty and should be ignored.
//
// Note: we are using a query schema comparator to compare table schema positions here,
// but this is okay because we are only checking for equality,
// which is preserved by schema reversals.
if (q_cmp(_last_row.position(), it->position()) != 0) {
if (q_cmp(_last_row.position(), it->position()) < 0) {
if (_read_context.is_reversed()) [[unlikely]] {
clogger.trace("csm {}: set_continuous({}), rt={}", fmt::ptr(this), _last_row.position(), prev);
set_rows_entry_continuous(*_last_row);
_last_row->set_continuous(true);
_last_row->set_range_tombstone(prev);
} else {
clogger.trace("csm {}: set_continuous({}), rt={}", fmt::ptr(this), e.position(), prev);
set_rows_entry_continuous(e);
e.set_continuous(true);
e.set_range_tombstone(prev);
}
}
@@ -905,10 +881,7 @@ void cache_flat_mutation_reader::move_to_range(query::clustering_row_ranges::con
auto& rows = _snp->version()->partition().mutable_clustered_rows();
auto new_entry = alloc_strategy_unique_ptr<rows_entry>(current_allocator().construct<rows_entry>(table_schema(),
to_table_domain(_lower_bound), is_dummy::yes, is_continuous::no));
return rows.insert_before_hint(
_next_row.at_a_row() ? _next_row.get_iterator_in_latest_version() : rows.begin(),
std::move(new_entry),
cmp);
return rows.insert_before_hint(_next_row.get_iterator_in_latest_version(), std::move(new_entry), cmp);
});
auto it = insert_result.first;
if (insert_result.second) {
@@ -1068,28 +1041,6 @@ void cache_flat_mutation_reader::maybe_set_static_row_continuous() {
}
}
// Last dummies can exist in a quasi-evicted state, where they are unlinked from LRU,
// but still alive.
// But while in this state, they mustn't carry any information (i.e. continuity),
// due to the "older versions are evicted first" rule of MVCC.
// Thus, when we make an entry continuous, we must ensure that it isn't an
// unlinked last dummy.
inline
void cache_flat_mutation_reader::set_rows_entry_continuous(rows_entry& e) {
e.set_continuous(true);
if (!e.is_linked()) [[unlikely]] {
_snp->tracker()->touch(e);
}
}
inline
void cache_flat_mutation_reader::restore_continuity_after_insertion(const mutation_partition::rows_type::iterator& it) {
if (auto x = std::next(it); x->continuous()) {
it->set_continuous(true);
it->set_range_tombstone(x->range_tombstone());
}
}
inline
bool cache_flat_mutation_reader::can_populate() const {
return _snp->at_latest_version() && _read_context.cache().phase_of(_read_context.key()) == _read_context.phase();

View File

@@ -51,16 +51,8 @@ namespace db {
namespace cdc {
api::timestamp_clock::duration get_generation_leeway() {
static thread_local auto generation_leeway =
std::chrono::duration_cast<api::timestamp_clock::duration>(std::chrono::seconds(5));
utils::get_local_injector().inject("increase_cdc_generation_leeway", [&] {
generation_leeway = std::chrono::duration_cast<api::timestamp_clock::duration>(std::chrono::minutes(5));
});
return generation_leeway;
}
extern const api::timestamp_clock::duration generation_leeway =
std::chrono::duration_cast<api::timestamp_clock::duration>(std::chrono::seconds(5));
static void copy_int_to_bytes(int64_t i, size_t offset, bytes& b) {
i = net::hton(i);
@@ -380,7 +372,7 @@ db_clock::time_point new_generation_timestamp(bool add_delay, std::chrono::milli
auto ts = db_clock::now();
if (add_delay && ring_delay != 0ms) {
ts += 2 * ring_delay + duration_cast<milliseconds>(get_generation_leeway());
ts += 2 * ring_delay + duration_cast<milliseconds>(generation_leeway);
}
return ts;
}

View File

@@ -46,8 +46,6 @@ namespace gms {
namespace cdc {
api::timestamp_clock::duration get_generation_leeway();
class stream_id final {
bytes _value;
public:

View File

@@ -160,7 +160,7 @@ public:
});
}
void on_before_create_column_family(const schema& schema, std::vector<mutation>& mutations, api::timestamp_type timestamp) override {
void on_before_create_column_family(const keyspace_metadata& ksm, const schema& schema, std::vector<mutation>& mutations, api::timestamp_type timestamp) override {
if (schema.cdc_options().enabled()) {
auto& db = _ctxt._proxy.get_db().local();
auto logname = log_name(schema.cf_name());

View File

@@ -15,6 +15,10 @@
extern logging::logger cdc_log;
namespace cdc {
extern const api::timestamp_clock::duration generation_leeway;
} // namespace cdc
static api::timestamp_type to_ts(db_clock::time_point tp) {
// This assumes that timestamp_clock and db_clock have the same epochs.
return std::chrono::duration_cast<api::timestamp_clock::duration>(tp.time_since_epoch()).count();
@@ -69,7 +73,7 @@ bool cdc::metadata::streams_available() const {
cdc::stream_id cdc::metadata::get_stream(api::timestamp_type ts, dht::token tok) {
auto now = api::new_timestamp();
if (ts > now + get_generation_leeway().count()) {
if (ts > now + generation_leeway.count()) {
throw exceptions::invalid_request_exception(format(
"cdc: attempted to get a stream \"from the future\" ({}; current server time: {})."
" With CDC you cannot send writes with timestamps arbitrarily into the future, because we don't"
@@ -82,43 +86,27 @@ cdc::stream_id cdc::metadata::get_stream(api::timestamp_type ts, dht::token tok)
// Nothing protects us from that until we start using transactions for generation switching.
}
auto it = gen_used_at(now - get_generation_leeway().count());
if (it != _gens.end()) {
// Garbage-collect generations that will no longer be used.
it = _gens.erase(_gens.begin(), it);
}
if (ts <= now - get_generation_leeway().count()) {
// We reject the write if `ts <= now - generation_leeway` and the write is not to the current generation, which
// happens iff one of the following is true:
// - the write is to no generation,
// - the write is to a generation older than the generation under `it`,
// - the write is to the generation under `it` and that generation is not the current generation.
// Note that we cannot distinguish the first and second cases because we garbage-collect obsolete generations,
// but we can check if one of them takes place (`it == _gens.end() || ts < it->first`). These three conditions
// are sufficient. The write with `ts <= now - generation_leeway` cannot be to one of the generations following
// the generation under `it` because that generation was operating at `now - generation_leeway`.
bool is_previous_gen = it != _gens.end() && std::next(it) != _gens.end() && std::next(it)->first <= now;
if (it == _gens.end() || ts < it->first || is_previous_gen) {
throw exceptions::invalid_request_exception(format(
"cdc: attempted to get a stream \"from the past\" ({}; current server time: {})."
" With CDC you cannot send writes with timestamps too far into the past, because that would break"
" consistency properties.\n"
"We *do* allow sending writes into the near past, but our ability to do that is limited."
" Are you using client-side timestamps? Make sure your clocks are well-synchronized"
" with the database's clocks.", format_timestamp(ts), format_timestamp(now)));
}
}
it = _gens.begin();
if (it == _gens.end() || ts < it->first) {
auto it = gen_used_at(now);
if (it == _gens.end()) {
throw std::runtime_error(format(
"cdc::metadata::get_stream: could not find any CDC stream for timestamp {}."
" Are we in the middle of a cluster upgrade?", format_timestamp(ts)));
"cdc::metadata::get_stream: could not find any CDC stream (current time: {})."
" Are we in the middle of a cluster upgrade?", format_timestamp(now)));
}
// Find the generation operating at `ts`.
// Garbage-collect generations that will no longer be used.
it = _gens.erase(_gens.begin(), it);
if (it->first > ts) {
throw exceptions::invalid_request_exception(format(
"cdc: attempted to get a stream from an earlier generation than the currently used one."
" With CDC you cannot send writes with timestamps too far into the past, because that would break"
" consistency properties (write timestamp: {}, current generation started at: {})",
format_timestamp(ts), format_timestamp(it->first)));
}
// With `generation_leeway` we allow sending writes to the near future. It might happen
// that `ts` doesn't belong to the current generation ("current" according to our clock),
// but to the next generation. Adjust for this case:
{
auto next_it = std::next(it);
while (next_it != _gens.end() && next_it->first <= ts) {
@@ -159,8 +147,8 @@ bool cdc::metadata::known_or_obsolete(db_clock::time_point tp) const {
++it;
}
// Check if the generation is obsolete.
return it != _gens.end() && it->first <= api::new_timestamp() - get_generation_leeway().count();
// Check if some new generation has already superseded this one.
return it != _gens.end() && it->first <= api::new_timestamp();
}
bool cdc::metadata::insert(db_clock::time_point tp, topology_description&& gen) {
@@ -169,7 +157,7 @@ bool cdc::metadata::insert(db_clock::time_point tp, topology_description&& gen)
}
auto now = api::new_timestamp();
auto it = gen_used_at(now - get_generation_leeway().count());
auto it = gen_used_at(now);
if (it != _gens.end()) {
// Garbage-collect generations that will no longer be used.

View File

@@ -42,9 +42,7 @@ class metadata final {
container_t::const_iterator gen_used_at(api::timestamp_type ts) const;
public:
/* Is a generation with the given timestamp already known or obsolete? It is obsolete if and only if
* it is older than the generation operating at `now - get_generation_leeway()`.
*/
/* Is a generation with the given timestamp already known or superseded by a newer generation? */
bool known_or_obsolete(db_clock::time_point) const;
/* Are there streams available. I.e. valid for time == now. If this is false, any writes to
@@ -56,9 +54,8 @@ public:
*
* If the provided timestamp is too far away "into the future" (where "now" is defined according to our local clock),
* we reject the get_stream query. This is because the resulting stream might belong to a generation which we don't
* yet know about. Similarly, we reject queries to the previous generations if the timestamp is too far away "into
* the past". The amount of leeway (how much "into the future" or "into the past" we allow `ts` to be) is defined by
* `get_generation_leeway()`.
* yet know about. The amount of leeway (how much "into the future" we allow `ts` to be) is defined
* by the `cdc::generation_leeway` constant.
*/
stream_id get_stream(api::timestamp_type ts, dht::token tok);

27
cmake/Findrapidxml.cmake Normal file
View File

@@ -0,0 +1,27 @@
#
# Copyright 2023-present ScyllaDB
#
#
# SPDX-License-Identifier: AGPL-3.0-or-later
#
find_path(rapidxml_INCLUDE_DIR
NAMES rapidxml.h rapidxml/rapidxml.hpp)
mark_as_advanced(
rapidxml_INCLUDE_DIR)
include(FindPackageHandleStandardArgs)
find_package_handle_standard_args(rapidxml
REQUIRED_VARS
rapidxml_INCLUDE_DIR)
if(rapidxml_FOUND)
if(NOT TARGET rapidxml::rapidxml)
add_library(rapidxml::rapidxml INTERFACE IMPORTED)
set_target_properties(rapidxml::rapidxml
PROPERTIES
INTERFACE_INCLUDE_DIRECTORIES ${rapidxml_INCLUDE_DIR})
endif()
endif()

View File

@@ -21,7 +21,6 @@ add_compile_options(
set(Seastar_DEFINITIONS_RELEASE
SCYLLA_BUILD_MODE=release)
set(CMAKE_EXE_LINKER_FLAGS_RELEASE
"-Wl,--gc-sections")
add_link_options("LINKER:--gc-sections")
set(stack_usage_threshold_in_KB 13)

View File

@@ -11,7 +11,7 @@ foreach(warning ${disabled_warnings})
endif()
endforeach()
list(TRANSFORM _supported_warnings PREPEND "-Wno-")
string(JOIN " " CMAKE_CXX_FLAGS
add_compile_options(
"-Wall"
"-Werror"
"-Wno-error=deprecated-declarations"
@@ -34,14 +34,94 @@ function(default_target_arch arch)
endif()
endfunction()
function(pad_at_begin output fill str length)
# pad the given `${str} with `${fill}`, right aligned. with the syntax of
# fmtlib:
# fmt::print("{:#>{}}", str, length)
# where `#` is the `${fill}` char
string(LENGTH "${str}" str_len)
math(EXPR padding_len "${length} - ${str_len}")
if(padding_len GREATER 0)
string(REPEAT ${fill} ${padding_len} padding)
endif()
set(${output} "${padding}${str}" PARENT_SCOPE)
endfunction()
# The relocatable package includes its own dynamic linker. We don't
# know the path it will be installed to, so for now use a very long
# path so that patchelf doesn't need to edit the program headers. The
# kernel imposes a limit of 4096 bytes including the null. The other
# constraint is that the build-id has to be in the first page, so we
# can't use all 4096 bytes for the dynamic linker.
# In here we just guess that 2000 extra / should be enough to cover
# any path we get installed to but not so large that the build-id is
# pushed to the second page.
# At the end of the build we check that the build-id is indeed in the
# first page. At install time we check that patchelf doesn't modify
# the program headers.
function(get_padded_dynamic_linker_option output length)
set(dynamic_linker_option "-dynamic-linker")
# capture the drive-generated command line first
execute_process(
COMMAND ${CMAKE_C_COMPILER} "-###" /dev/null -o t
ERROR_VARIABLE driver_command_line
ERROR_STRIP_TRAILING_WHITESPACE)
# extract the argument for the "-dynamic-linker" option
if(driver_command_line MATCHES ".*\"?${dynamic_linker_option}\"? \"?([^ \"]*)\"? .*")
set(dynamic_linker ${CMAKE_MATCH_1})
else()
message(FATAL_ERROR "Unable to find ${dynamic_linker_option} in driver-generated command: "
"${driver_command_line}")
endif()
# prefixing a path with "/"s does not actually change it means
pad_at_begin(padded_dynamic_linker "/" "${dynamic_linker}" ${length})
set(${output} "${dynamic_linker_option}=${padded_dynamic_linker}" PARENT_SCOPE)
endfunction()
add_compile_options("-ffile-prefix-map=${CMAKE_SOURCE_DIR}=.")
default_target_arch(target_arch)
if(target_arch)
string(APPEND CMAKE_CXX_FLAGS " -march=${target_arch}")
add_compile_options("-march=${target_arch}")
endif()
math(EXPR _stack_usage_threshold_in_bytes "${stack_usage_threshold_in_KB} * 1024")
set(_stack_usage_threshold_flag "-Wstack-usage=${_stack_usage_threshold_in_bytes}")
check_cxx_compiler_flag(${_stack_usage_threshold_flag} _stack_usage_flag_supported)
if(_stack_usage_flag_supported)
string(APPEND CMAKE_CXX_FLAGS " ${_stack_usage_threshold_flag}")
add_compile_options("${_stack_usage_threshold_flag}")
endif()
# Force SHA1 build-id generation
add_link_options("LINKER:--build-id=sha1")
include(CheckLinkerFlag)
set(Scylla_USE_LINKER
""
CACHE
STRING
"Use specified linker instead of the default one")
if(Scylla_USE_LINKER)
set(linkers "${Scylla_USE_LINKER}")
else()
set(linkers "lld" "gold")
endif()
foreach(linker ${linkers})
set(linker_flag "-fuse-ld=${linker}")
check_linker_flag(CXX ${linker_flag} "CXX_LINKER_HAVE_${linker}")
if(CXX_LINKER_HAVE_${linker})
add_link_options("${linker_flag}")
break()
elseif(Scylla_USE_LINKER)
message(FATAL_ERROR "${Scylla_USE_LINKER} is not supported.")
endif()
endforeach()
if(DEFINED ENV{NIX_CC})
get_padded_dynamic_linker_option(dynamic_linker_option 0)
else()
# gdb has a SO_NAME_MAX_PATH_SIZE of 512, so limit the path size to
# that. The 512 includes the null at the end, hence the 511 bellow.
get_padded_dynamic_linker_option(dynamic_linker_option 511)
endif()
add_link_options("${dynamic_linker_option}")

View File

@@ -144,21 +144,12 @@ std::ostream& operator<<(std::ostream& os, compaction_type_options::scrub::quara
}
static api::timestamp_type get_max_purgeable_timestamp(const table_state& table_s, sstable_set::incremental_selector& selector,
const std::unordered_set<shared_sstable>& compacting_set, const dht::decorated_key& dk, uint64_t& bloom_filter_checks,
const api::timestamp_type compacting_max_timestamp) {
const std::unordered_set<shared_sstable>& compacting_set, const dht::decorated_key& dk, uint64_t& bloom_filter_checks) {
if (!table_s.tombstone_gc_enabled()) [[unlikely]] {
return api::min_timestamp;
}
auto timestamp = api::max_timestamp;
auto memtable_min_timestamp = table_s.min_memtable_timestamp();
// Use memtable timestamp if it contains data older than the sstables being compacted,
// and if the memtable also contains the key we're calculating max purgeable timestamp for.
// First condition helps to not penalize the common scenario where memtable only contains
// newer data.
if (memtable_min_timestamp <= compacting_max_timestamp && table_s.memtable_has_key(dk)) {
timestamp = memtable_min_timestamp;
}
auto timestamp = table_s.min_memtable_timestamp();
std::optional<utils::hashed_key> hk;
for (auto&& sst : boost::range::join(selector.select(dk).sstables, table_s.compacted_undeleted_sstables())) {
if (compacting_set.contains(sst)) {
@@ -334,16 +325,21 @@ public:
void consume_end_of_stream();
};
using use_backlog_tracker = bool_class<class use_backlog_tracker_tag>;
struct compaction_read_monitor_generator final : public read_monitor_generator {
class compaction_read_monitor final : public sstables::read_monitor, public backlog_read_progress_manager {
sstables::shared_sstable _sst;
table_state& _table_s;
const sstables::reader_position_tracker* _tracker = nullptr;
uint64_t _last_position_seen = 0;
use_backlog_tracker _use_backlog_tracker;
public:
virtual void on_read_started(const sstables::reader_position_tracker& tracker) override {
_tracker = &tracker;
_table_s.get_backlog_tracker().register_compacting_sstable(_sst, *this);
if (_use_backlog_tracker) {
_table_s.get_backlog_tracker().register_compacting_sstable(_sst, *this);
}
}
virtual void on_read_completed() override {
@@ -361,19 +357,19 @@ struct compaction_read_monitor_generator final : public read_monitor_generator {
}
void remove_sstable() {
if (_sst) {
if (_sst && _use_backlog_tracker) {
_table_s.get_backlog_tracker().revert_charges(_sst);
}
_sst = {};
}
compaction_read_monitor(sstables::shared_sstable sst, table_state& table_s)
: _sst(std::move(sst)), _table_s(table_s) { }
compaction_read_monitor(sstables::shared_sstable sst, table_state& table_s, use_backlog_tracker use_backlog_tracker)
: _sst(std::move(sst)), _table_s(table_s), _use_backlog_tracker(use_backlog_tracker) { }
~compaction_read_monitor() {
// We failed to finish handling this SSTable, so we have to update the backlog_tracker
// about it.
if (_sst) {
if (_sst && _use_backlog_tracker) {
_table_s.get_backlog_tracker().revert_charges(_sst);
}
}
@@ -382,12 +378,16 @@ struct compaction_read_monitor_generator final : public read_monitor_generator {
};
virtual sstables::read_monitor& operator()(sstables::shared_sstable sst) override {
auto p = _generated_monitors.emplace(sst->generation(), compaction_read_monitor(sst, _table_s));
auto p = _generated_monitors.emplace(sst->generation(), compaction_read_monitor(sst, _table_s, _use_backlog_tracker));
return p.first->second;
}
explicit compaction_read_monitor_generator(table_state& table_s)
: _table_s(table_s) {}
explicit compaction_read_monitor_generator(table_state& table_s, use_backlog_tracker use_backlog_tracker = use_backlog_tracker::yes)
: _table_s(table_s), _use_backlog_tracker(use_backlog_tracker) {}
uint64_t compacted() const {
return boost::accumulate(_generated_monitors | boost::adaptors::map_values | boost::adaptors::transformed([](auto& monitor) { return monitor.compacted(); }), uint64_t(0));
}
void remove_exhausted_sstables(const std::vector<sstables::shared_sstable>& exhausted_sstables) {
for (auto& sst : exhausted_sstables) {
@@ -400,8 +400,29 @@ struct compaction_read_monitor_generator final : public read_monitor_generator {
private:
table_state& _table_s;
std::unordered_map<generation_type, compaction_read_monitor> _generated_monitors;
use_backlog_tracker _use_backlog_tracker;
friend class compaction_progress_monitor;
};
void compaction_progress_monitor::set_generator(std::unique_ptr<read_monitor_generator> generator) {
_generator = std::move(generator);
}
void compaction_progress_monitor::reset_generator() {
if (_generator) {
_progress = dynamic_cast<compaction_read_monitor_generator&>(*_generator).compacted();
}
_generator = nullptr;
}
uint64_t compaction_progress_monitor::get_progress() const {
if (_generator) {
return dynamic_cast<compaction_read_monitor_generator&>(*_generator).compacted();
}
return _progress;
}
class formatted_sstables_list {
bool _include_origin = true;
std::vector<std::string> _ssts;
@@ -450,9 +471,7 @@ protected:
uint64_t _end_size = 0;
// fully expired files, which are skipped, aren't taken into account.
uint64_t _compacting_data_file_size = 0;
api::timestamp_type _compacting_max_timestamp = api::min_timestamp;
uint64_t _estimated_partitions = 0;
double _estimated_droppable_tombstone_ratio = 0;
uint64_t _bloom_filter_checks = 0;
db::replay_position _rp;
encoding_stats_collector _stats_collector;
@@ -477,32 +496,15 @@ protected:
std::vector<shared_sstable> _used_garbage_collected_sstables;
utils::observable<> _stop_request_observable;
private:
// Keeps track of monitors for input sstable.
// If _update_backlog_tracker is set to true, monitors are responsible for adjusting backlog as compaction progresses.
compaction_progress_monitor& _progress_monitor;
compaction_data& init_compaction_data(compaction_data& cdata, const compaction_descriptor& descriptor) const {
cdata.compaction_fan_in = descriptor.fan_in();
return cdata;
}
// Called in a seastar thread
dht::partition_range_vector
get_ranges_for_invalidation(const std::vector<shared_sstable>& sstables) {
// If owned ranges is disengaged, it means no cleanup work was done and
// so nothing needs to be invalidated.
if (!_owned_ranges) {
return dht::partition_range_vector{};
}
auto owned_ranges = dht::to_partition_ranges(*_owned_ranges, utils::can_yield::yes);
auto non_owned_ranges = boost::copy_range<dht::partition_range_vector>(sstables
| boost::adaptors::transformed([] (const shared_sstable& sst) {
seastar::thread::maybe_yield();
return dht::partition_range::make({sst->get_first_decorated_key(), true},
{sst->get_last_decorated_key(), true});
}));
return dht::subtract_ranges(*_schema, non_owned_ranges, std::move(owned_ranges)).get();
}
protected:
compaction(table_state& table_s, compaction_descriptor descriptor, compaction_data& cdata)
compaction(table_state& table_s, compaction_descriptor descriptor, compaction_data& cdata, compaction_progress_monitor& progress_monitor, use_backlog_tracker use_backlog_tracker)
: _cdata(init_compaction_data(cdata, descriptor))
, _table_s(table_s)
, _sstable_creator(std::move(descriptor.creator))
@@ -521,6 +523,7 @@ protected:
, _owned_ranges(std::move(descriptor.owned_ranges))
, _sharder(descriptor.sharder)
, _owned_ranges_checker(_owned_ranges ? std::optional<dht::incremental_owned_ranges_checker>(*_owned_ranges) : std::nullopt)
, _progress_monitor(progress_monitor)
{
for (auto& sst : _sstables) {
_stats_collector.update(sst->get_encoding_stats_for_compaction());
@@ -529,6 +532,14 @@ protected:
_contains_multi_fragment_runs = std::any_of(_sstables.begin(), _sstables.end(), [&ssts_run_ids] (shared_sstable& sst) {
return !ssts_run_ids.insert(sst->run_identifier()).second;
});
_progress_monitor.set_generator(std::make_unique<compaction_read_monitor_generator>(_table_s, use_backlog_tracker));
}
read_monitor_generator& unwrap_monitor_generator() const {
if (_progress_monitor._generator) {
return *_progress_monitor._generator;
}
return default_read_monitor_generator();
}
virtual uint64_t partitions_per_sstable() const {
@@ -536,7 +547,7 @@ protected:
auto max_sstable_size = std::max<uint64_t>(_max_sstable_size, 1);
uint64_t estimated_sstables = std::max(1UL, uint64_t(ceil(double(_compacting_data_file_size) / max_sstable_size)));
return std::min(uint64_t(ceil(double(_estimated_partitions) / estimated_sstables)),
_table_s.get_compaction_strategy().adjust_partition_estimate(_ms_metadata, _estimated_partitions, _schema));
_table_s.get_compaction_strategy().adjust_partition_estimate(_ms_metadata, _estimated_partitions));
}
void setup_new_sstable(shared_sstable& sst) {
@@ -580,10 +591,9 @@ protected:
return _stats_collector.get();
}
compaction_completion_desc
virtual compaction_completion_desc
get_compaction_completion_desc(std::vector<shared_sstable> input_sstables, std::vector<shared_sstable> output_sstables) {
auto ranges_for_for_invalidation = get_ranges_for_invalidation(input_sstables);
return compaction_completion_desc{std::move(input_sstables), std::move(output_sstables), std::move(ranges_for_for_invalidation)};
return compaction_completion_desc{std::move(input_sstables), std::move(output_sstables)};
}
// Tombstone expiration is enabled based on the presence of sstable set.
@@ -599,8 +609,7 @@ protected:
sstable_writer_config cfg = _table_s.configure_writer("garbage_collection");
cfg.run_identifier = gc_run;
cfg.monitor = monitor.get();
uint64_t estimated_partitions = std::max(1UL, uint64_t(ceil(partitions_per_sstable() * _estimated_droppable_tombstone_ratio)));
auto writer = sst->get_writer(*schema(), estimated_partitions, cfg, get_encoding_stats());
auto writer = sst->get_writer(*schema(), partitions_per_sstable(), cfg, get_encoding_stats());
return compaction_writer(std::move(monitor), std::move(writer), std::move(sst));
}
@@ -654,6 +663,7 @@ public:
compaction& operator=(compaction&& other) = delete;
virtual ~compaction() {
_progress_monitor.reset_generator();
}
private:
// Default range sstable reader that will only return mutation that belongs to current shard.
@@ -719,7 +729,6 @@ private:
auto fully_expired = _table_s.fully_expired_sstables(_sstables, gc_clock::now());
min_max_tracker<api::timestamp_type> timestamp_tracker;
double sum_of_estimated_droppable_tombstone_ratio = 0;
_input_sstable_generations.reserve(_sstables.size());
for (auto& sst : _sstables) {
co_await coroutine::maybe_yield();
@@ -740,16 +749,14 @@ private:
continue;
}
_cdata.compaction_size += sst->data_size();
// We also capture the sstable, so we keep it alive while the read isn't done
ssts->insert(sst);
// FIXME: If the sstables have cardinality estimation bitmaps, use that
// for a better estimate for the number of partitions in the merged
// sstable than just adding up the lengths of individual sstables.
_estimated_partitions += sst->get_estimated_key_count();
auto gc_before = sst->get_gc_before_for_drop_estimation(gc_clock::now(), _table_s.get_tombstone_gc_state(), _schema);
sum_of_estimated_droppable_tombstone_ratio += sst->estimate_droppable_tombstone_ratio(gc_before);
_compacting_data_file_size += sst->ondisk_data_size();
// TODO:
// Note that this is not fully correct. Since we might be merging sstables that originated on
// another shard (#cpu changed), we might be comparing RP:s with differing shard ids,
@@ -758,16 +765,12 @@ private:
// this is kind of ok, esp. since we will hopefully not be trying to recover based on
// compacted sstables anyway (CL should be clean by then).
_rp = std::max(_rp, sst_stats.position);
_compacting_max_timestamp = std::max(_compacting_max_timestamp, sst->get_stats_metadata().max_timestamp);
}
log_info("{} {}", report_start_desc(), formatted_msg);
if (ssts->size() < _sstables.size()) {
log_debug("{} out of {} input sstables are fully expired sstables that will not be actually compacted",
_sstables.size() - ssts->size(), _sstables.size());
}
// _estimated_droppable_tombstone_ratio could exceed 1.0 in certain cases, so limit it to 1.0.
_estimated_droppable_tombstone_ratio = std::min(1.0, sum_of_estimated_droppable_tombstone_ratio / ssts->size());
_compacting = std::move(ssts);
@@ -882,7 +885,7 @@ private:
};
}
return [this] (const dht::decorated_key& dk) {
return get_max_purgeable_timestamp(_table_s, *_selector, _compacting_for_max_purgeable_func, dk, _bloom_filter_checks, _compacting_max_timestamp);
return get_max_purgeable_timestamp(_table_s, *_selector, _compacting_for_max_purgeable_func, dk, _bloom_filter_checks);
};
}
@@ -1083,13 +1086,10 @@ void compacted_fragments_writer::consume_end_of_stream() {
}
class regular_compaction : public compaction {
// keeps track of monitors for input sstable, which are responsible for adjusting backlog as compaction progresses.
mutable compaction_read_monitor_generator _monitor_generator;
seastar::semaphore _replacer_lock = {1};
public:
regular_compaction(table_state& table_s, compaction_descriptor descriptor, compaction_data& cdata)
: compaction(table_s, std::move(descriptor), cdata)
, _monitor_generator(_table_s)
regular_compaction(table_state& table_s, compaction_descriptor descriptor, compaction_data& cdata, compaction_progress_monitor& progress_monitor, use_backlog_tracker use_backlog_tracker = use_backlog_tracker::yes)
: compaction(table_s, std::move(descriptor), cdata, progress_monitor, use_backlog_tracker)
{
}
@@ -1107,7 +1107,7 @@ public:
std::move(trace),
sm_fwd,
mr_fwd,
_monitor_generator);
unwrap_monitor_generator());
}
std::string_view report_start_desc() const override {
@@ -1178,7 +1178,7 @@ private:
log_debug("Replacing earlier exhausted sstable(s) {} by new sstable(s) {}", formatted_sstables_list(exhausted_ssts, false), formatted_sstables_list(_new_unused_sstables, true));
_replacer(get_compaction_completion_desc(exhausted_ssts, std::move(_new_unused_sstables)));
_sstables.erase(exhausted, _sstables.end());
_monitor_generator.remove_exhausted_sstables(exhausted_ssts);
dynamic_cast<compaction_read_monitor_generator&>(unwrap_monitor_generator()).remove_exhausted_sstables(exhausted_ssts);
}
}
@@ -1225,8 +1225,8 @@ private:
return bool(_replacer);
}
public:
reshape_compaction(table_state& table_s, compaction_descriptor descriptor, compaction_data& cdata)
: regular_compaction(table_s, std::move(descriptor), cdata) {
reshape_compaction(table_state& table_s, compaction_descriptor descriptor, compaction_data& cdata, compaction_progress_monitor& progress_monitor)
: regular_compaction(table_s, std::move(descriptor), cdata, progress_monitor, use_backlog_tracker::no) {
}
virtual sstables::sstable_set make_sstable_set_for_input() const override {
@@ -1252,7 +1252,7 @@ public:
std::move(trace),
sm_fwd,
mr_fwd,
default_read_monitor_generator());
unwrap_monitor_generator());
}
std::string_view report_start_desc() const override {
@@ -1289,9 +1289,31 @@ public:
};
class cleanup_compaction final : public regular_compaction {
private:
// Called in a seastar thread
dht::partition_range_vector
get_ranges_for_invalidation(const std::vector<shared_sstable>& sstables) {
auto owned_ranges = dht::to_partition_ranges(*_owned_ranges, utils::can_yield::yes);
auto non_owned_ranges = boost::copy_range<dht::partition_range_vector>(sstables
| boost::adaptors::transformed([] (const shared_sstable& sst) {
seastar::thread::maybe_yield();
return dht::partition_range::make({sst->get_first_decorated_key(), true},
{sst->get_last_decorated_key(), true});
}));
return dht::subtract_ranges(*_schema, non_owned_ranges, std::move(owned_ranges)).get();
}
protected:
virtual compaction_completion_desc
get_compaction_completion_desc(std::vector<shared_sstable> input_sstables, std::vector<shared_sstable> output_sstables) override {
auto ranges_for_for_invalidation = get_ranges_for_invalidation(input_sstables);
return compaction_completion_desc{std::move(input_sstables), std::move(output_sstables), std::move(ranges_for_for_invalidation)};
}
public:
cleanup_compaction(table_state& table_s, compaction_descriptor descriptor, compaction_data& cdata)
: regular_compaction(table_s, std::move(descriptor), cdata)
cleanup_compaction(table_state& table_s, compaction_descriptor descriptor, compaction_data& cdata, compaction_progress_monitor& progress_monitor)
: regular_compaction(table_s, std::move(descriptor), cdata, progress_monitor)
{
}
@@ -1520,8 +1542,8 @@ private:
mutable uint64_t _validation_errors = 0;
public:
scrub_compaction(table_state& table_s, compaction_descriptor descriptor, compaction_data& cdata, compaction_type_options::scrub options)
: regular_compaction(table_s, std::move(descriptor), cdata)
scrub_compaction(table_state& table_s, compaction_descriptor descriptor, compaction_data& cdata, compaction_type_options::scrub options, compaction_progress_monitor& progress_monitor)
: regular_compaction(table_s, std::move(descriptor), cdata, progress_monitor, use_backlog_tracker::no)
, _options(options)
, _scrub_start_description(fmt::format("Scrubbing in {} mode", _options.operation_mode))
, _scrub_finish_description(fmt::format("Finished scrubbing in {} mode", _options.operation_mode)) {
@@ -1548,7 +1570,7 @@ public:
if (!range.is_full()) {
on_internal_error(clogger, fmt::format("Scrub compaction in mode {} expected full partition range, but got {} instead", _options.operation_mode, range));
}
auto crawling_reader = _compacting->make_crawling_reader(std::move(s), std::move(permit), nullptr);
auto crawling_reader = _compacting->make_crawling_reader(std::move(s), std::move(permit), nullptr, unwrap_monitor_generator());
return make_flat_mutation_reader_v2<reader>(std::move(crawling_reader), _options.operation_mode, _validation_errors);
}
@@ -1614,11 +1636,11 @@ private:
uint64_t partitions_per_sstable(shard_id s) const {
uint64_t estimated_sstables = std::max(uint64_t(1), uint64_t(ceil(double(_estimation_per_shard[s].estimated_size) / _max_sstable_size)));
return std::min(uint64_t(ceil(double(_estimation_per_shard[s].estimated_partitions) / estimated_sstables)),
_table_s.get_compaction_strategy().adjust_partition_estimate(_ms_metadata, _estimation_per_shard[s].estimated_partitions, _schema));
_table_s.get_compaction_strategy().adjust_partition_estimate(_ms_metadata, _estimation_per_shard[s].estimated_partitions));
}
public:
resharding_compaction(table_state& table_s, sstables::compaction_descriptor descriptor, compaction_data& cdata)
: compaction(table_s, std::move(descriptor), cdata)
resharding_compaction(table_state& table_s, sstables::compaction_descriptor descriptor, compaction_data& cdata, compaction_progress_monitor& progress_monitor)
: compaction(table_s, std::move(descriptor), cdata, progress_monitor, use_backlog_tracker::no)
, _estimation_per_shard(smp::count)
, _run_identifiers(smp::count)
{
@@ -1652,7 +1674,8 @@ public:
slice,
nullptr,
sm_fwd,
mr_fwd);
mr_fwd,
unwrap_monitor_generator());
}
@@ -1724,47 +1747,49 @@ compaction_type compaction_type_options::type() const {
return index_to_type[_options.index()];
}
static std::unique_ptr<compaction> make_compaction(table_state& table_s, sstables::compaction_descriptor descriptor, compaction_data& cdata) {
static std::unique_ptr<compaction> make_compaction(table_state& table_s, sstables::compaction_descriptor descriptor, compaction_data& cdata, compaction_progress_monitor& progress_monitor) {
struct {
table_state& table_s;
sstables::compaction_descriptor&& descriptor;
compaction_data& cdata;
compaction_progress_monitor& progress_monitor;
std::unique_ptr<compaction> operator()(compaction_type_options::reshape) {
return std::make_unique<reshape_compaction>(table_s, std::move(descriptor), cdata);
return std::make_unique<reshape_compaction>(table_s, std::move(descriptor), cdata, progress_monitor);
}
std::unique_ptr<compaction> operator()(compaction_type_options::reshard) {
return std::make_unique<resharding_compaction>(table_s, std::move(descriptor), cdata);
return std::make_unique<resharding_compaction>(table_s, std::move(descriptor), cdata, progress_monitor);
}
std::unique_ptr<compaction> operator()(compaction_type_options::regular) {
return std::make_unique<regular_compaction>(table_s, std::move(descriptor), cdata);
return std::make_unique<regular_compaction>(table_s, std::move(descriptor), cdata, progress_monitor);
}
std::unique_ptr<compaction> operator()(compaction_type_options::cleanup) {
return std::make_unique<cleanup_compaction>(table_s, std::move(descriptor), cdata);
return std::make_unique<cleanup_compaction>(table_s, std::move(descriptor), cdata, progress_monitor);
}
std::unique_ptr<compaction> operator()(compaction_type_options::upgrade) {
return std::make_unique<cleanup_compaction>(table_s, std::move(descriptor), cdata);
return std::make_unique<cleanup_compaction>(table_s, std::move(descriptor), cdata, progress_monitor);
}
std::unique_ptr<compaction> operator()(compaction_type_options::scrub scrub_options) {
return std::make_unique<scrub_compaction>(table_s, std::move(descriptor), cdata, scrub_options);
return std::make_unique<scrub_compaction>(table_s, std::move(descriptor), cdata, scrub_options, progress_monitor);
}
} visitor_factory{table_s, std::move(descriptor), cdata};
} visitor_factory{table_s, std::move(descriptor), cdata, progress_monitor};
return descriptor.options.visit(visitor_factory);
}
static future<compaction_result> scrub_sstables_validate_mode(sstables::compaction_descriptor descriptor, compaction_data& cdata, table_state& table_s) {
static future<compaction_result> scrub_sstables_validate_mode(sstables::compaction_descriptor descriptor, compaction_data& cdata, table_state& table_s, read_monitor_generator& monitor_generator) {
auto schema = table_s.schema();
auto permit = table_s.make_compaction_reader_permit();
uint64_t validation_errors = 0;
cdata.compaction_size = boost::accumulate(descriptor.sstables | boost::adaptors::transformed([] (auto& sst) { return sst->data_size(); }), int64_t(0));
for (const auto& sst : descriptor.sstables) {
clogger.info("Scrubbing in validate mode {}", sst->get_filename());
validation_errors += co_await sst->validate(permit, cdata.abort, [&schema] (sstring what) {
scrub_compaction::report_validation_error(compaction_type::Scrub, *schema, what);
});
}, monitor_generator(sst));
// Did validation actually finish because aborted?
if (cdata.is_stop_requested()) {
// Compaction manager will catch this exception and re-schedule the compaction.
@@ -1790,8 +1815,15 @@ static future<compaction_result> scrub_sstables_validate_mode(sstables::compacti
};
}
future<compaction_result> scrub_sstables_validate_mode(sstables::compaction_descriptor descriptor, compaction_data& cdata, table_state& table_s, compaction_progress_monitor& progress_monitor) {
progress_monitor.set_generator(std::make_unique<compaction_read_monitor_generator>(table_s, use_backlog_tracker::no));
auto d = defer([&] { progress_monitor.reset_generator(); });
auto res = co_await scrub_sstables_validate_mode(descriptor, cdata, table_s, *progress_monitor._generator);
co_return res;
}
future<compaction_result>
compact_sstables(sstables::compaction_descriptor descriptor, compaction_data& cdata, table_state& table_s) {
compact_sstables(sstables::compaction_descriptor descriptor, compaction_data& cdata, table_state& table_s, compaction_progress_monitor& progress_monitor) {
if (descriptor.sstables.empty()) {
return make_exception_future<compaction_result>(std::runtime_error(format("Called {} compaction with empty set on behalf of {}.{}",
compaction_name(descriptor.options.type()), table_s.schema()->ks_name(), table_s.schema()->cf_name())));
@@ -1799,9 +1831,9 @@ compact_sstables(sstables::compaction_descriptor descriptor, compaction_data& cd
if (descriptor.options.type() == compaction_type::Scrub
&& std::get<compaction_type_options::scrub>(descriptor.options.options()).operation_mode == compaction_type_options::scrub::mode::validate) {
// Bypass the usual compaction machinery for dry-mode scrub
return scrub_sstables_validate_mode(std::move(descriptor), cdata, table_s);
return scrub_sstables_validate_mode(std::move(descriptor), cdata, table_s, progress_monitor);
}
return compaction::run(make_compaction(table_s, std::move(descriptor), cdata));
return compaction::run(make_compaction(table_s, std::move(descriptor), cdata, progress_monitor));
}
std::unordered_set<sstables::shared_sstable>

View File

@@ -48,6 +48,7 @@ struct compaction_info {
};
struct compaction_data {
uint64_t compaction_size = 0;
uint64_t total_partitions = 0;
uint64_t total_keys_written = 0;
sstring stop_requested;
@@ -100,12 +101,27 @@ struct compaction_result {
compaction_stats stats;
};
class read_monitor_generator;
class compaction_progress_monitor {
std::unique_ptr<read_monitor_generator> _generator = nullptr;
uint64_t _progress = 0;
public:
void set_generator(std::unique_ptr<read_monitor_generator> generator);
void reset_generator();
// Returns number of bytes processed with _generator.
uint64_t get_progress() const;
friend class compaction;
friend future<compaction_result> scrub_sstables_validate_mode(sstables::compaction_descriptor, compaction_data&, table_state&, compaction_progress_monitor&);
};
// Compact a list of N sstables into M sstables.
// Returns info about the finished compaction, which includes vector to new sstables.
//
// compaction_descriptor is responsible for specifying the type of compaction, and influencing
// compaction behavior through its available member fields.
future<compaction_result> compact_sstables(sstables::compaction_descriptor descriptor, compaction_data& cdata, table_state& table_s);
future<compaction_result> compact_sstables(sstables::compaction_descriptor descriptor, compaction_data& cdata, table_state& table_s, compaction_progress_monitor& progress_monitor);
// Return list of expired sstables for column family cf.
// A sstable is fully expired *iff* its max_local_deletion_time precedes gc_before and its

View File

@@ -22,7 +22,6 @@
#include "sstables/exceptions.hh"
#include "sstables/sstable_directory.hh"
#include "locator/abstract_replication_strategy.hh"
#include "utils/error_injection.hh"
#include "utils/fb_utilities.hh"
#include "utils/UUID_gen.hh"
#include "db/system_keyspace.hh"
@@ -437,7 +436,7 @@ future<sstables::compaction_result> compaction_task_executor::compact_sstables(s
}
}
co_return co_await sstables::compact_sstables(std::move(descriptor), cdata, t);
co_return co_await sstables::compact_sstables(std::move(descriptor), cdata, t, _progress_monitor);
}
future<> compaction_task_executor::update_history(table_state& t, const sstables::compaction_result& res, const sstables::compaction_data& cdata) {
auto ended_at = std::chrono::duration_cast<std::chrono::milliseconds>(res.stats.ended_at.time_since_epoch());
@@ -478,12 +477,17 @@ public:
: compaction_task_executor(mgr, do_throw_if_stopping, t, compaction_type, std::move(desc))
, sstables_compaction_task_impl(mgr._task_manager_module, tasks::task_id::create_random_id(), 0, "compaction group", t->schema()->ks_name(), t->schema()->cf_name(), std::move(entity), parent_id)
{
_status.progress_units = "bytes";
set_sstables(std::move(sstables));
}
virtual ~sstables_task_executor() = default;
virtual void release_resources() noexcept override;
virtual future<tasks::task_manager::task::progress> get_progress() const override {
return compaction_task_impl::get_progress(_compaction_data, _progress_monitor);
}
protected:
virtual future<> run() override {
return perform();
@@ -498,7 +502,13 @@ public:
tasks::task_id parent_id)
: compaction_task_executor(mgr, do_throw_if_stopping, t, sstables::compaction_type::Compaction, "Major compaction")
, major_compaction_task_impl(mgr._task_manager_module, tasks::task_id::create_random_id(), 0, "compaction group", t->schema()->ks_name(), t->schema()->cf_name(), "", parent_id)
{}
{
_status.progress_units = "bytes";
}
virtual future<tasks::task_manager::task::progress> get_progress() const override {
return compaction_task_impl::get_progress(_compaction_data, _progress_monitor);
}
protected:
virtual future<> run() override {
return perform();
@@ -593,18 +603,24 @@ future<> compaction_manager::perform_major_compaction(table_state& t, std::optio
namespace compaction {
class custom_compaction_task_executor : public compaction_task_executor, public compaction_task_impl {
noncopyable_function<future<>(sstables::compaction_data&)> _job;
noncopyable_function<future<>(sstables::compaction_data&, sstables::compaction_progress_monitor&)> _job;
public:
custom_compaction_task_executor(compaction_manager& mgr, throw_if_stopping do_throw_if_stopping, table_state* t, tasks::task_id parent_id, sstables::compaction_type type, sstring desc, noncopyable_function<future<>(sstables::compaction_data&)> job)
custom_compaction_task_executor(compaction_manager& mgr, throw_if_stopping do_throw_if_stopping, table_state* t, tasks::task_id parent_id, sstables::compaction_type type, sstring desc, noncopyable_function<future<>(sstables::compaction_data&, sstables::compaction_progress_monitor&)> job)
: compaction_task_executor(mgr, do_throw_if_stopping, t, type, std::move(desc))
, compaction_task_impl(mgr._task_manager_module, tasks::task_id::create_random_id(), 0, "compaction group", t->schema()->ks_name(), t->schema()->cf_name(), "", parent_id)
, _job(std::move(job))
{}
{
_status.progress_units = "bytes";
}
virtual std::string type() const override {
return fmt::format("{} compaction", compaction_type());
}
virtual future<tasks::task_manager::task::progress> get_progress() const override {
return compaction_task_impl::get_progress(_compaction_data, _progress_monitor);
}
protected:
virtual future<> run() override {
return perform();
@@ -625,7 +641,7 @@ protected:
// NOTE:
// no need to register shared sstables because they're excluded from non-resharding
// compaction and some of them may not even belong to current shard.
co_await _job(compaction_data());
co_await _job(compaction_data(), _progress_monitor);
finish_compaction();
co_return std::nullopt;
@@ -634,7 +650,7 @@ protected:
}
future<> compaction_manager::run_custom_job(table_state& t, sstables::compaction_type type, const char* desc, noncopyable_function<future<>(sstables::compaction_data&)> job, std::optional<tasks::task_info> info, throw_if_stopping do_throw_if_stopping) {
future<> compaction_manager::run_custom_job(table_state& t, sstables::compaction_type type, const char* desc, noncopyable_function<future<>(sstables::compaction_data&, sstables::compaction_progress_monitor&)> job, std::optional<tasks::task_info> info, throw_if_stopping do_throw_if_stopping) {
auto gh = start_compaction(t);
if (!gh) {
co_return;
@@ -1148,11 +1164,6 @@ protected:
}
virtual future<compaction_manager::compaction_stats_opt> do_run() override {
if (!is_system_keyspace(_status.keyspace)) {
co_await utils::get_local_injector().inject_with_handler("compaction_regular_compaction_task_executor_do_run",
[] (auto& handler) { return handler.wait_for_message(db::timeout_clock::now() + 10s); });
}
co_await coroutine::switch_to(_cm.compaction_sg());
for (;;) {
@@ -1297,12 +1308,17 @@ public:
, offstrategy_compaction_task_impl(mgr._task_manager_module, tasks::task_id::create_random_id(), parent_id ? 0 : mgr._task_manager_module->new_sequence_number(), "compaction group", t->schema()->ks_name(), t->schema()->cf_name(), "", parent_id)
, _performed(performed)
{
_status.progress_units = "bytes";
_performed = false;
}
bool performed() const noexcept {
return _performed;
}
virtual future<tasks::task_manager::task::progress> get_progress() const override {
return compaction_task_impl::get_progress(_compaction_data, _progress_monitor);
}
protected:
virtual future<> run() override {
return perform();
@@ -1327,20 +1343,13 @@ private:
}));
};
auto get_next_job = [&] () -> future<std::optional<sstables::compaction_descriptor>> {
auto candidates = get_reshape_candidates();
if (candidates.empty()) {
co_return std::nullopt;
}
// all sstables added to maintenance set share the same underlying storage.
auto& storage = candidates.front()->get_storage();
sstables::reshape_config cfg = co_await sstables::make_reshape_config(storage, sstables::reshape_mode::strict);
auto desc = t.get_compaction_strategy().get_reshaping_job(get_reshape_candidates(), t.schema(), cfg);
co_return desc.sstables.size() ? std::make_optional(std::move(desc)) : std::nullopt;
auto get_next_job = [&] () -> std::optional<sstables::compaction_descriptor> {
auto desc = t.get_compaction_strategy().get_reshaping_job(get_reshape_candidates(), t.schema(), sstables::reshape_mode::strict);
return desc.sstables.size() ? std::make_optional(std::move(desc)) : std::nullopt;
};
std::exception_ptr err;
while (auto desc = co_await get_next_job()) {
while (auto desc = get_next_job()) {
auto compacting = compacting_sstable_registration(_cm, _cm.get_compaction_state(&t), desc->sstables);
auto on_replace = compacting.update_on_sstable_replacement();
@@ -1583,7 +1592,7 @@ private:
sstables::compaction_descriptor::default_max_sstable_bytes,
sst->run_identifier(),
sstables::compaction_type_options::make_scrub(sstables::compaction_type_options::scrub::mode::validate));
co_return co_await sstables::compact_sstables(std::move(desc), _compaction_data, *_compacting_table);
co_return co_await sstables::compact_sstables(std::move(desc), _compaction_data, *_compacting_table, _progress_monitor);
} catch (sstables::compaction_stopped_exception&) {
// ignore, will be handled by can_proceed()
} catch (storage_io_error& e) {
@@ -1643,6 +1652,7 @@ public:
// will have more space available released by previous jobs.
std::ranges::sort(_pending_cleanup_jobs, std::ranges::greater(), std::mem_fn(&sstables::compaction_descriptor::sstables_size));
_cm._stats.pending_tasks += _pending_cleanup_jobs.size();
_status.progress_units = "bytes";
}
virtual ~cleanup_sstables_compaction_task_executor() = default;
@@ -1653,6 +1663,10 @@ public:
_compacting.release_all();
_owned_ranges_ptr = nullptr;
}
virtual future<tasks::task_manager::task::progress> get_progress() const override {
return compaction_task_impl::get_progress(_compaction_data, _progress_monitor);
}
protected:
virtual future<> run() override {
return perform();
@@ -1802,11 +1816,7 @@ future<> compaction_manager::perform_cleanup(owned_ranges_ptr sorted_owned_range
};
cmlog.debug("perform_cleanup: waiting for sstables to become eligible for cleanup");
try {
co_await t.get_staging_done_condition().when(sleep_duration, [&] { return has_sstables_eligible_for_compaction(); });
} catch (const seastar::condition_variable_timed_out&) {
// Ignored. Keep retrying for max_idle_duration
}
co_await t.get_staging_done_condition().when(sleep_duration, [&] { return has_sstables_eligible_for_compaction(); });
if (!has_sstables_eligible_for_compaction()) {
continue;
@@ -1858,9 +1868,6 @@ future<> compaction_manager::try_perform_cleanup(owned_ranges_ptr sorted_owned_r
if (found_maintenance_sstables) {
co_await perform_offstrategy(t, info);
}
if (utils::get_local_injector().enter("major_compaction_before_cleanup")) {
co_await perform_major_compaction(t, info);
}
// Called with compaction_disabled
auto get_sstables = [this, &t] () -> future<std::vector<sstables::shared_sstable>> {

View File

@@ -345,7 +345,7 @@ public:
// parameter type is the compaction type the operation can most closely be
// associated with, use compaction_type::Compaction, if none apply.
// parameter job is a function that will carry the operation
future<> run_custom_job(compaction::table_state& s, sstables::compaction_type type, const char *desc, noncopyable_function<future<>(sstables::compaction_data&)> job, std::optional<tasks::task_info> info, throw_if_stopping do_throw_if_stopping);
future<> run_custom_job(compaction::table_state& s, sstables::compaction_type type, const char *desc, noncopyable_function<future<>(sstables::compaction_data&, sstables::compaction_progress_monitor&)> job, std::optional<tasks::task_info> info, throw_if_stopping do_throw_if_stopping);
class compaction_reenabler {
compaction_manager& _cm;
@@ -475,6 +475,7 @@ protected:
sstables::compaction_data _compaction_data;
state _state = state::none;
throw_if_stopping _do_throw_if_stopping;
sstables::compaction_progress_monitor _progress_monitor;
private:
shared_future<compaction_manager::compaction_stats_opt> _compaction_done = make_ready_future<compaction_manager::compaction_stats_opt>();

View File

@@ -66,7 +66,7 @@ bool compaction_strategy_impl::worth_dropping_tombstones(const shared_sstable& s
return sst->estimate_droppable_tombstone_ratio(gc_before) >= _tombstone_threshold;
}
uint64_t compaction_strategy_impl::adjust_partition_estimate(const mutation_source_metadata& ms_meta, uint64_t partition_estimate, schema_ptr schema) const {
uint64_t compaction_strategy_impl::adjust_partition_estimate(const mutation_source_metadata& ms_meta, uint64_t partition_estimate) const {
return partition_estimate;
}
@@ -75,7 +75,7 @@ reader_consumer_v2 compaction_strategy_impl::make_interposer_consumer(const muta
}
compaction_descriptor
compaction_strategy_impl::get_reshaping_job(std::vector<shared_sstable> input, schema_ptr schema, reshape_config cfg) const {
compaction_strategy_impl::get_reshaping_job(std::vector<shared_sstable> input, schema_ptr schema, reshape_mode mode) const {
return compaction_descriptor();
}
@@ -700,12 +700,12 @@ compaction_backlog_tracker compaction_strategy::make_backlog_tracker() const {
}
sstables::compaction_descriptor
compaction_strategy::get_reshaping_job(std::vector<shared_sstable> input, schema_ptr schema, reshape_config cfg) const {
return _compaction_strategy_impl->get_reshaping_job(std::move(input), schema, cfg);
compaction_strategy::get_reshaping_job(std::vector<shared_sstable> input, schema_ptr schema, reshape_mode mode) const {
return _compaction_strategy_impl->get_reshaping_job(std::move(input), schema, mode);
}
uint64_t compaction_strategy::adjust_partition_estimate(const mutation_source_metadata& ms_meta, uint64_t partition_estimate, schema_ptr schema) const {
return _compaction_strategy_impl->adjust_partition_estimate(ms_meta, partition_estimate, std::move(schema));
uint64_t compaction_strategy::adjust_partition_estimate(const mutation_source_metadata& ms_meta, uint64_t partition_estimate) const {
return _compaction_strategy_impl->adjust_partition_estimate(ms_meta, partition_estimate);
}
reader_consumer_v2 compaction_strategy::make_interposer_consumer(const mutation_source_metadata& ms_meta, reader_consumer_v2 end_consumer) const {
@@ -739,13 +739,6 @@ compaction_strategy make_compaction_strategy(compaction_strategy_type strategy,
return compaction_strategy(std::move(impl));
}
future<reshape_config> make_reshape_config(const sstables::storage& storage, reshape_mode mode) {
co_return sstables::reshape_config{
.mode = mode,
.free_storage_space = co_await storage.free_space() / smp::count,
};
}
}
namespace compaction {

View File

@@ -31,7 +31,6 @@ class sstable;
class sstable_set;
struct compaction_descriptor;
struct resharding_descriptor;
class storage;
class compaction_strategy {
::shared_ptr<compaction_strategy_impl> _compaction_strategy_impl;
@@ -105,7 +104,7 @@ public:
compaction_backlog_tracker make_backlog_tracker() const;
uint64_t adjust_partition_estimate(const mutation_source_metadata& ms_meta, uint64_t partition_estimate, schema_ptr) const;
uint64_t adjust_partition_estimate(const mutation_source_metadata& ms_meta, uint64_t partition_estimate) const;
reader_consumer_v2 make_interposer_consumer(const mutation_source_metadata& ms_meta, reader_consumer_v2 end_consumer) const;
@@ -123,13 +122,11 @@ public:
//
// The caller should also pass a maximum number of SSTables which is the maximum amount of
// SSTables that can be added into a single job.
compaction_descriptor get_reshaping_job(std::vector<shared_sstable> input, schema_ptr schema, reshape_config cfg) const;
compaction_descriptor get_reshaping_job(std::vector<shared_sstable> input, schema_ptr schema, reshape_mode mode) const;
};
// Creates a compaction_strategy object from one of the strategies available.
compaction_strategy make_compaction_strategy(compaction_strategy_type strategy, const std::map<sstring, sstring>& options);
future<reshape_config> make_reshape_config(const sstables::storage& storage, reshape_mode mode);
}

View File

@@ -68,7 +68,7 @@ public:
virtual std::unique_ptr<compaction_backlog_tracker::impl> make_backlog_tracker() const = 0;
virtual uint64_t adjust_partition_estimate(const mutation_source_metadata& ms_meta, uint64_t partition_estimate, schema_ptr schema) const;
virtual uint64_t adjust_partition_estimate(const mutation_source_metadata& ms_meta, uint64_t partition_estimate) const;
virtual reader_consumer_v2 make_interposer_consumer(const mutation_source_metadata& ms_meta, reader_consumer_v2 end_consumer) const;
@@ -76,6 +76,6 @@ public:
return false;
}
virtual compaction_descriptor get_reshaping_job(std::vector<shared_sstable> input, schema_ptr schema, reshape_config cfg) const;
virtual compaction_descriptor get_reshaping_job(std::vector<shared_sstable> input, schema_ptr schema, reshape_mode mode) const;
};
}

View File

@@ -8,8 +8,6 @@
#pragma once
#include <cstdint>
namespace sstables {
enum class compaction_strategy_type {
@@ -20,10 +18,4 @@ enum class compaction_strategy_type {
};
enum class reshape_mode { strict, relaxed };
struct reshape_config {
reshape_mode mode;
const uint64_t free_storage_space;
};
}

View File

@@ -146,8 +146,7 @@ int64_t leveled_compaction_strategy::estimated_pending_compactions(table_state&
}
compaction_descriptor
leveled_compaction_strategy::get_reshaping_job(std::vector<shared_sstable> input, schema_ptr schema, reshape_config cfg) const {
auto mode = cfg.mode;
leveled_compaction_strategy::get_reshaping_job(std::vector<shared_sstable> input, schema_ptr schema, reshape_mode mode) const {
std::array<std::vector<shared_sstable>, leveled_manifest::MAX_LEVELS> level_info;
auto is_disjoint = [schema] (const std::vector<shared_sstable>& sstables, unsigned tolerance) -> std::tuple<bool, unsigned> {
@@ -204,7 +203,7 @@ leveled_compaction_strategy::get_reshaping_job(std::vector<shared_sstable> input
if (level_info[0].size() > offstrategy_threshold) {
size_tiered_compaction_strategy stcs(_stcs_options);
return stcs.get_reshaping_job(std::move(level_info[0]), schema, cfg);
return stcs.get_reshaping_job(std::move(level_info[0]), schema, mode);
}
for (unsigned level = leveled_manifest::MAX_LEVELS - 1; level > 0; --level) {

View File

@@ -74,7 +74,7 @@ public:
virtual std::unique_ptr<compaction_backlog_tracker::impl> make_backlog_tracker() const override;
virtual compaction_descriptor get_reshaping_job(std::vector<shared_sstable> input, schema_ptr schema, reshape_config cfg) const override;
virtual compaction_descriptor get_reshaping_job(std::vector<shared_sstable> input, schema_ptr schema, reshape_mode mode) const override;
};
}

View File

@@ -297,9 +297,8 @@ size_tiered_compaction_strategy::most_interesting_bucket(const std::vector<sstab
}
compaction_descriptor
size_tiered_compaction_strategy::get_reshaping_job(std::vector<shared_sstable> input, schema_ptr schema, reshape_config cfg) const
size_tiered_compaction_strategy::get_reshaping_job(std::vector<shared_sstable> input, schema_ptr schema, reshape_mode mode) const
{
auto mode = cfg.mode;
size_t offstrategy_threshold = std::max(schema->min_compaction_threshold(), 4);
size_t max_sstables = std::max(schema->max_compaction_threshold(), int(offstrategy_threshold));

View File

@@ -96,7 +96,7 @@ public:
virtual std::unique_ptr<compaction_backlog_tracker::impl> make_backlog_tracker() const override;
virtual compaction_descriptor get_reshaping_job(std::vector<shared_sstable> input, schema_ptr schema, reshape_config cfg) const override;
virtual compaction_descriptor get_reshaping_job(std::vector<shared_sstable> input, schema_ptr schema, reshape_mode mode) const override;
friend class ::size_tiered_backlog_tracker;
};

View File

@@ -48,7 +48,6 @@ public:
virtual sstables::shared_sstable make_sstable() const = 0;
virtual sstables::sstable_writer_config configure_writer(sstring origin) const = 0;
virtual api::timestamp_type min_memtable_timestamp() const = 0;
virtual bool memtable_has_key(const dht::decorated_key& key) const = 0;
virtual future<> on_compaction_completion(sstables::compaction_completion_desc desc, sstables::offstrategy offstrategy) = 0;
virtual bool is_auto_compaction_disabled_by_user() const noexcept = 0;
virtual bool tombstone_gc_enabled() const noexcept = 0;

View File

@@ -14,9 +14,6 @@
#include "sstables/sstables.hh"
#include "sstables/sstable_directory.hh"
#include "utils/pretty_printers.hh"
#include "db/config.hh"
using namespace std::chrono_literals;
namespace replica {
@@ -159,7 +156,7 @@ future<> reshard(sstables::sstable_directory& dir, sstables::sstable_directory::
// jobs. But only one will run in parallel at a time
auto& t = table.as_table_state();
co_await coroutine::parallel_for_each(buckets, [&] (std::vector<sstables::shared_sstable>& sstlist) mutable {
return table.get_compaction_manager().run_custom_job(table.as_table_state(), sstables::compaction_type::Reshard, "Reshard compaction", [&] (sstables::compaction_data& info) -> future<> {
return table.get_compaction_manager().run_custom_job(table.as_table_state(), sstables::compaction_type::Reshard, "Reshard compaction", [&] (sstables::compaction_data& info, sstables::compaction_progress_monitor& progress_monitor) -> future<> {
auto erm = table.get_effective_replication_map(); // keep alive around compaction.
sstables::compaction_descriptor desc(sstlist);
@@ -168,7 +165,7 @@ future<> reshard(sstables::sstable_directory& dir, sstables::sstable_directory::
desc.sharder = &erm->get_sharder(*table.schema());
desc.owned_ranges = owned_ranges_ptr;
auto result = co_await sstables::compact_sstables(std::move(desc), info, t);
auto result = co_await sstables::compact_sstables(std::move(desc), info, t, progress_monitor);
// input sstables are moved, to guarantee their resources are released once we're done
// resharding them.
co_await when_all_succeed(dir.collect_output_unshared_sstables(std::move(result.new_sstables), sstables::sstable_directory::can_be_remote::yes), dir.remove_sstables(std::move(sstlist))).discard_result();
@@ -257,129 +254,22 @@ future<> run_table_tasks(replica::database& db, std::vector<table_tasks_info> ta
}
}
struct keyspace_tasks_info {
tasks::task_manager::task_ptr task;
sstring keyspace;
std::vector<table_info> table_infos;
keyspace_tasks_info(tasks::task_manager::task_ptr t, sstring ks_name, std::vector<table_info> t_infos)
: task(t)
, keyspace(std::move(ks_name))
, table_infos(std::move(t_infos))
{}
};
future<> run_keyspace_tasks(replica::database& db, std::vector<keyspace_tasks_info> keyspace_tasks, seastar::condition_variable& cv, tasks::task_manager::task_ptr& current_task, bool sort) {
std::exception_ptr ex;
// While compaction is run on one table, the size of tables may significantly change.
// Thus, they are sorted before each invidual compaction and the smallest keyspace is chosen.
while (!keyspace_tasks.empty()) {
try {
if (sort) {
// Major compact smaller tables first, to increase chances of success if low on space.
// Tables will be kept in descending order.
std::ranges::sort(keyspace_tasks, std::greater<>(), [&] (const keyspace_tasks_info& kti) {
try {
return std::accumulate(kti.table_infos.begin(), kti.table_infos.end(), int64_t(0), [&] (int64_t sum, const table_info& t) {
try {
sum += db.find_column_family(t.id).get_stats().live_disk_space_used;
} catch (const replica::no_such_column_family&) {
// ignore
}
return sum;
});
} catch (const replica::no_such_keyspace&) {
return int64_t(-1);
}
});
}
// Task responsible for the smallest keyspace.
current_task = keyspace_tasks.back().task;
keyspace_tasks.pop_back();
cv.broadcast();
co_await current_task->done();
} catch (...) {
ex = std::current_exception();
current_task = nullptr;
cv.broken(ex);
break;
}
future<tasks::task_manager::task::progress> compaction_task_impl::get_progress(const sstables::compaction_data& cdata, const sstables::compaction_progress_monitor& progress_monitor) const {
if (cdata.compaction_size == 0) {
co_return get_binary_progress();
}
if (ex) {
// Wait for all tasks even on failure.
for (auto& kti: keyspace_tasks) {
co_await kti.task->done();
}
co_await coroutine::return_exception_ptr(std::move(ex));
}
}
sstring major_compaction_task_impl::to_string(flush_mode fm) {
switch (fm) {
case flush_mode::skip: return "skip";
case flush_mode::compacted_tables: return "compacted_tables";
case flush_mode::all_tables: return "all_tables";
}
__builtin_unreachable();
}
static future<bool> maybe_flush_all_tables(sharded<replica::database>& db) {
auto interval = db.local().get_config().compaction_flush_all_tables_before_major_seconds();
if (interval) {
auto when = db_clock::now() - interval * 1s;
if (co_await replica::database::get_all_tables_flushed_at(db) <= when) {
co_await db.invoke_on_all([&] (replica::database& db) -> future<> {
co_await db.flush_all_tables();
});
co_return true;
}
}
co_return false;
}
future<> global_major_compaction_task_impl::run() {
bool flushed_all_tables = false;
if (_flush_mode == flush_mode::all_tables) {
flushed_all_tables = co_await maybe_flush_all_tables(_db);
}
std::unordered_map<sstring, std::vector<table_info>> tables_by_keyspace;
auto tables_meta = _db.local().get_tables_metadata().get_column_families_copy();
for (const auto& [table_id, t] : tables_meta) {
const auto& ks_name = t->schema()->ks_name();
const auto& table_name = t->schema()->cf_name();
tables_by_keyspace[ks_name].emplace_back(table_name, table_id);
}
seastar::condition_variable cv;
tasks::task_manager::task_ptr current_task;
tasks::task_info parent_info{_status.id, _status.shard};
std::vector<keyspace_tasks_info> keyspace_tasks;
flush_mode fm = flushed_all_tables ? flush_mode::skip : _flush_mode;
for (auto& [ks, table_infos] : tables_by_keyspace) {
auto task = co_await _module->make_and_start_task<major_keyspace_compaction_task_impl>(parent_info, ks, parent_info.id, _db, table_infos, fm,
&cv, &current_task);
keyspace_tasks.emplace_back(std::move(task), ks, std::move(table_infos));
}
co_await run_keyspace_tasks(_db.local(), keyspace_tasks, cv, current_task, false);
co_return tasks::task_manager::task::progress{
.completed = is_done() ? cdata.compaction_size : progress_monitor.get_progress(), // Consider tasks which skip all files.
.total = cdata.compaction_size
};
}
future<> major_keyspace_compaction_task_impl::run() {
if (_cv) {
co_await wait_for_your_turn(*_cv, *_current_task, _status.id);
}
bool flushed_all_tables = false;
if (_flush_mode == flush_mode::all_tables) {
flushed_all_tables = co_await maybe_flush_all_tables(_db);
}
flush_mode fm = flushed_all_tables ? flush_mode::skip : _flush_mode;
co_await _db.invoke_on_all([&] (replica::database& db) -> future<> {
tasks::task_info parent_info{_status.id, _status.shard};
auto& module = db.get_compaction_manager().get_task_manager_module();
auto task = co_await module.make_and_start_task<shard_major_keyspace_compaction_task_impl>(parent_info, _status.keyspace, _status.id, db, _table_infos, fm);
auto task = co_await module.make_and_start_task<shard_major_keyspace_compaction_task_impl>(parent_info, _status.keyspace, _status.id, db, _table_infos);
co_await task->done();
});
}
@@ -390,7 +280,7 @@ future<> shard_major_keyspace_compaction_task_impl::run() {
tasks::task_info parent_info{_status.id, _status.shard};
std::vector<table_tasks_info> table_tasks;
for (auto& ti : _local_tables) {
table_tasks.emplace_back(co_await _module->make_and_start_task<table_major_keyspace_compaction_task_impl>(parent_info, _status.keyspace, ti.name, _status.id, _db, ti, cv, current_task, _flush_mode), ti);
table_tasks.emplace_back(co_await _module->make_and_start_task<table_major_keyspace_compaction_task_impl>(parent_info, _status.keyspace, ti.name, _status.id, _db, ti, cv, current_task), ti);
}
co_await run_table_tasks(_db, std::move(table_tasks), cv, current_task, true);
@@ -399,9 +289,8 @@ future<> shard_major_keyspace_compaction_task_impl::run() {
future<> table_major_keyspace_compaction_task_impl::run() {
co_await wait_for_your_turn(_cv, _current_task, _status.id);
tasks::task_info info{_status.id, _status.shard};
replica::table::do_flush do_flush(_flush_mode != flush_mode::skip);
co_await run_on_table("force_keyspace_compaction", _db, _status.keyspace, _ti, [info, do_flush] (replica::table& t) {
return t.compact_all_sstables(info, do_flush);
co_await run_on_table("force_keyspace_compaction", _db, _status.keyspace, _ti, [info] (replica::table& t) {
return t.compact_all_sstables(info);
});
}
@@ -555,13 +444,7 @@ future<> shard_reshaping_compaction_task_impl::run() {
| boost::adaptors::filtered([&filter = _filter] (const auto& sst) {
return filter(sst);
}));
if (reshape_candidates.empty()) {
break;
}
// all sstables were found in the same sstable_directory instance, so they share the same underlying storage.
auto& storage = reshape_candidates.front()->get_storage();
auto cfg = co_await sstables::make_reshape_config(storage, _mode);
auto desc = table.get_compaction_strategy().get_reshaping_job(std::move(reshape_candidates), table.schema(), cfg);
auto desc = table.get_compaction_strategy().get_reshaping_job(std::move(reshape_candidates), table.schema(), _mode);
if (desc.sstables.empty()) {
break;
}
@@ -580,8 +463,8 @@ future<> shard_reshaping_compaction_task_impl::run() {
std::exception_ptr ex;
try {
co_await table.get_compaction_manager().run_custom_job(table.as_table_state(), sstables::compaction_type::Reshape, "Reshape compaction", [&dir = _dir, &table, sstlist = std::move(sstlist), desc = std::move(desc)] (sstables::compaction_data& info) mutable -> future<> {
sstables::compaction_result result = co_await sstables::compact_sstables(std::move(desc), info, table.as_table_state());
co_await table.get_compaction_manager().run_custom_job(table.as_table_state(), sstables::compaction_type::Reshape, "Reshape compaction", [&dir = _dir, &table, sstlist = std::move(sstlist), desc = std::move(desc)] (sstables::compaction_data& info, sstables::compaction_progress_monitor& progress_monitor) mutable -> future<> {
sstables::compaction_result result = co_await sstables::compact_sstables(std::move(desc), info, table.as_table_state(), progress_monitor);
co_await dir.remove_unshared_sstables(std::move(sstlist));
co_await dir.collect_output_unshared_sstables(std::move(result.new_sstables), sstables::sstable_directory::can_be_remote::no);
}, info, throw_if_stopping::yes);

View File

@@ -8,8 +8,6 @@
#pragma once
#include <fmt/format.h>
#include "compaction/compaction.hh"
#include "replica/database_fwd.hh"
#include "schema/schema_fwd.hh"
@@ -43,16 +41,12 @@ public:
virtual std::string type() const override = 0;
protected:
virtual future<> run() override = 0;
future<tasks::task_manager::task::progress> get_progress(const sstables::compaction_data& cdata, const sstables::compaction_progress_monitor& progress_monitor) const;
};
class major_compaction_task_impl : public compaction_task_impl {
public:
enum class flush_mode {
skip, // Skip flushing. Useful when application explicitly flushes all tables prior to compaction
compacted_tables, // Flush only the compacted keyspace/tables
all_tables // Flush all tables in the database prior to compaction
};
major_compaction_task_impl(tasks::task_manager::module_ptr module,
tasks::task_id id,
unsigned sequence_number,
@@ -60,10 +54,8 @@ public:
std::string keyspace,
std::string table,
std::string entity,
tasks::task_id parent_id,
flush_mode fm = flush_mode::compacted_tables) noexcept
tasks::task_id parent_id) noexcept
: compaction_task_impl(module, id, sequence_number, std::move(scope), std::move(keyspace), std::move(table), std::move(entity), parent_id)
, _flush_mode(fm)
{
// FIXME: add progress units
}
@@ -71,54 +63,22 @@ public:
virtual std::string type() const override {
return "major compaction";
}
static sstring to_string(flush_mode);
protected:
flush_mode _flush_mode;
virtual future<> run() override = 0;
};
class global_major_compaction_task_impl : public major_compaction_task_impl {
private:
sharded<replica::database>& _db;
public:
global_major_compaction_task_impl(tasks::task_manager::module_ptr module,
sharded<replica::database>& db,
std::optional<flush_mode> fm = std::nullopt) noexcept
: major_compaction_task_impl(module, tasks::task_id::create_random_id(), module->new_sequence_number(), "global", "", "", "", tasks::task_id::create_null_id(),
fm.value_or(flush_mode::all_tables))
, _db(db)
{}
protected:
virtual future<> run() override;
};
class major_keyspace_compaction_task_impl : public major_compaction_task_impl {
private:
sharded<replica::database>& _db;
std::vector<table_info> _table_infos;
// _cvp and _current_task are engaged when the task is invoked from
// global_major_compaction_task_impl
seastar::condition_variable* _cv;
tasks::task_manager::task_ptr* _current_task;
public:
major_keyspace_compaction_task_impl(tasks::task_manager::module_ptr module,
std::string keyspace,
tasks::task_id parent_id,
sharded<replica::database>& db,
std::vector<table_info> table_infos,
std::optional<flush_mode> fm = std::nullopt,
seastar::condition_variable* cv = nullptr,
tasks::task_manager::task_ptr* current_task = nullptr) noexcept
: major_compaction_task_impl(module, tasks::task_id::create_random_id(),
parent_id ? 0 : module->new_sequence_number(),
"keyspace", std::move(keyspace), "", "", parent_id,
fm.value_or(flush_mode::all_tables))
std::vector<table_info> table_infos) noexcept
: major_compaction_task_impl(module, tasks::task_id::create_random_id(), module->new_sequence_number(), "keyspace", std::move(keyspace), "", "", tasks::task_id::create_null_id())
, _db(db)
, _table_infos(std::move(table_infos))
, _cv(cv)
, _current_task(current_task)
{}
protected:
virtual future<> run() override;
@@ -133,9 +93,8 @@ public:
std::string keyspace,
tasks::task_id parent_id,
replica::database& db,
std::vector<table_info> local_tables,
flush_mode fm) noexcept
: major_compaction_task_impl(module, tasks::task_id::create_random_id(), 0, "shard", std::move(keyspace), "", "", parent_id, fm)
std::vector<table_info> local_tables) noexcept
: major_compaction_task_impl(module, tasks::task_id::create_random_id(), 0, "shard", std::move(keyspace), "", "", parent_id)
, _db(db)
, _local_tables(std::move(local_tables))
{}
@@ -157,9 +116,8 @@ public:
replica::database& db,
table_info ti,
seastar::condition_variable& cv,
tasks::task_manager::task_ptr& current_task,
flush_mode fm) noexcept
: major_compaction_task_impl(module, tasks::task_id::create_random_id(), 0, "table", std::move(keyspace), std::move(table), "", parent_id, fm)
tasks::task_manager::task_ptr& current_task) noexcept
: major_compaction_task_impl(module, tasks::task_id::create_random_id(), 0, "table", std::move(keyspace), std::move(table), "", parent_id)
, _db(db)
, _ti(std::move(ti))
, _cv(cv)
@@ -704,21 +662,8 @@ public:
virtual std::string type() const override {
return "regular compaction";
}
virtual tasks::is_internal is_internal() const noexcept override {
return tasks::is_internal::yes;
}
protected:
virtual future<> run() override = 0;
};
} // namespace compaction
template <>
struct fmt::formatter<major_compaction_task_impl::flush_mode> {
constexpr auto parse(format_parse_context& ctx) { return ctx.begin(); }
template <typename FormatContext>
auto format(const major_compaction_task_impl::flush_mode& fm, FormatContext& ctx) const {
return fmt::format_to(ctx.out(), "{}", major_compaction_task_impl::to_string(fm));
}
};
}

View File

@@ -184,27 +184,16 @@ public:
};
};
uint64_t time_window_compaction_strategy::adjust_partition_estimate(const mutation_source_metadata& ms_meta, uint64_t partition_estimate, schema_ptr s) const {
// If not enough information, we assume the worst
auto estimated_window_count = max_data_segregation_window_count;
auto default_ttl = std::chrono::duration_cast<std::chrono::microseconds>(s->default_time_to_live());
bool min_and_max_ts_available = ms_meta.min_timestamp && ms_meta.max_timestamp;
auto estimate_window_count = [this] (timestamp_type min_window, timestamp_type max_window) {
const auto window_size = get_window_size(_options);
return (max_window + (window_size - 1) - min_window) / window_size;
};
if (!min_and_max_ts_available && default_ttl.count()) {
auto min_window = get_window_for(_options, timestamp_type(0));
auto max_window = get_window_for(_options, timestamp_type(default_ttl.count()));
estimated_window_count = estimate_window_count(min_window, max_window);
} else if (min_and_max_ts_available) {
auto min_window = get_window_for(_options, *ms_meta.min_timestamp);
auto max_window = get_window_for(_options, *ms_meta.max_timestamp);
estimated_window_count = estimate_window_count(min_window, max_window);
uint64_t time_window_compaction_strategy::adjust_partition_estimate(const mutation_source_metadata& ms_meta, uint64_t partition_estimate) const {
if (!ms_meta.min_timestamp || !ms_meta.max_timestamp) {
// Not enough information, we assume the worst
return partition_estimate / max_data_segregation_window_count;
}
const auto min_window = get_window_for(_options, *ms_meta.min_timestamp);
const auto max_window = get_window_for(_options, *ms_meta.max_timestamp);
const auto window_size = get_window_size(_options);
auto estimated_window_count = (max_window + (window_size - 1) - min_window) / window_size;
return partition_estimate / std::max(1UL, uint64_t(estimated_window_count));
}
@@ -223,14 +212,12 @@ reader_consumer_v2 time_window_compaction_strategy::make_interposer_consumer(con
}
compaction_descriptor
time_window_compaction_strategy::get_reshaping_job(std::vector<shared_sstable> input, schema_ptr schema, reshape_config cfg) const {
auto mode = cfg.mode;
time_window_compaction_strategy::get_reshaping_job(std::vector<shared_sstable> input, schema_ptr schema, reshape_mode mode) const {
std::vector<shared_sstable> single_window;
std::vector<shared_sstable> multi_window;
size_t offstrategy_threshold = std::max(schema->min_compaction_threshold(), 4);
size_t max_sstables = std::max(schema->max_compaction_threshold(), int(offstrategy_threshold));
const uint64_t target_job_size = cfg.free_storage_space * reshape_target_space_overhead;
if (mode == reshape_mode::relaxed) {
offstrategy_threshold = max_sstables;
@@ -262,40 +249,22 @@ time_window_compaction_strategy::get_reshaping_job(std::vector<shared_sstable> i
multi_window.size(), !multi_window.empty() && sstable_set_overlapping_count(schema, multi_window) == 0,
single_window.size(), !single_window.empty() && sstable_set_overlapping_count(schema, single_window) == 0);
auto get_job_size = [] (const std::vector<shared_sstable>& ssts) {
return boost::accumulate(ssts | boost::adaptors::transformed(std::mem_fn(&sstable::bytes_on_disk)), uint64_t(0));
};
// Targets a space overhead of 10%. All disjoint sstables can be compacted together as long as they won't
// cause an overhead above target. Otherwise, the job targets a maximum of #max_threshold sstables.
auto need_trimming = [&] (const std::vector<shared_sstable>& ssts, const uint64_t job_size, bool is_disjoint) {
const size_t min_sstables = 2;
auto is_above_target_size = job_size > target_job_size;
return (ssts.size() > max_sstables && !is_disjoint) ||
(ssts.size() > min_sstables && is_above_target_size);
};
auto maybe_trim_job = [&need_trimming] (std::vector<shared_sstable>& ssts, uint64_t job_size, bool is_disjoint) {
while (need_trimming(ssts, job_size, is_disjoint)) {
auto sst = ssts.back();
ssts.pop_back();
job_size -= sst->bytes_on_disk();
}
auto need_trimming = [max_sstables, schema, &is_disjoint] (const std::vector<shared_sstable>& ssts) {
// All sstables can be compacted at once if they're disjoint, given that partitioned set
// will incrementally open sstables which translates into bounded memory usage.
return ssts.size() > max_sstables && !is_disjoint(ssts);
};
if (!multi_window.empty()) {
auto disjoint = is_disjoint(multi_window);
auto job_size = get_job_size(multi_window);
// Everything that spans multiple windows will need reshaping
if (need_trimming(multi_window, job_size, disjoint)) {
if (need_trimming(multi_window)) {
// When trimming, let's keep sstables with overlapping time window, so as to reduce write amplification.
// For example, if there are N sstables spanning window W, where N <= 32, then we can produce all data for W
// in a single compaction round, removing the need to later compact W to reduce its number of files.
boost::partial_sort(multi_window, multi_window.begin() + max_sstables, [](const shared_sstable &a, const shared_sstable &b) {
return a->get_stats_metadata().max_timestamp < b->get_stats_metadata().max_timestamp;
});
maybe_trim_job(multi_window, job_size, disjoint);
multi_window.resize(max_sstables);
}
compaction_descriptor desc(std::move(multi_window));
desc.options = compaction_type_options::make_reshape();
@@ -314,17 +283,15 @@ time_window_compaction_strategy::get_reshaping_job(std::vector<shared_sstable> i
std::copy(ssts.begin(), ssts.end(), std::back_inserter(single_window));
continue;
}
// reuse STCS reshape logic which will only compact similar-sized files, to increase overall efficiency
// when reshaping time buckets containing a huge amount of files
auto desc = size_tiered_compaction_strategy(_stcs_options).get_reshaping_job(std::move(ssts), schema, cfg);
auto desc = size_tiered_compaction_strategy(_stcs_options).get_reshaping_job(std::move(ssts), schema, mode);
if (!desc.sstables.empty()) {
return desc;
}
}
}
if (!single_window.empty()) {
maybe_trim_job(single_window, get_job_size(single_window), all_disjoint);
compaction_descriptor desc(std::move(single_window));
desc.options = compaction_type_options::make_reshape();
return desc;

View File

@@ -78,7 +78,6 @@ public:
// To prevent an explosion in the number of sstables we cap it.
// Better co-locate some windows into the same sstables than OOM.
static constexpr uint64_t max_data_segregation_window_count = 100;
static constexpr float reshape_target_space_overhead = 0.1f;
using bucket_t = std::vector<shared_sstable>;
enum class bucket_compaction_mode { none, size_tiered, major };
@@ -163,7 +162,7 @@ public:
virtual std::unique_ptr<compaction_backlog_tracker::impl> make_backlog_tracker() const override;
virtual uint64_t adjust_partition_estimate(const mutation_source_metadata& ms_meta, uint64_t partition_estimate, schema_ptr s) const override;
virtual uint64_t adjust_partition_estimate(const mutation_source_metadata& ms_meta, uint64_t partition_estimate) const override;
virtual reader_consumer_v2 make_interposer_consumer(const mutation_source_metadata& ms_meta, reader_consumer_v2 end_consumer) const override;
@@ -171,7 +170,7 @@ public:
return true;
}
virtual compaction_descriptor get_reshaping_job(std::vector<shared_sstable> input, schema_ptr schema, reshape_config cfg) const override;
virtual compaction_descriptor get_reshaping_job(std::vector<shared_sstable> input, schema_ptr schema, reshape_mode mode) const override;
};
}

View File

@@ -572,7 +572,7 @@ murmur3_partitioner_ignore_msb_bits: 12
force_schema_commit_log: true
# Time for which task manager task is kept in memory after it completes.
# task_ttl_in_seconds: 0
task_ttl_in_seconds: 10
# Use Raft to consistently manage schema information in the cluster.
# Refer to https://docs.scylladb.com/master/architecture/raft.html for more details.

View File

@@ -27,16 +27,6 @@ tempfile.tempdir = f"{outdir}/tmp"
configure_args = str.join(' ', [shlex.quote(x) for x in sys.argv[1:] if not x.startswith('--out=')])
employ_ld_trickery = True
# distro-specific setup
def distro_setup_nix():
global employ_ld_trickery
employ_ld_trickery = False
if os.environ.get('NIX_CC'):
distro_setup_nix()
# distribution "internationalization", converting package names.
# Fedora name is key, values is distro -> package name dict.
i18n_xlat = {
@@ -852,7 +842,6 @@ scylla_core = (['message/messaging_service.cc',
'utils/rjson.cc',
'utils/human_readable.cc',
'utils/histogram_metrics_helper.cc',
'utils/on_internal_error.cc',
'utils/pretty_printers.cc',
'converting_mutation_partition_applier.cc',
'readers/combined.cc',
@@ -1127,7 +1116,6 @@ scylla_core = (['message/messaging_service.cc',
'utils/lister.cc',
'repair/repair.cc',
'repair/row_level.cc',
'repair/table_check.cc',
'exceptions/exceptions.cc',
'auth/allow_all_authenticator.cc',
'auth/allow_all_authorizer.cc',
@@ -1242,8 +1230,6 @@ api = ['api/api.cc',
Json2Code('api/api-doc/error_injection.json'),
'api/authorization_cache.cc',
Json2Code('api/api-doc/authorization_cache.json'),
'api/raft.cc',
Json2Code('api/api-doc/raft.json'),
]
alternator = [
@@ -1317,8 +1303,6 @@ idls = ['idl/gossip_digest.idl.hh',
'idl/utils.idl.hh',
]
headers = find_headers('.', excluded_dirs=['idl', 'build', 'seastar', '.git'])
scylla_tests_generic_dependencies = [
'test/lib/cql_test_env.cc',
'test/lib/test_services.cc',
@@ -1455,7 +1439,7 @@ deps['test/boost/bytes_ostream_test'] = [
"test/lib/log.cc",
]
deps['test/boost/input_stream_test'] = ['test/boost/input_stream_test.cc']
deps['test/boost/UUID_test'] = ['utils/UUID_gen.cc', 'test/boost/UUID_test.cc', 'utils/uuid.cc', 'utils/dynamic_bitset.cc', 'utils/hashers.cc', 'utils/on_internal_error.cc']
deps['test/boost/UUID_test'] = ['utils/UUID_gen.cc', 'test/boost/UUID_test.cc', 'utils/uuid.cc', 'utils/dynamic_bitset.cc', 'utils/hashers.cc']
deps['test/boost/murmur_hash_test'] = ['bytes.cc', 'utils/murmur_hash.cc', 'test/boost/murmur_hash_test.cc']
deps['test/boost/allocation_strategy_test'] = ['test/boost/allocation_strategy_test.cc', 'utils/logalloc.cc', 'utils/dynamic_bitset.cc']
deps['test/boost/log_heap_test'] = ['test/boost/log_heap_test.cc']
@@ -1516,26 +1500,29 @@ wasm_deps['wasm/test_UDA_final.wat'] = 'test/resource/wasm/c/test_UDA_final.c'
wasm_deps['wasm/test_UDA_scalar.wat'] = 'test/resource/wasm/c/test_UDA_scalar.c'
wasm_deps['wasm/test_word_double.wat'] = 'test/resource/wasm/c/test_word_double.c'
warnings = [
'-Wall',
'-Werror',
'-Wimplicit-fallthrough',
'-Wno-mismatched-tags', # clang-only
'-Wno-c++11-narrowing',
'-Wno-overloaded-virtual',
'-Wno-unused-command-line-argument',
'-Wno-unsupported-friend',
'-Wno-implicit-int-float-conversion',
# https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77728
'-Wno-psabi',
'-Wno-narrowing',
]
warnings = [w
for w in warnings
if flag_supported(flag=w, compiler=args.cxx)]
def get_warning_options(cxx):
warnings = [
'-Wall',
'-Werror',
'-Wimplicit-fallthrough',
'-Wno-mismatched-tags', # clang-only
'-Wno-c++11-narrowing',
'-Wno-overloaded-virtual',
'-Wno-unused-command-line-argument',
'-Wno-unsupported-friend',
'-Wno-implicit-int-float-conversion',
# https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77728
'-Wno-psabi',
'-Wno-narrowing',
]
warnings = [w
for w in warnings
if flag_supported(flag=w, compiler=cxx)]
return ' '.join(warnings + ['-Wno-error=deprecated-declarations'])
warnings = ' '.join(warnings + ['-Wno-error=deprecated-declarations'])
def get_clang_inline_threshold():
if args.clang_inline_threshold != -1:
@@ -1593,6 +1580,7 @@ pkgs = []
pkgs.append('lua53' if have_pkg('lua53') else 'lua')
pkgs.append('libsystemd')
pkgs.append('jsoncpp')
has_sanitize_address_use_after_scope = try_compile(compiler=args.cxx, flags=['-fsanitize-address-use-after-scope'], source='int f() {}')
@@ -1652,15 +1640,22 @@ for m, mode_config in modes.items():
# At the end of the build we check that the build-id is indeed in the
# first page. At install time we check that patchelf doesn't modify
# the program headers.
def dynamic_linker_option():
gcc_linker_output = subprocess.check_output(['gcc', '-###', '/dev/null', '-o', 't'], stderr=subprocess.STDOUT).decode('utf-8')
original_dynamic_linker = re.search('-dynamic-linker ([^ ]*)', gcc_linker_output).groups()[0]
gcc_linker_output = subprocess.check_output(['gcc', '-###', '/dev/null', '-o', 't'], stderr=subprocess.STDOUT).decode('utf-8')
original_dynamic_linker = re.search('-dynamic-linker ([^ ]*)', gcc_linker_output).groups()[0]
if employ_ld_trickery:
# gdb has a SO_NAME_MAX_PATH_SIZE of 512, so limit the path size to
# that. The 512 includes the null at the end, hence the 511 bellow.
dynamic_linker = '/' * (511 - len(original_dynamic_linker)) + original_dynamic_linker
else:
dynamic_linker = original_dynamic_linker
employ_ld_trickery = True
# distro-specific setup
if os.environ.get('NIX_CC'):
employ_ld_trickery = False
if employ_ld_trickery:
# gdb has a SO_NAME_MAX_PATH_SIZE of 512, so limit the path size to
# that. The 512 includes the null at the end, hence the 511 bellow.
dynamic_linker = '/' * (511 - len(original_dynamic_linker)) + original_dynamic_linker
else:
dynamic_linker = original_dynamic_linker
return f'--dynamic-linker={dynamic_linker}'
forced_ldflags = '-Wl,'
@@ -1670,7 +1665,7 @@ forced_ldflags = '-Wl,'
# explicitly ask for SHA1 build-ids.
forced_ldflags += '--build-id=sha1,'
forced_ldflags += f'--dynamic-linker={dynamic_linker}'
forced_ldflags += dynamic_linker_option()
user_ldflags = forced_ldflags + ' ' + args.user_ldflags
@@ -1748,10 +1743,7 @@ if not args.dist_only:
for mode, mode_config in build_modes.items():
configure_seastar(outdir, mode, mode_config)
pc = {mode: f'{outdir}/{mode}/seastar/seastar.pc' for mode in build_modes}
def query_seastar_flags(pc_file, link_static_cxx=False):
use_shared_libs = modes[mode]['build_seastar_shared_libs']
def query_seastar_flags(pc_file, use_shared_libs, link_static_cxx=False):
if use_shared_libs:
opt = '--shared'
else:
@@ -1764,13 +1756,11 @@ def query_seastar_flags(pc_file, link_static_cxx=False):
if link_static_cxx:
libs = libs.replace('-lstdc++ ', '')
return cflags, libs
testing_libs = pkg_config(pc_file.replace('seastar.pc', 'seastar-testing.pc'), '--libs', '--static')
return {'seastar_cflags': cflags,
'seastar_libs': libs,
'seastar_testing_libs': testing_libs}
for mode in build_modes:
seastar_pc_cflags, seastar_pc_libs = query_seastar_flags(pc[mode], link_static_cxx=args.staticcxx)
modes[mode]['seastar_cflags'] = seastar_pc_cflags
modes[mode]['seastar_libs'] = seastar_pc_libs
modes[mode]['seastar_testing_libs'] = pkg_config(pc[mode].replace('seastar.pc', 'seastar-testing.pc'), '--libs', '--static')
abseil_pkgs = [
'absl_raw_hash_set',
@@ -1779,8 +1769,7 @@ abseil_pkgs = [
pkgs += abseil_pkgs
user_cflags += " " + pkg_config('jsoncpp', '--cflags')
libs = ' '.join([maybe_static(args.staticyamlcpp, '-lyaml-cpp'), '-latomic', '-llz4', '-lz', '-lsnappy', pkg_config('jsoncpp', '--libs'),
libs = ' '.join([maybe_static(args.staticyamlcpp, '-lyaml-cpp'), '-latomic', '-llz4', '-lz', '-lsnappy',
' -lstdc++fs', ' -lcrypt', ' -lcryptopp', ' -lpthread',
# Must link with static version of libzstd, since
# experimental APIs that we use are only present there.
@@ -1817,6 +1806,7 @@ def write_build_file(f,
scylla_version,
scylla_release,
args):
warnings = get_warning_options(args.cxx)
f.write(textwrap.dedent('''\
configure_args = {configure_args}
builddir = {outdir}
@@ -1911,8 +1901,13 @@ def write_build_file(f,
else:
f.write(f'build $builddir/{wasm}: c2wasm {src}\n')
f.write(f'build $builddir/{binary}: wasm2wat $builddir/{wasm}\n')
for mode in build_modes:
modeval = modes[mode]
modeval.update(query_seastar_flags(f'{outdir}/{mode}/seastar/seastar.pc',
modeval['build_seastar_shared_libs'],
args.staticcxx))
fmt_lib = 'fmt'
f.write(textwrap.dedent('''\
cxx_ld_flags_{mode} = {cxx_ld_flags}
@@ -1973,7 +1968,7 @@ def write_build_file(f,
command = CARGO_BUILD_DEP_INFO_BASEDIR='.' cargo build --locked --manifest-path=rust/Cargo.toml --target-dir=$builddir/{mode} --profile=rust-{mode} $
&& touch $out
description = RUST_LIB $out
''').format(mode=mode, antlr3_exec=args.antlr3_exec, fmt_lib=fmt_lib, test_repeat=test_repeat, test_timeout=test_timeout, **modeval))
''').format(mode=mode, antlr3_exec=args.antlr3_exec, fmt_lib=fmt_lib, test_repeat=args.test_repeat, test_timeout=args.test_timeout, **modeval))
f.write(
'build {mode}-build: phony {artifacts} {wasms}\n'.format(
mode=mode,
@@ -2074,6 +2069,8 @@ def write_build_file(f,
objs=' '.join(compiles)
)
)
headers = find_headers('.', excluded_dirs=['idl', 'build', 'seastar', '.git'])
f.write(
'build {mode}-headers: phony {header_objs}\n'.format(
mode=mode,

View File

@@ -7,7 +7,35 @@ generate_cql_grammar(
SOURCES cql_grammar_srcs)
set_source_files_properties(${cql_grammar_srcs}
PROPERTIES
COMPILE_FLAGS "-Wno-uninitialized -Wno-parentheses-equality")
COMPILE_OPTIONS "-Wno-uninitialized;-Wno-parentheses-equality")
set(cql_parser_srcs ${cql_grammar_srcs})
list(FILTER cql_parser_srcs INCLUDE REGEX "Parser.cpp$")
set(unoptimized_levels "0" "g" "s")
if(Seastar_OptimizationLevel_${build_mode} IN_LIST unoptimized_levels)
# Unoptimized parsers end up using huge amounts of stack space and
# overflowing their stack
list(APPEND cql_parser_compile_options
"-O1")
endif()
include(CheckCXXCompilerFlag)
check_cxx_compiler_flag("-fsanitize-address-use-after-scope"
_sanitize_address_use_after_scope_supported)
if(_sanitize_address_use_after_scope_supported)
# use-after-scope sanitizer also uses large amount of stack space
# and overflows the stack of CqlParser
list(APPEND cql_parser_compile_options
"-fno-sanitize-address-use-after-scope")
endif()
if(DEFINED cql_parser_compile_options)
set_property(
SOURCE ${cql_parser_srcs}
APPEND
PROPERTY COMPILE_OPTIONS ${cql_parser_compile_options})
endif()
add_library(cql3 STATIC)
target_sources(cql3

View File

@@ -197,6 +197,7 @@ concept LeafExpression
/// A column, usually encountered on the left side of a restriction.
/// An expression like `mycol < 5` would be expressed as a binary_operator
/// with column_value on the left hand side.
/// The column_definition* points inside the schema_ptr used during preparation.
struct column_value {
const column_definition* col;
@@ -207,8 +208,8 @@ struct column_value {
/// A subscripted value, eg list_colum[2], val[sub]
struct subscript {
expression val;
expression sub;
expression val; // The value that is being subscripted
expression sub; // The value between the square braces
data_type type; // may be null before prepare
friend bool operator==(const subscript&, const subscript&) = default;
@@ -235,7 +236,8 @@ enum class null_handling_style {
lwt_nulls, // evaluate(NULL = NULL) -> TRUE, evaluate(NULL < x) -> exception
};
/// Operator restriction: LHS op RHS.
// An operation on two items (left hand side and right hand side).
// For example: "col = 2", "(col1, col2) = (?, 3)"
struct binary_operator {
expression lhs;
oper_t op;
@@ -248,14 +250,18 @@ struct binary_operator {
friend bool operator==(const binary_operator&, const binary_operator&) = default;
};
/// A conjunction of restrictions.
// A conjunction of expressions separated by the AND keyword.
// For example: "a < 3 AND col1 = ? AND pk IN (1, 2)"
struct conjunction {
std::vector<expression> children;
friend bool operator==(const conjunction&, const conjunction&) = default;
};
// Gets resolved eventually into a column_value.
// A string that represents a column name.
// It's not validated in any way, it's just a name that someone wrote.
// During preparation it's resolved and converted into a validated column_value.
// For example "my_col", "pk1"
struct unresolved_identifier {
::shared_ptr<column_identifier_raw> ident;
@@ -265,6 +271,7 @@ struct unresolved_identifier {
};
// An attribute attached to a column mutation: writetime or ttl
// For example: "WRITETIME(my_col)", "TTL(some_col)"
struct column_mutation_attribute {
enum class attribute_kind { writetime, ttl };
@@ -276,7 +283,11 @@ struct column_mutation_attribute {
friend bool operator==(const column_mutation_attribute&, const column_mutation_attribute&) = default;
};
// Function call.
// For example: "some_func(123, 456)", "token(col1, col2)"
struct function_call {
// Before preparation "func" is a function_name.
// During preparation it's converted into db::functions::function
std::variant<functions::function_name, shared_ptr<db::functions::function>> func;
std::vector<expression> args;
@@ -312,8 +323,14 @@ struct function_call {
friend bool operator==(const function_call&, const function_call&) = default;
};
// Represents casting an expression to a given type.
// There are two types of casts - C style and SQL style.
// For example: "(text)ascii_column", "CAST(int_column as blob)"
struct cast {
enum class cast_style { c, sql };
enum class cast_style {
c, // (int)arg
sql // CAST(arg as int)
};
cast_style style;
expression arg;
std::variant<data_type, shared_ptr<cql3_type::raw>> type;
@@ -321,6 +338,8 @@ struct cast {
friend bool operator==(const cast&, const cast&) = default;
};
// Represents accessing a field inside a struct (user defined type).
// For example: "udt_val.udt_field"
struct field_selection {
expression structure;
shared_ptr<column_identifier_raw> field;
@@ -330,7 +349,12 @@ struct field_selection {
friend bool operator==(const field_selection&, const field_selection&) = default;
};
// Represents a bind marker, both named and unnamed.
// For example: "?", ":myvar"
// It contains only the index, for named bind markers the names are kept inside query_options.
struct bind_variable {
// Index of this bind marker inside the query string.
// Consecutive bind markers are numbered 0, 1, 2, 3, ...
int32_t bind_index;
// Describes where this bound value will be assigned.
@@ -342,6 +366,8 @@ struct bind_variable {
// A constant which does not yet have a date type. It is partially typed
// (we know if it's floating or int) but not sized.
// For example: "123", "1.341", "null"
// During preparation it's assigned an exact type and converted into expr::constant.
struct untyped_constant {
enum type_class { integer, floating_point, string, boolean, duration, uuid, hex, null };
type_class partial_type;
@@ -354,7 +380,9 @@ untyped_constant make_untyped_null();
// Represents a constant value with known value and type
// For null and unset the type can sometimes be set to empty_type
// For example: "123", "abcddef", "[1, 2, 3, 4, 5]"
struct constant {
// The CQL value, serialized to binary representation.
cql3::raw_value value;
// Never nullptr, for NULL and UNSET might be empty_type
@@ -374,7 +402,9 @@ struct constant {
friend bool operator==(const constant&, const constant&) = default;
};
// Denotes construction of a tuple from its elements, e.g. ('a', ?, some_column) in CQL.
// Denotes construction of a tuple from its elements.
// For example: "('a', ?, some_column)"
// During preparation tuple constructors with constant values are converted to expr::constant.
struct tuple_constructor {
std::vector<expression> elements;
@@ -386,9 +416,13 @@ struct tuple_constructor {
};
// Constructs a collection of same-typed elements
// For example: "[1, 2, ?]", "{5, 6, 7}", {1: 2, 3: 4}"
// During preparation collection constructors with constant values are converted to expr::constant.
struct collection_constructor {
enum class style_type { list, set, map };
style_type style;
// For map constructors, elements is a list of key-pair tuples.
std::vector<expression> elements;
// Might be nullptr before prepare.
@@ -399,6 +433,8 @@ struct collection_constructor {
};
// Constructs an object of a user-defined type
// For example: "{field1: 23343, field2: ?}"
// During preparation usertype constructors with constant values are converted to expr::constant.
struct usertype_constructor {
using elements_map_type = std::unordered_map<column_identifier, expression>;
elements_map_type elements;

View File

@@ -338,9 +338,6 @@ functions::get(data_dictionary::database db,
if (!receiver_cf.has_value()) {
throw exceptions::invalid_request_exception("functions::get for token doesn't have a known column family");
}
if (schema == nullptr) {
throw exceptions::invalid_request_exception(format("functions::get for token cannot find {} table", *receiver_cf));
}
auto fun = ::make_shared<token_fct>(schema);
validate_types(db, keyspace, schema.get(), fun, provided_args, receiver_ks, receiver_cf);
return fun;

View File

@@ -815,7 +815,7 @@ bool query_processor::has_more_results(cql3::internal_query_state& state) const
future<> query_processor::for_each_cql_result(
cql3::internal_query_state& state,
noncopyable_function<future<stop_iteration>(const cql3::untyped_result_set::row&)> f) {
noncopyable_function<future<stop_iteration>(const cql3::untyped_result_set::row&)>&& f) {
do {
auto msg = co_await execute_paged_internal(state);
for (auto& row : *msg) {
@@ -1065,9 +1065,6 @@ void query_processor::migration_subscriber::on_update_aggregate(const sstring& k
void query_processor::migration_subscriber::on_update_view(
const sstring& ks_name,
const sstring& view_name, bool columns_changed) {
// scylladb/scylladb#16392 - Materialized views are also tables so we need at least handle
// them as such when changed.
on_update_column_family(ks_name, view_name, columns_changed);
}
void query_processor::migration_subscriber::on_update_tablet_metadata() {
@@ -1116,14 +1113,14 @@ future<> query_processor::query_internal(
db::consistency_level cl,
const std::initializer_list<data_value>& values,
int32_t page_size,
noncopyable_function<future<stop_iteration>(const cql3::untyped_result_set_row&)> f) {
noncopyable_function<future<stop_iteration>(const cql3::untyped_result_set_row&)>&& f) {
auto query_state = create_paged_state(query_string, cl, values, page_size);
co_return co_await for_each_cql_result(query_state, std::move(f));
}
future<> query_processor::query_internal(
const sstring& query_string,
noncopyable_function<future<stop_iteration>(const cql3::untyped_result_set_row&)> f) {
noncopyable_function<future<stop_iteration>(const cql3::untyped_result_set_row&)>&& f) {
return query_internal(query_string, db::consistency_level::ONE, {}, 1000, std::move(f));
}

View File

@@ -307,7 +307,7 @@ public:
db::consistency_level cl,
const std::initializer_list<data_value>& values,
int32_t page_size,
noncopyable_function<future<stop_iteration>(const cql3::untyped_result_set_row&)> f);
noncopyable_function<future<stop_iteration>(const cql3::untyped_result_set_row&)>&& f);
/*
* \brief iterate over all cql results using paging
@@ -322,7 +322,7 @@ public:
*/
future<> query_internal(
const sstring& query_string,
noncopyable_function<future<stop_iteration>(const cql3::untyped_result_set_row&)> f);
noncopyable_function<future<stop_iteration>(const cql3::untyped_result_set_row&)>&& f);
class cache_internal_tag;
using cache_internal = bool_class<cache_internal_tag>;
@@ -479,7 +479,7 @@ private:
*/
future<> for_each_cql_result(
cql3::internal_query_state& state,
noncopyable_function<future<stop_iteration>(const cql3::untyped_result_set_row&)> f);
noncopyable_function<future<stop_iteration>(const cql3::untyped_result_set_row&)>&& f);
/*!
* \brief check, based on the state if there are additional results

View File

@@ -541,32 +541,22 @@ std::pair<std::optional<secondary_index::index>, expr::expression> statement_res
int chosen_index_score = 0;
expr::expression chosen_index_restrictions = expr::conjunction({});
// Several indexes may be usable for this query. When their score is tied,
// let's pick one by order of the columns mentioned in the restriction
// expression. This specific order isn't important (and maybe in the
// future we could plan a better order based on the specificity of each
// index), but it is critical that two coordinators - or the same
// coordinator over time - must choose the same index for the same query.
// Otherwise, paging can break (see issue #7969).
for (const expr::expression& restriction : index_restrictions()) {
if (has_partition_token(restriction, *_schema) || contains_multi_column_restriction(restriction)) {
continue;
}
expr::for_each_expression<expr::column_value>(restriction, [&](const expr::column_value& cval) {
auto& cdef = cval.col;
expr::expression col_restrictions = expr::conjunction {
.children = expr::extract_single_column_restrictions_for_column(restriction, *cdef)
};
for (const auto& index : sim.list_indexes()) {
if (cdef->name_as_text() == index.target_column() &&
expr::is_supported_by(col_restrictions, index) &&
score(index) > chosen_index_score) {
chosen_index = index;
chosen_index_score = score(index);
chosen_index_restrictions = restriction;
}
for (const auto& index : sim.list_indexes()) {
auto cdef = _schema->get_column_definition(to_bytes(index.target_column()));
for (const expr::expression& restriction : index_restrictions()) {
if (has_partition_token(restriction, *_schema) || contains_multi_column_restriction(restriction)) {
continue;
}
});
expr::single_column_restrictions_map rmap = expr::get_single_column_restrictions_map(restriction);
const auto found = rmap.find(cdef);
if (found != rmap.end() && is_supported_by(found->second, index)
&& score(index) > chosen_index_score) {
chosen_index = index;
chosen_index_score = score(index);
chosen_index_restrictions = restriction;
}
}
}
return {chosen_index, chosen_index_restrictions};
}
@@ -1142,14 +1132,13 @@ bool starts_before_start(
const auto len1 = r1.start()->value().representation().size();
const auto len2 = r2.start()->value().representation().size();
if (len1 == len2) { // The values truly are equal.
// (a)>=(1) starts before (a)>(1)
return r1.start()->is_inclusive() && !r2.start()->is_inclusive();
} else if (len1 < len2) { // r1 start is a prefix of r2 start.
// (a)>=(1) starts before (a,b)>=(1,1), but (a)>(1) doesn't.
return r1.start()->is_inclusive();
} else { // r2 start is a prefix of r1 start.
// (a,b)>=(1,1) starts before (a)>(1) but after (a)>=(1).
return !r2.start()->is_inclusive();
return r2.start()->is_inclusive();
}
}
@@ -1174,7 +1163,6 @@ bool starts_before_or_at_end(
const auto len1 = r1.start()->value().representation().size();
const auto len2 = r2.end()->value().representation().size();
if (len1 == len2) { // The values truly are equal.
// (a)>=(1) starts at end of (a)<=(1)
return r1.start()->is_inclusive() && r2.end()->is_inclusive();
} else if (len1 < len2) { // r1 start is a prefix of r2 end.
// a>=(1) starts before (a,b)<=(1,1) ends, but (a)>(1) doesn't.
@@ -1206,7 +1194,6 @@ bool ends_before_end(
const auto len1 = r1.end()->value().representation().size();
const auto len2 = r2.end()->value().representation().size();
if (len1 == len2) { // The values truly are equal.
// (a)<(1) ends before (a)<=(1) ends
return !r1.end()->is_inclusive() && r2.end()->is_inclusive();
} else if (len1 < len2) { // r1 end is a prefix of r2 end.
// (a)<(1) ends before (a,b)<=(1,1), but (a)<=(1) doesn't.
@@ -1222,10 +1209,7 @@ std::optional<query::clustering_range> intersection(
const query::clustering_range& r1,
const query::clustering_range& r2,
const clustering_key_prefix::prefix_equal_tri_compare& cmp) {
// If needed, swap r1 and r2 so that r1's start is to the left of r2's
// start. Note that to avoid infinite recursion (#18688) the function
// starts_before_start() must never return true for both (r1,r2) and
// (r2,r1) - in other words, it must be a *strict* partial order.
// Assume r1's start is to the left of r2's start.
if (starts_before_start(r2, r1, cmp)) {
return intersection(r2, r1, cmp);
}

View File

@@ -433,17 +433,12 @@ protected:
}
};
// Return a list of columns that "SELECT *" should show - these are all
// columns except potentially some that are is_hidden_from_cql() (currently,
// those can be the "virtual columns" used in materialized views).
// The list points to column_definition objects in the given schema_ptr,
// which can be used only as long as the caller keeps the schema_ptr alive.
std::vector<const column_definition*> selection::wildcard_columns(schema_ptr schema) {
::shared_ptr<selection> selection::wildcard(schema_ptr schema) {
auto columns = schema->all_columns_in_select_order();
// filter out hidden columns, which should not be seen by the
// user when doing "SELECT *". We also disallow selecting them
// individually (see column_identifier::new_selector_factory()).
return boost::copy_range<std::vector<const column_definition*>>(
auto cds = boost::copy_range<std::vector<const column_definition*>>(
columns |
boost::adaptors::filtered([](const column_definition& c) {
return !c.is_hidden_from_cql();
@@ -451,10 +446,7 @@ std::vector<const column_definition*> selection::wildcard_columns(schema_ptr sch
boost::adaptors::transformed([](const column_definition& c) {
return &c;
}));
}
::shared_ptr<selection> selection::wildcard(schema_ptr schema) {
return simple_selection::make(schema, wildcard_columns(schema), true);
return simple_selection::make(schema, std::move(cds), true);
}
::shared_ptr<selection> selection::for_columns(schema_ptr schema, std::vector<const column_definition*> columns) {

View File

@@ -118,7 +118,6 @@ public:
}
static ::shared_ptr<selection> wildcard(schema_ptr schema);
static std::vector<const column_definition*> wildcard_columns(schema_ptr schema);
static ::shared_ptr<selection> for_columns(schema_ptr schema, std::vector<const column_definition*> columns);
// Adds a column to the selection and result set. Returns an index within the result set row.

View File

@@ -135,18 +135,6 @@ user_type alter_type_statement::add_or_alter::do_add(data_dictionary::database d
throw exceptions::invalid_request_exception(format("Cannot add new field to type {}: maximum number of fields reached", _name));
}
if (_field_type->is_duration()) {
auto&& ks = db.find_keyspace(keyspace());
for (auto&& schema : ks.metadata()->cf_meta_data() | boost::adaptors::map_values) {
for (auto&& column : schema->clustering_key_columns()) {
if (column.type->references_user_type(_name.get_keyspace(), _name.get_user_type_name())) {
throw exceptions::invalid_request_exception(format("Cannot add new field to type {} because it is used in the clustering key column {} of table {}.{} where durations are not allowed",
_name.to_cql_string(), column.name_as_text(), schema->ks_name(), schema->cf_name()));
}
}
}
}
std::vector<bytes> new_names(to_update->field_names());
new_names.push_back(_field_name->name());
std::vector<data_type> new_types(to_update->field_types());

View File

@@ -226,8 +226,7 @@ future<> select_statement::check_access(query_processor& qp, const service::clie
}
if (!_selection->is_trivial()) {
std::vector<::shared_ptr<functions::function>> used_functions = _selection->used_functions();
auto not_native = [] (::shared_ptr<functions::function> func) { return !func->is_native(); };
for (const auto& used_function : used_functions | std::ranges::views::filter(not_native)) {
for (const auto& used_function : used_functions) {
sstring encoded_signature = auth::encode_signature(used_function->name().name, used_function->arg_types());
co_await state.has_function_access(used_function->name().keyspace, encoded_signature, auth::permission::EXECUTE);
}
@@ -1661,7 +1660,7 @@ schema_ptr mutation_fragments_select_statement::generate_output_schema(schema_pt
future<exceptions::coordinator_result<service::storage_proxy_coordinator_query_result>>
mutation_fragments_select_statement::do_query(
locator::host_id this_node,
const locator::node* this_node,
service::storage_proxy& sp,
schema_ptr schema,
lw_shared_ptr<query::read_command> cmd,
@@ -1671,7 +1670,7 @@ mutation_fragments_select_statement::do_query(
auto res = co_await replica::mutation_dump::dump_mutations(sp.get_db(), schema, _underlying_schema, partition_ranges, *cmd, optional_params.timeout(sp));
service::replicas_per_token_range last_replicas;
if (this_node) {
last_replicas.emplace(dht::token_range::make_open_ended_both_sides(), std::vector<locator::host_id>{this_node});
last_replicas.emplace(dht::token_range::make_open_ended_both_sides(), std::vector<locator::host_id>{this_node->host_id()});
}
co_return service::storage_proxy_coordinator_query_result{std::move(res), std::move(last_replicas), {}};
}
@@ -1732,17 +1731,12 @@ mutation_fragments_select_statement::do_execute(query_processor& qp, service::qu
auto timeout_duration = get_timeout(state.get_client_state(), options);
auto timeout = db::timeout_clock::now() + timeout_duration;
auto& tbl = qp.proxy().local_db().find_column_family(_underlying_schema);
// Since this query doesn't go through storage-proxy, we have to take care of pinning erm here.
auto erm_keepalive = tbl.get_effective_replication_map();
if (!aggregate && !_restrictions_need_filtering && (page_size <= 0
|| !service::pager::query_pagers::may_need_paging(*_schema, page_size,
*command, key_ranges))) {
return do_query({}, qp.proxy(), _schema, command, std::move(key_ranges), cl,
{timeout, state.get_permit(), state.get_client_state(), state.get_trace_state(), {}, {}})
.then(wrap_result_to_error_message([&, this, erm_keepalive] (service::storage_proxy_coordinator_query_result&& qr) {
.then(wrap_result_to_error_message([&, this] (service::storage_proxy_coordinator_query_result&& qr) {
cql3::selection::result_set_builder builder(*_selection, now);
query::result_view::consume(*qr.query_result, std::move(slice),
cql3::selection::result_set_builder::visitor(builder, *_schema, *_selection));
@@ -1751,14 +1745,16 @@ mutation_fragments_select_statement::do_execute(query_processor& qp, service::qu
}));
}
locator::host_id this_node;
const locator::node* this_node = nullptr;
{
auto& topo = erm_keepalive->get_topology();
this_node = topo.this_node()->host_id();
auto& tbl = qp.proxy().local_db().find_column_family(_underlying_schema);
auto& erm = tbl.get_effective_replication_map();
auto& topo = erm->get_topology();
this_node = topo.this_node();
auto state = options.get_paging_state();
if (state && !state->get_last_replicas().empty()) {
auto last_host = state->get_last_replicas().begin()->second.front();
if (last_host != this_node) {
if (last_host != this_node->host_id()) {
const auto last_node = topo.find_node(last_host);
throw exceptions::invalid_request_exception(format(
"Moving between coordinators is not allowed in SELECT FROM MUTATION_FRAGMENTS() statements, last page's coordinator was {}{}",
@@ -1778,10 +1774,7 @@ mutation_fragments_select_statement::do_execute(query_processor& qp, service::qu
command,
std::move(key_ranges),
_restrictions_need_filtering ? _restrictions : nullptr,
[this, erm_keepalive, this_node] (service::storage_proxy& sp, schema_ptr schema, lw_shared_ptr<query::read_command> cmd, dht::partition_range_vector partition_ranges,
db::consistency_level cl, service::storage_proxy_coordinator_query_options optional_params) {
return do_query(this_node, sp, std::move(schema), std::move(cmd), std::move(partition_ranges), cl, std::move(optional_params));
});
std::bind_front(&mutation_fragments_select_statement::do_query, this, this_node));
if (_selection->is_trivial() && !_restrictions_need_filtering && !_per_partition_limit) {
return p->fetch_page_generator_result(page_size, now, timeout, _stats).then(wrap_result_to_error_message([this, p = std::move(p)] (result_generator&& generator) {
@@ -1908,21 +1901,6 @@ std::unique_ptr<prepared_statement> select_statement::prepare(data_dictionary::d
// Force aggregation if GROUP BY is used. This will wrap every column x as first(x).
if (!_group_by_columns.empty()) {
aggregation_depth = std::max(aggregation_depth, 1u);
if (prepared_selectors.empty()) {
// We have a "SELECT * GROUP BY". If we leave prepared_selectors
// empty, below we choose selection::wildcard() for SELECT *, and
// forget to do the "levellize" trick needed for the GROUP BY.
// So we need to set prepared_selectors. See #16531.
auto all_columns = selection::selection::wildcard_columns(schema);
std::vector<::shared_ptr<selection::raw_selector>> select_all;
select_all.reserve(all_columns.size());
for (const column_definition *cdef : all_columns) {
auto name = ::make_shared<cql3::column_identifier::raw>(cdef->name_as_text(), true);
select_all.push_back(::make_shared<selection::raw_selector>(
expr::unresolved_identifier(std::move(name)), nullptr));
}
prepared_selectors = selection::raw_selector::to_prepared_selectors(select_all, *schema, db, keyspace());
}
}
for (auto& ps : prepared_selectors) {
@@ -2004,10 +1982,7 @@ std::unique_ptr<prepared_statement> select_statement::prepare(data_dictionary::d
)
&& !restrictions->need_filtering() // No filtering
&& group_by_cell_indices->empty() // No GROUP BY
&& db.get_config().enable_parallelized_aggregation()
&& !( // Do not parallelize the request if it's single partition read
restrictions->partition_key_restrictions_is_all_eq()
&& restrictions->partition_key_restrictions_size() == schema->partition_key_size());
&& db.get_config().enable_parallelized_aggregation();
};
if (_parameters->is_prune_materialized_view()) {

View File

@@ -19,7 +19,10 @@
#include "index/secondary_index_manager.hh"
#include "exceptions/exceptions.hh"
#include "exceptions/coordinator_result.hh"
#include "locator/host_id.hh"
namespace locator {
class node;
} // namespace locator
namespace service {
class client_state;
@@ -338,7 +341,7 @@ public:
private:
future<exceptions::coordinator_result<service::storage_proxy_coordinator_query_result>>
do_query(
locator::host_id this_node,
const locator::node* this_node,
service::storage_proxy& sp,
schema_ptr schema,
lw_shared_ptr<query::read_command> cmd,

View File

@@ -56,11 +56,7 @@ future<> use_statement::check_access(query_processor& qp, const service::client_
future<::shared_ptr<cql_transport::messages::result_message>>
use_statement::execute(query_processor& qp, service::query_state& state, const query_options& options, std::optional<service::group0_guard> guard) const {
try {
state.get_client_state().set_keyspace(qp.db().real_database(), _keyspace);
} catch(...) {
return make_exception_future<::shared_ptr<cql_transport::messages::result_message>>(std::current_exception());
}
state.get_client_state().set_keyspace(qp.db().real_database(), _keyspace);
auto result =::make_shared<cql_transport::messages::result_message::set_keyspace>(_keyspace);
return make_ready_future<::shared_ptr<cql_transport::messages::result_message>>(result);
}

View File

@@ -151,15 +151,13 @@ static bytes from_json_object_aux(const map_type_impl& t, const rjson::value& va
std::map<bytes, bytes, serialized_compare> raw_map(t.get_keys_type()->as_less_comparator());
for (auto it = value.MemberBegin(); it != value.MemberEnd(); ++it) {
bytes value = from_json_object(*t.get_values_type(), it->value);
// For all native (non-collection, non-tuple) key types, they are
// represented as a string in JSON. For more elaborate types, they
// can also be a string representation of another JSON type, which
// needs to be reparsed as JSON. For example,
// map<frozen<list<int>>, int> will be represented as:
// { "[1, 3, 6]": 3, "[]": 0, "[1, 2]": 2 }
if (t.get_keys_type()->underlying_type()->is_native()) {
if (t.get_keys_type()->underlying_type() == ascii_type ||
t.get_keys_type()->underlying_type() == utf8_type) {
raw_map.emplace(from_json_object(*t.get_keys_type(), it->name), std::move(value));
} else {
// Keys in maps can only be strings in JSON, but they can also be a string representation
// of another JSON type, which needs to be reparsed. Example - map<frozen<list<int>>, int>
// will be represented like this: { "[1, 3, 6]": 3, "[]": 0, "[1, 2]": 2 }
try {
rjson::value map_key = rjson::parse(rjson::to_string_view(it->name));
raw_map.emplace(from_json_object(*t.get_keys_type(), map_key), std::move(value));
@@ -504,7 +502,7 @@ struct to_json_string_visitor {
sstring operator()(const tuple_type_impl& t) { return to_json_string_aux(t, bv); }
sstring operator()(const user_type_impl& t) { return to_json_string_aux(t, bv); }
sstring operator()(const simple_date_type_impl& t) { return quote_json_string(t.to_string(bv)); }
sstring operator()(const time_type_impl& t) { return quote_json_string(t.to_string(bv)); }
sstring operator()(const time_type_impl& t) { return t.to_string(bv); }
sstring operator()(const empty_type_impl& t) { return "null"; }
sstring operator()(const duration_type_impl& t) {
auto v = t.deserialize(bv);

View File

@@ -135,7 +135,7 @@ future<> db::batchlog_manager::stop() {
}
future<size_t> db::batchlog_manager::count_all_batches() const {
sstring query = format("SELECT count(*) FROM {}.{} BYPASS CACHE", system_keyspace::NAME, system_keyspace::BATCHLOG);
sstring query = format("SELECT count(*) FROM {}.{}", system_keyspace::NAME, system_keyspace::BATCHLOG);
return _qp.execute_internal(query, cql3::query_processor::cache_internal::yes).then([](::shared_ptr<cql3::untyped_result_set> rs) {
return size_t(rs->one().get_as<int64_t>("count"));
});
@@ -154,26 +154,26 @@ future<> db::batchlog_manager::replay_all_failed_batches() {
auto throttle = _replay_rate / _qp.proxy().get_token_metadata_ptr()->count_normal_token_owners();
auto limiter = make_lw_shared<utils::rate_limiter>(throttle);
auto batch = [this, limiter](const cql3::untyped_result_set::row& row) -> future<stop_iteration> {
auto batch = [this, limiter](const cql3::untyped_result_set::row& row) {
auto written_at = row.get_as<db_clock::time_point>("written_at");
auto id = row.get_as<utils::UUID>("id");
// enough time for the actual write + batchlog entry mutation delivery (two separate requests).
auto timeout = get_batch_log_timeout();
if (db_clock::now() < written_at + timeout) {
blogger.debug("Skipping replay of {}, too fresh", id);
return make_ready_future<stop_iteration>(stop_iteration::no);
return make_ready_future<>();
}
// check version of serialization format
if (!row.has("version")) {
blogger.warn("Skipping logged batch because of unknown version");
return make_ready_future<stop_iteration>(stop_iteration::no);
return make_ready_future<>();
}
auto version = row.get_as<int32_t>("version");
if (version != netw::messaging_service::current_version) {
blogger.warn("Skipping logged batch because of incorrect version");
return make_ready_future<stop_iteration>(stop_iteration::no);
return make_ready_future<>();
}
auto data = row.get_blob("data");
@@ -255,20 +255,49 @@ future<> db::batchlog_manager::replay_all_failed_batches() {
auto now = service::client_state(service::client_state::internal_tag()).get_timestamp();
m.partition().apply_delete(*schema, clustering_key_prefix::make_empty(), tombstone(now, gc_clock::now()));
return _qp.proxy().mutate_locally(m, tracing::trace_state_ptr(), db::commitlog::force_sync::no);
}).then([] { return make_ready_future<stop_iteration>(stop_iteration::no); });
});
};
return seastar::with_gate(_gate, [this, batch = std::move(batch)] () mutable {
return seastar::with_gate(_gate, [this, batch = std::move(batch)] {
blogger.debug("Started replayAllFailedBatches (cpu {})", this_shard_id());
return _qp.query_internal(
format("SELECT id, data, written_at, version FROM {}.{} BYPASS CACHE", system_keyspace::NAME, system_keyspace::BATCHLOG),
db::consistency_level::ONE,
{},
page_size,
std::move(batch)).then([this] {
// Replaying batches could have generated tombstones, flush to disk,
// where they can be compacted away.
return replica::database::flush_table_on_all_shards(_qp.proxy().get_db(), system_keyspace::NAME, system_keyspace::BATCHLOG);
typedef ::shared_ptr<cql3::untyped_result_set> page_ptr;
sstring query = format("SELECT id, data, written_at, version FROM {}.{} LIMIT {:d}", system_keyspace::NAME, system_keyspace::BATCHLOG, page_size);
return _qp.execute_internal(query, cql3::query_processor::cache_internal::yes).then([this, batch = std::move(batch)](page_ptr page) {
return do_with(std::move(page), [this, batch = std::move(batch)](page_ptr & page) mutable {
return repeat([this, &page, batch = std::move(batch)]() mutable {
if (page->empty()) {
return make_ready_future<stop_iteration>(stop_iteration::yes);
}
auto id = page->back().get_as<utils::UUID>("id");
return parallel_for_each(*page, batch).then([this, &page, id]() {
if (page->size() < page_size) {
return make_ready_future<stop_iteration>(stop_iteration::yes); // we've exhausted the batchlog, next query would be empty.
}
sstring query = format("SELECT id, data, written_at, version FROM {}.{} WHERE token(id) > token(?) LIMIT {:d}",
system_keyspace::NAME,
system_keyspace::BATCHLOG,
page_size);
return _qp.execute_internal(query, {id}, cql3::query_processor::cache_internal::yes).then([&page](auto res) {
page = std::move(res);
return make_ready_future<stop_iteration>(stop_iteration::no);
});
});
});
});
}).then([] {
// TODO FIXME : cleanup()
#if 0
ColumnFamilyStore cfs = Keyspace.open(SystemKeyspace.NAME).getColumnFamilyStore(SystemKeyspace.BATCHLOG);
cfs.forceBlockingFlush();
Collection<Descriptor> descriptors = new ArrayList<>();
for (SSTableReader sstr : cfs.getSSTables())
descriptors.add(sstr.descriptor);
if (!descriptors.isEmpty()) // don't pollute the logs if there is nothing to compact.
CompactionManager.instance.submitUserDefined(cfs, descriptors, Integer.MAX_VALUE).get();
#endif
}).then([] {
blogger.debug("Finished replayAllFailedBatches");
});

View File

@@ -2628,20 +2628,12 @@ db::commitlog::read_log_file(sstring filename, sstring pfx, commit_load_reader_f
return eof || next == pos;
}
future<> skip(size_t bytes) {
auto n = std::min(file_size - pos, bytes);
pos += n;
if (pos == file_size) {
pos += bytes;
if (pos > file_size) {
eof = true;
pos = file_size;
}
if (n < bytes) {
// if we are trying to skip past end, we have at least
// the bytes skipped or the source from where we read
// this corrupt. So add at least four bytes. This is
// inexact, but adding the full "bytes" is equally wrong
// since it could be complete garbled junk.
corrupt_size += std::max(n, sizeof(uint32_t));
}
return fin.skip(n);
return fin.skip(bytes);
}
void stop() {
eof = true;

View File

@@ -341,10 +341,6 @@ db::config::config(std::shared_ptr<db::extensions> exts)
"If set to higher than 0, ignore the controller's output and set the compaction shares statically. Do not set this unless you know what you are doing and suspect a problem in the controller. This option will be retired when the controller reaches more maturity")
, compaction_enforce_min_threshold(this, "compaction_enforce_min_threshold", liveness::LiveUpdate, value_status::Used, false,
"If set to true, enforce the min_threshold option for compactions strictly. If false (default), Scylla may decide to compact even if below min_threshold")
, compaction_flush_all_tables_before_major_seconds(this, "compaction_flush_all_tables_before_major_seconds", value_status::Used, 86400,
"Set the minimum interval in seconds between flushing all tables before each major compaction (default is 86400). "
"This option is useful for maximizing tombstone garbage collection by releasing all active commitlog segments. "
"Set to 0 to disable automatic flushing all tables before major compaction")
/**
* @Group Initialization properties
* @GroupDescription The minimal properties needed for configuring a cluster.
@@ -489,8 +485,6 @@ db::config::config(std::shared_ptr<db::extensions> exts)
"Adjusts the sensitivity of the failure detector on an exponential scale. Generally this setting never needs adjusting.\n"
"Related information: Failure detection and recovery")
, failure_detector_timeout_in_ms(this, "failure_detector_timeout_in_ms", liveness::LiveUpdate, value_status::Used, 20 * 1000, "Maximum time between two successful echo message before gossip mark a node down in milliseconds.\n")
, direct_failure_detector_ping_timeout_in_ms(this, "direct_failure_detector_ping_timeout_in_ms", value_status::Used, 600, "Duration after which the direct failure detector aborts a ping message, so the next ping can start.\n"
"Note: this failure detector is used by Raft, and is different from gossiper's failure detector (configured by `failure_detector_timeout_in_ms`).\n")
/**
* @Group Performance tuning properties
* @GroupDescription Tuning performance and system resource utilization, including commit log, compaction, memory, disk I/O, CPU, reads, and writes.
@@ -680,9 +674,6 @@ db::config::config(std::shared_ptr<db::extensions> exts)
"The maximum number of tombstones a query can scan before aborting.")
, query_tombstone_page_limit(this, "query_tombstone_page_limit", liveness::LiveUpdate, value_status::Used, 10000,
"The number of tombstones after which a query cuts a page, even if not full or even empty.")
, query_page_size_in_bytes(this, "query_page_size_in_bytes", liveness::LiveUpdate, value_status::Used, 1 << 20,
"The size of pages in bytes, after a page accumulates this much data, the page is cut and sent to the client."
" Setting a too large value increases the risk of OOM.")
/**
* @Group Network timeout settings
*/
@@ -931,8 +922,6 @@ db::config::config(std::shared_ptr<db::extensions> exts)
, enable_repair_based_node_ops(this, "enable_repair_based_node_ops", liveness::LiveUpdate, value_status::Used, true, "Set true to use enable repair based node operations instead of streaming based")
, allowed_repair_based_node_ops(this, "allowed_repair_based_node_ops", liveness::LiveUpdate, value_status::Used, "replace,removenode,rebuild,bootstrap,decommission", "A comma separated list of node operations which are allowed to enable repair based node operations. The operations can be bootstrap, replace, removenode, decommission and rebuild")
, enable_compacting_data_for_streaming_and_repair(this, "enable_compacting_data_for_streaming_and_repair", liveness::LiveUpdate, value_status::Used, true, "Enable the compacting reader, which compacts the data for streaming and repair (load'n'stream included) before sending it to, or synchronizing it with peers. Can reduce the amount of data to be processed by removing dead data, but adds CPU overhead.")
, repair_partition_count_estimation_ratio(this, "repair_partition_count_estimation_ratio", liveness::LiveUpdate, value_status::Used, 0.1,
"Specify the fraction of partitions written by repair out of the total partitions. The value is currently only used for bloom filter estimation. Value is between 0 and 1.")
, ring_delay_ms(this, "ring_delay_ms", value_status::Used, 30 * 1000, "Time a node waits to hear from other nodes before joining the ring in milliseconds. Same as -Dcassandra.ring_delay_ms in cassandra.")
, shadow_round_ms(this, "shadow_round_ms", value_status::Used, 300 * 1000, "The maximum gossip shadow round time. Can be used to reduce the gossip feature check time during node boot up.")
, fd_max_interval_ms(this, "fd_max_interval_ms", value_status::Used, 2 * 1000, "The maximum failure_detector interval time in milliseconds. Interval larger than the maximum will be ignored. Larger cluster may need to increase the default.")
@@ -951,7 +940,6 @@ db::config::config(std::shared_ptr<db::extensions> exts)
, unspooled_dirty_soft_limit(this, "unspooled_dirty_soft_limit", value_status::Used, 0.6, "Soft limit of unspooled dirty memory expressed as a portion of the hard limit")
, sstable_summary_ratio(this, "sstable_summary_ratio", value_status::Used, 0.0005, "Enforces that 1 byte of summary is written for every N (2000 by default) "
"bytes written to data file. Value must be between 0 and 1.")
, components_memory_reclaim_threshold(this, "components_memory_reclaim_threshold", liveness::LiveUpdate, value_status::Used, .2, "Ratio of available memory for all in-memory components of SSTables in a shard beyond which the memory will be reclaimed from components until it falls back under the threshold. Currently, this limit is only enforced for bloom filters.")
, large_memory_allocation_warning_threshold(this, "large_memory_allocation_warning_threshold", value_status::Used, size_t(1) << 20, "Warn about memory allocations above this size; set to zero to disable")
, enable_deprecated_partitioners(this, "enable_deprecated_partitioners", value_status::Used, false, "Enable the byteordered and random partitioners. These partitioners are deprecated and will be removed in a future version.")
, enable_keyspace_column_family_metrics(this, "enable_keyspace_column_family_metrics", value_status::Used, false, "Enable per keyspace and per column family metrics reporting")
@@ -991,8 +979,6 @@ db::config::config(std::shared_ptr<db::extensions> exts)
"Start serializing reads after their collective memory consumption goes above $normal_limit * $multiplier.")
, reader_concurrency_semaphore_kill_limit_multiplier(this, "reader_concurrency_semaphore_kill_limit_multiplier", liveness::LiveUpdate, value_status::Used, 4,
"Start killing reads after their collective memory consumption goes above $normal_limit * $multiplier.")
, reader_concurrency_semaphore_cpu_concurrency(this, "reader_concurrency_semaphore_cpu_concurrency", liveness::LiveUpdate, value_status::Used, 1,
"Admit new reads while there are less than this number of requests that need CPU.")
, twcs_max_window_count(this, "twcs_max_window_count", liveness::LiveUpdate, value_status::Used, 50,
"The maximum number of compaction windows allowed when making use of TimeWindowCompactionStrategy. A setting of 0 effectively disables the restriction.")
, initial_sstable_loading_concurrency(this, "initial_sstable_loading_concurrency", value_status::Used, 4u,

View File

@@ -163,7 +163,6 @@ public:
named_value<float> memtable_flush_static_shares;
named_value<float> compaction_static_shares;
named_value<bool> compaction_enforce_min_threshold;
named_value<uint32_t> compaction_flush_all_tables_before_major_seconds;
named_value<sstring> cluster_name;
named_value<sstring> listen_address;
named_value<sstring> listen_interface;
@@ -196,7 +195,6 @@ public:
named_value<bool> snapshot_before_compaction;
named_value<uint32_t> phi_convict_threshold;
named_value<uint32_t> failure_detector_timeout_in_ms;
named_value<uint32_t> direct_failure_detector_ping_timeout_in_ms;
named_value<sstring> commitlog_sync;
named_value<uint32_t> commitlog_segment_size_in_mb;
named_value<uint32_t> schema_commitlog_segment_size_in_mb;
@@ -255,7 +253,6 @@ public:
named_value<uint32_t> tombstone_warn_threshold;
named_value<uint32_t> tombstone_failure_threshold;
named_value<uint64_t> query_tombstone_page_limit;
named_value<uint64_t> query_page_size_in_bytes;
named_value<uint32_t> range_request_timeout_in_ms;
named_value<uint32_t> read_request_timeout_in_ms;
named_value<uint32_t> counter_write_request_timeout_in_ms;
@@ -331,7 +328,6 @@ public:
named_value<bool> enable_repair_based_node_ops;
named_value<sstring> allowed_repair_based_node_ops;
named_value<bool> enable_compacting_data_for_streaming_and_repair;
named_value<double> repair_partition_count_estimation_ratio;
named_value<uint32_t> ring_delay_ms;
named_value<uint32_t> shadow_round_ms;
named_value<uint32_t> fd_max_interval_ms;
@@ -349,7 +345,6 @@ public:
named_value<unsigned> murmur3_partitioner_ignore_msb_bits;
named_value<double> unspooled_dirty_soft_limit;
named_value<double> sstable_summary_ratio;
named_value<double> components_memory_reclaim_threshold;
named_value<size_t> large_memory_allocation_warning_threshold;
named_value<bool> enable_deprecated_partitioners;
named_value<bool> enable_keyspace_column_family_metrics;
@@ -373,7 +368,6 @@ public:
named_value<uint64_t> max_memory_for_unlimited_query_hard_limit;
named_value<uint32_t> reader_concurrency_semaphore_serialize_limit_multiplier;
named_value<uint32_t> reader_concurrency_semaphore_kill_limit_multiplier;
named_value<uint32_t> reader_concurrency_semaphore_cpu_concurrency;
named_value<uint32_t> twcs_max_window_count;
named_value<unsigned> initial_sstable_loading_concurrency;
named_value<bool> enable_3_1_0_compatibility_mode;

View File

@@ -155,7 +155,7 @@ future<> cql_table_large_data_handler::try_record(std::string_view large_table,
const auto sstable_name = large_data_handler::sst_filename(sst);
std::string pk_str = key_to_str(partition_key.to_partition_key(s), s);
auto timestamp = db_clock::now();
large_data_logger.warn("Writing large {} {}/{}: {} ({} bytes) to {}", desc, ks_name, cf_name, extra_path, size, sstable_name);
large_data_logger.warn("Writing large {} {}/{}: {}{} ({} bytes) to {}", desc, ks_name, cf_name, pk_str, extra_path, size, sstable_name);
return _sys_ks->execute_cql(req, ks_name, cf_name, sstable_name, size, pk_str, timestamp, args...)
.discard_result()
.handle_exception([ks_name, cf_name, large_table, sstable_name] (std::exception_ptr ep) {
@@ -182,10 +182,10 @@ future<> cql_table_large_data_handler::internal_record_large_cells(const sstable
if (clustering_key) {
const schema &s = *sst.get_schema();
auto ck_str = key_to_str(*clustering_key, s);
return try_record("cell", sst, partition_key, int64_t(cell_size), cell_type, column_name, extra_fields, ck_str, column_name);
return try_record("cell", sst, partition_key, int64_t(cell_size), cell_type, format("/{}/{}", ck_str, column_name), extra_fields, ck_str, column_name);
} else {
auto desc = format("static {}", cell_type);
return try_record("cell", sst, partition_key, int64_t(cell_size), desc, column_name, extra_fields, data_value::make_null(utf8_type), column_name);
return try_record("cell", sst, partition_key, int64_t(cell_size), desc, format("//{}", column_name), extra_fields, data_value::make_null(utf8_type), column_name);
}
}
@@ -197,10 +197,10 @@ future<> cql_table_large_data_handler::internal_record_large_cells_and_collectio
if (clustering_key) {
const schema &s = *sst.get_schema();
auto ck_str = key_to_str(*clustering_key, s);
return try_record("cell", sst, partition_key, int64_t(cell_size), cell_type, column_name, extra_fields, ck_str, column_name, data_value((int64_t)collection_elements));
return try_record("cell", sst, partition_key, int64_t(cell_size), cell_type, format("/{}/{}", ck_str, column_name), extra_fields, ck_str, column_name, data_value((int64_t)collection_elements));
} else {
auto desc = format("static {}", cell_type);
return try_record("cell", sst, partition_key, int64_t(cell_size), desc, column_name, extra_fields, data_value::make_null(utf8_type), column_name, data_value((int64_t)collection_elements));
return try_record("cell", sst, partition_key, int64_t(cell_size), desc, format("//{}", column_name), extra_fields, data_value::make_null(utf8_type), column_name, data_value((int64_t)collection_elements));
}
}
@@ -210,7 +210,7 @@ future<> cql_table_large_data_handler::record_large_rows(const sstables::sstable
if (clustering_key) {
const schema &s = *sst.get_schema();
std::string ck_str = key_to_str(*clustering_key, s);
return try_record("row", sst, partition_key, int64_t(row_size), "row", "", extra_fields, ck_str);
return try_record("row", sst, partition_key, int64_t(row_size), "row", format("/{}", ck_str), extra_fields, ck_str);
} else {
return try_record("row", sst, partition_key, int64_t(row_size), "static row", "", extra_fields, data_value::make_null(utf8_type));
}

View File

@@ -55,10 +55,6 @@ public:
return ser::serialize_to_buffer<bytes>(_paxos_gc_sec);
}
std::string options_to_string() const override {
return std::to_string(_paxos_gc_sec);
}
static int32_t deserialize(const bytes_view& buffer) {
return ser::deserialize_from_buffer(buffer, boost::type<int32_t>());
}

View File

@@ -973,7 +973,7 @@ future<> merge_schema(sharded<db::system_keyspace>& sys_ks, distributed<service:
if (this_shard_id() != 0) {
// mutations must be applied on the owning shard (0).
co_await smp::submit_to(0, [&, fmuts = freeze(mutations)] () mutable -> future<> {
return merge_schema(sys_ks, proxy, feat, unfreeze(fmuts), reload);
return merge_schema(sys_ks, proxy, feat, unfreeze(fmuts));
});
co_return;
}

View File

@@ -29,6 +29,7 @@
#include <seastar/core/coroutine.hh>
#include <seastar/core/future-util.hh>
#include <seastar/coroutine/maybe_yield.hh>
#include <seastar/coroutine/parallel_for_each.hh>
#include <boost/range/adaptor/transformed.hpp>
@@ -214,54 +215,25 @@ static thread_local std::pair<std::string_view, data_type> new_columns[] {
{"workload_type", utf8_type}
};
static bool has_missing_columns(data_dictionary::database db) noexcept {
assert(this_shard_id() == 0);
try {
auto schema = db.find_schema(system_distributed_keyspace::NAME, system_distributed_keyspace::SERVICE_LEVELS);
for (const auto& col : new_columns) {
auto& [col_name, col_type] = col;
bytes options_name = to_bytes(col_name.data());
if (schema->get_column_definition(options_name)) {
continue;
}
return true;
}
} catch (...) {
dlogger.warn("Failed to update options column in the role attributes table: {}", std::current_exception());
return true;
}
return false;
static schema_ptr get_current_service_levels(data_dictionary::database db) {
return db.has_schema(system_distributed_keyspace::NAME, system_distributed_keyspace::SERVICE_LEVELS)
? db.find_schema(system_distributed_keyspace::NAME, system_distributed_keyspace::SERVICE_LEVELS)
: service_levels();
}
static future<> add_new_columns_if_missing(replica::database& db, ::service::migration_manager& mm, ::service::group0_guard group0_guard) noexcept {
static schema_ptr get_updated_service_levels(data_dictionary::database db) {
assert(this_shard_id() == 0);
try {
auto schema = db.find_schema(system_distributed_keyspace::NAME, system_distributed_keyspace::SERVICE_LEVELS);
schema_builder b(schema);
bool updated = false;
for (const auto& col : new_columns) {
auto& [col_name, col_type] = col;
bytes options_name = to_bytes(col_name.data());
if (schema->get_column_definition(options_name)) {
continue;
}
updated = true;
b.with_column(options_name, col_type, column_kind::regular_column);
auto schema = get_current_service_levels(db);
schema_builder b(schema);
for (const auto& col : new_columns) {
auto& [col_name, col_type] = col;
bytes options_name = to_bytes(col_name.data());
if (schema->get_column_definition(options_name)) {
continue;
}
if (updated) {
schema_ptr table = b.build();
try {
auto ts = group0_guard.write_timestamp();
co_return co_await mm.announce(co_await service::prepare_column_family_update_announcement(mm.get_storage_proxy(), table, false,
std::vector<view_ptr>(), ts), std::move(group0_guard), "Add new columns to system_distributed.service_levels");
} catch (...) {}
}
} catch (...) {
// FIXME: do we really want to allow the node to boot if the table fails to update?
// Will this not prevent other components from working correctly?
dlogger.warn("Failed to update options column in the role attributes table: {}", std::current_exception());
b.with_column(options_name, col_type, column_kind::regular_column);
}
return b.build();
}
future<> system_distributed_keyspace::start() {
@@ -270,79 +242,81 @@ future<> system_distributed_keyspace::start() {
co_return;
}
// FIXME: fix this code to `announce` once
auto db = _sp.data_dictionary();
auto tables = ensured_tables();
if (!_sp.get_db().local().has_keyspace(NAME)) {
auto group0_guard = co_await _mm.start_group0_operation();
auto ts = group0_guard.write_timestamp();
// Check if there is any work to do before taking the group 0 guard.
bool keyspaces_setup = db.has_keyspace(NAME) && db.has_keyspace(NAME_EVERYWHERE);
bool tables_setup = std::all_of(tables.begin(), tables.end(), [db] (schema_ptr t) { return db.has_schema(t->ks_name(), t->cf_name()); } );
bool service_levels_up_to_date = get_current_service_levels(db)->equal_columns(*get_updated_service_levels(db));
if (keyspaces_setup && tables_setup && service_levels_up_to_date) {
dlogger.info("system_distributed(_everywhere) keyspaces and tables are up-to-date. Not creating");
_started = true;
co_return;
}
try {
auto ksm = keyspace_metadata::new_keyspace(
NAME,
"org.apache.cassandra.locator.SimpleStrategy",
{{"replication_factor", "3"}},
true /* durable_writes */);
co_await _mm.announce(service::prepare_new_keyspace_announcement(_sp.local_db(), ksm, ts), std::move(group0_guard),
"Create system_distributed keyspace");
} catch (exceptions::already_exists_exception&) {}
auto group0_guard = co_await _mm.start_group0_operation();
auto ts = group0_guard.write_timestamp();
std::vector<mutation> mutations;
sstring description;
auto sd_ksm = keyspace_metadata::new_keyspace(
NAME,
"org.apache.cassandra.locator.SimpleStrategy",
{{"replication_factor", "3"}},
true /* durable_writes */);
if (!db.has_keyspace(NAME)) {
mutations = service::prepare_new_keyspace_announcement(db.real_database(), sd_ksm, ts);
description += format(" create {} keyspace;", NAME);
} else {
dlogger.info("{} keyspace is already present. Not creating", NAME);
}
if (!_sp.get_db().local().has_keyspace(NAME_EVERYWHERE)) {
auto group0_guard = co_await _mm.start_group0_operation();
auto ts = group0_guard.write_timestamp();
try {
auto ksm = keyspace_metadata::new_keyspace(
NAME_EVERYWHERE,
"org.apache.cassandra.locator.EverywhereStrategy",
{},
true /* durable_writes */);
co_await _mm.announce(service::prepare_new_keyspace_announcement(_sp.local_db(), ksm, ts), std::move(group0_guard),
"Create system_distributed_everywhere keyspace");
} catch (exceptions::already_exists_exception&) {}
auto sde_ksm = keyspace_metadata::new_keyspace(
NAME_EVERYWHERE,
"org.apache.cassandra.locator.EverywhereStrategy",
{},
true /* durable_writes */);
if (!db.has_keyspace(NAME_EVERYWHERE)) {
auto sde_mutations = service::prepare_new_keyspace_announcement(db.real_database(), sde_ksm, ts);
std::move(sde_mutations.begin(), sde_mutations.end(), std::back_inserter(mutations));
description += format(" create {} keyspace;", NAME_EVERYWHERE);
} else {
dlogger.info("{} keyspace is already present. Not creating", NAME_EVERYWHERE);
}
auto tables = ensured_tables();
bool exist = std::all_of(tables.begin(), tables.end(), [this] (schema_ptr s) {
return _sp.get_db().local().has_schema(s->ks_name(), s->cf_name());
});
// Get mutations for creating and updating tables.
auto num_keyspace_mutations = mutations.size();
co_await coroutine::parallel_for_each(ensured_tables(),
[this, &mutations, db, ts, sd_ksm, sde_ksm] (auto&& table) -> future<> {
auto ksm = table->ks_name() == NAME ? sd_ksm : sde_ksm;
if (!exist) {
auto group0_guard = co_await _mm.start_group0_operation();
auto ts = group0_guard.write_timestamp();
auto m = co_await map_reduce(tables,
/* Mapper */ [this, ts] (auto&& table) -> future<std::vector<mutation>> {
try {
co_return co_await service::prepare_new_column_family_announcement(_sp, std::move(table), ts);
} catch (exceptions::already_exists_exception&) {
co_return std::vector<mutation>();
}
},
/* Initial value*/ std::vector<mutation>(),
/* Reducer */ [] (std::vector<mutation> m1, std::vector<mutation> m2) {
std::move(m2.begin(), m2.end(), std::back_inserter(m1));
return m1;
});
if (m.size()) {
co_await _mm.announce(std::move(m), std::move(group0_guard),
"Create system_distributed(_everywhere) tables");
// Ensure that the service_levels table contains new columns.
if (table->cf_name() == SERVICE_LEVELS) {
table = get_updated_service_levels(db);
}
if (!db.has_schema(table->ks_name(), table->cf_name())) {
co_return co_await service::prepare_new_column_family_announcement(mutations, _sp, *ksm, std::move(table), ts);
}
// The service_levels table exists. Update it if it lacks new columns.
if (table->cf_name() == SERVICE_LEVELS && !get_current_service_levels(db)->equal_columns(*table)) {
auto update_mutations = co_await service::prepare_column_family_update_announcement(_sp, table, false, std::vector<view_ptr>(), ts);
std::move(update_mutations.begin(), update_mutations.end(), std::back_inserter(mutations));
}
});
if (mutations.size() > num_keyspace_mutations) {
description += " create and update system_distributed(_everywhere) tables";
} else {
dlogger.info("All tables are present on start");
dlogger.info("All tables are present and up-to-date on start");
}
if (!mutations.empty()) {
co_await _mm.announce(std::move(mutations), std::move(group0_guard), description);
}
_started = true;
if (has_missing_columns(_qp.db())) {
auto group0_guard = co_await _mm.start_group0_operation();
co_await add_new_columns_if_missing(_qp.db().real_database(), _mm, std::move(group0_guard));
} else {
dlogger.info("All schemas are uptodate on start");
}
}
future<> system_distributed_keyspace::stop() {

View File

@@ -11,6 +11,7 @@
#include <boost/range/adaptor/transformed.hpp>
#include <seastar/core/coroutine.hh>
#include <seastar/coroutine/parallel_for_each.hh>
#include "system_keyspace.hh"
#include "cql3/untyped_result_set.hh"
#include "utils/fb_utilities.hh"
@@ -1052,7 +1053,6 @@ schema_ptr system_keyspace::sstables_registry() {
return schema_builder(NAME, SSTABLES_REGISTRY, id)
.with_column("location", utf8_type, column_kind::partition_key)
.with_column("generation", timeuuid_type, column_kind::clustering_key)
.with_column("uuid", uuid_type)
.with_column("status", utf8_type)
.with_column("version", utf8_type)
.with_column("format", utf8_type)
@@ -1465,6 +1465,27 @@ future<std::unordered_map<table_id, db_clock::time_point>> system_keyspace::load
co_return result;
}
future<> system_keyspace::drop_truncation_rp_records() {
sstring req = format("SELECT table_uuid, shard, segment_id from system.{}", TRUNCATED);
auto rs = co_await execute_cql(req);
bool any = false;
co_await coroutine::parallel_for_each(rs->begin(), rs->end(), [&] (const cql3::untyped_result_set_row& row) -> future<> {
auto table_uuid = table_id(row.get_as<utils::UUID>("table_uuid"));
auto shard = row.get_as<int32_t>("shard");
auto segment_id = row.get_as<int64_t>("segment_id");
if (segment_id != 0) {
any = true;
sstring req = format("UPDATE system.{} SET segment_id = 0, position = 0 WHERE table_uuid = {} AND shard = {}", TRUNCATED, table_uuid, shard);
co_await execute_cql(req);
}
});
if (any) {
co_await force_blocking_flush(TRUNCATED);
}
}
future<> system_keyspace::save_truncation_record(const replica::column_family& cf, db_clock::time_point truncated_at, db::replay_position rp) {
sstring req = format("INSERT INTO system.{} (table_uuid, shard, position, segment_id, truncated_at) VALUES(?,?,?,?,?)", TRUNCATED);
co_await _qp.execute_internal(req, {cf.schema()->id().uuid(), int32_t(rp.shard_id()), int32_t(rp.pos), int64_t(rp.base_id()), truncated_at}, cql3::query_processor::cache_internal::yes);
@@ -1548,6 +1569,9 @@ future<> system_keyspace::update_tokens(gms::inet_address ep, const std::unorder
slogger.debug("INSERT INTO system.{} (peer, tokens) VALUES ({}, {})", PEERS, ep, tokens);
auto set_type = set_type_impl::get_instance(utf8_type, true);
co_await execute_cql(req, ep.addr(), make_set_value(set_type, prepare_tokens(tokens))).discard_result();
if (!_db.uses_schema_commitlog()) {
co_await force_blocking_flush(PEERS);
}
}
@@ -1653,7 +1677,7 @@ future<> system_keyspace::set_scylla_local_param_as(const sstring& key, const T&
sstring req = format("UPDATE system.{} SET value = ? WHERE key = ?", system_keyspace::SCYLLA_LOCAL);
auto type = data_type_for<T>();
co_await execute_cql(req, type->to_string_impl(data_value(value)), key).discard_result();
if (visible_before_cl_replay) {
if (visible_before_cl_replay || !_db.uses_schema_commitlog()) {
co_await force_blocking_flush(SCYLLA_LOCAL);
}
}
@@ -1692,6 +1716,9 @@ future<> system_keyspace::remove_endpoint(gms::inet_address ep) {
sstring req = format("DELETE FROM system.{} WHERE peer = ?", PEERS);
slogger.debug("DELETE FROM system.{} WHERE peer = {}", PEERS, ep);
co_await execute_cql(req, ep.addr()).discard_result();
if (!_db.uses_schema_commitlog()) {
co_await force_blocking_flush(PEERS);
}
}
future<> system_keyspace::update_tokens(const std::unordered_set<dht::token>& tokens) {
@@ -1702,6 +1729,9 @@ future<> system_keyspace::update_tokens(const std::unordered_set<dht::token>& to
sstring req = format("INSERT INTO system.{} (key, tokens) VALUES (?, ?)", LOCAL);
auto set_type = set_type_impl::get_instance(utf8_type, true);
co_await execute_cql(req, sstring(LOCAL), make_set_value(set_type, prepare_tokens(tokens)));
if (!_db.uses_schema_commitlog()) {
co_await force_blocking_flush(PEERS);
}
}
future<> system_keyspace::force_blocking_flush(sstring cfname) {
@@ -1747,6 +1777,9 @@ future<> system_keyspace::update_cdc_generation_id(cdc::generation_id gen_id) {
sstring(v3::CDC_LOCAL), id.ts, id.id);
}
), gen_id);
if (!_db.uses_schema_commitlog()) {
co_await force_blocking_flush(v3::CDC_LOCAL);
}
}
future<std::optional<cdc::generation_id>> system_keyspace::get_cdc_generation_id() {
@@ -1828,6 +1861,9 @@ future<> system_keyspace::set_bootstrap_state(bootstrap_state state) {
sstring req = format("INSERT INTO system.{} (key, bootstrapped) VALUES (?, ?)", LOCAL);
co_await execute_cql(req, sstring(LOCAL), state_name).discard_result();
if (!_db.uses_schema_commitlog()) {
co_await force_blocking_flush(LOCAL);
}
co_await container().invoke_on_all([state] (auto& sys_ks) {
sys_ks._cache->_state = state;
});
@@ -1879,7 +1915,8 @@ std::vector<schema_ptr> system_keyspace::all_tables(const db::config& cfg) {
static bool maybe_write_in_user_memory(schema_ptr s) {
return (s.get() == system_keyspace::batchlog().get()) || (s.get() == system_keyspace::paxos().get())
|| s == system_keyspace::v3::scylla_views_builds_in_progress();
|| s == system_keyspace::v3::scylla_views_builds_in_progress()
|| s == system_keyspace::raft();
}
future<> system_keyspace::make(
@@ -2027,6 +2064,9 @@ future<int> system_keyspace::increment_and_get_generation() {
}
req = format("INSERT INTO system.{} (key, gossip_generation) VALUES ('{}', ?)", LOCAL, LOCAL);
co_await _qp.execute_internal(req, {generation.value()}, cql3::query_processor::cache_internal::yes);
if (!_db.uses_schema_commitlog()) {
co_await force_blocking_flush(LOCAL);
}
co_return generation;
}
@@ -2691,24 +2731,10 @@ mutation system_keyspace::make_cleanup_candidate_mutation(std::optional<cdc::gen
return m;
}
future<> system_keyspace::sstables_registry_create_entry(sstring location, utils::UUID uuid, sstring status, sstables::entry_descriptor desc) {
static const auto req = format("INSERT INTO system.{} (location, generation, uuid, status, version, format) VALUES (?, ?, ?, ?, ?, ?)", SSTABLES_REGISTRY);
slogger.trace("Inserting {}.{}:{} into {}", location, desc.generation, uuid, SSTABLES_REGISTRY);
co_await execute_cql(req, location, desc.generation, uuid, status, fmt::to_string(desc.version), fmt::to_string(desc.format)).discard_result();
}
future<utils::UUID> system_keyspace::sstables_registry_lookup_entry(sstring location, sstables::generation_type gen) {
static const auto req = format("SELECT uuid FROM system.{} WHERE location = ? AND generation = ?", SSTABLES_REGISTRY);
slogger.trace("Looking up {}.{} in {}", location, gen, SSTABLES_REGISTRY);
auto msg = co_await execute_cql(req, location, gen);
if (msg->empty() || !msg->one().has("uuid")) {
slogger.trace("ERROR: Cannot find {}.{} in {}", location, gen, SSTABLES_REGISTRY);
co_await coroutine::return_exception(std::runtime_error("No entry in sstables registry"));
}
auto uuid = msg->one().get_as<utils::UUID>("uuid");
slogger.trace("Found {}.{}:{} in {}", location, gen, uuid, SSTABLES_REGISTRY);
co_return uuid;
future<> system_keyspace::sstables_registry_create_entry(sstring location, sstring status, sstables::entry_descriptor desc) {
static const auto req = format("INSERT INTO system.{} (location, generation, status, version, format) VALUES (?, ?, ?, ?, ?)", SSTABLES_REGISTRY);
slogger.trace("Inserting {}.{} into {}", location, desc.generation, SSTABLES_REGISTRY);
co_await execute_cql(req, location, desc.generation, status, fmt::to_string(desc.version), fmt::to_string(desc.format)).discard_result();
}
future<> system_keyspace::sstables_registry_update_entry_status(sstring location, sstables::generation_type gen, sstring status) {
@@ -2724,17 +2750,16 @@ future<> system_keyspace::sstables_registry_delete_entry(sstring location, sstab
}
future<> system_keyspace::sstables_registry_list(sstring location, sstable_registry_entry_consumer consumer) {
static const auto req = format("SELECT uuid, status, generation, version, format FROM system.{} WHERE location = ?", SSTABLES_REGISTRY);
static const auto req = format("SELECT status, generation, version, format FROM system.{} WHERE location = ?", SSTABLES_REGISTRY);
slogger.trace("Listing {} entries from {}", location, SSTABLES_REGISTRY);
co_await _qp.query_internal(req, db::consistency_level::ONE, { location }, 1000, [ consumer = std::move(consumer) ] (const cql3::untyped_result_set::row& row) -> future<stop_iteration> {
auto uuid = row.get_as<utils::UUID>("uuid");
auto status = row.get_as<sstring>("status");
auto gen = sstables::generation_type(row.get_as<utils::UUID>("generation"));
auto ver = sstables::version_from_string(row.get_as<sstring>("version"));
auto fmt = sstables::format_from_string(row.get_as<sstring>("format"));
sstables::entry_descriptor desc(gen, ver, fmt, sstables::component_type::TOC);
co_await consumer(std::move(uuid), std::move(status), std::move(desc));
co_await consumer(std::move(status), std::move(desc));
co_return stop_iteration::no;
});
}

View File

@@ -355,6 +355,7 @@ public:
future<> save_truncation_record(const replica::column_family&, db_clock::time_point truncated_at, db::replay_position);
future<replay_positions> get_truncated_positions(table_id);
future<> drop_truncation_rp_records();
/**
* Return a map of stored tokens to IP addresses
@@ -497,11 +498,10 @@ public:
// Assumes that the history table exists, i.e. Raft experimental feature is enabled.
static future<mutation> get_group0_history(distributed<replica::database>&);
future<> sstables_registry_create_entry(sstring location, utils::UUID uuid, sstring status, sstables::entry_descriptor desc);
future<utils::UUID> sstables_registry_lookup_entry(sstring location, sstables::generation_type gen);
future<> sstables_registry_create_entry(sstring location, sstring status, sstables::entry_descriptor desc);
future<> sstables_registry_update_entry_status(sstring location, sstables::generation_type gen, sstring status);
future<> sstables_registry_delete_entry(sstring location, sstables::generation_type gen);
using sstable_registry_entry_consumer = noncopyable_function<future<>(utils::UUID uuid, sstring state, sstables::entry_descriptor desc)>;
using sstable_registry_entry_consumer = noncopyable_function<future<>(sstring state, sstables::entry_descriptor desc)>;
future<> sstables_registry_list(sstring location, sstable_registry_entry_consumer consumer);
future<std::optional<sstring>> load_group0_upgrade_state();

View File

@@ -493,56 +493,37 @@ mutation_partition& view_updates::partition_for(partition_key&& key) {
}
size_t view_updates::op_count() const {
return _op_count;
return _op_count++;;
}
row_marker view_updates::compute_row_marker(const clustering_or_static_row& base_row) const {
/*
* We need to compute both the timestamp and expiration for view rows.
* We need to compute both the timestamp and expiration.
*
* Below there are several distinct cases depending on how many new key
* columns the view has - i.e., how many of the view's key columns were
* regular columns in the base. base_regular_columns_in_view_pk.size():
*
* Zero new key columns:
* The view rows key is composed only from base key columns, and those
* cannot be changed in an update, so the view row remains alive as
* long as the base row is alive. We need to return the same row
* marker as the base for the view - to keep an empty view row alive
* for as long as an empty base row exists.
* Note that in this case, if there are *unselected* base columns, we
* may need to keep an empty view row alive even without a row marker
* because the base row (which has additional columns) is still alive.
* For that we have the "virtual columns" feature: In the zero new
* key columns case, we put unselected columns in the view as empty
* columns, to keep the view row alive.
*
* One new key column:
* In this case, there is a regular base column that is part of the
* view key. This regular column can be added or deleted in an update,
* or its expiration be set, and those can cause the view row -
* including its row marker - to need to appear or disappear as well.
* So the liveness of cell of this one column determines the liveness
* of the view row and the row marker that we return.
*
* Two or more new key columns:
* This case is explicitly NOT supported in CQL - one cannot create a
* view with more than one base-regular columns in its key. In general
* picking one liveness (timestamp and expiration) is not possible
* if there are multiple regular base columns in the view key, as
* those can have different liveness.
* However, we do allow this case for Alternator - we need to allow
* the case of two (but not more) because the DynamoDB API allows
* creating a GSI whose two key columns (hash and range key) were
* regular columns.
* We can support this case in Alternator because it doesn't use
* expiration (the "TTL" it does support is different), and doesn't
* support user-defined timestamps. But, the two columns can still
* have different timestamps - this happens if an update modifies
* just one of them. In this case the timestamp of the view update
* (and that of the row marker we return) is the later of these two
* updated columns.
* There are 3 cases:
* 1) There is a column that is not in the base PK but is in the view PK. In that case, as long as that column
* lives, the view entry does too, but as soon as it expires (or is deleted for that matter) the entry also
* should expire. So the expiration for the view is the one of that column, regardless of any other expiration.
* To take an example of that case, if you have:
* CREATE TABLE t (a int, b int, c int, PRIMARY KEY (a, b))
* CREATE MATERIALIZED VIEW mv AS SELECT * FROM t WHERE c IS NOT NULL AND a IS NOT NULL AND b IS NOT NULL PRIMARY KEY (c, a, b)
* INSERT INTO t(a, b) VALUES (0, 0) USING TTL 3;
* UPDATE t SET c = 0 WHERE a = 0 AND b = 0;
* then even after 3 seconds elapsed, the row will still exist (it just won't have a "row marker" anymore) and so
* the MV should still have a corresponding entry.
* This cell determines the liveness of the view row.
* 2) The columns for the base and view PKs are exactly the same, and all base columns are selected by the view.
* In that case, all components (marker, deletion and cells) are the same and trivially mapped.
* 3) The columns for the base and view PKs are exactly the same, but some base columns are not selected in the view.
* Use the max timestamp out of the base row marker and all the unselected columns - this ensures we can keep the
* view row alive. Do the same thing for the expiration, if the marker is dead or will expire, and so
* will all unselected columns.
*/
// WARNING: The code assumes that if multiple regular base columns are present in the view key,
// they share liveness information. It's true especially in the only case currently allowed by CQL,
// which assumes there's up to one non-pk column in the view key. It's also true in alternator,
// which does not carry TTL information.
const auto& col_ids = base_row.is_clustering_row()
? _base_info->base_regular_columns_in_view_pk()
: _base_info->base_static_columns_in_view_pk();
@@ -550,20 +531,7 @@ row_marker view_updates::compute_row_marker(const clustering_or_static_row& base
auto& def = _base->column_at(base_row.column_kind(), col_ids[0]);
// Note: multi-cell columns can't be part of the primary key.
auto cell = base_row.cells().cell_at(col_ids[0]).as_atomic_cell(def);
auto ts = cell.timestamp();
if (col_ids.size() > 1){
// As explained above, this case only happens in Alternator,
// and we may need to pick a higher ts:
auto& second_def = _base->column_at(base_row.column_kind(), col_ids[1]);
auto second_cell = base_row.cells().cell_at(col_ids[1]).as_atomic_cell(second_def);
auto second_ts = second_cell.timestamp();
ts = std::max(ts, second_ts);
// Alternator isn't supposed to have TTL or more than two col_ids!
if (col_ids.size() != 2 || cell.is_live_and_has_ttl() || second_cell.is_live_and_has_ttl()) [[unlikely]] {
utils::on_internal_error(format("Unexpected col_ids length {} or has TTL", col_ids.size()));
}
}
return cell.is_live_and_has_ttl() ? row_marker(ts, cell.ttl(), cell.expiry()) : row_marker(ts);
return cell.is_live_and_has_ttl() ? row_marker(cell.timestamp(), cell.ttl(), cell.expiry()) : row_marker(cell.timestamp());
}
return base_row.marker();
@@ -962,22 +930,8 @@ void view_updates::do_delete_old_entry(const partition_key& base_key, const clus
// Note: multi-cell columns can't be part of the primary key.
auto& def = _base->column_at(kind, col_ids[0]);
auto cell = existing.cells().cell_at(col_ids[0]).as_atomic_cell(def);
auto ts = cell.timestamp();
if (col_ids.size() > 1) {
// This is the Alternator-only support for two regular base
// columns that become view key columns. See explanation in
// view_updates::compute_row_marker().
auto& second_def = _base->column_at(kind, col_ids[1]);
auto second_cell = existing.cells().cell_at(col_ids[1]).as_atomic_cell(second_def);
auto second_ts = second_cell.timestamp();
ts = std::max(ts, second_ts);
// Alternator isn't supposed to have more than two col_ids!
if (col_ids.size() != 2) [[unlikely]] {
utils::on_internal_error(format("Unexpected col_ids length {}", col_ids.size()));
}
}
if (cell.is_live()) {
r->apply(shadowable_tombstone(ts, now));
r->apply(shadowable_tombstone(cell.timestamp(), now));
}
} else {
// "update" caused the base row to have been deleted, and !col_id
@@ -1362,12 +1316,11 @@ void view_update_builder::generate_update(static_row&& update, const tombstone&
future<stop_iteration> view_update_builder::on_results() {
constexpr size_t max_rows_for_view_updates = 100;
auto should_stop_updates = [this] () -> bool {
size_t rows_for_view_updates = std::accumulate(_view_updates.begin(), _view_updates.end(), 0, [] (size_t acc, const view_updates& vu) {
return acc + vu.op_count();
});
return rows_for_view_updates >= max_rows_for_view_updates;
};
size_t rows_for_view_updates = std::accumulate(_view_updates.begin(), _view_updates.end(), 0, [] (size_t acc, const view_updates& vu) {
return acc + vu.op_count();
});
const bool stop_updates = rows_for_view_updates >= max_rows_for_view_updates;
if (_update && !_update->is_end_of_partition() && _existing && !_existing->is_end_of_partition()) {
auto cmp = position_in_partition::tri_compare(*_schema)(_update->position(), _existing->position());
if (cmp < 0) {
@@ -1390,7 +1343,7 @@ future<stop_iteration> view_update_builder::on_results() {
: std::nullopt;
generate_update(std::move(update), _update_partition_tombstone, std::move(existing), _existing_partition_tombstone);
}
return should_stop_updates() ? stop() : advance_updates();
return stop_updates ? stop() : advance_updates();
}
if (cmp > 0) {
// We have something existing but no update (which will happen either because it's a range tombstone marker in
@@ -1426,7 +1379,7 @@ future<stop_iteration> view_update_builder::on_results() {
generate_update(std::move(update), _update_partition_tombstone, { std::move(existing) }, _existing_partition_tombstone);
}
}
return should_stop_updates() ? stop () : advance_existings();
return stop_updates ? stop () : advance_existings();
}
// We're updating a row that had pre-existing data
if (_update->is_range_tombstone_change()) {
@@ -1448,9 +1401,8 @@ future<stop_iteration> view_update_builder::on_results() {
mutation_fragment_v2::printer(*_schema, *_update), mutation_fragment_v2::printer(*_schema, *_existing)));
}
generate_update(std::move(*_update).as_static_row(), _update_partition_tombstone, { std::move(*_existing).as_static_row() }, _existing_partition_tombstone);
}
return should_stop_updates() ? stop() : advance_all();
return stop_updates ? stop() : advance_all();
}
auto tombstone = std::max(_update_partition_tombstone, _update_current_tombstone);
@@ -1465,7 +1417,7 @@ future<stop_iteration> view_update_builder::on_results() {
auto update = static_row();
generate_update(std::move(update), _update_partition_tombstone, { std::move(existing) }, _existing_partition_tombstone);
}
return should_stop_updates() ? stop() : advance_existings();
return stop_updates ? stop() : advance_existings();
}
// If we have updates and it's a range tombstone, it removes nothing pre-exisiting, so we can ignore it
@@ -1486,7 +1438,7 @@ future<stop_iteration> view_update_builder::on_results() {
: std::nullopt;
generate_update(std::move(*_update).as_static_row(), _update_partition_tombstone, std::move(existing), _existing_partition_tombstone);
}
return should_stop_updates() ? stop() : advance_updates();
return stop_updates ? stop() : advance_updates();
}
return stop();
@@ -1667,13 +1619,6 @@ static bool should_update_synchronously(const schema& s) {
return *tag_opt == "true";
}
size_t memory_usage_of(const frozen_mutation_and_schema& mut) {
// Overhead of sending a view mutation, in terms of data structures used by the storage_proxy, as well as possible background tasks
// allocated for a remote view update.
constexpr size_t base_overhead_bytes = 2288;
return base_overhead_bytes + mut.fm.representation().size();
}
// Take the view mutations generated by generate_view_updates(), which pertain
// to a modification of a single base partition, and apply them to the
// appropriate paired replicas. This is done asynchronously - we do not wait
@@ -1698,7 +1643,7 @@ future<> view_update_generator::mutate_MV(
bool network_topology = dynamic_cast<const locator::network_topology_strategy*>(&ks.get_replication_strategy());
auto target_endpoint = get_view_natural_endpoint(ermp, network_topology, base_token, view_token);
auto remote_endpoints = ermp->get_pending_endpoints(view_token);
auto sem_units = seastar::make_lw_shared<db::timeout_semaphore_units>(pending_view_updates.split(memory_usage_of(mut)));
auto sem_units = pending_view_updates.split(mut.fm.representation().size());
const bool update_synchronously = should_update_synchronously(*mut.s);
if (update_synchronously) {
@@ -1744,9 +1689,9 @@ future<> view_update_generator::mutate_MV(
auto mut_ptr = remote_endpoints.empty() ? std::make_unique<frozen_mutation>(std::move(mut.fm)) : std::make_unique<frozen_mutation>(mut.fm);
tracing::trace(tr_state, "Locally applying view update for {}.{}; base token = {}; view token = {}",
mut.s->ks_name(), mut.s->cf_name(), base_token, view_token);
local_view_update = _proxy.local().mutate_mv_locally(mut.s, *mut_ptr, tr_state, db::commitlog::force_sync::no).then_wrapped(
local_view_update = _proxy.local().mutate_locally(mut.s, *mut_ptr, tr_state, db::commitlog::force_sync::no).then_wrapped(
[s = mut.s, &stats, &cf_stats, tr_state, base_token, view_token, my_address, mut_ptr = std::move(mut_ptr),
sem_units] (future<>&& f) {
units = sem_units.split(sem_units.count())] (future<>&& f) {
--stats.writes;
if (f.failed()) {
++stats.view_updates_failed_local;
@@ -1783,7 +1728,7 @@ future<> view_update_generator::mutate_MV(
schema_ptr s = mut.s;
future<> view_update = apply_to_remote_endpoints(_proxy.local(), std::move(ermp), *target_endpoint, std::move(remote_endpoints), std::move(mut), base_token, view_token, allow_hints, tr_state).then_wrapped(
[s = std::move(s), &stats, &cf_stats, tr_state, base_token, view_token, target_endpoint, updates_pushed_remote,
sem_units, apply_update_synchronously] (future<>&& f) mutable {
units = sem_units.split(sem_units.count()), apply_update_synchronously] (future<>&& f) mutable {
if (f.failed()) {
stats.view_updates_failed_remote += updates_pushed_remote;
cf_stats.total_view_updates_failed_remote += updates_pushed_remote;
@@ -2310,7 +2255,7 @@ future<> view_builder::do_build_step() {
}
}
}).handle_exception([] (std::exception_ptr ex) {
vlogger.warn("Unexcepted error executing build step: {}. Ignored.", ex);
vlogger.warn("Unexcepted error executing build step: {}. Ignored.", std::current_exception());
});
}

View File

@@ -209,7 +209,7 @@ class view_updates final {
schema_ptr _base;
base_info_ptr _base_info;
std::unordered_map<partition_key, mutation_partition, partition_key::hashing, partition_key::equality> _updates;
size_t _op_count = 0;
mutable size_t _op_count = 0;
const bool _backing_secondary_index;
public:
explicit view_updates(view_and_base vab, bool backing_secondary_index)
@@ -318,8 +318,6 @@ future<query::clustering_row_ranges> calculate_affected_clustering_ranges(
bool needs_static_row(const mutation_partition& mp, const std::vector<view_and_base>& views);
size_t memory_usage_of(const frozen_mutation_and_schema& mut);
/**
* create_virtual_column() adds a "virtual column" to a schema builder.
* The definition of a "virtual column" is based on the given definition

View File

@@ -234,12 +234,12 @@ void view_update_generator::do_abort() noexcept {
}
vug_logger.info("Terminating background fiber");
_db.unplug_view_update_generator();
_as.request_abort();
_pending_sstables.signal();
}
future<> view_update_generator::stop() {
_db.unplug_view_update_generator();
do_abort();
return std::move(_started).then([this] {
_registration_sem.broken();

View File

@@ -96,7 +96,6 @@ struct failure_detector::impl {
clock& _clock;
clock::interval_t _ping_period;
clock::interval_t _ping_timeout;
// Number of workers on each shard.
// We use this to decide where to create new workers (we pick a shard with the smallest number of workers).
@@ -139,7 +138,7 @@ struct failure_detector::impl {
// The unregistering process requires cross-shard operations which we perform on this fiber.
future<> _destroy_subscriptions = make_ready_future<>();
impl(failure_detector& parent, pinger&, clock&, clock::interval_t ping_period, clock::interval_t ping_timeout);
impl(failure_detector& parent, pinger&, clock&, clock::interval_t ping_period);
~impl();
// Inform update_endpoint_fiber() about an added/removed endpoint.
@@ -175,14 +174,12 @@ struct failure_detector::impl {
future<> mark(listener* l, pinger::endpoint_id ep, bool alive);
};
failure_detector::failure_detector(
pinger& pinger, clock& clock, clock::interval_t ping_period, clock::interval_t ping_timeout)
: _impl(std::make_unique<impl>(*this, pinger, clock, ping_period, ping_timeout))
failure_detector::failure_detector(pinger& pinger, clock& clock, clock::interval_t ping_period)
: _impl(std::make_unique<impl>(*this, pinger, clock, ping_period))
{}
failure_detector::impl::impl(
failure_detector& parent, pinger& pinger, clock& clock, clock::interval_t ping_period, clock::interval_t ping_timeout)
: _parent(parent), _pinger(pinger), _clock(clock), _ping_period(ping_period), _ping_timeout(ping_timeout) {
failure_detector::impl::impl(failure_detector& parent, pinger& pinger, clock& clock, clock::interval_t ping_period)
: _parent(parent), _pinger(pinger), _clock(clock), _ping_period(ping_period) {
if (this_shard_id() != 0) {
return;
}
@@ -539,9 +536,11 @@ future<> endpoint_worker::ping_fiber() noexcept {
auto start = clock.now();
auto next_ping_start = start + _fd._ping_period;
auto timeout = start + _fd._ping_timeout;
// If there's a listener that's going to timeout soon (before the ping returns), we abort the ping in order to handle
// A ping should take significantly less time than _ping_period, but we give it a multiple of ping_period before it times out
// just in case of transient network partitions.
// However, if there's a listener that's going to timeout soon (before the ping returns), we abort the ping in order to handle
// the listener (mark it as dead).
auto timeout = start + 3 * _fd._ping_period;
for (auto& [threshold, l]: _fd._listeners_liveness) {
if (l.endpoint_liveness[_id].alive && last_response + threshold < timeout) {
timeout = last_response + threshold;

View File

@@ -120,14 +120,14 @@ public:
// Every endpoint in the detected set will be periodically pinged every `ping_period`,
// assuming that the pings return in a timely manner. A ping may take longer than `ping_period`
// before it's aborted (up to `ping_timeout`), in which case the next ping will start immediately.
// before it's aborted (up to a certain multiple of `ping_period`), in which case the next ping
// will start immediately.
//
// `ping_period` should be chosen so that during normal operation, a ping takes significantly
// less time than `ping_period` (preferably at least an order of magnitude less).
//
// The passed-in value must be the same on every shard.
clock::interval_t ping_period,
// Duration after which a ping is aborted, so that next ping can be started
// (pings are sent sequentially).
clock::interval_t ping_timeout
clock::interval_t ping_period
);
~failure_detector();
@@ -147,7 +147,7 @@ public:
// The listener stops being called when the returned subscription is destroyed.
// The subscription must be destroyed before service is stopped.
//
// `threshold` should be significantly larger than `ping_timeout`, preferably at least an order of magnitude larger.
// `threshold` should be significantly larger than `ping_period`, preferably at least an order of magnitude larger.
//
// Different listeners may use different thresholds, depending on the use case:
// some listeners may want to mark endpoints as dead more aggressively if fast reaction times are important

View File

@@ -62,7 +62,8 @@ ExternalSizeMax=1024G
[Unit]
Description=Save coredump to scylla data directory
Conflicts=umount.target
Before=local-fs.target scylla-server.service
Before=scylla-server.service
After=local-fs.target
DefaultDependencies=no
[Mount]
@@ -72,7 +73,7 @@ Type=none
Options=bind
[Install]
WantedBy=local-fs.target
WantedBy=multi-user.target
'''[1:-1]
with open('/etc/systemd/system/var-lib-systemd-coredump.mount', 'w') as f:
f.write(dot_mount)

View File

@@ -10,7 +10,6 @@
import os
import re
from scylla_util import *
import resource
import subprocess
import argparse
import yaml
@@ -103,34 +102,6 @@ class scylla_cpuinfo:
else:
return len(self._cpu_data["system"])
def configure_iotune_open_fd_limit(shards_count):
try:
fd_limits = resource.getrlimit(resource.RLIMIT_NOFILE)
except (OSError, ValueError) as e:
logging.warning("Could not get the limit of count of open file descriptors!")
logging.warning("iotune will proceed with the default limit. This may cause problems.")
return
precalculated_fds_count = (10 * shards_count) + 500
soft_limit, hard_limit = fd_limits
if hard_limit == resource.RLIM_INFINITY:
# If there is no hard limit, then ensure that soft limit allows enough FDs.
soft_limit = max(soft_limit, precalculated_fds_count)
else:
# If hard_limit is greater than precalculated_fds_count, then set it as soft and as hard limit.
required_fds_count = max(hard_limit, precalculated_fds_count)
soft_limit = max(soft_limit, required_fds_count)
hard_limit = max(hard_limit, required_fds_count)
try:
resource.setrlimit(resource.RLIMIT_NOFILE, (soft_limit, hard_limit))
except (OSError, ValueError) as e:
logging.error(e)
logging.error("Could not set the limit of open file descriptors for iotune!")
logging.error(f"Required FDs count: {precalculated_fds_count}, default limit: {fd_limits}!")
sys.exit(1)
def run_iotune():
if "SCYLLA_CONF" in os.environ:
conf_dir = os.environ["SCYLLA_CONF"]
@@ -171,8 +142,6 @@ def run_iotune():
elif cpudata.smp():
iotune_args += [ "--smp", str(cpudata.smp()) ]
configure_iotune_open_fd_limit(cpudata.nr_shards())
try:
subprocess.check_call([bindir() + "/iotune",
"--format", "envfile",

View File

@@ -257,19 +257,19 @@ if __name__ == '__main__':
dev_type = 'realpath'
LOGGER.error(f'Failed to detect uuid, using {dev_type}: {mount_dev}')
after = ''
after = 'local-fs.target'
wants = ''
if raid and args.raid_level != '0':
after = wants = 'md_service'
after += f' {md_service}'
wants = f'\nWants={md_service}'
opt_discard = ''
if args.online_discard:
opt_discard = ',discard'
unit_data = f'''
[Unit]
Description=Scylla data directory
Before=local-fs.target scylla-server.service
After={after}
Wants={wants}
Before=scylla-server.service
After={after}{wants}
DefaultDependencies=no
[Mount]
@@ -279,7 +279,7 @@ Type=xfs
Options=noatime{opt_discard}
[Install]
WantedBy=local-fs.target
WantedBy=multi-user.target
'''[1:-1]
with open(f'/etc/systemd/system/{mntunit_bn}', 'w') as f:
f.write(unit_data)

View File

@@ -64,6 +64,7 @@ bcp "${packages[@]}" packages/
bcp dist/docker/etc etc/
bcp dist/docker/scylla-housekeeping-service.sh /scylla-housekeeping-service.sh
bcp dist/docker/sshd-service.sh /sshd-service.sh
bcp dist/docker/scyllasetup.py /scyllasetup.py
bcp dist/docker/commandlineparser.py /commandlineparser.py
@@ -73,11 +74,10 @@ bcp dist/docker/scylla_bashrc /scylla_bashrc
run apt-get -y clean expire-cache
run apt-get -y update
run apt-get -y upgrade
run apt-get -y install dialog apt-utils
run bash -ec "echo 'debconf debconf/frontend select Noninteractive' | debconf-set-selections"
run bash -ec "rm -rf /etc/rsyslog.conf"
run apt-get -y install hostname supervisor openjdk-11-jre-headless python2 python3 python3-yaml curl rsyslog sudo systemd
run apt-get -y install hostname supervisor openssh-server openssh-client openjdk-11-jre-headless python2 python3 python3-yaml curl rsyslog sudo
run bash -ec "echo LANG=C.UTF-8 > /etc/default/locale"
run bash -ec "dpkg -i packages/*.deb"
run apt-get -y clean all

View File

@@ -0,0 +1,6 @@
[program:sshd]
command=/sshd-service.sh
stdout_logfile=/dev/stdout
stdout_logfile_maxbytes=0
stderr_logfile=/dev/stderr
stderr_logfile_maxbytes=0

View File

@@ -75,8 +75,7 @@ class ScyllaSetup:
hostname = self._listenAddress
else:
hostname = subprocess.check_output(['hostname', '-i']).decode('ascii').strip()
self._run(["mkdir", "-p", "%s/.cassandra" % home])
with open("%s/.cassandra/cqlshrc" % home, "w") as cqlshrc:
with open("%s/.cqlshrc" % home, "w") as cqlshrc:
cqlshrc.write("[connection]\nhostname = %s\n" % hostname)
def set_housekeeping(self):

15
dist/docker/sshd-service.sh vendored Executable file
View File

@@ -0,0 +1,15 @@
#!/bin/bash
if [ ! -f /run/sshd ]; then
mkdir -p /run/sshd
fi
if [ ! -f /etc/ssh/ssh_host_ed25519_key ]; then
ssh-keygen -t ed25519 -f /etc/ssh/ssh_host_ed25519_key -N ''
fi
if [ ! -f /etc/ssh/ssh_host_rsa_key ]; then
ssh-keygen -t rsa -b 4096 -f /etc/ssh/ssh_host_rsa_key -N ''
fi
/usr/sbin/sshd -D

View File

@@ -145,28 +145,10 @@ class AMIVersionsTemplateDirective(Directive):
version = self._extract_version_from_filename(filename)
return tuple(map(int, version.split("."))) if version else (0,)
def _get_current_version(self, current_version, stable_version):
prefix = 'branch-'
version = current_version
if current_version.startswith(prefix):
version = current_version
elif not stable_version.startswith(prefix):
LOGGER.error("Invalid stable_version format in conf.py. It should start with 'branch-'")
else:
version = stable_version
return version.replace(prefix, '')
def run(self):
app = self.state.document.settings.env.app
current_version = os.environ.get('SPHINX_MULTIVERSION_NAME', '')
stable_version = app.config.smv_latest_version
version_pattern = self._get_current_version(current_version, stable_version)
version_options = self.options.get("version", "")
if version_options:
version_pattern = version_options
version_pattern = self.options.get("version", "")
exclude_patterns = self.options.get("exclude", "").split(",")
download_directory = os.path.join(

View File

@@ -1,23 +1,14 @@
import os
import re
import yaml
from typing import Any, Dict, List
import jinja2
from sphinx import addnodes
from sphinx.application import Sphinx
from sphinx.directives import ObjectDescription
from sphinx.util import logging, status_iterator, ws_re
from sphinx.util.docfields import Field
from sphinx.util.docutils import switch_source_input, SphinxDirective
from sphinx.util.nodes import make_id, nested_parse_with_titles
from sphinx.jinja2glue import BuiltinTemplateLoader
from docutils import nodes
from sphinxcontrib.datatemplates.directive import DataTemplateYAML
from docutils.parsers.rst import directives
from docutils.statemachine import StringList
logger = logging.getLogger(__name__)
CONFIG_FILE_PATH = "../db/config.cc"
CONFIG_HEADER_FILE_PATH = "../db/config.hh"
DESTINATION_PATH = "_data/db_config.yaml"
class DBConfigParser:
@@ -56,18 +47,42 @@ class DBConfigParser:
"""
COMMENT_PATTERN = r"/\*.*?\*/|//.*?$"
all_properties = {}
def __init__(self, config_file_path, config_header_file_path):
def __init__(self, config_file_path, config_header_file_path, destination_path):
self.config_file_path = config_file_path
self.config_header_file_path = config_header_file_path
self.destination_path = destination_path
def _create_yaml_file(self, destination, data):
current_data = None
try:
with open(destination, "r") as file:
current_data = yaml.safe_load(file)
except FileNotFoundError:
pass
if current_data != data:
os.makedirs(os.path.dirname(destination), exist_ok=True)
with open(destination, "w") as file:
yaml.dump(data, file)
@staticmethod
def _clean_description(description):
return (
description.replace("\\n", "")
.replace('<', '&lt;')
.replace('>', '&gt;')
.replace("\n", "<br>")
.replace("\\t", "- ")
.replace('"', "")
)
def _clean_comments(self, content):
return re.sub(self.COMMENT_PATTERN, "", content, flags=re.DOTALL | re.MULTILINE)
def _parse_group(self, group_match, config_group_content):
group_name = group_match.group(1).strip()
group_description = group_match.group(2).strip() if group_match.group(2) else ""
group_description = self._clean_description(group_match.group(2).strip()) if group_match.group(2) else ""
current_group = {
"name": group_name,
@@ -96,16 +111,14 @@ class DBConfigParser:
config_matches = re.findall(self.CONFIG_CC_REGEX_PATTERN, content, re.DOTALL)
for match in config_matches:
name = match[1].strip()
property_data = {
"name": name,
"name": match[1].strip(),
"value_status": match[4].strip(),
"default": match[5].strip(),
"liveness": "True" if match[3] else "False",
"description": match[6].strip(),
"description": self._clean_description(match[6].strip()),
}
properties.append(property_data)
DBConfigParser.all_properties[name] = property_data
return properties
@@ -122,7 +135,7 @@ class DBConfigParser:
if property_data["name"] == property_key:
property_data["type"] = match[0].strip()
def parse(self):
def _parse_db_properties(self):
groups = []
with open(self.config_file_path, "r", encoding='utf-8') as file:
@@ -145,170 +158,26 @@ class DBConfigParser:
return groups
@classmethod
def get(cls, name: str):
return DBConfigParser.all_properties[name]
def run(self, app: Sphinx):
dest_path = os.path.join(app.builder.srcdir, self.destination_path)
parsed_properties = self._parse_db_properties()
self._create_yaml_file(dest_path, parsed_properties)
def readable_desc(description: str) -> str:
return (
description.replace("\\n", "")
.replace('<', '&lt;')
.replace('>', '&gt;')
.replace("\n", "<br>")
.replace("\\t", "- ")
.replace('"', "")
class DBConfigTemplateDirective(DataTemplateYAML):
option_spec = DataTemplateYAML.option_spec.copy()
option_spec["value_status"] = directives.unchanged_required
def _make_context(self, data, config, env):
context = super()._make_context(data, config, env)
context["value_status"] = self.options.get("value_status")
return context
def setup(app: Sphinx):
db_parser = DBConfigParser(
CONFIG_FILE_PATH, CONFIG_HEADER_FILE_PATH, DESTINATION_PATH
)
def maybe_add_filters(builder):
env = builder.templates.environment
if 'readable_desc' not in env.filters:
env.filters['readable_desc'] = readable_desc
class ConfigOption(ObjectDescription):
has_content = True
required_arguments = 1
optional_arguments = 0
final_argument_whitespace = False
# TODO: instead of overriding transform_content(), render option properties
# as a field list.
doc_field_types = [
Field('type',
label='Type',
has_arg=False,
names=('type',)),
Field('default',
label='Default value',
has_arg=False,
names=('default',)),
Field('liveness',
label='Liveness',
has_arg=False,
names=('liveness',)),
]
def handle_signature(self,
sig: str,
signode: addnodes.desc_signature) -> str:
signode.clear()
signode += addnodes.desc_name(sig, sig)
# normalize whitespace like XRefRole does
return ws_re.sub(' ', sig)
@property
def env(self):
document = self.state.document
return document.settings.env
def before_content(self) -> None:
maybe_add_filters(self.env.app.builder)
def _render(self, name) -> str:
item = DBConfigParser.get(name)
if item is None:
raise self.error(f'Option "{name}" not found!')
builder = self.env.app.builder
template = self.config.scylladb_cc_properties_option_tmpl
return builder.templates.render(template, item)
def transform_content(self,
contentnode: addnodes.desc_content) -> None:
name = self.arguments[0]
# the source is always None here
_, lineno = self.get_source_info()
source = f'scylla_config:{lineno}:<{name}>'
fields = StringList(self._render(name).splitlines(),
source=source, parent_offset=lineno)
with switch_source_input(self.state, fields):
self.state.nested_parse(fields, 0, contentnode)
def add_target_and_index(self,
name: str,
sig: str,
signode: addnodes.desc_signature) -> None:
node_id = make_id(self.env, self.state.document, self.objtype, name)
signode['ids'].append(node_id)
self.state.document.note_explicit_target(signode)
entry = f'{name}; configuration option'
self.indexnode['entries'].append(('pair', entry, node_id, '', None))
std = self.env.get_domain('std')
std.note_object(self.objtype, name, node_id, location=signode)
class ConfigOptionList(SphinxDirective):
has_content = False
required_arguments = 2
optional_arguments = 0
final_argument_whitespace = True
option_spec = {
'template': directives.path,
'value_status': directives.unchanged_required,
}
@property
def env(self):
document = self.state.document
return document.settings.env
def _resolve_src_path(self, path: str) -> str:
rel_filename, filename = self.env.relfn2path(path)
self.env.note_dependency(filename)
return filename
def _render(self, context: Dict[str, Any]) -> str:
builder = self.env.app.builder
template = self.options.get('template')
if template is None:
self.error(f'Option "template" not specified!')
return builder.templates.render(template, context)
def _make_context(self) -> Dict[str, Any]:
header = self._resolve_src_path(self.arguments[0])
source = self._resolve_src_path(self.arguments[1])
db_parser = DBConfigParser(source, header)
value_status = self.options.get("value_status")
return dict(data=db_parser.parse(),
value_status=value_status)
def run(self) -> List[nodes.Node]:
maybe_add_filters(self.env.app.builder)
rendered = self._render(self._make_context())
contents = StringList(rendered.splitlines())
node = nodes.section()
node.document = self.state.document
nested_parse_with_titles(self.state, contents, node)
return node.children
def setup(app: Sphinx) -> Dict[str, Any]:
app.add_config_value(
'scylladb_cc_properties_option_tmpl',
default='db_option.tmpl',
rebuild='html',
types=[str])
app.add_object_type(
'confgroup',
'confgroup',
objname='configuration group',
indextemplate='pair: %s; configuration group',
doc_field_types=[
Field('example',
label='Example',
has_arg=False)
])
app.add_object_type(
'confval',
'confval',
objname='configuration option')
app.add_directive_to_domain('std', 'confval', ConfigOption, override=True)
app.add_directive('scylladb_config_list', ConfigOptionList)
return {
"version": "0.1",
"parallel_read_safe": True,
"parallel_write_safe": True,
}
app.connect("builder-inited", db_parser.run)
app.add_directive("scylladb_config_template", DBConfigTemplateDirective)

View File

@@ -1,25 +0,0 @@
from sphinx.directives.other import Include
from docutils.parsers.rst import directives
class IncludeFlagDirective(Include):
option_spec = Include.option_spec.copy()
option_spec['base_path'] = directives.unchanged
def run(self):
env = self.state.document.settings.env
base_path = self.options.get('base_path', '_common')
if env.app.tags.has('enterprise'):
self.arguments[0] = base_path + "_enterprise/" + self.arguments[0]
else:
self.arguments[0] = base_path + "/" + self.arguments[0]
return super().run()
def setup(app):
app.add_directive('scylladb_include_flag', IncludeFlagDirective, override=True)
return {
"version": "0.1",
"parallel_read_safe": True,
"parallel_write_safe": True,
}

View File

@@ -27,12 +27,4 @@ h3 .pre {
hr {
max-width: 100%;
}
dl dt:hover > a.headerlink {
visibility: visible;
}
dl.confval {
border-bottom: 1px solid #cacaca;
}
}

View File

@@ -8,12 +8,25 @@
{% if group.description %}
.. raw:: html
<p>{{ group.description | readable_desc }}</p>
<p>{{ group.description }}</p>
{% endif %}
{% for item in group.properties %}
{% if item.value_status == value_status %}
.. confval:: {{ item.name }}
{{ item.name }}
{{ '=' * (item.name|length) }}
.. raw:: html
<p>{{ item.description }}</p>
{% if item.type %}* **Type:** ``{{ item.type }}``{% endif %}
{% if item.default %}* **Default value:** ``{{ item.default }}``{% endif %}
{% if item.liveness %}* **Liveness** :term:`* <Liveness>` **:** ``{{ item.liveness }}``{% endif %}
.. raw:: html
<hr/>
{% endif %}
{% endfor %}
{% endif %}

View File

@@ -1,7 +0,0 @@
.. raw:: html
<p>{{ description | readable_desc }}</p>
{% if type %}* **Type:** ``{{ type }}``{% endif %}
{% if default %}* **Default value:** ``{{ default }}``{% endif %}
{% if liveness %}* **Liveness** :term:`* <Liveness>` **:** ``{{ liveness }}``{% endif %}

Some files were not shown because too many files have changed in this diff Show More