We already have a dedicated coverage build, however, this build is
dedicated mostly for coverage in boost and standalone unit tests.
This added configuration option will compile every configured
build mode with coverage profiling support (excluding 'coverage' mode).
It also does targeted profiling that is narrowed down only to ScyllaDB
code and doesn't instrument seastar and testing code, this should give
a more accurate coverage reporting and also impact performance less, as
one example, the reactor loop in seastar will not be profiled (along
with everything else).
The targeted profiling is done with the help of the newly added
`coverage_sources.list` file which excludes all seastar sub directories
from the profiling.
Also an extra measure is taken to make sure that the seastar
library will not be linked with the coverage framework
(so it will not dump confusing empty profiles).
Some of the seastar headers are still going to be included in the
profile since they are indirectly included by profiled source files in
order to remove them from the final report a processing step on the
resulting profile will need to take place.
A note about expected performance impact:
It is expected to have minimal impact on performance since the
instrumentation adds counter increments without locking.
Ref: https://clang.llvm.org/docs/UsersManual.html#cmdoption-fprofile-update
This means that the numbers themselves are less reliable but all covered
lines are guarantied to have at least non-zero value.
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
a209ae15 addresses that last -Wimplicit-int-float-conversion warning
in the tree, so we now have the luxury of enabling this warning.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#16640
`--static-boost` is an option provided by `configure.py`. this option is
not used by our CI or building scripts. but in order to be compatible
with the existing behavior of `configure.py`, let's support this option
when building with CMake.
`Boost_USE_STATIC_LIBS` is a cmake variable supported by CMake's
FindBoost and Boost's own `BoostConfig.cmake`. see
https://cmake.org/cmake/help/latest/module/FindBoost.html#other-variables
by default boost is linked via its shared libraries. by setting
this variable, we link boost's static libraries.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#16545
this preserves the existing behavior of `configure.py` in the CMake
generated `build.ninja`.
* configure.py: map 'release' to 'RelWithDebInfo'
* cmake: rename cmake/mode.Release.cmake to cmake/mode.RelWithDebInfo.cmake
* CMakeLists.txt: s/Release/RelWithDebInfo/
Closesscylladb/scylladb#16479
* github.com:scylladb/scylladb:
build: cmake: map 'release' to 'RelWithDebInfo'
build: define BuildType for enclosing build_by_default
It enables interaction with the node through CQL protocol without authentication. It gives full-permission access.
The maintenance socket is available by Unix domain socket with file permissions `755`, thus it is not accessible from outside of the node and from other POSIX groups on the node.
It is created before the node joins the cluster.
To set up the maintenance socket, use the `maintenance-socket` option when starting the node.
* If set to `ignore` maintenance socket will not be created.
* If set to `workdir` maintenance socket will be created in `<node's workdir>/cql.m`.
* Otherwise maintenance socket will be created in the specified path.
The default value is `ignore`.
* With python driver
```python
from cassandra.cluster import Cluster
from cassandra.connection import UnixSocketEndPoint
from cassandra.policies import HostFilterPolicy, RoundRobinPolicy
socket = "<node's workdir>/cql.m"
cluster = Cluster([UnixSocketEndPoint(socket)],
# Driver tries to connect to other nodes in the cluster, so we need to filter them out.
load_balancing_policy=HostFilterPolicy(RoundRobinPolicy(), lambda h: h.address == socket))
session = cluster.connect()
```
Merge note: apparently cqlsh does not support unix domain sockets; it
will have to be fixed in a follow-up.
Closesscylladb/scylladb#16172
* github.com:scylladb/scylladb:
test.py: add maintenance socket test
test.py: enable maintenance socket in tests by default
docs: add maintenance socket documentation
main: add maintenance socket
main: refactor initialization of cql controller and auth service
auth/service: don't create system_auth keyspace when used by maintenance socket
cql_controller: maintenance socket: fix indentation
cql_controller: add option to start maintenance socket
db/config: add maintenance_socket_enabled bool class
auth: add maintenance_socket_role_manager
db/config: add maintenance_socket variable
this preserves the existing behavior of `configure.py` in the CMake
generated `build.ninja`.
* configure.py: map 'release' to 'RelWithDebInfo'
* cmake: rename cmake/mode.Release.cmake to cmake/mode.RelWithDebInfo.cmake
* CMakeLists.txt: s/Release/RelWithDebInfo/
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
in existing `modes` defined in `configure.py`, "release" is mapped to
"RelWithDebInfo". this behavior matches that of seastar's
`configure.py`, where we also map "release" build mode to
"RelWithDebInfo" CMAKE_BUILD_TYPE.
but in scylladb's existing cmake settings, it maps "release" to
"Release", despite "Release" is listed as one of the typical
CMAKE_BUILD_TYPE values.
so, in this change, to prepare for the mapping, `BuildType` is
introduced to map a build mode to its related settings. the
building settings are still kept in `cmake.${CMAKE_BUILD_TYPE}.cmake`,
but the other settings, like if a build type should be enabled or
its mappings, are stored in `BuildType` in `configure.py`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Add `maintenance_socket_role_manager` which will disable all operations
associated with roles to not depend on system_auth keyspace, which may
be not yet created when the maintenance socket starts listening
The task for splitting compaction will run until all sstables
in the main set are split. The only exceptions are shutdown
or user has explicitly asked for abort.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Token group is an abstraction that allows us to easily segregate a
mutation stream into buckets. Groups share the same properties as
compaction groups. Groups follow the ring order and they don't
overlap each other. Groups are defined according to a classifier,
which return an id given a token. It's expected that classifier
return ids in monotonic increasing order.
The reasons for this abstraction are:
1) we don't want to make segregator aware of compaction groups
2) splitting happens before tablet metadata is changed, so the
the segregator will have to classify based on whether the token
belongs to left (group id 0) or right (group id 1) side of
the range to be split.
The reason for not extending sstable writer instead, is that
today, writer consumer can only tell producer to switch to a
new writer, when consuming the end of a partition, but that
would be too late for us, as we have to decide to move to
a new writer at partition start instead.
It will be wired into compaction when it happens in split mode.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
For all compaction types which can be started with api, add an asynchronous
version of api, which returns task_id of the corresponding task manager
task. With the task_id a user can check task status, abort, or wait for it,
using task manager api.
as part of the efforts to migrate to the CMake-based building system,
this change enables us to `configure.py` to optionally create
`build.ninja` with CMake.
in this change, we add a new option named `--use-cmake` to
`configure.py` so we can create `build.ninja`. please note,
instead of using the "Ninja" generator used by Seastar's
`configure.py` script, we use "Ninja Multi-Config" generator
along with `CMAKE_CROSS_CONFIGS` setting in this project.
so that we can generate a `build.ninja` which is capable of
building the same artifacts with multiple configuration.
Refs #15379
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#15916
* github.com:scylladb/scylladb:
build: cmake: add compatibility target of dev-headers
build: add an option to use CMake as the build build system
our CI builds "dev-headers" as a gating check. but the target names
generated by CMake's Ninja Multi-Config generator does not follow
this naming convention. we could have headers:Dev, but still, it's
different from what we are using, before completely switching to
CMake, let's keep this backward compatibility by adding a target
with the same name.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
as part of the efforts to migrate to the CMake-based building system,
this change enables us to `configure.py` to optionally create
`build.ninja` with CMake.
in this change, we add a new option named `--use-cmake` to
`configure.py` so we can create `build.ninja`. please note,
instead of using the "Ninja" generator used by Seastar's
`configure.py` script, we use "Ninja Multi-Config" generator
along with `CMAKE_CROSS_CONFIGS` setting in this project.
so that we can generate a `build.ninja` which is capable of
building the same artifacts with multiple configuration.
Fixes#15379
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
in Python, a raw string is created using 'r' or 'R' prefix. when
creating the regex using Python string, sometimes, we have to use
"\" to escape the parenthesis so the tools like "sed" can consider
the parenthesis as a capture group. but "\" is also used to escape
strings in Python, in order to put "\" as it is, we use "\" instead
of escaping "\" with "\\" which is obscure. when generating rules,
we use multiple-lines string and do not want to have an empty line
at the beginning of the string so added "\" continuation mark.
but we fail to escape some of the "\" in the string, and just put
"\(", despite that Python accepts it after failing to find a matched
escaped char for it, and interprets it as "\\(". this should still
be considered a misuse of oversight. with python's warning enabled,
one is able see its complaints.
in this change, we escape the "\" properly.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#16179
In topology on raft mode, the events "new node starts its group0 server"
and "new node is added to group0 configuration" are not synchronized
with each other. Therefore it might happen that the cluster starts
sending commands to the new node before the node starts its server. This
might lead to harmless, but ugly messages like:
INFO 2023-09-27 15:42:42,611 [shard 0:stat] rpc - client
127.0.0.1:56352 msg_id 2: exception "Raft group
b8542540-5d3b-11ee-99b8-1052801f2975 not found" in no_wait handler
ignored
In order to solve this, the failure detector verb is extended to report
information about whether group0 is alive. The raft rpc layer will drop
messages to nodes whose group0 is not seen as alive.
Tested by adding a delay before group0 is started on the joining node,
running all topology tests and grepping for the aforementioned log
messages.
Fixes: scylladb/scylladb#15853Fixes: scylladb/scylladb#15167Closesscylladb/scylladb#16071
* github.com:scylladb/scylladb:
raft: rpc: introduce destination_not_alive_error
raft: rpc: drop RPCs if the destination is not alive
raft: pass raft::failure_detector to raft_rpc
raft: transfer information about group0 liveness in direct_fd_ping
raft: add server::is_alive
Add a new destination_not_alive_error, thrown from two-way RPCs in case
when the RPC is not issued because the destination is not reported as
alive by the failure detector.
In snapshot transfer code, lower the verbosity of the message printed in
case it fails on the new error. This is done to prevent flakiness in the
CI - in case of slow runs, nodes might get spuriously marked as dead if
they are busy, and a message with the "error" verbosity can cause some
tests to fail.
Clang-18 starts to complain when a constexp value is casted to a
enum and the value is out of the range of the enum values. in this
case, boost intentially cast the out-of-range values to the
type to be casted. so silence this warning at this moment.
since `lexical_cast.hpp` is included in multiple places in the
source tree, this warning is disabled globally.
the warning look like:
```
In file included from /home/kefu/dev/scylladb/types/types.cc:9:
In file included from /usr/include/boost/lexical_cast.hpp:32:
In file included from /usr/include/boost/lexical_cast/try_lexical_convert.hpp:43:
In file included from /usr/include/boost/lexical_cast/detail/converter_numeric.hpp:36:
In file included from /usr/include/boost/numeric/conversion/cast.hpp:33:
In file included from /usr/include/boost/numeric/conversion/converter.hpp:13:
In file included from /usr/include/boost/numeric/conversion/conversion_traits.hpp:13:
In file included from /usr/include/boost/numeric/conversion/detail/conversion_traits.hpp:18:
In file included from /usr/include/boost/numeric/conversion/detail/int_float_mixture.hpp:19:
In file included from /usr/include/boost/mpl/integral_c.hpp:32:
/usr/include/boost/mpl/aux_/integral_wrapper.hpp:73:31: error: integer value -1 is outside the valid range of values [0, 3] for the enumeration type 'udt_buil
tin_mixture_enum' [-Wenum-constexpr-conversion]
73 | typedef AUX_WRAPPER_INST( BOOST_MPL_AUX_STATIC_CAST(AUX_WRAPPER_VALUE_TYPE, (value - 1)) ) prior;
| ^
/usr/include/boost/mpl/aux_/static_cast.hpp:24:47: note: expanded from macro 'BOOST_MPL_AUX_STATIC_CAST'
24 | # define BOOST_MPL_AUX_STATIC_CAST(T, expr) static_cast<T>(expr)
| ^
In file included from /home/kefu/dev/scylladb/types/types.cc:9:
In file included from /usr/include/boost/lexical_cast.hpp:32:
In file included from /usr/include/boost/lexical_cast/try_lexical_convert.hpp:43:
In file included from /usr/include/boost/lexical_cast/detail/converter_numeric.hpp:36:
In file included from /usr/include/boost/numeric/conversion/cast.hpp:33:
In file included from /usr/include/boost/numeric/conversion/converter.hpp:13:
In file included from /usr/include/boost/numeric/conversion/conversion_traits.hpp:13:
In file included from /usr/include/boost/numeric/conversion/detail/conversion_traits.hpp:18:
In file included from /usr/include/boost/numeric/conversion/detail/int_float_mixture.hpp:19:
In file included from /usr/include/boost/mpl/integral_c.hpp:32:
/usr/include/boost/mpl/aux_/integral_wrapper.hpp:73:31: error: integer value -1 is outside the valid range of values [0, 3] for the enumeration type 'int_float_mixture_enum' [-Wenum-constexpr-conversion]
```
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#16082
this was a typo introduced by 781b7de5. which intended to add
-Wignored-qualifiers to the compiling options, but it ended up
adding -Wignore-qualifiers.
in this change, the typo is corrected.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#16124
`-Wignore-qualifiers` is included by -Wextra. but we are not there yet,
with this change, we can keep the changes introducing -Wignore-qualifiers
warnings out of the repo, before applying `-Wextra`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we only check the existence of compile_commands.json
before creating a symlink to build/*/compile_commands.json. but there are
chances that multiple ninja tasks are calling into `configure.py` for
updating `build.ninja`: this does not break the process, as the last one
wins: we just unconditionally `mv build.ninja.new build.ninja` for
updating the this file. but this could break the build of
`'compile_commands.json`: we create a symlink with Python, and if it
fails the Python script errors out.
in this change, we just ignore the `FileExistsError` when creating
the symlink to `compile_commands.json`. because, if this symlink,
we've achieved the goal, and should not consider it a failure.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#15870
actually we've created outdir when using it as the parent directory
of `tempfile.tempdir`, but there are many places where we use
`tempfile.tempdir` for, for instance, testing the compiler flags,
and these tests will be removed once we migrate to CMake, so they
do not really matter when reviewing the change which migrates to
CMake.
the point of this change is to help the review understand the major
changes performed by the migration.
Refs #15379
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
we use `globals().update(vars(args))` for updating the global variables
with a dict in `args`, this is convenient, but it hurts the readability.
let's reference the parsed options explicitly.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
prepare for the change to read the SCYLLA-*-FILE in functions not
doing this in global scope.
Refs #15379
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
on top of per-mode cxxflags, we apply more of them based on settings
and building environment. to reduce the statements in global scope,
let's extract the related code into a function.
Refs #15379
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
`optional_packages` was introduced in 8b0a26f06d, but we don't
offer the alternative versions of libsystemd anymore, and this
variable is not used in `configure.py`, so let's drop it.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
There are few of them that don't need the storage service for anything
but get token metadata from. Move them to own .cc/.hh units.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
instead of updating `modes` in with global statements, update it in
a function. for better readablity. and to reduce the statements in
global scope.
Refs #15379
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
this change adds "jsoncpp" dependency using "pkgs". simpler this
way. it also helps to remove more global statements.
Refs #15379
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
this change extract `get_warnings_options()` out. it helps to
remove more global statements.
Refs #15379
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
this series is one of the steps to remove global statements in `configure.py`.
not only the script is more structured this way, this also allows us to quickly identify the part which should/can be reused when migrating to CMake based building system.
Refs #15379Closesscylladb/scylladb#15668
* github.com:scylladb/scylladb:
build: move check for NIX_CC into dynamic_linker_option()
build: extract dynamic_linker_option(): out
build: move `headers` into write_build_file()
This PR contains several refactoring, related to truncation records handling in `system_keyspace`, `commitlog_replayer` and `table` clases:
* drop map_reduce from `commitlog_replayer`, it's sufficient to load truncation records from the null shard;
* add a check that `table::_truncated_at` is properly initialized before it's accessed;
* move its initialization after `init_non_system_keyspaces`
Closesscylladb/scylladb#15583
* github.com:scylladb/scylladb:
system_keyspace: drop truncation_record
system_keyspace: remove get_truncated_at method
table: get_truncation_time: check _truncated_at is initialized
database: add_column_family: initialize truncation_time for new tables
database: add_column_family: rename readonly parameter to is_new
system_keyspace: move load_truncation_times into distributed_loader::populate_keyspace
commitlog_replayer: refactor commitlog_replayer::impl::init
system_keyspace: drop redundant typedef
system_keyspace: drop redundant save_truncation_record overload
table: rename cache_truncation_record -> set_truncation_time
system_keyspace: get_truncated_position -> get_truncated_positions
`employ_ld_trickery` is only used by `dynamic_linker_option()`, so
move it into this function.
Refs #15379
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
This PR implements a new procedure for joining nodes to group 0, based on the description in the "Cluster features on Raft (v2)" document. This is a continuation of the previous PRs related to cluster features on raft (https://github.com/scylladb/scylladb/pull/14722, https://github.com/scylladb/scylladb/pull/14232), and the last piece necessary to replace cluster feature checks in gossip.
Current implementation relies on gossip shadow round to fetch the set of enabled features, determine whether the node supports all of the enabled features, and joins only if it is safe. As we are moving management of cluster features to group 0, we encounter a problem: the contents of group 0 itself may depend on features, hence it is not safe to join it unless we perform the feature check which depends on information in group 0. Hence, we have a dependency cycle.
In order to solve this problem, the algorithm for joining group 0 is modified, and verification of features and other parameters is offloaded to an existing node in group 0. Instead of directly asking the discovery leader to unconditionally add the node to the configuration with `GROUP0_MODIFY_CONFIG`, two different RPCs are added: `JOIN_NODE_REQUEST` and `JOIN_NODE_RESPONSE`. The main idea is as follows:
- The new node sends `JOIN_NODE_REQUEST` to the discovery leader. It sends a bunch of information describing the node, including supported cluster features. The discovery leader verifies some of the parameters and adds the node in the `none` state to `system.topology`.
- The topology coordinator picks up the request for the node to be joined (i.e. the node in `none` state), verifies its properties - including cluster features - and then:
- If the node is accepted, the coordinator transitions it to `boostrap`/`replace` state and transitions the topology to `join_group0` state. The node is added to group 0 and then `JOIN_NODE_RESPONSE` is sent to it with information that the node was accepted.
- Otherwise, the node is moved to `left` state, told by the coordinator via `JOIN_NODE_RESPONSE` that it was rejected and it shuts down.
The procedure is not retryable - if a node fails to do it from start to end and crashes in between, it will not be allowed to retry it with the same host_id - `JOIN_NODE_REQUEST` will fail. The data directory must be cleared before attempting to add it again (so that a new host_id is generated).
More details about the procedure and the RPC are described in `topology-over-raft.md`.
Fixes: #15152Closesscylladb/scylladb#15196
* github.com:scylladb/scylladb:
tests: mark test_blocked_bootstrap as skipped
storage_service: do not check features in shadow round
storage_service: remove raft_{boostrap,replace}
topology_coordinator: relax the check in enable_features
raft_group0: insert replaced node info before server setup
storage_service: use join node rpc to join the cluster
topology_coordinator: handle joining nodes
topology_state_machine: add join_group0 state
storage_service: add join node RPC handlers
raft: expose current_leader in raft::server
storage_service: extract wait_for_live_nodes_timeout constant
raft_group0: abstract out node joining handshake
storage_service: pass raft_topology_change_enabled on rpc init
rpc: add new join handshake verbs
docs: document the new join procedure
topology_state_machine: add supported_features to replica_state
storage_service: check destination host ID in raft verbs
group_state_machine: take reference to raft address map
raft_group0: expose joined_group0
instead relying on updating `global()` with `args`, pass the
argument explicitly, this helps us understand the data dependency
in this script better.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>