scylla

Author	SHA1	Message	Date
Patryk Jędrzejczak	ddfd9c3173	raft topology: join: shut down a node on error in response handler If the joining node fails while handling the response from the topology coordinator, it hangs even though it knows the join operation has failed. Therefore, we ensure it shuts down in this patch. We rethrow the caught exception to ensure the topology coordinator knows the RPC has failed. In case of rejection, it does not matter because the coordinator behaves the same way in both cases: RPC success and RPC failure. It transitions the rejected node to the left state. However, in case of acceptance, this only happens if the RPC fails. Otherwise, the coordinator continues handling the request. On abort, one of the two events happens first: - the new catch statement catches `abort_requested_exeption` and sets it on `_join_node_response_done`, - `co_await _ss._join_node_response_done.get_shared_future(as);` in `join_node_rpc_handshaker::post_server_start` resolves with `abort_requested_exception` after triggering `as`. In both cases, `join_node_rpc_handshaker::post_server_start` throws `abort_requested_exception`. Therefore, we don't need a separate catch statement for `abort_requested_exception` in `join_node_response_handler`.	2024-01-11 14:19:37 +01:00
Botond Dénes	697ebef149	Merge 'tasks: compaction: drop regular compaction tasks after they are finished' from Aleksandra Martyniuk Make compaction tasks internal. Drop all internal tasks without parents immediately after they are done. Fixes: #16735 Refs: #16694. Closes scylladb/scylladb#16698 * github.com:scylladb/scylladb: compaction: make regular compaction tasks internal tasks: don't keep internal root tasks after they complete	2024-01-11 12:10:44 +02:00
Nadav Har'El	5762170526	main: fix "starting {}" messages The supervisor::notify() function expects a single string - not a format and parameters. Calls we have in main.cc like supervisor::notify("starting {}", what); end up printing the silly message "starting {}". The second parameter "what" is converted to a bool, also having an unintended consequence for telling notify we're "ready". This patch fixes it to call fmt::format, as intended. Fixes #16728 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#16729	2024-01-11 11:43:07 +02:00
Botond Dénes	ac69473bac	Merge 'utils/pretty_printers: add "I" specifier support' from Kefu Chai this is to mimic the formatting of `human_readable_value`, and to prepare for consolidating these two formatters, so we don't have two pretty printers in the tree. Closes scylladb/scylladb#16726 * github.com:scylladb/scylladb: utils/pretty_printers: add "I" specifier support utils/pretty_printers: use the formatting of to_hr_size()	2024-01-11 10:54:14 +02:00
Kefu Chai	0c2ef5de54	test/unit/bptree_validation: use "{}" for formatting test_data before this change, "{:d}" is used for formatting `test_data` y bptree_stress_test.cc. but the "d" specifier is only used for formatting integers, not for formatting `test_data` or generic data types, so this fails when the test is compiled with {fmt} v10, like: ``` In file included from /home/kefu/dev/scylladb/test/unit/bptree_stress_test.cc:20: /home/kefu/dev/scylladb/test/unit/bptree_validation.hh:294:35: error: call to consteval function 'fmt::basic_format_string<char, test_data &, test_data &>::basic_format_string<char[31], 0>' is not a constant expression 294 \| fmt::print(std::cout, "Iterator broken, {:d} != {:d}\n", val, *_fwd); \| ^ /home/kefu/dev/scylladb/test/unit/bptree_validation.hh:267:20: note: in instantiation of member function 'bplus::iterator_checker<tree_test_key_base, test_data, test_key_compare, 16>::forward_check' requested here 267 \| return forward_check(); \| ^ /home/kefu/dev/scylladb/test/unit/bptree_stress_test.cc:92:35: note: in instantiation of member function 'bplus::iterator_checker<tree_test_key_base, test_data, test_key_compare, 16>::step' requested here 92 \| if (!itc->step()) { \| ^ /usr/include/fmt/core.h:2322:31: note: non-constexpr function 'throw_format_error' cannot be used in a constant expression 2322 \| if (!in(arg_type, set)) throw_format_error("invalid format specifier"); \| ^ ``` in this change, instead of specifying "{:d}", let's just use "{}", which works for both integer and `test_data`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16727	2024-01-11 10:53:33 +02:00
Kefu Chai	6c06751640	cdc: not include unused headers these unused includes were identified by clangd. see https://clangd.llvm.org/guides/include-cleaner#unused-include-warning for more details on the "Unused include" warning. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16725	2024-01-11 09:13:37 +02:00
Kefu Chai	5874652967	cql3: define format_as() for formatting cql3::cql3_type in the same spirit of `724a6e26`, format_as() is defined for cql3::cql3_type. despite that this is not used yet by fmt v9, where we still have FMT_DEPRECATED_OSTREAM, this prepares us for fmt v10. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16232	2024-01-11 09:07:18 +02:00
Botond Dénes	3d1667c720	Update ./tools/java submodule * ./tools/java e106b500...24e51259 (1): > build.xml: update io.airlift to 0.9	2024-01-11 08:55:51 +02:00
Lakshmi Narayanan Sreethar	76f0d5e35b	reader_permit: store schema_ptr instead of raw schema pointer Store schema_ptr in reader permit instead of storing a const pointer to schema to ensure that the schema doesn't get changed elsewhere when the permit is holding on to it. Also update the constructors and all the relevant callers to pass down schema_ptr instead of a raw pointer. Fixes #16180 Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com> Closes scylladb/scylladb#16658	2024-01-11 08:37:56 +02:00
Kefu Chai	f11a53856d	utils/pretty_printers: add "I" specifier support this is to mimic the formatting of `human_readable_value`, and to prepare for consolidating these two formatters, so we don't have two pretty printers in the tree. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-01-11 14:33:47 +08:00
Patryk Wrobel	f4e311e871	cql3: add formatter for cql3::expr::oper_t This change introduces a specialization of fmt::formatter for cql3::expr::oper_t. This enables the usage of this type with FMTv10, which dropped the default generated formatter. Usage of cql3::expr::oper_t without the defined formatter resulted in compilation error when compiled with FMTv10. Refs: #13245 Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com> Closes scylladb/scylladb#16719	2024-01-11 08:33:35 +02:00
Kefu Chai	7d627b328f	utils/pretty_printers: use the formatting of to_hr_size() keep the precision of 4 digits, for instance, so that we format "8191" as "8191" instead of as "8 Ki". this is modeled after the behavior of `to_hr_size()`. for better user experience. and also prepares to consolidate these two formatters. tests are updated to exercise both IEC and SI notations. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-01-11 14:33:03 +08:00
Kefu Chai	8c4576f55d	api: storage_service: correct the descriptions of two APIs this change is more about documentation of the RESTful API of storage_service. as we define the API using Swagger 2.0 format, and generate the API document from the definitions. so would be great if the document matches with the API. in this change, since the keyspace is not queried but mutated. so changed to a more accurate description. from the code perspective, it is but cosmetic. as we don't read the description fields or verify them in our tests. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16637	2024-01-11 08:28:14 +02:00
Kamil Braun	6e39c2ffde	gossiper: store source_location instead of string in endpoint_permit The original code extracted only the function_name from the source_location for logging. We'll use more information from the source_location in later commits.	2024-01-10 17:02:52 +01:00
Kamil Braun	664349a10f	storage_service: more verbose logging in handle_state_normal In a longevity test reported in scylladb/scylladb#16668 we observed that NORMAL state is not being properly handled for a node that replaced another node. Either handle_state_normal is not being called, or it is but getting stuck in the middle. Which is the case couldn't be determined from the logs, and attempts at creating a local reproducer failed. Improve the INFO level logging in handle_state_normal to aid debugging in the future. The amount of logs is still constant per-node. Even though some log messages report all tokens owned by a node, handle_state_normal calls are still rare. The most "spammy" situation is when a node starts and calls handle_state_normal for every other node in the cluster, but it is a once-per-startup event.	2024-01-10 16:39:55 +01:00
Patryk Wrobel	a64eb92369	utils: specialize fmt::formatter for utils::tagged_integer This change introduces a specialization of fmt::formatter for utils::tagged_integer. This enables the usage of this type with FMTv10, which dropped the default generated formatter. Usage of utils::tagged_integer without the defined formatter resulted in compilation error when compiled with FMTv10. Refs: #13245 Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com> Closes scylladb/scylladb#16715	2024-01-10 18:32:43 +03:00
Nadav Har'El	083868508c	Update seastar submodule * seastar 70349b74...0ffed835 (15): > http/client: include used header files > treewide: s/format/fmt::format/ when appropriate > shared_future: shared_state::run_and_dispose(): release reserve of _peers Fixes #16493 > metrics_tester - A demo app to test metrics > build: silence the waring of -Winclude-angled-in-module-purview > estimated_histogram.hh: Support native histograms > prometheus.cc: Clean the pick representation code > prometheus.cc add native histogram > memory: fix the indentation. > metrics_types.hh: add optional native histogram information > memory: include used header > prometheus.cc: Add filter, aggregate by label and skip_when_empty > src/proto/metrics2.proto: newer proto buf definition > print: deprecate format_separated() > reactor: use fmt::join() when appropriate Closes scylladb/scylladb#16712	2024-01-10 14:02:04 +02:00
Nadav Har'El	39dd2a2690	cql-pytest: translated Cassandra's test for LWT with static column This is a translation of Cassandra's CQL unit test source file validation/operations/InsertUpdateIfConditionStaticsTest.java into our cql-pytest framework. This test file checks various LWT conditional updates which involve static columns or UDTs (there are separate test file for LWT conditional updates that do not involve static columns). This test did not uncover any new bugs, but demonstrates yet again several places where we intentionally deviated from Cassandra's behavior, forcing me to add "is_scylla" checks in many of the checks to allow them to pass on both Scylla and Cassanda. These deviations are known, intentional and some are documented in docs/kb/lwt-differences.rst but not all, so it's worth listing here the ones re-discovered by this test: 1. On a successful conditional write, Cassandra returns just True, Scylla also returns the old contents of the row. This difference is officially documented in docs/kb/lwt-differences.rst. 2. On a batch request, Scylla always returns a row per statement, Cassandra doesn't - it often returns just a single failed row, or just True if the whole batch succeeded. This difference is officially documented in docs/kb/lwt-differences.rst. 3. In a DELETE statement with a condition, in the returned row Cassandra lists the deleted column first - while Scylla lists the static column first (as in any other row). This difference is probably inconsequential, because columns also have names so their order in the response usually doesn't matter. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#16643	2024-01-10 12:14:06 +02:00
Nadav Har'El	b1a441ba56	test/cql-pytest: correct xfail status of timestamp parser The recently-added test test_fromjson_timestamp_submilli demonstrated a difference between Scylla's and Cassandra's parsing timestamps in JSON: Trying to use too many (more than 3) digits of precision is forbidden in Scylla, but ignored in Cassandra. So we marked the test "xfail", suggesting we think it's a Scylla bug that should be fixed in the future. However, it turns out that we already had a different test, test_type_timestamp_from_string_overprecise, which showed the same difference in a different context (without JSON). In that older test, the decision was to consider this a Cassandra bug, not Scylla bug - because Cassandra seemingly allows the sub-millisecond timestap but in reality drops the extra precision. So we need to be consistent in the tests - this is either a Scylla bug or a Cassandra bug, we can't make once choice in one test and another in a different test :-) So let's accept our older decision, and consider Scylla's behavior the correct one in this case. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#16586	2024-01-10 12:12:26 +02:00
Kefu Chai	eb9216ef11	compaction: do not include unused headers these unused includes were identified by clangd. see https://clangd.llvm.org/guides/include-cleaner#unused-include-warning for more details on the "Unused include" warning. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16707	2024-01-10 11:07:36 +02:00
Kefu Chai	317af97e41	test/pylib: shutdown unix RESTful client when stopping the ManagerClient, it would be better to close all connected connector, otherwise aiohttp complains like: ``` 13:57:53.763 ERROR> Unclosed connector connections: ['[(<aiohttp.client_proto.ResponseHandler object at 0x7f939d2ca5f0>, 96672.211256817)]'] connector: <aiohttp.connector.UnixConnector object at 0x7f939d2da890> ``` this warning message is printed to the console, and it is distracting when testing manually. so, in this change, let's close the client connecting to unix domain socket. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16675	2024-01-10 11:07:14 +02:00
Kefu Chai	f61f6c27e3	gms: add formatter for gms::endpoint_state before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we define a formatter for gms::endpoint_state, and change update the callers of `operator<<` to use `fmt::print()`. but we cannot drop `operator<<` yet, as we are still using the templated operator<< and templated fmt::formatter to print containers in scylla and in seastar -- they are still using `operator<<` under the hood. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16705	2024-01-10 09:16:23 +02:00
Sylwia Szunejko	eabe97bcd0	transport: remove additional options from TABLETS_ROUTING_V1 Closes scylladb/scylladb#16701	2024-01-10 09:00:25 +02:00
Botond Dénes	5981900dca	Update tools/jmx submodule * tools/jmx 80ce5996...3257897a (1): > scylla-apiclient: drop hk2-locator dependency	2024-01-10 08:53:20 +02:00
Kefu Chai	34b03867b2	tools: do not include unused headers these unused includes were identified by clangd. see https://clangd.llvm.org/guides/include-cleaner#unused-include-warning for more details on the "Unused include" warning. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16673	2024-01-10 08:44:09 +02:00
Kefu Chai	0dc7db54d1	build: cmake: add "unit_test_list" target this target is used by test.py for enumerating unit tests * test/CMakeLists.txt: append executable's full path to `scylla_tests`. add `unit_test_list` target printing `scylla_tests`, please note, `cmake -E echo` does not support the `-e` option of `echo`, and ninja does not support command line with newline in it, we have to use `echo` to print the list of tests. * test/{boost,raft,unit}/CMakeLists.txt: set scylla_tests only if $PWD/suite.yaml exists. we could hardwire this logic in these files, as it is known that this file exists in these directory, but this is still put this way, so that it serves as a comment explaining that the reason why we update scylla_tests here but not somewhere else where we also use `add_scylla_test()` function is just suite.yaml exists here. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16702	2024-01-10 08:43:04 +02:00
Botond Dénes	4aba445ef6	Merge 'test.py: adapt to cmake building system' from Kefu Chai in this series, we adapt to cmake building system by mapping scylla build mode to `CMAKE_BUILD_TYPE` and by using `build/build.ninja` if it exists, as `configure.py` generates `build.ninja` in `build` when using CMake for creating `build.ninja`. Closes scylladb/scylladb#16703 * github.com:scylladb/scylladb: test.py: build using build/build.ninja when it exists test.py: extract ninja() test.py: extract path_to() test.py: define all_modes as a dict of mode:CMAKE_BUILD_TYPE	2024-01-10 08:39:33 +02:00
Kefu Chai	382a5e2d0c	test.py: build using build/build.ninja when it exists CMake puts `build.ninja` under `build`, so use it if it exists, and fall back to current directory otherwise. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-01-10 10:01:02 +08:00
Kefu Chai	6674e87842	test.py: extract ninja() use ninja() to build target using `ninja`. since CMake puts `build.ninja` under "build", while `configure.py` puts it under the root source directory, this change prepares us for a follow-up change to build with build/build.ninja. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-01-10 10:01:02 +08:00
Kefu Chai	5fda822c4e	test.py: extract path_to() use path_to() to find the path to the directory under build directory. this change helps to find the executables built using CMake as well. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-01-10 10:01:02 +08:00
Kefu Chai	0b11ae9fe6	test.py: define all_modes as a dict of mode:CMAKE_BUILD_TYPE because scylla build mode and CMAKE_BUILD_TYPE is not identical, let's define `all_modes` as a dict so we can look it up. this change prepares for a follow-up commit which adds a path resolver which support both build system generator: the plain `configure.py` and CMake driven by `configure.py`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-01-10 10:01:02 +08:00
Botond Dénes	f4f724921c	load_meter: get_load_map(): don't unconditionally dereference _lb Said method has a check on `_lb` not being null, before accessing it. However, since `0e5754a`, there was an unconditional access, adding an entry for the local node. Move this inside the if, so it is covered by the null-check. The only caller is the api (probably nodetool), the worst that can happend is that they get completely empty load-map if they call too early during startup. Fixes: #16617 Closes scylladb/scylladb#16659	2024-01-09 16:02:12 +03:00
Aleksandra Martyniuk	6b87778ef2	compaction: make regular compaction tasks internal Regular compaction tasks are internal. Adjust test_compaction_task accordingly: modify test_regular_compaction_task, delete test_running_compaction_task_abort (relying on regular compaction) which checks are already achived by test_not_created_compaction_task_abort. Rename the latter.	2024-01-09 13:13:54 +01:00
Aleksandra Martyniuk	6b2b384c83	tasks: don't keep internal root tasks after they complete	2024-01-09 13:13:54 +01:00
Pavel Emelyanov	cdf5124003	Merge 'tools/scylla-sstable: pass error handler to utils::config_file::read_from_file()' from Botond Dénes The default error handler throws an exception, which means scylla-sstable will exit with exception if there is any problem in the configuration. Not even ScyllaDB itself is this harsh -- it will just log a warning for most errors. A tool should be much more lenient. So this patch passes an error handler which just logs all errors with debug level. If reading an sstable fails, the user is expected to investigate turning debug-level logging on. When they do so, they will see any problems while reading the configuration (if it is relevant, e.g. when using EAR). Fixes: #16538 Closes scylladb/scylladb#16657 * github.com:scylladb/scylladb: tools/scylla-sstable: pass error handler to utils::config_file::read_from_file() tools/scylla-sstable: allow always passing --scylla-yaml-file option	2024-01-09 14:28:49 +03:00
Kefu Chai	b91eb89ffa	gms: heart_beat_state: add formatter for gms::heart_beat_state before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we define a formatter for gms::heart_beat_state, and remove its operator<<(). the only caller site of its operator<< is updated to use `fmt::print()` Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16652	2024-01-09 11:52:40 +02:00
Kefu Chai	cca786e847	gms: endpoint_state: fix a typo in comment Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16653	2024-01-09 11:51:49 +02:00
Kefu Chai	c1beba1f7d	utils: config_file: throw bpo::invalid_option_value() when seeing invalid option before this change, `std::invalid_argument` is thrown by `bpo::notify(configuration)` in `app_template::run_deprecated()` when invalid option is passed in via command line. `utils::named_value` throws `std::invalid_argument` if the given value is not listed in `_allowed_values`. but we don't handle `std::invalid_argument` in `app_template::run_deprecated()`. so the application aborts with unhandled exception if the specified argument is not allowed. in this change, we convert the `std::invalid_argument` to a derived class of `bpo::error` in the customized notify handler, so that it can be handled in `app_template::run_deprecated()`. because `name_value::operator()` is also used otherwhere, we should not throw a bpo::error there. so its exception type is preserved. Fixes #16687 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16688	2024-01-09 11:49:06 +02:00
Kefu Chai	a6152cb87b	sstables: do not include unused headers these unused includes were identified by clangd. see https://clangd.llvm.org/guides/include-cleaner#unused-include-warning for more details on the "Unused include" warning. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16666	2024-01-09 11:45:44 +02:00
Kefu Chai	be364d30fd	db: do not include unused headers these unused includes were identified by clangd. see https://clangd.llvm.org/guides/include-cleaner#unused-include-warning for more details on the "Unused include" warning. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16664	2024-01-09 11:44:19 +02:00
Aleksandra Martyniuk	6f13e55187	tasks: call release_resources when task is finished Call task_manager::task::impl::release_resources when task is finished instead of putting the responsibility on user. Closes scylladb/scylladb#16660	2024-01-09 11:41:54 +02:00
Pavel Emelyanov	cfeff893c6	network_topology_strategy: Print map of dc:rf pairs in one go The strategy constructor prints the dc:rf at the end making the sstring for it by hand. Modern fmt-based logger can format unordered_map-s on its own. The message would look slightly different though: Configured datacenter replicas are: foo:1 bar:2 into Configured datacenter replicas are: {"foo": 1, "bar": 2} Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#16443	2024-01-09 11:30:49 +02:00
Kamil Braun	d93074e87e	cql3: don't parallelize select aggregates to local tables We've observed errors during shutdown like the following: ``` ERROR 2023-12-26 17:36:17,413 [shard 0:main] raft - [088f01a3-a18b-4821-b027-9f49e55c1926] applier fiber stopped because of the error: std::_Nested_exception<raft::state_machine_error> (State machine error at raft/server.cc:1230): std::runtime_error (forward_service is shutting down) INFO 2023-12-26 17:36:17,413 [shard 0:strm] storage_service - raft_state_monitor_fiber aborted with raft::stopped_error (Raft instance is stopped) ERROR 2023-12-26 17:36:17,413 [shard 0:strm] storage_service - raft topology: failed to fence previous coordinator raft::stopped_error (Raft instance is stopped, reason: "background error, std::_Nested_exception<raft::state_machine_error> (State machine error at raft/server.cc:1230): std::runtime_error (forward_service is shutting down)") ``` some CQL statement execution was trying to use `forward_service` during shutdown. It turns out that the statement is in `system_keyspace::load_topology_state`: ``` auto gen_rows = co_await execute_cql( format("SELECT count(range_end) as cnt FROM {}.{} WHERE key = '{}' AND id = ?", NAME, CDC_GENERATIONS_V3, cdc::CDC_GENERATIONS_V3_KEY), gen_uuid); ``` It's querying a table in the `system` keyspace. Pushing local table queries through `forward_service` doesn't make sense as the data is not distributed. Excluding local tables from this logic also fixes the shutdown error. Fixes scylladb/scylladb#16570 Closes scylladb/scylladb#16662	2024-01-08 14:44:22 -05:00
Kamil Braun	d4f4b58f3a	Merge 'topology_coordinator: reject removenode if the removed node is alive' from Patryk Jędrzejczak The removenode operation is defined to succeed only if the node being removed is dead. Currently, we reject this operation on the initiator side (in `storage_service::raft_removenode`) when the failure detector considers the node being removed alive. However, it is possible that even if the initiator considers the node dead, the topology coordinator will consider it alive when handling the topology request. For example, the topology coordinator can use a bigger failure detector timeout, or the node being removed can suddenly resurrect. This PR makes the topology coordinator reject removenode if the node being removed is considered alive. It also adds `test_remove_alive_node` that verifies this change. Fixes scylladb/scylladb#16109 Closes scylladb/scylladb#16584 * github.com:scylladb/scylladb: test: add test_remove_alive_node topology_coordinator: reject removenode if the removed node is alive test: ManagerClient: remove unused wait_for_host_down test: remove_node: wait until the node being removed is dead	2024-01-08 12:39:23 +01:00
Kamil Braun	d11e824802	Merge 'storage_service: make all Raft-based operations abortable' from Patryk Jędrzejczak During a shutdown, we call `storage_service::stop_transport` first. We may try to apply a Raft command after that, or still be in the the process of applying a command. In such a case, the shutdown process will hang because Raft retries replicating a command until it succeeds even in the case of a network error. It will stop when a corresponding abort source is set. However, if we pass `nullptr` to a function like `add_entry`, it won't stop. The shutdown process will hang forever. We fix all places that incorrectly pass `nullptr`. These shutdown hangs are not only theoretical. The incorrect `add_entry` call in `update_topology_state` caused scylladb/scylladb#16435. Additionally, we remove the default `nullptr` values in all member functions of `server` and `raft_group0_client` to avoid similar bugs in the future. Fixes scylladb/scylladb#16435 Closes scylladb/scylladb#16663 * github.com:scylladb/scylladb: server, raft_group0_client: remove the default nullptr values storage_service: make all Raft-based operations abortable	2024-01-08 11:30:56 +01:00
Botond Dénes	9119bcbd67	tools/scylla-sstable: pass error handler to utils::config_file::read_from_file() The default error handler throws an exception, which means scylla-sstable will exit with exception if there is any problem in the configuration. Not even ScyllaDB itself is this harsh -- it will just log a warning for most errors. A tool should be much more lenient. So this patch passes an error handler which just logs all errors with debug level. If reading an sstable fails, the user is expected to investigate turning debug-level logging on. When they do so, they will see any problems while reading the configuration (if it is relevant, e.g. when using EAR). Fixes: #16538	2024-01-08 02:18:15 -05:00
Botond Dénes	16791a63c9	tools/scylla-sstable: allow always passing --scylla-yaml-file option Currently, if multiple schema sources are provided, the tool complains about ambiguity, over which to consider. One of these option is --scylla-yaml-file. However, we want to allow passing this option any time, otherwise encrypted sstables cannot be read. So relax the multiple schema source check to also allow this option to be used even when e.g. --schema-file was used as the schema source.	2024-01-08 02:18:12 -05:00
Nadav Har'El	61395a3658	Update tools/java submodule * tools/java b7ebfd38...e106b500 (3): > build.xml: update scylla-driver-core to 3.11.5.1 > Use ReplicaOrdering.NEUTRAL in TokenAwarePolicy to respect RackAwareness > treewide: update "guava" package Refs https://github.com/scylladb/scylladb/pull/16491 Refs https://github.com/scylladb/scylla-tools-java/pull/372	2024-01-07 15:12:15 +02:00
Patryk Jędrzejczak	df2034ebd7	server, raft_group0_client: remove the default nullptr values The previous commit has fixed 5 bugs of the same type - incorrectly passing the default nullptr to one of the changed functions. At least some of these bugs wouldn't appear if there was no default value. It's much harder to make this kind of a bug if you have to write "nullptr". It's also much easier to detect it in review. Moreover, these default values are rarely used outside tests. Keeping them is just not worth the time spent on debugging.	2024-01-05 18:45:50 +01:00
Patryk Jędrzejczak	3d4af4ecf1	storage_service: make all Raft-based operations abortable During a shutdown, we call `storage_service::stop_transport` first. We may try to apply a Raft command after that, or still be in the the process of applying a command. In such a case, the shutdown process will hang because Raft retries replicating a command until it succeeds even in the case of a network error. It will stop when a corresponding abort source is set. However, if we pass `nullptr` to a function like `add_entry`, it won't stop. The shutdown process will hang forever. We fix all places that incorrectly pass `nullptr`. These shutdown hangs are not only theoretical. The incorrect `add_entry` call in `update_topology_state` caused scylladb/scylladb#16435.	2024-01-05 18:45:20 +01:00

... 3 4 5 6 7 ...

40759 Commits