scylla

Author	SHA1	Message	Date
Tomasz Grabiec	7d0f4c10a2	test: tablets: Add test for failed streaming being fenced away	2023-12-06 18:37:01 +01:00
Patryk Jędrzejczak	cd7b282db6	test: ManagerClient: introduce servers_add We add a new function - servers_add - that allows adding multiple servers concurrently to a cluster. It makes use of a concurrent bootstrap now supported in the raft-based topology. servers_add doesn't have the replace_cfg parameter. The reason is that we don't support concurrent replace operations, at least for now. There is an implementation detail in ScyllaCluster.add_servers. We cannot simply do multiple calls to add_server concurrently. If we did that in an empty cluster, every node would take itself as the only seed and start a new cluster. To solve this, we introduce a new field - initial_seed. It is used to choose one of the servers as a seed for all servers added concurrently to an empty cluster. Note that the add_server calls in asyncio.gather in add_servers cannot race with each other when setting initial_seed because there is only one thread. In the future, we will also start all initial servers concurrently in ScyllaCluster.install_and_start. The changes in this commit were designed in a way that will make changing install_and_start easy.	2023-11-24 09:39:01 +01:00
Patryk Jędrzejczak	aca90e6640	test: ManagerClient: introduce _create_server_add_data We introduce this function to avoid code duplication. After the following commits, it will also be used in the new ManagerClient.servers_add function.	2023-11-24 09:39:01 +01:00
Patryk Jędrzejczak	9775b1c12d	test: server_add: wait until the node being replaced is dead In the following commits, we make the topology coordinator reject join requests if the node being replaced is considered alive by the gossiper. Before making this change, we need to adapt the testing framework so that we don't have flaky replace operations that fail because the node being replaced hasn't been marked as dead yet. We achieve this by waiting until all other running nodes see the node being replaced as dead in all replace operations.	2023-11-21 12:39:16 +01:00
Patryk Jędrzejczak	18ed89f760	test: server_add: add support for expected errors After this change, if we try to add a server and it fails with an expected error, the add_server function will not throw. Also, the server will be correctly installed and stopped. Two issues are motivating this feature. The first one is that if we want to add a server while expecting an error, we have to do it in two steps: - call server_add with the start parameter set to False, - call server_start with the expected_error parameter. It is quite inconvenient. The second one is that we want to be able to test the replace operation when it is considered incorrect, for example when we try to replace an alive node. To do this, we would have to remove some assertions from ScyllaCluster.add_server. However, we should not remove them because they give us clear information when we write an incorrect test. After adding the expected_error parameter, we can ignore these assertions only when we expect an error. In this way, we enable testing failing replace operations without sacrificing the testing framework's protection.	2023-11-21 12:39:16 +01:00
Paweł Zakrzewski	a0dcc154c1	test: add the auth_cluster test suite This commit adds the auth_cluster test suite to test a custom scenario involving password authentication: - create a cluster of 2 nodes with password authentication - down one node - the other node should refuse login stating that it couldn't reach QUORUM References ScyllaDB OSS #2339	2023-11-13 14:04:28 +01:00
Kamil Braun	7dcee7de02	test/pylib: implement `expected_error` for decommission and removenode You can now pass `expected_error` to `ManagerClient.decommission_node` and `ManagerClient.remove_node`. Useful in combination with error injections, for example. Closes scylladb/scylladb#15650	2023-10-17 16:25:43 +03:00
Kamil Braun	05ede7a042	test/pylib: always return a response from `put_json` In `20ff2ae5e1` mutating endpoints were changed to use PUT. But some of them return a response, and I forgot to provide `response_type` parameter to `put_json` (which causes `RESTClient` to actually obtain the response). These endpoints now return `None`. Fix this. Closes scylladb/scylladb#15674	2023-10-09 14:35:04 +03:00
Kamil Braun	d3bc0d47e0	test/pylib: always return data as JSON from endpoints Some endpoint handlers return JSON, some return text, some return empty responses. Reduce the number of different handler types by making the text case a subcase of the JSON case. This also simplifies some code on the `ManagerClient` side, which would have to deserialize data from text (because some endpoint handlers would serialize data into text for no particular reason). And it will allow reducing boilerplate in later commits even further.	2023-10-06 11:24:02 +02:00
Kamil Braun	f848d7b5c0	test/pylib: use JSON data to pass `expected_error` in `server_start` Most other endpoints receive data through request body as JSON, this one endpoint is an exception for some reason. Make it consistent with others.	2023-10-06 10:55:45 +02:00
Kamil Braun	20ff2ae5e1	test/pylib: use PUT instead of GET for mutating endpoints `ScyllaClusterManager` registers a bunch of HTTP endpoints which `ManagerClient` uses to perform operations on a cluster during a topology test. The endpoints were inconsistently using verbs, like using GET for endpoints that would have side effects. Use PUT for these.	2023-10-06 10:55:45 +02:00
Kamil Braun	33463df7d2	test/pylib: fix some type errors	2023-10-06 10:55:45 +02:00
Botond Dénes	70e26e5a10	test/pylib: add REST methods to get node exe and workdir paths	2023-09-22 02:53:15 -04:00
Botond Dénes	7e7101c180	Revert "Merge 'database, storage_proxy: Reconcile pages with dead rows and partitions incrementally' from Botond Dénes" This reverts commit `628e6ffd33`, reversing changes made to `45ec76cfbf`. The test included with this PR is flaky and often breaks CI. Revert while a fix is found. Fixes: #15371	2023-09-13 10:45:37 +03:00
Botond Dénes	46e37436d0	test/pylib: add REST methods to get node exe and workdir paths	2023-09-11 07:02:14 -04:00
Aleksandra Martyniuk	ede8182dd4	test: fix types and variable names in wait_for_host_down Fix types and variable names in ManagerClient::wait_for_host_down and related methods.	2023-09-05 15:01:59 +02:00
Mikołaj Grzebieluch	a031a14249	tests: add asynchronous log browsing functionality Add a class that handles log file browsing with the following features: * mark: returns "a mark" to the current position of the log. * wait_for: asynchronously checks if the log contains the given message. * grep: returns a list of lines matching the regular expression in the log. Add a new endpoint in `ManagerClient` to obtain the scylla logfile path. Fixes #14782 Closes #14834	2023-08-25 14:19:09 +02:00
Kamil Braun	169d19e5b0	Merge 'raft topology: support --ignore-dead-nodes in removenode and replace' from Patryk Jędrzejczak We add support for `--ignore-dead-nodes` in `raft_removenode` and `--ignore-dead-nodes-for-replace` in `raft_replace`. For now, we allow passing only host ids of the ignored nodes. Supporting IPs is currently impossible because `raft_address_map` doesn't provide a mapping from IP to a host id. The main steps of the implementation are as follows: - add the `ignore_nodes` column to `system.topology`, - set the `ignore_nodes` value of the topology mutation in `raft_removenode` and `raft_replace`, - extend `service::request_param` with alternative types that allow storing a set of ids of the ignored nodes, - load `ignore_nodes` from `system.topology` into `request_param` in `system_keyspace::load_topology_state`, - add `ignore_nodes` to `exclude_nodes` in `topology_coordinator::exec_global_command`, - pass `ignore_nodes` to `replace_with_repair` and `remove_with_repair` in `storage_service::raft_topology_cmd_handler`. Additionally, we add `test_raft_ignore_nodes.py` with two tests that verify the added changes. Fixes #15025 Closes #15113 * github.com:scylladb/scylladb: test: add test_raft_ignore_nodes test: ManagerClient.remove_node: allow List[HostId] for ignore_dead raft topology: pass ignore_nodes to {replace, remove}_with_repair raft topology: exec_global_command: add ignore_nodes to exclude_nodes raft topology: exec_global_command: change type of exclude_nodes topology_state_machine: extend request_param with a set of raft ids raft topology: set ignore_nodes in raft_removenode and raft_replace utils: introduce split_comma_separated_list raft topology: add the ignore_nodes column to system.topology	2023-08-22 18:04:59 +02:00
Kamil Braun	cdc3cd2b79	Merge 'raft: add fencing tests' from Petr Gusev In this PR a simple test for fencing is added. It exercises the data plane, meaning if it somehow happens that the node has a stale topology version, then requests from this node will get an error 'stale topology'. The test just decrements the node version manually through CQL, so it's quite artificial. To test a more real-world scenario we need to allow the topology change fiber to sometimes skip unavailable nodes. Now the algorithm fails and retries indefinitely in this case. The PR also adds some logs, and removes one seemingly redundant topology version increment, see the commit messages for details. Closes #14901 * github.com:scylladb/scylladb: test_fencing: add test_fence_hints test.py: output the skipped tests test.py: add skip_mode decorator and fixture test.py: add mode fixture hints: add debug log for dropped hints hints: send_one_hint: extend the scope of file_send_gate holder pylib: add ScyllaMetrics hints manager: add send_errors counter token_metadata: add debug logs fencing: add simple data plane test random_tables.py: add counter column type raft topology: don't increment version when transitioning to node_state::normal	2023-08-22 16:28:21 +02:00
Patryk Jędrzejczak	6818d13f7d	test: ManagerClient.remove_node: allow List[HostId] for ignore_dead ManagerClient.remove_node allows passing ignore_dead only as List[IPAddress]. However, raft_removenode currently supports only host ids. To write a test that passes ignore_dead to ManagerClient.remove_node in the Raft topology mode, we allow passing ignore_dead as List[HostId]. Note that we don't want to use List[IPAddress \| HostId] because mixing IP addresses and host ids fails anyway. See ss::remove_node.set(...) in api::set_storage_service.	2023-08-22 14:19:09 +02:00
Petr Gusev	0b7a90dff6	pylib: add ScyllaMetrics This patch adds facilities to work with Scylla metrics from test.py tests. The new metrics property was added to ManagerClient; its query method sends a request to the Scylla metrics endpoint and returns an object for conveniently accessing the result. ScyllaMetrics is copy-pasted from test_shedding.py. It's difficult to reuse code between 'new' and 'old' styles of tests; we can't just import pylib in 'old' tests because of some problems with python search directories. A past commit of mine that attempted to solve this problem was rejected on review.	2023-08-22 14:31:04 +04:00
Gleb Natapov	517f6bfa8a	test: add rebuild test Add simple rebuild test that makes sure that rebuild operation does not fail.	2023-08-10 16:46:13 +03:00
Konstantin Osipov	df97135583	test.py: forward the optional property file when creating a server To support multi-DC tests we need to provide a property file when creating a server. Forward it from the test client to test.py. Closes #14683	2023-08-02 13:45:19 +02:00
Alejo Sanchez	2194d8864b	test/pylib: remove redundant method The ManagerClient.get_cql method is defined twice. Remove one and fix the assert. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2023-07-18 13:33:46 +02:00
Kamil Braun	3464877276	test: manager_client: make `con_gen` for `ManagerClient.__init__` nonoptional `ManagerClient` is given a function that is used to create CQL connections to the Scylla cluster. For some reason it was typed as `Optional` even though it was never passed `None`. Fix it.	2023-07-12 11:44:15 +02:00
Kamil Braun	2032d7dbe4	test: scylla_cluster: return the new IP from `change_ip` API Also simplify the API by getting rid of `ActionReturn` and returning errors through exceptions (which are correctly forwarded to the client for some time already).	2023-07-06 10:24:46 +02:00
Kamil Braun	b38dcba6ed	test: pylib: increase checking period for `get_alive_endpoints` `server_sees_others` and similar functions periodically call `get_alive_endpoints`. The period was `.1` seconds, increase it to `.5` to reduce the log spam (I checked empirically that `.5` is usually how long it takes in dev mode on my laptop.)	2023-06-20 13:03:46 +02:00
Kamil Braun	ae92932240	test: pylib: manager_client: `get_cql()` helper	2023-06-20 13:03:46 +02:00
Kamil Braun	e02249f0cd	test: pylib: ScyllaCluster: server pause/unpause API	2023-06-20 13:03:46 +02:00
Kamil Braun	7e56388721	test: pylib: ScyllaCluster: generalize config type for `server_add` Generalize from `dict[str, str]` to `dict[str, Any]`.	2023-05-29 11:03:36 +02:00
Kamil Braun	f581282625	test: topology_experimental_raft: test `check_and_repair_cdc` API	2023-05-08 16:49:01 +02:00
Alejo Sanchez	11561a73cb	test/pylib: ManagerClient helpers to wait for... server to see other servers after start/restart When starting/restarting a server, provide a way to wait for the server to see at least n other servers. Also leave the implementation methods available for manual use and update previous tests, one to wait for a specific server to be seen, and one to wait for a specific server to not be seen (down). Fixes #13147 Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Closes #13438	2023-04-20 14:22:31 +02:00
Tomasz Grabiec	041ee3ffdd	test: pylib: Add a way to create cql connections with particular coordinators Usage: await manager.driver_connect(server=servers[0]) manager.cql.execute(f"...", execution_profile='whitelist')	2023-04-13 21:23:03 +02:00
Alejo Sanchez	e3b462507d	test/pylib: topology: support clusters of initial size 0 To allow tests with custom clusters, allow configuration of initial cluster size of 0. Add a proof-of-concept test to be removed later. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Closes #13342	2023-03-31 11:17:58 +02:00
Petr Gusev	e407956e9f	scylla_cluster.py: add start flag to server_add Sometimes when creating a node it's useful to just install it and not start it. For example, we may want to try to start it later with an expected error. The ScyllaServer.install method has been made exception safe: if an exception occurs, it reverts to the original state. This avoids duplicating the try/except logic in its two call sites.	2023-03-24 16:08:17 +04:00
Petr Gusev	794d0e4000	ServerInfo: drop host_id We are going to allow the ScyllaCluster.add_server function not to start the server if the caller has requested that with a special parameter. The host_id can only be obtained from a running node, so add_server won't be able to return it in this case. I've grepped the tests for host_id and there doesn't seem to be any reference to it in the code.	2023-03-24 16:08:17 +04:00
Petr Gusev	8e3392c64f	scylla_cluster.py: add config to server_add Sometimes when creating a node it's useful to pass a custom node config.	2023-03-24 16:08:17 +04:00
Petr Gusev	c1d0ee2bce	scylla_cluster.py: add expected_error to server_start Sometimes it's useful to check that the node has failed to start for a particular reason. If server_start can't find expected_error in the node's log or if the node has started without errors, it throws an exception.	2023-03-24 16:08:11 +04:00
Konstantin Osipov	4ace19928d	raft: (test) test ip address change	2023-03-10 19:52:40 +03:00
Botond Dénes	e55f475db1	Merge 'test/pylib: use larger timeout for decommission/removenode' from Kamil Braun Recently we enabled RBNO by default in all topology operations. This made the operations a bit slower (repair-based topology ops are a bit slower than classic streaming - they do more work), and in debug mode with a large number of concurrent tests running, they might time out. The timeout for bootstrap was already increased before, do the same for decommission/removenode. The previously used timeout was 300 seconds (this is the default used by the aiohttp library when it makes HTTP requests), now use the TOPOLOGY_TIMEOUT constant from ScyllaServer which is 1000 seconds. Closes #12765 * github.com:scylladb/scylladb: test/pylib: use larger timeout for decommission/removenode test/pylib: scylla_cluster: rename START_TIMEOUT to TOPOLOGY_TIMEOUT	2023-02-13 16:30:24 +02:00
Nadav Har'El	2653865b34	Merge 'test.py: improve test failure handling' from Kamil Braun Improve logging by printing the cluster at the end of each test. Stop performing operations like attempting queries or dropping keyspaces on dirty clusters. Dirty clusters might be completely dead and these operations would only cause more "errors" to happen after a failed test, making it harder to find the real cause of failure. Mark cluster as dirty when a test that uses it fails - after a failed test, we shouldn't assume that the cluster is in a usable state, so we shouldn't reuse it for another test. Rely on the `is_dirty` flag in `PythonTest`s and `CQLApprovalTest`s, similarly to what `TopologyTest`s do. Closes #12652 * github.com:scylladb/scylladb: test.py: rely on ScyllaCluster.is_dirty flag for recycling clusters test/topology: don't drop random_tables keyspace after a failed test test/pylib: mark cluster as dirty after a failed test test: pylib, topology: don't perform operations after test on a dirty cluster test/pylib: print cluster at the end of test	2023-02-12 12:13:25 +02:00
Kamil Braun	54f85c641d	test/pylib: use larger timeout for decommission/removenode Recently we enabled RBNO by default in all topology operations. This made the operations a bit slower (repair-based topology ops are a bit slower than classic streaming - they do more work), and in debug mode with a large number of concurrent tests running, they might time out. The timeout for bootstrap was already increased before, do the same for decommission/removenode. The previously used timeout was 300 seconds (this is the default used by the aiohttp library when it makes HTTP requests), now use the TOPOLOGY_TIMEOUT constant from ScyllaServer which is 1000 seconds.	2023-02-10 15:56:31 +01:00
Kamil Braun	fde6ad5fc0	test/pylib: scylla_cluster: rename START_TIMEOUT to TOPOLOGY_TIMEOUT Use a more generic name since the constant will also be used as timeout for decommission and removenode.	2023-02-10 15:56:31 +01:00
Asias He	fc60484422	test: Increase START_TIMEOUT The CI machine was observed to be slow when running the test. Increase the timeout for adding servers.	2023-02-03 21:15:08 +08:00
Kamil Braun	a9dbd89478	test/pylib: mark cluster as dirty after a failed test We don't expect the cluster to be functioning at all after a failed test. The whole cluster might have crashed, for example. In these situations the framework would report multiple errors (one for the actual failure, another for a failed post-condition check because the cluster was down) which would only obscure the report and make debugging harder. It's also not safe in general to reuse the cluster in another test - if the previous test failed, we should not assume that the cluster is in a valid state. Therefore, mark the cluster as dirty after a failed test. This will let us recycle the cluster based on the dirty flag and it will disable the post-condition check after a failed test (which is only done on non-dirty clusters). To implement this in topology tests, we use the `pytest_runtest_makereport` hook which executes after a test finishes but before fixtures finish. There we store a test-failed flag in a stash provided by pytest, then access the flag in the `manager` fixture.	2023-02-02 16:35:55 +01:00
Kamil Braun	f4b56cddde	test/pylib: print cluster at the end of test - print the cluster used by the test in `after_test` - if cluster setup fails in `before_test`, print the cluster together with the exception (`after_test` is not executed if `before_test` fails)	2023-02-02 15:59:02 +01:00
Kamil Braun	d134c458e5	test/pylib: increase timeout when waiting for cluster before test Increase the timeout from default 5 minutes to 10 minutes. Sent as a workaround for #12546 to unblock next promotions. Closes #12547	2023-01-17 21:03:09 +02:00
Benny Halevy	7d0d9e28f1	test: pylib: ServerInfo: add host_id Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-01-13 18:36:07 +02:00
Alejo Sanchez	d632e1aa7a	test/pytest: add missing import, remove unused import Add missed import time and remove unused name import. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Closes #12446	2023-01-08 17:38:46 +02:00
Petr Gusev	1c23390f12	test.py, allow to specify the node's command line in test An optional parameter cmdline has been added to the ManagerClient.server_add method. It allows you to override the default parameters set by the SCYLLA_CMDLINE_OPTIONS variable by changing, adding or deleting individual items. To change or add a parameter, specify its name and value one after the other. To remove a parameter, use the special keyword __remove__ as the value. To set a parameter without a value (such as --overprovisioned), use the special keyword __missing__ as the value.	2023-01-03 15:24:54 +03:00
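
The override rules described in commit 1c23390f12 (change or add with a name/value pair, `__remove__` to delete, `__missing__` for a valueless flag) can be illustrated with a small standalone sketch. This is not the code from test.py: the function names `parse_base_options` and `merge_cmdline_options` are hypothetical, and the sketch assumes that every option name starts with `--` and that no option value does.

```python
from typing import Dict, List, Optional

# Special keywords named in the commit message.
REMOVE = "__remove__"
MISSING = "__missing__"


def parse_base_options(tokens: List[str]) -> Dict[str, Optional[str]]:
    """Parse a flat default command line into {name: value or None}.

    Assumption (hypothetical): option names start with '--' and values never do.
    """
    options: Dict[str, Optional[str]] = {}
    i = 0
    while i < len(tokens):
        name = tokens[i]
        has_value = i + 1 < len(tokens) and not tokens[i + 1].startswith("--")
        options[name] = tokens[i + 1] if has_value else None
        i += 2 if has_value else 1
    return options


def merge_cmdline_options(base: List[str], override: List[str]) -> List[str]:
    """Apply name/value override pairs on top of the default options."""
    if len(override) % 2 != 0:
        raise ValueError("override must be a list of name/value pairs")
    options = parse_base_options(base)
    for name, value in zip(override[::2], override[1::2]):
        if value == REMOVE:
            options.pop(name, None)   # delete the option if present
        elif value == MISSING:
            options[name] = None      # flag without a value
        else:
            options[name] = value     # change or add
    merged: List[str] = []
    for name, value in options.items():
        merged.append(name)
        if value is not None:
            merged.append(value)
    return merged


defaults = ["--smp", "2", "--overprovisioned", "--memory", "1G"]
override = ["--smp", "4", "--memory", REMOVE, "--developer-mode", MISSING]
print(merge_cmdline_options(defaults, override))
```

Keeping the options in a dictionary preserves the position of defaults that are changed in place, and new options are appended at the end.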

82 Commits