Commit Graph

38044 Commits

Author SHA1 Message Date
Aleksandra Martyniuk
cdbfa0b2f5 replica: iterate safely over tables related maps
Loops over _column_families and _ks_cf_to_uuid which may preempt
are protected by reader mode of rwlock so that iterators won't
get invalid.
2023-07-25 17:13:04 +02:00
Aleksandra Martyniuk
a21d3357c3 replica: pass tables_metadata to phased_barrier_top_10_counts 2023-07-25 16:13:00 +02:00
Aleksandra Martyniuk
8842bd87c3 replica: add methods to safely add and remove table 2023-07-25 16:13:00 +02:00
Aleksandra Martyniuk
52afd9d42d replica: wrap column families related maps into tables_metadata
As a preparation for ensuring access safety for column families
related maps, add tables_metadata, access to members of which
would be protected by rwlock.
2023-07-25 16:13:00 +02:00
Aleksandra Martyniuk
395ce87eff replica: futurize database::add_column_family and database::remove
As a preparation for further changes, database::add_column_family
and database::remove return future<>.
2023-07-25 16:13:00 +02:00
Kamil Braun
e6099c4685 Merge 'config: set schema_commitlog_segment_size_in_mb to 128 ' from Patryk Jędrzejczak
Fixes #14668

In #14668, we have decided to introduce a new `scylla.yaml` variable for the schema commitlog segment size and set it to 128MB. The reason is that segment size puts a limit on the mutation size that can be written at once, and some schema mutation writes are much larger than average, as shown in #13864. This `schema_commitlog_segment_size_in_mb variable` variable is now added to `scylla.yaml` and `db/config`.

Additionally,  we do not derive the commitlog sync period for schema commitlog anymore because schema commitlog runs in batch mode, so it doesn't need this parameter. It has also been discussed in #14668.

Closes #14704

* github.com:scylladb/scylladb:
  replica: do not derive the commitlog sync period for schema commitlog
  config: set schema_commitlog_segment_size_in_mb to 128
  config: add schema_commitlog_segment_size_in_mb variable
2023-07-24 10:23:34 +02:00
Kefu Chai
3ad844a4bb build: cmake: set scylla version strings as CACHED strings
before this change, add_version_library() is a single function
which accomplishes two tasks:

1. build scylla-version target using
2. add an object library

but this has two problems:

1. we should run `SCYLLA-VERSION-GEN` at configure time, instead
   of at build time. otherwise the targets which read from the
   SCYLLA-{VERSION, RELEASE, PRODUCT}-FILE cannot access them,
   unless they are able to read them in their build rules. but
   they always use `file(STRINGS ..)` to read them, and thsee
   `file()` command is executed at configure time. so, this
   is a dead end.
2. we repeat the `file(STRING ..)` multiple places. this is
   not ideal if we want to minimize the repeatings.

so, to address this problem, in this change:

1. use `execute_process()` instead of `add_custom_command()`
   for generating these *-FILE files. so they are always ready
   at build time. this partially reverts bb7d99ad37.
2. extract `generate_scylla_version()` out of `add_version_library()`.
   so we can call the former much earlier than the latter.
   this would allow us to reference the variables defined by
   the `generate_scylla_version()` much earlier.
3. define cached strings in the extracted function, so that
   they can consumed by other places.
4. reference the cached variables in `build_submodule.cmake`.

also, take this opportunity to fix the version string
used in build_submodule.cmake: we should have used
`scylla_version_tilde`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #14769
2023-07-24 08:57:19 +03:00
Jan Ciolek
decbc841b7 cql3/prepare_expr: fix partially preparing function arguments
Before choosing a function, we prepare the arguments that can be
prepared without a receiver. Preparing an argument makes
its type known, which allows to choose the best overload
among many possible functions.

The function that prepared the argument passes the unprepared
argument by mistake. Let's fix it so that it actually uses
the prepared argument.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>

Closes #14786
2023-07-21 18:59:56 +03:00
Jan Ciolek
cbc97b41d4 cql.g: make the parser reject INSERT JSON without a JSON value
We allow inserting column values using a JSON value, eg:
```cql
INSERT INTO mytable JSON '{ "\"myKey\"": 0, "value": 0}';
```

When no JSON value is specified, the query should be rejected.

Scylla used to crash in such cases. A recent change fixed the crash
(https://github.com/scylladb/scylladb/pull/14706), it now fails
on unwrapping an uninitialized value, but really it should
be rejected at the parsing stage, so let's fix the grammar so that
it doesn't allow JSON queries without JSON values.

A unit test is added to prevent regressions.

Refs: https://github.com/scylladb/scylladb/pull/14707
Fixes: https://github.com/scylladb/scylladb/issues/14709

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>

Closes #14785
2023-07-21 18:52:47 +03:00
Mikołaj Grzebieluch
37ceef23a6 test: raft: skip test_old_ip_notification_repro in debug mode
Closes #14777
2023-07-21 12:41:03 +02:00
Kefu Chai
a87b0d68cd s3/test: remove the tempdir if test succeeds
in 46616712, we tried to keep the tmpdir only if the test failed,
and keep up to 1 of them using the recently introduced
option of `tmp_path_retention_count`. but it turns out this option
is not supported by the pytest used by our jenkins nodes, where we
have pytest 6.2.5. this is the one shipped along with fedora 36.

so, in this change, the tempdir is removed if the test completes
without failures. as the tempdir contains huge number of files,
and jenkins is quite slow scanning them. after nuking the tempdir,
jenkins will be much faster when scanning for the artifacts.

Fixes #14690
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #14772
2023-07-21 12:21:51 +03:00
Nadav Har'El
5860820934 Merge 'mutation/mutation_compactor: validate the input stream' from Botond Dénes
The mutation compactor has a validator which it uses to validate the stream of mutation fragments that passes through it. This validator is supposed to validate the stream as it enters the compactor, as opposed to its compacted form (output). This was true for most fragment kinds except range tombstones, as purged range tombstones were not visible to
the validator for the most part.

This mistake was introduced by https://github.com/scylladb/scylladb/commit e2c9cdb576, which itself was a flawed attempt at fixing an error seen because purged tombstones were not terminated by the compactor.

This patch corrects this mistake by fixing the above problem properly: on page-cut, if the validator has an active tombstone, a closing tombstone is generated for it, to avoid the false-positive error. With this, range tombstones can be validated again as they come in.

The existing unit test checking the validation in the compactor is greatly expanded to check all (I hope) different validation scenarios.

Closes #13817

* github.com:scylladb/scylladb:
  test/mutation_test: test_compactor_validator_sanity_test
  mutation/mutation_compactor: fix indentation
  mutation/mutation_compactor: validate the input stream
  mutation: mutation_fragment_stream_validating_filter: add accessor to underlying validator
  readers: reader-from-fragment: don't modify stream when created without range
2023-07-21 00:26:46 +03:00
Avi Kivity
e00811caac cql3: grammar: reject intValue with no contents
The grammar mistakenly allows nothing to be parsed as an
intValue (itself accepted in LIMIT and similar clauses).

Easily fixed by removing the empty alternative. A unit test is
added.

Fixes #14705.

Closes #14707
2023-07-21 00:24:51 +03:00
Pavel Emelyanov
98609e2115 Merge 's3/test: close using deferred_close() or deferred()' from Kefu Chai
let's use RAII to tear down the client and the input file, so we can
always perform the cleanups even if the test throws.

Closes #14765

* github.com:scylladb/scylladb:
  s3/test: use seastar::deferred() to perform cleanup
  s3/test: close using deferred_close()
2023-07-20 20:05:34 +03:00
Botond Dénes
bf6186ed7e Update tools/java submodule
* tools/java 9f63a96f...585b30fd (1):
  > cassandra-stress: add support for using RackAwareRoundRobinPolicy
2023-07-20 18:13:32 +03:00
Botond Dénes
819b45d107 Merge 'Remove dead replacing_nodes_pending_ranges_updater manipulations' from Pavel Emelyanov
The set in question is read-and-delete-only and thus always empty. Originally it was removed by commit c9993f020d (storage_service: get rid of handle_state_replacing), but some dangling ends were left. Consequentially, the on_alive() callback can get rid of few dead if-else branches

Closes #14762

* github.com:scylladb/scylladb:
  storage_service: Relax on_alive()
  storage_service: Remove _replacing_nodes_pending_ranges_updater
2023-07-20 16:55:44 +03:00
Pavel Emelyanov
9df750fd4c storage_service: Remove dead get_rpc_address()
Unused. Locator calls gossiper directly

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #14761
2023-07-20 16:54:24 +03:00
Botond Dénes
53da97416a Merge 'Remove qctx from system.paxos table access methods' from Pavel Emelyanov
The "fix" is straightforward -- callers of system_keyspace::*paxos* methods need to get system keyspace from somewhere. This time the only caller is storage_proxy::remote that can have system keyspace via direct dependency reference.

Closes #14758

* github.com:scylladb/scylladb:
  db/system_keyspace: Move and use qctx::execute_cql_with_timeout()
  db/system_keyspace: Make paxos methods non-static
  service/paxos: Add db::system_keyspace& argument to some methods
  test: Optionally initialize proxy remote for cql_test_env
  proxy/remote: Keep sharded<db::system_keyspace>& dependency
2023-07-20 16:53:25 +03:00
Botond Dénes
e62325babc Merge 'Compaction reshard task' from Aleksandra Martyniuk
Task manager tasks covering reshard compaction.

Reattempt on https://github.com/scylladb/scylladb/pull/14044. Bugfix for https://github.com/scylladb/scylladb/issues/14618 is squashed with 95191f4.
Regression test added.

Closes #14739

* github.com:scylladb/scylladb:
  test: add test for resharding with non-empty owned_ranges_ptr
  test: extend test_compaction_task.py to test resharding compaction
  compaction: add shard_reshard_sstables_compaction_task_impl
  compaction: invoke resharding on sharded database
  compaction: move run_resharding_jobs into reshard_sstables_compaction_task_impl::run()
  compaction: add reshard_sstables_compaction_task_impl
  compaction: create resharding_compaction_task_impl
2023-07-20 16:43:22 +03:00
Botond Dénes
a35f4f6985 test/mutation_test: test_compactor_validator_sanity_test
Greatly expand this test to check that the compactor validates the input
stream properly.
The test is renamed (the _sanity_test suffix is removed) to reflect the
expanded scope.
2023-07-20 08:48:50 -04:00
Botond Dénes
18ed94e60b mutation/mutation_compactor: fix indentation
Left broken by the previous patch.
2023-07-20 08:48:50 -04:00
Botond Dénes
3d5b70e0d7 mutation/mutation_compactor: validate the input stream
The mutation compactor has a validator which it uses to validate the
stream of mutation fragments that passes through it. This validator is
supposed to validate the stream as it enters the compactor, as opposed
to its compacted form (output). This was true for most fragment kinds
except range tombstones, as purged range tombstones were not visible to
the validator for the most part.
This mistake was introduced by e2c9cdb576, which itself was a flawed
attempt at fixing an error seen because purged tombstones were not
terminated by the compactor.
This patch corrects this mistake by fixing the above problem properly:
on page-cut, if the validator has an active tombstone, a closing
tombstone is generated for it, to avoid the false-positive error. With
this, range tombstones can be validated again as they come in.
2023-07-20 08:48:50 -04:00
Botond Dénes
dbb2a6f03a mutation: mutation_fragment_stream_validating_filter: add accessor to underlying validator 2023-07-20 08:48:50 -04:00
Botond Dénes
93dd16fccc readers: reader-from-fragment: don't modify stream when created without range
The fragment reader currently unconditionally forwards its buffer to the
passed-in partition range. Even if this range is
`query::full_partition_range`, this will involve dropping any fragments
up to the first partitions tart. This causes problems for test users who
intentionally create invalid fragment streams, that don't start with a
partition-start.
Refactor the reader to not do any modifications on the stream, when
neither slice, nor partition-range was passed by the user.
2023-07-20 08:48:50 -04:00
Kefu Chai
fdf61d2f7c compaction_manager: prevent gc-only sstables from being compacted
before this change, there are chances that the temporary sstables
created for collecting the GC-able data create by a certain
compaction can be picked up by another compaction job. this
wastes the CPU cycles, adds write amplification, and causes
inefficiency.

in general, these GC-only SSTables are created with the same run id
as those non-GC SSTables, but when a new sstable exhausts input
sstable(s), we proactively replace the old main set with a new one
so that we can free up the space as soon as possible. so the
GC-only SSTables are added to the new main set along with
the non-GC SSTables, but since the former have good chance to
overlap the latter. these GC-only SSTables are assigned with
different run ids. but we fail to register them to the
`compaction_manager` when replacing the main sstable set.
that's why future compactions pick them up when performing compaction,
when the compaction which created them is not yet completed.

so, in this change,

* to prevent sstables in the transient stage from being picked
  up by regular compactions, a new interface class is introduced
  so that the sstable is always added to registration before
  it is added to sstable set, and removed from registration after
  it is removed from sstable set. the struct helps to consolidate
  the regitration related logic in a single place, and helps to
  make it more obvious that the timespan of an sstable in
  the registration should cover that in the sstable set.
* use a different run_id for the gc sstable run, as it can
  overlap with the output sstable run. the run_id for the
  gc sstable run is created only when the gc sstable writer
  is created. because the gc sstables is not always created
  for all compactions.

please note, all (indirect) callers of
`compaction_task_executor::compact_sstables()` passes a non-empty
`std::function` to this function, so there is no need to check for
empty before calling it. so in this change, the check is dropped.

Fixes #14560
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #14725
2023-07-20 15:47:48 +03:00
Asias He
865891cf02 doc: Repair system_auth with nodetool repair -pr option
Since repair is performed on all nodes, each node can just repair the
primary ranges instead of all owned ranges. This avoids repair ranges
more than once.

Closes #14766
2023-07-20 15:12:20 +03:00
Anna Stuchlik
6c70aef2d1 doc: document customizing CPUSET
Fixes https://github.com/scylladb/scylla-docs/issues/4004

This commit adds a Knowledge Base article on how to customize
CUPUSET.

Closes #13941
2023-07-20 15:07:32 +03:00
Raphael S. Carvalho
3117f2f066 tests: Add test for table's mutation source excluding staging
Commit f5e3b8df6d introduced an optimization for
as_mutation_source_excluding_staging() and added a test that
verifies correctness of single key and range reads based
on supplied predicates. This new test aims to improve the
coverage by testing directly both table::as_mutation_source()
and as_mutation_source_excluding_staging(), therefore
guaranteeing that both supply the correct predicate to
sstable set.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #14763
2023-07-20 07:14:36 +03:00
Kefu Chai
77faec4f38 s3/test: use seastar::deferred() to perform cleanup
let's use RAII to remove the object use as a fixture, so we don't
leave some object in the bucket for testing. this might interfere
with other tests which share the same minio server with the test
which fails to do its clean up if an exception is thrown.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-07-20 10:04:54 +08:00
Kefu Chai
7a9c802fc3 s3/test: close using deferred_close()
let's use RAII to tear down the client and the input file, so we can
always perform the cleanups even if the test throws.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-07-20 10:04:54 +08:00
Pavel Emelyanov
7a3d61ce2c storage_service: Relax on_alive()
Now when there's always-false local variable, it can also be dropped and
all the associated if-else branches can be simplified

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-07-19 22:02:12 +03:00
Pavel Emelyanov
61a37cf6bf storage_service: Remove _replacing_nodes_pending_ranges_updater
The set in question is always empty, so it can be removed and the only
check for its contents can be constified

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-07-19 22:01:06 +03:00
Pavel Emelyanov
ea9db1b35c Merge 'cql3: expr: remove the default constructor' from Avi Kivity
`expression`'s default constructor is dangerous as an it can leak
into computations and generate surprising results. Fix that by
removing the default constructor.

This is made somewhat difficult by the parser generator's reliance
on default construction, and we need to expand our workaround
(`uninitialized<>`) capabilities to do so.

We also remove some incidental uses of default-constructed expressions.

Closes #14706

* github.com:scylladb/scylladb:
  cql3: expr: make expression non-default-constructible
  cql3: grammar: don't default-construct expressions
  cql3: grammar: improve uninitialized<> flexibility
  cql3: grammar: adjust uninitialized<> wrapper
  test: expr_test: don't invoke expression's default constructor
  cql3: statement_restrictions: explicitly initialize expressions in index match code
  cql3: statement_restrictions: explicitly intitialize some expression fields
  cql3: statement_restrictions: avoid expression's default constructor when classifying restrictions
  cql3: expr: prepare_expression: avoid default-constructed expression
  cql3: broadcast_tables: prepare new_value without relying on expression default constructor
2023-07-19 21:46:03 +03:00
Pavel Emelyanov
8a87c87824 db/system_keyspace: Move and use qctx::execute_cql_with_timeout()
This template call is only used by system keyspace paxos methods. All
those methods are no longer static and can use system_keyspace::_qp
reference to real query processor instead of global qctx. The
execute_cql_with_timeout() wrapper is moved to system_keyspace to make
it work

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-07-19 19:32:10 +03:00
Pavel Emelyanov
b9ef16c06f db/system_keyspace: Make paxos methods non-static
The service::paxos_state methods that call those already have system
keyspace reference at hand and can call method on an object

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-07-19 19:32:10 +03:00
Pavel Emelyanov
d9ba8eb8df service/paxos: Add db::system_keyspace& argument to some methods
The paxos_state's .prepare(), .accept(), .learn() and .prune() methods
access system keyspace via its static methods. The only caller of those
(storage_proxy::remote) already has the sharded system k.s. reference
and can pass its .local() one as argument

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-07-19 19:32:10 +03:00
Pavel Emelyanov
b4fc1076e3 test: Optionally initialize proxy remote for cql_test_env
Some test cases that use cql_test_env involve paxos state updates. Since
this update is becoming via proxy->remote->system_keyspace those test
cases need cql_test_env to initialize the remote part of the proxy too

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-07-19 19:32:10 +03:00
Aleksandra Martyniuk
bfb81b8cdd test: add test for resharding with non-empty owned_ranges_ptr 2023-07-19 17:19:10 +02:00
Aleksandra Martyniuk
4fc4c2527c test: extend test_compaction_task.py to test resharding compaction 2023-07-19 17:19:10 +02:00
Aleksandra Martyniuk
77dcdd743e compaction: add shard_reshard_sstables_compaction_task_impl
Add task manager's task covering resharding compaction on one shard.
2023-07-19 17:19:10 +02:00
Aleksandra Martyniuk
f73178a114 compaction: invoke resharding on sharded database
In reshard_sstables_compaction_task_impl::run() we call
sharded<sstables::sstable_directory>::invoke_on_all. In lambda passed
to that method, we use both sharded sstable_directory service
and its local instance.

To make it straightforward that sharded and local instances are
dependend, we call sharded<replica::database>::invoke_on_all
instead and access local directory through the sharded one.
2023-07-19 17:19:10 +02:00
Aleksandra Martyniuk
fa10c352a1 compaction: move run_resharding_jobs into reshard_sstables_compaction_task_impl::run() 2023-07-19 17:19:10 +02:00
Aleksandra Martyniuk
7a7e287d8c compaction: add reshard_sstables_compaction_task_impl
Add task manager's task covering resharding compaction.

A struct and some functions are moved from replica/distributed_loader.cc
to compaction/task_manager_module.cc.
2023-07-19 17:15:40 +02:00
Pavel Emelyanov
b0b91bf5ec proxy/remote: Keep sharded<db::system_keyspace>& dependency
This dependency will be needed to call service::paxos_state:: calls and
all of them are done in storage_proxy::remote() methods only

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-07-19 17:36:42 +03:00
Avi Kivity
fc71f49907 Update seastar submodule
* seastar bac344d58...c0e618bbb (7):
  > resource: take kernel min_free_kbytes into account when allocating memory
Fixes #14721
  > build: append JOB_POOLS definition instead of setting it
  > net: use designated initialization when appropriate
  > websocket: print logging message before handle_ping()
  > circular_buffer_fixed_capacity_test: enable randomize test to reverse
  > prometheus: do not qualify return type with const.
  > alien: do not define customized copy ctor

Closes #14755
2023-07-19 16:47:05 +03:00
Patryk Jędrzejczak
ee1c240f2a replica: do not derive the commitlog sync period for schema commitlog
We don't want to apply the value of the commitlog_sync_period_in_ms
variable to schema commitlog. Schema commitlog runs in batch mode,
so it doesn't need this parameter.
2023-07-19 14:16:50 +02:00
Patryk Jędrzejczak
b3be9617dc config: set schema_commitlog_segment_size_in_mb to 128
We increase the default schema commitlog segment size so that the
large mutations do not fail. We have agreed that 128 MB is sufficient.
2023-07-19 14:16:49 +02:00
Patryk Jędrzejczak
5b167a4ad7 config: add schema_commitlog_segment_size_in_mb variable
In #14668, we have decided to introduce a new scylla.yaml variable
for the schema commitlog segment size. The segment size puts a limit
on the mutation size that can be written at once, and some schema
mutation writes are much larger than average, as shown in #13864.
Therefore, increasing the schema commitlog segment size is sometimes
necessary.
2023-07-19 14:16:41 +02:00
Kefu Chai
8f390997cb db: do not use std::cmp_not_equal() when appropriate
this change is a follow-up of 3129ae3c8c.
since in both cases in this change, the `num_ranges` should always
be greater than zero, there is no need to use `int` for its type,
and "num_ranges" returned by the CQL query should always be greater
or equal to zero, so there is no need to check if it is positive.

in this change, we

* change the type of `num_ranges` to `size_t`
* change std::cmp_not_equal() to !=

to avoid using the verbose `std::cmp_not_equal()` helper, for better
readability.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #14754
2023-07-19 13:25:21 +03:00
Mikołaj Grzebieluch
00db47292b test: raft: do not update raft address map with obsolete gossip data
Regression test for #14257.

It starts two nodes. It introduces a sleep in raft_group_registry::on_alive
(in raft_group_registry.cc) when receiving a gossip notification about HOST_ID
update from the second node. Then it restarts the second node with a different IP.
Due to the sleep, the old notification from the old IP arrives after the second
node has restarted. If the bug is present, this notification overrides the address
map entry and the second read barrier times out, since the first node cannot reach
the second node with the old IP.

Closes #14609.

Closes #14728
2023-07-19 11:57:49 +02:00