Commit Graph

119 Commits

Author SHA1 Message Date
Botond Dénes
ba7a9d2ac3 imr: switch back to open-coded description of structures
Commit aab6b0ee27 introduced the
controversial new IMR format, which relied on a very template-heavy
infrastructure to generate serialization and deserialization code via
template meta-programming. The promise was that this new format, beyond
solving the problems the previous open-coded representation had (working
on linearized buffers), will speed up migrating other components to this
IMR format, as the IMR infrastructure reduces code bloat, makes the code
more readable via declarative type descriptions as well as safer.
However, the results were almost the opposite. The template
meta-programming used by the IMR infrastructure proved very hard to
understand. Developers don't want to read or modify it. Maintainers
don't want to see it being used anywhere else. In short, nobody wants to
touch it.

This commit does a conceptual revert of
aab6b0ee27. A verbatim revert is not
possible because related code evolved a lot since the merge. Also, going
back to the previous code would mean we regress as we'd revert the move
to fragmented buffers. So this revert is only conceptual, it changes the
underlying infrastructure back to the previous open-coded one, but keeps
the fragmented buffers, as well as the interface of the related
components (to the extent possible).

Fixes: #5578
2021-02-16 23:43:07 +01:00
Konstantin Osipov
b4f875f08e uuid: reduce code dependency on UUID_gen.hh
Do not include UUID_gen.hh in trace_state.hh and lists.hh
to reduce header level dependency on it.

Message-Id: <20210127173114.725761-2-kostja@scylladb.com>
2021-01-27 20:08:29 +02:00
Benny Halevy
c60da2e90d cdc: remove _token_metadata from db_context
1. It's unused since cbe510d1b8
2. It's unsafe to keep a reference to token_metadata&
potentially across yield points.

The higher-level motivation is to make
storage_service::get_token_metadata() private so we
can control better how it's used.

For cdc, if the token_metadata is going to be needed
to the future, it'd be better get it from
db_context::_proxy.get_token_metadata_ptr().

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20201213162351.52224-2-bhalevy@scylladb.com>
2020-12-13 18:32:17 +02:00
Pavel Emelyanov
89fd524c5a schema-tables: Add database argument to make_update_table_mutations
There are 3 callers of this helper (cdc, migration manager and tests)
and all of them already have the database object at hands.

The argument will be used by next patch to remove call for global
storage proxy instance from make_update_indices_mutations.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-12-11 21:21:22 +03:00
Kamil Braun
2da723b9c8 cdc: produce postimage when inserting with no regular columns
When a row was inserted into a table with no regular columns, and no
such row existed in the first place, postimage would not be produced.
Fix this.

Fixes #7716.

Closes #7723
2020-12-01 18:01:23 +02:00
Piotr Jastrzebski
6b1167ea0d cdc: Remove std::iterator from collection_iterator
std::iterator is deprecated since C++17 so define all the required
iterator_traits directly.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-11-17 16:53:20 +01:00
Calle Wilund
46ea8c9b8b cdc: Add an "end-of-record" column to
Fixes #7435

Adds an "eor" (end-of-record) column to cdc log. This is non-null only on
last-in-timestamp group rows, i.e. end of a singular source "event".

A client can use this as a shortcut to knowing whether or not he has a
full cdc "record" for a given source mutation (single row change).

Closes #7436
2020-10-26 09:39:27 +02:00
Avi Kivity
d3c0b4c555 cdc: log: don't capture structured bindings in lambdas
Clang does not yet implement p1091r3, which allows lambdas
to capture structured bindings. To accomodate it, don't
use structured bindings for variables that are later
captured.
2020-10-16 15:23:16 +03:00
Calle Wilund
70a282ced2 cdc: Remove post-filterings for keys-only/off cdc delta generation
Refs #7095

CDC delta!=full both relied on post-filtering
to remove generated log row and/or cells. This is inefficient.
Instead, simply check if the data should be created in the
visitors.

v2:
* Fixed delta logs rows created (empty) even when delta == off
v3:
* Killed delta == off
v4:
* Move checks into (const) member var(s)
2020-08-31 07:59:43 +00:00
Calle Wilund
78236c015a cdc: Remove cdc delta_mode::off
Fixes #7128

CDC logs are not useful without at least delta_mode==keys, since
pre/post image data has no info on _what_ was actually done to
base table in source mutation.
2020-08-31 07:59:40 +00:00
Calle Wilund
e50911e5b0 cdc: Do not generate pre/post image for non-existent rows
Fixes #7119
Fixes #7120

If preimage select came up empty - i.e. the row did not exist, either
due to never been created, or once delete, we should not bother creating
a log preimage row for it. Esp. since it makes it harder to interpret the
cdc log.

If an operation in a cdc batch did a row delete (ranged, ck, etc), do
not generate postimage data, since the row does no longer exist.
Note that we differentiate deleting all (non-pk/ck) columns from actual
row delete.
2020-08-26 18:14:09 +00:00
Calle Wilund
5ed3d6892d cdc: Remove stored (postimage) data when doing row delete
Fixes #6900

Clustered range deletes did not clear out the "row_states" data associated
with affected rows (might be many).

Adds a sweep through and erases relevant data. Since we do pre- and
postimage in "order", this should only affect postimage.
2020-08-25 12:27:18 +03:00
Piotr Jastrzebski
f01ce1458f cdc: Preserve metadata columns when geting only keys for delta
Fixes #7095

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-08-25 10:41:54 +03:00
Benny Halevy
2f7c529c1c storage_service: separate get_mutable_token_metadata
Use a different getter for a token_metadata& that
may be changed so we can better synchronize readers
and writers of token_metadata and eventually allow
them to yield in asynchronous loops.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-08-20 16:20:34 +03:00
Kamil Braun
0d3779e3e6 cdc: rewrite process_changes using inspect_mutation 2020-08-17 15:51:33 +02:00
Kamil Braun
9067f1a4e2 cdc: move some functions out of cdc::transformer
Preparing them to be used outside of `transformer`.
2020-08-17 15:51:33 +02:00
Nadav Har'El
7e01ae089e cdc: avoid including cdc/cdc_options.hh everywhere
Before this patch, modifying cdc/cdc_options.hh required recompiling 264
source files. This is because this header file was included by a couple
other header files - most notably schema.hh, where a forward declaration
would have been enough. Only the handful of source files which really
need to access the CDC options should include "cdc/cdc_options.hh" directly.

After this patch, modifying cdc/cdc_options.hh requires only 6 source files
to be recompiled.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200813070631.180192-1-nyh@scylladb.com>
2020-08-16 14:41:47 +03:00
Calle Wilund
2eb4522fef cdc: Make pre image optionally "full" (include all columns)
Makes the "preimage" option for cdc non-binary, i.e. it can now
be "true"/"on", "false"/"off" or "full. The two former behaving like
previously, the latter obviously including all columns in pre image.
2020-08-12 16:03:06 +00:00
Piotr Jastrzebski
80e3923b3c codebase wide: replace find(...) != end() with contains
C++20 introduced `contains` member functions for maps and sets for
checking whether an element is present in the collection. Previously
the code pattern looked like:

<collection>.find(<element>) != <collection>.end()

In C++20 the same can be expressed with:

<collection>.contains(<element>)

This is not only more concise but also expresses the intend of the code
more clearly.

This commit replaces all the occurences of the old pattern with the new
approach.

Tests: unit(dev)

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <f001bbc356224f0c38f06ee2a90fb60a6e8e1980.1597132302.git.piotr@scylladb.com>
2020-08-11 13:28:50 +03:00
Nadav Har'El
936cf4cce0 merge: Increase row limits
Merged pull request https://github.com/scylladb/scylla/pull/6910
by Wojciech Mitros:

This patch enables selecting more than 2^32 rows from a table. The change
becomes active after upgrading whole cluster - until then old limits are
used.

Tested reading 4.5*10^9 rows from a virtual table, manually upgrading a
cluster with ccm and performing cql SELECT queries during the upgrade,
ran unit tests in dev mode and cql and paging dtests.

  tests: add large paging state tests
  increase the maximum size of query results to 2^64
2020-08-04 19:52:30 +03:00
Kamil Braun
b5f3aef900 cdc: add an abstraction for building log mutations
This commit takes out some responsibilities of `cdc::transformer`
(which is currently a big ball of mud) into a separate class.

This class is a simple abstraction for creating entries in a CDC log
mutation.

Low-level calls to the mutation API (such as `set_cell`) inside
`cdc::transformer` were replaced by higher-level calls to the builder
abstraction, removing some duplication of logic.
2020-08-04 19:37:03 +03:00
Wojciech Mitros
45215746fe increase the maximum size of query results to 2^64
Currently, we cannot select more than 2^32 rows from a table because we are limited by types of
variables containing the numbers of rows. This patch changes these types and sets new limits.

The new limits take effect while selecting all rows from a table - custom limits of rows in a result
stay the same (2^32-1).

In classes which are being serialized and used in messaging, in order to be able to process queries
originating from older nodes, the top 32 bits of new integers are optional and stay at the end
of the class - if they're absent we assume they equal 0.

The backward compatibility was tested by querying an older node for a paged selection, using the
received paging_state with the same select statement on an upgraded node, and comparing the returned
rows with the result generated for the same query by the older node, additionally checking if the
paging_state returned by the upgraded node contained new fields with correct values. Also verified
if the older node simply ignores the top 32 bits of the remaining rows number when handling a query
with a paging_state originating from an upgraded node by generating and sending such a query to
an older node and checking the paging_state in the reply(using python driver).

Fixes #5101.
2020-08-03 17:32:49 +02:00
Nadav Har'El
2dcb6294da merge: cdc: New delta modes: off, keys, fulll
Merged pull request https://github.com/scylladb/scylla/pull/6914
by By Juliusz Stasiewicz:

The goal is to have finer control over CDC "delta" rows, i.e.:

    disable them totally (mode off);
    record only base PK+CK columns (mode keys);
    make them behave as usual (mode full, default).

The editing of log rows is performed at the stage of finishing CDC mutation.

Fixes #6838

  tests: Added CQL test for `delta mode`
  cdc: Implementations of `delta_mode::off/keys`
  cdc: Infrastructure for controlling `delta_mode`
2020-08-03 14:10:15 +03:00
Botond Dénes
92a7b16cba query: read_command: add max_result_size
This field will replace max size which is currently passed once per
established rpc connection via the CLIENT_ID verb and stored as an
auxiliary value on the client_info. For now it is unused, but we update
all sites creating a read command to pass the correct value to it. In the
next patch we will phase out the old max size and use this field to pass
max size on each verb instead.
2020-07-28 18:00:29 +03:00
Botond Dénes
8992bcd1f8 query: read_command: use tagged ints for limit ctor params
The convenience constructor of read_command now has two integer
parameter next to each other. In the next patch we intend to add another
one. This is recipe for disaster, so to avoid mistakes this patch
converts these parameters to tagged integers. This makes sure callers
pass what they meant to pass. As a matter of fact, while fixing up
call-sites, I already found several ones passing `query::max_partitions`
to the `row_limit` parameter. No harm done yet, as
`query::max_partitions` == `query::max_rows` but this shows just how
easy it is to mix up parameters with the same type.
2020-07-28 18:00:29 +03:00
Juliusz Stasiewicz
9e4247090f cdc: Implementations of delta_mode::off/keys
At the stage of `finish`ing CDC mutation, deltas are removed (mode
`off`) or edited to keep only PK+CK of the base table (mode `keys`).

Fixes #6838
2020-07-27 19:05:47 +02:00
Juliusz Stasiewicz
c05128d217 cdc: Infrastructure for controlling delta_mode
The goal is to have finer control over CDC "delta" rows, i.e.:
- disable them totally (mode `off`);
- record only PK+CK (mode `keys`);
- make them behave as usual (mode `full`, default).

This commit adds the necessary infrastructure to `cdc_options`.
2020-07-27 19:00:06 +02:00
Piotr Dulikowski
e2462bce3b cdc: fix a corner case inside get_base_table
It is legal for a user to create a table with name that has a
_scylla_cdc_log suffix. In such case, the table won't be treated as a
cdc log table, and does not require a corresponding base table to exist.

During refactoring done as a part of initial implemetation of of
Alternator streams (#6694), `is_log_for_some_table` started throwing
when trying to check a name like `X_scylla_cdc_log` when there was no
table with name `X`. Previously, it just returned false.

The exception originates inside `get_base_table`, which tries to return
the base table schema, not checking for its existence - which may throw.
It makes more sense for this function to return nullptr in such case (it
already does when provided log table name does not have the cdc log
suffix), so this patch adds an explicit check and returns nullptr when
necessary.

A similar oversight happened before (see #5987), so this patch also adds
a comment which explains why existence of `X_scylla_cdc_log` does not
imply existence of `X`.

Fixes: #6852
Refs: #5724, #5987
2020-07-16 16:38:48 +03:00
Calle Wilund
331aa7c501 cdc: Add "is_cdc_metacolumn_name" predicate
To sift column names
2020-07-15 08:10:23 +00:00
Calle Wilund
8a728ce618 cdc: Add get_base_table helper 2020-07-15 08:10:23 +00:00
Calle Wilund
8f462e8606 CDC::log: Add base_name helper
To extract base table name from CDC log table name.
2020-07-15 08:10:23 +00:00
Piotr Dulikowski
ad811a48bf cdc: fix indentation 2020-07-08 15:36:41 +02:00
Piotr Dulikowski
20b236d27d cdc: don't update partition state when not needed
In some cases, tracking the state of processed rows inside `transformer`
is not needd at all. We don't need to do it if either:

- Preimage and postimage are disabled for the table,
- Only preimage is enabled and we are processing the last timestamp.

This commit disables updating the state in the cases listed above.
2020-07-08 15:36:41 +02:00
Piotr Dulikowski
246f8da6f6 cdc: implement pre/postimage persistence
Moves responsibility for generating pre/postimage rows from the
"process_change" method to "produce_preimage" and "produce_postimage".
This commit actually affects the contents of generated CDC log
mutations.

Added a unit test that verifies more complicated cases with CQL BATCH.
2020-07-08 15:36:41 +02:00
Piotr Dulikowski
24b50ffbc8 cdc: add interface for producing pre/postimages
Introduces new methods to the change_processor interface that will cause
it to produce pre/postimage rows for requested clustering key, or for
static row.

Introduces logic in split.cc responsible for calling pre/postimage
methods of the change_processor interface. This does not have any effect
on generated CDC log mutations yet, because the transformer class has
empty implementations in place of those methods.
2020-07-08 15:36:41 +02:00
Piotr Dulikowski
761c59d92a cdc: load preimage query result into partition state fields
Instead of looking up preimage data directly from the raw preimage query
results, use the raw results to populate current partition state data,
and read directly from the current partition state.
2020-07-08 15:36:41 +02:00
Piotr Dulikowski
946354ee74 cdc: introduce fields for keeping partition state
Introduces data structures that will be used for keeping the current
state of processed rows: _clustering_row_states, and _static_row_state.
2020-07-08 15:36:41 +02:00
Piotr Dulikowski
bb587a93be cdc: rename set_pk_columns -> allocate_new_log_row
The new name better describes what this function does.
2020-07-08 15:36:41 +02:00
Piotr Dulikowski
82ddeb1992 cdc: track batch_no inside transformer
Move tracking of batch_no inside the transformer.
2020-07-08 15:36:41 +02:00
Piotr Dulikowski
7b47f84965 cdc: move cdc$time generation to transformer
Generate the timeuuid on the transformer side, which allows to simplify
the change_processor interface.
2020-07-08 15:36:41 +02:00
Piotr Dulikowski
7691568b0a cdc: move find_timestamp to split.cc
The function is no longer used in log.cc, so instead it is moved to
split.cc.

Removed declaration of the function from the log.hh header, because it
is not used elsewhere - apart from testing code, but it already
declared find_timestamp in the cdc_test.cc file.
2020-07-08 15:36:40 +02:00
Piotr Dulikowski
51d97be0b3 cdc: introduce change_processor interface
This allows for a more refined use of the transformer by the
for_each_change function (now named "process_changes_with_splitting).

The change_processor interface exposes two methods so far:
begin_timestamp, and process_change (previously named "transform").
By separating those two and exposing them, process_changes_with\
_splitting can cause the transformer to generate less CDC log mutations
- only one for each timestamp in the batch.
2020-07-08 15:36:40 +02:00
Piotr Dulikowski
f907cab156 cdc: remove redundant schema arguments from cdc functions
A `mutation` object already has a reference to its schema. It does not
make sense to call functions changed in this commit with a different
schema.
2020-07-08 15:36:40 +02:00
Piotr Dulikowski
fa00ea996a cdc: move management of generated mutations inside transformer
CDC log mutations are now stored inside `transformer`, and only moved to
the final set of mutations at the end of `transformer`'s lifetime.
2020-07-08 15:36:40 +02:00
Piotr Dulikowski
76a323a02d cdc: move preimage result set into a field of transformer
Instead of passing the preimage result set in each `transform` call, it
is now assigned to a field, and `transform` uses that field.
2020-07-08 15:36:40 +02:00
Piotr Dulikowski
79eabc04a8 cdc: keep ts and tuuid inside transformer
Adds a `begin_timestamp` method which tells the `transformer` to start
using the following timestamp and timeuuid when generating new log row
mutations.
2020-07-08 15:36:40 +02:00
Piotr Dulikowski
3c01b3c41d cdc: track touched parts of mutations inside transformer
Moves tracking of the "touched parts" statistics inside the transformer
class.

This commit is the first of multiple commits in this series which move
parts of the state used in CDC log row generation inside the
`transformer` class. There is a lot of state being passed to
`transformer` each time its methods are called, which could be as well
tracked by the `transformer` itself. This will result in a nicer
interface and will allow us to generate less CDC log mutations which
give the same result.
2020-07-08 15:36:40 +02:00
Piotr Dulikowski
027d20c654 cdc: always include preimage for affected rows
This changes the current algorithm so that the preimage row will not be
skipped if the corresponding rows was not present in preimage query
results.
2020-07-08 15:36:40 +02:00
Piotr Sarna
4cb79f04b0 treewide: replace libjsoncpp usage with rjson
In order to eventually switch to a single JSON library,
most of the libjsoncpp usage is dropped in favor of rjson.
Unfortunately, one usage still remains:
test/utils/test_repl utility heavily depends on the *exact textual*
format of its output JSON files, so replacing a library results
in all tests failing because of differences in formatting.
It is possible to force rjson to print its documents in the exact
matching format, but that's left for later, since the issue is not
critical. It would be nice though if our test suite compared
JSON documents with a real JSON parser, since there are more
differences - e.g. libjsoncpp keeps children of the object
sorted, while rapidjson uses an unordered data structure.
This change should cause no change in semantics, it strives
just to replace all usage of libjsoncpp with rjson.
2020-07-03 10:27:23 +02:00
Calle Wilund
5105e9f5e1 cdc::log: Missing "preimage" check in row deletion pre-image
Fixes #6561

Pre-image generation in row deletion case only checked if we had a pre-image
result set row. But that can be from post-image. Also check actual existance
of the pre-image CK.
Message-Id: <20200608132804.23541-1-calle@scylladb.com>
2020-06-09 10:56:41 +03:00