scylla

Author	SHA1	Message	Date
Nadav Har'El	db8d4a0cc6	Add computed columns Merged patch series by Piotr Sarna: This series introduces the concept of "computed" column, which represents values not provided directly by the user, but computed on the fly - possibly using other column values. It will be used in the future to implement map value indexing, collection indexing, etc. Right now the only use is the token column for secondary indexes - which is a column computed from the base partition key value. After this series, another one that depends on it and adds map value indexing will be pushed. Tests: unit(dev) Piotr Sarna (14): schema: add computed info to column definition schema: add implementation of computing token column schema: allow marking columns as computed in schema builder service: add computed columns feature view: check for computed columns in view view: remove unused token_for function database: add fixing previous secondary index schemas tests: disable computed columns feature in schema change test tests: add schema change test regeneration comment db: add system_schema.computed_columns docs: init system_schema_keyspace.md with column computations tests: generate new test case for schema change + computed cols index: mark token column as 'computed' when creating mv tests: add checking computed columns in SI column_computation.hh \| 63 ++++++++ db/schema_features.hh \| 4 +- db/schema_tables.hh \| 4 + idl/frozen_schema.idl.hh \| 1 + schema.hh \| 40 +++++ schema_builder.hh \| 4 +- schema_mutations.hh \| 18 ++- service/storage_service.hh \| 8 + view_info.hh \| 2 - database.cc \| 6 +- db/schema_tables.cc \| 146 ++++++++++++++++-- db/view/view.cc \| 46 +++--- index/secondary_index_manager.cc \| 2 +- schema.cc \| 58 ++++++- schema_mutations.cc \| 14 +- service/storage_service.cc \| 5 + tests/schema_change_test.cc \| 63 ++++++-- tests/secondary_index_test.cc \| 28 ++++ docs/system_schema_keyspace.md \| 40 +++++ plus about 200 new test sstable files	2019-07-21 13:05:46 +03:00
Piotr Sarna	4d1eaf8478	tests: add checking computed columns in SI The test case checks if token column generated for global indexing is indeed only present in global indexes and is marked as a computed column.	2019-07-19 11:58:42 +02:00
Piotr Sarna	a8f7d64a08	index: mark token column as 'computed' when creating mv Secondary indexes use a computed token column to preserve proper query ordering. This column is now marked as 'computed'.	2019-07-19 11:58:42 +02:00
Piotr Sarna	1c0ef5f9e9	tests: generate new test case for schema change + computed cols The original "test_schema_digest_does_not_change" test case ensures that schema digests will match for older nodes that do not support all the features yet (including computed columns). The additional case uses sstables generated after computed columns are allowed, in order to make sure that the digest computed including computed columns does not change spuriously as well.	2019-07-19 11:58:42 +02:00
Piotr Sarna	1e54752167	docs: init system_schema_keyspace.md with column computations The documentation file for system_schema keyspace is introduced, and its first entry describes the column_computation table.	2019-07-19 11:58:42 +02:00
Piotr Sarna	c1d5aef735	db: add system_schema.computed_columns Information on which columns of a table are 'computed' is now kept in system_schema.computed_columns system table.	2019-07-19 11:58:42 +02:00
Piotr Sarna	589200f5a2	tests: add schema change test regeneration comment Schema change test might need regenerating every time a system table is added. In order to save future developer's time on debugging this test, a short description of that requirement is added.	2019-07-19 11:58:42 +02:00
Piotr Sarna	03ade01db7	tests: disable computed columns feature in schema change test In order to make sure that old schema digest is not recomputed and can be verified - computed columns feature is initially disabled in schema_change_test. The reason for that is as follows: running CQL test env assumes that we are running the newest cluster with all features enabled. However, the mere existence of some features might influence digest calculation. So, in order for the existing test to work correctly, it should have exactly the same set of cluster supported features as it had during its creation. It used to be "all features", but now it's "all features except computed columns". One can think of that as running a cluster with some nodes not yet knowing what computed columns are, so they are not taken into account when computing digests. Additionally, a separate test case that takes computed column digest into account will be generated and added in this series.	2019-07-19 11:58:42 +02:00
Piotr Sarna	17c323c096	database: add fixing previous secondary index schemas If a schema was created before computed columns were implemented, its token column may not have been marked as computed. To remedy this, if no computed column is found, the schema will be recreated. The code will work correctly even without this patch in order to support upgrading from legacy versions, but it's still important: it transforms token columns from the legacy format to new computed format, which will eventually (after a few release cycles) allow dropping the support for legacy format altogether.	2019-07-19 11:58:42 +02:00
Piotr Sarna	3c5dd94306	view: remove unused token_for function The function was only used once in code removed in this series.	2019-07-19 11:58:42 +02:00
Piotr Sarna	6a6871aa0e	view: check for computed columns in view Currently, having a 'computed' column in view update generation indicates that token value needs to be generated and assigned to it.	2019-07-19 11:58:42 +02:00
Piotr Sarna	a0e02df36a	service: add computed columns feature Computed columns feature should be checked before creating index schemas the new way - by adding computed column names to system_schema.computed_columns.	2019-07-19 11:58:42 +02:00
Piotr Sarna	a1100e3737	schema: allow marking columns as computed in schema builder In order to be able to transform legacy materialized view definitions, builder is now able to mark an existing column as computed.	2019-07-19 11:58:41 +02:00
Piotr Sarna	65bf6d34fe	schema: add implementation of computing token column Computed column of 'token' type can now have its value computed.	2019-07-19 11:47:48 +02:00
Piotr Sarna	491b7a817f	schema: add computed info to column definition Some columns may represent not user-provided values, but ones computed from other columns. Currently an example is token column used in secondary indexes to provide proper ordering. In order to avoid hardcoding special cases in execution stage, optional additional information for computed columns is stored in column definition.	2019-07-19 11:47:46 +02:00
Tomasz Grabiec	7604980d63	database: Add missing partition slicing on streaming reader recreation streaming_reader_lifecycle_policy::create_reader() was ignoring the partition_slice passed to it and always creating the reader for the full slice. That's wrong because create_reader() is called when recreating a reader after it's evicted. If the reader stopped in the middle of a partition we need to start from that point. Otherwise, fragments in the mutation stream will appear duplicated or out of order, violating assumptions of the consumers. This was observed to result in repair writing incorrect sstables with duplicated clustering rows, which results in malformed_sstable_exception on read from those sstables. Fixes #4659. In v2: - Added an overload without partition_slice to avoid changing existing users which never slice Tests: - unit (dev) - manual (3 node ccm + repair) Backport: 3.1 Reviewed-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <1563451506-8871-1-git-send-email-tgrabiec@scylladb.com>	2019-07-18 18:35:28 +03:00
Asias He	64a4c0ede2	streaming: Do not open rpc stream connection if ranges are not relevant to a shard Given a list of ranges to stream, stream_transfer_task will create a reader with the ranges and create a rpc stream connection on all the shards. When the user provides ranges to repair with -st -et options, e.g., using scylla-manager, such ranges can belong to only one shard, and repair will pass such ranges to streaming. As a result, only one shard will have data to send while the rpc stream connections are created on all the shards, which can cause the kernel to run out of ports in some systems. To mitigate the problem, do not open the connection if the ranges do not belong to the shard at all. Refs: #4708	2019-07-18 18:31:21 +03:00
Avi Kivity	51cff8ad23	Merge "Fix storage service for tests" from Botond " Fix another source of flakiness in mutation_reader_test. This one is caused by storage_service_for_tests lacking a config::broadcast_to_all_shards() call, triggering an invalid memory access (or SEGFAULT) when run on more than one shard. Refs: #4695 " * 'fix_storage_service_for_tests' of https://github.com/denesb/scylla: tests: storage_service_for_tests: broadcast config to all shards tests: move storage_service_for_tests impl to test_services.cc	2019-07-18 18:27:47 +03:00
Nadav Har'El	997b92a666	migration_manager: allow dropping table and all its views The function announce_column_family_drop() drops (deletes) a base table and all the materialized-views used for its secondary indexes, but not other materialized views - if there are any, the operation refuses to continue. This is exactly what CQL's "DROP TABLE" needs, because it is not allowed to drop a table before manually dropping its views. But there is no inherent reason why we can't support an operation to delete a table and all its views - not just those related to indexes. This patch adds such an option to announce_column_family_drop(). This option is not used by the existing CQL layer, but can be used by other code automating operations programmatically without CQL. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190716150559.11806-1-nyh@scylladb.com>	2019-07-18 13:26:25 +02:00
Takuya ASADA	bd7d1b2d38	dist/common/systemd: change stop timeout sec to 900s Currently scylla-server.service uses DefaultTimeoutStopSec = 90; if Scylla is not able to shut down cleanly in 90 seconds we may have data corruption on the node. Since we already set TimeoutStartSec = 900, we can use TimeoutSec to set both TimeoutStartSec and TimeoutStopSec to 900. See #4700 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190717095416.10652-1-syuu@scylladb.com>	2019-07-17 15:37:47 +03:00
Nadav Har'El	759752947b	drop_index_statement: fix column_family() All statement objects which derive from cf_statement, including drop_index_statement, have a column_family() returning the name of the column family involved in this statement. For most statements this is known at the time of construction, because it is part of the statement, but for "DROP INDEX", the user doesn't specify the table's name - just the index name. So we need to override column_family() to find the table name. The existing implementation assert()ed that we can always find such a table, but this is not true - for example, in a DROP INDEX with "IF EXISTS", it is perfectly fine for no such table to exist. In this case we don't want a crash, and not even an exception - it's fine that we just return an empty table name. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190716180104.15985-1-nyh@scylladb.com>	2019-07-17 09:44:47 +03:00
Kamil Braun	4417e78125	Fix timestamp_type_impl::timestamp_from_string. Now it accepts the 'z' or 'Z' timezone, denoting UTC+00:00. Fixes #4641. Signed-off-by: Kamil Braun <kbraun@scylladb.com>	2019-07-16 19:16:56 +03:00
Asias He	722ab3bb65	repair: Log repair id in check_failed_ranges Add the word `id` before the repair id in the log. It makes the log easier to figure out what the number stands for.	2019-07-16 19:10:19 +03:00
Avi Kivity	43690ecbdf	Merge "Fix disable_sstable_write synchronization with on_compaction_completion" from Benny " disable_sstable_write needs to acquire _sstable_deletion_sem to properly synchronize with background deletions done by on_compaction_completion to ensure no sstables will be created or deleted during reshuffle_sstables after storage_service::load_new_sstables disables sstable writes. Fixes #4622 Test: unit(dev), nodetool_additional_test.py migration_test.py " * 'scylla-4622-fix-disable-sstable-write' of https://github.com/bhalevy/scylla: table: document _sstables_lock/_sstable_deletion_sem locking order table: disable_sstable_write: acquire _sstable_deletion_sem table: uninline enable_sstable_write table: reshuffle_sstables: add log message	2019-07-16 19:06:58 +03:00
Amnon Heiman	399d79fc6f	init: do not allow replace-address for seeds If a node is a seed node, it can not be started with replace-address-first-boot or the replace-address flag. The issue is that as a seed node it will generate new tokens instead of replacing the existing one the user expects it to replace when supplying the flags. This patch will throw a bad_configuration_error exception in this case. Fixes #3889 Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2019-07-16 18:53:19 +03:00
Calle Wilund	dbc3499fd1	server: Fix cql notification inet address serialization Fixes #4717 Bug in ipv6 support series caused inet_address serialization to include an additional "size" parameter in the address chunk. Message-Id: <20190716134254.20708-1-calle@scylladb.com>	2019-07-16 16:51:59 +03:00
Botond Dénes	b40cf1c43d	tests: storage_service_for_tests: broadcast config to all shards Due to recent changes to the config subsystem, configuration has to be broadcast to all shards if one wishes to use it on them. The `storage_service_for_tests` has a `sharded<gms::gossiper>` member, which reads config values on initialization on each shard, causing a crash as the configuration was initialized only on shard 0. Add a call to `config::broadcast_to_all_shards()` to ensure all shards have access to valid config values.	2019-07-16 10:37:17 +03:00
Botond Dénes	fc9f46d7c1	tests: move storage_service_for_tests impl to test_services.cc Let's make it easier to find.	2019-07-16 10:36:49 +03:00
Paweł Dziepak	060e3f8ac2	mutation_partition: verify row::append_cell() precondition row::append_cell() has a precondition that the new cell column id needs to be larger than that of any other already existing cell. If this precondition is violated the row will end up in an invalid state. This patch adds an assertion to make sure we fail early in such cases.	2019-07-15 23:25:06 +02:00
Botond Dénes	5f22771ea8	tests/mutation_reader_test stabilize test_multishard_combining_reader_non_strictly_monotonic_positions Currently the test_multishard_combining_reader_non_strictly_monotonic_positions is flaky. The test is somewhat unconventional, in that it doesn't use the same instance of data as the input to the test and as its expected output, instead it invokes the method which generates this data (`make_fragments_with_non_monotonic_positions()`) twice, first to generate the input, and secondly to generate the expected output. This means that the test is prone to any deviation in the data generated by said method. One such deviation, discovered recently, is that the method doesn't explicitly specify the deletion time of the generated range tombstones. This results in this deletion time sometimes differing between the test input and the expected output. Solve by explicitly passing the same deletion time to all created range tombstones. Refs: #4695	2019-07-15 23:24:16 +02:00
Tomasz Grabiec	14700c2ac4	Merge "Fix the system.size_estimates table" from Kamil Fixes a segfault when querying for an empty keyspace. Also, fixes an infinite loop on smp > 1. Queries to system.size_estimates table which are not single-partition queries caused Scylla to go into an infinite loop inside multishard_combining_reader::fill_buffer. This happened because multishard_combining_reader assumes that shards return rows belonging to separate partitions, which was not the case for size_estimates_mutation_reader. Fixes #4689.	2019-07-15 22:09:30 +02:00
Asias He	8774adb9d0	repair: Avoid deadlock in remove_repair_meta Start n1, n2 Create ks with rf = 2 Run repair on n2 Stop n2 in the middle of repair n1 will notice n2 is DOWN, gossip handler will remove repair instance with n2 which calls remove_repair_meta(). Inside remove_repair_meta(), we have: ``` 1 return parallel_for_each(*repair_metas, [repair_metas] (auto& rm) { 2 return rm->stop(); 3 }).then([repair_metas, from] { 4 rlogger.debug("Removed all repair_meta for single node {}", from); 5 }); ``` Since 3.1, we start 16 repair instances in parallel which will create 16 readers. The reader semaphore is 10. At line 2, it calls ``` 6 future<> stop() { 7 auto gate_future = _gate.close(); 8 auto writer_future = _repair_writer.wait_for_writer_done(); 9 return when_all_succeed(std::move(gate_future), std::move(writer_future)); 10 } ``` The gate protects the reader to read data from disk: ``` 11 with_gate(_gate, [] { 12 read_rows_from_disk 13 return _repair_reader.read_mutation_fragment() --> calls reader() to read data 14 }) ``` So line 7 won't return until all the 16 readers return from the call of reader(). The problem is, the reader won't release the reader semaphore until the reader is destroyed! So, even if 10 out of the 16 readers have finished reading, they won't release the semaphore. As a result, the stop() hangs forever. To fix in short term, we can delete the reader, aka, drop the repair_meta object once it is stopped. Refs: #4693	2019-07-15 21:51:57 +02:00
Benny Halevy	0e4567c881	table: document _sstables_lock/_sstable_deletion_sem locking order Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-07-15 19:20:35 +03:00
Calle Wilund	1ed9a44396	utils::config_file: Propagate broadcast_to_all_shards to dependent files Fixes #4713 Modifying config files to use sharded storage misses the fact that extensions are allowed to add non-member config fields to the main configuration, typically from "extra" config_file objects. Unless those "extra" files are broadcast when the main file is broadcast, the values will not be readable from other shards. This patch propagates the broadcast to all other config files whose entries are in the top level object. This ensures we always keep data up to date on config reload. Message-Id: <20190715135851.19948-1-calle@scylladb.com>	2019-07-15 17:02:09 +03:00
Nadav Har'El	9cc9facbea	configure.py: atomically overwrite build.ninja configure.py currently takes some time to write build.ninja. If the user interrupts (e.g., control-C) configure.py, it can leave behind a partial or even empty build.ninja file. This is most frustrating when the user didn't explicitly run "configure.py", but rather just ran "ninja" and ninja decided to run configure.py, and after interrupting it the user cannot run "ninja" again because build.ninja is gone. Another result of losing build.ninja is that the user now needs to remember which parameters to run "configure.py" with, because the old ones stored in build.ninja were lost. The solution in this patch is simple: We write the new build.ninja contents into a temporary file, not directly into build.ninja. Then, only when the entire file has been successfully written, do we rename the temporary file to its intended name - build.ninja. Fixes #4706 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Reviewed-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20190715122129.16033-1-nyh@scylladb.com>	2019-07-15 15:34:48 +03:00
Eliran Sinvani	997a146c7f	auth: Prevent race between role_manager and password_authenticator When scylla is started for the first time with PasswordAuthenticator enabled, it can be that a record of the default superuser will be created in the table with the can_login and is_superuser set to null. It happens because the module in charge of creating the row is the role manager and the module in charge of setting the default password salted hash value is the password authenticator. Those two modules are started together. In the case when the password authenticator finishes the initialization first, in the period until the role manager completes its initialization, the row contains those null columns and any login attempt in this period will cause a memory access violation since those columns are not expected to ever be null. This patch removes the race by starting the password authenticator and authorizer only after the role manager finished its initialization. Tests: 1. Unit tests (release) 2. Auth and cqlsh auth related dtests. Fixes #4226 Signed-off-by: Eliran Sinvani <eliransin@scylladb.com> Message-Id: <20190714124839.8392-1-eliransin@scylladb.com>	2019-07-14 16:19:57 +03:00
Rafael Ávila de Espíndola	67c624d967	Add documentation for large_rows and large_cells Fixes #4552 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190614151907.20292-1-espindola@scylladb.com>	2019-07-12 19:21:26 +03:00
Amnon Heiman	1c6dec139f	API: compaction_manager add get pending tasks by table The pending tasks by table name API returns an array of pending tasks by keyspace/table names. After this patch the following command would work: curl -X GET 'http://localhost:10000/compaction_manager/metrics/pending_tasks_by_table' Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2019-07-12 19:21:26 +03:00
Takuya ASADA	842f75d066	reloc: provide libthread_db.so.1 to debug thread on gdb In scylla-debuginfo package, we have /usr/lib/debug/opt/scylladb/libreloc/libthread_db-1.0.so-666.development-0.20190711.73a1978fb.el7.x86_64.debug but we actually do not have libthread_db.so.1 in /opt/scylladb/libreloc since it's not available on ldd result with scylla binary. To debug threads, we need to add the library in a relocatable package manually. Fixes #4673 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190711111058.7454-1-syuu@scylladb.com>	2019-07-12 19:21:26 +03:00
Piotr Sarna	ac7531d8d9	db,hints: decouple in-flight hints limits from resource manager The resource manager is used to manage common resources between various hints managers. In-flight hints used to be one of the shared resources, but it proves to cause starvation, when one manager eats the whole limit - which may be especially painful if the background materialized views hints manager starves the regular hints manager, which can in turn start failing user writes because of admission control. This patch makes the limit per-manager again, which effectively reverts the limit to its original behavior. Fixes #4483 Message-Id: <8498768e8bccbfa238e6a021f51ec0fa0bf3f7f9.1559649491.git.sarna@scylladb.com>	2019-07-12 19:21:26 +03:00
Rafael Ávila de Espíndola	4e7ffb80c0	cql: Fix use of UDT in reversed columns We were missing calls to underlying_type in a few locations and so the insert would think the given literal was invalid and the select would refuse to fetch a UDT field. Fixes #4672 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190708200516.59841-1-espindola@scylladb.com>	2019-07-12 19:21:26 +03:00
Kamil Braun	60a4867a5b	Fix infinite looping when performing a range query on system.size_estimates. Queries to system.size_estimates table which are not single partition queries caused Scylla to go into an infinite loop inside multishard_combining_reader::fill_buffer. This happened because multishard_combining_reader assumes that shards return rows belonging to separate partitions, which was not the case for size_estimates_mutation_reader. This commit fixes the issue and closes #4689.	2019-07-12 18:09:15 +02:00
Kamil Braun	ba5a02169e	Fix segmentation fault when querying system.size_estimates for an empty keyspace.	2019-07-12 18:02:10 +02:00
Kamil Braun	a1665b74a9	Refactor size_estimates_virtual_reader Move the implementation of size_estimates_mutation_reader to a separate compilation unit to speed up compilation times and increase readability. Refactor tests to use seastar::thread.	2019-07-12 17:53:00 +02:00
Benny Halevy	6dad9baa1c	table: disable_sstable_write: acquire _sstable_deletion_sem `disable_sstable_write` needs to acquire `_sstable_deletion_sem` to properly synchronize with background deletions done by `on_compaction_completion` to ensure no sstables will be created or deleted during `reshuffle_sstables` after `storage_service::load_new_sstables` disables sstable writes. Fixes #4622 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-07-11 12:14:44 +03:00
Benny Halevy	bbbd749f70	table: uninline enable_sstable_write Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-07-11 12:14:44 +03:00
Benny Halevy	c6bad3f3c2	table: reshuffle_sstables: add log message To mark the point in time writes are disabled and scanning of the data directory is beginning. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-07-11 12:14:44 +03:00
Rafael Ávila de Espíndola	281f3a69f8	mc writer: Fix exception safety when closing _index_writer This fixes a possible cause of #4614. From the backtrace in that issue, it looks like a file is being closed twice. The first point in the backtrace where that seems likely is in the MC writer. My first idea was to add a writer::close and make it the responsibility of the code using the writer to call it. That way we would move work out of the destructor. That is a bit hard since the writer is destroyed from flat_mutation_reader::impl::~consumer_adapter and that would need to get a close function too. This patch instead just fixes an exception safety issue. If _index_writer->close() throws, _index_writer is still valid and ~writer will try to close it again. If the exception was thrown after _completed.set_value(), that would explain the assert about _completed.set_value() being called twice. With this patch the path outside of the destructor now moves the writer to a local variable before trying to close it. Fixes #4614 Message-Id: <20190710171747.27337-1-espindola@scylladb.com>	2019-07-10 19:27:19 +02:00
Paweł Dziepak	eb7d17e5c5	lsa: make sure align_up_for_asan() doesn't cause reads past end of segment In debug mode the LSA needs objects to be 8-byte aligned in order to maximise coverage from the AddressSanitizer. Usually `close_active()` creates a dummy object that covers the end of the segment being closed. However, if the last real object ends in the last eight bytes of the segment then that dummy won't be created because of the alignment requirements. This broke exit conditions on loops trying to read all objects in the segment and caused them to attempt to dereference an address at the end of the segment. This patch fixes that. Fixes #4653.	2019-07-10 19:19:24 +02:00
Avi Kivity	e32bdb6b90	Merge "Warn user about using SimpleStrategy with Multi DC deployment" from Kamil " If the user creates a keyspace with the 'SimpleStrategy' replication class in a multi-datacenter environment, they will receive a warning in the CQL shell and in the server logs. Resolves #4481 and #4651. " * 'multidc' of https://github.com/kbr-/scylla: Warn user about using SimpleStrategy with Multi DC deployment Add warning support to the CQL binary protocol implementation	2019-07-10 16:47:07 +03:00

19003 Commits
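
Commit 9cc9facbea above describes writing build.ninja into a temporary file and renaming it into place only once it is fully written. The following is a minimal Python sketch of that write-then-rename idea, not the actual configure.py code: the helper name `write_atomically` and the temporary-file naming are assumptions made for illustration.

```python
import os
import tempfile


def write_atomically(path, contents):
    """Write `contents` to `path` without ever exposing a partial file.

    Hypothetical helper for illustration; not taken from configure.py.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory as the target so that
    # the final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(
        dir=directory, prefix=os.path.basename(path) + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(contents)
        # os.replace overwrites the destination in a single rename step.
        os.replace(tmp_path, path)
    except BaseException:
        # On any failure or interruption, remove the temporary file and
        # leave the previous contents of `path` untouched.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

If the process is interrupted before the rename, the old build.ninja is still in place, so a later "ninja" run can still read the previously stored parameters.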