scylla

Author	SHA1	Message	Date
Duarte Nunes	d4db043f03	db/view: Start view building after schema agreement If a base table or view has been dropped in one node, but another one hasn't yet learned about it, it starts the view build process immediately on boot, possibly calculating unneeded view updates and causing errors at the view replica, if that replica has already processed the schema changes. We should thus wait for schema agreement, even if the node is a seed. Fixes #3328 Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-04-03 13:16:28 +01:00
Duarte Nunes	11ece46f14	db/view: Remove leftover debug statement Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20180402175238.5528-1-duarte@scylladb.com>	2018-04-03 09:41:33 +01:00
Duarte Nunes	a45fa8eaa2	db/view/view_builder: Allow synchronizing with the end of a build Intended for use by unit tests, this patch allows synchronizing with the end of a build for a particular view. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-03-27 01:20:11 +01:00
Duarte Nunes	5f822e3928	db/view/view_builder: Actually build views This patch adds the missing view building code to the eponymous class. We consume from the reader associated with each base table until all its views are built. If the reader reaches the end and there are incomplete views, then a view was added while others were being built. In such cases, we restart the reader to the beginning of the current token, but not to the beginning of the token range, when the view is added. Then, when we exhaust the reader, we simply create a new one for the whole token range, and resume building the pending views. We aim to be resource-conscious. On a given shard, at any given moment, we consume at most from one reader. We also strive for fairness, in that each build step inserts entries for the views of a different base. Each build step reads and generates updates for batch_size rows. We lack a controller, which could potentially allow us to go faster (to execute multiple steps at the same time, or consume more rows per batch), and also which would apply backpressure, so we could, for example, delay executing a build step. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-03-27 01:20:11 +01:00
Duarte Nunes	a21efeffa0	db/view/view_builder: React to schema changes The view_builder now uses the migration_manager to subscribe to schema change events, and update its bookkeeping accordingly. We prefer this to having the database call into the view_builder, as that would create a cyclic dependency. We serialize changes to the views of a particular base table, such that schema changes do not interfere with the upcoming view building code. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-03-27 01:20:11 +01:00
Duarte Nunes	901faabaa2	db/view: Introduce view_builder This patch introduces the view_builder class, a sharded service responsible for building all defined materialized views. This process entails walking over the existing data in a given base table, and using it to calculate and insert the respective entries for one or more views. This patch introduces only the bootstrap functionality, which is responsible for loading the data stored in the system tables and filling the in-memory data structures with the relevant information, to be used in subsequent patches for the actual view building. The interaction with the system tables is as follows. Interaction with the tables in system_keyspace: - When we start building a view, we add an entry to the scylla_views_builds_in_progress system table. If the node restarts at this point, we'll consider these newly inserted views as having made no progress, and we'll treat them as new views; - When we finish a build step, we update the progress of the views that we built during this step by writing the next token to the scylla_views_builds_in_progress table. If the node restarts here, we'll start building the views at the token in the next_token column. - When we finish building a view, we mark it as completed in the built views system table, and remove it from the in-progress system table. Under failure, the following can happen: * When we fail to mark the view as built, we'll redo the last step upon node reboot; * When we fail to delete the in-progress record, upon reboot we'll remove this record. A view is marked as completed only when all shards have finished their share of the work, that is, if a view is not built, then all shards will still have an entry in the in-progress system table; - A view that a shard finished building, but not all other shards, remains in the in-progress system table, with first_token == next_token. 
Interaction with the distributed system table (view_build_status): - When we start building a view, we mark the view build as being in-progress; - When we finish building a view, we mark the view as being built. Upon failure, we ensure that if the view is in the in-progress system table, then it may not have been written to this table. We don't load the built views from this table when starting. When starting, the following happens: * If the view is in the system.built_views table and not the in-progress system table, then it will be in view_build_status; * If the view is in the system.built_views table and not in this one, it will still be in the in-progress system table - we detect this and mark it as built in this table too, keeping the invariant; * If the view is in this table but not in system.built_views, then it will also be in the in-progress system table - we don't detect this and will redo the missing step, for simplicity. View building is necessarily a sharded process. That means that on restart, if the number of shards has changed, we need to calculate the most conservative token range that has been built, and build the remainder. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-03-27 01:20:10 +01:00
Duarte Nunes	dc44a08370	db/view: Return a future when sending view updates While we now send view mutations asynchronously in the normal view write path, other processes interested in sending view updates, such as streaming or view building, may wish to do it synchronously. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-03-27 01:20:10 +01:00
Duarte Nunes	b2cae7ea09	db/system_keyspace: Add virtual reader for MV in-progress build status Provide a virtual reader so users can query the in-progress view table in a way compatible with Apache Cassandra. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-03-27 01:20:10 +01:00
Duarte Nunes	aed28c667c	db/view: Pass pending endpoints to storage_proxy::send_to_endpoint This minimizes the number of mutation copies by just doing a single call to send_to_endpoint(). Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20180325121412.76844-2-duarte@scylladb.com>	2018-03-25 15:45:22 +03:00
Duarte Nunes	fb54c09e0b	service/storage_proxy: Pass pending endpoints to send_to_endpoint() This will allow us to minimize the number of mutation copies in mutate_MV(). Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20180325121412.76844-1-duarte@scylladb.com>	2018-03-25 15:45:21 +03:00
Nadav Har'El	e9702aa126	Materialized Views: don't lose updates while cluster is changing When the cluster is changed (nodes added or removed), ranges of tokens are moved between nodes. Scylla initiates a streaming process between an old and a new owner of the range, which can take a long time. During that streaming time, the new owner of the range is known as a "pending node" for this range, and all updates must go to both the old owner (in case the movement fails!) and the pending node (in case the movement succeeds). For materialized views, because they are ordinary tables, streaming moves all the view's data that existed before the streaming started. But we did not send updates done to the view during the streaming. A dtest demonstrates that the new node will miss some of the view updates, and will require a repair of the view tables immediately after the cluster change ends, which is not good. To fix that, we need to send every new update that happens during the streaming also to the "pending node". We already did this properly for base-table updates, but not for the view updates: Each base table replica wrote to only one paired view table replica, and nobody wrote to the new pending node (in the case where there is one, for the particular view token involved). In this patch, we make sure that all view updates go also to the "pending nodes" when there are any. We do the same thing that Cassandra does, which is - all base replicas write the update to the pending node(s). Arguably, it is inefficient that all replicas send the update to the same node. In most cases it is enough to send it from just one base replica - the one who is slated to be the new node's pair. I opened https://issues.apache.org/jira/browse/CASSANDRA-14262 about this idea. But that is an optimization. The patch as-is already fixes the bug. Fixes #3211 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20180313171853.17283-1-nyh@scylladb.com>	2018-03-16 12:00:29 +00:00
Nadav Har'El	31d0a1dd0c	Materialized views: implement row and partition locking mechanism This patch adds a "row_locker" class providing locking (shard-locally) of individual clustering rows or entire partitions, and both exclusive and shared locks (a.k.a. reader/writer lock). As we'll see in a following patch, we need this locking capability for materialized views, to serialize the read-modify-update modifications which involve the same rows or partitions. The new row_locker is significantly different from the existing cell_locker. The two main differences are that 1. row_locker also supports locking the entire partition, not just individual rows (or cells in them), and that 2. row_locker also supports shared (reader) locks, not just exclusive locks. For this reason we opted for a new implementation, instead of making large modifications to the existing cell_locker. And we put the source files in the view/ directory, because row_locker's requirements are pretty specific to the needs of materialized views. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2018-01-30 16:16:27 +02:00
Piotr Jastrzebski	96c97ad1db	Rename streamed_mutation* files to mutation_fragment* Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-01-24 20:56:49 +01:00
Piotr Jastrzebski	4c74b8c7e7	Migrate materialized views to flat_mutation_reader Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-01-18 07:32:35 +01:00
Duarte Nunes	b607662d2e	collection_type_impl: Make for_each_cell static Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20180115013532.67200-1-duarte@scylladb.com>	2018-01-15 11:16:33 +02:00
George Tavares	ceecd542cd	db/view: Consume updated rows regardless of static row Using Materialized Views, if the base table has static columns, and the update in the base table mutates static and non-static rows, the streamed_mutation is stopped before processing the non-static row. The patch avoids stopping the streamed_mutation and adds a test case. Message-Id: <20171220173434.25091-1-tavares.george@gmail.com>	2017-12-21 00:49:15 +01:00
Duarte Nunes	115ff1095e	db/view: Use view schema for view pk operations Instead of base schema. Fixes #2504 Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20170718190703.12972-1-duarte@scylladb.com>	2017-07-19 09:59:34 +02:00
Avi Kivity	ebaeefa02b	Merge seastar upstream (seastar namespace) - introduced "seastarx.hh" header, which does a "using namespace seastar"; - 'net' namespace conflicts with seastar::net, renamed to 'netw'. - 'transport' namespace conflicts with seastar::transport, renamed to cql_transport. - "logger" global variables now conflict with logger global type, renamed to xlogger. - other minor changes	2017-05-21 12:26:15 +03:00
Duarte Nunes	bad0edb23b	db/view: Re-implement clustering_prefix_matches() This patch implements clustering_prefix_matches() in terms of abstract_restriction::is_satisfied_by() instead of ranges, which supports filtering just a subset of the clustering columns. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-05-17 10:33:19 +02:00
Duarte Nunes	b0d1ea76a2	db/view: Re-implement partition_key_matches() This patch implements partition_key_matches() in terms of abstract_restriction::is_satisfied_by() instead of ranges, which supports filtering just a component of a compound partition key. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-05-17 10:33:19 +02:00
Duarte Nunes	38be85a21d	db/view: Generate regular tombstone for base deletions Instead of shadowable tombstones, which only apply to updates. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-05-17 10:33:19 +02:00
Duarte Nunes	1fd8b8e723	db/view: Consider cell liveness when generating updates This patch ensures we take into account the liveness of the base's regular column in the view's pk when generating view updates. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-05-17 10:33:19 +02:00
Duarte Nunes	c421da6825	db/view: Don't generate view updates for static rows Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-05-17 10:33:19 +02:00
Duarte Nunes	983af595e9	database: Read existing base mutations When generating updates for a materialized view we need to read the existing base row, to be able to determine the primary key of the view row the new base update will supplant, in case the view includes a base non-primary key column in its own primary key. That old view row will be tombstoned or updated, if it exists, depending on the difference between the new base row and the existing one, if any. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-05-17 10:33:19 +02:00
Duarte Nunes	8a77bfe35b	db/view: Calculate clustering ranges for MV read-before-write query Introduce the calculate_affected_clustering_ranges() function to calculate the smallest subset of affected clustering ranges that we need to query for. The update_requires_read_before_write() function checks whether a view is potentially affected by the base update. The patch also cleans up the may_be_affected_by() function. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-05-17 10:33:19 +02:00
Duarte Nunes	ec681060a8	db/view: Replace entry if cells don't match If a base table's regular column is part of the view's pk, and if that column changes, we should replace the entry, by deleting the row(s) with the old value and inserting a new one. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-05-17 10:33:19 +02:00
Duarte Nunes	f41a5e554d	view_info: Store base regular col in the view's PK as column_id This patch stores the base_non_pk_column_in_view column as column_id, which is more convenient, and it also stores a two-level optional to encode both lazy initialization and the absence of such a column. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-05-17 10:33:18 +02:00
Calle Wilund	0e6ae8dec2	schema: rename column accessors to be in line with origin More pointedly: Expose columns as is (currently all_columns_in_select_order), expose name->column mapping more appropriately named. Renaming like this is not strictly necessary, but there is a point to trying to keep nomenclature similar-ish with origin, esp. when select order columns need to become filtered (spoiler alert).	2017-05-10 16:44:48 +00:00
Duarte Nunes	4e693383f7	mutation_partition: Use row_tombstone This patch replaces the current row tombstone representation by a row_tombstone. The intent of the patch is thus to reify the idea of shadowable tombstones, that up until now we considered all materialized view row tombstones to be. We need to distinguish shadowable from non-shadowable row tombstones to support scenarios such as, when inserting to a table with a materialized view: 1. insert into base (p, v1, v2) values (3, 1, 3) using timestamp 1 2. delete from base using timestamp 2 where p = 3 3. insert into base (p, v1) values (3, 1) using timestamp 3 These should yield a view row where v2 is definitely null, but with the current implementation, v2 will pop back with its value v2=3@TS=1, even though it's dead in the base row. This is because the row tombstone inserted at 2) is a shadowable one. This patch only addresses the memory representation of such row_tombstones. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-04-25 11:46:33 +02:00
Duarte Nunes	bfb8a3c172	materialized views: Replace db::view::view class The write path uses a base schema at a particular version, and we want it to use the materialized views at the corresponding version. To achieve this, we need to map the state currently in db::view::view to a particular schema version, which this patch does by introducing the view_info class to hold the state previously in db::view::view, and by having a view schema directly point to it. The changes in the patch are thus: 1) Introduce view_info to hold the extra view state; 2) Point to the view_info from the schema; 3) Make the functions in the now stateless db::view::view non-member; 4) Remove the db::view::view class. All changes are structural and don't affect current behavior. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-03-15 15:50:05 +01:00
Paweł Dziepak	354ce0b2c7	mutation_fragment: make write access more explicit mutation_fragments are going to be caching their size in memory. In order to be able to invalidate that correctly, they need to know when that size may change (but avoid invalidation when it is not necessary).	2017-02-09 10:49:46 +00:00
Nadav Har'El	3ae73164a4	materialized views: partial mutate_MV This adds a function mutate_MV() which takes view mutations and sends them to the appropriate nodes (this may be the current node, or a remote node). This is only a partial implementation - we still don't do the local batch log (to survive reboots and failures) and some other stuff which is left commented out. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-02-06 13:36:45 +01:00
Nadav Har'El	365df8f900	materialized views: match base and view replicas A function to find the appropriate replica to send a view update to. This patch creates a new source file db/view/view.cc. We should eventually move a lot more of the materialized views code there. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2017-02-06 13:36:45 +01:00
Duarte Nunes	d5a61a8c48	view: Add view_update_builder class This patch adds the view_update_builder class, which is responsible for calculating the mutations to apply to a column family's materialized views, given a streamed_mutation representing an update to the base table and a streamed_mutation representing the pre-existing rows which the update covers. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-02-06 13:36:45 +01:00
Duarte Nunes	3991a58f08	view_updates: Generate updates This patch adds the view_updates::generate_update() function to generate view updates given a base row update and the corresponding, pre-existing row. This function will decide which of the previously introduced functions to call based on whether there is a pre-existing row and whether there exists a regular base column that's part of the view's PK. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-02-06 13:36:45 +01:00
Duarte Nunes	861d2dfb61	view_updates: Adds function to replace row This patch adds a function to replace a view row given a base table update and the pre-existing row, which simply deletes the old view entry and adds a new one. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-02-06 13:36:45 +01:00
Duarte Nunes	7901ce7de4	view_updates: Update view entry This patch introduces the view_updates::update_entry function, which creates the updates to apply to the existing view entry given the base table row before and after the update. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-02-06 13:36:45 +01:00
Duarte Nunes	b34ae6d6da	view_updates: Delete old view entry This patch introduces the view_updates::delete_old_entry function, which creates a view row mutation to delete an entry given an updated base table row. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-02-06 13:36:45 +01:00
Duarte Nunes	e0f642180f	view_updates: Create view entry This patch introduces the view_updates::create_entry function, which creates a view row mutation given a new base table row. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-02-06 13:35:31 +01:00
Duarte Nunes	b8b8a8099c	view_updates: Compute row marker This patch adds a function to compute the row marker of a view row given the base row. There are two cases to consider when building the row marker: 1) there is a column C that is a regular base column but is in the view PK; and 2) the columns for the base and the view PKs are the same. For 1), the view row marker timestamp will be the biggest between the base's row marker and C. The TTL will be that of C. This means that if C expires, the view row marker will expire as well (and the row, if no other column is keeping it alive). Note that if the base row marker expires but not C, then the base row will still be live due to C and we shouldn't expire the view row. For 2), the view row timestamp will be the same as the base row timestamp. The TTL should be set in such a way that both base and view rows live for the same time. We thus set the view row TTL to be the max of any other TTL in the base row. This is particularly important in the case where the base row marker has a TTL, but a column absent from the view holds a greater one. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-02-06 13:35:31 +01:00
Duarte Nunes	7321938bcf	view: Introduce view_updates class This patch introduces the view_updates class, which is responsible for generating and storing updates to a particular materialized view. The updates will be generated from an updated base row and the pre-existing one (if any), in later patches. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-02-06 13:35:31 +01:00
Duarte Nunes	082ef56df1	view: Store pk view column that's non-pk in the base To help calculate the view mutations from a base update, we store in the view class the column that's part of the view's primary key but not part of the base's, if such column exists. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-02-06 13:35:30 +01:00
Duarte Nunes	734ad80390	view: Add matches_view_filter() function This patch adds the matches_view_filter() function which specifies whether a given base row matches the view filter. Unlike may_be_affected_by(), this function has no false positives. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-02-06 13:35:30 +01:00
Duarte Nunes	21d1bbb527	view: Add may_be_affected_by function This patch adds the may_be_affected_by() function to the view class, which is responsible for determining whether an update to a base table affects one of its views. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-02-06 13:35:30 +01:00
Duarte Nunes	124802e196	cql3: Add function to build view's select statement This patch adds a utility function that creates a raw select statement from a set of columns and a where clause. It is intended to be used to create the prepared select statement used by the view class. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-12-20 13:06:11 +00:00
Duarte Nunes	5bd74abee8	create_view_statement: Implement check_access This patch implements check_access according to Cassandra's implementation. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-12-20 13:06:11 +00:00
Duarte Nunes	7818339791	materialized views: Add view class This patch adds the view class, which will contain functions related to populating a view, either from the base table's write path or from the view building mechanism which copies over already existing data in the base table. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-12-20 13:06:11 +00:00
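
The two row-marker cases described in commit b8b8a8099c ("view_updates: Compute row marker") can be illustrated with a minimal sketch. This is not the Scylla implementation: the `Cell` type, the `view_row_marker` function and its parameters are hypothetical names invented here, and the handling of cells without a TTL (they are simply ignored when taking the maximum) is an assumption the commit message does not spell out.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Cell:
    """Hypothetical stand-in for a row marker or a column cell."""
    timestamp: int
    ttl: Optional[int] = None  # None: no TTL set


def view_row_marker(base_marker: Cell,
                    base_cells: Sequence[Cell],
                    c: Optional[Cell] = None) -> Cell:
    """Compute a view row marker from a base row (illustrative only).

    `c` is the regular base column that is part of the view's PK, if any.
    """
    if c is not None:
        # Case 1: timestamp is the larger of the base row marker's and C's;
        # the TTL is that of C, so the view row expires together with C.
        return Cell(max(base_marker.timestamp, c.timestamp), c.ttl)

    # Case 2: base and view PKs have the same columns. Keep the base row
    # timestamp and take the largest TTL found in the base row, so that
    # both rows live for the same time.
    # Assumption: cells without a TTL do not take part in the maximum.
    ttls = [x.ttl for x in (base_marker, *base_cells) if x.ttl is not None]
    return Cell(base_marker.timestamp, max(ttls) if ttls else None)
```

For example, with a base row marker at timestamp 5 and TTL 10, and a column absent from the view holding TTL 50, case 2 yields a view row marker with timestamp 5 and TTL 50, which matches the situation the commit message calls out as particularly important.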

447 Commits