2a67355dedc323b54bd04bab42f7de7589d2b400
The multishard reader has to combine the output of all shards into a single fragment stream. To do that, each time a `partition_start` is read it has to check whether another partition, from another shard, has to be emitted before this one. Currently it uses the partitioner for this. At every partition start fragment it checks whether the token falls into the current shard sub-range. A shard sub-range is a contiguous range of tokens in which every token belongs to the same shard. If the partition doesn't belong to the current shard sub-range, the multishard reader assumes that the following shard sub-range, which belongs to the next shard, will have data, and moves over to it. This assumption only holds on very dense tables, however. On sparse tables it fails badly: the multishard reader ends up iterating over the shard sub-ranges (4096 in the worst case), only to find data in just a few of them. This resulted in high user-perceived latency when scanning a sparse table.

This patch replaces this algorithm with one based on a shard heap. The shards are now organized into a min-heap, ordered by the next token they have data for. When a partition start fragment is read from the current shard, its token is compared to the smallest token in the shard heap. If it is smaller, we continue to read from the current shard. Otherwise we move to the shard with the smallest token. When constructing the reader, or after fast-forwarding, we don't know which token each reader will produce first. To avoid reading in a partition from each reader, we assume each reader will produce the first token of the first shard sub-range that overlaps with the query range.

This algorithm performs much better on sparse tables, while also being slightly better on dense tables.

I did only a very rough measurement using CQL tracing. I populated a table with four rows on a 64-shard machine, then scanned the entire table.
Time to scan the table (microseconds):
before  27'846
after    5'248

Fixes: #4125
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <d559f887b650ab8caa79ad4d45fa2b7adc39462d.1548846019.git.bdenes@scylladb.com>
Scylla
Quick-start
$ git submodule update --init --recursive
$ sudo ./install-dependencies.sh
$ ./configure.py --mode=release
$ ninja-build -j4 # Assuming 4 system threads.
$ ./build/release/scylla
$ # Rejoice!
Please see HACKING.md for detailed information on building and developing Scylla. Note: GCC >= 8.1.1 is required to compile Scylla.
Running Scylla
- Run Scylla
./build/release/scylla
- Run Scylla with one CPU and ./tmp as the data directory
./build/release/scylla --datadir tmp --commitlog-directory tmp --smp 1
- For more run options:
./build/release/scylla --help
Building Fedora RPM
As a pre-requisite, you need to install Mock on your machine:
# Install mock:
sudo yum install mock
# Add user to the "mock" group:
usermod -a -G mock $USER && newgrp mock
Then, to build an RPM, run:
./dist/redhat/build_rpm.sh
The built RPM is stored in the /var/lib/mock/<configuration>/result directory.
For example, on Fedora 21 mock reports the following:
INFO: Done(scylla-server-0.00-1.fc21.src.rpm) Config(default) 20 minutes 7 seconds
INFO: Results and/or logs in: /var/lib/mock/fedora-21-x86_64/result
Building Fedora-based Docker image
Build a Docker image with:
cd dist/docker
docker build -t <image-name> .
Run the image with:
docker run -p $(hostname -i):9042:9042 -i -t <image-name>
Contributing to Scylla