scylla

Author	SHA1	Message	Date
Avi Kivity	f3eade2f62	treewide: relicense to ScyllaDB-Source-Available-1.0 Drop the AGPL license in favor of a source-available license. See the blog post [1] for details. [1] https://www.scylladb.com/2024/12/18/why-were-moving-to-a-source-available-license/	2024-12-18 17:45:13 +02:00
Takuya ASADA	6f1fff58ba	dist: drop legacy control group parameters Since we dropped CentOS7 support, we can now drop the legacy control group parameters, which are deprecated in systemd v252.	2023-12-11 19:38:28 +09:00
Takuya ASADA	6d7cb97645	dist: move AmbientCapabilities to scylla-server.service Since we dropped support for CentOS7, we can now always use AmbientCapabilities without a systemd version check, so we can move it from capabilities.conf to scylla-server.service. However, we still cannot hardcode CAP_PERFMON since it is too new and only newer kernels support it, so keep it in scylla_post_install.sh	2023-12-11 19:38:28 +09:00
Takuya ASADA	f90c10260f	scylla_post_install.sh: Add CAP_PERFMON to AmbientCapabilities Add CAP_PERFMON to AmbientCapabilities in capabilities.conf, to enable the perf_event based stall detector in Seastar. However, on Debian/Ubuntu CAP_PERFMON with a non-root user does not work because the distribution sets kernel.perf_event_paranoid=4, which disallows all non-root user access (on Debian it is kernel.perf_event_paranoid=3). So we need to configure kernel.perf_event_paranoid=2 on these distros. see: https://askubuntu.com/questions/1400874/what-does-perf-paranoia-level-four-do Also, CAP_PERFMON is only available on linux-5.8+; older kernels do not have this capability. To support older kernel environments such as CentOS7, we need to configure kernel.perf_event_paranoid=1 to allow non-root user access even without the capability. Fixes #15743 Closes scylladb/scylladb#16070	2023-12-06 13:53:08 +02:00
Takuya ASADA	338a9492c9	scylla_post_install.sh: detect RHEL correctly $ID_LIKE = "rhel" works only on RHEL compatible OSes, not for RHEL itself. To detect RHEL correctly, we also need to check $ID = "rhel". Fixes #16040 Closes scylladb/scylladb#16041	2023-11-14 13:53:35 +02:00
Takuya ASADA	bf27fdeaa2	scylla_coredump_setup: fix coredump timeout settings We currently configure only TimeoutStartSec, but that is probably not enough to prevent a coredump timeout, since TimeoutStartSec is the maximum waiting time for service startup, and there is another directive that specifies the maximum service running time (RuntimeMaxSec). To fix the problem, we should specify RuntimeMaxSec and TimeoutSec (which configures both TimeoutStartSec and TimeoutStopSec). Fixes #5430 Closes #12757	2023-02-16 10:23:20 +02:00
Avi Kivity	fcb8d040e8	treewide: use Software Package Data Exchange (SPDX) license identifiers Instead of lengthy blurbs, switch to single-line, machine-readable standardized (https://spdx.dev) license identifiers. The Linux kernel switched long ago, so there is strong precedent. Three cases are handled: AGPL-only, Apache-only, and dual licensed. For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0), reasoning that our changes are extensive enough to apply our license. The changes were applied mechanically with a script, except to licenses/README.md. Closes #9937	2022-01-18 12:15:18 +01:00
Avi Kivity	a55b434a2b	treewide: extend copyright statements to present day	2021-06-06 19:18:49 +03:00
Takuya ASADA	3a25e7285b	scylla_post_install.sh: generate memory.conf for CentOS7 On CentOS7, systemd does not support percentage-based parameters. To apply the memory parameter on CentOS7, we need to override the parameter in bytes instead of as a percentage. Fixes #6783	2020-07-29 14:10:16 +03:00
Takuya ASADA	df4fac2849	dist: add scylla_memory_setup Ask the user whether the host is shared with other services, then set "--lock-memory 1" if it is not shared. Fixes #1393	2020-04-26 13:34:05 +03:00
Takuya ASADA	9a84164c95	dist: drop old distribution code Since we dropped support of Ubuntu 14.04 and Debian 8, we can remove the code for these distributions.	2020-02-17 10:18:35 +02:00
Takuya ASADA	b6988112b4	scylla_post_install.sh: fix operator precedence issue with multiple statements In bash, 'A || B && C' is a problem because when A is true, C is still evaluated, since && and || have the same precedence. To avoid the issue we need to make B && C one statement. Fixes #5764	2020-02-10 14:29:40 +02:00
Takuya ASADA	5627888b7c	scylla_post_install.sh: fix 'integer expression expected' error awk returns a float value on Debian, which causes the postinst script to fail since we compare it as an integer value. Replaced with sed + bash. Fixes #5569	2020-01-20 11:13:55 +02:00
Glauber Costa	da260ecd61	systemd: put scylla processes in systemd slices. It is well known that seastar applications, like Scylla, do not play well with external processes: CPU usage from external processes may confuse the I/O and CPU schedulers and create stalls. We have also recently seen that memory usage from other application's anonymous and page cache memory can bring the system to OOM. Linux has a very good infrastructure for resource control contributed by amazingly bright engineers in the form of cgroup controllers. This infrastructure is exposed by SystemD in the form of slices: a hierarchical structure to which controllers can be attached. In true systemd way, the hierarchy is implicit in the filenames of the slice files. a "-" symbol defines the hierarchy, so the files that this patch presents, scylla-server and scylla-helper, essentially create a "scylla" cgroup at the top level with "server" and "helper" children. Later we mark the Services needed to run scylla as belonging to one or the other through the Slice= directive. Scylla DBAs can benefit from this setup by using the systemd-run utility to fire ad-hoc commands. Let's say for example that someone wants to hypothetically run a backup and transfer files to an external object store like S3, making sure that the amount of page cache used won't create swap pressure leading to database timeouts. One can then run something like: ``` sudo systemd-run --uid=`id -u scylla` --gid=`id -g scylla` -t --slice=scylla-helper.slice /path/to/my/magical_backup_tool ``` (or even better, the backup tool can itself be a systemd timer) Changes from last version: - No longer use the CPUQuota - Minor typo fixes - postinstall fixup for small machines Benchmark results: ================== Test: read from disk, with 100% disk util using a single i3.xlarge (4 vCPUs). We have to fill the cache as we read, so this should stress CPU, memory and disk I/O. 
cassandra-stress command: ``` cassandra-stress read no-warmup duration=5m -rate threads=20 -node 10.2.209.188 -pop dist=uniform\(1..150000000\) ``` Baseline results: ``` Results: Op rate : 13,830 op/s [READ: 13,830 op/s] Partition rate : 13,830 pk/s [READ: 13,830 pk/s] Row rate : 13,830 row/s [READ: 13,830 row/s] Latency mean : 1.4 ms [READ: 1.4 ms] Latency median : 1.4 ms [READ: 1.4 ms] Latency 95th percentile : 2.4 ms [READ: 2.4 ms] Latency 99th percentile : 2.8 ms [READ: 2.8 ms] Latency 99.9th percentile : 3.4 ms [READ: 3.4 ms] Latency max : 12.0 ms [READ: 12.0 ms] Total partitions : 4,149,130 [READ: 4,149,130] Total errors : 0 [READ: 0] Total GC count : 0 Total GC memory : 0.000 KiB Total GC time : 0.0 seconds Avg GC time : NaN ms StdDev GC time : 0.0 ms Total operation time : 00:05:00 ``` Question 1: =========== Does putting scylla in a special slice affect its performance? Results with Scylla running in a slice: ``` Results: Op rate : 13,811 op/s [READ: 13,811 op/s] Partition rate : 13,811 pk/s [READ: 13,811 pk/s] Row rate : 13,811 row/s [READ: 13,811 row/s] Latency mean : 1.4 ms [READ: 1.4 ms] Latency median : 1.4 ms [READ: 1.4 ms] Latency 95th percentile : 2.2 ms [READ: 2.2 ms] Latency 99th percentile : 2.6 ms [READ: 2.6 ms] Latency 99.9th percentile : 3.3 ms [READ: 3.3 ms] Latency max : 23.2 ms [READ: 23.2 ms] Total partitions : 4,151,409 [READ: 4,151,409] Total errors : 0 [READ: 0] Total GC count : 0 Total GC memory : 0.000 KiB Total GC time : 0.0 seconds Avg GC time : NaN ms StdDev GC time : 0.0 ms Total operation time : 00:05:00 ``` Conclusion: No significant change Question 2: =========== What happens when there is a CPU hog running in the same server as scylla? 
CPU hog: ``` taskset -c 0 /bin/sh -c "while true; do true; done" & taskset -c 1 /bin/sh -c "while true; do true; done" & taskset -c 2 /bin/sh -c "while true; do true; done" & taskset -c 3 /bin/sh -c "while true; do true; done" & sleep 330 ``` Scenario 1: CPU hog runs freely: ``` Results: Op rate : 2,939 op/s [READ: 2,939 op/s] Partition rate : 2,939 pk/s [READ: 2,939 pk/s] Row rate : 2,939 row/s [READ: 2,939 row/s] Latency mean : 6.8 ms [READ: 6.8 ms] Latency median : 5.3 ms [READ: 5.3 ms] Latency 95th percentile : 11.0 ms [READ: 11.0 ms] Latency 99th percentile : 14.9 ms [READ: 14.9 ms] Latency 99.9th percentile : 17.1 ms [READ: 17.1 ms] Latency max : 26.3 ms [READ: 26.3 ms] Total partitions : 884,460 [READ: 884,460] Total errors : 0 [READ: 0] Total GC count : 0 Total GC memory : 0.000 KiB Total GC time : 0.0 seconds Avg GC time : NaN ms StdDev GC time : 0.0 ms Total operation time : 00:05:00 ``` Scenario 2: CPU hog runs inside scylla-helper slice ``` Results: Op rate : 13,527 op/s [READ: 13,527 op/s] Partition rate : 13,527 pk/s [READ: 13,527 pk/s] Row rate : 13,527 row/s [READ: 13,527 row/s] Latency mean : 1.5 ms [READ: 1.5 ms] Latency median : 1.4 ms [READ: 1.4 ms] Latency 95th percentile : 2.4 ms [READ: 2.4 ms] Latency 99th percentile : 2.9 ms [READ: 2.9 ms] Latency 99.9th percentile : 3.8 ms [READ: 3.8 ms] Latency max : 18.7 ms [READ: 18.7 ms] Total partitions : 4,069,934 [READ: 4,069,934] Total errors : 0 [READ: 0] Total GC count : 0 Total GC memory : 0.000 KiB Total GC time : 0.0 seconds Avg GC time : NaN ms StdDev GC time : 0.0 ms Total operation time : 00:05:00 ``` Conclusion: With systemd slice we can keep the performance very close to baseline Question 3: =========== What happens when there is an I/O hog running in the same server as scylla? 
I/O hog: (Data in the cluster is 2x size of memory) ``` while true; do find /var/lib/scylla/data -type f -exec grep glauber {} + done ``` Scenario 1: I/O hog runs freely: ``` Results: Op rate : 7,680 op/s [READ: 7,680 op/s] Partition rate : 7,680 pk/s [READ: 7,680 pk/s] Row rate : 7,680 row/s [READ: 7,680 row/s] Latency mean : 2.6 ms [READ: 2.6 ms] Latency median : 1.3 ms [READ: 1.3 ms] Latency 95th percentile : 7.8 ms [READ: 7.8 ms] Latency 99th percentile : 10.9 ms [READ: 10.9 ms] Latency 99.9th percentile : 16.9 ms [READ: 16.9 ms] Latency max : 40.8 ms [READ: 40.8 ms] Total partitions : 2,306,723 [READ: 2,306,723] Total errors : 0 [READ: 0] Total GC count : 0 Total GC memory : 0.000 KiB Total GC time : 0.0 seconds Avg GC time : NaN ms StdDev GC time : 0.0 ms Total operation time : 00:05:00 ``` Scenario 2: I/O hog runs in the scylla-helper systemd slice: ``` Results: Op rate : 13,277 op/s [READ: 13,277 op/s] Partition rate : 13,277 pk/s [READ: 13,277 pk/s] Row rate : 13,277 row/s [READ: 13,277 row/s] Latency mean : 1.5 ms [READ: 1.5 ms] Latency median : 1.4 ms [READ: 1.4 ms] Latency 95th percentile : 2.4 ms [READ: 2.4 ms] Latency 99th percentile : 2.9 ms [READ: 2.9 ms] Latency 99.9th percentile : 3.5 ms [READ: 3.5 ms] Latency max : 183.4 ms [READ: 183.4 ms] Total partitions : 3,984,080 [READ: 3,984,080] Total errors : 0 [READ: 0] Total GC count : 0 Total GC memory : 0.000 KiB Total GC time : 0.0 seconds Avg GC time : NaN ms StdDev GC time : 0.0 ms Total operation time : 00:05:00 ``` Conclusion: With systemd slice we can keep the performance very close to baseline Signed-off-by: Glauber Costa <glauber@scylladb.com>	2019-08-19 14:31:28 -04:00
Glauber Costa	ffc328c924	move postinst steps to an external script There are systemd-related steps done in both rpm and deb builds. Move that to a script so we avoid duplication. The tests are so far a bit specific to the distributions, so it needs to be adapted a bit. Also note that this also fixes a bug with rpm as a side-effect: rpm does not call daemon-reload after potentially changing the systemd files (it is only implied during postun operations, that happen during uninstall). daemon-reload was called explicitly for debian packages, and now it is called for both. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2019-08-15 10:43:17 -04:00

15 Commits
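
The precedence issue described in commit b6988112b4 can be reproduced with a minimal sketch. The functions `A`, `B`, and `C` below are hypothetical placeholders that only record that they ran; they are not taken from scylla_post_install.sh, and the grouped form is one possible fix, not necessarily the one the commit applied.

```bash
#!/bin/bash
# Hypothetical stand-ins for three commands; each appends its name to $log.
log=""
A() { log+="A"; return 0; }   # A succeeds
B() { log+="B"; return 0; }
C() { log+="C"; return 0; }

# && and || have equal precedence and associate left to right,
# so this is parsed as (A || B) && C: C runs even though A succeeded.
log=""
A || B && C
ungrouped=$log

# Grouping B && C into one compound command: once A succeeds,
# nothing on the right-hand side of || is evaluated.
log=""
A || { B && C; }
grouped=$log

echo "ungrouped ran: $ungrouped"
echo "grouped ran:   $grouped"
```

With `A` succeeding, the ungrouped form runs `A` and then `C`, while the grouped form runs only `A`.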