Merge 'Revert scylla sstable schema improvements' from Botond Dénes

This PR reverts the scylla sstable schema loading improvements, as they fail in CI every other run. I am already working on fixes, but I am not sure I understand all the failures, so it is best to revert and re-post the series later.

Fixes: #13404
Fixes: #13410

Closes #13419

* github.com:scylladb/scylladb:
  Revert "Merge 'tool/scylla-sstable: more flexibility in obtaining the schema' from Botond Dénes"
  Revert "tools/schema_loader: don't require results from optional schema tables"
Nadav Har'El
2023-04-04 18:22:14 +03:00
7 changed files with 17 additions and 452 deletions

View File

@@ -128,7 +128,6 @@ schema_ptr indexes();
schema_ptr tables();
schema_ptr scylla_tables(schema_features features = schema_features::full());
schema_ptr views();
schema_ptr types();
schema_ptr computed_columns();
// Belongs to the "system" keyspace
schema_ptr scylla_table_schema_history();

View File

@@ -35,33 +35,14 @@ You can specify more than one SStable.
Schema
------
All operations need a schema to interpret the SStables with.
This tool tries to auto-detect the location of the ScyllaDB data directories and the name of the table the SStable belongs to.
If the SStable is located in a ScyllaDB data directory, it works out-of-the-box, without any additional input from the user.
If the SStable is located at an external path, you need to specify the names of the keyspace and table to which the SStable belongs. In addition, some hints as to where the ScyllaDB data directory is located may also be required.
Currently, there are two ways to obtain the schema:
The schema can be obtained in the following ways:
* Auto-detected - If the SStable is located in the table's directory within the ScyllaDB data directory.
* ``--keyspace=KEYSPACE --table=TABLE`` - If the SStable is located at an external location, but the ScyllaDB data directory or the config file are located at the standard location. The tool also reads the ``SCYLLA_CONF`` and ``SCYLLA_HOME`` environment variables to try to locate the configuration file.
* ``--schema-file FILENAME`` - Read the schema definition from a file.
* ``--system-schema --keyspace=KEYSPACE --table=TABLE`` - Use the known definition of built-in tables (only works for system tables).
* ``--scylla-data-dir SCYLLA_DATA_DIR_PATH --keyspace=KEYSPACE --table=TABLE`` - Read the schema tables from the data directory at the provided location, needs the keyspace and table name to be provided with ``--keyspace`` and ``--table``.
* ``--scylla-yaml-file SCYLLA_YAML_FILE_PATH --keyspace=KEYSPACE --table=TABLE`` - Read the schema tables from the data directory path obtained from the configuration, needs the keyspace and table name to be provided with ``--keyspace`` and ``--table``.
* ``--system-schema KEYSPACE.TABLE`` - Use the known definition of built-in tables (only works for system tables).
By default (no schema-related options are provided), the tool will try the following sequence:
* Try to load schema from ``schema.cql``.
* Try to deduce the ScyllaDB data directory path and table names from the SStable path.
* Try to load the schema from the ScyllaDB directory located at the standard location (``/var/lib/scylla``). For this to succeed, the table name has to be provided via ``--keyspace`` and ``--table``.
* Try to load the schema from the ScyllaDB directory path obtained from config at the standard location (``./conf/scylla.yaml``). ``SCYLLA_CONF`` and ``SCYLLA_HOME`` environment variables are also checked. For this to succeed, the table name has to be provided via ``--keyspace`` and ``--table``.
The tool stops after the first successful attempt. If none of the above succeed, an error message will be printed.
A user-provided schema in ``schema.cql`` (if present) always takes precedence over other methods. This is deliberate, so that the schema to be used can be overridden manually.
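The default sequence described above amounts to trying a list of schema sources in order and stopping at the first one that succeeds. The following Python sketch only illustrates that control flow and is not the tool's implementation: `first_successful`, `_fail` and the stand-in loaders are hypothetical names, and each loader is assumed to either return a schema or raise.

```python
from typing import Callable, Optional, Sequence, Tuple

# A schema source is a (name, loader) pair; the loader returns a schema
# (a plain string here) or raises to signal that the source is unusable.
Loader = Tuple[str, Callable[[], str]]


def first_successful(loaders: Sequence[Loader]) -> Optional[str]:
    """Try each source in order and return the first schema obtained."""
    for name, load in loaders:
        try:
            return load()
        except Exception as exc:  # any failure means "try the next source"
            print(f"schema source '{name}' failed: {exc}")
    return None  # nothing worked, the caller reports the error


def _fail(reason: str) -> str:
    raise RuntimeError(reason)


# Hypothetical stand-ins for the four steps listed above.
loaders = [
    ("schema.cql", lambda: _fail("no schema.cql in the working directory")),
    ("sstable path", lambda: _fail("sstable is not inside a data directory")),
    ("default data dir", lambda: "schema loaded from /var/lib/scylla"),
    ("scylla.yaml", lambda: "schema loaded via conf/scylla.yaml"),
]

schema = first_successful(loaders)
```

Because the list is ordered, placing ``schema.cql`` first is what gives a user-provided file precedence over every other source.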
schema.cql
^^^^^^^^^^
By default, the tool uses the first method: ``--schema-file schema.cql``; i.e. it assumes there is a schema file named ``schema.cql`` in the working directory.
If this fails, it will exit with an error.
The schema file should contain all definitions needed to interpret data belonging to the table.
@@ -91,7 +72,7 @@ Note:
* The schema file doesn't have to be called ``schema.cql``; this is just the default name. Any file name is supported (with any extension).
Dropped columns
~~~~~~~~~~~~~~~
^^^^^^^^^^^^^^^
The examined sstable might have columns which were dropped from the schema definition. In this case, providing the up-to-date schema will not be enough; the tool will fail when attempting to process a cell for the dropped column.
Dropped columns can be provided to the tool in the form of insert statements into the ``system_schema.dropped_columns`` system table, in the schema definition file. Example:

View File

@@ -77,12 +77,6 @@ def flush(cql, table):
else:
run_nodetool(cql, "flush", ks, cf)
def flush_keyspace(cql, ks):
if has_rest_api(cql):
requests.post(f'{rest_api_url(cql)}/storage_service/keyspace_flush/{ks}')
else:
run_nodetool(cql, "flush", ks)
def compact(cql, table):
ks, cf = table.split('.')
if has_rest_api(cql):

View File

@@ -15,7 +15,6 @@ import pytest
import subprocess
import tempfile
import random
import shutil
import util
# To run the Scylla tools, we need to run Scylla executable itself, so we
@@ -539,75 +538,3 @@ def test_scylla_sstable_script(cql, test_keyspace, scylla_path, scylla_data_dir,
assert dump_lua_json == cxx_json
assert slice_lua_json == cxx_json
def test_scylla_sstable_schema_loading(cql, scylla_path, scylla_data_dir):
keyspace = "system"
table = "scylla_local"
with tempfile.TemporaryDirectory() as workdir:
ext_data_dir = os.path.join(workdir, "data")
os.mkdir(ext_data_dir)
schema_file_dir = os.path.join(workdir, "schema_file_dir")
os.mkdir(schema_file_dir)
top_conf_dir = os.path.join(workdir, "conf_dir")
os.mkdir(top_conf_dir)
conf_dir = os.path.join(top_conf_dir, "conf")
os.mkdir(conf_dir)
# Need to flush system keyspaces whose sstables we want to meddle
# with, to make sure they are actually on disk.
nodetool.flush_keyspace(cql, "system_schema")
nodetool.flush_keyspace(cql, "system")
sstables = glob.glob(os.path.join(scylla_data_dir, keyspace, table + "-*", "*-Data.db"))
table_data_dir, sstable_filename = os.path.split(sstables[0])
sstable_glob = "-".join(sstable_filename.split("-")[:-1]) + "*"
sstable_components = glob.glob(os.path.join(table_data_dir, sstable_glob))
for c in sstable_components:
shutil.copy(c, ext_data_dir)
external_sstables = glob.glob(os.path.join(ext_data_dir, "*-Data.db"))
schema_file = os.path.join(schema_file_dir, "schema.cql")
with open(schema_file, "w") as f:
f.write("CREATE TABLE system.scylla_local (key text PRIMARY KEY, value text)")
scylla_yaml_file = os.path.join(conf_dir, "scylla.yaml")
with open(scylla_yaml_file, "w") as f:
f.write(f"workdir: {os.path.split(scylla_data_dir)[0]}")
dump_common_args = [scylla_path, "sstable", "dump-data", "--output-format", "json", "--logger-log-level", "scylla-sstable=debug"]
dump_reference = json.loads(subprocess.check_output(dump_common_args + ["--system-schema", "--keyspace", keyspace, "--table", table] + [sstables[0]]))["sstables"]
dump_reference = list(dump_reference.values())[0]
def do_dump(*args, **kwargs):
dump_common_args = [scylla_path, "sstable", "dump-data", "--output-format", "json", "--logger-log-level", "scylla-sstable=debug"]
dump = json.loads(subprocess.check_output(dump_common_args + list(args) + [kwargs["sstable"]], cwd=kwargs.get("cwd", None), env=kwargs.get("env", None)))["sstables"]
dump = list(dump.values())[0]
assert dump == dump_reference
def check_from_table_dir(*args):
return do_dump(*args, sstable=sstables[0])
def check_from_external_dir(*args, **kwargs):
return do_dump(*args, **dict(sstable=external_sstables[0], **kwargs))
# sstable is in table dir
check_from_table_dir("--system-schema", "--keyspace", keyspace, "--table", table)
check_from_table_dir("--schema-file", schema_file)
check_from_table_dir("--scylla-data-dir", scylla_data_dir, "--keyspace", keyspace, "--table", table)
check_from_table_dir("--scylla-yaml-file", scylla_yaml_file, "--keyspace", keyspace, "--table", table)
check_from_table_dir() # auto-detect - deduce from sstable path
# sstable is in external path, user-provided schema should work as before
check_from_external_dir("--system-schema", "--keyspace", keyspace, "--table", table)
check_from_external_dir("--schema-file", schema_file)
check_from_external_dir("--scylla-data-dir", scylla_data_dir, "--keyspace", keyspace, "--table", table)
check_from_external_dir("--scylla-yaml-file", scylla_yaml_file, "--keyspace", keyspace, "--table", table)
# sstable is in external path, auto-detect methods need --keyspace and --table to work
check_from_external_dir(cwd=schema_file_dir) # should auto pick-up schema.cql
check_from_external_dir("--keyspace", keyspace, "--table", table, cwd=top_conf_dir) # should auto pick-up conf/scylla.yaml
check_from_external_dir("--keyspace", keyspace, "--table", table, env={"SCYLLA_CONF": conf_dir}) # should auto pick-up conf path from env
check_from_external_dir("--keyspace", keyspace, "--table", table, env={"SCYLLA_HOME": top_conf_dir}) # should auto pick-up conf path from env
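The removed test copies every component of one SStable by turning its ``-Data.db`` file name into a glob pattern. A minimal standalone sketch of just that step, assuming the same file-name layout as the ``md-123456-big-Data.db`` example used elsewhere in this change; the helper name `component_glob` and the file list are made up for illustration.

```python
import fnmatch


def component_glob(data_filename: str) -> str:
    """Drop the last dash-separated part (the component) and add a wildcard."""
    return "-".join(data_filename.split("-")[:-1]) + "*"


pattern = component_glob("md-123456-big-Data.db")

# Hypothetical directory listing: two components of the same sstable and
# one component that belongs to a different generation.
names = [
    "md-123456-big-Data.db",
    "md-123456-big-Index.db",
    "md-123457-big-Data.db",
]
matching = [n for n in names if fnmatch.fnmatch(n, pattern)]
```

Only the files sharing the version, generation and format prefix are selected, which is why the test can copy a single SStable without touching its neighbours.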

View File

@@ -23,15 +23,10 @@
#include "db/cql_type_parser.hh"
#include "db/config.hh"
#include "db/extensions.hh"
#include "db/large_data_handler.hh"
#include "db/system_distributed_keyspace.hh"
#include "db/schema_tables.hh"
#include "db/system_keyspace.hh"
#include "partition_slice_builder.hh"
#include "readers/combined.hh"
#include "replica/database.hh"
#include "sstables/sstables_manager.hh"
#include "types/list.hh"
#include "data_dictionary/impl.hh"
#include "data_dictionary/data_dictionary.hh"
#include "gms/feature_service.hh"
@@ -324,217 +319,6 @@ std::vector<schema_ptr> do_load_schemas(std::string_view schema_str) {
return schemas;
}
struct sstable_manager_service {
db::nop_large_data_handler large_data_handler;
db::config dbcfg;
gms::feature_service feature_service;
cache_tracker tracker;
sstables::directory_semaphore dir_sem;
sstables::sstables_manager sst_man;
explicit sstable_manager_service()
: feature_service(gms::feature_config_from_db_config(dbcfg))
, dir_sem(1)
, sst_man(large_data_handler, dbcfg, feature_service, tracker, memory::stats().total_memory(), dir_sem) {
}
future<> stop() {
return sst_man.close();
}
};
mutation_opt read_schema_table_mutation(sharded<sstable_manager_service>& sst_man, std::filesystem::path schema_table_data_path,
std::function<schema_ptr()> schema_factory, reader_permit permit, std::string_view keyspace, std::vector<std::string_view> ck_strings) {
sharded<sstables::sstable_directory> sst_dirs;
sst_dirs.start(
sharded_parameter([&sst_man] { return std::ref(sst_man.local().sst_man); }),
sharded_parameter([&schema_factory] { return schema_factory(); }),
schema_table_data_path,
sharded_parameter([] { return default_priority_class(); }),
sharded_parameter([] { return default_io_error_handler_gen(); })).get();
auto stop_sst_dirs = deferred_stop(sst_dirs);
auto sstable_open_infos = sst_dirs.map_reduce0(
[] (sstables::sstable_directory& sst_dir) -> future<std::vector<sstables::foreign_sstable_open_info>> {
co_await sst_dir.process_sstable_dir(sstables::sstable_directory::process_flags{ .sort_sstables_according_to_owner = false });
const auto& unsorted_ssts = sst_dir.get_unsorted_sstables();
std::vector<sstables::foreign_sstable_open_info> open_infos;
open_infos.reserve(unsorted_ssts.size());
for (auto& sst : unsorted_ssts) {
open_infos.push_back(co_await sst->get_open_info());
}
co_return open_infos;
},
std::vector<sstables::foreign_sstable_open_info>{},
[] (std::vector<sstables::foreign_sstable_open_info> a, std::vector<sstables::foreign_sstable_open_info> b) {
std::move(b.begin(), b.end(), std::back_inserter(a));
return a;
}).get();
auto schema_table_schema = schema_factory();
if (sstable_open_infos.empty()) {
return {};
}
std::vector<sstables::shared_sstable> sstables;
sstables.reserve(sstable_open_infos.size());
for (auto& open_info : sstable_open_infos) {
sstables.push_back(sst_dirs.local().load_foreign_sstable(open_info).get());
}
auto pk = partition_key::from_deeply_exploded(*schema_table_schema, {data_value(keyspace)});
auto dk = dht::decorate_key(*schema_table_schema, pk);
auto pr = dht::partition_range::make_singular(dk);
std::vector<data_value> raw_ck_values;
raw_ck_values.reserve(ck_strings.size());
for (const auto& ck_str : ck_strings) {
raw_ck_values.push_back(data_value(ck_str));
}
auto ck = clustering_key::from_deeply_exploded(*schema_table_schema, raw_ck_values);
auto cr = query::clustering_range::make({ck, true}, {ck, true});
auto ps = partition_slice_builder(*schema_table_schema)
.with_range(cr)
.build();
std::vector<flat_mutation_reader_v2> readers;
readers.reserve(sstables.size());
for (const auto& sst : sstables) {
readers.emplace_back(sst->make_reader(schema_table_schema, permit, pr, ps));
}
auto reader = make_combined_reader(schema_table_schema, permit, std::move(readers));
return read_mutation_from_flat_mutation_reader(reader).get();
}
class single_keyspace_user_types_storage : public data_dictionary::user_types_storage {
data_dictionary::user_types_metadata _utm;
public:
single_keyspace_user_types_storage(data_dictionary::user_types_metadata utm) : _utm(std::move(utm)) { }
virtual const data_dictionary::user_types_metadata& get(const sstring& ks) const override {
return _utm;
}
};
std::unordered_map<schema_ptr, std::string> get_schema_table_directories(std::filesystem::path scylla_data_path) {
const std::vector<schema_ptr> schemas{
db::schema_tables::types(),
db::schema_tables::tables(),
db::schema_tables::columns(),
db::schema_tables::view_virtual_columns(),
db::schema_tables::computed_columns(),
db::schema_tables::indexes(),
db::schema_tables::dropped_columns(),
db::schema_tables::scylla_tables()};
std::unordered_map<schema_ptr, std::string> schema_table_table_dir;
auto schema_tables_path = scylla_data_path / db::schema_tables::NAME;
auto schema_tables_dir = open_directory(schema_tables_path.native()).get();
schema_tables_dir.list_directory([&] (directory_entry de) -> future<> {
auto dash_pos = de.name.find_last_of('-');
auto table_name = de.name.substr(0, dash_pos);
auto it = boost::find_if(schemas, [&] (const schema_ptr& s) {
return s->cf_name() == table_name;
});
if (it != schemas.end()) {
if (!de.type) {
throw std::runtime_error(fmt::format("failed loading schema tables from {}: keyspace directory entry {} has unrecognized type", scylla_data_path.native(), de.name));
} else if (*de.type != directory_entry_type::directory) {
throw std::runtime_error(fmt::format("failed loading schema tables from {}: keyspace directory entry {} has unrecognized type {}", scylla_data_path.native(), de.name, static_cast<int>(*de.type)));
}
auto s = *it;
schema_table_table_dir[s] = de.name;
}
return make_ready_future<>();
}).done().get();
if (schema_table_table_dir.size() != schemas.size()) {
throw std::runtime_error(fmt::format("failed loading schema tables from {}: couldn't find table directory for all required schema tables", scylla_data_path.native()));
}
return schema_table_table_dir;
}
schema_ptr do_load_schema_from_schema_tables(std::filesystem::path scylla_data_path, std::string_view keyspace, std::string_view table) {
reader_concurrency_semaphore rcs_sem(reader_concurrency_semaphore::no_limits{}, __FUNCTION__);
auto stop_semaphore = deferred_stop(rcs_sem);
sharded<sstable_manager_service> sst_man;
sst_man.start().get();
auto stop_sst_man_service = deferred_stop(sst_man);
auto schema_table_table_dir = get_schema_table_directories(scylla_data_path);
auto schema_tables_path = scylla_data_path / db::schema_tables::NAME;
auto do_load = [&] (std::function<const schema_ptr()> schema_factory) {
auto s = schema_factory();
return read_schema_table_mutation(
sst_man,
schema_tables_path / schema_table_table_dir[s],
schema_factory,
rcs_sem.make_tracking_only_permit(s.get(), "schema_mutation", db::no_timeout, {}),
keyspace,
{table});
};
mutation_opt tables = do_load(db::schema_tables::tables);
mutation_opt columns = do_load(db::schema_tables::columns);
mutation_opt view_virtual_columns = do_load(db::schema_tables::view_virtual_columns);
mutation_opt computed_columns = do_load(db::schema_tables::computed_columns);
mutation_opt indexes = do_load(db::schema_tables::indexes);
mutation_opt dropped_columns = do_load(db::schema_tables::dropped_columns);
mutation_opt scylla_tables = do_load([] () { return db::schema_tables::scylla_tables(); });
if (!tables || !columns) {
throw std::runtime_error(fmt::format("Failed to find {}.{} in 'tables' and/or 'columns' schema tables", keyspace, table));
}
data_dictionary::user_types_metadata utm;
auto types_mut = read_schema_table_mutation(
sst_man,
schema_tables_path / schema_table_table_dir[db::schema_tables::types()],
db::schema_tables::types,
rcs_sem.make_tracking_only_permit(db::schema_tables::types().get(), "types_mutation", db::no_timeout, {}),
keyspace,
{});
if (types_mut) {
query::result_set result(*types_mut);
auto ks = make_lw_shared<keyspace_metadata>(keyspace, "org.apache.cassandra.locator.LocalStrategy", std::map<sstring, sstring>{}, false);
db::cql_type_parser::raw_builder ut_builder(*ks);
auto get_list = [] (const query::result_set_row& row, const char* name) {
return boost::copy_range<std::vector<sstring>>(
row.get_nonnull<const list_type_impl::native_type&>(name)
| boost::adaptors::transformed([] (const data_value& v) { return value_cast<sstring>(v); }));
};
for (const auto& row : result.rows()) {
const auto name = row.get_nonnull<sstring>("type_name");
const auto field_names = get_list(row, "field_names");
const auto field_types = get_list(row, "field_types");
ut_builder.add(name, field_names, field_types);
}
for (auto&& ut : ut_builder.build()) {
utm.add_type(std::move(ut));
}
}
db::config dbcfg;
auto user_type_storage = std::make_shared<single_keyspace_user_types_storage>(std::move(utm));
db::schema_ctxt ctxt(dbcfg, user_type_storage);
schema_mutations muts(std::move(*tables), std::move(*columns), std::move(view_virtual_columns), std::move(computed_columns), std::move(indexes),
std::move(dropped_columns), std::move(scylla_tables));
return db::schema_tables::create_table_from_mutations(ctxt, muts);
}
} // anonymous namespace
namespace tools {
@@ -578,10 +362,4 @@ schema_ptr load_system_schema(std::string_view keyspace, std::string_view table)
return *tb_it;
}
future<schema_ptr> load_schema_from_schema_tables(std::filesystem::path scylla_data_path, std::string_view keyspace, std::string_view table) {
return async([=] () mutable {
return do_load_schema_from_schema_tables(scylla_data_path, keyspace, table);
});
}
} // namespace tools

View File

@@ -49,13 +49,4 @@ future<schema_ptr> load_one_schema_from_file(std::filesystem::path path);
/// all schema and experimental features enabled.
schema_ptr load_system_schema(std::string_view keyspace, std::string_view table);
/// Load the schema of the table with the designated keyspace and table name,
/// from the system schema table sstables.
///
/// The schema table sstables are accessed for read only. In general this method
/// tries very hard to have no side-effects.
/// The \p scylla_data_path parameter is expected to point to the scylla data
/// directory, which is usually /var/lib/scylla/data.
future<schema_ptr> load_schema_from_schema_tables(std::filesystem::path scylla_data_path, std::string_view keyspace, std::string_view table);
} // namespace tools

View File

@@ -129,102 +129,6 @@ partition_set get_partitions(schema_ptr schema, const bpo::variables_map& app_co
return partitions;
}
std::pair<sstring, sstring> get_keyspace_and_table_options(const bpo::variables_map& app_config) {
sstring keyspace_name, table_name;
auto k_it = app_config.find("keyspace");
auto t_it = app_config.find("table");
if (k_it == app_config.end() || t_it == app_config.end()) {
throw std::runtime_error("don't know which schema to load: --keyspace and/or --table are not provided");
}
return std::pair(k_it->second.as<sstring>(), t_it->second.as<sstring>());
}
schema_ptr try_load_schema_from_user_provided_source(const bpo::variables_map& app_config) {
std::string schema_source_opt;
try {
if (!app_config["schema-file"].defaulted()) {
schema_source_opt = "schema-file";
return tools::load_one_schema_from_file(std::filesystem::path(app_config["schema-file"].as<sstring>())).get();
}
// All the below schema sources require this.
const auto [keyspace_name, table_name] = get_keyspace_and_table_options(app_config);
if (app_config.contains("system-schema")) {
schema_source_opt = "system-schema";
return tools::load_system_schema(keyspace_name, table_name);
}
if (app_config.contains("scylla-data-dir")) {
schema_source_opt = "schema-tables";
return tools::load_schema_from_schema_tables(std::filesystem::path(app_config["scylla-data-dir"].as<sstring>()), keyspace_name, table_name).get();
}
if (app_config.contains("scylla-yaml-file")) {
db::config cfg;
cfg.read_from_file(app_config["scylla-yaml-file"].as<sstring>()).get();
cfg.setup_directories();
return tools::load_schema_from_schema_tables(std::filesystem::path(cfg.data_file_directories()[0]), keyspace_name, table_name).get();
}
} catch (...) {
fmt::print(std::cerr, "error: could not load schema via {}: {}\n", schema_source_opt, std::current_exception());
return nullptr;
}
// Should not happen, but if it does (we all know it will), let's at least have a message printed.
fmt::print(std::cerr, "error: could not load schema from known schema sources: unknown error\n");
return nullptr;
}
schema_ptr try_load_schema_autodetect(const bpo::variables_map& app_config) {
try {
return tools::load_one_schema_from_file(std::filesystem::path(app_config["schema-file"].as<sstring>())).get();
} catch (...) {
sst_log.debug("Trying to read schema file from default location failed: {}", std::current_exception());
}
if (app_config.count("sstables")) {
try {
auto sst_path = std::filesystem::path(app_config["sstables"].as<std::vector<sstring>>().front());
const auto sst_dir_path = std::filesystem::path(sst_path).remove_filename();
const auto sst_filename = sst_path.filename();
auto ed = sstables::entry_descriptor::make_descriptor(sst_dir_path.native(), sst_filename.native());
std::filesystem::path data_dir_path;
// Detect whether sstable is in root table directory, or in a sub-directory
// The last component is "" due to the trailing "/" left by "remove_filename()" above.
// So we need to go back 2 more, to find the supposed keyspace component.
if (ed.ks == std::prev(sst_dir_path.end(), 3)->native()) {
data_dir_path = sst_dir_path / ".." / "..";
} else {
data_dir_path = sst_dir_path / ".." / ".." / "..";
}
return tools::load_schema_from_schema_tables(data_dir_path, ed.ks, ed.cf).get();
} catch (...) {
sst_log.debug("Trying to find scylla data dir based on the sstable path failed: {}", std::current_exception());
}
} else {
sst_log.debug("Trying to find scylla data dir based on sstable path failed: no sstable argument provided");
}
try {
auto scylla_yaml_file = db::config::get_conf_sub("scylla.yaml").string();
db::config cfg;
cfg.read_from_file(scylla_yaml_file).get();
cfg.setup_directories();
auto [keyspace_name, table_name] = get_keyspace_and_table_options(app_config);
return tools::load_schema_from_schema_tables(std::filesystem::path(cfg.data_file_directories()[0]), keyspace_name, table_name).get();
} catch (...) {
sst_log.debug("Trying to find and read scylla.yaml failed: {}", std::current_exception());
}
try {
db::config cfg;
cfg.setup_directories();
auto [keyspace_name, table_name] = get_keyspace_and_table_options(app_config);
return tools::load_schema_from_schema_tables(std::filesystem::path(cfg.data_file_directories()[0]), keyspace_name, table_name).get();
} catch (...) {
sst_log.debug("Trying to find scylla data dir at default location failed: {}", std::current_exception());
}
fmt::print(std::cerr, "Failed to autodetect and load schema, try again with --logger-log-level scylla-sstable=debug to learn more or provide the schema source manually\n");
return nullptr;
}
const std::vector<sstables::shared_sstable> load_sstables(schema_ptr schema, sstables::sstables_manager& sst_man, const std::vector<sstring>& sstable_names) {
std::vector<sstables::shared_sstable> sstables;
sstables.resize(sstable_names.size());
@@ -2891,11 +2795,7 @@ $ scylla sstable validate /path/to/md-123456-big-Data.db /path/to/md-123457-big-
app.add_options()
("schema-file", bpo::value<sstring>()->default_value("schema.cql"), "file containing the schema description")
("keyspace", bpo::value<sstring>(), "keyspace name")
("table", bpo::value<sstring>(), "table name")
("system-schema", "the table designated by --keyspace and --table is a system table, use the hard-coded in-memory schema for it")
("scylla-yaml-file", bpo::value<sstring>(), "path to the scylla.yaml config file, to obtain the data directory path from, this can be also provided directly with --scylla-data-dir")
("scylla-data-dir", bpo::value<sstring>(), "path to the scylla data dir (usually /var/lib/scylla/data), to read the schema tables from")
("system-schema", bpo::value<sstring>(), "table has to be a system table, name has to be in `keyspace.table` notation")
;
app.add_positional_options({
{"sstables", bpo::value<std::vector<sstring>>(), "sstable(s) to process for operations that have sstable inputs, can also be provided as positional arguments", -1},
@@ -2922,24 +2822,19 @@ $ scylla sstable validate /path/to/md-123456-big-Data.db /path/to/md-123457-big-
const auto& operation = *found_op;
schema_ptr schema;
{
unsigned schema_sources = 0;
schema_sources += !app_config["schema-file"].defaulted();
schema_sources += app_config.contains("system-schema");
schema_sources += app_config.contains("scylla-data-dir");
schema_sources += app_config.contains("scylla-yaml-file");
if (!schema_sources) {
sst_log.debug("No user-provided schema source, attempting to auto-detect it");
schema = try_load_schema_autodetect(app_config);
} else if (schema_sources == 1) {
sst_log.debug("Single schema source provided");
schema = try_load_schema_from_user_provided_source(app_config);
std::string schema_source_opt;
try {
if (auto it = app_config.find("system-schema"); it != app_config.end()) {
schema_source_opt = "system-schema";
std::vector<sstring> comps;
boost::split(comps, app_config["system-schema"].as<sstring>(), boost::is_any_of("."));
schema = tools::load_system_schema(comps.at(0), comps.at(1));
} else {
fmt::print(std::cerr, "Multiple schema sources provided, please provide exactly one of: --schema-file, --system-schema, --scylla-data-dir or --scylla-yaml-file (with the accompanying --keyspace and --table if necessary)\n");
schema_source_opt = "schema-file";
schema = tools::load_one_schema_from_file(std::filesystem::path(app_config["schema-file"].as<sstring>())).get();
}
}
if (!schema) {
} catch (...) {
fmt::print(std::cerr, "error: could not load {} '{}': {}\n", schema_source_opt, app_config[schema_source_opt].as<sstring>(), std::current_exception());
return 1;
}
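After the revert, ``--system-schema`` again takes a single ``keyspace.table`` argument that is split on the dot, with the first two components used as keyspace and table name. A rough Python equivalent of that parsing, offered as a sketch only: the function name `split_system_schema` is hypothetical, and raising `ValueError` stands in for the out-of-range error that `comps.at(1)` would produce in the C++ code.

```python
from typing import Tuple


def split_system_schema(value: str) -> Tuple[str, str]:
    """Split 'keyspace.table' into its two names.

    Components beyond the second are ignored, as in the C++ code, which
    only reads the first two elements of the split result.
    """
    comps = value.split(".")
    if len(comps) < 2:
        raise ValueError(f"expected keyspace.table notation, got '{value}'")
    return comps[0], comps[1]


keyspace, table = split_system_schema("system.scylla_local")
```

A value without a dot is rejected up front, which corresponds to the error path that prints ``could not load system-schema``.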