alternator: correct implemention of UpdateItem with nested attributes and ReturnValues

This patch fixes the last missing part of nested attribute support in
UpdateItem - returning the correct attributes when ReturnValues is requested.
When the expression says "a.b = :val" and ReturnValues is set to UPDATED_OLD
or UPDATED_NEW, only the actual updated attribute a.b should be returned, not
the entire top-level attribute a as we did before this patch.

This patch was made very simple because our existing hierarchy_filter()
function already does exactly the right thing, and can trivially be made to
accept any attribute_path_map<T> (in our case attribute_path_map<action>),
not just attrs_to_get as it did until now.

This patch also adds several more checks to the test in test_returnvalues.py
to improve the test's coverage even more. Interestingly, I discovered two
esoteric cases where DynamoDB does something which makes little sense, but
apparently simplified their implementation - but the beautiful thing is that
it also simplifies our implementation! See long comments about these two
cases in the test code.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
This commit is contained in:
Nadav Har'El
2021-02-14 11:44:51 +02:00
parent 964500e47a
commit 49cd9b3fd5
2 changed files with 63 additions and 9 deletions

View File

@@ -1883,14 +1883,15 @@ static std::string get_item_type_string(const rjson::value& v) {
}
// attrs_to_get saves for each top-level attribute an attrs_to_get_node,
// a hierarchy of subparts that need to be kep. The following function
// a hierarchy of subparts that need to be kept. The following function
// takes a given JSON value and drops its parts which weren't asked to be
// kept. It modifies the given JSON value, or returns false to signify that
// the entire object should be dropped.
// Note that The JSON value is assumed to be encoded using the DynamoDB
// conventions - i.e., it is really a map whose key has a type string,
// and the value is the real object.
static bool hierarchy_filter(rjson::value& val, const attrs_to_get_node& h) {
template<typename T>
static bool hierarchy_filter(rjson::value& val, const attribute_path_map_node<T>& h) {
if (!val.IsObject() || val.MemberCount() != 1) {
// This shouldn't happen. We shouldn't have stored malformed objects.
// But today Alternator does not validate the structure of nested
@@ -2504,17 +2505,37 @@ update_item_operation::apply(std::unique_ptr<rjson::value> previous_item, api::t
auto& row = m.partition().clustered_row(*_schema, _ck);
attribute_collector attrs_collector;
bool any_updates = false;
auto do_update = [&] (bytes&& column_name, const rjson::value& json_value) {
auto do_update = [&] (bytes&& column_name, const rjson::value& json_value,
const attribute_path_map_node<parsed::update_expression::action>* h = nullptr) {
any_updates = true;
if (_returnvalues == returnvalues::ALL_NEW ||
_returnvalues == returnvalues::UPDATED_NEW) {
if (_returnvalues == returnvalues::ALL_NEW) {
rjson::set_with_string_name(_return_attributes,
to_sstring_view(column_name), rjson::copy(json_value));
to_sstring_view(column_name), rjson::copy(json_value));
} else if (_returnvalues == returnvalues::UPDATED_NEW) {
rjson::value&& v = rjson::copy(json_value);
if (h) {
// If the operation was only on specific attribute paths,
// leave only them in _return_attributes.
if (hierarchy_filter(v, *h)) {
rjson::set_with_string_name(_return_attributes,
to_sstring_view(column_name), std::move(v));
}
} else {
rjson::set_with_string_name(_return_attributes,
to_sstring_view(column_name), std::move(v));
}
} else if (_returnvalues == returnvalues::UPDATED_OLD && previous_item) {
std::string_view cn = to_sstring_view(column_name);
const rjson::value* col = rjson::find(*previous_item, cn);
if (col) {
rjson::set_with_string_name(_return_attributes, cn, rjson::copy(*col));
rjson::value&& v = rjson::copy(*col);
if (h) {
if (hierarchy_filter(v, *h)) {
rjson::set_with_string_name(_return_attributes, cn, std::move(v));
}
} else {
rjson::set_with_string_name(_return_attributes, cn, std::move(v));
}
}
}
const column_definition* cdef = _schema->get_column_definition(column_name);
@@ -2605,7 +2626,7 @@ update_item_operation::apply(std::unique_ptr<rjson::value> previous_item, api::t
}
rjson::value result = rjson::copy(*toplevel);
hierarchy_actions(result, actions.second, previous_item.get());
do_update(to_bytes(column_name), std::move(result));
do_update(to_bytes(column_name), std::move(result), &actions.second);
}
}
}

View File

@@ -346,7 +346,6 @@ def test_update_item_returnvalues_updated_new(test_table_s):
# Test the ReturnValues from an UpdateItem directly modifying a *nested*
# attribute, in the relevant ReturnValue modes:
@pytest.mark.xfail(reason="nested updates not yet implemented")
def test_update_item_returnvalues_nested(test_table_s):
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': {'b': 'dog', 'c': [1, 2, 3]}, 'd': 'cat'})
@@ -398,3 +397,37 @@ def test_update_item_returnvalues_nested(test_table_s):
UpdateExpression='SET a.c[1] = :val, a.b = :val2',
ExpressionAttributeValues={':val': 3, ':val2': 'dog'})
assert ret['Attributes'] == {'p': p, 'a': {'b': 'dog', 'c': [1, 3, 3]}, 'd': 'cat' }
# Test with REMOVE, and one of them doing nothing (so shouldn't be in UPDATED_OLD)
ret=test_table_s.update_item(Key={'p': p}, ReturnValues='UPDATED_OLD',
UpdateExpression='REMOVE a.c[1], a.c[3]')
assert ret['Attributes'] == {'a': {'c': [3]}} # a.c[3] did not exist
# When adding a new sub-attribute, UPDATED_OLD does not return anything,
# but UPDATED_NEW does:
ret=test_table_s.update_item(Key={'p': p}, ReturnValues='UPDATED_OLD',
UpdateExpression='SET a.x1 = :val',
ExpressionAttributeValues={':val': 8})
assert not 'Attributes' in ret
ret=test_table_s.update_item(Key={'p': p}, ReturnValues='UPDATED_NEW',
UpdateExpression='SET a.x2 = :val',
ExpressionAttributeValues={':val': 8})
assert ret['Attributes'] == {'a': {'x2': 8}}
# Nevertheless, there is one strange exception - although setting an array
# element *beyond* its end (e.g., a.c[100]) does add a new array item, it
# is *not* returned by UPDATED_NEW. I am not sure DynamoDB did this
# deliberately (is it a bug or a feature?), but it simplifies our
# implementation as well: after setting a.c[100], a.c[100] is not actually
# set (instead, maybe a.c[4] was set) so UPDATED_NEW returning a.c[100]
# returns nothing.
ret=test_table_s.update_item(Key={'p': p}, ReturnValues='UPDATED_NEW',
UpdateExpression='SET a.c[100] = :val',
ExpressionAttributeValues={':val': 70})
assert not 'Attributes' in ret
# When removing an item, it shouldn't appear in UPDATED_NEW. Again there
# a strange exception - which I'm not sure if we should consider it a
# DynamoDB bug or feature - but it simplifies our own implementation as
# well: if we remove a.c[1], the new item *will* have a new a.c[1] (the
# previous a.c[2] is shifted back), so this value is returned. This is
# very odd, but this is what DynamoDB does, so we should too...
ret=test_table_s.update_item(Key={'p': p}, ReturnValues='UPDATED_NEW',
UpdateExpression='REMOVE a.c[1]')
assert ret['Attributes'] == {'a': {'c': [70]}}