Implementing a Concurrent Trie Key-Value Store in C++
Data Structure Overview
A Trie (prefix tree) is an efficient ordered tree data structure commonly used for retrieving values associated with string keys. In this implementation, each node in the tree represents a single character of a key. A boolean flag distinguishes intermediate nodes from terminal nodes, which mark the end of a valid key and store the associated value.
For instance, inserting the key-value pairs ("ab", 1) and ("ac", "val") results in both keys sharing the parent node "a". The value 1 is stored in the terminal node "b", while "val" resides in the terminal node "c".
The goal is to build a generic key-value store mapping string keys to values of any type, backed by a thread-safe Trie. All modifications are confined to a single header file (p0_trie.h), and no existing function signatures should be altered.
TrieNode Implementation
The TrieNode class represents a single character within the tree. It contains the character itself, an is_terminal_ flag, and a map of child nodes. Because child nodes are managed via std::unique_ptr, ownership semantics must be carefully respected. Methods that access children return pointers to the unique_ptr itself, allowing modification without transferring or copying ownership.
The move constructor ensures that child pointers are transferred rather than copied, preserving the uniqueness constraint.
class TrieNode {
public:
explicit TrieNode(char c) : node_char_(c) {}
TrieNode(TrieNode &&src) noexcept
: node_char_(src.node_char_),
is_terminal_(src.is_terminal_),
child_nodes_(std::move(src.child_nodes_)) {}
virtual ~TrieNode() = default;
bool ContainsChild(char c) const {
return child_nodes_.count(c) > 0;
}
bool HasAnyChild() const {
return !child_nodes_.empty();
}
bool IsTerminal() const {
return is_terminal_;
}
char GetChar() const {
return node_char_;
}
std::unique_ptr<TrieNode> *AddChild(char c, std::unique_ptr<TrieNode> &&new_child) {
if (ContainsChild(c) || new_child->GetChar() != c) {
return nullptr;
}
child_nodes_[c] = std::move(new_child);
return &child_nodes_[c];
}
std::unique_ptr<TrieNode> *FetchChild(char c) {
auto it = child_nodes_.find(c);
if (it == child_nodes_.end()) {
return nullptr;
}
return &(it->second);
}
void EraseChild(char c) {
child_nodes_.erase(c);
}
void SetTerminal(bool val) {
is_terminal_ = val;
}
protected:
char node_char_;
bool is_terminal_{false};
std::unordered_map<char, std::unique_ptr<TrieNode>> child_nodes_;
};
ValueNode Implementation
The ValueNode class inherits from TrieNode and acts as a terminal node capable of holding a value of any type T. Its is_terminal_ flag is always true. Two constructors are provided: one initializes a new node from a character and a value, while the other upgrades an existing intermediate TrieNode into a terminal ValueNode by taking ownership of its children via move semantics.
template <typename T>
class ValueNode : public TrieNode {
private:
T stored_data_;
public:
ValueNode(TrieNode &&base_node, T val) : TrieNode(std::move(base_node)) {
stored_data_ = val;
is_terminal_ = true;
}
ValueNode(char c, T val) : TrieNode(c) {
stored_data_ = val;
is_terminal_ = true;
}
~ValueNode() override = default;
T GetData() const { return stored_data_; }
};
Concurrent Trie Operations
The ConcurrentTrie class manages the root node and provides thread-safe insert, remove, and retrieve operations. Concurrency is controlled using a reader-writer lock (std::shared_mutex or a custom latch) acquired at the root level. Read operations require a read lock, while modifications require a write lock. Locks must be strictly released before returning to prevent deadlocks.
Insert
To insert a key-value pair, traverse the tree based on the key's characters. If a character node is missing, create and attach a basic TrieNode. Upon reaching the final character:
- If no node exists for the final character, instantiate a
ValueNodeand attach it to the parent. - If the node exists but is intermediate (
is_terminal_ == false), upgrade it to aValueNodeby moving its contents into the new typed node. - If the node is already a terminal node (
is_terminal_ == true), the key exists. Return false to reject duplicate insertions.
template <typename T>
bool Insert(const std::string &key, T val) {
if (key.empty()) {
return false;
}
latch_.WLock();
std::unique_ptr<TrieNode> *curr = &root_;
for (size_t idx = 0; idx < key.length() - 1; idx++) {
if (!curr->get()->ContainsChild(key[idx])) {
curr->get()->AddChild(key[idx], std::make_unique<TrieNode>(key[idx]));
}
curr = curr->get()->FetchChild(key[idx]);
}
char last_char = key.back();
std::unique_ptr<TrieNode> *end_node = curr->get()->FetchChild(last_char);
if (end_node == nullptr) {
curr->get()->AddChild(last_char, std::make_unique<ValueNode<T>>(last_char, val));
latch_.WUnlock();
return true;
}
if (!end_node->get()->IsTerminal()) {
auto upgraded = new ValueNode<T>(std::move(**end_node), val);
end_node->reset(upgraded);
latch_.WUnlock();
return true;
}
latch_.WUnlock();
return false;
}
Remove
Removing a key involves traversing to its terminal node. If the key does not exist, return false. Otherwise, mark the terminal node as intermediate. Then, backtrack from the end of the key towards the root, deleting nodes that satisfy two conditions: they have no remaining children, and they are not terminal nodes for other keys. Stop the cleanup as soon as a node fails either condition.
bool Remove(const std::string &key) {
if (key.empty()) {
return false;
}
latch_.WLock();
std::vector<std::pair<std::unique_ptr<TrieNode>*, char>> path;
std::unique_ptr<TrieNode> *curr = &root_;
for (char c : key) {
path.emplace_back(curr, c);
curr = curr->get()->FetchChild(c);
if (curr == nullptr) {
latch_.WUnlock();
return false;
}
}
curr->get()->SetTerminal(false);
for (int i = path.size() - 1; i >= 0; --i) {
auto parent_ptr = path[i].first;
char traversed_char = path[i].second;
std::unique_ptr<TrieNode> *child_ptr = parent_ptr->get()->FetchChild(traversed_char);
if (!child_ptr->get()->HasAnyChild() && !child_ptr->get()->IsTerminal()) {
parent_ptr->get()->EraseChild(traversed_char);
} else {
break;
}
}
latch_.WUnlock();
return true;
}
GetValue
Retrieving a value requires traversing the key. If traversal fails, set the success flag to false. If the key is found, use dynamic_cast to safely cast the base TrieNode pointer to a ValueNode<T> pointer. If the cast returns a non-null pointer, the type matches, and the value can be extracted. If the cast fails, the requested type does not match the stored type, and the operation fails.
template <typename T>
T GetValue(const std::string &key, bool *is_success) {
*is_success = false;
if (key.empty()) {
return {};
}
latch_.RLock();
std::unique_ptr<TrieNode> *curr = &root_;
for (char c : key) {
curr = curr->get()->FetchChild(c);
if (curr == nullptr) {
latch_.RUnlock();
return {};
}
}
auto val_node = dynamic_cast<ValueNode<T>*>(curr->get());
if (val_node != nullptr) {
*is_success = true;
T result = val_node->GetData();
latch_.RUnlock();
return result;
}
latch_.RUnlock();
return {};
}
Common Implementation Pitfalls
When implementing backtracking logic in the Remove method, developers often encounter two specific issues:
- Unsigned Integer Underflow: Iterating backwards using an unsigned type like
size_t(for (size_t i = N; i >= 0; --i)) causes an infinite loop because0 - 1wraps around to a massive positive value. Always use a signed integer likeintfor reverse iterations that need to reach a negative boundary. - Erroneous Terminal Node Cleanup: During backtrack deletion, a node should only be removed if it lacks children AND is an intermediate node. Consider a trie containing "a", "aa", and "aaa". Removing "aaa" makes the node for "aa" devoid of children, but "aa" is still a valid terminal key. If the logic only checks for the absence of children, it will incorrectly delete the "aa" key. The
IsTerminal()check is crucial to prevent data loss.
Build and Test Execution
To compile and run the unit tests, ensure the DISABLED_ prefix is removed from the test suite functions in the test source file. Then, execute the build commands:
$ cd build
$ cmake -DCMAKE_BUILD_TYPE=Debug ..
$ make -j4
$ make starter_trie_test
$ ./test/starter_trie_test
Prior to packaging the submission, code formatting and static analysis checks must be executed and passed cleanly:
$ make format
$ make check-lint
$ make check-clang-tidy-p0
Finally, create the submission archive containing only the required header file, and verify its contents:
$ cd ../
$ zip project0-submission.zip src/include/primer/p0_trie.h
$ unzip -l project0-submission.zip