CMU 15-445/645 Project 2: B+Tree Index Implementation
Checkpoint #1
Task #1 – B+Tree Pages
Implement three page classes that represent the data storage units of a B+Tree.
B+Tree Parent Page
Files: src/include/storage/page/b_plus_tree_page.h and src/storage/page/b_plus_tree_page.cpp. This is a base class containing fields shared by its two subclasses.
| Field | Size (bytes) | Description |
|---|---|---|
page_type_ |
4 | Indicates whether the page is internal or leaf. |
lsn_ |
4 | Log sequence number (used in Project 4). |
size_ |
4 | Current number of key/value pairs stored. |
max_size_ |
4 | Maximum allowed key/value pairs. |
parent_page_id_ |
4 | Page ID of the parent node. |
page_id_ |
4 | Page ID of this node itself. |
B+Tree Internal Page
Files: src/include/storage/page/b_plus_tree_internal_page.h and src/storage/page/b_plus_tree_internal_page.cpp. An internal node holds an ordered list of m keys and m+1 child pointers (page IDs). The first key position is unused; therefore lookup routines must start from the second key. Each internal page must be at least half full at all times. During deletion two half‑full pages may be merged or rows redistributed to avoid merging. During insertion a full page is split into two.
B+Tree Leaf Page
Files: src/include/storage/page/b_plus_tree_leaf_page.h and src/storage/page/b_plus_tree_leaf_page.cpp. Leaf pages hold an ordered list of m keys and m values. Values are 64‑bit RID objects (defined in src/include/common/rid.h) that point to the actual tuple location. Leaves obey the same occupancy constraints as internal pages and support identical merge, redistribution, and split operations.
Important: Even though leaves and internal pages use the same key type, the value types differ, so max_size may vary between the two page types.
Important: Every leaf and internal page corresponds to the data_ portion of a buffer pool page. To read or modify such a node, first fetch the page from the buffer pool using its page_id, then reinterpret‑cast it to the appropriate page type, and after every write or read operation unpin the page.
Task #2 – B+Tree Data Structure
Files: src/include/index/b_plus_tree.h and src/storage/index/b_plus_tree.cpp. The B+Tree index must support only unique keys: inserting a duplicate key/value pair should return false. When deletion makes a page under‑filled, the tree must correctly merge or redistribute pages (also called coalescing).
For Checkpoint #1 you need only Insert(), point search GetValue(), and Delete(). Splits must be performed when insertion would cause the key/value count in a leaf to equal max_size, or when the number of children in an internal node would reach max_size. Because any write operation may change the root page ID, you must update the root_page_id in the header page (src/include/storage/page/header_page.h) to keep the index persistent on disk. The BPlusTree class already provides a helper function UpdateRootPageId; call it whenever the root page ID changes.
Your B+Tree implementation must be templated to hide key/value types and the comparator:
template <typename KeyType,
typename ValueType,
typename KeyComparator>
class BPlusTree {
// ...
};
The following types are predefined for you:
KeyType– alwaysGenericKey; its actual size is specified via template argument and depends on the indexed attribute type.ValueType– always 64‑bit RID.KeyComparator– a class for comparing two keys (less/greater). Provided with the key type implementation files.
Checkpoint #2
Task #3 – Index Iterator
Files: src/include/storage/index/index_iterator.h and src/index/storage/index_iterator.cpp. Build a general index iterator that traverses all leaf pages efficiently. The idea is to link leaves into a singly linked list and iterate over all key/value pairs in a specific direction. The iterator must follow C++17 iterator conventions, including operators for increment, dereference, equality, and inequality, as well as support for range‑based for loops. To enable that, your B+Tree must correctly implement begin() and end().
Implement these functions in IndexIterator:
isEnd()– check if the iterator points past the last key/value pair.operator++()– advance to the next pair.operator*()– return the current key/value pair.operator==()andoperator!=()– compare two iterators.
Task #4 – Concurrent Index
Update your single‑threaded B+Tree to support concurrent operations using latch crabbing (as described in lectures and the textbook, Chapter 15.10). Threads traverse the tree while acquiring and releasing latches on B+Tree pages. A thread releases the latch on a parent page only when the child page is considered "safe". The definition of safe depends on the operation:
- Search: After obtaining a read (R) latch on a child, release the parent’s latch immediately.
- Insert: Acquire a write (W) latch on the child. After latching the child, check if it is safe (not full). If safe, release all ancestor latches.
- Delete: Acquire a write (W) latch on the child. After latching the child, check if it is safe (at least half‑full, with different criteria for the root). If safe, release all ancestor latches.
Important: These are the basic ideas; consult the lecture and textbook §15.10 before implementing.
Requirements and Hints
- Do not use a global latch to protect the entire index (i.e., do not lock the whole index for an operation). Automated and manual checks will ensure you employ correct latch crabbing.
- A reader‑writer latch implementation is provided in
src/include/common/rwlatch.h, and the page header (src/include/storage/page/page.h) includes helper functions to acquire/release latches. - No mandatory interfaces are added to the B+Tree class. You may add any helper functions as long as all original public interfaces remain unchanged for testing.
- Do not use
malloc/newto allocate large memory for the tree. For new nodes or temporary buffers, use the buffer pool. - You must use the provided
transactionpointer parameter (src/include/concurrency/transaction.h). It provides methods to store pages whose latches you have obtained during traversal, as well as pages that were removed during deletion. Study theFindLeafPagemethod carefully; you may modify it (including its return type) and integrate latch crabbing logic there. FetchPage()from the buffer pool manager returns a pointer to aPageinstance (src/include/storage/page/page.h). You can acquire a latch on aPagebut not directly on a B+Tree node.
Common Pitfalls
- The thread‑safe scan tests only iterator operations without concurrency. The correct implementation must throw a
std::exceptionwhen a leaf page cannot acquire a latch on its sibling to avoid deadlocks. - Be careful about the relationship between
UnpinPage(page_id, is_dirty)of the buffer pool manager andUnLock()of the page. Always release the page’s latch before unpinning it. - When releasing latches, maintain the root‑to‑leaf order. Each thread acquires latches from root to leaf; release them in the same order.
- The member variable
root_page_id(src/include/storage/index/b_plus_tree.h) may be updated during insert and delete. You must protect this shared variable from concurrent updates – for example, by adding an abstraction layer with astd::mutex.
Instructions
Local Testing
$ cd build
$ make b_plus_tree_insert_test -j$(nproc)
$ ./test/b_plus_tree_insert_test
This project also provides a way to generate a visual representation of the B+Tree for local debugging. See the official website for details. You can compare your tree visualization against the correct output.
Submission
$ cd build
$ make format
$ make check-lint
$ make check-clang-tidy-p2
$ make submit-p2