Fading Coder

One Final Commit for the Last Sprint

Home > Tech > Content

CMU 15-445/645 Project 2: B+Tree Index Implementation

Tech Jul 5 1

Checkpoint #1

Task #1 – B+Tree Pages

Implement three page classes that represent the data storage units of a B+Tree.

B+Tree Parent Page

Files: src/include/storage/page/b_plus_tree_page.h and src/storage/page/b_plus_tree_page.cpp. This is a base class containing fields shared by its two subclasses.

Field Size (bytes) Description
page_type_ 4 Indicates whether the page is internal or leaf.
lsn_ 4 Log sequence number (used in Project 4).
size_ 4 Current number of key/value pairs stored.
max_size_ 4 Maximum allowed key/value pairs.
parent_page_id_ 4 Page ID of the parent node.
page_id_ 4 Page ID of this node itself.

B+Tree Internal Page

Files: src/include/storage/page/b_plus_tree_internal_page.h and src/storage/page/b_plus_tree_internal_page.cpp. An internal node holds an ordered list of m keys and m+1 child pointers (page IDs). The first key position is unused; therefore lookup routines must start from the second key. Each internal page must be at least half full at all times. During deletion two half‑full pages may be merged or rows redistributed to avoid merging. During insertion a full page is split into two.

B+Tree Leaf Page

Files: src/include/storage/page/b_plus_tree_leaf_page.h and src/storage/page/b_plus_tree_leaf_page.cpp. Leaf pages hold an ordered list of m keys and m values. Values are 64‑bit RID objects (defined in src/include/common/rid.h) that point to the actual tuple location. Leaves obey the same occupancy constraints as internal pages and support identical merge, redistribution, and split operations.

Important: Even though leaves and internal pages use the same key type, the value types differ, so max_size may vary between the two page types.

Important: Every leaf and internal page corresponds to the data_ portion of a buffer pool page. To read or modify such a node, first fetch the page from the buffer pool using its page_id, then reinterpret‑cast it to the appropriate page type, and after every write or read operation unpin the page.

Task #2 – B+Tree Data Structure

Files: src/include/index/b_plus_tree.h and src/storage/index/b_plus_tree.cpp. The B+Tree index must support only unique keys: inserting a duplicate key/value pair should return false. When deletion makes a page under‑filled, the tree must correctly merge or redistribute pages (also called coalescing).

For Checkpoint #1 you need only Insert(), point search GetValue(), and Delete(). Splits must be performed when insertion would cause the key/value count in a leaf to equal max_size, or when the number of children in an internal node would reach max_size. Because any write operation may change the root page ID, you must update the root_page_id in the header page (src/include/storage/page/header_page.h) to keep the index persistent on disk. The BPlusTree class already provides a helper function UpdateRootPageId; call it whenever the root page ID changes.

Your B+Tree implementation must be templated to hide key/value types and the comparator:

template <typename KeyType,
         typename ValueType,
         typename KeyComparator>
class BPlusTree {
  // ...
};

The following types are predefined for you:

  • KeyType – always GenericKey; its actual size is specified via template argument and depends on the indexed attribute type.
  • ValueType – always 64‑bit RID.
  • KeyComparator – a class for comparing two keys (less/greater). Provided with the key type implementation files.

Checkpoint #2

Task #3 – Index Iterator

Files: src/include/storage/index/index_iterator.h and src/index/storage/index_iterator.cpp. Build a general index iterator that traverses all leaf pages efficiently. The idea is to link leaves into a singly linked list and iterate over all key/value pairs in a specific direction. The iterator must follow C++17 iterator conventions, including operators for increment, dereference, equality, and inequality, as well as support for range‑based for loops. To enable that, your B+Tree must correctly implement begin() and end().

Implement these functions in IndexIterator:

  • isEnd() – check if the iterator points past the last key/value pair.
  • operator++() – advance to the next pair.
  • operator*() – return the current key/value pair.
  • operator==() and operator!=() – compare two iterators.

Task #4 – Concurrent Index

Update your single‑threaded B+Tree to support concurrent operations using latch crabbing (as described in lectures and the textbook, Chapter 15.10). Threads traverse the tree while acquiring and releasing latches on B+Tree pages. A thread releases the latch on a parent page only when the child page is considered "safe". The definition of safe depends on the operation:

  • Search: After obtaining a read (R) latch on a child, release the parent’s latch immediately.
  • Insert: Acquire a write (W) latch on the child. After latching the child, check if it is safe (not full). If safe, release all ancestor latches.
  • Delete: Acquire a write (W) latch on the child. After latching the child, check if it is safe (at least half‑full, with different criteria for the root). If safe, release all ancestor latches.

Important: These are the basic ideas; consult the lecture and textbook §15.10 before implementing.

Requirements and Hints

  1. Do not use a global latch to protect the entire index (i.e., do not lock the whole index for an operation). Automated and manual checks will ensure you employ correct latch crabbing.
  2. A reader‑writer latch implementation is provided in src/include/common/rwlatch.h, and the page header (src/include/storage/page/page.h) includes helper functions to acquire/release latches.
  3. No mandatory interfaces are added to the B+Tree class. You may add any helper functions as long as all original public interfaces remain unchanged for testing.
  4. Do not use malloc/new to allocate large memory for the tree. For new nodes or temporary buffers, use the buffer pool.
  5. You must use the provided transaction pointer parameter (src/include/concurrency/transaction.h). It provides methods to store pages whose latches you have obtained during traversal, as well as pages that were removed during deletion. Study the FindLeafPage method carefully; you may modify it (including its return type) and integrate latch crabbing logic there.
  6. FetchPage() from the buffer pool manager returns a pointer to a Page instance (src/include/storage/page/page.h). You can acquire a latch on a Page but not directly on a B+Tree node.

Common Pitfalls

  1. The thread‑safe scan tests only iterator operations without concurrency. The correct implementation must throw a std::exception when a leaf page cannot acquire a latch on its sibling to avoid deadlocks.
  2. Be careful about the relationship between UnpinPage(page_id, is_dirty) of the buffer pool manager and UnLock() of the page. Always release the page’s latch before unpinning it.
  3. When releasing latches, maintain the root‑to‑leaf order. Each thread acquires latches from root to leaf; release them in the same order.
  4. The member variable root_page_id (src/include/storage/index/b_plus_tree.h) may be updated during insert and delete. You must protect this shared variable from concurrent updates – for example, by adding an abstraction layer with a std::mutex.

Instructions

Local Testing

$ cd build
$ make b_plus_tree_insert_test -j$(nproc)
$ ./test/b_plus_tree_insert_test

This project also provides a way to generate a visual representation of the B+Tree for local debugging. See the official website for details. You can compare your tree visualization against the correct output.

Submission

$ cd build
$ make format
$ make check-lint
$ make check-clang-tidy-p2
$ make submit-p2

Related Articles

Understanding Strong and Weak References in Java

Strong References Strong reference are the most prevalent type of object referencing in Java. When an object has a strong reference pointing to it, the garbage collector will not reclaim its memory. F...

Comprehensive Guide to SSTI Explained with Payload Bypass Techniques

Introduction Server-Side Template Injection (SSTI) is a vulnerability in web applications where user input is improper handled within the template engine and executed on the server. This exploit can r...

Implement Image Upload Functionality for Django Integrated TinyMCE Editor

Django’s Admin panel is highly user-friendly, and pairing it with TinyMCE, an effective rich text editor, simplifies content management significantly. Combining the two is particular useful for bloggi...

Leave a Comment

Anonymous

◎Feel free to join the discussion and share your thoughts.