Understanding InnoDB Index Structure and Maintenance
Data Structure
A B+ tree is a multi-way balanced search tree. All leaf nodes reside at the same depth. Leaf nodes are interconnected via a doubly linked list to optimize range scans.
Purpose
Indexes expedite query execution by reducing the number of rows scanned.
Index Creation
Values from the indexed columns are sorted, and a B+ tree structure is built on top of them.
Storage Rules
- Internal nodes hold index keys and pointers to child nodes, stored in interleaved fashion.
- For a clustered index (primary key), leaf nodes contain the full row data.
- For a secondary index (non-primary key), leaf nodes store the index key together with the corresponding primary key value.
Query Execution
Starting from the root, a binary search determines which child pointer to follow. This process repeats at each level until a leaf node is reached.
- If the query uses a clustered index, the row data is retrieved directly from the leaf.
- If it uses a secondary index, the primary key is obtained from the leaf, and a subsequent lookup on the clustered index (called a "table access by index rowid" or "back to the table") fetches the full row.
Index Rebalancing
Because the B+ tree must remain strictly balanced and ordered, any insert, update, or delete operation may trigger node splitting or merging to preserve these properties.
Impact of Rebalancing
- InnoDB stores both index and data in units called "pages" (default 16 KB).
- Page splits allocate new pages and migrate data into them; page merges combine adjacent pages.
- These operations can leave pages partially filled, especially for clustered indexes where leaf pages contain the actual row data. Over time, storage fragmentation increases—more pages with unused space appear.
Index Rebuilding
When fragmentation reaches a certain threshold, MySQL automatically rebuilds the index. New pages are allocated, a fresh index tree is constructed, and all data is moved into the new structure in sorted order.
Consequences of Rebuilding
- Fragmentation is eliminated, improving storage efficiency and scan performance.
- The rebuild consumes significant CPU and I/O resources. To ensure data consistency, MySQL may acquire locks (such as table locks) during the process, potentially causing query delays or brief service stalls.