MySQL Indexing Principles
InnoDB Internals
- InnoDB Internal Architecture
- The Data Structure and Algorithm Behind MySQL Indexing
- B-Trees and B+ Trees
Why B+ Trees are Used for Indexing
Databases use B+ trees for indexing because:
- Insertion and deletion operations in a B+ tree occur only on leaf nodes.
- Each node is the size of a page (typically 4KB), allowing multiple keys per node.
- Internal nodes store only keys (no value data), so they can hold more keys per node, making data more compact.
- Leaf nodes are linked together as a linked list, allowing a single traversal to retrieve all records. This structure is efficient for range queries, inclusive range scans, and full traversals.
Why not a red-black tree? Red-black trees have a greater depth, which requires more disk I/O for indexing operations.
B-Tree vs B+ Tree
A B-tree does not require reaching the leaf node to retrieve a value; frequently accessed keys may be found closer to the root.
In-Depth Understanding of MySQL Internals
Index Failure Cases
The Leftmost Prefix Rule
When a composite index is created on columns a, b, and c (in that order), the following query patterns can utilize the index:
- Queries involving all three columns (
a,b,c) - Queries involving
aandb - Queries involving
aalone
The following pattern cannot use the index:
- Queries involving only
bandc(the leftmost prefixais missing)
Note: The order of columns in the WHERE clause does not matter as long as the leftmost prefix is present. For example, WHERE c = ... AND a = ... AND b = ... will still use the index.
Example
-- Create a composite index
ALTER TABLE newslist ADD INDEX indexName(htmlid, pid, id);
-- Query that can use the index (all three columns present)
EXPLAIN SELECT * FROM newslist WHERE htmlid = 254 AND pid = 1 AND id = 1;
-- Query that cannot use the index (leftmost column missing)
EXPLAIN SELECT * FROM newslist WHERE pid = 1 AND id = 1;
Output examples:
-- Using index (all three columns)
mysql> explain select * from newslist where htmlid=254 and pid=1 and id=1;
+----+-------------+----------+-------+-------------------+---------+---------+-------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+-------+-------------------+---------+---------+-------+------+-------+
| 1 | SIMPLE | newslist | const | PRIMARY,indexName | PRIMARY | 8 | const | 1 | |
+----+-------------+----------+-------+-------------------+---------+---------+-------+------+-------+
-- Not using index (leftmost column missing)
mysql> explain select * from newslist where pid=1 and id=1;
+----+-------------+----------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+------+---------------+------+---------+------+------+-------------+
| 1 | SIMPLE | newslist | ALL | NULL | NULL | NULL | NULL | 15 | Using where |
+----+-------------+----------+------+---------------+------+---------+------+------+-------------+
Row Locking in MySQL
InnoDB's row-level lock is achieved by locking index entries: the engine locks records by placing a lock on the index entry corresponding to the query. This means:
- InnoDB uses row-level locks only when the query retrieves data through an index condition.
- If the query does not use an index (e.g., a full table scan), InnoDB will fall back to table-level locking.
Reference: InnoDB Row Lock Implementation