Deep Dive into InnoDB: Architecture, Transaction Internals, and MVCC
- InnoDB Storage Engine Overview
InnoDB serves as the default storage engine for MySQL, renowned for its ACID-compliant transaction support, crash recovery, and high concurrency capabilities. Unlike engines that treat indices and data separate, InnoDB utilizes Clustered Indices, meaning the data is stored physically alongside the primary key index.
1.1 Logical Storage Hierarchy
- Tablespace: The top-level logical container for table data and indexes. With
innodb_file_per_tableenabled (default in 8.0), each table resides in its own.ibdfile. - Segment: Manages logical groupings, including Data Segments (leaf nodes of the B+ tree), Index Segments (non-leaf nodes), and Rollback Segments.
- Extent: A block of continuous space, typically 1MB (64 pages of 16KB).
- Page: The fundamental unit of disk I/O, fixed at 16KB.
- Row: The storage unit for records. Every row contains hidden metadata:
DB_TRX_ID(last transaction ID) andDB_ROLL_PTR(pointer to the undo log record).
1.2 Engine Architecture
The architecture is split between Memory Structures and Disk Structures.
Memory Components
- Buffer Pool: The core cache containing data pages, index pages, undo entries, and lock info. It uses an LRU-based algorithm to manage "Dirty Pages" (modified data) before flushing them to disk.
- Change Buffer: Optimized for non-unique secondary indexes. Instead of random disk reads/writes, changes are buffered in memory and merged asynchronous.
- Adaptive Hash Index: Dynamically created for frequently accessed keys to speed up constant-time lookups beyond B+ tree traversal.
- Log Buffer: Temporarily holds log data (Redo/Undo) before committing it to persistent storage.
Background Threads
- Master Thread: Manages core tasks like flushing dirty pages, checkpointing, and purging undo logs.
- IO Thread: Handles asynchronous disk I/O requests.
- Purge Thread: Recycles undo log records once transactions are finalized.
- Page Cleaner Thread: Offloads the burden of flushing dirty pages from the Master Thread.
1.3 Transaction Principles
InnoDB ensures transaction reliability through two primary logs:
- Redo Log: Ensures Durability using Write-Ahead Logging (WAL). It records physical modifications to pages. If a crash occurs before the Buffer Pool flushes to disk, the redo log allows the system to replay operations and maintain consistency.
- Undo Log: Ensures Atomicity and facilitates MVCC. It stores the inverse logical operations (e.g., an
INSERTundo is aDELETE). This allows the system to roll back transactions and provides snapshot capability.
1.4 Multi-Version Concurrency Control (MVCC)
MVCC enables high-performance non-blocking reads by maintaining versions of data.
Mechanism
- Version Chain: Every update creates a new entry in the undo log, linked via the
DB_ROLL_PTR. This chain allows the engine to traverse back to previous states of a row. - ReadView: A point-in-time snapshot of active transactions. It includes:
m_ids: List of currently active (uncommitted) transaction IDs.min_trx_id: The lowest ID in the active list.max_trx_id: The next ID to be assigned.creator_trx_id: The ID of the transaction that created the view.
Isolation Levels & Visibility
- Read Committed (RC): A new ReadView is generated for every snapshot read (SELECT), allowing the transaction to see the latest committed state.
- Repeatable Read (RR): A ReadView is generated only on the first snapshot read within the transaction. This ensures consistent reads throughout the entire transaction lifecycle.