MySQL Architecture Design: Core Concepts and Implementation Principles
1. MySQL Functional Architecture
1.1 Three-Tier Architecture Overview
MySQL employs a three-tier architectural design that separates concerns effectively:
First Layer (Connection Services) The top layer encompasses services that are not exclusive to MySQL. These include connection handling, authentication, and seecurity management for client-server applications. This layer manages the initial TCP handshake, user authentication, and ongoing connection lifecycle.
Second Layer (SQL Processing Layer) This is the core component of MySQL, commonly referred to as the SQL Layer. Before MySQL processes any data, all operations pass through this layer, including:
- Permission validation
- SQL parsing and syntax analysis
- Query execution planning and optimization
- Query cache management
- Built-in functions (date, time, math, encryption)
- Stored procedures, triggers, and view processing
Third Layer (Storage Engine Layer) The bottom layer comprises various storage engines responsible for physical data storage and retrieval. Similar to Linux's diverse filesystem ecosystem, each storage engine has distinct advantages and limitations. The server interacts with storage engines through a standardized API that abstracts implementation differences, providing transparency to the query layer.
2. Storage Engines
2.1 MyISAM Engine
MyISAM was the default storage engine in earlier MySQL versions. Key characteristics include:
- Transaction Support: Does not support transactions; each query executes atomically
- Locking Mechanism: Implements table-level locking—operations lock the entire table
- Index Structure: Uses non-clustered indexes where index files store pointers to data files. Secondary indexes mirror primary indexes without uniqueness requirements
- Row Count Storage: Maintains total row count metadata
- File Structure: Each MyISAM table consists of three files—index file (.MYI), table definition file (.frm), and data file (.MYD)
2.2 InnoDB Engine
InnoDB became the default engine starting MySQL 5.5 and provides robust enterprise features:
- Transaction Support: Fully supports ACID transactions with all four isolation levels
- Locking: Implements row-level locking with foreign key constraints, enabling concurrent write operations
- Clustered Index: Primary key uses clustered index where index data stores actual row data. Secondary indexes store primary key values, requiring two index lookups for retrieval
- Auto-increment Recommendation: Using auto-increment primary keys prevents page splits during insertions, maintaining B+ tree efficiency
- Row Count: Does not store total row count
- Tablespace Options: Can use shared tablespace (ibdata files) or independent tablespace (.ibd files per table)
2.3 MEMORY Engine (HEAP)
The MEMORY engine stores data in RAM, providing exceptional read/write speeds with significant limitations:
- Volatility: Data disappears on database restart or crash—unsuitable for persistent data
- Index Types: Defaults to hash indexes rather than B+ tree indexes
- Use Cases: Primarily used for temporary tables storing intermediate query results
Setting B-Tree Indexes on Memory Tables
Memory tables can utilize B-Tree indexes when beneficial:
ALTER TABLE t1 ADD INDEX a_btree_index USING BTREE (id);
Memory Table Lock Granularity
Memory tables support only table-level locks, not row-level locks. A single update operation blocks all other operations on that table, making concurrent transaction handling inefficient.
Data Persistence Considerations
Memory tables lose all data on restart. In master-slave replication scenarios, this creates problems:
- Slave restarts flush memory tables
- Subsequent UPDATE attempts fail with "row not found" errors
- This causes replication to stop, and after failover, clients observe data loss
Recommended Use Cases
Memory tables are suitable only for temporary data:
- Not shared across threads (no concurrency concerns)
- Data deletion on restart is acceptable
- Slave temporary tables don't affect master user sessions
Example: Optimizing JOIN operations
-- Using InnoDB temporary table
CREATE TEMPORARY TABLE temp_t (id INT PRIMARY KEY, a INT, b INT, INDEX(b)) ENGINE=INNODB;
INSERT INTO temp_t SELECT * FROM t2 WHERE b >= 1 AND b <= 2000;
SELECT * FROM t1 JOIN temp_t ON (t1.b = temp_t.b);
-- Optimized with MEMORY engine
CREATE TEMPORARY TABLE temp_t (id INT PRIMARY KEY, a INT, b INT, INDEX(b)) ENGINE=MEMORY;
INSERT INTO temp_t SELECT * FROM t2 WHERE b >= 1 AND b <= 2000;
SELECT * FROM t1 JOIN temp_t ON (t1.b = temp_t.b);
The MEMORY version offers faster writes and hash-based index lookups for limited datasets.
3. InnoDB Logical Storage Structure
InnoDB organizes data in a tablespace containing segments, extents, and pages.
3.1 Data Pages
InnoDB reads and writes data in 16KB pages by default. When a query needs a specific row, the entire containing page loads into memory.
Page Split
When data inserts are sequential, indexes remain compact. Random insertions cause page splitting:
When a page reaches capacity and new data arrives, InnoDB allocates a new page and moves approximately half the original page's data to the new page, creating "holes" from deleted or updated records. These holes accumulate through extensive INSERT/UPDATE/DELETE operations.
3.2 Tablespace
With innodb_file_per_table enabled (default), each table has its own tablespace file (.ibd). Disabling this option stores all tables in the system tablespace (ibdata files).
System Tablespace Configuration:
innodb_data_file_path=ibdata1:12M:autoextend
Independent Tablespace Benefits:
- Each table's data and indexes in separate files
- Enables table movement between databases
- Space can be reclaimed after bulk deletes
4. MySQL Disk Directory Structure
4.1 Main Directories
- bin/: Executable files including client and server programs
- data/: Default data directory containing database files
- include/: Header files (mysql.h, mysqld_error.h)
- lib/: Library files
- share/: Character set files
- support-files/: Startup scripts
4.2 Data Directory Contents
Database Storage
Creating a database creates a directory with the database name containing a db.opt file (deprecated in MySQL 8.0, storing charset and collation settings).
Table Structure Storage
InnoDB Tables:
- Table structure stored in system tablespace (no .frm files in MySQL 8.0+)
- Data and indexes in .ibd files or system tablespace
CREATE TABLE demo_innodb (
id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT COMMENT 'User ID',
name VARCHAR(30) NOT NULL COMMENT 'User Name',
PRIMARY KEY (id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='User Information';
4.3 Log Files
Binary Logs (binlog):
SHOW VARIABLES LIKE 'log_bin';
- Records all data changes for replicatoin and point-in-time recovery
- Parameters:
max_binlog_size,binlog_expire_logs_seconds
Redo Logs:
- MySQL 8.0.30+ uses
#innodb_redoNfiles in#innodb_redodirectory - Default 32 files distributed by
innodb_redo_log_capacity - Files without
_tmpsuffix are active;_tmpsuffix indicates available
Undo Logs: Stored in undo tablespaces, not separate disk files
5. InnoDB Memory Model
5.1 Buffer Pool
The Buffer Pool is a memory region bridging disk speed and CPU speed through caching:
Page Management:
- When reading a page, it loads into the Buffer Pool ("fix")
- Subsequent reads check Buffer Pool first (cache hit) or load from disk (cache miss)
- Modified pages in Buffer Pool flush to disk periodically via Checkpoint mechanism
Cached Page Types: Index pages, data pages, undo pages, insert buffer, adaptive hash index, lock info, data dictionary
5.2 Buffer Pool Configuration
Buffer Pool Size:
SHOW VARIABLES LIKE '%innodb_buffer_pool_size%';
Recommended: 60-80% of available physical memory
Multiple Instances:
SHOW VARIABLES LIKE '%innodb_buffer_pool_instances%';
From InnoDB 1.0.x, multiple instances reduce contention and improve concurrency.
Memory Hit Rate:
SHOW ENGINE INNODB STATUS;
Look for "Buffer pool hit rate" - target 99%+ for production systems.
5.3 Buffer Pool Memory Recovery
InnoDB uses an LRU (Least Recently Used) algorithm to evict old pages when full.
Basic LRU Algorithm:
- Recently accessed pages at list head
- New accesses move pages to head
- When full, LRU tail page evicts
Improved LRU Algorithm: InnoDB divides the LRU list into young (5/8) and old (3/8) regions:
- Pages in young region stay cache-hot
- Old region pages over 1 second old (innodb_old_blocks_time) move to head
- Full table scans populate old region without affecting young region cache
This prevents large scans from evicting frequently-accessed data.
6. Partition Tables
6.1 What Are Partition Tables
CREATE TABLE `t` (
`ftime` DATETIME NOT NULL,
`c` INT(11) DEFAULT NULL,
KEY (`ftime`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
PARTITION BY RANGE (YEAR(ftime))
(PARTITION p_2017 VALUES LESS THAN (2017) ENGINE = InnoDB,
PARTITION p_2018 VALUES LESS THAN (2018) ENGINE = InnoDB,
PARTITION p_2019 VALUES LESS THAN (2019) ENGINE = InnoDB,
PARTITION p_others VALUES LESS THAN MAXVALUE ENGINE = InnoDB);
Each partition gets its own .ibd file. For the engine layer, these are separate tables; for the Server layer, it's one table.
6.2 Partition Table Limitations
- Initial access opens all partitions—exceeds open_files_limit if >1000 partitions
- Server layer treats as single table—DDL affects all partitions with shared MDL lock
6.3 Alternatives
- Manual physical sharding by year
- Sharding middleware (ShardingSphere)
7. SQL Execution Flow
7.1 Query Execution Process
For: SELECT * FROM T WHERE ID=10;
1. Connection Handler
- Establishes TCP connection and authenticates user
- Loads user permissions—subsequent commands use cached permissions
- Long connections reduce connection overhead but accumulate memory
- Periodic disconnection or
mysql_reset_connectionsolves memory issues
2. Query Cache
- Checks if statement was previously executed
- Invalidated on any table update
- Removed entirely in MySQL 8.0
3. Parser
- Lexical analysis identifies keywords, table names, column names
- Syntax analysis validates SQL syntax
4. Optimizer
- Identifies possible indexes
- Calculates full table scan vs. index scan costs
- Determines optimal execution plan
- Decides join order for multi-table queries
5. Executor
- Opens table and checks permissions
- For unindexed tables: iterates through rows, checking conditions
- For indexed tables: uses index range scans
- Accumulates matching rows in result set
Execution Order:
FROM -> ON -> JOIN -> WHERE -> GROUP BY -> HAVING
-> SELECT -> DISTINCT -> UNION -> ORDER BY -> LIMIT
7.2 Update Statement Execution
Update statements follow query flow plus two critical logs:
Redo Log:
- WAL (Write-Ahead Logging): writes to log first, disk later
- Fixed-size circular buffer (default 4 files × 1GB)
- Write position advances; checkpoint marks erasure point
- Ensures crash recovery
Binary Log:
- Records statement logic (not physical changes)
- Used for replication and point-in-time recovery
- Three formats: STATEMENT, ROW, MIXED
Two-Phase Commit:
- Write to redo log (prepare state)
- Write to binlog
- Commit redo log (commit state)
This ensures binlog and data consistency after crashes.
7.3 Delete Statement Flow
DELETE only marks records as deletable (not physical removal). Deleted space becomes "holes" reusable by similar-range inserts.
7.4 DDL Statement Execution
Table Rebuild:
ALTER TABLE t ENGINE=InnoDB;
Online DDL Process:
- Scan source table pages to temporary file
- Log changes during copy (row log)
- Apply logged changes to temporary file
- Replace original with temporary file
MDL lock downgrades from write to read during copy, allowing concurrent DML.
7.5 ORDER BY Execution
Full-field Sorting:
- Initialize sort_buffer with required fields
- Fetch matching rows from index
- Sort in memory or external merge sort
- Return sorted results
Rowid Sorting (when rows are large):
- Store only sort key and rowid in sort_buffer
- Sort by key
- Fetch actual rows using rowid
Index Optimization: Creating composite indexes eliminates sorting:
ALTER TABLE t ADD INDEX idx_city_name(city, name);
7.6 COUNT(*) Execution
MyISAM: Stores row count in metadata
InnoDB: Must scan rows due to MVCC visibility
- Optimizer selects smallest index to scan
- Count performance: COUNT(*) ≈ COUNT(1) > COUNT(id) > COUNT(column)
Optimization Options:
- Use separate counter table with transactional updates
- Redis caching (with consistency caveats)
7.7 JOIN Execution
Index Nested-Loop Join (NLJ):
- Drives table by scanning outer table
- Inner table uses index lookup
- Best when index exists on join column
- Complexity: N + N×2×log₂M
Block Nested-Loop Join (BNL):
- No index available on inner table
- Loads outer table rows into join_buffer
- Scans entire inner table for each batch
- Generally avoid this approach
Best Practices:
- Use smaller table as driving table
- Ensure indexes exist on join columns
- Use BKA (Batched Key Access) optimization
7.8 UNION Execution
(SELECT 1000 AS f) UNION (SELECT id FROM t1 ORDER BY id DESC LIMIT 2);
Uses internal temporary table with unique index for deduplication.
UNION ALL: Skips deduplication, no temporary table needed.
7.9 GROUP BY Execution
Uses temporary table for grouping:
SELECT id%10 AS m, COUNT(*) AS c FROM t1 GROUP BY m;
Optimizations:
- Add ORDER BY NULL if sorting unnecessary
- Create indexes matching GROUP BY columns
- Increase tmp_table_size for memory operations
- Use SQL_BIG_RESULT for large datasets
7.10 DISTINCT Execution
SELECT DISTINCT a FROM t;
SELECT a FROM t GROUP BY a ORDER BY NULL;
Both create temporary table with unique index—semantically and performance-equivalant.
8. Core Mechanisms
8.1 WAL Mechanism
Write-Ahead Logging ensures durability:
Redo Log:
- Prevents data loss from crashes
- Converts random writes to sequential writes
- Configurable persistence: innodb_flush_log_at_trx_commit
- 0: Log buffer only
- 1: Synchronous (safest)
- 2: Write to OS cache
Binlog:
- Statement-based replication
- Row-based replication (recommended)
- sync_binlog parameter controls persistence
Undo Log:
- Provides transaction rollback capability
- Enables MVCC (Multi-Version Concurrency Control)
8.2 Change Buffer
Optimizes secondary index updates:
Write Process:
- If page in memory: update directly
- If page not in memory: record change in buffer
- Log changes to redo log
Read Process:
- When page accessed, merge changes from buffer
- Generates correct page version
Merge Triggers:
- Page accessed
- Background threads
- Database shutdown
Best Use Cases: Write-heavy, read-light workloads Avoid When: Updates immediately followed by queries
8.3 Crash-Safe Mechanism
Two-Phase Commit Ensures Consistency:
- Prepare phase: write redo log (prepare)
- Write binlog
- Commit phase: write redo log (commit)
Recovery Rules:
- If redo log complete → commit
- If redo log prepare only → check binlog existence → commit or rollback
8.4 Flush Mechanism
When Flush Occurs:
- Redo log full (write_pos catches checkpoint)
- Buffer pool memory insufficient
- System idle
- Normal shutdown
Performance Impact Factors:
- innodb_io_capacity: Disk capability setting
- innodb_max_dirty_pages_pct: Dirty page threshold (default 75%)
- innodb_flush_neighbors: SSD设置为0避免连锁flush
8.5 Temporary Tables
Internal Temporary Tables:
- Created for UNION, UNION ALL, GROUP BY operations
- Memory or disk-based (tmp_table_size limit)
- Uses priority queue for small LIMIT sorts
User Temporary Tables:
CREATE TEMPORARY TABLE temp_t (id INT PRIMARY KEY);
- Session-scoped visibility
- Auto-cleanup on session end
- Useful for complex joins and aggregations
8.6 Redo Log Write Mechanism
Three States:
- In redo log buffer (process memory)
- In page cache (OS memory)
- On disk (persistent)
Write Strategies:
- innodb_flush_log_at_trx_commit controls persistence timing
- Group commit combines multiple transactions for IO efficiency
- LSN (Log Sequence Number) tracks write positions
8.7 Binlog Write Mechanism
Process:
- Write to binlog cache (per-thread allocation)
- Transaction commit: write to page cache, fsync to disk
sync_binlog Settings:
- 0: Write only, no fsync
- 1: Full sync on each commit
- N: Batch sync every N commits
8.8 Auto-Increment Primary Key
Why Auto-Increment?
- Sequential insertions avoid page splits
- Indexes remain compact
Storage:
- MyISAM: In data file
- InnoDB 5.7 and earlier: In memory only (lost on restart)
- InnoDB 8.0+: Persisted in redo log
Non-Consecutive Reasons:
- Unique key conflicts
- Transaction rollbacks
- Batch inserts (exponential allocation)
Lock Modes:
- 0: Statement-level lock (legacy)
- 1: Mixed (simple INSERT: immediate; bulk: statement-level)
- 2: All statement-level locks released immediately
Recommendation for Bulk Inserts:
SET GLOBAL innodb_autoinc_lock_mode = 2;
SET GLOBAL binlog_format = 'ROW';
This ensures both performance and data consistency in replication scenarios.
References
- MySQL Documentation and Source Code
- "MySQL High Performance" - Optimizing Backups, Indexes, and Queries
- "High Performance MySQL" - Optimization, Backups, and Replication