Home > Notes > Content

MySQL Architecture Design: Core Concepts and Implementation Principles

Notes May 13 14

1. MySQL Functional Architecture

1.1 Three-Tier Architecture Overview

MySQL employs a three-tier architectural design that separates concerns effectively:

First Layer (Connection Services) The top layer encompasses services that are not exclusive to MySQL. These include connection handling, authentication, and seecurity management for client-server applications. This layer manages the initial TCP handshake, user authentication, and ongoing connection lifecycle.

Second Layer (SQL Processing Layer) This is the core component of MySQL, commonly referred to as the SQL Layer. Before MySQL processes any data, all operations pass through this layer, including:

Permission validation
SQL parsing and syntax analysis
Query execution planning and optimization
Query cache management
Built-in functions (date, time, math, encryption)
Stored procedures, triggers, and view processing

Third Layer (Storage Engine Layer) The bottom layer comprises various storage engines responsible for physical data storage and retrieval. Similar to Linux's diverse filesystem ecosystem, each storage engine has distinct advantages and limitations. The server interacts with storage engines through a standardized API that abstracts implementation differences, providing transparency to the query layer.

2. Storage Engines

2.1 MyISAM Engine

MyISAM was the default storage engine in earlier MySQL versions. Key characteristics include:

Transaction Support: Does not support transactions; each query executes atomically
Locking Mechanism: Implements table-level locking—operations lock the entire table
Index Structure: Uses non-clustered indexes where index files store pointers to data files. Secondary indexes mirror primary indexes without uniqueness requirements
Row Count Storage: Maintains total row count metadata
File Structure: Each MyISAM table consists of three files—index file (.MYI), table definition file (.frm), and data file (.MYD)

2.2 InnoDB Engine

InnoDB became the default engine starting MySQL 5.5 and provides robust enterprise features:

Transaction Support: Fully supports ACID transactions with all four isolation levels
Locking: Implements row-level locking with foreign key constraints, enabling concurrent write operations
Clustered Index: Primary key uses clustered index where index data stores actual row data. Secondary indexes store primary key values, requiring two index lookups for retrieval
Auto-increment Recommendation: Using auto-increment primary keys prevents page splits during insertions, maintaining B+ tree efficiency
Row Count: Does not store total row count
Tablespace Options: Can use shared tablespace (ibdata files) or independent tablespace (.ibd files per table)

2.3 MEMORY Engine (HEAP)

The MEMORY engine stores data in RAM, providing exceptional read/write speeds with significant limitations:

Volatility: Data disappears on database restart or crash—unsuitable for persistent data
Index Types: Defaults to hash indexes rather than B+ tree indexes
Use Cases: Primarily used for temporary tables storing intermediate query results

Setting B-Tree Indexes on Memory Tables

Memory tables can utilize B-Tree indexes when beneficial:

ALTER TABLE t1 ADD INDEX a_btree_index USING BTREE (id);

Memory Table Lock Granularity

Memory tables support only table-level locks, not row-level locks. A single update operation blocks all other operations on that table, making concurrent transaction handling inefficient.

Data Persistence Considerations

Memory tables lose all data on restart. In master-slave replication scenarios, this creates problems:

Slave restarts flush memory tables
Subsequent UPDATE attempts fail with "row not found" errors
This causes replication to stop, and after failover, clients observe data loss

Recommended Use Cases

Memory tables are suitable only for temporary data:

Not shared across threads (no concurrency concerns)
Data deletion on restart is acceptable
Slave temporary tables don't affect master user sessions

Example: Optimizing JOIN operations

-- Using InnoDB temporary table
CREATE TEMPORARY TABLE temp_t (id INT PRIMARY KEY, a INT, b INT, INDEX(b)) ENGINE=INNODB;
INSERT INTO temp_t SELECT * FROM t2 WHERE b >= 1 AND b <= 2000;
SELECT * FROM t1 JOIN temp_t ON (t1.b = temp_t.b);

-- Optimized with MEMORY engine
CREATE TEMPORARY TABLE temp_t (id INT PRIMARY KEY, a INT, b INT, INDEX(b)) ENGINE=MEMORY;
INSERT INTO temp_t SELECT * FROM t2 WHERE b >= 1 AND b <= 2000;
SELECT * FROM t1 JOIN temp_t ON (t1.b = temp_t.b);

The MEMORY version offers faster writes and hash-based index lookups for limited datasets.

3. InnoDB Logical Storage Structure

InnoDB organizes data in a tablespace containing segments, extents, and pages.

3.1 Data Pages

InnoDB reads and writes data in 16KB pages by default. When a query needs a specific row, the entire containing page loads into memory.

Page Split

When data inserts are sequential, indexes remain compact. Random insertions cause page splitting:

When a page reaches capacity and new data arrives, InnoDB allocates a new page and moves approximately half the original page's data to the new page, creating "holes" from deleted or updated records. These holes accumulate through extensive INSERT/UPDATE/DELETE operations.

3.2 Tablespace

With innodb_file_per_table enabled (default), each table has its own tablespace file (.ibd). Disabling this option stores all tables in the system tablespace (ibdata files).

System Tablespace Configuration:

innodb_data_file_path=ibdata1:12M:autoextend

Independent Tablespace Benefits:

Each table's data and indexes in separate files
Enables table movement between databases
Space can be reclaimed after bulk deletes

4. MySQL Disk Directory Structure

4.1 Main Directories

bin/: Executable files including client and server programs
data/: Default data directory containing database files
include/: Header files (mysql.h, mysqld_error.h)
lib/: Library files
share/: Character set files
support-files/: Startup scripts

4.2 Data Directory Contents

Database Storage

Creating a database creates a directory with the database name containing a db.opt file (deprecated in MySQL 8.0, storing charset and collation settings).

Table Structure Storage

InnoDB Tables:

Table structure stored in system tablespace (no .frm files in MySQL 8.0+)
Data and indexes in .ibd files or system tablespace

CREATE TABLE demo_innodb (
    id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT COMMENT 'User ID',
    name VARCHAR(30) NOT NULL COMMENT 'User Name',
    PRIMARY KEY (id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='User Information';

4.3 Log Files

Binary Logs (binlog):

SHOW VARIABLES LIKE 'log_bin';

Records all data changes for replicatoin and point-in-time recovery
Parameters: max_binlog_size, binlog_expire_logs_seconds

Redo Logs:

MySQL 8.0.30+ uses #innodb_redoN files in #innodb_redo directory
Default 32 files distributed by innodb_redo_log_capacity
Files without _tmp suffix are active; _tmp suffix indicates available

Undo Logs: Stored in undo tablespaces, not separate disk files

5. InnoDB Memory Model

5.1 Buffer Pool

The Buffer Pool is a memory region bridging disk speed and CPU speed through caching:

Page Management:

When reading a page, it loads into the Buffer Pool ("fix")
Subsequent reads check Buffer Pool first (cache hit) or load from disk (cache miss)
Modified pages in Buffer Pool flush to disk periodically via Checkpoint mechanism

Cached Page Types: Index pages, data pages, undo pages, insert buffer, adaptive hash index, lock info, data dictionary

5.2 Buffer Pool Configuration

Buffer Pool Size:

SHOW VARIABLES LIKE '%innodb_buffer_pool_size%';

Recommended: 60-80% of available physical memory

Multiple Instances:

SHOW VARIABLES LIKE '%innodb_buffer_pool_instances%';

From InnoDB 1.0.x, multiple instances reduce contention and improve concurrency.

Memory Hit Rate:

SHOW ENGINE INNODB STATUS;

Look for "Buffer pool hit rate" - target 99%+ for production systems.

5.3 Buffer Pool Memory Recovery

InnoDB uses an LRU (Least Recently Used) algorithm to evict old pages when full.

Basic LRU Algorithm:

Recently accessed pages at list head
New accesses move pages to head
When full, LRU tail page evicts

Improved LRU Algorithm: InnoDB divides the LRU list into young (5/8) and old (3/8) regions:

Pages in young region stay cache-hot
Old region pages over 1 second old (innodb_old_blocks_time) move to head
Full table scans populate old region without affecting young region cache

This prevents large scans from evicting frequently-accessed data.

6. Partition Tables

6.1 What Are Partition Tables

CREATE TABLE `t` (
  `ftime` DATETIME NOT NULL,
  `c` INT(11) DEFAULT NULL,
  KEY (`ftime`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
PARTITION BY RANGE (YEAR(ftime))
(PARTITION p_2017 VALUES LESS THAN (2017) ENGINE = InnoDB,
 PARTITION p_2018 VALUES LESS THAN (2018) ENGINE = InnoDB,
 PARTITION p_2019 VALUES LESS THAN (2019) ENGINE = InnoDB,
 PARTITION p_others VALUES LESS THAN MAXVALUE ENGINE = InnoDB);

Each partition gets its own .ibd file. For the engine layer, these are separate tables; for the Server layer, it's one table.

6.2 Partition Table Limitations

Initial access opens all partitions—exceeds open_files_limit if >1000 partitions
Server layer treats as single table—DDL affects all partitions with shared MDL lock

6.3 Alternatives

Manual physical sharding by year
Sharding middleware (ShardingSphere)

7. SQL Execution Flow

7.1 Query Execution Process

For: SELECT * FROM T WHERE ID=10;

1. Connection Handler

Establishes TCP connection and authenticates user
Loads user permissions—subsequent commands use cached permissions
Long connections reduce connection overhead but accumulate memory
Periodic disconnection or mysql_reset_connection solves memory issues

2. Query Cache

Checks if statement was previously executed
Invalidated on any table update
Removed entirely in MySQL 8.0

3. Parser

Lexical analysis identifies keywords, table names, column names
Syntax analysis validates SQL syntax

4. Optimizer

Identifies possible indexes
Calculates full table scan vs. index scan costs
Determines optimal execution plan
Decides join order for multi-table queries

5. Executor

Opens table and checks permissions
For unindexed tables: iterates through rows, checking conditions
For indexed tables: uses index range scans
Accumulates matching rows in result set

Execution Order:

FROM -> ON -> JOIN -> WHERE -> GROUP BY -> HAVING 
-> SELECT -> DISTINCT -> UNION -> ORDER BY -> LIMIT

7.2 Update Statement Execution

Update statements follow query flow plus two critical logs:

Redo Log:

WAL (Write-Ahead Logging): writes to log first, disk later
Fixed-size circular buffer (default 4 files × 1GB)
Write position advances; checkpoint marks erasure point
Ensures crash recovery

Binary Log:

Records statement logic (not physical changes)
Used for replication and point-in-time recovery
Three formats: STATEMENT, ROW, MIXED

Two-Phase Commit:

Write to redo log (prepare state)
Write to binlog
Commit redo log (commit state)

This ensures binlog and data consistency after crashes.

7.3 Delete Statement Flow

DELETE only marks records as deletable (not physical removal). Deleted space becomes "holes" reusable by similar-range inserts.

7.4 DDL Statement Execution

Table Rebuild:

ALTER TABLE t ENGINE=InnoDB;

Online DDL Process:

Scan source table pages to temporary file
Log changes during copy (row log)
Apply logged changes to temporary file
Replace original with temporary file

MDL lock downgrades from write to read during copy, allowing concurrent DML.

7.5 ORDER BY Execution

Full-field Sorting:

Initialize sort_buffer with required fields
Fetch matching rows from index
Sort in memory or external merge sort
Return sorted results

Rowid Sorting (when rows are large):

Store only sort key and rowid in sort_buffer
Sort by key
Fetch actual rows using rowid

Index Optimization: Creating composite indexes eliminates sorting:

ALTER TABLE t ADD INDEX idx_city_name(city, name);

7.6 COUNT(*) Execution

MyISAM: Stores row count in metadata

InnoDB: Must scan rows due to MVCC visibility

Optimizer selects smallest index to scan
Count performance: COUNT(*) ≈ COUNT(1) > COUNT(id) > COUNT(column)

Optimization Options:

Use separate counter table with transactional updates
Redis caching (with consistency caveats)

7.7 JOIN Execution

Index Nested-Loop Join (NLJ):

Drives table by scanning outer table
Inner table uses index lookup
Best when index exists on join column
Complexity: N + N×2×log₂M

Block Nested-Loop Join (BNL):

No index available on inner table
Loads outer table rows into join_buffer
Scans entire inner table for each batch
Generally avoid this approach

Best Practices:

Use smaller table as driving table
Ensure indexes exist on join columns
Use BKA (Batched Key Access) optimization

7.8 UNION Execution

(SELECT 1000 AS f) UNION (SELECT id FROM t1 ORDER BY id DESC LIMIT 2);

Uses internal temporary table with unique index for deduplication.

UNION ALL: Skips deduplication, no temporary table needed.

7.9 GROUP BY Execution

Uses temporary table for grouping:

SELECT id%10 AS m, COUNT(*) AS c FROM t1 GROUP BY m;

Optimizations:

Add ORDER BY NULL if sorting unnecessary
Create indexes matching GROUP BY columns
Increase tmp_table_size for memory operations
Use SQL_BIG_RESULT for large datasets

7.10 DISTINCT Execution

SELECT DISTINCT a FROM t;
SELECT a FROM t GROUP BY a ORDER BY NULL;

Both create temporary table with unique index—semantically and performance-equivalant.

8. Core Mechanisms

8.1 WAL Mechanism

Write-Ahead Logging ensures durability:

Redo Log:

Prevents data loss from crashes
Converts random writes to sequential writes
Configurable persistence: innodb_flush_log_at_trx_commit
- 0: Log buffer only
- 1: Synchronous (safest)
- 2: Write to OS cache

Binlog:

Statement-based replication
Row-based replication (recommended)
sync_binlog parameter controls persistence

Undo Log:

Provides transaction rollback capability
Enables MVCC (Multi-Version Concurrency Control)

8.2 Change Buffer

Optimizes secondary index updates:

Write Process:

If page in memory: update directly
If page not in memory: record change in buffer
Log changes to redo log

Read Process:

When page accessed, merge changes from buffer
Generates correct page version

Merge Triggers:

Page accessed
Background threads
Database shutdown

Best Use Cases: Write-heavy, read-light workloads Avoid When: Updates immediately followed by queries

8.3 Crash-Safe Mechanism

Two-Phase Commit Ensures Consistency:

Prepare phase: write redo log (prepare)
Write binlog
Commit phase: write redo log (commit)

Recovery Rules:

If redo log complete → commit
If redo log prepare only → check binlog existence → commit or rollback

8.4 Flush Mechanism

When Flush Occurs:

Redo log full (write_pos catches checkpoint)
Buffer pool memory insufficient
System idle
Normal shutdown

Performance Impact Factors:

innodb_io_capacity: Disk capability setting
innodb_max_dirty_pages_pct: Dirty page threshold (default 75%)
innodb_flush_neighbors: SSD设置为0避免连锁flush

8.5 Temporary Tables

Internal Temporary Tables:

Created for UNION, UNION ALL, GROUP BY operations
Memory or disk-based (tmp_table_size limit)
Uses priority queue for small LIMIT sorts

User Temporary Tables:

CREATE TEMPORARY TABLE temp_t (id INT PRIMARY KEY);

Session-scoped visibility
Auto-cleanup on session end
Useful for complex joins and aggregations

8.6 Redo Log Write Mechanism

Three States:

In redo log buffer (process memory)
In page cache (OS memory)
On disk (persistent)

Write Strategies:

innodb_flush_log_at_trx_commit controls persistence timing
Group commit combines multiple transactions for IO efficiency
LSN (Log Sequence Number) tracks write positions

8.7 Binlog Write Mechanism

Process:

Write to binlog cache (per-thread allocation)
Transaction commit: write to page cache, fsync to disk

sync_binlog Settings:

0: Write only, no fsync
1: Full sync on each commit
N: Batch sync every N commits

8.8 Auto-Increment Primary Key

Why Auto-Increment?

Sequential insertions avoid page splits
Indexes remain compact

Storage:

MyISAM: In data file
InnoDB 5.7 and earlier: In memory only (lost on restart)
InnoDB 8.0+: Persisted in redo log

Non-Consecutive Reasons:

Unique key conflicts
Transaction rollbacks
Batch inserts (exponential allocation)

Lock Modes:

0: Statement-level lock (legacy)
1: Mixed (simple INSERT: immediate; bulk: statement-level)
2: All statement-level locks released immediately

Recommendation for Bulk Inserts:

SET GLOBAL innodb_autoinc_lock_mode = 2;
SET GLOBAL binlog_format = 'ROW';

This ensures both performance and data consistency in replication scenarios.

References

MySQL Documentation and Source Code
"MySQL High Performance" - Optimizing Backups, Indexes, and Queries
"High Performance MySQL" - Optimization, Backups, and Replication

Back to List

Prev: Architecture and Implementation Strategies for High-Performance Caching

Next: TCP Connection Lifecycle: Handshakes, Termination, and State Transitions

Fading Coder