Fading Coder

One Final Commit for the Last Sprint

Home > Notes > Content

MySQL Architecture Design: Core Concepts and Implementation Principles

Notes May 13 1

1. MySQL Functional Architecture

1.1 Three-Tier Architecture Overview

MySQL employs a three-tier architectural design that separates concerns effectively:

First Layer (Connection Services) The top layer encompasses services that are not exclusive to MySQL. These include connection handling, authentication, and seecurity management for client-server applications. This layer manages the initial TCP handshake, user authentication, and ongoing connection lifecycle.

Second Layer (SQL Processing Layer) This is the core component of MySQL, commonly referred to as the SQL Layer. Before MySQL processes any data, all operations pass through this layer, including:

  • Permission validation
  • SQL parsing and syntax analysis
  • Query execution planning and optimization
  • Query cache management
  • Built-in functions (date, time, math, encryption)
  • Stored procedures, triggers, and view processing

Third Layer (Storage Engine Layer) The bottom layer comprises various storage engines responsible for physical data storage and retrieval. Similar to Linux's diverse filesystem ecosystem, each storage engine has distinct advantages and limitations. The server interacts with storage engines through a standardized API that abstracts implementation differences, providing transparency to the query layer.

2. Storage Engines

2.1 MyISAM Engine

MyISAM was the default storage engine in earlier MySQL versions. Key characteristics include:

  1. Transaction Support: Does not support transactions; each query executes atomically
  2. Locking Mechanism: Implements table-level locking—operations lock the entire table
  3. Index Structure: Uses non-clustered indexes where index files store pointers to data files. Secondary indexes mirror primary indexes without uniqueness requirements
  4. Row Count Storage: Maintains total row count metadata
  5. File Structure: Each MyISAM table consists of three files—index file (.MYI), table definition file (.frm), and data file (.MYD)

2.2 InnoDB Engine

InnoDB became the default engine starting MySQL 5.5 and provides robust enterprise features:

  1. Transaction Support: Fully supports ACID transactions with all four isolation levels
  2. Locking: Implements row-level locking with foreign key constraints, enabling concurrent write operations
  3. Clustered Index: Primary key uses clustered index where index data stores actual row data. Secondary indexes store primary key values, requiring two index lookups for retrieval
  4. Auto-increment Recommendation: Using auto-increment primary keys prevents page splits during insertions, maintaining B+ tree efficiency
  5. Row Count: Does not store total row count
  6. Tablespace Options: Can use shared tablespace (ibdata files) or independent tablespace (.ibd files per table)

2.3 MEMORY Engine (HEAP)

The MEMORY engine stores data in RAM, providing exceptional read/write speeds with significant limitations:

  1. Volatility: Data disappears on database restart or crash—unsuitable for persistent data
  2. Index Types: Defaults to hash indexes rather than B+ tree indexes
  3. Use Cases: Primarily used for temporary tables storing intermediate query results

Setting B-Tree Indexes on Memory Tables

Memory tables can utilize B-Tree indexes when beneficial:

ALTER TABLE t1 ADD INDEX a_btree_index USING BTREE (id);

Memory Table Lock Granularity

Memory tables support only table-level locks, not row-level locks. A single update operation blocks all other operations on that table, making concurrent transaction handling inefficient.

Data Persistence Considerations

Memory tables lose all data on restart. In master-slave replication scenarios, this creates problems:

  • Slave restarts flush memory tables
  • Subsequent UPDATE attempts fail with "row not found" errors
  • This causes replication to stop, and after failover, clients observe data loss

Recommended Use Cases

Memory tables are suitable only for temporary data:

  • Not shared across threads (no concurrency concerns)
  • Data deletion on restart is acceptable
  • Slave temporary tables don't affect master user sessions

Example: Optimizing JOIN operations

-- Using InnoDB temporary table
CREATE TEMPORARY TABLE temp_t (id INT PRIMARY KEY, a INT, b INT, INDEX(b)) ENGINE=INNODB;
INSERT INTO temp_t SELECT * FROM t2 WHERE b >= 1 AND b <= 2000;
SELECT * FROM t1 JOIN temp_t ON (t1.b = temp_t.b);

-- Optimized with MEMORY engine
CREATE TEMPORARY TABLE temp_t (id INT PRIMARY KEY, a INT, b INT, INDEX(b)) ENGINE=MEMORY;
INSERT INTO temp_t SELECT * FROM t2 WHERE b >= 1 AND b <= 2000;
SELECT * FROM t1 JOIN temp_t ON (t1.b = temp_t.b);

The MEMORY version offers faster writes and hash-based index lookups for limited datasets.

3. InnoDB Logical Storage Structure

InnoDB organizes data in a tablespace containing segments, extents, and pages.

3.1 Data Pages

InnoDB reads and writes data in 16KB pages by default. When a query needs a specific row, the entire containing page loads into memory.

Page Split

When data inserts are sequential, indexes remain compact. Random insertions cause page splitting:

When a page reaches capacity and new data arrives, InnoDB allocates a new page and moves approximately half the original page's data to the new page, creating "holes" from deleted or updated records. These holes accumulate through extensive INSERT/UPDATE/DELETE operations.

3.2 Tablespace

With innodb_file_per_table enabled (default), each table has its own tablespace file (.ibd). Disabling this option stores all tables in the system tablespace (ibdata files).

System Tablespace Configuration:

innodb_data_file_path=ibdata1:12M:autoextend

Independent Tablespace Benefits:

  • Each table's data and indexes in separate files
  • Enables table movement between databases
  • Space can be reclaimed after bulk deletes

4. MySQL Disk Directory Structure

4.1 Main Directories

  • bin/: Executable files including client and server programs
  • data/: Default data directory containing database files
  • include/: Header files (mysql.h, mysqld_error.h)
  • lib/: Library files
  • share/: Character set files
  • support-files/: Startup scripts

4.2 Data Directory Contents

Database Storage

Creating a database creates a directory with the database name containing a db.opt file (deprecated in MySQL 8.0, storing charset and collation settings).

Table Structure Storage

InnoDB Tables:

  • Table structure stored in system tablespace (no .frm files in MySQL 8.0+)
  • Data and indexes in .ibd files or system tablespace
CREATE TABLE demo_innodb (
    id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT COMMENT 'User ID',
    name VARCHAR(30) NOT NULL COMMENT 'User Name',
    PRIMARY KEY (id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='User Information';

4.3 Log Files

Binary Logs (binlog):

SHOW VARIABLES LIKE 'log_bin';
  • Records all data changes for replicatoin and point-in-time recovery
  • Parameters: max_binlog_size, binlog_expire_logs_seconds

Redo Logs:

  • MySQL 8.0.30+ uses #innodb_redoN files in #innodb_redo directory
  • Default 32 files distributed by innodb_redo_log_capacity
  • Files without _tmp suffix are active; _tmp suffix indicates available

Undo Logs: Stored in undo tablespaces, not separate disk files

5. InnoDB Memory Model

5.1 Buffer Pool

The Buffer Pool is a memory region bridging disk speed and CPU speed through caching:

Page Management:

  • When reading a page, it loads into the Buffer Pool ("fix")
  • Subsequent reads check Buffer Pool first (cache hit) or load from disk (cache miss)
  • Modified pages in Buffer Pool flush to disk periodically via Checkpoint mechanism

Cached Page Types: Index pages, data pages, undo pages, insert buffer, adaptive hash index, lock info, data dictionary

5.2 Buffer Pool Configuration

Buffer Pool Size:

SHOW VARIABLES LIKE '%innodb_buffer_pool_size%';

Recommended: 60-80% of available physical memory

Multiple Instances:

SHOW VARIABLES LIKE '%innodb_buffer_pool_instances%';

From InnoDB 1.0.x, multiple instances reduce contention and improve concurrency.

Memory Hit Rate:

SHOW ENGINE INNODB STATUS;

Look for "Buffer pool hit rate" - target 99%+ for production systems.

5.3 Buffer Pool Memory Recovery

InnoDB uses an LRU (Least Recently Used) algorithm to evict old pages when full.

Basic LRU Algorithm:

  1. Recently accessed pages at list head
  2. New accesses move pages to head
  3. When full, LRU tail page evicts

Improved LRU Algorithm: InnoDB divides the LRU list into young (5/8) and old (3/8) regions:

  • Pages in young region stay cache-hot
  • Old region pages over 1 second old (innodb_old_blocks_time) move to head
  • Full table scans populate old region without affecting young region cache

This prevents large scans from evicting frequently-accessed data.

6. Partition Tables

6.1 What Are Partition Tables

CREATE TABLE `t` (
  `ftime` DATETIME NOT NULL,
  `c` INT(11) DEFAULT NULL,
  KEY (`ftime`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
PARTITION BY RANGE (YEAR(ftime))
(PARTITION p_2017 VALUES LESS THAN (2017) ENGINE = InnoDB,
 PARTITION p_2018 VALUES LESS THAN (2018) ENGINE = InnoDB,
 PARTITION p_2019 VALUES LESS THAN (2019) ENGINE = InnoDB,
 PARTITION p_others VALUES LESS THAN MAXVALUE ENGINE = InnoDB);

Each partition gets its own .ibd file. For the engine layer, these are separate tables; for the Server layer, it's one table.

6.2 Partition Table Limitations

  1. Initial access opens all partitions—exceeds open_files_limit if >1000 partitions
  2. Server layer treats as single table—DDL affects all partitions with shared MDL lock

6.3 Alternatives

  • Manual physical sharding by year
  • Sharding middleware (ShardingSphere)

7. SQL Execution Flow

7.1 Query Execution Process

For: SELECT * FROM T WHERE ID=10;

1. Connection Handler

  • Establishes TCP connection and authenticates user
  • Loads user permissions—subsequent commands use cached permissions
  • Long connections reduce connection overhead but accumulate memory
  • Periodic disconnection or mysql_reset_connection solves memory issues

2. Query Cache

  • Checks if statement was previously executed
  • Invalidated on any table update
  • Removed entirely in MySQL 8.0

3. Parser

  • Lexical analysis identifies keywords, table names, column names
  • Syntax analysis validates SQL syntax

4. Optimizer

  • Identifies possible indexes
  • Calculates full table scan vs. index scan costs
  • Determines optimal execution plan
  • Decides join order for multi-table queries

5. Executor

  • Opens table and checks permissions
  • For unindexed tables: iterates through rows, checking conditions
  • For indexed tables: uses index range scans
  • Accumulates matching rows in result set

Execution Order:

FROM -> ON -> JOIN -> WHERE -> GROUP BY -> HAVING 
-> SELECT -> DISTINCT -> UNION -> ORDER BY -> LIMIT

7.2 Update Statement Execution

Update statements follow query flow plus two critical logs:

Redo Log:

  • WAL (Write-Ahead Logging): writes to log first, disk later
  • Fixed-size circular buffer (default 4 files × 1GB)
  • Write position advances; checkpoint marks erasure point
  • Ensures crash recovery

Binary Log:

  • Records statement logic (not physical changes)
  • Used for replication and point-in-time recovery
  • Three formats: STATEMENT, ROW, MIXED

Two-Phase Commit:

  1. Write to redo log (prepare state)
  2. Write to binlog
  3. Commit redo log (commit state)

This ensures binlog and data consistency after crashes.

7.3 Delete Statement Flow

DELETE only marks records as deletable (not physical removal). Deleted space becomes "holes" reusable by similar-range inserts.

7.4 DDL Statement Execution

Table Rebuild:

ALTER TABLE t ENGINE=InnoDB;

Online DDL Process:

  1. Scan source table pages to temporary file
  2. Log changes during copy (row log)
  3. Apply logged changes to temporary file
  4. Replace original with temporary file

MDL lock downgrades from write to read during copy, allowing concurrent DML.

7.5 ORDER BY Execution

Full-field Sorting:

  1. Initialize sort_buffer with required fields
  2. Fetch matching rows from index
  3. Sort in memory or external merge sort
  4. Return sorted results

Rowid Sorting (when rows are large):

  1. Store only sort key and rowid in sort_buffer
  2. Sort by key
  3. Fetch actual rows using rowid

Index Optimization: Creating composite indexes eliminates sorting:

ALTER TABLE t ADD INDEX idx_city_name(city, name);

7.6 COUNT(*) Execution

MyISAM: Stores row count in metadata

InnoDB: Must scan rows due to MVCC visibility

  • Optimizer selects smallest index to scan
  • Count performance: COUNT(*) ≈ COUNT(1) > COUNT(id) > COUNT(column)

Optimization Options:

  • Use separate counter table with transactional updates
  • Redis caching (with consistency caveats)

7.7 JOIN Execution

Index Nested-Loop Join (NLJ):

  • Drives table by scanning outer table
  • Inner table uses index lookup
  • Best when index exists on join column
  • Complexity: N + N×2×log₂M

Block Nested-Loop Join (BNL):

  • No index available on inner table
  • Loads outer table rows into join_buffer
  • Scans entire inner table for each batch
  • Generally avoid this approach

Best Practices:

  • Use smaller table as driving table
  • Ensure indexes exist on join columns
  • Use BKA (Batched Key Access) optimization

7.8 UNION Execution

(SELECT 1000 AS f) UNION (SELECT id FROM t1 ORDER BY id DESC LIMIT 2);

Uses internal temporary table with unique index for deduplication.

UNION ALL: Skips deduplication, no temporary table needed.

7.9 GROUP BY Execution

Uses temporary table for grouping:

SELECT id%10 AS m, COUNT(*) AS c FROM t1 GROUP BY m;

Optimizations:

  • Add ORDER BY NULL if sorting unnecessary
  • Create indexes matching GROUP BY columns
  • Increase tmp_table_size for memory operations
  • Use SQL_BIG_RESULT for large datasets

7.10 DISTINCT Execution

SELECT DISTINCT a FROM t;
SELECT a FROM t GROUP BY a ORDER BY NULL;

Both create temporary table with unique index—semantically and performance-equivalant.

8. Core Mechanisms

8.1 WAL Mechanism

Write-Ahead Logging ensures durability:

Redo Log:

  • Prevents data loss from crashes
  • Converts random writes to sequential writes
  • Configurable persistence: innodb_flush_log_at_trx_commit
    • 0: Log buffer only
    • 1: Synchronous (safest)
    • 2: Write to OS cache

Binlog:

  • Statement-based replication
  • Row-based replication (recommended)
  • sync_binlog parameter controls persistence

Undo Log:

  • Provides transaction rollback capability
  • Enables MVCC (Multi-Version Concurrency Control)

8.2 Change Buffer

Optimizes secondary index updates:

Write Process:

  1. If page in memory: update directly
  2. If page not in memory: record change in buffer
  3. Log changes to redo log

Read Process:

  • When page accessed, merge changes from buffer
  • Generates correct page version

Merge Triggers:

  • Page accessed
  • Background threads
  • Database shutdown

Best Use Cases: Write-heavy, read-light workloads Avoid When: Updates immediately followed by queries

8.3 Crash-Safe Mechanism

Two-Phase Commit Ensures Consistency:

  1. Prepare phase: write redo log (prepare)
  2. Write binlog
  3. Commit phase: write redo log (commit)

Recovery Rules:

  • If redo log complete → commit
  • If redo log prepare only → check binlog existence → commit or rollback

8.4 Flush Mechanism

When Flush Occurs:

  1. Redo log full (write_pos catches checkpoint)
  2. Buffer pool memory insufficient
  3. System idle
  4. Normal shutdown

Performance Impact Factors:

  • innodb_io_capacity: Disk capability setting
  • innodb_max_dirty_pages_pct: Dirty page threshold (default 75%)
  • innodb_flush_neighbors: SSD设置为0避免连锁flush

8.5 Temporary Tables

Internal Temporary Tables:

  • Created for UNION, UNION ALL, GROUP BY operations
  • Memory or disk-based (tmp_table_size limit)
  • Uses priority queue for small LIMIT sorts

User Temporary Tables:

CREATE TEMPORARY TABLE temp_t (id INT PRIMARY KEY);
  • Session-scoped visibility
  • Auto-cleanup on session end
  • Useful for complex joins and aggregations

8.6 Redo Log Write Mechanism

Three States:

  1. In redo log buffer (process memory)
  2. In page cache (OS memory)
  3. On disk (persistent)

Write Strategies:

  • innodb_flush_log_at_trx_commit controls persistence timing
  • Group commit combines multiple transactions for IO efficiency
  • LSN (Log Sequence Number) tracks write positions

8.7 Binlog Write Mechanism

Process:

  1. Write to binlog cache (per-thread allocation)
  2. Transaction commit: write to page cache, fsync to disk

sync_binlog Settings:

  • 0: Write only, no fsync
  • 1: Full sync on each commit
  • N: Batch sync every N commits

8.8 Auto-Increment Primary Key

Why Auto-Increment?

  • Sequential insertions avoid page splits
  • Indexes remain compact

Storage:

  • MyISAM: In data file
  • InnoDB 5.7 and earlier: In memory only (lost on restart)
  • InnoDB 8.0+: Persisted in redo log

Non-Consecutive Reasons:

  1. Unique key conflicts
  2. Transaction rollbacks
  3. Batch inserts (exponential allocation)

Lock Modes:

  • 0: Statement-level lock (legacy)
  • 1: Mixed (simple INSERT: immediate; bulk: statement-level)
  • 2: All statement-level locks released immediately

Recommendation for Bulk Inserts:

SET GLOBAL innodb_autoinc_lock_mode = 2;
SET GLOBAL binlog_format = 'ROW';

This ensures both performance and data consistency in replication scenarios.

References

  1. MySQL Documentation and Source Code
  2. "MySQL High Performance" - Optimizing Backups, Indexes, and Queries
  3. "High Performance MySQL" - Optimization, Backups, and Replication

Related Articles

Designing Alertmanager Templates for Prometheus Notifications

How to craft Alertmanager templates to format alert messages, improving clarity and presentation. Alertmanager uses Go’s text/template engine with additional helper functions. Alerting rules referenc...

Deploying a Maven Web Application to Tomcat 9 Using the Tomcat Manager

Tomcat 9 does not provide a dedicated Maven plugin. The Tomcat Manager interface, however, is backward-compatible, so the Tomcat 7 Maven Plugin can be used to deploy to Tomcat 9. This guide shows two...

Skipping Errors in MySQL Asynchronous Replication

When a replica halts because the SQL thread encounters an error, you can resume replication by skipping the problematic event(s). Two common approaches are available. Methods to Skip Errors 1) Skip a...

Leave a Comment

Anonymous

◎Feel free to join the discussion and share your thoughts.