MySQL Storage Engines, Index Optimization, and Programmability
Storage Engine Architecture
MySQL supports multiple storage engines, each optimized for specific workload characteristics. To inspect available engines and their support levels:
SHOW ENGINES;
Engine Comparison Matrix
| Capability | InnoDB | MyISAM | Memory |
|---|---|---|---|
| ACID Compliance | Supported | Not Supported | Not Supported |
| Lock Granularity | Row-level locking | Table-level locking | Table-level locking |
| B+Tree Structure | Supported | Supported | Supported |
| Hash Access | Not Supported | Not Supported | Supported |
| Text Search | Supported (v5.6+) | Supported | Not Supported |
| Disk Footprint | High | Low | N/A |
| RAM Utilization | High | Low | Moderate |
| Bulk Insert Rate | Lower | Higher | Higher |
| Referential Integrity | Supported | Not Supported | Not Supported |
Selection Guidelines:
- InnoDB: Default engine for transactional workloads requiring foreign key constraints, crash recovery, and high concurrency with consistent reads. Ideal for OLTP systems with frequent updates and deletes.
- MyISAM: Suitable for read-heavy workloads with rare updates, such as data warehousing or logging scenarios where transaction safety is unnecessary.
- Memory: Stores entire datasets in RAM for microsecond latency access. Useful for temporary lookup tables, session caches, or derived statistics, though data volatility on restart must be considered.
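The engine is chosen per table at creation time and can be changed afterward; a minimal sketch (table and column names are hypothetical):
-- Session store in RAM; contents vanish on server restart
CREATE TABLE session_cache (
session_id CHAR(32) PRIMARY KEY,
payload VARCHAR(255)
) ENGINE=MEMORY;
-- Convert to InnoDB if durability becomes a requirement
ALTER TABLE session_cache ENGINE=InnoDB;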
Indexing Fundamentals
Structural Classifications
| Structure | Implementation Details |
|---|---|
| B+Tree | Balanced tree maintaining sorted order; supports range scans and equality matches across most engines. |
| Hash | Hash table implementation providing O(1) equality lookups; ineffective for range predicates. |
| R-Tree | Specialized for spatial data types (GIS); available in MyISAM, and in InnoDB as of MySQL 5.7. |
| Inverted | Full-text indexes mapping lexical terms to document locations. |
Engine Support Matrix
| Index Type | InnoDB | MyISAM | Memory |
|---|---|---|---|
| B+Tree | Yes | Yes | Yes |
| Hash | No | No | Yes |
| R-Tree | No (Yes in 5.7+) | Yes | No |
| Full-text | Yes (5.6+) | Yes | No |
Logical Index Categories
| Category | Purpose | Cardinality | Keyword |
|---|---|---|---|
| Primary Key | Unique row identifier | One per table | PRIMARY KEY |
| Unique Constraint | Duplicate prevention | Multiple allowed | UNIQUE |
| Secondary | General query acceleration | Multiple allowed | None |
| Full-text | Lexical content search | Multiple allowed | FULLTEXT |
InnoDB Physical Organization
InnoDB employs a clustered index architecture where the primary key determines physical row storage order:
- Clustered Index: Integrates data pages with index leaf nodes; every InnoDB table has exactly one clustered index.
- Secondary Index: Maintains separate index structures where leaf entries contain primary key values for row lookup (bookmark lookup).
Clustered Index Selection Hierarchy:
- Explicit primary key definition
- First non-nullable unique index
- Auto-generated hidden rowid column
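The second rule in the hierarchy can be demonstrated with a hypothetical table that defines no primary key but does carry a non-nullable unique index:
-- With no PRIMARY KEY, the UNIQUE index on the NOT NULL
-- iso_code column becomes the clustered index
CREATE TABLE country_codes (
iso_code CHAR(2) NOT NULL,
country_name VARCHAR(100),
UNIQUE KEY uk_iso (iso_code)
) ENGINE=InnoDB;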
Index Maintenance Syntax
-- Create standard or specialized indexes
CREATE [UNIQUE | FULLTEXT] INDEX idx_name ON tbl_name (col_list);
-- Inspect existing indexes
SHOW INDEX FROM tbl_name;
-- Remove indexes
DROP INDEX idx_name ON tbl_name;
Performance Diagnostics
Server Activity Metrics
Analyze statement execution frequency to identify optimization opportunities:
-- Statement counters; the seven '_' wildcards match ten-character
-- names such as Com_select, Com_insert, Com_update, Com_delete
SHOW GLOBAL STATUS LIKE 'Com_______';
Interpretation focuses on the read-to-write ratio; elevated Com_select values relative to modifications indicate query optimization priorities.
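The relevant counters can also be pulled in a single statement; a sketch using standard status variable names:
SHOW GLOBAL STATUS WHERE Variable_name IN
('Com_select', 'Com_insert', 'Com_update', 'Com_delete');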
Slow Query Identification
Enable logging for statements exceeding execution thresholds:
-- Check current configuration
SELECT @@slow_query_log, @@long_query_time;
Persistent Configuration (my.cnf):
slow_query_log = 1
slow_query_log_file = /var/log/mysql/slow.log
long_query_time = 2
Runtime Activation:
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1.5;
Execution Profiling
Measure granular statement execution phases:
-- Enable profiling
SET profiling = 1;
-- Execute queries, then analyze
SHOW PROFILES;
SHOW PROFILE CPU, BLOCK IO FOR QUERY 42;
Execution Plan Analysis
Prefix queries with EXPLAIN or DESC to analyze optimizer decisions:
EXPLAIN SELECT * FROM
orders o
JOIN items i ON o.order_id = i.order_id
WHERE o.status = 'shipped';
Key Output Columns:
- type: Access method, ordered best to worst (NULL → system → const → eq_ref → ref → range → index → ALL)
- possible_keys: Candidate indexes considered
- key: Index actually selected
- rows: Estimated rows to examine
- filtered: Percentage of rows satisfying query conditions
Index Optimization Techniques
Leftmost Prefix Principle
For composite indexes, query predicates must match the leading column(s) in order.
Demonstration Setup:
CREATE TABLE employee_records (
dept_id INT,
surname VARCHAR(50),
hire_date DATE,
salary DECIMAL(10,2)
);
CREATE INDEX idx_dept_name_date ON employee_records(dept_id, surname, hire_date);
Effective Usage Patterns:
- WHERE dept_id = 5 (uses first column)
- WHERE dept_id = 5 AND surname = 'Smith' (uses first two columns)
- WHERE surname = 'Smith' AND dept_id = 5 (optimizer reordering permitted)
Ineffective Patterns:
- WHERE surname = 'Smith' (missing leading column)
- WHERE dept_id = 5 AND hire_date = '2023-01-01' (gap: surname is skipped, so only dept_id uses the index)
Functional Invalidation
Applying functions to indexed columns prevents index utilization:
-- Inefficient: Function applied to column
EXPLAIN SELECT * FROM employee_records WHERE YEAR(hire_date) = 2023;
-- Efficient: Range comparison on raw column
EXPLAIN SELECT * FROM employee_records
WHERE hire_date BETWEEN '2023-01-01' AND '2023-12-31';
Range Condition Boundaries
Range predicates on a column prevent subsequent columns in composite indexes from being used:
-- Only dept_id uses the index; surname and hire_date do not
EXPLAIN SELECT * FROM employee_records
WHERE dept_id > 10 AND surname = 'Jones';
Wildcard Positioning
Leading wildcards disable index seeks:
-- Index scan (inefficient)
EXPLAIN SELECT * FROM employee_records WHERE surname LIKE '%son';
-- Index seek (efficient)
EXPLAIN SELECT * FROM employee_records WHERE surname LIKE 'John%';
Disjunctive Condition Limitations
OR conditions mixing indexed and non-indexed columns typically result in full table scans:
-- Full scan likely, even if dept_id is indexed
EXPLAIN SELECT * FROM employee_records
WHERE dept_id = 5 OR salary > 50000;
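A common workaround, assuming salary is also indexed, is to split the disjunction into two independently indexable queries:
-- Each branch can use its own index; UNION removes
-- duplicates, preserving the OR semantics
SELECT * FROM employee_records WHERE dept_id = 5
UNION
SELECT * FROM employee_records WHERE salary > 50000;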
Selectivity and Data Distribution
The optimizer may bypass indexes when expected selectivity suggests sequential scans are faster:
CREATE TABLE sample_data (id INT, category CHAR(1));
CREATE INDEX idx_cat ON sample_data(category);
-- If 'A' matches a large fraction of rows (roughly >30%),
-- the optimizer may prefer a full table scan
EXPLAIN SELECT * FROM sample_data WHERE category = 'A';
Index Hinting
Explicitly influence optimizer index selection:
-- Suggest specific index
SELECT * FROM employee_records USE INDEX (idx_dept_name_date) WHERE dept_id = 5;
-- Exclude specific index
SELECT * FROM employee_records IGNORE INDEX (idx_dept_name_date) WHERE dept_id = 5;
-- Mandate specific index
SELECT * FROM employee_records FORCE INDEX (idx_dept_name_date) WHERE dept_id = 5;
Covering Indexes
When every column a query references exists in the index, the engine can answer the query from the index alone, eliminating table lookups:
-- Covering: Only indexed columns retrieved
EXPLAIN SELECT dept_id, surname FROM employee_records WHERE dept_id = 5;
-- Extra: Using index
-- Non-covering: salary is absent from the index, so each match
-- requires a bookmark lookup back to the clustered index
EXPLAIN SELECT dept_id, surname, salary FROM employee_records WHERE dept_id = 5;
Prefix Indexing
For lengthy string columns, index leading characters to reduce B+Tree depth:
-- Calculate optimal prefix length via selectivity
SELECT
COUNT(DISTINCT LEFT(email, 6)) / COUNT(*) AS sel_6,
COUNT(DISTINCT LEFT(email, 8)) / COUNT(*) AS sel_8
FROM contact_list;
-- Create prefix index
CREATE INDEX idx_email_prefix ON contact_list(email(8));
Data Modification Optimization
Batch Insertion Strategies
- Consolidate multiple rows into single INSERT statements (500-1000 rows per statement is typically optimal)
- Wrap sequential inserts in explicit transactions to reduce log flushes
- For massive datasets in MyISAM tables, suspend non-unique index maintenance temporarily (InnoDB ignores DISABLE KEYS):
ALTER TABLE large_table DISABLE KEYS;
-- ... bulk insert operations ...
ALTER TABLE large_table ENABLE KEYS;
Bulk Loading from Files
-- Enable local file loading
SET GLOBAL local_infile = 1;
-- High-performance data import
LOAD DATA LOCAL INFILE '/tmp/dataset.csv'
INTO TABLE target_table
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n';
Primary Key Design
- Utilize auto-incrementing integers to maintain sequential page allocation
- Avoid UUIDs or random strings that cause page splits and fragmentation
- Minimize primary key byte length to reduce secondary index size
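Applied together, these guidelines yield a schema like the following sketch (table and column names are hypothetical):
-- Compact, monotonically increasing key: new rows append to the
-- rightmost page, and secondary index entries stay small
CREATE TABLE events (
event_id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
payload VARCHAR(255),
PRIMARY KEY (event_id)
) ENGINE=InnoDB;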
Ordering and Grouping
Create indexes on columns used in ORDER BY and GROUP BY clauses to avoid filesort operations:
CREATE INDEX idx_age ON personnel(age);
-- Now uses index scan instead of filesort
SELECT * FROM personnel ORDER BY age LIMIT 1000;
Database Views
Views provide virtual table abstraction based on stored queries:
-- Definition
CREATE VIEW active_orders AS
SELECT order_id, customer_id, total
FROM orders
WHERE status IN ('pending', 'processing');
-- Utilization
SELECT * FROM active_orders WHERE total > 1000;
-- Metadata
SHOW CREATE VIEW active_orders;
-- Removal
DROP VIEW active_orders;
-- Modification
CREATE OR REPLACE VIEW active_orders AS
SELECT order_id, customer_id, total, created_at
FROM orders
WHERE status = 'pending';
Integrity Constraints with CHECK OPTION
Prevent view modifications that would exclude rows from the view's result set:
CREATE VIEW high_value_orders AS
SELECT * FROM orders WHERE total > 500
WITH CASCADED CHECK OPTION;
-- Rejected: violates view predicate
INSERT INTO high_value_orders (total) VALUES (300);
CASCADED vs LOCAL:
- CASCADED: Validates against current view and all underlying view predicates
- LOCAL: Validates only current view's predicates
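The difference can be seen with a pair of hypothetical nested views:
CREATE VIEW mid_orders AS
SELECT * FROM orders WHERE total > 100;
CREATE VIEW capped_orders AS
SELECT * FROM mid_orders WHERE total < 1000
WITH LOCAL CHECK OPTION;
-- Accepted under LOCAL (only total < 1000 is checked), but would be
-- rejected under CASCADED for violating mid_orders' total > 100
INSERT INTO capped_orders (total) VALUES (50);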
Stored Routines
Delimiter Management
Change statement delimiter to allow semicolons within procedure bodies:
DELIMITER //
-- procedure definition here
DELIMITER ;
Procedure Definition and Invocation
DELIMITER //
CREATE PROCEDURE calculate_statistics()
BEGIN
DECLARE record_count INT DEFAULT 0;
SELECT COUNT(*) INTO record_count FROM transactions;
SELECT record_count AS total_records;
END //
DELIMITER ;
-- Execution
CALL calculate_statistics();
-- Removal
DROP PROCEDURE calculate_statistics;
Variable Scoping
System Variables:
-- Global vs Session scope
SET GLOBAL max_connections = 500;
SET SESSION sort_buffer_size = 262144;
SELECT @@global.max_connections, @@session.sort_buffer_size;
User-Defined Variables:
SET @threshold = 1000;
SET @result := (SELECT COUNT(*) FROM sales WHERE amount > @threshold);
SELECT @result;
Local Variables:
DELIMITER //
CREATE PROCEDURE compute_factorial(IN n INT)
BEGIN
DECLARE result BIGINT DEFAULT 1;
DECLARE counter INT DEFAULT 1;
WHILE counter <= n DO
SET result = result * counter;
SET counter = counter + 1;
END WHILE;
SELECT result AS factorial;
END //
DELIMITER ;
Parameter Modes
DELIMITER //
CREATE PROCEDURE adjust_inventory(
IN product_id INT, -- Input parameter
OUT previous_stock INT, -- Output parameter
INOUT quantity_change INT -- Bidirectional parameter
)
BEGIN
SELECT stock INTO previous_stock FROM products WHERE id = product_id;
UPDATE products
SET stock = stock + quantity_change
WHERE id = product_id;
SET quantity_change = previous_stock + quantity_change;
END //
DELIMITER ;
-- Invocation
SET @current_qty = -5;
CALL adjust_inventory(101, @old_stock, @current_qty);
SELECT @old_stock, @current_qty;
Control Structures
Conditional Logic:
DELIMITER //
CREATE PROCEDURE classify_performance(IN score DECIMAL(5,2))
BEGIN
IF score >= 90 THEN
SELECT 'Exceptional' AS rating;
ELSEIF score >= 75 THEN
SELECT 'Proficient' AS rating;
ELSEIF score >= 60 THEN
SELECT 'Developing' AS rating;
ELSE
SELECT 'Insufficient' AS rating;
END IF;
END //
DELIMITER ;
CASE Expressions:
DELIMITER //
CREATE PROCEDURE get_season(IN month_num INT)
BEGIN
-- Searched CASE: MySQL's simple CASE statement does not
-- accept comma-separated WHEN value lists
CASE
WHEN month_num IN (12, 1, 2) THEN SELECT 'Winter' AS season;
WHEN month_num IN (3, 4, 5) THEN SELECT 'Spring' AS season;
WHEN month_num IN (6, 7, 8) THEN SELECT 'Summer' AS season;
WHEN month_num IN (9, 10, 11) THEN SELECT 'Autumn' AS season;
ELSE SELECT 'Invalid' AS season;
END CASE;
END //
DELIMITER ;
Iteration Constructs
WHILE Loop:
DELIMITER //
CREATE PROCEDURE generate_series(IN max_val INT)
BEGIN
DECLARE current_val INT DEFAULT 1;
CREATE TEMPORARY TABLE IF NOT EXISTS number_series (val INT);
TRUNCATE TABLE number_series;
WHILE current_val <= max_val DO
INSERT INTO number_series VALUES (current_val);
SET current_val = current_val + 1;
END WHILE;
SELECT * FROM number_series;
END //
DELIMITER ;
REPEAT Loop:
DELIMITER //
CREATE PROCEDURE sum_even_numbers(IN limit_val INT)
BEGIN
DECLARE sum_val INT DEFAULT 0;
DECLARE current INT DEFAULT 2;
REPEAT
SET sum_val = sum_val + current;
SET current = current + 2;
UNTIL current > limit_val
END REPEAT;
SELECT sum_val AS even_sum;
END //
DELIMITER ;
LOOP with Control:
DELIMITER //
CREATE PROCEDURE process_cursor_example()
BEGIN
DECLARE done BOOLEAN DEFAULT FALSE;
DECLARE emp_name VARCHAR(100);
DECLARE emp_salary DECIMAL(10,2);
DECLARE cur CURSOR FOR SELECT name, salary FROM employees WHERE department = 'Sales';
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;
CREATE TEMPORARY TABLE IF NOT EXISTS high_earners (name VARCHAR(100), salary DECIMAL(10,2));
OPEN cur;
read_loop: LOOP
FETCH cur INTO emp_name, emp_salary;
IF done THEN
LEAVE read_loop;
END IF;
IF emp_salary > 50000 THEN
INSERT INTO high_earners VALUES (emp_name, emp_salary);
END IF;
END LOOP read_loop;
CLOSE cur;
SELECT * FROM high_earners;
DROP TEMPORARY TABLE high_earners;
END //
DELIMITER ;
Error Handling
DELIMITER //
CREATE PROCEDURE safe_division(IN numerator DECIMAL(10,2), IN denominator DECIMAL(10,2), OUT result DECIMAL(10,2))
BEGIN
DECLARE EXIT HANDLER FOR SQLEXCEPTION
BEGIN
SET result = NULL;
SELECT 'Calculation error occurred' AS message;
END;
-- Note: '22012' is raised only with ERROR_FOR_DIVISION_BY_ZERO under
-- strict SQL mode; by default MySQL returns NULL with a warning
DECLARE EXIT HANDLER FOR SQLSTATE '22012'
BEGIN
SET result = 0;
SELECT 'Division by zero handled' AS message;
END;
SET result = numerator / denominator;
END //
DELIMITER ;
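Handlers catch errors; the complementary mechanism is raising them. A sketch using SIGNAL (available since MySQL 5.5; the accounts table and its columns are hypothetical):
DELIMITER //
CREATE PROCEDURE withdraw_funds(IN account_id INT, IN amount DECIMAL(10,2))
BEGIN
DECLARE balance DECIMAL(10,2);
SELECT funds INTO balance FROM accounts WHERE id = account_id;
IF balance IS NULL OR balance < amount THEN
-- Custom error; a caller's EXIT HANDLER FOR SQLEXCEPTION would catch it
SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'Insufficient funds';
END IF;
UPDATE accounts SET funds = funds - amount WHERE id = account_id;
END //
DELIMITER ;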