MySQL Storage Engines, Index Optimization, and Programmability
Storage Engine Architecture
MySQL supports multiple storage engines, each optimized for specific workload characteristics. To inspect available engines and their support levels:
SHOW ENGINES;
Engine Comparison Matrix
| Capability | InnoDB | MyISAM | Memory |
|---|---|---|---|
| ACID Compliance | Supported | Not Supported | Not Supported |
| Lock Granularity | Row-level locking | Table-level locking | Table-level locking |
| B+Tree Structure | Supported | Supported | Supported |
| Hash Access | Not Supported | Not Supported | Supported |
| Text Search | Supported (v5.6+) | Supported | Not Supported |
| Disk Footprint | High | Low | N/A |
| RAM Utilization | High | Low | Moderate |
| Bulk Insert Rate | Lower | Higher | Higher |
| Referential Integrity | Supported | Not Supported | Not Supported |
Selection Guidelines:
- InnoDB: Default engine for transactional workloads requiring foreign key constraints, crash recovery, and high concurrency with consistent reads. Ideal for OLTP systems with frequent updates and deletes.
- MyISAM: Suitable for read-heavy workloads with rare updates, such as data warehousing or logging scenarios where transaction safety is unnecessary.
- Memory: Stores entire datasets in RAM for microsecond latency access. Useful for temporary lookup tables, session caches, or derived statistics, though data volatility on restart must be considered.
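The engine is chosen per table at creation time and can be changed afterward; a minimal sketch (table and column names are hypothetical):
-- Session store in RAM; contents vanish on server restart
CREATE TABLE session_cache (
session_id CHAR(32) PRIMARY KEY,
payload VARCHAR(255)
) ENGINE=MEMORY;
-- Convert to InnoDB if durability becomes a requirement
ALTER TABLE session_cache ENGINE=InnoDB;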
Indexing Fundamentals
Structural Classifications
| Structure | Implementation Details |
|---|---|
| B+Tree | Balanced tree maintaining sorted order; supports range scans and equality matches across most engines. |
| Hash | Hash table implementation providing O(1) equality lookups; ineffective for range predicates. |
| R-Tree | Specialized for spatial data types (GIS); available in MyISAM, and in InnoDB as of MySQL 5.7. |
| Inverted | Full-text indexes mapping lexical terms to document locations. |
Engine Support Matrix
| Index Type | InnoDB | MyISAM | Memory |
|---|---|---|---|
| B+Tree | Yes | Yes | Yes |
| Hash | No | No | Yes |
| R-Tree | No (Yes in 5.7+) | Yes | No |
| Full-text | Yes (5.6+) | Yes | No |
Logical Index Categories
| Category | Purpose | Cardinality | Keyword |
|---|---|---|---|
| Primary Key | Unique row identifier | One per table | PRIMARY KEY |
| Unique Constraint | Duplicate prevention | Multiple allowed | UNIQUE |
| Secondary | General query acceleration | Multiple allowed | None |
| Full-text | Lexical content search | Multiple allowed | FULLTEXT |
InnoDB Physical Organization
InnoDB employs a clustered index architecture where the primary key determines physical row storage order:
- Clustered Index: Integrates data pages with index leaf nodes; every InnoDB table has exactly one clustered index.
- Secondary Index: Maintains separate index structures where leaf entries contain primary key values for row lookup (bookmark lookup).
Clustered Index Selection Hierarchy:
- Explicit primary key definition
- First non-nullable unique index
- Auto-generated hidden rowid column
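The second rule in the hierarchy can be demonstrated with a hypothetical table that defines no primary key but does carry a non-nullable unique index:
-- With no PRIMARY KEY, the UNIQUE index on the NOT NULL
-- iso_code column becomes the clustered index
CREATE TABLE country_codes (
iso_code CHAR(2) NOT NULL,
country_name VARCHAR(100),
UNIQUE KEY uk_iso (iso_code)
) ENGINE=InnoDB;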
Index Maintenance Syntax
-- Create standard or specialized indexes
CREATE [UNIQUE | FULLTEXT] INDEX idx_name ON tbl_name (col_list);
-- Inspect existing indexes
SHOW INDEX FROM tbl_name;
-- Remove indexes
DROP INDEX idx_name ON tbl_name;
Performance Diagnostics
Server Activity Metrics
Analyze statement execution frequency to identify optimization opportunities:
-- Statement counters; the seven '_' wildcards match ten-character
-- names such as Com_select, Com_insert, Com_update, Com_delete
SHOW GLOBAL STATUS LIKE 'Com_______';
Interpretation focuses on the read-to-write ratio; elevated Com_select values relative to modifications indicate query optimization priorities.
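The relevant counters can also be pulled in a single statement; a sketch using standard status variable names:
SHOW GLOBAL STATUS WHERE Variable_name IN
('Com_select', 'Com_insert', 'Com_update', 'Com_delete');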
Slow Query Identification
Enable logging for statements exceeding execution thresholds:
-- Check current configuration
SELECT @@slow_query_log, @@long_query_time;
Persistent Configuration (my.cnf):
slow_query_log = 1
slow_query_log_file = /var/log/mysql/slow.log
long_query_time = 2
Runtime Activation:
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1.5;
Execution Profiling
Measure granular statement execution phases:
-- Enable profiling
SET profiling = 1;
-- Execute queries, then analyze
SHOW PROFILES;
SHOW PROFILE CPU, BLOCK IO FOR QUERY 42;
Execution Plan Analysis
Prefix queries with EXPLAIN or DESC to analyze optimizer decisions:
EXPLAIN SELECT * FROM
orders o
JOIN items i ON o.order_id = i.order_id
WHERE o.status = 'shipped';
Key Output Columns:
- type: Access method, ordered best to worst (NULL → system → const → eq_ref → ref → range → index → ALL)
- possible_keys: Candidate indexes considered
- key: Index actually selected
- rows: Estimated rows to examine
- filtered: Percentage of rows satisfying query conditions
Index Optimization Techniques
Leftmost Prefix Principle
For composite indexes, query predicates must match the leading column(s) in order.
Demonstration Setup:
CREATE TABLE employee_records (
dept_id INT,
surname VARCHAR(50),
hire_date DATE,
salary DECIMAL(10,2)
);
CREATE INDEX idx_dept_name_date ON employee_records(dept_id, surname, hire_date);
Effective Usage Patterns:
- WHERE dept_id = 5 (uses first column)
- WHERE dept_id = 5 AND surname = 'Smith' (uses first two columns)
- WHERE surname = 'Smith' AND dept_id = 5 (optimizer reordering permitted)
Ineffective Patterns:
- WHERE surname = 'Smith' (missing leading column)
- WHERE dept_id = 5 AND hire_date = '2023-01-01' (gap: surname is skipped, so only dept_id uses the index)
Functional Invalidation
Applying functions to indexed columns prevents index utilization:
-- Inefficient: Function applied to column
EXPLAIN SELECT * FROM employee_records WHERE YEAR(hire_date) = 2023;
-- Efficient: Range comparison on raw column
EXPLAIN SELECT * FROM employee_records
WHERE hire_date BETWEEN '2023-01-01' AND '2023-12-31';
Range Condition Boundaries
Range predicates on a column prevent subsequent columns in composite indexes from being used:
-- Only dept_id uses the index; surname and hire_date do not
EXPLAIN SELECT * FROM employee_records
WHERE dept_id > 10 AND surname = 'Jones';
Wildcard Positioning
Leading wildcards disable index seeks:
-- Index scan (inefficient)
EXPLAIN SELECT * FROM employee_records WHERE surname LIKE '%son';
-- Index seek (efficient)
EXPLAIN SELECT * FROM employee_records WHERE surname LIKE 'John%';
Disjunctive Condition Limitations
OR conditions mixing indexed and non-indexed columns typically result in full table scans:
-- Full scan likely, even if dept_id is indexed
EXPLAIN SELECT * FROM employee_records
WHERE dept_id = 5 OR salary > 50000;
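A common workaround, assuming salary is also indexed, is to split the disjunction into two independently indexable queries:
-- Each branch can use its own index; UNION removes
-- duplicates, preserving the OR semantics
SELECT * FROM employee_records WHERE dept_id = 5
UNION
SELECT * FROM employee_records WHERE salary > 50000;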
Selectivity and Data Distribution
The optimizer may bypass indexes when expected selectivity suggests sequential scans are faster:
CREATE TABLE sample_data (id INT, category CHAR(1));
CREATE INDEX idx_cat ON sample_data(category);
-- If 'A' matches a large fraction of rows (roughly >30%),
-- the optimizer may prefer a full table scan
EXPLAIN SELECT * FROM sample_data WHERE category = 'A';
Index Hinting
Explicitly influence optimizer index selection:
-- Suggest specific index
SELECT * FROM employee_records USE INDEX (idx_dept_name_date) WHERE dept_id = 5;
-- Exclude specific index
SELECT * FROM employee_records IGNORE INDEX (idx_dept_name_date) WHERE dept_id = 5;
-- Mandate specific index
SELECT * FROM employee_records FORCE INDEX (idx_dept_name_date) WHERE dept_id = 5;
Covering Indexes
When every column a query references exists in the index, the engine can answer the query from the index alone, eliminating table lookups:
-- Covering: Only indexed columns retrieved
EXPLAIN SELECT dept_id, surname FROM employee_records WHERE dept_id = 5;
-- Extra: Using index
-- Non-covering: salary is absent from the index, so each match
-- requires a bookmark lookup back to the clustered index
EXPLAIN SELECT dept_id, surname, salary FROM employee_records WHERE dept_id = 5;
Prefix Indexing
For lengthy string columns, index leading characters to reduce B+Tree depth:
-- Calculate optimal prefix length via selectivity
SELECT
COUNT(DISTINCT LEFT(email, 6)) / COUNT(*) AS sel_6,
COUNT(DISTINCT LEFT(email, 8)) / COUNT(*) AS sel_8
FROM contact_list;
-- Create prefix index
CREATE INDEX idx_email_prefix ON contact_list(email(8));
Data Modification Optimization
Batch Insertion Strategies
- Consolidate multiple rows into single INSERT statements (500-1000 rows per statement is typically optimal)
- Wrap sequential inserts in explicit transactions to reduce log flushes
- For massive datasets in MyISAM tables, suspend non-unique index maintenance temporarily (InnoDB ignores DISABLE KEYS):
ALTER TABLE large_table DISABLE KEYS;
-- ... bulk insert operations ...
ALTER TABLE large_table ENABLE KEYS;
Bulk Loading from Files
-- Enable local file loading
SET GLOBAL local_infile = 1;
-- High-performance data import
LOAD DATA LOCAL INFILE '/tmp/dataset.csv'
INTO TABLE target_table
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n';
Primary Key Design
- Utilize auto-incrementing integers to maintain sequential page allocation
- Avoid UUIDs or random strings that cause page splits and fragmentation
- Minimize primary key byte length to reduce secondary index size
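Applied together, these guidelines yield a schema like the following sketch (table and column names are hypothetical):
-- Compact, monotonically increasing key: new rows append to the
-- rightmost page, and secondary index entries stay small
CREATE TABLE events (
event_id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
payload VARCHAR(255),
PRIMARY KEY (event_id)
) ENGINE=InnoDB;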
Ordering and Grouping
Create indexes on columns used in ORDER BY and GROUP BY clauses to avoid filesort operations:
CREATE INDEX idx_age ON personnel(age);
-- Now uses index scan instead of filesort
SELECT * FROM personnel ORDER BY age LIMIT 1000;
Database Views
Views provide virtual table abstraction based on stored queries:
-- Definition
CREATE VIEW active_orders AS
SELECT order_id, customer_id, total
FROM orders
WHERE status IN ('pending', 'processing');
-- Utilization
SELECT * FROM active_orders WHERE total > 1000;
-- Metadata
SHOW CREATE VIEW active_orders;
-- Removal
DROP VIEW active_orders;
-- Modification
CREATE OR REPLACE VIEW active_orders AS
SELECT order_id, customer_id, total, created_at
FROM orders
WHERE status = 'pending';
Integrity Constraints with CHECK OPTION
Prevent view modifications that would exclude rows from the view's result set:
CREATE VIEW high_value_orders AS
SELECT * FROM orders WHERE total > 500
WITH CASCADED CHECK OPTION;
-- Rejected: violates view predicate
INSERT INTO high_value_orders (total) VALUES (300);
CASCADED vs LOCAL:
- CASCADED: Validates against current view and all underlying view predicates
- LOCAL: Validates only current view's predicates
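The difference can be seen with a pair of hypothetical nested views:
CREATE VIEW mid_orders AS
SELECT * FROM orders WHERE total > 100;
CREATE VIEW capped_orders AS
SELECT * FROM mid_orders WHERE total < 1000
WITH LOCAL CHECK OPTION;
-- Accepted under LOCAL (only total < 1000 is checked), but would be
-- rejected under CASCADED for violating mid_orders' total > 100
INSERT INTO capped_orders (total) VALUES (50);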
Stored Routines
Delimiter Management
Change statement delimiter to allow semicolons within procedure bodies:
DELIMITER //
-- procedure definition here
DELIMITER ;
Procedure Definition and Invocation
DELIMITER //
CREATE PROCEDURE calculate_statistics()
BEGIN
DECLARE record_count INT DEFAULT 0;
SELECT COUNT(*) INTO record_count FROM transactions;
SELECT record_count AS total_records;
END //
DELIMITER ;
-- Execution
CALL calculate_statistics();
-- Removal
DROP PROCEDURE calculate_statistics;
Variable Scoping
System Variables:
-- Global vs Session scope
SET GLOBAL max_connections = 500;
SET SESSION sort_buffer_size = 262144;
SELECT @@global.max_connections, @@session.sort_buffer_size;
User-Defined Variables:
SET @threshold = 1000;
SET @result := (SELECT COUNT(*) FROM sales WHERE amount > @threshold);
SELECT @result;
Local Variables:
DELIMITER //
CREATE PROCEDURE compute_factorial(IN n INT)
BEGIN
DECLARE result BIGINT DEFAULT 1;
DECLARE counter INT DEFAULT 1;
WHILE counter <= n DO
SET result = result * counter;
SET counter = counter + 1;
END WHILE;
SELECT result AS factorial;
END //
DELIMITER ;
Parameter Modes
DELIMITER //
CREATE PROCEDURE adjust_inventory(
IN product_id INT, -- Input parameter
OUT previous_stock INT, -- Output parameter
INOUT quantity_change INT -- Bidirectional parameter
)
BEGIN
SELECT stock INTO previous_stock FROM products WHERE id = product_id;
UPDATE products
SET stock = stock + quantity_change
WHERE id = product_id;
SET quantity_change = previous_stock + quantity_change;
END //
DELIMITER ;
-- Invocation
SET @current_qty = -5;
CALL adjust_inventory(101, @old_stock, @current_qty);
SELECT @old_stock, @current_qty;
Control Structures
Conditional Logic:
DELIMITER //
CREATE PROCEDURE classify_performance(IN score DECIMAL(5,2))
BEGIN
IF score >= 90 THEN
SELECT 'Exceptional' AS rating;
ELSEIF score >= 75 THEN
SELECT 'Proficient' AS rating;
ELSEIF score >= 60 THEN
SELECT 'Developing' AS rating;
ELSE
SELECT 'Insufficient' AS rating;
END IF;
END //
DELIMITER ;
CASE Expressions:
DELIMITER //
CREATE PROCEDURE get_season(IN month_num INT)
BEGIN
-- Searched CASE: MySQL's simple CASE statement does not
-- accept comma-separated WHEN value lists
CASE
WHEN month_num IN (12, 1, 2) THEN SELECT 'Winter' AS season;
WHEN month_num IN (3, 4, 5) THEN SELECT 'Spring' AS season;
WHEN month_num IN (6, 7, 8) THEN SELECT 'Summer' AS season;
WHEN month_num IN (9, 10, 11) THEN SELECT 'Autumn' AS season;
ELSE SELECT 'Invalid' AS season;
END CASE;
END //
DELIMITER ;
Iteration Constructs
WHILE Loop:
DELIMITER //
CREATE PROCEDURE generate_series(IN max_val INT)
BEGIN
DECLARE current_val INT DEFAULT 1;
CREATE TEMPORARY TABLE IF NOT EXISTS number_series (val INT);
TRUNCATE TABLE number_series;
WHILE current_val <= max_val DO
INSERT INTO number_series VALUES (current_val);
SET current_val = current_val + 1;
END WHILE;
SELECT * FROM number_series;
END //
DELIMITER ;
REPEAT Loop:
DELIMITER //
CREATE PROCEDURE sum_even_numbers(IN limit_val INT)
BEGIN
DECLARE sum_val INT DEFAULT 0;
DECLARE current INT DEFAULT 2;
REPEAT
SET sum_val = sum_val + current;
SET current = current + 2;
UNTIL current > limit_val
END REPEAT;
SELECT sum_val AS even_sum;
END //
DELIMITER ;
LOOP with Control:
DELIMITER //
CREATE PROCEDURE process_cursor_example()
BEGIN
DECLARE done BOOLEAN DEFAULT FALSE;
DECLARE emp_name VARCHAR(100);
DECLARE emp_salary DECIMAL(10,2);
DECLARE cur CURSOR FOR SELECT name, salary FROM employees WHERE department = 'Sales';
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;
CREATE TEMPORARY TABLE IF NOT EXISTS high_earners (name VARCHAR(100), salary DECIMAL(10,2));
OPEN cur;
read_loop: LOOP
FETCH cur INTO emp_name, emp_salary;
IF done THEN
LEAVE read_loop;
END IF;
IF emp_salary > 50000 THEN
INSERT INTO high_earners VALUES (emp_name, emp_salary);
END IF;
END LOOP read_loop;
CLOSE cur;
SELECT * FROM high_earners;
DROP TEMPORARY TABLE high_earners;
END //
DELIMITER ;
Error Handling
DELIMITER //
CREATE PROCEDURE safe_division(IN numerator DECIMAL(10,2), IN denominator DECIMAL(10,2), OUT result DECIMAL(10,2))
BEGIN
DECLARE EXIT HANDLER FOR SQLEXCEPTION
BEGIN
SET result = NULL;
SELECT 'Calculation error occurred' AS message;
END;
-- Note: '22012' is raised only with ERROR_FOR_DIVISION_BY_ZERO under
-- strict SQL mode; by default MySQL returns NULL with a warning
DECLARE EXIT HANDLER FOR SQLSTATE '22012'
BEGIN
SET result = 0;
SELECT 'Division by zero handled' AS message;
END;
SET result = numerator / denominator;
END //
DELIMITER ;
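Handlers catch errors; the complementary mechanism is raising them. A sketch using SIGNAL (available since MySQL 5.5; the accounts table and its columns are hypothetical):
DELIMITER //
CREATE PROCEDURE withdraw_funds(IN account_id INT, IN amount DECIMAL(10,2))
BEGIN
DECLARE balance DECIMAL(10,2);
SELECT funds INTO balance FROM accounts WHERE id = account_id;
IF balance IS NULL OR balance < amount THEN
-- Custom error; a caller's EXIT HANDLER FOR SQLEXCEPTION would catch it
SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'Insufficient funds';
END IF;
UPDATE accounts SET funds = funds - amount WHERE id = account_id;
END //
DELIMITER ;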