MySQL Index Types and Implementation Details
In MySQL, an index (also referred to as a key) is a data structure that storage engines use to locate rows rapidly. By functioning similarly to a book's table of contents, indexes prevent the database from scanning the entire table to find relevant data, thereby significantly improving query performance.
Classification of Indexes
MySQL supports various index types to accommodate different data access patterns and integrity requirements.
1. Primary Key Index
A primary key index uniquely identifies each record, and its columns cannot contain NULL. A table is restricted to a single primary key. An AUTO_INCREMENT column must be indexed and is usually the primary key, but a primary key does not have to be auto-incrementing. It is best practice to define the primary key on a column with no business meaning (a surrogate key) using a numeric data type for optimal performance.
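As a minimal sketch of the surrogate-key practice described above, the table below uses a numeric auto-incrementing column as its primary key. The table and column names are hypothetical and chosen only for illustration.

```sql
-- Hypothetical table: names are illustrative, not taken from a real schema.
CREATE TABLE customers (
    customer_id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT, -- surrogate key with no business meaning
    email       VARCHAR(255) NOT NULL,
    full_name   VARCHAR(100) NOT NULL,
    PRIMARY KEY (customer_id)                            -- only one primary key per table
) ENGINE=InnoDB;
```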
2. Standard Index (INDEX or KEY)
Standard indexes, also known as non-unique indexes, are the most common type. They are typically added after table creation to improve retrieval speed.
- Single-Column Index: An index created on a specific column.
```
CREATE INDEX idx_user_email ON employees(email);
```
- Composite Index: An index covering multiple columns. This type follows the "Leftmost Prefix" rule.
```
CREATE INDEX idx_user_profile ON employees(last_name, department_id);
```
For the index above:
- A query filtering by `last_name` will utilize the index.
- A query filtering only by `department_id` will likely bypass the index.
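One way to check the leftmost-prefix behavior is to run `EXPLAIN` on each query shape. The sketch below assumes the `employees` table and the `idx_user_profile` index from the example above; the literal values are made up, and the plan actually chosen depends on table statistics and the MySQL version.

```sql
-- Leading column only: idx_user_profile is a candidate.
EXPLAIN SELECT * FROM employees WHERE last_name = 'Smith';

-- Both columns: the full composite index can be used.
EXPLAIN SELECT * FROM employees
WHERE last_name = 'Smith' AND department_id = 10;

-- Second column only: the leftmost prefix is missing,
-- so an index lookup on idx_user_profile is unlikely.
EXPLAIN SELECT * FROM employees WHERE department_id = 10;
```

In the `EXPLAIN` output, the `key` column names the index the optimizer chose; NULL there means no index was used.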
3. Unique Index
Unique indexes ensure that all values in the indexed column are distinct. They resemble a primary key but differ in two respects: a unique index on a nullable column permits NULL values (and more than one, since NULLs are not treated as equal to each other), whereas primary key columns cannot contain NULL; and a table can have multiple unique indexes but only one primary key.
```
ALTER TABLE products ADD UNIQUE INDEX idx_sku (product_sku);
```
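The NULL handling of a unique index can be seen with a small experiment. This is a sketch using a hypothetical `sku_demo` table, created here so the statements are self-contained.

```sql
-- Hypothetical demo table with a nullable, uniquely indexed column.
CREATE TABLE sku_demo (
    id          INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    product_sku VARCHAR(32) NULL,
    UNIQUE INDEX idx_sku (product_sku)
) ENGINE=InnoDB;

INSERT INTO sku_demo (product_sku) VALUES ('A-100'); -- succeeds
INSERT INTO sku_demo (product_sku) VALUES (NULL);    -- succeeds
INSERT INTO sku_demo (product_sku) VALUES (NULL);    -- also succeeds: NULLs do not collide
INSERT INTO sku_demo (product_sku) VALUES ('A-100'); -- rejected with a duplicate-entry error
```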
4. Full-Text Index
Designed for text-based searches, full-text indexes were historically limited to the MyISAM engine but are now supported by InnoDB (MySQL 5.6+). They enable efficient querying of text columns using natural language search patterns.
```
CREATE TABLE articles (
    id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    title VARCHAR(200),
    body TEXT,
    FULLTEXT INDEX ft_search (title, body)
) ENGINE=InnoDB;
```
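A full-text index is queried with `MATCH ... AGAINST` rather than `LIKE`. The queries below are a sketch against the `articles` table defined above; the search terms are invented for illustration. The column list in `MATCH()` has to match the column list of the FULLTEXT index.

```sql
-- Natural language search, ordered by relevance score.
SELECT id, title,
       MATCH(title, body) AGAINST('database indexing' IN NATURAL LANGUAGE MODE) AS relevance
FROM articles
WHERE MATCH(title, body) AGAINST('database indexing' IN NATURAL LANGUAGE MODE)
ORDER BY relevance DESC;

-- Boolean mode: require "mysql", exclude "oracle".
SELECT id, title
FROM articles
WHERE MATCH(title, body) AGAINST('+mysql -oracle' IN BOOLEAN MODE);
```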
Index Data Structures
The underlying storage structure of an index dictates its behavior and efficiency. MySQL generally supports B-Tree, Hash, R-Tree, and Full-Text structures.
1. B-Tree Indexes
B-Tree (and its variant B+Tree) is the default index type for most MySQL storage engines.
- MyISAM Implementation: The index file stores pointers (disk addresses) to the actual data records. The leaf nodes contain the data address.
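- InnoDB Implementation: InnoDB uses a clustered index structure. For the primary key, the leaf nodes contain the actual data record. Secondary indexes store the primary key value at the leaf nodes, so a lookup through a secondary index first finds the primary key and then fetches the row from the clustered index.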
B+Tree vs. B-Tree:
- Non-leaf nodes in B+Tree only store key values, while leaf nodes store keys and data (or pointers).
- B+Tree leaf nodes are linked via a doubly-linked list, optimizing range scans.
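Two consequences of this layout can be observed with `EXPLAIN`. The sketch below assumes the `employees` table and `idx_user_profile (last_name, department_id)` index from earlier, plus a hypothetical primary key column named `employee_id`; the plans shown by a real server depend on data and version.

```sql
-- Covering query: every selected column lives in the secondary index
-- (InnoDB keeps the primary key in secondary index leaves), so no
-- second lookup in the clustered index is needed.
-- The Extra column typically reports "Using index" in this case.
EXPLAIN SELECT employee_id, last_name, department_id
FROM employees
WHERE last_name = 'Smith';

-- Range scan: the linked leaf nodes let the engine find the first
-- matching key and then walk the leaves in order.
EXPLAIN SELECT employee_id, last_name
FROM employees
WHERE last_name BETWEEN 'A' AND 'C';
```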
2. Hash Indexes
Hash indexes are based on a hash table. They are primarily supported by the MEMORY engine.
- Pros: Extremely fast for equality comparisons (`=`, `IN`).
- Cons: They cannot be used for range queries or sorting. They are also susceptible to hash collisions.
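The trade-off can be sketched with a MEMORY table that declares one index of each kind. The table and column names are hypothetical.

```sql
-- Hypothetical in-memory lookup table.
CREATE TABLE session_cache (
    session_id CHAR(32)     NOT NULL,
    user_id    INT UNSIGNED NOT NULL,
    PRIMARY KEY (session_id) USING HASH,   -- equality lookups only
    INDEX idx_user (user_id) USING BTREE   -- also supports ranges and ordering
) ENGINE=MEMORY;

-- Equality comparison: a good fit for the hash index.
SELECT user_id FROM session_cache
WHERE session_id = '0123456789abcdef0123456789abcdef';

-- Range condition: the hash index cannot help, but the B-Tree index can.
SELECT session_id FROM session_cache
WHERE user_id BETWEEN 100 AND 200
ORDER BY user_id;
```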
3. R-Tree Indexes
R-Tree indexes are used for spatial data types (such as GEOMETRY). Support is limited to specific engines: MyISAM and, since MySQL 5.7, InnoDB.
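A spatial index is declared with the `SPATIAL` keyword on a NOT NULL geometry column. The example below is a sketch with a hypothetical `places` table; it assumes MySQL 8.0, where the `SRID` column attribute is available, and uses SRID 0 (a flat Cartesian plane) to keep the coordinates simple.

```sql
-- Hypothetical table; SRID attribute assumes MySQL 8.0 or later.
CREATE TABLE places (
    id       INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    name     VARCHAR(100) NOT NULL,
    location POINT NOT NULL SRID 0,        -- spatial index columns must be NOT NULL
    SPATIAL INDEX sp_location (location)
) ENGINE=InnoDB;

INSERT INTO places (name, location)
VALUES ('Cafe', ST_GeomFromText('POINT(3 4)'));

-- Bounding-box search: the kind of predicate an R-Tree index serves.
SELECT name
FROM places
WHERE MBRContains(
          ST_GeomFromText('POLYGON((0 0, 10 0, 10 10, 0 10, 0 0))'),
          location);
```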
Common Causes of Index Invalidation
Indexes are not always used by the query optimizer. Several scenarios can cause an index to be ignored, resulting in a full table scan:
- Improper `OR` Usage: If one column in an `OR` condition lacks an index, the optimizer may ignore all indexes for that query.
- Composite Index Violation: Failing to use the leftmost column of a composite index prevents the index from being utilized.
- Leading Wildcard: Queries using `LIKE '%abc'` cannot utilize a B-Tree index, whereas `LIKE 'abc%'` can.
- Implicit Type Conversion: If an indexed column is a string type (e.g., VARCHAR) and the query supplies a numeric value without quotes, the index may be bypassed.
- Functions on Columns: Applying a function to an indexed column (e.g., `WHERE YEAR(created_at) = 2023`) invalidates the index.
- Low Selectivity: If an index column has very low cardinality (e.g., a "Gender" column with only 'M' and 'F'), the optimizer may calculate that a full table scan is faster than traversing the index.
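Several of these cases have a straightforward rewrite. The pairs below are a sketch: the `orders` table, its indexed `created_at` column, and the indexed VARCHAR column `phone` on `employees` are assumptions made for illustration, and whether the optimizer actually picks the index should be confirmed with `EXPLAIN`.

```sql
-- Function on column: compare the bare column against a range instead.
SELECT * FROM orders WHERE YEAR(created_at) = 2023;              -- index on created_at not usable
SELECT * FROM orders
WHERE created_at >= '2023-01-01' AND created_at < '2024-01-01';  -- range scan possible

-- Implicit type conversion: quote the value so the types match.
SELECT * FROM employees WHERE phone = 5551234;     -- string column compared to a number
SELECT * FROM employees WHERE phone = '5551234';   -- index on phone usable

-- Leading wildcard: anchor the pattern at the start where possible.
SELECT * FROM employees WHERE last_name LIKE '%son';   -- cannot seek in the B-Tree
SELECT * FROM employees WHERE last_name LIKE 'John%';  -- prefix range scan possible

-- OR across two columns: one alternative is a UNION of two indexed lookups.
SELECT * FROM employees WHERE email = 'a@example.com'
UNION
SELECT * FROM employees WHERE phone = '5551234';
```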
Locking Implications
In InnoDB, row-level locking is implemented through index records. If a locking statement cannot use an index (e.g., due to the invalidation reasons above), InnoDB has to scan the clustered index and lock every record it examines, which behaves much like a table lock and can severely impact concurrency.
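The difference can be inspected on a test server by looking at the locks a transaction holds. This sketch assumes MySQL 8.0 (for the `performance_schema.data_locks` table), the `employees` table with `idx_user_email` from earlier, and a hypothetical unindexed column `nickname`.

```sql
-- Indexed predicate: locks are limited to the matching index records.
START TRANSACTION;
UPDATE employees SET department_id = 20 WHERE email = 'a@example.com';
SELECT object_name, index_name, lock_type, lock_mode, lock_data
FROM performance_schema.data_locks;
ROLLBACK;

-- Unindexed predicate: expect record locks across the scanned rows.
START TRANSACTION;
UPDATE employees SET department_id = 20 WHERE nickname = 'Al';
SELECT object_name, index_name, lock_type, lock_mode, lock_data
FROM performance_schema.data_locks;
ROLLBACK;
```

Comparing the number of `RECORD` rows returned by the two `data_locks` queries gives a rough picture of how much of the table each statement locked.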