Understanding Clustered and Non-Clustered Indexes in SQL
Indexes are sorted data structures that improve the speed of data retrieval operations on database tables. The most common index structures are B+ trees and hash tables. Both clustered and non-clustered indexes typically use B+ tree implementations. This article covers index behavior in SQL Server, MySQL InnoDB, and MySQL MyISAM.
Clustered Index
A clustered index determines the physical order of data rows in a table. The logical order of the index matches the physical storage order of the rows. A table can have only one clustered index because the data rows themselves can only be sorted in one order.
Consider a dictionary analogy: the pinyin index (alphabetical order) directs you to the exact page where a word is stored. The physical page numbers follow the same alphabetical sequence. If a new word starting with 'B' is added, it must be inserted after the last 'A' word, physically rearranging pages. Similarly, in a table, the physical address of a row correlates with the indexed column value.
Example table:
| Address | id | username | score |
|---|---|---|---|
| 0x01 | 1 | Alice | 90 |
| 0x02 | 2 | Bob | 80 |
| 0x03 | 3 | Charlie | 92 |
| ... | ... | ... | ... |
| 0xff | 256 | Diana | 70 |
The physical address increases with id because id is the clustered index. The leaf nodes of a clustered index contain the actual data rows. This design means queries using the clustered index often perform faster because no extra lookup is needed to retrieve the full row.
Creating a clustered index:
- When a primary key is defined, most systems create a clustered index by default (SQL Server defaults to clustered; MySQL always uses the primary key as the clustered index).
-- SQL Server (explicit clustered)
CREATE TABLE t1 (
id INT PRIMARY KEY CLUSTERED,
name NVARCHAR(255)
);
-- MySQL (primary key is always clustered)
CREATE TABLE t1 (
id INT PRIMARY KEY,
name VARCHAR(255)
);
- Adding after table creation:
-- SQL Server
CREATE CLUSTERED INDEX idx_clustered ON table_name(column_name);
-- MySQL
ALTER TABLE table_name ADD PRIMARY KEY(column_name);
Adding a clsutered endex later is expensive because it reorganizes all data rows physically.
Non-Clustered Index
A non-clustered index maintains a separate structure where the logical order of index entries differs from the physical row order. A table can have many non-clustered indexes. These indexes are like the radical index in a dictionary—they provide a different lookup path independent of the physical arrangement.
The leaf nodes of a non-clustered index store the index key values and a pointer to the actual data row (either the row's location or the clustered index key). Because of this indirection, queries that request columns not covered by the non-clustered index must perform a second lookup (often called a "bookmark lookup" or "key lookup") to fetch additional columns.
Example of a query that requires a second lookup:
Assume table t1 has a clustered index on id and a non-clustered index on username.
-- No second lookup needed (index covers requested columns)
SELECT id, username FROM t1 WHERE username = 'Alice';
-- Second lookup required (score not in index)
SELECT username, score FROM t1 WHERE username = 'Alice';
In SQL Server, the execution plan may show an Index Seek followed by a Key Lookup. If the table is small, SQL Server might ignore the non-clustered index and scan the clustered index instead to avoid the overhead of multiple lookups. This behavior is database‑dependent: MySQL’s InnoDB will still use the non‑clustered index even for small tables, potentially performing more lookups but still benefiting from the index’s selectivity.
Avoiding the second lookup:
- Covering index (composite index): Create an index that includes all columns referenced in the query. For example:
CREATE INDEX idx_covering ON t1(username, score);
SELECT username, score FROM t1 WHERE username = 'Alice'; -- no extra lookup
The query must satisfy the leftmost prefix rule; otherwise the index is not used.
- Included columns (SQL Server): You can add non‑key columns to a non‑clustered index without making them part of the key:
CREATE INDEX idx_include ON t1(username) INCLUDE (score);
This allows the index to cover the query without expanding the key.
Key Takeaways
- Performance: Clustered indexes are generally faster for range scans and point lookups because data is stored together. However, frequent updates to the clustered column cause physical row movement, reducing write performance.
- Avoid extra lookups: When using non‑clustered endexes, design covering indexes to eliminate the need for a second lookup. This can significantly boost query speed.
- When to index: Indexes are most beneficial on large tables with highly selective columns. Small tables or columns with low selectivity (many duplicate values) may not gain much from indexing.