Fading Coder

One Final Commit for the Last Sprint

Home > Tech > Content

Understanding Clustered and Non-Clustered Indexes in SQL

Tech May 14 1

Indexes are sorted data structures that improve the speed of data retrieval operations on database tables. The most common index structures are B+ trees and hash tables. Both clustered and non-clustered indexes typically use B+ tree implementations. This article covers index behavior in SQL Server, MySQL InnoDB, and MySQL MyISAM.

Clustered Index

A clustered index determines the physical order of data rows in a table. The logical order of the index matches the physical storage order of the rows. A table can have only one clustered index because the data rows themselves can only be sorted in one order.

Consider a dictionary analogy: the pinyin index (alphabetical order) directs you to the exact page where a word is stored. The physical page numbers follow the same alphabetical sequence. If a new word starting with 'B' is added, it must be inserted after the last 'A' word, physically rearranging pages. Similarly, in a table, the physical address of a row correlates with the indexed column value.

Example table:

Address id username score
0x01 1 Alice 90
0x02 2 Bob 80
0x03 3 Charlie 92
... ... ... ...
0xff 256 Diana 70

The physical address increases with id because id is the clustered index. The leaf nodes of a clustered index contain the actual data rows. This design means queries using the clustered index often perform faster because no extra lookup is needed to retrieve the full row.

Creating a clustered index:

  • When a primary key is defined, most systems create a clustered index by default (SQL Server defaults to clustered; MySQL always uses the primary key as the clustered index).
-- SQL Server (explicit clustered)
CREATE TABLE t1 (
    id INT PRIMARY KEY CLUSTERED,
    name NVARCHAR(255)
);

-- MySQL (primary key is always clustered)
CREATE TABLE t1 (
    id INT PRIMARY KEY,
    name VARCHAR(255)
);
  • Adding after table creation:
-- SQL Server
CREATE CLUSTERED INDEX idx_clustered ON table_name(column_name);

-- MySQL
ALTER TABLE table_name ADD PRIMARY KEY(column_name);

Adding a clsutered endex later is expensive because it reorganizes all data rows physically.

Non-Clustered Index

A non-clustered index maintains a separate structure where the logical order of index entries differs from the physical row order. A table can have many non-clustered indexes. These indexes are like the radical index in a dictionary—they provide a different lookup path independent of the physical arrangement.

The leaf nodes of a non-clustered index store the index key values and a pointer to the actual data row (either the row's location or the clustered index key). Because of this indirection, queries that request columns not covered by the non-clustered index must perform a second lookup (often called a "bookmark lookup" or "key lookup") to fetch additional columns.

Example of a query that requires a second lookup: Assume table t1 has a clustered index on id and a non-clustered index on username.

-- No second lookup needed (index covers requested columns)
SELECT id, username FROM t1 WHERE username = 'Alice';

-- Second lookup required (score not in index)
SELECT username, score FROM t1 WHERE username = 'Alice';

In SQL Server, the execution plan may show an Index Seek followed by a Key Lookup. If the table is small, SQL Server might ignore the non-clustered index and scan the clustered index instead to avoid the overhead of multiple lookups. This behavior is database‑dependent: MySQL’s InnoDB will still use the non‑clustered index even for small tables, potentially performing more lookups but still benefiting from the index’s selectivity.

Avoiding the second lookup:

  • Covering index (composite index): Create an index that includes all columns referenced in the query. For example:
CREATE INDEX idx_covering ON t1(username, score);
SELECT username, score FROM t1 WHERE username = 'Alice';  -- no extra lookup

The query must satisfy the leftmost prefix rule; otherwise the index is not used.

  • Included columns (SQL Server): You can add non‑key columns to a non‑clustered index without making them part of the key:
CREATE INDEX idx_include ON t1(username) INCLUDE (score);

This allows the index to cover the query without expanding the key.

Key Takeaways

  1. Performance: Clustered indexes are generally faster for range scans and point lookups because data is stored together. However, frequent updates to the clustered column cause physical row movement, reducing write performance.
  2. Avoid extra lookups: When using non‑clustered endexes, design covering indexes to eliminate the need for a second lookup. This can significantly boost query speed.
  3. When to index: Indexes are most beneficial on large tables with highly selective columns. Small tables or columns with low selectivity (many duplicate values) may not gain much from indexing.

Related Articles

Understanding Strong and Weak References in Java

Strong References Strong reference are the most prevalent type of object referencing in Java. When an object has a strong reference pointing to it, the garbage collector will not reclaim its memory. F...

Comprehensive Guide to SSTI Explained with Payload Bypass Techniques

Introduction Server-Side Template Injection (SSTI) is a vulnerability in web applications where user input is improper handled within the template engine and executed on the server. This exploit can r...

Implement Image Upload Functionality for Django Integrated TinyMCE Editor

Django’s Admin panel is highly user-friendly, and pairing it with TinyMCE, an effective rich text editor, simplifies content management significantly. Combining the two is particular useful for bloggi...

Leave a Comment

Anonymous

◎Feel free to join the discussion and share your thoughts.