Fading Coder

One Final Commit for the Last Sprint

Home > Tech > Content

Optimizing MySQL Queries: Joins, Sorting, Pagination, Subqueries, and Group By

Tech 1

Optimizing Join Queries

1. Setup for Testing

CREATE TABLE IF NOT EXISTS category(
   id INT UNSIGNED NOT NULL AUTO_INCREMENT,
   ref_code INT UNSIGNED NOT NULL,
   PRIMARY KEY(id)
);

CREATE TABLE IF NOT EXISTS item(
   item_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
   ref_code INT UNSIGNED NOT NULL,
   PRIMARY KEY(item_id)
);

INSERT INTO category(ref_code) VALUES (FLOOR(1 + (RAND() * 20)));
INSERT INTO item(ref_code) VALUES (FLOOR(1 + (RAND() * 20)));

Case 1: LEFT OUTER JOIN

EXPLAIN SELECT SQL_NO_CACHE * FROM category LEFT JOIN item ON category.ref_code = item.ref_code;

After adding an index:

CREATE INDEX idx_ref ON item(ref_code);

INNER JOIN

EXPLAIN SELECT SQL_NO_CACHE * FROM category INNER JOIN item ON category.ref_code = item.ref_code;

For inner joins, the optimizer selects the driving and driven tables. If only one table has an index on the join condition, that table becomes the driven table. With indexes on both tables, the smaller table is chosen as the driver.

Join Mechanisms

In a query like SELECT * FROM A JOIN B ON ..., A is not always the driver; the optimizer decides based on query analysis. Use EXPLAIN to identify roles.

For outer joins, such as SELECT * FROM A LEFT JOIN B ON ..., A typically drives, but this can vary.

Simple Nested-Loop Join

This method performs a full scan of the driven table for each row in the driver table, leading to high cost with large datasets.

Index Nested-Loop Join

Requires an index on the driven table's join column. It reduces matches by using the index for lookups, improving performance.

Block Nested-Loop Join

Uses a join buffer to store rows from the driver table, reducing disk I/O. Enable with block_nested_loop=on.

Check settings:

SHOW VARIABLES LIKE '%optimizer_switch%';
SHOW VARIABLES LIKE '%join_buffer_size%';

Join Summary

From MySQL 8.0.20, hash joins replace BNLJ by default. Ensure join columns are indexed, data types match, and use small tables as drivers for left joins. Prefer joins over subqueries to avoid temporary tables.

Subquery Optimization

Subqueries allow nesting SELECT statements but can be ineffficient due to temporary table creasion and lack of indexing. Replace with JOIN queries for better performence, especially avoiding NOT IN or NOT EXISTS in favor of LEFT JOIN with IS NULL checks.

Sorting Optimization

MySQL uses Index or FileSort for ordering. Index sorting is faster; FileSort occurs in memory or disk, consuming CPU. Optimize by:

  • Using indexes on WHERE and ORDER BY clauses.
  • Employing composite indexes for different columns.
  • Avoiding SELECT * to enable index-only scans.
  • Addding LIMIT clauses to reduce sorted data.

Index usage fails with incorrect column order or mixed sort directions. Ensure filtering exists to leverage indexes.

FileSort Algorithms

Two methods: two-pass (old) and single-pass (improved). Increase sort_buffer_size to handle large sorts and minimize I/O.

Group By Optimization

Similar to ORDER BY, GROUP BY benefits from indexes and follows leftmost prefix rules. Increase buffer sizes if indexes aren't used. Prefer WHERE over HAVING for filtering. Limit result sets to under 1000 rows to avoid slow queries.

Pagination Query Optimization

For queries like LIMIT 2000000,10, performance degrades due to sorting overhead. Optimize by:

  1. Sorting on an indexed column and joining back:
EXPLAIN SELECT * FROM item t, (SELECT item_id FROM item ORDER BY item_id LIMIT 2000000,10) s WHERE t.item_id = s.item_id;
  1. For auto-increment primary keys, use range queries:
EXPLAIN SELECT * FROM item WHERE item_id > 2000000 LIMIT 10;

EXISTS vs IN

Choose based on table size: use EXISTS for small driving tables and IN for large ones, considering index availability.

COUNT(*) vs COUNT(column)

COUNT(*) and COUNT(1) are generally efficient for row counts; COUNT(column) counts non-null values and may be slower if endexing differs.

Avoiding SELECT *

Specify columns instead of using * to reduce parsing overhead and enable covering indexes.

Using LIMIT 1

Add LIMIT 1 to full-table scans when only one row is expected to stop early. Not needed with unique indexed queries.

Frequent COMMIT Usage

Use COMMIT in transactions to release locks and improve concurrency.

Tags: MySQL

Related Articles

Understanding Strong and Weak References in Java

Strong References Strong reference are the most prevalent type of object referencing in Java. When an object has a strong reference pointing to it, the garbage collector will not reclaim its memory. F...

Comprehensive Guide to SSTI Explained with Payload Bypass Techniques

Introduction Server-Side Template Injection (SSTI) is a vulnerability in web applications where user input is improper handled within the template engine and executed on the server. This exploit can r...

Implement Image Upload Functionality for Django Integrated TinyMCE Editor

Django’s Admin panel is highly user-friendly, and pairing it with TinyMCE, an effective rich text editor, simplifies content management significantly. Combining the two is particular useful for bloggi...

Leave a Comment

Anonymous

◎Feel free to join the discussion and share your thoughts.