Home > Tech > Content

MySQL Interview Questions: Database Design, Query Optimization, and Performance Tuning

Tech Apr 28 25

Database Normalization Forms

First Normal Form: Columns cannot be further divided. Each column must contain atomic values, not sets or arrays.

Second Normal Form: Building on the first normal form, all non-key columns must depend entirely on the primary key, not just on part of it. This applies to tables with composite primary keys.

Third Normal Form: Building on the second normal form, there should be no transitive dependencies between non-key columns. In other words, non-key columns should depend only on the primary key, not on other non-key columns.

These normalization forms are essential guidelines for building relational databases. They help designers avoid data anomalies and reduce maintenance costs while improving query efficiency. In practice, appropriate denormalization may be used to improve read performance when business requirements demand it.

Inner Join vs Left Join vs Right Join

Inner Join: Returns only rows that have matching values in both tables.
Left Join: Returns all rows from the left table and matching rows from the right table. Non-matching right-side columns return NULL.
Right Join: Returns all rows from the right table and matching rows from the left table. Non-matching left-side columns return NULL.

Why Prefer Inner Join?

When results are equivalent, prefer inner join for better performance. If using left join, the left table should be as small as possible.

Inner join preserves only matching records from both tables, resulting in fewer rows. Left join retains all rows from the left table regardless of matches, while right join does the same for the right table.

MySQL optimization principles favor using smaller datasets to drive larger ones, which improves overall performance.

Table Join Limitations

It is generally recommended to limit the number of table joins to three or fewer. More joins increase compilation time and memory overhead as temporary tables are created for each join. Complex joins also reduce code readability.

UNION vs UNION ALL

UNION: Combines results and removes duplicate rows. Performs default sorting on the combined result set.

UNION ALL: Combines results without removing duplicates. Does not perform sorting.

UNION ALL is faster when duplicates are acceptable or not present, as it skips deduplication and sorting operations.

COUNT(1), COUNT(*), and COUNT(column) Differences

COUNT(1): Counts all rows including those with NULL values in any column.
COUNT(*): Counts all rows similarly, equivalent to COUNT(1) in functionality.
COUNT(column_name): Counts only non-NULL values in the specified column.

MySQL Data Types Overview

Numeric Types:

TINYINT: -128 to 127 (or 0-255 unsigned)
SMALLINT: -32768 to 32767 (or 0-65535 unsigned)
MEDIUMINT: -8388608 to 8388607 (or 0-16777215 unsigned)
INT: -2147483648 to 2147483647 (or 0-4294967295 unsigned)
BIGINT: Large integers for significant numeric values

Floating-Point Types:

FLOAT: Single-precision (approximately 7 significant digits)
DOUBLE: Double-precision (approximately 16 significant digits)
DECIMAL: Fixed-point for exact precision, ideal for financial calculations

Date and Time Types:

DATE: Year, month, day
TIME: Hours, minutes, seconds
DATETIME: Full date and time
TIMESTAMP: Unix timestamp with automatic updates

String Types:

CHAR: Fixed-length strings
VARCHAR: Variable-length strings
TEXT: Large text data
BLOB: Binary large objects

Converting INT to BIGINT

To modify a column from INT to BIGINT:

Backup your data before making changes.
Use DESCRIBE table_name to verify current structure.
Execute: ALTER TABLE table_name MODIFY COLUMN column_name BIGINT;
Verify the change with DESCRIBE again.
Test application functionality thoroughly.

FLOAT vs DOUBLE

Precision: DOUBLE provides higher precision (16 digits) than FLOAT (7 digits).

Storage: FLOAT uses 4 bytes, DOUBLE uses 8 bytes.

Speed: FLOAT computations are faster due to simpler arithmetic.

Use DOUBLE for high-precision requirements, FLOAT when space or speed is criticla.

BLOB vs TEXT

BLOB stores binary data (images, audio, video). TEXT stores character data. TEXT requires a character set for collation and sorting, while BLOB stores raw bytes without character set processing. TEXT comparisons are case-insensitive; BLOB comparisons are case-sensitive.

CHAR vs VARCHAR

CHAR has fixed length; VARCHAR has variable length. CHAR(10) always uses 10 bytes regardless of actual data, while VARCHAR(10) uses only the required bytes plus overhead.

CHAR offers slightly better performance but wastes space. Use CHAR for fixed-length data like country codes, VARCHAR for vraiable-length strings.

Currency Storage

DECIMAL is recommended for monetary values. Use DECIMAL(10,2) for values up to 99,999,999.99. DECIMAL ensures exact precision unlike FLOAT/DOUBLE which are approximate.

Numeric vs String Types

Numeric types are preferred because engines compare numbers once, while strings require character-by-character comparison. Strings also increase storage overhead and reduce query and join performance.

BLOB/TEXT Separation

MySQL temporary tables don't support BLOB/TEXT efficiently. Queries with these types require disk-based temporary tables and additional lookups, impacting performance. If used, separate these columns into extended tables and select only necessary columns.

Choosing Efficient Data Types

Use smaller types: Smaller data types use less disk, memory, and CPU cache.
Keep it simple: Integer operations are faster than string operations.
Avoid NULL when possible: NULL columns complicate indexing and require special handling.

MySQL Character Sets

Character Set	Bytes	Description
GBK	2	Chinese characters, not universal
UTF-8	1-3	Universal, supports mixed languages
latin1	1	MySQL default
utf8mb4	1-4	Full UTF-8 compatibility

Query Execution Process

Client sends query to server
Server checks query cache; returns cached data if available
MySQL parses SQL into a parse tree
Preprocessor validates table/column existence and permissions
Query optimizer selects execution plan
Storage engine executes and returns results

MySQL Architecture Layers

Connection Layer: Manages connections, authentication, and security. Each client connection gets a dedicated thread.

Query Processing Layer: Parses queries, optimizes execution plans, handles stored procedures and views.

Storage Engine Layer: Handles data storage and retrieval. Different engines (InnoDB, MyISAM) provide various features.

MyISAM vs InnoDB

Feature	MyISAM	InnoDB
Transactions	No	Yes
Locking	Table-level	Row-level
Foreign Keys	No	Yes
Primary Key	Optional	Required
Full-text Index	Yes	No (plugin required)
Crash Recovery	No	Yes
Use Case	Read-heavy, simple queries	Write-heavy, transactional

InnoDB Features

ACID transaction support
Foreign key constraints
Row-level locking
Multi-Version Concurrency Control (MVCC)
Buffer pool for caching
Hot backup support
Large dataset support

InnoDB Data Distribution

InnoDB uses clustered indexes where leaf nodes contain primary key values, transaction IDs, rollback pointers, and column data. Secondary indexes store primary key values as pointers rather than row pointers.

Best practice: Insert rows in primary key order. Avoid random primary keys like UUIDs which cause page splits and fragmentation.

ACID Properties

Atomicity: All operations in a transaction complete or none do
Consistency: Database moves from one valid state to another
Isolation: Concurrent transactions don't interfere with each other
Durability: Committed data survives system crashes

MVCC (Multi-Version Concurrency Control)

MVCC allows multiple transactions to read data without conflicts by providing snapshots of data at transaction start. This improves concurrency and read performance without blocking writers.

TRUNCATE vs DELETE

TRUNCATE is faster as it deallocates data pages rather than deleting rows individually. TRUNCATE resets auto-increment counters and cannot be rolled back. It doesn't trigger triggers.

DELETE/UPDATE with LIMIT

Reduces risk of SQL errors affecting all data
Improves efficiency by stopping after first match
Prevents long transactions that lock many rows
Avoids CPU saturation with large deletions
Prevents lock table timeouts

IN vs EXISTS

IN is efficient when subquery returns small result sets. EXISTS is efficient when subquery table is large and can use indexes. NOT IN typically performs full table scans.

Pagination in MySQL vs Oracle

MySQL: SELECT * FROM table LIMIT 10 OFFSET 0

Oracle 11g: Use ROWNUM in subquery

Oracle 12c+: SELECT * FROM table OFFSET 10 ROWS FETCH NEXT 10 ROWS ONLY

Query Cache

MySQL caches query results, table metadata, index data (MyISAM key buffer), and InnoDB buffer pool. Enable query cache for repetitive read-only queries with rarely changing data.

Cache doesn't match with different whitespace or comments. Functions like NOW(), user variables, and temporary tables prevent caching.

Primary Key Generation Methods

Auto-increment: Simple, efficient, sequential. Issues in distributed environments.

UUID: Globally unique, no coordination needed.缺点: random order, larger storage, slower queries.

Unique Constraint: Applied to non-primary key columns.

Distributed ID Generation

Database auto-increment with different offsets
UUID generation
Dedicated ID generation services (Snowflake algorithm)
Middleware like Redis or Zookeeper
Business logic-based IDs

Views

Views are virtual tables storing SELECT queries. They simplify complex queries, enforce security by restricting data access, and shield users from table structure changes. Single-table views can update data; multi-table views are typically read-only.

Materialized Views

Unlike regular views, materialized views store actual data. MySQL doesn't natively support them; use tools like Flexviews. They store query results and can be incrementally refreshed.

Index Principles

Don't create too many indexes (typically 5 or fewer)
Indexes increase storage space and slow inserts/updates
Each index requires additional maintenance during data changes

Index Data Structure

Indexes use B-Tree structure for fast data lookup. B-Tree indexes store values in sorted order, with leaf nodes containing actual data pointers. This enables efficient range queries and ORDER BY operations.

When NOT to Use Indexes

Frequently updated columns
Small tables
Columns with low cardinality (few unique values)
BLOB/TEXT columns
Columns used in LIKE patterns starting with wildcard

Index失效Rules

Don't skip composite index columns or use them out of order
Avoid operations on indexed columns (calculations, functions)
Don't use != or <> on indexed columns
Use LIKE with constant prefix, not wildcard at start
Avoid type conversion in WHERE clauses
Avoid OR conditions that bypass indexes

Foreign Key Considerations

Foreign keys require extra queries for validation and add overhead. They can cause lock waits and deadlocks. Many projects avoid foreign keys for performance, implementing referential integrity in application code instead.

Index Types

Regular Index: Basic index, allows duplicates
Unique Index: No duplicate values allowed
Primary Key Index: Unique, non-null, auto-created
Full-text Index: For text search
Clustered Index: Data stored in index order
Non-clustered Index: Separate index and data storage

Composite Index Leftmost Prefix

For composite index (a, b, c), queries using a, or a+b, or a+b+c can use the index. Using only b or c won't use the index.

Covering Index

A covering index contains all columns needed for a query, eliminating the need to access the actual table (回表). Using covering indexes improves performance significantly.

B+Tree Advantages

All data in leaf nodes simplifies range queries
Non-leaf nodes store only keys, enabling more entries per node
Lower tree height reduces disk I/O
Leaf nodes linked for efficient sequential access

Transaction Isolation Levels

READ UNCOMMITTED: Can read uncommitted changes (dirty reads)
READ COMMITTED: Only reads committed data (non-repeatable reads possible)
REPEATABLE READ: Default in InnoDB (phantom reads possible)
SERIALIZABLE: Highest isolation, prevents all concurrency issues

Concurrency Problems

Lost Update: Last update overwrites previous ones
Dirty Read: Reading uncommitted data
Non-repeatable Read: Different results on re-read
Phantom Read: New rows appear in re-read

MySQL Lock Types

Table-level: MyISAM default
Row-level: InnoDB, higher concurrency
Page-level: Between table and row

InnoDB locks: Shared locks, exclusive locks, intention locks, gap locks, auto-increment locks.

Slow Query Optimization

Use EXPLAIN to analyze query plans
Create appropriate indexes
Optimize SQL statements
Limit result sets
Avoid functions in WHERE clauses
Use JOIN instead of subqueries
Consider partitioning large tables

Large Table Optimization

Optimize table structure and data types
Create effective indexes
Use partitioning
Upgrade hardware and database configuration
Archive old data

ORDER BY Optimization

MySQL uses either single-pass or two-pass sorting. Adjust max_length_for_sort_data parameter. Avoid SELECT *, use consistent sort order, ensure composite indexes match query patterns.

JOIN Optimization

Use smaller driving tables. Create indexes on frequently queried columns. Ensure JOIN conditions use indexes. Use EXPLAIN to verify execution plans.

EXPLAIN Execution Plan

Key columns: id (query identifier), type (access type), possible_keys (available indexes), key (actual index used), rows (estimated rows to scan), Extra (additional information).

Good types: const, eq_ref, ref. Bad types: ALL (full table scan), index (full index scan).

Database Connection Pool

A connection pool reuses database connections instead of creating new ones for each request. This reduces overhead from connection establishment and improves application performance.

Sharding and Partitioning

After sharding, handle primary keys using UUID, database auto-increment with offsets, Redis, Snowflake algorithm, or composite keys with sharding logic.

Challenges include cross-shard JOINs, distributed transactions, sorting/pagination, and data consistency.

High Concurrency Updates

Pessimistic Lock: SELECT ... FOR UPDATE locks the row.

Optimistic Lock: Use version column, update only if version matches.

MySQL Binlog Formats

Statement: Logs SQL statements, compact but may have inconsistencies
Row: Logs actual row changes, accurate but larger logs
Mixed: Auto-selects between Statement and Row based on context

Oracle Partitioning Types

Range: Based on value ranges (dates, numbers)
List: Based on specific values
Hash: Even distribution using hash function
Composite: Combination of range and list

Partitioning Considerations

Partition/column mismatch reduces performance
Partition selection can be expensive
Opening all底层 tables has overhead
All partitions must use same storage engine

Tags: MySQL

Back to List

Prev: DOM Manipulation Techniques: Exclusive Selection Patterns and Node Operations

Next: Client-Side Web Storage: localStorage and sessionStorage Compared

Fading Coder