Fading Coder

One Final Commit for the Last Sprint

Home > Tech > Content

MySQL Interview Questions: Database Design, Query Optimization, and Performance Tuning

Tech Apr 28 12

Database Normalization Forms

First Normal Form: Columns cannot be further divided. Each column must contain atomic values, not sets or arrays.

Second Normal Form: Building on the first normal form, all non-key columns must depend entirely on the primary key, not just on part of it. This applies to tables with composite primary keys.

Third Normal Form: Building on the second normal form, there should be no transitive dependencies between non-key columns. In other words, non-key columns should depend only on the primary key, not on other non-key columns.

These normalization forms are essential guidelines for building relational databases. They help designers avoid data anomalies and reduce maintenance costs while improving query efficiency. In practice, appropriate denormalization may be used to improve read performance when business requirements demand it.

Inner Join vs Left Join vs Right Join

  • Inner Join: Returns only rows that have matching values in both tables.
  • Left Join: Returns all rows from the left table and matching rows from the right table. Non-matching right-side columns return NULL.
  • Right Join: Returns all rows from the right table and matching rows from the left table. Non-matching left-side columns return NULL.

Why Prefer Inner Join?

When results are equivalent, prefer inner join for better performance. If using left join, the left table should be as small as possible.

Inner join preserves only matching records from both tables, resulting in fewer rows. Left join retains all rows from the left table regardless of matches, while right join does the same for the right table.

MySQL optimization principles favor using smaller datasets to drive larger ones, which improves overall performance.

Table Join Limitations

It is generally recommended to limit the number of table joins to three or fewer. More joins increase compilation time and memory overhead as temporary tables are created for each join. Complex joins also reduce code readability.

UNION vs UNION ALL

UNION: Combines results and removes duplicate rows. Performs default sorting on the combined result set.

UNION ALL: Combines results without removing duplicates. Does not perform sorting.

UNION ALL is faster when duplicates are acceptable or not present, as it skips deduplication and sorting operations.

COUNT(1), COUNT(*), and COUNT(column) Differences

  • COUNT(1): Counts all rows including those with NULL values in any column.
  • COUNT(*): Counts all rows similarly, equivalent to COUNT(1) in functionality.
  • COUNT(column_name): Counts only non-NULL values in the specified column.

MySQL Data Types Overview

Numeric Types:

  • TINYINT: -128 to 127 (or 0-255 unsigned)
  • SMALLINT: -32768 to 32767 (or 0-65535 unsigned)
  • MEDIUMINT: -8388608 to 8388607 (or 0-16777215 unsigned)
  • INT: -2147483648 to 2147483647 (or 0-4294967295 unsigned)
  • BIGINT: Large integers for significant numeric values

Floating-Point Types:

  • FLOAT: Single-precision (approximately 7 significant digits)
  • DOUBLE: Double-precision (approximately 16 significant digits)
  • DECIMAL: Fixed-point for exact precision, ideal for financial calculations

Date and Time Types:

  • DATE: Year, month, day
  • TIME: Hours, minutes, seconds
  • DATETIME: Full date and time
  • TIMESTAMP: Unix timestamp with automatic updates

String Types:

  • CHAR: Fixed-length strings
  • VARCHAR: Variable-length strings
  • TEXT: Large text data
  • BLOB: Binary large objects

Converting INT to BIGINT

To modify a column from INT to BIGINT:

  1. Backup your data before making changes.
  2. Use DESCRIBE table_name to verify current structure.
  3. Execute: ALTER TABLE table_name MODIFY COLUMN column_name BIGINT;
  4. Verify the change with DESCRIBE again.
  5. Test application functionality thoroughly.

FLOAT vs DOUBLE

Precision: DOUBLE provides higher precision (16 digits) than FLOAT (7 digits).

Storage: FLOAT uses 4 bytes, DOUBLE uses 8 bytes.

Speed: FLOAT computations are faster due to simpler arithmetic.

Use DOUBLE for high-precision requirements, FLOAT when space or speed is criticla.

BLOB vs TEXT

BLOB stores binary data (images, audio, video). TEXT stores character data. TEXT requires a character set for collation and sorting, while BLOB stores raw bytes without character set processing. TEXT comparisons are case-insensitive; BLOB comparisons are case-sensitive.

CHAR vs VARCHAR

CHAR has fixed length; VARCHAR has variable length. CHAR(10) always uses 10 bytes regardless of actual data, while VARCHAR(10) uses only the required bytes plus overhead.

CHAR offers slightly better performance but wastes space. Use CHAR for fixed-length data like country codes, VARCHAR for vraiable-length strings.

Currency Storage

DECIMAL is recommended for monetary values. Use DECIMAL(10,2) for values up to 99,999,999.99. DECIMAL ensures exact precision unlike FLOAT/DOUBLE which are approximate.

Numeric vs String Types

Numeric types are preferred because engines compare numbers once, while strings require character-by-character comparison. Strings also increase storage overhead and reduce query and join performance.

BLOB/TEXT Separation

MySQL temporary tables don't support BLOB/TEXT efficiently. Queries with these types require disk-based temporary tables and additional lookups, impacting performance. If used, separate these columns into extended tables and select only necessary columns.

Choosing Efficient Data Types

  • Use smaller types: Smaller data types use less disk, memory, and CPU cache.
  • Keep it simple: Integer operations are faster than string operations.
  • Avoid NULL when possible: NULL columns complicate indexing and require special handling.

MySQL Character Sets

Character Set Bytes Description
GBK 2 Chinese characters, not universal
UTF-8 1-3 Universal, supports mixed languages
latin1 1 MySQL default
utf8mb4 1-4 Full UTF-8 compatibility

Query Execution Process

  1. Client sends query to server
  2. Server checks query cache; returns cached data if available
  3. MySQL parses SQL into a parse tree
  4. Preprocessor validates table/column existence and permissions
  5. Query optimizer selects execution plan
  6. Storage engine executes and returns results

MySQL Architecture Layers

Connection Layer: Manages connections, authentication, and security. Each client connection gets a dedicated thread.

Query Processing Layer: Parses queries, optimizes execution plans, handles stored procedures and views.

Storage Engine Layer: Handles data storage and retrieval. Different engines (InnoDB, MyISAM) provide various features.

MyISAM vs InnoDB

Feature MyISAM InnoDB
Transactions No Yes
Locking Table-level Row-level
Foreign Keys No Yes
Primary Key Optional Required
Full-text Index Yes No (plugin required)
Crash Recovery No Yes
Use Case Read-heavy, simple queries Write-heavy, transactional

InnoDB Features

  • ACID transaction support
  • Foreign key constraints
  • Row-level locking
  • Multi-Version Concurrency Control (MVCC)
  • Buffer pool for caching
  • Hot backup support
  • Large dataset support

InnoDB Data Distribution

InnoDB uses clustered indexes where leaf nodes contain primary key values, transaction IDs, rollback pointers, and column data. Secondary indexes store primary key values as pointers rather than row pointers.

Best practice: Insert rows in primary key order. Avoid random primary keys like UUIDs which cause page splits and fragmentation.

ACID Properties

  • Atomicity: All operations in a transaction complete or none do
  • Consistency: Database moves from one valid state to another
  • Isolation: Concurrent transactions don't interfere with each other
  • Durability: Committed data survives system crashes

MVCC (Multi-Version Concurrency Control)

MVCC allows multiple transactions to read data without conflicts by providing snapshots of data at transaction start. This improves concurrency and read performance without blocking writers.

TRUNCATE vs DELETE

TRUNCATE is faster as it deallocates data pages rather than deleting rows individually. TRUNCATE resets auto-increment counters and cannot be rolled back. It doesn't trigger triggers.

DELETE/UPDATE with LIMIT

  • Reduces risk of SQL errors affecting all data
  • Improves efficiency by stopping after first match
  • Prevents long transactions that lock many rows
  • Avoids CPU saturation with large deletions
  • Prevents lock table timeouts

IN vs EXISTS

IN is efficient when subquery returns small result sets. EXISTS is efficient when subquery table is large and can use indexes. NOT IN typically performs full table scans.

Pagination in MySQL vs Oracle

MySQL: SELECT * FROM table LIMIT 10 OFFSET 0

Oracle 11g: Use ROWNUM in subquery

Oracle 12c+: SELECT * FROM table OFFSET 10 ROWS FETCH NEXT 10 ROWS ONLY

Query Cache

MySQL caches query results, table metadata, index data (MyISAM key buffer), and InnoDB buffer pool. Enable query cache for repetitive read-only queries with rarely changing data.

Cache doesn't match with different whitespace or comments. Functions like NOW(), user variables, and temporary tables prevent caching.

Primary Key Generation Methods

Auto-increment: Simple, efficient, sequential. Issues in distributed environments.

UUID: Globally unique, no coordination needed.缺点: random order, larger storage, slower queries.

Unique Constraint: Applied to non-primary key columns.

Distributed ID Generation

  • Database auto-increment with different offsets
  • UUID generation
  • Dedicated ID generation services (Snowflake algorithm)
  • Middleware like Redis or Zookeeper
  • Business logic-based IDs

Views

Views are virtual tables storing SELECT queries. They simplify complex queries, enforce security by restricting data access, and shield users from table structure changes. Single-table views can update data; multi-table views are typically read-only.

Materialized Views

Unlike regular views, materialized views store actual data. MySQL doesn't natively support them; use tools like Flexviews. They store query results and can be incrementally refreshed.

Index Principles

  • Don't create too many indexes (typically 5 or fewer)
  • Indexes increase storage space and slow inserts/updates
  • Each index requires additional maintenance during data changes

Index Data Structure

Indexes use B-Tree structure for fast data lookup. B-Tree indexes store values in sorted order, with leaf nodes containing actual data pointers. This enables efficient range queries and ORDER BY operations.

When NOT to Use Indexes

  • Frequently updated columns
  • Small tables
  • Columns with low cardinality (few unique values)
  • BLOB/TEXT columns
  • Columns used in LIKE patterns starting with wildcard

Index失效Rules

  • Don't skip composite index columns or use them out of order
  • Avoid operations on indexed columns (calculations, functions)
  • Don't use != or <> on indexed columns
  • Use LIKE with constant prefix, not wildcard at start
  • Avoid type conversion in WHERE clauses
  • Avoid OR conditions that bypass indexes

Foreign Key Considerations

Foreign keys require extra queries for validation and add overhead. They can cause lock waits and deadlocks. Many projects avoid foreign keys for performance, implementing referential integrity in application code instead.

Index Types

  • Regular Index: Basic index, allows duplicates
  • Unique Index: No duplicate values allowed
  • Primary Key Index: Unique, non-null, auto-created
  • Full-text Index: For text search
  • Clustered Index: Data stored in index order
  • Non-clustered Index: Separate index and data storage

Composite Index Leftmost Prefix

For composite index (a, b, c), queries using a, or a+b, or a+b+c can use the index. Using only b or c won't use the index.

Covering Index

A covering index contains all columns needed for a query, eliminating the need to access the actual table (回表). Using covering indexes improves performance significantly.

B+Tree Advantages

  • All data in leaf nodes simplifies range queries
  • Non-leaf nodes store only keys, enabling more entries per node
  • Lower tree height reduces disk I/O
  • Leaf nodes linked for efficient sequential access

Transaction Isolation Levels

  • READ UNCOMMITTED: Can read uncommitted changes (dirty reads)
  • READ COMMITTED: Only reads committed data (non-repeatable reads possible)
  • REPEATABLE READ: Default in InnoDB (phantom reads possible)
  • SERIALIZABLE: Highest isolation, prevents all concurrency issues

Concurrency Problems

  • Lost Update: Last update overwrites previous ones
  • Dirty Read: Reading uncommitted data
  • Non-repeatable Read: Different results on re-read
  • Phantom Read: New rows appear in re-read

MySQL Lock Types

  • Table-level: MyISAM default
  • Row-level: InnoDB, higher concurrency
  • Page-level: Between table and row

InnoDB locks: Shared locks, exclusive locks, intention locks, gap locks, auto-increment locks.

Slow Query Optimization

  • Use EXPLAIN to analyze query plans
  • Create appropriate indexes
  • Optimize SQL statements
  • Limit result sets
  • Avoid functions in WHERE clauses
  • Use JOIN instead of subqueries
  • Consider partitioning large tables

Large Table Optimization

  • Optimize table structure and data types
  • Create effective indexes
  • Use partitioning
  • Upgrade hardware and database configuration
  • Archive old data

ORDER BY Optimization

MySQL uses either single-pass or two-pass sorting. Adjust max_length_for_sort_data parameter. Avoid SELECT *, use consistent sort order, ensure composite indexes match query patterns.

JOIN Optimization

Use smaller driving tables. Create indexes on frequently queried columns. Ensure JOIN conditions use indexes. Use EXPLAIN to verify execution plans.

EXPLAIN Execution Plan

Key columns: id (query identifier), type (access type), possible_keys (available indexes), key (actual index used), rows (estimated rows to scan), Extra (additional information).

Good types: const, eq_ref, ref. Bad types: ALL (full table scan), index (full index scan).

Database Connection Pool

A connection pool reuses database connections instead of creating new ones for each request. This reduces overhead from connection establishment and improves application performance.

Sharding and Partitioning

After sharding, handle primary keys using UUID, database auto-increment with offsets, Redis, Snowflake algorithm, or composite keys with sharding logic.

Challenges include cross-shard JOINs, distributed transactions, sorting/pagination, and data consistency.

High Concurrency Updates

Pessimistic Lock: SELECT ... FOR UPDATE locks the row.

Optimistic Lock: Use version column, update only if version matches.

MySQL Binlog Formats

  • Statement: Logs SQL statements, compact but may have inconsistencies
  • Row: Logs actual row changes, accurate but larger logs
  • Mixed: Auto-selects between Statement and Row based on context

Oracle Partitioning Types

  • Range: Based on value ranges (dates, numbers)
  • List: Based on specific values
  • Hash: Even distribution using hash function
  • Composite: Combination of range and list

Partitioning Considerations

  • Partition/column mismatch reduces performance
  • Partition selection can be expensive
  • Opening all底层 tables has overhead
  • All partitions must use same storage engine
Tags: MySQL

Related Articles

Comprehensive Guide to SSTI Explained with Payload Bypass Techniques

Introduction Server-Side Template Injection (SSTI) is a vulnerability in web applications where user input is improper handled within the template engine and executed on the server. This exploit can r...

Implement Image Upload Functionality for Django Integrated TinyMCE Editor

Django’s Admin panel is highly user-friendly, and pairing it with TinyMCE, an effective rich text editor, simplifies content management significantly. Combining the two is particular useful for bloggi...

SBUS Signal Analysis and Communication Implementation Using STM32 with Fus Remote Controller

Overview In a recent project, I utilized the SBUS protocol with the Fus remote controller to control a vehicle's basic operations, including movement, lights, and mode switching. This article is aimed...

Leave a Comment

Anonymous

◎Feel free to join the discussion and share your thoughts.