Essential MySQL Interview Questions and Answers
Database Normalization Principles
The three normal forms guide relational database design:
First Normal Form (1NF): Ensures atomic columns with no repeating groups.
Second Normal Form (2NF): Requires all non-key attributes to fully depend on the entire primary key.
Third Normal Form (3NF): Eliminates transitive dependencies among non-key attributes.
These principles help prevent data anomalies and maintain consistency while improving query performance. However, practical implementations may require trade-offs between normalization and performance optimization.
Join Types Comparison
Inner joins return matching records from both tables.
Left joins include all records from the left table and matched records from the right table.
Right joins include all records from the right table and matched records from the left table.
Join Performance Optimization
When results are equivalent, prefer INNER JOINs because they typically return fewer rows. For LEFT JOINs, place smaller tables on the left side to minimize result set size.
The principle of "small table driving large table" improves efficiency by reducing data volume during processing.
Limiting Table Joins
Limit joins to three or fewer tables due to:
- Increased compilation time and overhead
- Temporary table generation per join
- Reduced readability
- Potential poor design if more joins are needed
UNION vs UNION ALL
UNION removes duplicate rows and sorts results, while UNION ALL preserves duplicates and doesn't sort.
UNION ALL performs better when duplicates aren't required and sorting isn't needed.
COUNT Function Variations
COUNT(1): Counts all rows including those with NULL values.
COUNT(*): Equivalent to counting all rows, ignoring NULLs.
COUNT(column): Only counts non-NULL values in specified column.
MySQL Data Types Overview
Integer Types
TINYINT: -128 to 127 or 0 to 255 SMALLINT: -32768 to 32767 or 0 to 65535 MEDIUMINT: -8388608 to 8388607 or 0 to 16777215 INT: -2147483648 to 2147483647 or 0 to 4294967295 BIGINT: -9223372036854775808 to 9223372036854775807
Floating Point Types
FLOAT: Single precision with ~7 digits precision DOUBLE: Double precision with ~16 digits precision DECIMAL: Fixed-point decimal numbers for precise calculations
Date and Time Types
DATE: Year-month-day TIME: Hour-minute-second DATETIME: Date and time TIMESTAMP: Date and time with automatic updates
String Types
CHAR: Fixed-length string VARCHAR: Variable-length string TEXT: Large text data BLOB: Binary large objects
Binary Types
BINARY: Binary string VARBINARY: Variable-length binary string TINYBLOB, BLOB, MEDIUMBLOB, LONGBLOB: Different sizes of binary data
Changing INT to BIGINT
- Backup existing data
- Identify target columns and tables
- Execute ALTER TABLE command:
ALTER TABLE table_name MODIFY COLUMN column_name BIGINT; - Verify changes with DESCRIBE
- Test application functionality
FLOAT vs DOUBLE Differences
FLOAT uses 4 bytes with ~7 digit precision DOUBLE uses 8 bytes with ~16 digit precision
Use FLOAT for memory-sensitive applications with lower precision requirements Use DOUBLE for higher accuracy scenarios
BLOB vs TEXT
BLOB stores binary data (images, files) TEXT stores textual data (documents, logs)
TEXT requires character set specification and supports collation BLOB does not use character sets and stores raw bytes
TEXT comparisons are case-insensitive BLOB comparisons are case-sensitive
CHAR vs VARCHAR
CHAR has fixed length, padding with spaces VARCHAR has variable length
CHAR offers slightly better performance VARCHAR saves space but may cause row migration issues
Currency Storage
Use DECIMAL type for financial data Example: DECIMAL(10,2) stores up to 99999999.99
Why Use DECIMAL for Financial Data
DECIMAL provides exact precision without floating-point errors Storage determined by defined width Can handle larger integers than BIGINT
Numerical vs String Usage
Numerical comparisons require single comparison String comparisons check each character Numerical types offer better performance and reduced storage overhead
BLOB/TEXT Separation
Memory temporary tables don't suppport TEXT/BLOB Requires disk-based temporary tables Performance degrades due to secondary queries
Recommend separating BLOB/TEXT columns into separate extension tables
Choosing Efficient Data Types
Smaller is Better
Smaller data types consume less disk, memory, and CPU cache
Simpler is Better
Integer comparisons are faster than string operations
Avoid NULL Values
Use NOT NULL constraints NULL values complicate indexing and increase storage
Oracle Data Types
Character Types
CHAR, VARCHAR2, NCHAR, NVARCHAR2 for various character encodings
Numeric Types
NUMBER type supports integer and decimal values with configurable precision
Date Types
DATE, TIMESTAMP, TIMESTAMP WITH TIME ZONE for date-time handling
LOB Types
BLOB, CLOB, NCLOB for large object storage
RAW Type
Binary data storage
Row ID Types
ROWID, UROWID for unique row identification
Using Default Values Instead of NULL
NULL handling complicates optimization Default values can enable index usage
Avoiding != and <> Operators
These operators often disable index usage Require full table scans
MySQL Character Sets
GBK: 2-byte, Chinese support UTF-8: 3-byte, international standard latin1: 1-byte, default MySQL charset utf8mb4: 4-byte, complete UTF-8 compatibility
Basic Functions
LEFT(), LOWER(), LPAD(), LTRIM(), MID(), POSITION(), REPEAT(), REPLACE(), REVERSE(), RIGHT(), RPAD(), RTRIM(), SPACE(), STRCMP(), SUBSTR(), SUBSTRING(), SUBSTRING_INDEX(), TRIM(), UCASE(), UPPER()
String Concatenation
CONCAT(): Merge strings CONCAT_WS(): Merge with separator
Character Operations
ASCII(): Get first character's ASCII CHAR_LENGTH(): Return character count
Number Operations
FORMAT(): Format numbers with decimal places INSERT(): Replace substring LOCATE(): Find substring position
Mathematical Functions
ABS(), ACOS(), ASIN(), ATAN(), CEIL(), EXP(), FLOOR(), ROUND(), SIGN(), SIN(), TAN(), POW()
Date Functions
DATE_FORMAT(), STR_TO_DATE(), DATE_ADD(), DATE_SUB(), NOW(), SYSDATE(), CURDATE(), CURTIME(), CURTIMESTAMP()
Operators
Arithmetic: +, -, *, /, % Comparison: >, <, =, !=, >=, <= Logical: AND, OR, NOT Bitwise: &, |, ^, <<, >>, >>>
Query Execution Process
- Client sends query to server
- Server checks query cache
- Parser analyzes and validates syntax
- Preprocessor verifies table/column existence and permissions
- Execution engine retrieves data via storage engines
- Results returned and cached
Database Design Standards
- Use lowercase names with underscores
- Avoid reserved keywords
- Keep names descriptive within 32 characters
- Temp tables: tmp_*_YYYYMMDD
- Backup tables: bak_*_timestamp
- Consistent column definitions
Three-Layer Architecture
Layer 1: Connection management, authentication, security Layer 2: Query parsing, optimization Layer 3: Storage engines for data handling
MyISAM vs InnoDB
| Feature | MyISAM | InnoDB |
|---|---|---|
| Transactions | No | Yes |
| Locking | Table-level | Row-level |
| Foreign Keys | No | Yes |
| Primary Key | Not required | Required |
| Full-text Search | Supported | Plugin support |
| Crash Recovery | No | Yes |
| Buffering | Key buffer | Own buffer pool |
| Row Count | Full scan | Stored |
| Use Case | Read-heavy | Write-heavy |
InnoDB Features
- ACID transaction support
- Foreign key constraints
- Row-level locking
- Multi-Version Concurrency Control (MVCC)
- Buffer pool for caching
- Online hot backup
- Supports large datasets
Clustered Index Structure
InnoDB uses clustered indexes where leaf nodes contain primary keys, transaction IDs, rollback pointers, and data columns.
Secondary indexes store primary keys instead of row pointers.
Best practice: Insert in primary key order for optimal performance.
MyISAM Structure
Data stored in insertion order Index nodes contain row numbers
Oracle Data Storage
Data organized in tablespaces containing data files Tables store rows as row fragments
Oracle Logical Block Structure
Block header, row data, available space
Auto-increment Behavior
With InnoDB, auto-increment ID stored in memory After restart, deleted max ID is lost With MyISAM, ID persists after restart
ACID Properties
Atomicity: All-or-nothing transactions Consistency: Data integrity maintained Isolation: Concurrent transactions don't interfere Durability: Committed changes persist
MVCC Explanation
Multi-Version Concurrency Control allows concurrent reads without blocking writes Each transaction sees consistent snapshot of data
Truncate vs Delete
Truncate faster, less logging, resets auto-increment Delete slower, logs operations, maintains auto-increment
Batch Delete Considerations
Prevents accidental mass deletion Improves execution efficiency Avoids long-running transactions Reduces CPU load Prevents lock timeouts
IN vs EXISTS
IN: Subquery executes once, returns all results EXISTS: Checks for existence per row, can use indexes
Pagination Implementation
MySQL:
SELECT * FROM user LIMIT 10 OFFSET 0;
SELECT * FROM user LIMIT 10 OFFSET 10;
Oracle 12c+:
SELECT * FROM employees ORDER BY employee_id OFFSET 10 ROWS FETCH NEXT 10 ROWS ONLY;
Pagination Advantages
- Reduced network traffic
- Faster page loading
- Memory efficiency
- Improved performance with large datasets
Pagination Disadvantages
- Data inconsistency with real-time updates
- Sorting complexity
- Performance impact with large data
- Complex implementation
Pagination Optimization
- Use indexes on sort columns
- Avoid full table scans
- Optimize SQL statements
- Implement caching
- Use database partitioning
- Apply lazy loading
- Use pagination plugins
MySQL Caching Mechanisms
- Query cache: Stores SELECT results
- Table cache: Caches table metadata
- Key buffer: MyISAM index caching
- InnoDB buffer pool: Data and index caching
Cache Hit Detection
MySQL compares exact SQL text and client info Even minor differences cause cache misses
Query Cache Impact
Adds overhead for read/write operations Invalidates all related table caches on write
InnoDB and Query Cache
InnoDB controls cache access based on transaction IDs Locks prevent cache access Transaction commits invalidate table caches
Cache Tables vs Summary Tables
Cache tables store frequently accessed data Summary tables aggregate grouped data
Primary Key Uniqueness
Auto-increment
Simple, efficient, but distributed issues
UUID
Global uniqueness, but poor performance and storage
Unique Constraints
Flexible for non-primary key columns
InnoDB Primary Key Requirements
Every InnoDB table must have a primary key Use auto-increment IDs Avoid updating frequent columns
Distributed Primary Key Solutions
- Database sequences
- UUID generation
- Global ID services (Snowflake)
- Middleware solutions (Redis/Zookeeper)
- Business logic combinations
- Sharding strategies
Database Design Principles
- Allocate significant time to design
- Consider system interactions
- Plan for performance optimization
- Add necessary redundancy
- Design proper relationships
- Avoid premature constraints
- Choose appropriate primary key strategy
Design Principles
- Separate business logic from data
- Implement security controls
- Plan for performance needs
- Consider data growth patterns
- Design backup strategies
- Define object relationships
- Clarify table dependencies
- Handle many-to-many relations
- Follow naming conventions
- Separate data types
- Minimize stored procedures
Views
Virtual tables created from SELECT statements Used for security and simplification
Materialized Views
Pre-computed views with actual data storage Not natively supported in MySQL
Cache and Summary Tables
Cache tables: Fast-access slow-data Summary tables: Aggregated grouped data
Counter Tables
Maintain counters in separate tables Use multiple slots for concurrency
Oracle Stored Procedures
CREATE, EXECUTE, parameter passing, modification, deletion
Stored Procedure Best Practices
- Proper variable declaration
- Exception handling
- Transaction management
- Performance optimization
- Permission control
- Logging capabilities
Stored Procedure Optimization
- Minimize loops
- Use temporary tables
- Reduce cursor usage
- Short transactions
- Error handling
- Optimize queries
- Batch processing
Stored Procedures vs Functions
Procedures: Multiple parameters, DML operations, transaction control Functions: Single return value, read-only, no transaction control
Triggers
Automatic execution on INSERT/UPDATE/DELETE Used for consistency, auditing, validation
Index Limitations
Generally limit to 5 indexes per table Index overhead affects write performance
Index Structure
B-Tree structures provide ordered data storage Leaf nodes contain data or pointers
Gender Indexing
Low selectivity makes gender poor index candidate
Indexing Guidelines
- Avoid frequent update fields
- Skip small tables
- Skip high duplicate fields
- Avoid BLOB/TEXT fields
- Avoid LIKE patterns starting with %
Index Failure Prevention
- Use composite indexes properly
- Avoid functions on indexed columns
- Avoid != and IS NULL conditions
- Use covering indexes
- Avoid leading % in LIKE
- Avoid type conversions
- Avoid OR conditions
Foreign Key Considerations
Foreign keys add overhead during updates Potential for deadlocks
OR Condition Issues
May cause index failures Full table scans
Deadlock Resolution
Check SHOW ENGINE INNODB STATUS Reduce transaction duration Adjust isolation levels Ensure consistent lock ordering Use appropriate lock granularity
Duplicate Data Detection
SELECT column_name, COUNT(column_name)
FROM table_name
GROUP BY column_name
HAVING COUNT(column_name) > 1;
Index Types
- Regular indexes
- Unique indexes
- Primary key indexes
- Full-text indexes
- Clustered indexes
- Non-clustered indexes
Composite Index Usage
- Full index match triggers index
- Leftmost prefix required
- OR conditions disable index
Full-Text Index
Designed for text searching Created with FULLTEXT KEY
Clustered Index
Data physically stored in index order Only one per table
Clustered Index Pros/Cons
Pros: Fast range queries, efficient data access Cons: Insert performance, fragmentation, secondary index overhead
Non-Clustered Index
Separate data storage Leaf nodes contain pointers to data
Covering Index
All required columns in index Avoids table lookups
Covering Index Benefits
Reduced I/O, better caching, improved range queries
Index Creation Principles
- Leftmost prefix matching
- High selectivity columns
- Columns in WHERE, JOIN, ORDER BY
Composite Index Left Matching
Index (k1,k2,k3) creates (k1), (k1,k2), (k1,k2,k3)
Index Order Importance
Queries must start from leftmost column
Index Data Structures
B+ trees are most common Leaf nodes contain data or pointers
B+ Tree Advantages
Ordered data, efficient range queries, reduced I/O
B+ Tree Leaf Connections
Chain structure enables sequential range access
B+ Tree Non-Leaf Connections
Pointer links to child nodes
Student Score Query
SELECT student.name
FROM student
WHERE NOT EXISTS (
SELECT course_name
FROM score
WHERE score.student_id = student.id AND score.score <= 80
);
Alternative:
SELECT student_name
FROM scores
GROUP BY student_name
HAVING MIN(score) > 80;
Duplicate Record Removal
SELECT name, age, class, COUNT(*) as count
FROM students
GROUP BY name, age, class
HAVING COUNT(*) > 1;
Transaction Isolation Levels
- READ UNCOMMITTED: Dirty reads possible
- READ COMMITTED: No dirty reads
- REPEATABLE READ: No non-repeatable reads
- SERIALIZABLE: Complete isolation
Choosing Isolation Level
Balance performance vs consistency Consider system requirements
Concurrent Transactions
Issues: Lost updates, dirty reads, non-repeatable reads, phantom reads Solutions: Locking mechanisms, isolation levels
MySQL Lock Types
Table locks, row locks, page locks Optimistic vs pessimistic locking
Page-Level Locks
Used in large transactions and partitioned tables
InnoDB Locks
Shared/exclusive locks, intention locks, gap locks, auto-inc locks
Data Pages vs Rows
Pages: 16KB storage units Rows: Actual data records
Data Page Purpose
Reduce I/O, cache optimization, linked structure
Slow Query Optimization
- Analyze execution plans
- Create/modify indexes
- Optimize data types
- Replace subqueries with JOINs
- Limit result sets
- Avoid functions in WHERE clauses
- Reduce JOIN operations
- Use temporary tables
- Avoid leading wildcards in LIKE
- Batch operations
- Adjust MySQL settings
- Partition tables
- Maintain tables regularly
- Use summary statistics
Large Table Optimization
- Normalize structure
- Optimize indexes
- Use partitioning
- Adjust configuration
- Archive old data
Order By Optimization
Single vs double pass sorting Adjust buffer sizes Avoid SELECT * Use proper index order
Join Optimization
Small table drives large table Proper indexing
Query Cost Metrics
Response time, scanned rows, returned rows Access methods: ALL, INDEX, RANGE, UNIQUE, CONST
Explain Analysis
ID, SELECT_TYPE, TABLE, TYPE, POSSIBLE_KEYS, KEY, KEY_LEN, REF, ROWS, EXTRA
Explain Optimization
Use indexes effectively Minimize scanned rows Optimize key lengths
Large Dataset Optimization
Indexing, sharding, read-write separation, caching, SQL tuning
Sharding Strategies
UUID, DB auto-increment, Redis IDs, Snowflake algorithm, shard splitting
Sharding Problems
Join limitations, transaction complexity, sorting, consistency
Concurrent Updates
Pessimistic locking with SELECT FOR UPDATE Optimistic locking with version fields
SQL Execution Process
Write order: SELECT DISTINCT ... FROM ... JOIN ... ON ... WHERE ... GROUP BY ... HAVING ... ORDER BY ... LIMIT ... Parse order: FROM ... ON ... JOIN ... WHERE ... GROUP BY ... HAVING ... SELECT DISTINCT ... ORDER BY ... LIMIT ...
Explain Understanding
Provides execution plan details Helps identify performance bottlenecks
Explain Usage
Driver table identification Index triggering analysis
Explain Optimization
Reduce ROWS count Optimize KEY_LEN
Million Record Table Optimization
Indexing, sharding, read-write separation, caching, SQL tuning
Sharding Middleware
ShardingSphere, MyCat
Master-Slave Delay
Monitor with SHOW SLAVE STATUS Optimize queries Use multi-threaded replication Implement read-write separation Use semi-synchronous replication
Performance Analysis Tools
EXPLAIN, PROFILING, SHOW STATUS, slow query log, mysqldumpslow, pt-query-digest
MySQL Architecture
Connection layer, service layer, storage engine layer
Binlog Formats
STATEMENT: Logs SQL statements ROW: Logs row changes MIXED: Automatic format selection
Oracle Partitioning
Divide large tables into manageable pieces
Partitioning Benefits
Improved performance, fault tolerance, maintenance
Partitioning Types
Range, list, hash, composite
Partitioning Issues
Index mismatch, high partition costs, table locking overhead
Partitioning Guidelines
Same storage engine, avoid unsupported features