Comprehensive Database Knowledge Points Summarized From Years of Development Practice
Storage Engines
InnoDB
InnoDB is the default transactional storage engine for MySQL, and you only need to switch to other engines when you require features it does not support. It uses MVCC to enable high concurrency, and implements all four standard SQL isolation levels: read uncommitted, read committed, repeatable read, and serializable. Its default isolation level is repeatable read, where it prevents phantom reads through the combination of MVCC and Next-Key Locking.
It uses clustered primary indexes, storing actual data directly in index nodes to avoid random disk reads, which delivers excellent performance for primary key-based queries. Multiple internal optimizations are built into InnoDB, including predictive read-ahead when fetching data from disk, adaptive hash indexes that automatically build hash indexes in memory to speed up read operations, and insert buffers that accelerate bulk write operations. It also supports true online hot backups, a feature no other built-in MySQL storage engine offers; other engines require stopping all write operations to get a consistent view, which can also block read operations in mixed read-write workloads.
MyISAM
This engine has a simple design, storing data in a compact format. It is still suitable for use cases with read-only data, small table sizes, or scenarios where you can tolerate manual repair operations after crashes. It supports multiple specialized features including compressed tables and spatial data indexes.
It does not support transactions or row-level locking, only full-table level locking: read operations add shared locks to all accessed tables, while write operations add exclusive locks. However, it supports concurrent insert, which allows new records to be added to the table even while read operations are running.
You can run manual or automatic check and repair operations on MyISAM tables, but unlike transactional crash recovery, these operations may lead to partial data loss and are extremely slow for large tables. If the DELAY_KEY_WRITE option is enabled, modified index data is not written directly to disk after each update, but stored in an in-memory key buffer first. Index blocks are only flushed to disk when the key buffer is cleaned up or the table is closed. This greatly improves write performance, but index corruption can occur if the database or host crashes, requiring manual repair.
InnoDB vs MyISAM
- Transaction support: InnoDB is a transactional engine, supporting COMMIT and ROLLBACK statements, while MyISAM has no transaction capability.
- Concurrency handling: MyISAM only supports table-level locking, while InnoDB also supports fine-grained row-level locking for higher concurrency.
- Foreign key constraints: InnoDB fully supports foreign key constraints to guarantee referential integrity, which MyISAM does not.
- Backup capability: InnoDB supports online hot backups without stopping service, while MyISAM requires stopping writes to take consistent backups.
- Crash recovery: MyISAM tables have a much higher probability of corruption after a crash, and recovery takes far longer than InnoDB.
- Specialized features: MyISAM supports compressed tables and spatial data indexes; InnoDB added comparable support in newer MySQL versions, but MyISAM's implementations remain more mature for some legacy use cases.
Index Fundamentals
B+ Tree Principles
Data Structure
B-Tree refers to balanced multi-way search trees, where all leaf nodes are at the same level to guarantee consistent lookup performance. B+ Tree is a variant of B-Tree, implemented by adding sequential access pointers between leaf nodes, and is widely used in database and operating system file systems.
It has two types of nodes: internal (index) nodes and leaf nodes. Internal nodes only store index keys, not actual data, which is stored entirely in leaf nodes. Keys in internal nodes are sorted in ascending order: for any key in an internal node, all keys in its left subtree are smaller than it, and all keys in the right subtree are greater than or equal to it. Leaf node records are also sorted in ascending order, and each leaf node stores pointers to adjacent leaf nodes to support fast range traversal.
Core Operations
Lookup: The lookup process is similar to binary search trees, starting from the root node and traversing downwards. For each node, binary search is used to find the matching sub-pointer, until the target leaf node is reached.
Insert:
- First perform a lookup to find the leaf node (bucket) where the new record should be inserted.
- If the leaf node has remaining space (fewer than b-1 entries, where b is the maximum number of elements per node; node size is usually aligned to the operating system page size), add the record directly.
- If the leaf node is full:
  - Split the node into two: the original node retains the first ceil((L+1)/2) entries, and the new node stores the remaining floor((L+1)/2) entries, where L is the original number of entries in the full node.
  - Push the middle key to the parent internal node, and add a pointer to the new node in the parent.
  - Repeat the split process upwards if the parent node also becomes full after insertion.
- If the root node is split, create a new root node pointing to the two split nodes, so B-Trees grow from the root rather than the leaves.
Delete: Similar to the insert process, but uses bottom-up node merging instead of splitting when node occupancy falls below the minimum threshold.
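The leaf-level mechanics above (sorted insert, node split, linked-leaf traversal) can be sketched in Python. This is an illustrative toy under stated assumptions, not a full B+ Tree: there are no internal nodes, `ORDER` is tiny where a real node would match a disk page, and `Leaf`, `insert`, `split`, and `range_scan` are names invented for this sketch.

```python
import bisect
from math import ceil

ORDER = 4  # max entries per leaf; real nodes are sized to a disk page

class Leaf:
    def __init__(self, keys=None):
        self.keys = keys or []
        self.next = None  # sequential pointer to the right sibling

def split(leaf):
    """Split an overfull leaf: the original keeps the first ceil((L+1)/2)
    entries, the new right sibling takes the rest (the rule stated above)."""
    mid = ceil(len(leaf.keys) / 2)
    right = Leaf(leaf.keys[mid:])
    leaf.keys = leaf.keys[:mid]
    right.next = leaf.next
    leaf.next = right
    return right.keys[0], right  # separator key to push into the parent

def insert(leaf, key):
    bisect.insort(leaf.keys, key)  # keep keys in ascending order
    if len(leaf.keys) > ORDER:
        return split(leaf)
    return None

def range_scan(leaf, lo, hi):
    """Range queries walk the leaf chain instead of re-traversing the tree."""
    out = []
    while leaf is not None:
        for k in leaf.keys:
            if k > hi:
                return out
            if k >= lo:
                out.append(k)
        leaf = leaf.next
    return out

leaf = Leaf()
for k in [10, 20, 5, 15, 25]:
    result = insert(leaf, k)
    if result is not None:
        separator, new_leaf = result  # would be inserted into the parent node
```

Inserting the fifth key overfills the leaf and triggers a split into [5, 10, 15] and [20, 25], with separator 20 pushed up; `range_scan` then collects keys across both leaves via the sibling pointer.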
Common Tree Structure Characteristics
AVL Tree: A strictly balanced binary search tree, where the height difference between left and right subtrees of any node cannot exceed 1. Balance is maintained through rotation operations when insertion or deletion breaks the balance condition. Since rotation operations have high overhead, AVL trees are suitable for scenarios with few insert/delete operations but frequent lookups.
Red-Black Tree: A quasi-balanced binary search tree that uses color constraints on nodes to ensure that no path from root to leaf is more than twice as long as any other path. It requires fewer rotation operations to maintain balance than AVL trees, making it more suitable for scenarios with frequent insert/delete operations. Skip lists are used as an alternative to red-black trees in some systems such as Redis, as they are easier to implement and have comparable performance.
B/B+ Tree: Multi-way search trees with high fanout, which greatly reduce the number of disk I/O operations required for lookups, making them ideal for database storage systems.
B+ Tree vs Red-Black Tree for Indexes
While red-black trees and other balanced binary trees can be used to implement indexes, B+ Trees are universally adopted in file systems and database systems for two main reasons:
- Lower disk I/O count: Each B+ Tree node can store multiple elements, resulting in a much smaller tree height than binary trees, so fewer disk accesses are needed to reach the target leaf node.
- Compatibility with disk read-ahead: Disks use pre-fetching to reduce I/O operations, reading consecutive blocks of data (multiples of page size) in a single sequential read without additional seek overhead. Database systems set the size of each B+ Tree node to match the operating system page size, so a single I/O operation can load an entire node into memory.
B+ Tree vs B Tree
- Lower disk I/O overhead: B+ Tree internal nodes do not store pointers to actual data, so they are smaller than B Tree internal nodes. A single disk block can hold more index keys, reducing the number of I/O operations needed to load relevant index entries.
- More stable query performance: All lookups in B+ Trees must traverse from root to leaf node, so the path length for all key lookups is the same, leading to consistent query latency. In B Trees, internal nodes can store data, so some lookups can end earlier, leading to variable performance.
- Higher range query efficiency: B+ Trees support full traversal of all data by simply traversing the linked list of leaf nodes, which is extremely efficient for range queries common in database workloads. B Trees require complex in-order traversal to achieve the same result, which has much lower performance.
MySQL Index Implementation
Indexes are implemented at the storage engine layer rather than the MySQL server layer, so different storage engines support different index types and implementations.
B+ Tree Index
This is the default index type for most MySQL storage engines. It eliminates the need for full table scans, replacing them with tree searches that have much lower latency. The ordered nature of B+ Trees also makes them suitable for sorting and grouping operations. You can define multiple columns as index keys to form a composite index. It supports full key value lookups, key range lookups, and leftmost prefix lookups for composite indexes; the index cannot be used if the query conditions do not start from the leftmost column of the index.
InnoDB B+ Tree indexes are divided into primary indexes and secondary indexes. The leaf nodes of primary indexes store complete data rows, which is called a clustered index. A table can only have one clustered index, as data rows cannot be stored in two locations. The leaf nodes of secondary indexes store primary key values, so lookups using secondary indexes require first fetching the primary key, then looking up the corresponding data row in the primary index, a process called a table return lookup.
Hash Index
Hash indexes support O(1) time lookups, but lose the ordered property of tree indexes: they cannot be used for sorting or grouping operations, and only support exact match lookups, not partial or range queries. InnoDB has a built-in feature called adaptive hash index, which automatically creates a hash index on top of frequently accessed B+ Tree index entries to get the benefit of fast hash lookups while retaining the benefits of B+ Trees.
Full-Text Index
MyISAM supports full-text indexes for keyword searches in text content, rather than exact equality comparisons. Queries use MATCH AGAINST syntax instead of standard WHERE clauses. Full-text indexes are implemented using inverted indexes, which map keywords to the documents that contain them. InnoDB added support for full-text indexes in version 5.6.4.
Spatial Data Index
MyISAM supports R-Tree spatial data indexes for geographic data storage, indexing data from all dimensions to support efficient combined queries across any dimension. GIS-related functions are required to maintain spatial data.
Index Optimization Practices
Independent Column Rule
Index columns cannot be part of an expression or parameter of a function in a query, otherwise the index cannot be used. For example, the following query cannot use the index on the user_id column:
SELECT user_id FROM user WHERE user_id + 2 = 7;
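The effect is easy to reproduce with SQLite's EXPLAIN QUERY PLAN (used here as a stand-in for MySQL's EXPLAIN; the table and index names are invented for the demo): the optimizer reports an index SEARCH for the bare column, but falls back to a scan when the column is wrapped in an expression.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (user_id INTEGER)")
conn.execute("CREATE INDEX idx_user_id ON user (user_id)")
conn.executemany("INSERT INTO user VALUES (?)", [(i,) for i in range(100)])

def plan(sql):
    # The fourth column of EXPLAIN QUERY PLAN output describes the access path
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Expression on the indexed column: the optimizer cannot seek the index
expr_plan = plan("SELECT user_id FROM user WHERE user_id + 2 = 7")
# Equivalent rewritten predicate: an index SEARCH
direct_plan = plan("SELECT user_id FROM user WHERE user_id = 5")
print(expr_plan)
print(direct_plan)
```

The fix is always to move the arithmetic to the constant side, i.e. rewrite the condition as user_id = 5.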
Composite Indexes
When queries use multiple columns as filter conditions, using a composite multi-column index delivers better performance than multiple separate single-column indexes. For example, the following query works best with a composite index on (user_id, order_id):
SELECT order_id, user_id FROM user_order WHERE user_id = 42 AND order_id = 1098;
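The same SQLite EXPLAIN QUERY PLAN technique (again a stand-in for MySQL's EXPLAIN, with invented table and index names) also shows the leftmost-prefix rule on a composite index: filtering on both columns seeks the index, while filtering only on the second column cannot use it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_order (user_id INTEGER, order_id INTEGER, amount REAL)")
conn.execute("CREATE INDEX idx_user_order ON user_order (user_id, order_id)")
conn.executemany("INSERT INTO user_order VALUES (?, ?, 1.0)",
                 [(u, o) for u in range(10) for o in range(10)])

def plan(sql):
    # The fourth column of EXPLAIN QUERY PLAN describes the chosen access path
    return " ".join(r[3] for r in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Both predicates fall inside the composite index: an index SEARCH
both_cols = plan("SELECT amount FROM user_order WHERE user_id = 4 AND order_id = 2")
# Filtering only on the second column breaks the leftmost-prefix rule: a scan
second_only = plan("SELECT amount FROM user_order WHERE order_id = 2")
print(both_cols)
print(second_only)
```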
Index Column Order
Place columns with the highest selectivity first in composite indexes. Index selectivity is the ratio of distinct index values to the total number of records, with a maximum value of 1 for unique indexes. Higher selectivity means better distinguishability of records, leading to higher query efficiency. For example, in the following result, buyer_id has higher selectivity than seller_id, so buyer_id should be placed first in a composite index:
SELECT
COUNT(DISTINCT seller_id)/COUNT(*) AS seller_selectivity,
COUNT(DISTINCT buyer_id)/COUNT(*) AS buyer_selectivity,
COUNT(*) AS total_records
FROM transaction;
Sample output:
seller_selectivity: 0.0002
buyer_selectivity: 0.0412
total_records: 19872
Prefix Indexes
For BLOB, TEXT, and VARCHAR columns, you can use prefix indexes that only index the first N characters of the column. The prefix length should be chosen based on the index selectivity to balance performance and storage overhead.
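A practical way to pick N is to compute selectivity for increasing prefix lengths until it approaches the full column's selectivity. A sketch with invented sample data (`prefix_selectivity` is a name made up for this example):

```python
def prefix_selectivity(values, n):
    """Fraction of rows distinguishable when only the first n characters are indexed."""
    return len({v[:n] for v in values}) / len(values)

# Hypothetical sample of column values
emails = ["alice@a.com", "alicia@b.com", "bob@c.com", "bree@d.com", "carol@e.com"]
full = prefix_selectivity(emails, max(map(len, emails)))  # full-column selectivity
picks = {n: prefix_selectivity(emails, n) for n in (1, 3, 5)}
```

In MySQL the same measurement is SELECT COUNT(DISTINCT LEFT(col, N))/COUNT(*) FROM t; choose the smallest N whose selectivity is close to the full column's.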
Covering Indexes
An index that contains all fields required for a query is called a covering index, which eliminates the need for table return lookups. It has the following advantages:
- Indexes are much smaller than full data rows, so reading only indexes greatly reduces data access volume.
- Some storage engines like MyISAM only cache indexes in memory, while data is cached by the operating system, so accessing only indexes avoids expensive system calls.
- For InnoDB, if a secondary index covers the query, there is no need to access the clustered primary index, reducing I/O overhead.
Index Tradeoffs
Advantages of Indexes
- Greatly reduce the number of rows the database server needs to scan.
- Avoid sorting and grouping operations that require creating temporary tables, as B+ Tree indexes are ordered and can be used directly for ORDER BY and GROUP BY.
- Convert random I/O to sequential I/O, as ordered B+ Tree indexes store adjacent data together, reducing disk seek overhead.
Applicable Scenarios
- For very small tables, full table scans are usually more efficient than building and using indexes, as the overhead of index traversal and possible table returns outweighs the benefit of avoiding full scans. This assumes the query does not use a covering index or primary key lookup, which would avoid table return operations and may still be faster.
- Indexes deliver significant performance benefits for medium to large tables.
- For extremely large tables, the cost of building and maintaining indexes grows rapidly. In such cases, techniques like partitioning can be used to directly isolate queryable data subsets instead of matching records one by one.
Query Performance Optimization
EXPLAIN Analysis
Use EXPLAIN to analyze SELECT queries, to identify optimization points in query statements. Key fields in EXPLAIN output:
- select_type: Indicates the query type; common values include SIMPLE for simple queries, UNION for union queries, SUBQUERY for subqueries, etc.
- table: The table the current query row accesses.
- possible_keys: The list of indexes that MySQL can choose from for the query.
- key: The actual index MySQL selected for the query.
- rows: The estimated number of rows MySQL needs to scan to execute the query.
- type: The index access type, ordered from best to worst performance:
  - system: The table has only one row, a special case of const.
  - const: Query using a primary key or unique index that matches exactly one row. Example:
    SELECT * FROM product WHERE product_id = 256;
  - eq_ref: Used in join queries, when joining using a primary key or unique index that matches exactly one row per join.
  - ref: Query using a non-unique index for exact match lookups.
  - range: Range scan using a primary key, single-column secondary index, or the last column of a composite secondary index, for operations like BETWEEN, IN, >, <. Example:
    SELECT * FROM order WHERE create_time BETWEEN '2024-01-01' AND '2024-01-31';
  - index: Scans the entire index tree, which is faster than a full table scan as indexes are smaller than full data; used for covering index queries or primary key sorting.
  - all: Full table scan, no index used, worst performance.
Optimize Data Access
Reduce requested data volume:
- Only fetch required columns; avoid SELECT * statements.
- Limit the number of returned rows using LIMIT clauses.
- Cache frequently accessed repeated query results to avoid repeated database queries, especially for static data that rarely changes.
Reduce scanned rows: The most effective way is to use covering indexes to avoid table return operations.
Refactor Query Patterns
Split large queries: Running a single large query can lock a large number of rows, fill up transaction logs, exhaust system resources, and block smaller critical queries. For example, instead of running a single bulk delete:
DELETE FROM operation_log WHERE create_time < DATE_SUB(NOW(), INTERVAL 6 MONTH);
Split it into batches:
# Delete in fixed-size batches; pausing between batches releases
# locks sooner and keeps the transaction log small
while True:
    affected_rows = execute_sql(
        "DELETE FROM operation_log WHERE create_time < DATE_SUB(NOW(), INTERVAL 6 MONTH) LIMIT 5000"
    )
    if affected_rows == 0:
        break
Decompose large join queries: Split a large multi-table join into multiple single-table queries, then associate the results in the application layer. Benefits include:
- More efficient cache usage: For join queries, if any of the joined tables changes, the entire query cache is invalidated. For split queries, changes to one table only invalidate the cache for that query, while other query caches remain valid.
- The results of single-table queries are more likely to be reused by other queries, reducing redundant data fetching.
- Reduces lock contention on database tables.
- Makes database sharding easier, improving system scalability.
- Can improve query performance by using IN queries that fetch records in ID order, which is more efficient than random join lookups.
For example, instead of:
SELECT * FROM category
JOIN post_category ON post_category.category_id = category.id
JOIN post ON post_category.post_id = post.id
WHERE category.name = 'database';
Use:
SELECT * FROM category WHERE name = 'database';
-- Assume category_id 17 is returned
SELECT * FROM post_category WHERE category_id = 17;
-- Assume post IDs [12,45,78,93,112] are returned
SELECT * FROM post WHERE id IN (12,45,78,93,112);
Transaction Fundamentals
A transaction is a set of operations that satisfy ACID properties, which can be committed using COMMIT or rolled back using ROLLBACK.
ACID Properties
- Atomicity: All operations in a transaction are treated as an indivisible unit, either all succeed or all fail and are rolled back.
- Consistency: The database remains in a consistent state before and after transaction execution, with all transactions reading the same value for the same data.
- Isolation: Modifications made by a transaction are invisible to other transactions until it is committed.
- Durability: Once a transaction is committed, its modifications are permanently stored in the database, even if the system crashes.
ACID Relationship
- Consistency is the ultimate goal of transaction execution, and the result of a transaction is correct only if consistency is satisfied.
- In non-concurrent scenarios where transactions are executed serially, isolation is automatically satisfied. In this case, satisfying atomicity guarantees consistency. In concurrent scenarios, transactions must satisfy both atomicity and isolation to guarantee consistency.
- Durability ensures that transaction results are not lost in case of database crashes.
Isolation Levels
- Read Uncommitted: Modifications made by a transaction are visible to other transactions even before it is committed.
- Read Committed: A transaction can only read modifications made by already committed transactions, so uncommitted changes are invisible to other transactions.
- Repeatable Read: Guarantees that multiple reads of the same data within the same transaction return the same result.
- Serializable: Forces all transactions to execute serially, requiring locking to implement, while other isolation levels usually do not require explicit locking for read operations.
| Isolation Level | Dirty Read | Non-Repeatable Read | Phantom Read |
|---|---|---|---|
| Read Uncommitted | Yes | Yes | Yes |
| Read Committed | No | Yes | Yes |
| Repeatable Read | No | No | Yes |
| Serializable | No | No | No |
Locking Mechanisms
Locking is a key feature that distinguishes database systems from file systems, used to manage concurrent access to shared resources.
Lock Types
- Shared Lock (S Lock): Allows a transaction to read a row of data.
- Exclusive Lock (X Lock): Allows a transaction to modify or delete a row of data.
- Intention Shared Lock (IS Lock): Indicates that a transaction intends to acquire shared locks on some rows of a table.
- Intention Exclusive Lock (IX Lock): Indicates that a transaction intends to acquire exclusive locks on some rows of a table.
MVCC
Multi-Version Concurrency Control is the mechanism used by InnoDB to implement the Read Committed and Repeatable Read isolation levels. Read Uncommitted always reads the latest data row and does not require MVCC, while Serializable requires locking all read rows and cannot be implemented with MVCC alone.
Core Concepts
- Version Numbers: System version number, an incrementing number that increases automatically each time a new transaction starts. Transaction version number, the system version number at the start of the transaction.
- Hidden Columns: Each data row has two hidden columns to store version information: create version number, the system version number when the data row snapshot was created; delete version number, the system version number when the data row was marked as deleted (if any).
- Undo Log: Snapshots used by MVCC are stored in Undo logs, which form a linked list of all snapshots of a data row through rollback pointers.
Implementation for Repeatable Read Isolation Level
- SELECT: A transaction can only read data row snapshots where the create version number is less than or equal to the transaction version number, and the delete version number is either undefined or greater than the transaction version number. This ensures that the transaction only reads data that existed before it started, and has not been deleted before or during the transaction.
- INSERT: Set the create version number of the new data row to the current system version number.
- DELETE: Set the delete version number of the target data row to the current system version number.
- UPDATE: Mark the existing data row as deleted by setting its delete version number to the current system version number, then insert a new data row with the create version number set to the current system version number.
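The SELECT visibility rule above condenses to two comparisons. A simplified model (the `Row` and `visible` names are invented, and InnoDB's actual read-view and undo-chain details are deliberately omitted):

```python
class Row:
    def __init__(self, value, create_version, delete_version=None):
        self.value = value
        self.create_version = create_version
        self.delete_version = delete_version

def visible(row, txn_version):
    """Repeatable Read SELECT rule: the snapshot was created at or before the
    transaction started, and was not deleted before or during it."""
    created_in_time = row.create_version <= txn_version
    not_yet_deleted = row.delete_version is None or row.delete_version > txn_version
    return created_in_time and not_yet_deleted

rows = [
    Row("existing", create_version=3),                   # existed before txn 5
    Row("deleted", create_version=2, delete_version=4),  # removed before txn 5
    Row("future", create_version=7),                     # created by a later txn
]
seen = [r.value for r in rows if visible(r, txn_version=5)]
```

A transaction with version 5 sees only the "existing" row: the deleted snapshot fails the delete-version check, and the future snapshot fails the create-version check.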
Snapshot Read vs Current Read
- Snapshot read: The default SELECT operation in MVCC reads data from snapshots without requiring locking, improving concurrency.
- Current read: Operations that modify data (INSERT, UPDATE, DELETE) read the latest version of data and require locking. You can also force a SELECT to use current read by adding locking clauses:
-- Add shared lock
SELECT * FROM product WHERE id = 12 LOCK IN SHARE MODE;
-- Add exclusive lock
SELECT * FROM product WHERE id = 12 FOR UPDATE;
Isolation levels define the behavior of current reads, and MySQL introduces snapshot reads to avoid locking for SELECT operations, reducing lock wait time and improving concurrency.
Locking Algorithms
- Record Lock: Locks the index entry for a single record, not the record itself. If a table has no explicit index, InnoDB automatically creates a hidden clustered index on the primary key, so Record Locks still work.
- Gap Lock: Locks the gap between index entries, not the index entries themselves, to prevent insertions into the gap. For example, if a transaction executes SELECT c FROM t WHERE c BETWEEN 10 AND 20 FOR UPDATE, other transactions cannot insert a value of 15 into column c.
- Next-Key Lock: A combination of Record Lock and Gap Lock, which locks both index entries and the gaps before them. For an index with values 10, 11, 13, 20, Next-Key Locks cover the intervals: (-∞, 10], (10, 11], (11, 13], (13, 20], (20, +∞).
In InnoDB, non-repeatable read for SELECT operations is solved by MVCC, non-repeatable read for UPDATE/DELETE is solved by Record Lock, and phantom reads are solved by Next-Key Lock.
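The Next-Key intervals listed above can be derived mechanically from the index values. A small sketch (`next_key_intervals` is an invented name; the returned tuples stand for the half-open intervals (prev, key]):

```python
import math

def next_key_intervals(index_values):
    """Next-Key Lock coverage: each lock spans the gap before a key plus
    the key itself; a tuple (a, b) below stands for the interval (a, b]."""
    bounds = [-math.inf] + sorted(index_values) + [math.inf]
    return list(zip(bounds, bounds[1:]))

intervals = next_key_intervals([13, 10, 20, 11])
# intervals for the example index 10, 11, 13, 20, including the
# open-ended range above the largest key
```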
Common Lock Problems
- Dirty Read: A transaction reads uncommitted modifications made by another transaction. If the other transaction rolls back, the read data is invalid.
- Non-Repeatable Read: A transaction reads the same data multiple times within its execution, but gets different results because another transaction modified and committed the data between reads.
- Phantom Read: A special case of non-repeatable read, where a transaction executes the same query twice, and the second query returns rows that did not exist in the first result, because another transaction inserted matching rows between the two queries.
- Lost Update: The modification of one transaction is overwritten by the modification of another transaction, because both transactions read the same value, modify it, and commit sequentially. This can be solved by adding exclusive locks to SELECT operations, but this may introduce performance overhead and should be used based on business requirements.
Database Sharding
Horizontal Sharding
Also called sharding, splits records from the same table into multiple tables with identical schema, distributed across multiple cluster nodes to reduce the load on a single database, suitable for tables with very large data volumes.
Vertical Sharding
Splits a table into multiple tables by columns, usually grouping frequently accessed columns and infrequently accessed columns into separate tables, or splitting tables by business domain into separate databases (e.g., splitting an e-commerce database into product database, user database, order database).
Sharding Strategies
- Hash modulus: hash(shard_key) % number_of_shards, evenly distributes data across shards.
- Range sharding: Shards data by range of shard key, such as ID range or time range, suitable for frequent range queries.
- Mapping table: Uses a separate mapping table to store the relationship between shard keys and shard nodes, providing maximum flexibility.
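A minimal sketch of hash-modulus routing (the `route` function is invented for illustration; CRC32 is chosen here only because a stable hash is needed, since Python's built-in hash() is salted per process):

```python
import zlib

def route(shard_key, number_of_shards):
    """hash(shard_key) % number_of_shards, using a stable hash so routing
    stays consistent across processes and restarts."""
    return zlib.crc32(str(shard_key).encode()) % number_of_shards

counts = [0] * 4
for user_id in range(10_000):
    counts[route(user_id, 4)] += 1
# counts is roughly even across the 4 shards
```

The downside of hash modulus is resharding: changing number_of_shards remaps almost every key, which is why consistent hashing or a mapping table is often preferred when the shard count must grow.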
Sharding Challenges
- Distributed transactions: Can be solved using protocols like XA, or eventual consistency patterns.
- Cross-shard joins: Can be decomposed into single-shard queries and joined at the application layer.
- Unique ID generation: Use globally unique IDs such as GUID, assign ID ranges per shard, or use distributed ID generators like Twitter's Snowflake algorithm.
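The Snowflake approach can be sketched as follows. This is a simplified sketch, not a production generator: `SnowflakeSketch` is an invented name, the layout follows the commonly described 41-bit timestamp / 10-bit worker id / 12-bit sequence split, and clock-rollback handling is deliberately omitted.

```python
import threading
import time

class SnowflakeSketch:
    """Simplified Snowflake-style ID generator:
    41-bit millisecond timestamp | 10-bit worker id | 12-bit sequence."""
    EPOCH = 1288834974657  # a fixed custom epoch in ms; any fixed value works

    def __init__(self, worker_id):
        assert 0 <= worker_id < 1024  # must fit in 10 bits
        self.worker_id = worker_id
        self.sequence = 0
        self.last_ms = -1
        self.lock = threading.Lock()

    def next_id(self):
        with self.lock:
            now = int(time.time() * 1000)
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & 0xFFF  # 12-bit wrap
                if self.sequence == 0:
                    # 4096 ids issued this millisecond: spin until the next one
                    while now <= self.last_ms:
                        now = int(time.time() * 1000)
            else:
                self.sequence = 0
            self.last_ms = now
            return ((now - self.EPOCH) << 22) | (self.worker_id << 12) | self.sequence

gen = SnowflakeSketch(worker_id=7)
ids = [gen.next_id() for _ in range(3)]  # unique and increasing
```

Because the timestamp occupies the high bits, IDs sort by creation time, which also keeps B+ Tree inserts append-mostly.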
Replication and Read-Write Separation
Master-Slave Replication
Relies on three core threads:
- Binlog thread on the master: Writes all data modification operations to the binary log.
- I/O thread on the slave: Reads the binary log from the master and writes it to the slave's relay log.
- SQL thread on the slave: Reads the relay log, parses the modification operations, and replays them on the slave to keep data consistent with the master.
Read-Write Separation
The master node handles write operations and read operations with high real-time requirements, while slave nodes handle non-real-time read operations. Benefits:
- Separates read and write workloads, greatly reducing lock contention.
- Slave nodes can use MyISAM storage engine to improve query performance and reduce system overhead.
- Increases data redundancy and improves service availability.
Read-write separation is usually implemented using a proxy layer that routes read and write requests to the appropriate nodes.
JSON Data Type Operations
Common requirements for JSON type columns include filtering queries by nested JSON fields, and fetching only partial nested fields from JSON columns to reduce memory overhead.
JSON_CONTAINS
JSON_CONTAINS(target, candidate[, path]) returns 1 if the candidate value exists at the specified path in the target JSON document, otherwise returns 0. Use JSON_CONTAINS_PATH to check if a path exists in the JSON document.
Example:
SET @data = '{"id": 1, "name": "test", "ext": {"score": 90}}';
SET @val = '90';
SELECT JSON_CONTAINS(@data, @val, '$.ext.score');
-- Returns 1
SELECT JSON_CONTAINS(@data, @val, '$.id');
-- Returns 0
JSON_CONTAINS_PATH
JSON_CONTAINS_PATH(json_doc, one_or_all, path[, path] ...) returns 1 if data exists at any (one) or all (all) specified paths, otherwise returns 0.
Example:
SET @data = '{"id": 1, "name": "test", "ext": {"score": 90}}';
SELECT JSON_CONTAINS_PATH(@data, 'one', '$.id', '$.ext.rank');
-- Returns 1 (id exists)
SELECT JSON_CONTAINS_PATH(@data, 'all', '$.id', '$.ext.rank');
-- Returns 0 (rank does not exist)
Usage in query conditions:
$criteria = new QueryCriteria();
$criteria->andWhere('org_code', 'in', $orgList);
if (!empty($regionId)) {
$regionCond = new QueryCriteria();
$regionCond->orWhere("JSON_CONTAINS_PATH(new_region_ids,'one', '$.\"$regionId\"')", '=', 1);
$regionCond->orWhere("JSON_CONTAINS_PATH(old_region_ids,'one', '$.\"$regionId\"')", '=', 1);
$criteria->andWhere($regionCond);
}
JSON Path Operators
-> extracts the value at the specified path, while ->> extracts and unquotes the value.
Example:
-- Assume ext stores score as a JSON string: {"score": "90"}
SELECT ext->'$.score' AS score FROM student;
-- Returns "90" (quoted JSON string)
SELECT ext->>'$.score' AS score FROM student;
-- Returns 90 (unquoted)
Usage in select clauses:
$taskList = RegionTaskModel::findRows(
['status', 'ext_info->>"$.new_region_list" as new_regions', 'ext_info->>"$.old_region_list" as old_regions'],
$criteria
);
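The quoting difference between -> and ->> can be mimicked in plain Python. This is a simplified analogue handling only a single top-level key, not real JSON path evaluation; `arrow` and `double_arrow` are invented names:

```python
import json

def arrow(doc, key):
    """Like ->: the value rendered as JSON text, so strings keep their quotes."""
    return json.dumps(json.loads(doc)[key])

def double_arrow(doc, key):
    """Like ->>: same, except string values are unquoted."""
    value = json.loads(doc)[key]
    return value if isinstance(value, str) else json.dumps(value)

doc = '{"score": "90", "rank": 3}'
quoted = arrow(doc, "score")           # the string keeps its double quotes
unquoted = double_arrow(doc, "score")  # the quotes are stripped
```

For non-string values such as numbers, the two operators produce the same text, which is why the quoting difference only matters for string fields.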
Relational Database Design Theory
Functional Dependency
A -> B indicates that A functionally determines B, or B is functionally dependent on A. A minimal set of attributes that functionally determines all other attributes in a relation is called a candidate key. For A -> B, if there exists a proper subset A' of A such that A' -> B, then B is partially functionally dependent on A; otherwise it is fully functionally dependent. For A -> B and B -> C, C is transitively functionally dependent on A.
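These definitions can be checked mechanically over a concrete relation. A small sketch (`holds` and the sample rows are invented for illustration):

```python
def holds(rows, lhs, rhs):
    """True iff the functional dependency lhs -> rhs holds in rows:
    equal lhs values must always pair with equal rhs values."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True

enrollments = [
    {"student": "s1", "course": "db", "teacher": "t1"},
    {"student": "s2", "course": "db", "teacher": "t1"},
    {"student": "s1", "course": "os", "teacher": "t2"},
]
```

Here both course -> teacher and (student, course) -> teacher hold; since the proper subset {course} already determines teacher, teacher is only partially dependent on (student, course), exactly the situation 2NF eliminates.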
Anomalies in Non-Normalized Schemas
Non-normalized relational schemas suffer from four types of anomalies:
- Redundant data: The same data is stored multiple times across different rows.
- Update anomaly: Modifying data in one row leads to inconsistency if other rows with the same data are not updated.
- Delete anomaly: Deleting one record leads to unintended loss of other unrelated data.
- Insert anomaly: Cannot insert new data because it is missing required attributes that are part of the key.
Normal Forms
Normal forms are used to eliminate the above anomalies, with higher normal forms dependent on lower ones, starting with 1NF as the base.
- First Normal Form (1NF): All attributes are atomic and cannot be split further.
- Second Normal Form (2NF): All non-primary attributes are fully functionally dependent on the candidate key, eliminating partial functional dependencies. This can be achieved by decomposing the relation into smaller relations.
- Third Normal Form (3NF): No non-primary attribute is transitively functionally dependent on the candidate key, eliminating transitive dependencies.
ER Diagrams
Entity-Relationship diagrams are used for conceptual design of relational databases, consisting of three core components: entities, attributes, and relationships.
- Relationship types between entities: One-to-one (two directed arrows between entities), one-to-many (single directed arrow pointing to the "one" side), many-to-many (two undirected lines between entities).
- Recursive relationships (e.g., course prerequisites) are represented by multiple lines connecting the same entity.
- Multi-way relationships (e.g., teacher, student, course) represent relationships between more than two entities.
- Subclass entities are represented using a triangle connecting the parent class to the subclass, with shared attributes connected to the parent class and subclass-specific attributes connected to the subclass.