Fading Coder

One Final Commit for the Last Sprint

Home > Tech > Content

Core Principles of Database Systems: A Technical Overview

Tech 3

Foundational Concepts

  1. Data: Symbolic records describing entities. Types include text, graphics, images, audio, and video. A key characteristic is the inseparability of data from its semantics, and data inherently possesses structure.
  2. Database (DB): A long-term, organized, and shared collection of data stored within a computer system.
  3. Database Management System (DBMS): Software for storing and managing data within a database. Examples include Oracle, MySQL, and SQL Server. Core functionalities encompass:
    • Data Definition via Data Definition Language (DDL).
    • Data organization, storage, and management.
    • Data Manipulation via Data Manipulation Language (DML).
    • Transaction management and runtime operations.
  4. Database System (DBS): The complete system comprising the database, DBMS, application software, supporting hardware/software enviroment, database administrators, and end-users.

Evolution of Data Management

  • Manual Stage: Calculations were possible, but results could not be persistently stored due to the absence of disk storage and operating systems.
  • File System Stage: The advent of operating systems with file management allowed for the storage of computational results.
  • Database Stage: Characterized by large capacity, scale, and data sharing capabilities.

Data Independence

  • Logical Independence: Changes to the information content's logical structure (e.g., adding/deleting attributes) do not affect application programs.
  • Physical Independence: Changes to the physical storage location or structure (e.g., file organization) do not affect application programs.

Database System Architecture

Internal Architecture (Three-Schema Architecture)

  1. External Schema (Subschema/User View): Describes the logical structure of the local data visible to and usable by specific database users. Multiple external schemas can exist, each being a subset of the conceptual schema.
  2. Conceptual Schema (Logical Schema): Describes the global logical structure and characteristics of all data in the database. A database has exactly one conceptual schema, which is independent of physical storage details and specific applications.
  3. Internal Schema (Storage Schema): Describes the physical structure and storage methods of data—how data is internally represented. A database has exactly one internal schema.

Data Abstraction and Mapping

Mappings between these levels ensure data independence.

  • External/Conceptual Mapping: Enables logical independence. If the conceptual schema changes, this mapping can be modified to keep the external schemas unchanged, shielding applications.
  • Conceptual/Internal Mapping: Enables physical independence. If the storage structure changes, this mapping can be modified to keep the conceptual schema unchanged.

Data Modeling

Conceptual Data Models

Models data from a semantic perspective, based on the user's viewpoint. It is independent of any specific DBMS. A common model is the Entity-Relationship (E-R) Model.

  • Entity: A distinguishable object (e.g., a Student, a Course). Represented by a rectangle.
  • Attribute: A characteristic of an entity (e.g., StudentID, Name). Represented by an oval.
  • Relationship: An association among entities. Represented by a diamond. Types include: One-to-One (1:1), One-to-Many (1:N), and Many-to-Many (M:N).

Logical Data Models

Describes data from an organizational perspective. The predominant model is the Relational Model.

  • Relation: A table with these properties: each column contains atomic values, and the order of rows/columns is insignificant.
  • Tuple: A row in a table (a record).
  • Attribute: A column in a table.
  • Domain: The set of permissible values for an attribute.
  • Candidate Key: A minimal set of attributes that uniquely identifies a tuple.
  • Primary Key: A selected candidate key.
  • Foreign Key: An attribute (or set) in one relation that serves as the primary key of another relation, establishing a link.

Data Integrity Constraints

Rules to insure stored data is meaningful and correct.

  1. Entity Integrity: Every tuple must be uniquely identifiable (Primary Key cannot be NULL).
  2. Referential Integrity: A foreign key value must either be NULL or match an existing primary key value in the referenced relation.
  3. User-Defined Integrity: Custom rules defined for specific applications (e.g., salary > 0).

Structured Query Language (SQL)

SQL is the standard language for relational databases.

SQL Function Category Key Verbs
Data Definition (DDL) CREATE, DROP, ALTER
Data Query (DQL) SELECT
Data Manipulation (DML) INSERT, UPDATE, DELETE
Data Control (DCL) GRANT, REVOKE

Core SQL Operations

Data Definition:

-- Create a table with constraints
CREATE TABLE Employee (
    EmpID CHAR(7) PRIMARY KEY,
    Name VARCHAR(50) NOT NULL,
    DeptID INT,
    Salary DECIMAL(10,2) DEFAULT 5000 CHECK (Salary >= 0),
    CONSTRAINT FK_Dept FOREIGN KEY (DeptID) REFERENCES Department(DeptID)
);

Data Query:

-- Basic SELECT with WHERE, GROUP BY, HAVING
SELECT DeptID, COUNT(*) AS StaffCount, AVG(Salary) AS AvgSalary
FROM Employee
WHERE HireDate > '2020-01-01'
GROUP BY DeptID
HAVING AVG(Salary) > 6000
ORDER BY AvgSalary DESC;

-- JOIN example
SELECT e.Name, d.DeptName, p.ProjectName
FROM Employee e
INNER JOIN Department d ON e.DeptID = d.DeptID
LEFT JOIN ProjectAssignment pa ON e.EmpID = pa.EmpID
LEFT JOIN Project p ON pa.ProjectID = p.ProjectID;

-- Subquery example
SELECT Name FROM Employee
WHERE Salary > (SELECT AVG(Salary) FROM Employee);

Data Manipulation:

INSERT INTO Employee (EmpID, Name, DeptID) VALUES ('E001', 'Alice', 10);
UPDATE Employee SET Salary = Salary * 1.05 WHERE DeptID = 10;
DELETE FROM Employee WHERE EmpID = 'E999';

Views

A view is a virtual table derived from one or more base tables.

-- Create a view
CREATE VIEW HighEarners AS
SELECT EmpID, Name, Salary FROM Employee WHERE Salary > 8000;

-- Use the view
SELECT * FROM HighEarners ORDER BY Salary DESC;

Purposes: Simplify complex queries, provide customized data perspectives, enhance security, and improve logical data independence.

Database Normalization

A process to eliminate undesirable data dependencies, reducing redundancy and anomalies (Update, Insertion, Deletion).

Functional Dependency

A constraint where the value of one attribute set determines the value of another (X -> Y).

  • Full Functional Dependency: Y is functionally dependent on X but not on any proper subset of X.
  • Partial Functional Dependency: Y is dependent on a proper subset of X.
  • Transitive Dependency: X -> Y, Y -> Z, and Y does not determine X.

Normal Forms

  1. First Normal Form (1NF): All attribute values are atomic (indivisible).
  2. Second Normal Form (2NF): In 1NF, and every non-primary-key attribute is fully functionally dependent on the primary key (eliminates partial dependencies).
  3. Third Normal Form (3NF): In 2NF, and no non-primary-key attribute is transitively dependent on the primary key (eliminates transitive dependencies).
  4. Boyce-Codd Normal Form (BCNF): A stronger form where every determinant is a cendidate key.

Transaction Management

A Transaction is a logical unit of work comprising a sequence of operations that must execute entirely or not at all.

ACID Properties

  • Atomicity: All operations in a transaction are completed; if any part fails, the entire transaction is rolled back.
  • Consistency: A transaction transitions the database from one valid state to another.
  • Isolation: Concurrent transactions execute without interfering with each other.
  • Durability: Once committed, a transaction's changes are permanent.

SQL Transaction Control

-- Explicit transaction in T-SQL style
BEGIN TRANSACTION;
    UPDATE Account SET Balance = Balance - 100 WHERE AccountID = 1;
    UPDATE Account SET Balance = Balance + 100 WHERE AccountID = 2;
    -- If both updates succeed
    COMMIT TRANSACTION;
-- If an error occurs, rollback
-- ROLLBACK TRANSACTION;

Concurrency Control

Manages simultaneous transaction execution to prevent data inconsistencies.

Problems from Uncontrolled Concurrency

  • Lost Update: Two transactions read and update the same data, one overwriting the other's change.
  • Dirty Read: A transaction reads uncomitted data from another transaction, which is later rolled back.
  • Non-Repeatable Read: A transaction reads the same row twice and gets different values because another transaction modified it in between.
  • Phantom Read: A transaction re-executes a query returning a set of rows and finds new rows inserted by another committed transaction.

Lock-Based Protocols

  • Shared Lock (S Lock): For reading; allows other transactions to acquire S locks but not X locks.
  • Exclusive Lock (X Lock): For writing; prevents any other locks on the data item.
  • Two-Phase Locking (2PL) Protocol: Guarantees serializability. It has a growing phase (acquire locks) and a shrinking phase (release locks).

Deadlock Handling

  • Prevention: Methods like requiring all locks at transaction start.
  • Detection & Resolution: Using wait-for graphs; resolving by aborting a chosen transaction.

Database Backup and Recovery

Failure Types & Recovery

  1. Transaction Failure: Logical errors within a transaction. Recovery uses the log to UNDO the transaction's effects.
  2. System Failure (Soft Crash): Power loss, OS crash. Recovery involves UNDOing uncommitted transactions and REDOing committed ones whose changes may not have been written to disk.
  3. Media Failure (Hard Crash): Disk head crash. Recovery requires restoring from a backup and REDOing transactions from the log.

Backup Strategies

  • Data Dumping: Full (complete copy) vs. Incremental (only changed data since last backup).
  • Log File: A chronological record of all updates. Essential for recovery. The Write-Ahead Logging (WAL) rule states: log records must be written to stable storage before the corresponding data page is updated.

Database Design Process

A structured approach typically involving these phases:

  1. Requirements Analysis: Gather and specify data and functional needs, often using Data Flow Diagrams (DFDs) and a Data Dictionary.
  2. Conceptual Design: Create an abstract, DBMS-independent model (e.g., an E-R Diagram) representing entities, attributes, and relationships.
  3. Logical Design: Transform the conceptual model into a specific data model (e.g., relational schema).
  4. Physical Design: Define physical storage structures (file organization, indexing) for the logical schema on a given DBMS.
  5. Implementation & Tuning: Create the database, load data, and optimize performance.

E-R to Relational Mapping Rules

  • Entity: Becomes a relation (table).
  • Relationship:
    • M:N: Becomes a separate relation with the primary keys of the participating entities as its composite primary key.
    • 1:N: Add the primary key of the '1' side as a foreign key in the relation for the 'N' side.
    • 1:1: Add the primary key of one side as a foreign key in the other relation.

Physical Design Considerations

  • Indexing: Speeds up queries but slows down updates. Create indexes on frequently searched columns, used in JOINs, WHERE, ORDER BY, and GROUP BY clauses.
  • Clustering: Storing tuples from one or more relations together on disk basedd on a common attribute (cluster key) to optimize range queries and joins.
  • Hashing: A direct access method effective for equality searches.
Tags: Databasesql

Related Articles

Understanding Strong and Weak References in Java

Strong References Strong reference are the most prevalent type of object referencing in Java. When an object has a strong reference pointing to it, the garbage collector will not reclaim its memory. F...

Comprehensive Guide to SSTI Explained with Payload Bypass Techniques

Introduction Server-Side Template Injection (SSTI) is a vulnerability in web applications where user input is improper handled within the template engine and executed on the server. This exploit can r...

Implement Image Upload Functionality for Django Integrated TinyMCE Editor

Django’s Admin panel is highly user-friendly, and pairing it with TinyMCE, an effective rich text editor, simplifies content management significantly. Combining the two is particular useful for bloggi...

Leave a Comment

Anonymous

◎Feel free to join the discussion and share your thoughts.