Managing 400 Million Rows in a Single MySQL Table
Database Design
To store 400 million records efficiently, start with an optimized table structure. For a user profile dataset, we can define the following schema:
erDiagram
    USER_PROFILES {
        int profile_id
        varchar full_name
        int age
        varchar residence
    }
This structure uses concise, appropriate fields to avoid unnecessary overhead.
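As a concrete starting point, the diagram above might map to DDL like the following. The column lengths, the TINYINT choice for age, and the InnoDB/utf8mb4 settings are assumptions for illustration, not requirements:

CREATE TABLE user_profiles (
    profile_id INT UNSIGNED NOT NULL,   -- a 4-byte INT comfortably covers 400 million sequential IDs
    full_name  VARCHAR(100) NOT NULL,   -- explicit length limit; 100 chars is an assumed cap
    age        TINYINT UNSIGNED,        -- smallest integer type that holds a human age
    residence  VARCHAR(100),
    PRIMARY KEY (profile_id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;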
Database Optimization
When handling massive datasets, performance tuning is critical. Implement these key optimizations:
- Optimal Data Types: Choose the smallest data type that can accommodate your data. Use INT for profile_id (sequential identifiers), VARCHAR with explicit length limits for text fields like full_name, and avoid large types like TEXT unless necessary.
- Targeted Indexing: Create indexes on columns used frequently in queries, filters, or joins. Set profile_id as the primary key (automatically indexed) and add secondary indexes on columns like age if they are common in WHERE clauses.
- Table Partitioning: Split the table into logical partitions using range-based criteria (e.g., profile_id intervals of 50 million records); see the sketch after this list. Partitioning reduces the volume of data scanned during queries and maintenance operations.
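A minimal sketch of range partitioning on profile_id, assuming the 50-million-row intervals mentioned above. Note that MySQL requires the partitioning column to be part of every unique key, which is satisfied here because profile_id is the primary key; also note that this ALTER rebuilds the table, so it is best applied before loading data:

ALTER TABLE user_profiles
PARTITION BY RANGE (profile_id) (
    PARTITION p0 VALUES LESS THAN (50000000),    -- rows 1 .. 50M
    PARTITION p1 VALUES LESS THAN (100000000),   -- rows 50M+1 .. 100M
    PARTITION p2 VALUES LESS THAN (150000000),
    -- ... one partition per 50-million-row interval ...
    PARTITION p7 VALUES LESS THAN MAXVALUE       -- catch-all for future growth
);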
Querying Large Datasets
For efficient queries, retrieve only the rows and columns you need and write filters that can use indexes. For example, to retrieve users over 30 residing in Beijing:
SELECT profile_id, full_name, age, residence
FROM user_profiles
WHERE age > 30 AND residence = 'Beijing'
LIMIT 1000; -- Use LIMIT to avoid returning excessive rows at once
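For this filter pattern, a composite secondary index lets MySQL jump straight to matching rows instead of scanning the table. One reasonable column order, placing the equality column (residence) before the range column (age), is sketched below; the index name is arbitrary:

CREATE INDEX idx_residence_age
    ON user_profiles (residence, age);

-- Verify the index is actually used before relying on it:
EXPLAIN SELECT profile_id, full_name, age, residence
FROM user_profiles
WHERE age > 30 AND residence = 'Beijing'
LIMIT 1000;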
Bulk Data Insertion
Inserting millions of records is faster with batch inserts, which minimize network round-trips. Here’s an example of a bulk insert statement:
INSERT INTO user_profiles (profile_id, full_name, age, residence)
VALUES
(100001, 'Alice M. Carter', 28, 'Shanghai'),
(100002, 'Robert K. Smith', 36, 'Beijing'),
(100003, 'Catherine L. Jones', 31, 'Guangzhou'),
-- Add up to thousands of additional records per statement
;
For extremely large datasets, consider using LOAD DATA INFILE for even higher throughput.
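A sketch of such a load, assuming a comma-separated file at the hypothetical path /tmp/user_profiles.csv with columns in table order; the delimiters, header handling, and whether the server permits the load (FILE privilege, secure_file_priv) depend on your environment:

LOAD DATA INFILE '/tmp/user_profiles.csv'
INTO TABLE user_profiles
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES    -- skip the header row, if the file has one
(profile_id, full_name, age, residence);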
Batch Updates
Updating the entire table at once can cause long-held locks and performance degradation. Instead, split updates into smaller batches using range-based filters:
-- Update 1 million records in each batch
UPDATE user_profiles
SET residence = 'Shenzhen'
WHERE profile_id BETWEEN 1 AND 1000000;
UPDATE user_profiles
SET residence = 'Hangzhou'
WHERE profile_id BETWEEN 1000001 AND 2000000;
-- Repeat for subsequent profile_id ranges
This approach reduces lock contention and allows the database to process updates incrementally.
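Rather than writing each range by hand, the batching can be scripted. A sketch as a MySQL stored procedure, assuming a uniform update to a single hypothetical value and a 1-million-row batch size:

DELIMITER //
CREATE PROCEDURE batch_update_residence()
BEGIN
    DECLARE batch_start INT DEFAULT 1;
    DECLARE batch_size  INT DEFAULT 1000000;  -- assumed batch size; tune to your workload
    DECLARE max_id      INT;

    SELECT MAX(profile_id) INTO max_id FROM user_profiles;

    WHILE batch_start <= max_id DO
        UPDATE user_profiles
        SET residence = 'Shenzhen'            -- hypothetical target value
        WHERE profile_id BETWEEN batch_start
                             AND batch_start + batch_size - 1;
        COMMIT;                               -- release locks between batches
        SET batch_start = batch_start + batch_size;
    END WHILE;
END //
DELIMITER ;

CALL batch_update_residence();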