
Managing 400 Million Rows in a Single MySQL Table


Database Design

To store 400 million records efficiently, start with an optimized table structure. For a user profile dataset, we can define the following schema:

erDiagram
    USER_PROFILES {
        int profile_id
        varchar full_name
        int age
        varchar residence
    }

This structure uses concise, appropriate fields to avoid unnecessary overhead.
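
Translated into MySQL DDL, the schema might look like the sketch below. The column lengths, the unsigned types, and the InnoDB engine are assumptions chosen to keep rows small, not requirements from the diagram:

CREATE TABLE user_profiles (
    profile_id INT UNSIGNED     NOT NULL AUTO_INCREMENT,  -- sequential identifier
    full_name  VARCHAR(64)      NOT NULL,                 -- explicit length cap
    age        TINYINT UNSIGNED NOT NULL,                 -- 0-255 is ample for ages
    residence  VARCHAR(64)      NOT NULL,
    PRIMARY KEY (profile_id)
) ENGINE = InnoDB;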

Database Optimization

When handling massive datasets, performance tuning is critical. Implement these key optimizations:

  1. Optimal Data Types: Choose the smallest data type that can accommodate your data. Use INT for profile_id (sequential identifiers), VARCHAR with explicit length limits for text fields like full_name, and avoid overusing large types like TEXT unless necessary.
  2. Targeted Indexing: Create indexes on columns used frequently in queries, filters, or joins. Set profile_id as the primary key (automatically indexed) and add secondary indexes on columns like age if they’re common in WHERE clauses.
  3. Table Partitioning: Split the table into logical partitions using range-based criteria (e.g., profile_id intervals of 50 million records). Partitioning reduces the volume of data scanned during queries and maintenance operations; a sketch of points 2 and 3 follows this list.
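
Applied to the table above, points 2 and 3 might look like the following sketch. The index name, the 50-million-row boundaries, and the partition names are illustrative assumptions:

-- Secondary index for frequent age filters (point 2)
ALTER TABLE user_profiles ADD INDEX idx_age (age);

-- Range partitioning on profile_id in 50-million-row steps (point 3);
-- profile_id works here because it is the primary key, and MySQL requires
-- the partitioning column to appear in every unique key of the table
ALTER TABLE user_profiles
PARTITION BY RANGE (profile_id) (
    PARTITION p0 VALUES LESS THAN (50000000),
    PARTITION p1 VALUES LESS THAN (100000000),
    PARTITION p2 VALUES LESS THAN (150000000),
    -- ... continue in 50-million-row steps ...
    PARTITION p_max VALUES LESS THAN MAXVALUE
);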

Querying Large Datasets

For efficient queries, focus on filtering only necessary data and leveraging indexes. Example query to retrieve users over 30 residing in Beijing:

SELECT profile_id, full_name, age, residence 
FROM user_profiles 
WHERE age > 30 AND residence = 'Beijing'
LIMIT 1000; -- Use LIMIT to avoid returning excessive rows at once
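
If this filter combination is common, a composite index can serve both predicates at once. The index below is a hypothetical addition, with the equality column (residence) placed before the range column (age) so MySQL can seek on the city and then scan the age range:

-- Hypothetical composite index for the residence + age filter
CREATE INDEX idx_residence_age ON user_profiles (residence, age);

-- Check the execution plan before relying on it at 400-million-row scale
EXPLAIN SELECT profile_id, full_name, age, residence
FROM user_profiles
WHERE age > 30 AND residence = 'Beijing'
LIMIT 1000;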

Bulk Data Insertion

Inserting millions of records is faster with batch inserts, which minimize network round-trips. Here’s an example of a bulk insert statement:

INSERT INTO user_profiles (profile_id, full_name, age, residence)
VALUES
(100001, 'Alice M. Carter', 28, 'Shanghai'),
(100002, 'Robert K. Smith', 36, 'Beijing'),
(100003, 'Catherine L. Jones', 31, 'Guangzhou');
-- Extend the VALUES list with up to thousands of rows per statement

For extremely large datasets, consider using LOAD DATA INFILE for even higher throughput.
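
A minimal sketch, assuming a CSV file whose columns match the table; the path is hypothetical, and the server's secure_file_priv setting may restrict which directories MySQL is allowed to read from:

-- Load a CSV file directly into the table (path is illustrative)
LOAD DATA INFILE '/var/lib/mysql-files/user_profiles.csv'
INTO TABLE user_profiles
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n'
(profile_id, full_name, age, residence);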

Batch Updates

Updating the entire table in one statement can cause long-held locks and performance degradation. Instead, split updates into smaller batches using range-based filters:

-- Update 1 million records in each batch
UPDATE user_profiles 
SET residence = 'Shenzhen' 
WHERE profile_id BETWEEN 1 AND 1000000;

UPDATE user_profiles 
SET residence = 'Hangzhou' 
WHERE profile_id BETWEEN 1000001 AND 2000000;
-- Repeat for subsequent profile_id ranges

This approach reduces lock contention and allows the database to process updates incrementally.
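
To avoid issuing each range by hand, the batching can be scripted. The stored procedure below is a hypothetical sketch of that loop; the procedure name, the 1-million-row batch size, and the COMMIT between batches are all assumptions to adapt to your workload:

-- Hypothetical helper: walk profile_id ranges in 1-million-row steps
DELIMITER //
CREATE PROCEDURE batch_update_residence(IN new_city VARCHAR(64))
BEGIN
    DECLARE start_id BIGINT DEFAULT 1;
    DECLARE max_id BIGINT;
    SELECT MAX(profile_id) INTO max_id FROM user_profiles;
    WHILE start_id <= max_id DO
        UPDATE user_profiles
        SET residence = new_city
        WHERE profile_id BETWEEN start_id AND start_id + 999999;
        COMMIT;  -- release locks between batches (with autocommit off)
        SET start_id = start_id + 1000000;
    END WHILE;
END //
DELIMITER ;

-- Example call (city value is illustrative)
CALL batch_update_residence('Shenzhen');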

