Home > Tech > Content

Five Effective Approaches for Generating Large-Scale Test Data

Tech May 18 18

In software testing, many scenarios require the creation of substantial data sets to facilitate the testing process. Common situations include:

Performance testing that demands large volumes of data
Functional testing, such as verifying search functionality with adequate test data
Data consistency validation across the system
Testing the accuracy of statistical tables and graphs that require extensive data sets

How can testers efficiently generate the required test data? There are multiple approaches with varying levels of complexity and technical depth, which can be selected based on the data volume requirements.

Method 1: Manual Data Entry through UI

For scenarios requiring only a limited amount of data (dozens of records), manual entry through the user interface can be sufficient. While this approach is basic and requires no technical expertise, it becomes impractical as the data volume increases.

Method 2: Automated API Requests

This method offers improved efficiency over menual UI operations. Using API testing tools like JMeter or Postman, testers can configure CSV data sources to send batch requests, thereby generating test data programmatically. Alternatively, Python's requests library can be utilized for similar purposes.

However, this approach requires familiarity with API testing tools and may encounter challenges with API dependencies, local bugs, or performance limitations.

Since both UI and API operations ultimately insert data into the database, a more direct approach would be database manipulation.

Method 3: Direct SQL Insertion

Using SQL statements such as INSERT INTO users (name, contact) VALUES ("name","13444444444") allows direct database manipulation. Despite its direct approach, this method suffers from poor efficiency when inserting records individually, making it impractical for large-scale data generation.

Method 4: Excel Data Import

First, prepare an Excel file on your local machine with data structured according to the database table fields. Use Excel's drag functionality to quickly duplicate data rows.

Next, using a database management tool like Navicat, right-click the target table and select the import wizard. Choose the Excel file type and follow the import steps to transfer all data from the Excel file to the database table.

While this method can efficiently generate thousands of records through Excel duplication, it becomes cumbersome when dealing with larger data volumes, necessitating more efficient approaches.

Method 5: Database Stored Procedures

Stored procedures enable database programming to control data insertion iterations. The following example demonstrates how to generate millions of records:

DELIMITER //
DROP PROCEDURE IF EXISTS generate_bulk_data;
CREATE PROCEDURE generate_bulk_data()
BEGIN 
    DECLARE counter INT;
    DECLARE user_name VARCHAR(25);
    DECLARE phone_number CHAR(11);
    SET counter = 1;
    WHILE counter <= 1000000 DO
        SET user_name = CONCAT('user-', counter);
        SET phone_number = 13000000000 + counter;
        INSERT INTO users(full_name, contact) VALUES(user_name, phone_number);
        SET counter = counter + 1;
    END WHILE;
END //
DELIMITER ;

CALL generate_bulk_data();

Method 6: Python Script for Data Generation

For those with programming experience, Python can be used to automate the data generation process:

import random
import string
import mysql.connector

# Database configuration
db_config = {
    'host': '139.224.61.195',
    'user': 'root',
    'port': 3307,
    'password': '123456',
    'database': 'test'
}

# Establish database connection
connection = mysql.connector.connect(**db_config)
cursor = connection.cursor()

# Generate large-scale data
batch_limit = 10000  # Records per batch
total_needed = 1000000  # Total records to generate

# Retrieve current maximum ID
cursor.execute("SELECT MAX(id) FROM users")
current_max = cursor.fetchone()[0] or 0

for _ in range(total_needed // batch_limit):
    # Prepare batch data
    data_batch = []
    for __ in range(batch_limit):
        current_max += 1
        record_id = current_max
        name = ''.join(random.choices(string.ascii_letters, k=8))
        contact = ''.join(random.choices(string.digits, k=11))
        data_batch.append((record_id, name, contact))
    
    # Execute batch insertion
    query = "INSERT INTO users (id, full_name, contact) VALUES (%s, %s, %s)"
    cursor.executemany(query, data_batch)
    connection.commit()

# Clean up
cursor.close()
connection.close()

Back to List

Prev: Design and Implementation of a Cross-Platform Online Tutoring Platform Using Spring Boot and Vue.js

Next: Java Object Serialization and Print Streams

Fading Coder

Five Effective Approaches for Generating Large-Scale Test Data

Method 1: Manual Data Entry through UI

Method 2: Automated API Requests

Method 3: Direct SQL Insertion

Method 4: Excel Data Import

Method 5: Database Stored Procedures

Method 6: Python Script for Data Generation

Related Articles

Understanding Strong and Weak References in Java

Comprehensive Guide to SSTI Explained with Payload Bypass Techniques

Implement Image Upload Functionality for Django Integrated TinyMCE Editor

Leave a Comment

Copyright © fadingcoder.top

Fading Coder

Five Effective Approaches for Generating Large-Scale Test Data

Method 1: Manual Data Entry through UI

Method 2: Automated API Requests

Method 3: Direct SQL Insertion

Method 4: Excel Data Import

Method 5: Database Stored Procedures

Method 6: Python Script for Data Generation

Related Articles

Understanding Strong and Weak References in Java

Comprehensive Guide to SSTI Explained with Payload Bypass Techniques

Implement Image Upload Functionality for Django Integrated TinyMCE Editor

Leave a CommentCancel Reply

Copyright © fadingcoder.top

Leave a Comment