Fading Coder

One Final Commit for the Last Sprint

Home > Tech > Content

Five Effective Approaches for Generating Large-Scale Test Data

Tech May 18 2

In software testing, many scenarios require the creation of substantial data sets to facilitate the testing process. Common situations include:

  • Performance testing that demands large volumes of data
  • Functional testing, such as verifying search functionality with adequate test data
  • Data consistency validation across the system
  • Testing the accuracy of statistical tables and graphs that require extensive data sets

How can testers efficiently generate the required test data? There are multiple approaches with varying levels of complexity and technical depth, which can be selected based on the data volume requirements.

Method 1: Manual Data Entry through UI

For scenarios requiring only a limited amount of data (dozens of records), manual entry through the user interface can be sufficient. While this approach is basic and requires no technical expertise, it becomes impractical as the data volume increases.

Method 2: Automated API Requests

This method offers improved efficiency over menual UI operations. Using API testing tools like JMeter or Postman, testers can configure CSV data sources to send batch requests, thereby generating test data programmatically. Alternatively, Python's requests library can be utilized for similar purposes.

However, this approach requires familiarity with API testing tools and may encounter challenges with API dependencies, local bugs, or performance limitations.

Since both UI and API operations ultimately insert data into the database, a more direct approach would be database manipulation.

Method 3: Direct SQL Insertion

Using SQL statements such as INSERT INTO users (name, contact) VALUES ("name","13444444444") allows direct database manipulation. Despite its direct approach, this method suffers from poor efficiency when inserting records individually, making it impractical for large-scale data generation.

Method 4: Excel Data Import

First, prepare an Excel file on your local machine with data structured according to the database table fields. Use Excel's drag functionality to quickly duplicate data rows.

Next, using a database management tool like Navicat, right-click the target table and select the import wizard. Choose the Excel file type and follow the import steps to transfer all data from the Excel file to the database table.

While this method can efficiently generate thousands of records through Excel duplication, it becomes cumbersome when dealing with larger data volumes, necessitating more efficient approaches.

Method 5: Database Stored Procedures

Stored procedures enable database programming to control data insertion iterations. The following example demonstrates how to generate millions of records:

DELIMITER //
DROP PROCEDURE IF EXISTS generate_bulk_data;
CREATE PROCEDURE generate_bulk_data()
BEGIN 
    DECLARE counter INT;
    DECLARE user_name VARCHAR(25);
    DECLARE phone_number CHAR(11);
    SET counter = 1;
    WHILE counter <= 1000000 DO
        SET user_name = CONCAT('user-', counter);
        SET phone_number = 13000000000 + counter;
        INSERT INTO users(full_name, contact) VALUES(user_name, phone_number);
        SET counter = counter + 1;
    END WHILE;
END //
DELIMITER ;

CALL generate_bulk_data();

Method 6: Python Script for Data Generation

For those with programming experience, Python can be used to automate the data generation process:

import random
import string
import mysql.connector

# Database configuration
db_config = {
    'host': '139.224.61.195',
    'user': 'root',
    'port': 3307,
    'password': '123456',
    'database': 'test'
}

# Establish database connection
connection = mysql.connector.connect(**db_config)
cursor = connection.cursor()

# Generate large-scale data
batch_limit = 10000  # Records per batch
total_needed = 1000000  # Total records to generate

# Retrieve current maximum ID
cursor.execute("SELECT MAX(id) FROM users")
current_max = cursor.fetchone()[0] or 0

for _ in range(total_needed // batch_limit):
    # Prepare batch data
    data_batch = []
    for __ in range(batch_limit):
        current_max += 1
        record_id = current_max
        name = ''.join(random.choices(string.ascii_letters, k=8))
        contact = ''.join(random.choices(string.digits, k=11))
        data_batch.append((record_id, name, contact))
    
    # Execute batch insertion
    query = "INSERT INTO users (id, full_name, contact) VALUES (%s, %s, %s)"
    cursor.executemany(query, data_batch)
    connection.commit()

# Clean up
cursor.close()
connection.close()

Related Articles

Understanding Strong and Weak References in Java

Strong References Strong reference are the most prevalent type of object referencing in Java. When an object has a strong reference pointing to it, the garbage collector will not reclaim its memory. F...

Comprehensive Guide to SSTI Explained with Payload Bypass Techniques

Introduction Server-Side Template Injection (SSTI) is a vulnerability in web applications where user input is improper handled within the template engine and executed on the server. This exploit can r...

Implement Image Upload Functionality for Django Integrated TinyMCE Editor

Django’s Admin panel is highly user-friendly, and pairing it with TinyMCE, an effective rich text editor, simplifies content management significantly. Combining the two is particular useful for bloggi...

Leave a Comment

Anonymous

◎Feel free to join the discussion and share your thoughts.