Fading Coder

One Final Commit for the Last Sprint

Generating Realistic Test Data in Python with Faker

Tech May 15 1

The Faker library enables the creation of realistic synthetic data for testing, prototyping, and anonymization tasks in Python applications.

Installation

Install via pip:

pip install Faker

Basic Usage

Create a generator by instantiating the Faker class (older code may use the legacy Factory.create() instead):

from faker import Faker
fake = Faker()

print(fake.name())        # random full name; English with the default en_US locale
print(fake.address())     # random postal address
print(fake.text())        # random placeholder paragraph

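Each method call returns a new random value. The default locale is en_US; pass a locale code such as 'zh_CN' to get Chinese names, addresses, and text. The examples in the following sections assume a zh_CN generator like the one below (some sample outputs are shown romanized or translated):

fake = Faker('zh_CN')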

Data Categories and Examples

Addresses

fake.city()              # 'Hangzhou'
fake.street_address()    # 'No. 205 Nanjing Road'
fake.postcode()          # '200001'
fake.latitude()          # Decimal('31.2304')

Person Details

fake.name()              # 'Zhang Min'
fake.first_name_male()   # 'Jian'
fake.last_name_female()  # 'Wang'

Barcodes

fake.ean13()             # '6901234567892'
fake.ean(length=8)       # '69012345'

Colors

fake.hex_color()         # '#a3c14b'
fake.color_name()        # 'Crimson'

Companies

fake.company()           # 'FutureLink Digital Technology Ltd.'
fake.company_suffix()    # 'Group Co., Ltd.'

Credit Cards

fake.credit_card_number()      # '4532123456789012'
fake.credit_card_expire()      # '09/28'
fake.credit_card_full()        # Full formatted card info

Dates and Times

fake.date_this_year()          # datetime.date(2024, 5, 12)
fake.iso8601()                 # '2008-07-14T13:45:22'
fake.unix_time()               # 1256789012

Internet Data

fake.ipv4()                    # '192.168.1.105'
fake.email()                   # 'li.xiaoming@example.com'
fake.url()                     # 'http://www.chen.org/'
fake.user_agent()              # Browser user agent string

Text Generation (Lorem Ipsum)

fake.sentence()                # '系统支持用户登录功能。' ("The system supports user login.")
fake.paragraph()               # Multi-sentence Chinese paragraph
fake.words(5)                  # ['数据', '分析', '模型', '结果', '验证'] (data, analysis, model, result, validation)

Miscellaneous

fake.password()                # 'Kx!9@mQz#Lp2'
fake.uuid4()                   # 'f47ac10b-58cc-4372-a567-0e02b2c3d479'
fake.boolean()                 # True or False
fake.language_code()           # 'zh'

Phone Numbers

fake.phone_number()            # '13812345678'

Python Objects

fake.pyint()                   # 42
fake.pystr(max_chars=10)       # 10 random ASCII letters, e.g. 'aBxYqLmNpR'
fake.pylist(nb_elements=3)     # Mixed-type list; length varies around 3 by default
fake.pydict(nb_elements=2)     # Dict with random word keys and mixed-type values

User Profiles

fake.profile()                 # Dict with name, address, job, etc.
fake.simple_profile(sex='F')   # Minimal profile with gender constraint

Chinese ID Numbers (SSN)

fake.ssn()                     # '310101199003072313' (18 characters; the last is a check character)

Browser and Platform Strings

fake.chrome()                  # Chrome UA string
fake.windows_platform_token()  # 'Windows NT 10.0'
fake.mac_processor()           # 'Intel'

Custom Providers

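Extend Faker with your own data by writing a custom provider: a subclass of BaseProvider whose public methods become available on the generator once it is added: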

from faker import Faker
from faker.providers import BaseProvider

class BookProvider(BaseProvider):
    def book_title(self):
        titles = ['Data Engineering', 'Python Tricks', 'Cloud Architecture']
        return self.random_element(titles)

fake = Faker()
fake.add_provider(BookProvider)
print(fake.book_title())  # e.g., 'Cloud Architecture'

Reproducible Results

Seed the generator to get the same output on every run:

fake = Faker()
fake.seed_instance(12345)
print(fake.name())  # Same name on every run with this seed (for a given Faker version)

Command-Line Interface

Generate data directly in the terminal:

# Generate one Chinese address
faker -l zh_CN address

# Output three names separated by semicolons
faker -r 3 -s ';' name

# Get only the ssn and name fields of a profile (comma-separated field list)
faker profile ssn,name

Common CLI options:

  • -l <locale>: Set language (e.g., zh_CN)
  • -r <count>: Repeat output N times
  • -s <sep>: Append separator after each result
  • -o <file>: Write output to file