Generating Realistic Test Data in Python with Faker
The Faker library enables the creation of realistic synthetic data for testing, prototyping, and anonymization tasks in Python applications.
Installation
Install via pip:
pip install Faker
Basic Usage
Instantiate a generator with the Faker class (older code may instead call the legacy Factory.create()):
from faker import Faker
fake = Faker() # defaults to the en_US locale
print(fake.name()) # e.g., 'Lucy Cechtelar'
print(fake.address()) # e.g., a multi-line US street address
print(fake.text()) # Random Latin-based filler paragraph
Each method call returns a new random value. The generator supports localization: pass a locale code such as 'zh_CN' to produce Chinese data. The sample values in the rest of this article assume a zh_CN generator:
fake = Faker('zh_CN')
Data Categories and Examples
Addresses
fake.city() # 'Hangzhou'
fake.street_address() # 'No. 205 Nanjing Road'
fake.postcode() # '200001'
fake.latitude() # Decimal('31.2304')
Person Details
fake.name() # 'Zhang Min'
fake.first_name_male() # 'Jian'
fake.last_name_female() # 'Wang'
Barcodes
fake.ean13() # '6901234567892'
fake.ean(length=8) # '69012341'
Colors
fake.hex_color() # '#a3c14b'
fake.color_name() # 'Crimson'
Companies
fake.company() # 'FutureLink Digital Technology Ltd.'
fake.company_suffix() # 'Group Co., Ltd.'
Credit Cards
fake.credit_card_number() # '4532123456789012'
fake.credit_card_expire() # '09/28'
fake.credit_card_full() # Full formatted card info
Dates and Times
fake.date_this_year() # datetime.date(2024, 5, 12)
fake.iso8601() # '2008-07-14T13:45:22'
fake.unix_time() # 1256789012
Internet Data
fake.ipv4() # '192.168.1.105'
fake.email() # 'li.xiaoming@example.com'
fake.url() # 'http://www.chen.org/'
fake.user_agent() # Browser user agent string
Text Generation (Lorem Ipsum)
fake.sentence() # '系统支持用户登录功能。'
fake.paragraph() # Multi-sentence Chinese paragraph
fake.words(5) # ['数据', '分析', '模型', '结果', '验证']
Miscellaneous
fake.password() # 'Kx!9@mQz#Lp2'
fake.uuid4() # 'f47ac10b-58cc-4372-a567-0e02b2c3d479'
fake.boolean() # True or False
fake.language_code() # 'zh'
Phone Numbers
fake.phone_number() # '13812345678'
Python Objects
fake.pyint() # 42
fake.pystr(max_chars=10) # 'aB3xY9qLmN'
fake.pylist(nb_elements=3) # Mixed-type list
fake.pydict(nb_elements=2) # {'key1': 'value', 'key2': 123}
User Profiles
fake.profile() # Dict with name, address, job, etc.
fake.simple_profile(sex='F') # Minimal profile with gender constraint
Chinese ID Numbers (SSN)
fake.ssn() # '310101199003072316' (18-digit)
Browser and Platform Strings
fake.chrome() # Chrome UA string
fake.windows_platform_token() # 'Windows NT 10.0'
fake.mac_processor() # 'Intel'
Custom Providers
Extend functionality by creating custom providers:
from faker import Faker
from faker.providers import BaseProvider
class BookProvider(BaseProvider):
    def book_title(self):
        titles = ['Data Engineering', 'Python Tricks', 'Cloud Architecture']
        return self.random_element(titles)
fake = Faker()
fake.add_provider(BookProvider)
print(fake.book_title()) # e.g., 'Cloud Architecture'
Reproducible Results
Set a seed to generate consistent output across runs:
fake = Faker()
fake.seed_instance(12345)
print(fake.name()) # Same name on every run with this seed (for a given Faker version and locale)
Command-Line Interface
Generate data directly in the terminal:
# Generate one Chinese address
faker -l zh_CN address
# Output three names separated by semicolons
faker -r 3 -s ';' name
# Get only ssn and name from profile (fields are passed as one comma-separated argument)
faker profile ssn,name
Common CLI options:
-l <locale>: Set language (e.g., zh_CN)
-r <count>: Repeat output N times
-s <sep>: Append separator after each result
-o <file>: Write output to file