How to generate realistic Django test data without bloating your relationships
The Problem
One of the most common issues when populating a Django dev/test database is relationship inflation.
Here's what typically happens:
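# Creating test data the "normal" way
from django.contrib.auth.models import User  # assumes Django's built-in User model
from faker import Faker

from profiles.models import Profile  # assumed app path; adjust to your project

fake = Faker()

for _ in range(50):
    user = User.objects.create(
        username=fake.user_name(),
        email=fake.email(),
    )
    Profile.objects.create(
        user=user,
        bio=fake.text(),
    )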
Result: 50 Profile objects... and 50 unique User objects.
The result feels flat and unrealistic. Production data usually clusters: one user owns several profiles, posts, or orders, while another owns none. Here every parent has exactly one child, a perfectly even 1:1 distribution that real applications rarely produce.
Why This Matters
When testing features like:
- User dashboards (showing "your" content)
- Search and filtering (seeing realistic distributions)
- Performance issues (joins across actual relationships)
- Admin interfaces (pagination with realistic data)
...you need data that looks and behaves like production data. The relationship structure matters as much as the field values.
The Traditional Solutions (And Their Issues)
Option 1: Manual Loops with Random Selection
import random

users = list(User.objects.all())  # requires users to exist already
for _ in range(50):
    Profile.objects.create(
        user=random.choice(users),
        bio=fake.text(),
    )
Problems:
- Requires pre-existing users
- Manual management of relationships
- No guarantee of realistic distributions
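- Breaks with unique constraints (random.choice can pick the same user twice, which violates a OneToOneField or unique=True relationship)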
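One way to address the distribution problem in a manual loop is to weight the selection so that a few parents receive most of the children. The sketch below uses only the standard library, with plain integers standing in for user primary keys. The `skewed_assignments` helper and the 1/rank weighting are illustrative assumptions, not part of any library.

```python
import random
from collections import Counter


def skewed_assignments(parent_ids, num_children, seed=None):
    """Pick a parent for each child, favouring parents early in the list.

    Weights follow 1/rank (a Zipf-like curve), so in expectation the first
    parent is chosen twice as often as the second, three times as often as
    the third, and so on.
    """
    rng = random.Random(seed)
    weights = [1 / rank for rank in range(1, len(parent_ids) + 1)]
    return rng.choices(parent_ids, weights=weights, k=num_children)


if __name__ == "__main__":
    user_ids = list(range(1, 11))  # stand-ins for existing User primary keys
    owners = skewed_assignments(user_ids, num_children=50, seed=42)
    # Each entry in `owners` would become Profile(user_id=...) in a real loop.
    print(Counter(owners).most_common(3))
```

This still requires pre-existing users and does nothing for unique constraints; it only changes the shape of the distribution.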
Option 2: Factory Boy / Model Bakery (formerly Model Mommy)
Great tools, but:
- Still creates new related objects by default
- Requires extensive configuration for relationship reuse
- More boilerplate for complex models
Option 3: Fixtures
# fixtures/data.json
[
    {"model": "auth.user", "pk": 1, "fields": {...}},
    {"model": "profiles.profile", "pk": 1, "fields": {"user": 1, ...}}
]
Problems:
- Static data (not random/varied)
- Brittle (breaks with model changes)
- Hard to maintain
- Not scalable
The Solution: django-model-populator
We built django-model-populator as a thin, intelligent wrapper on top of Faker.
Key insight: It still uses Faker to generate field data, but adds logic for relationship reuse instead of always creating new related objects.
Installation
pip install django-model-populator
Add to INSTALLED_APPS:
INSTALLED_APPS = [
    # ...
    'model_populator',
]
Usage
Basic: Generate 50 objects with intelligent relationship handling
python manage.py populate myapp --num 50
That's it. The package:
- ✅ Analyzes your models
- ✅ Generates appropriate fake data for each field
- ✅ Reuses existing ForeignKey relationships
- ✅ Randomly assigns ManyToMany relationships
- ✅ Shows progress bars for large datasets
Advanced Options
# Populate specific models only
python manage.py populate myapp --models User,Profile --num 100
# Populate all apps in project
python manage.py populate --all --num 50
# Control M2M relationship density
python manage.py populate myapp --num 50 --m2m 5
How It Works
Smart Field Mapping
The package recognizes common field patterns and generates appropriate data:
# Your model
class Author(models.Model):
    first_name = models.CharField(max_length=50)
    last_name = models.CharField(max_length=50)
    email = models.EmailField()
    phone_number = models.CharField(max_length=20)
    bio = models.TextField()
Generated data:
- first_name → Real first name (via Faker)
- email → Valid email address
- phone_number → Formatted phone number
- bio → Realistic paragraph text
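The name-based mapping above can be pictured as a lookup from field names to generator functions. This is a minimal sketch of that idea, not the package's actual code: the `GENERATORS` table, `generate_value`, and the stdlib-based generators are hypothetical stand-ins for what Faker would supply.

```python
import random
import string

FIRST_NAMES = ["Ada", "Grace", "Linus", "Guido"]  # tiny stand-in for Faker's name data


def fake_first_name(rng):
    return rng.choice(FIRST_NAMES)


def fake_email(rng):
    local = "".join(rng.choice(string.ascii_lowercase) for _ in range(8))
    return f"{local}@example.com"


def fake_phone(rng):
    return "555-" + "".join(rng.choice(string.digits) for _ in range(4))


# Hypothetical mapping: substring of the field name -> generator function.
GENERATORS = {
    "first_name": fake_first_name,
    "email": fake_email,
    "phone": fake_phone,
}


def generate_value(field_name, max_length=None, rng=None):
    """Return fake data chosen by field name, falling back to random letters."""
    if rng is None:
        rng = random.Random()
    for pattern, generator in GENERATORS.items():
        if pattern in field_name:
            value = generator(rng)
            break
    else:
        # No pattern matched: produce a generic 12-letter string.
        value = "".join(rng.choice(string.ascii_lowercase) for _ in range(12))
    # Respect CharField-style length limits when one is given.
    return value[:max_length] if max_length else value


if __name__ == "__main__":
    for name in ("first_name", "email", "phone_number", "nickname"):
        print(name, "->", generate_value(name, max_length=50))
```

Substring matching is what lets `phone_number` hit the `phone` entry; a real implementation would presumably also consult the field type.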
Intelligent Relationship Handling
class Book(models.Model):
    title = models.CharField(max_length=200)
    author = models.ForeignKey(Author, on_delete=models.CASCADE)
    publisher = models.ForeignKey(Publisher, on_delete=models.CASCADE)
    genres = models.ManyToManyField(Genre)
When you run:
python manage.py populate books --num 100
The package:
- Checks if Author objects exist
- Reuses existing authors instead of creating 100 new ones
- Same for Publisher
- Randomly assigns 1-5 Genre objects per book
- Creates realistic clustering (some authors have many books, others few)
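The reuse behaviour described above can be sketched without Django. In this illustration plain lists stand in for querysets, and the helpers `pick_or_create` and `pick_many` are hypothetical names; the package's real internals may differ.

```python
import random


def pick_or_create(existing, create, rng):
    """Reuse a random existing object; create one only when the pool is empty."""
    if not existing:
        existing.append(create())
    return rng.choice(existing)


def pick_many(existing, rng, low=1, high=5):
    """Choose between `low` and `high` distinct objects, capped by pool size."""
    if not existing:
        return []
    count = min(rng.randint(low, high), len(existing))
    return rng.sample(existing, count)


if __name__ == "__main__":
    rng = random.Random(3)
    authors = ["author-1", "author-2", "author-3"]  # stand-ins for Author rows
    genres = [f"genre-{i}" for i in range(1, 9)]    # stand-ins for Genre rows

    books = [
        {
            "title": f"Book {n}",
            "author": pick_or_create(authors, lambda: "author-new", rng),
            "genres": pick_many(genres, rng),
        }
        for n in range(1, 101)
    ]
    # 100 books were built, but the author pool still holds three entries.
    print(len(books), len(authors))
```

Because every book draws from the same small pool, some authors end up with more books than others, which is the clustering effect the list describes.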
Progress Visualization
For large datasets:
python manage.py populate myapp --num 10000
Generating Author objects: 100%|████████| 10000/10000
Generating Book objects: 100%|████████| 10000/10000
Setting up relationships: 100%|████████| 10000/10000
Real-World Example
Let's say you're building an e-commerce platform:
# models.py
class Customer(models.Model):
    email = models.EmailField(unique=True)
    name = models.CharField(max_length=100)

class Order(models.Model):
    customer = models.ForeignKey(Customer, on_delete=models.CASCADE)
    total = models.DecimalField(max_digits=10, decimal_places=2)
    created_at = models.DateTimeField(auto_now_add=True)

class OrderItem(models.Model):
    order = models.ForeignKey(Order, on_delete=models.CASCADE)
    product = models.ForeignKey(Product, on_delete=models.CASCADE)
    quantity = models.IntegerField()
Traditional approach:
# Creates 1000 customers, 1000 orders - each customer has exactly 1 order
With django-model-populator:
# First, create some customers
python manage.py populate myapp --models Customer --num 50
# Then create orders (reuses those 50 customers)
python manage.py populate myapp --models Order --num 500
# Now you have: 50 customers with varying numbers of orders (averaging 10 each)
# Much more realistic!
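To get a feel for the resulting spread, the following stdlib-only simulation assigns 500 orders to 50 customers and reports the orders per customer. Uniform random selection is an assumption here; how the package actually picks a related object is not specified above, and `orders_per_customer` is a hypothetical helper.

```python
import random
from collections import Counter


def orders_per_customer(num_customers, num_orders, seed=None):
    """Simulate assigning each order to a uniformly random customer."""
    rng = random.Random(seed)
    counts = Counter(rng.randrange(num_customers) for _ in range(num_orders))
    # Include customers that received no orders at all.
    return [counts.get(customer, 0) for customer in range(num_customers)]


if __name__ == "__main__":
    per_customer = orders_per_customer(num_customers=50, num_orders=500, seed=1)
    print("min:", min(per_customer))
    print("max:", max(per_customer))
    print("mean:", sum(per_customer) / len(per_customer))
```

The mean is fixed at 500 / 50 = 10 regardless of the seed; only the minimum and maximum change between runs.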
Configuration
Customize field generation in your Django settings:
# settings.py
MODEL_POPULATOR = {
    'FIELD_MAPPINGS': {
        'company_name': 'company',
        'website': 'url',
    }
}
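A plausible reading of this setting is that each key is a field name and each value is the name of a Faker provider method (Faker does expose `company()` and `url()`). The sketch below shows how such overrides could be merged over defaults and dispatched with `getattr`. `DEFAULT_MAPPINGS`, `resolve_provider`, and the `StubFaker` class are hypothetical; the stub replaces Faker only so the example runs without dependencies.

```python
class StubFaker:
    """Stand-in exposing a few method names that the real Faker also provides."""

    def company(self):
        return "Example Corp"

    def url(self):
        return "https://example.com/"

    def email(self):
        return "user@example.com"

    def word(self):
        return "lorem"


DEFAULT_MAPPINGS = {"email": "email"}  # hypothetical built-in defaults


def resolve_provider(field_name, user_mappings, fake, fallback="word"):
    """Look up the provider method for a field; user settings win over defaults."""
    mappings = {**DEFAULT_MAPPINGS, **user_mappings}
    method_name = mappings.get(field_name, fallback)
    return getattr(fake, method_name)


if __name__ == "__main__":
    settings_mappings = {"company_name": "company", "website": "url"}
    fake = StubFaker()
    for field in ("company_name", "website", "email", "notes"):
        print(field, "->", resolve_provider(field, settings_mappings, fake)())
```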
What It Doesn't Do
Being transparent about limitations:
- ❌ Doesn't handle complex validation logic automatically
- ❌ Doesn't guarantee unique values for non-unique fields
- ❌ Won't populate fields requiring external services/APIs
- ❌ Not a replacement for proper test fixtures in unit tests
Use case: Development databases, integration testing, demos, performance testing.
Technical Approach
Under the hood, django-model-populator:
- Uses Django's app registry to discover models
- Analyzes field types and relationships
- Leverages Faker's extensive fake data generators
- Implements a SafeUniqueProxy for handling unique constraints
- Tracks object creation to enable relationship reuse
- Uses tqdm for progress visualization
It's intentionally lightweight (< 500 lines of core code) and relies heavily on Django's ORM and Faker's ecosystem.
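The idea behind a unique-value wrapper can be sketched as a retry loop that remembers what it has already returned. The `UniqueValues` class below is a hypothetical illustration, not the package's `SafeUniqueProxy`; raising an error once the attempts run out is likewise an assumption about how exhaustion might be handled.

```python
import random


class UniqueValues:
    """Wrap a generator function so it never returns the same value twice."""

    def __init__(self, generator, max_attempts=100):
        self._generator = generator
        self._seen = set()
        self._max_attempts = max_attempts

    def next(self):
        # Retry until the generator yields something not handed out before.
        for _ in range(self._max_attempts):
            value = self._generator()
            if value not in self._seen:
                self._seen.add(value)
                return value
        raise RuntimeError(f"no unique value after {self._max_attempts} attempts")


if __name__ == "__main__":
    rng = random.Random(5)
    emails = UniqueValues(lambda: f"user{rng.randint(1, 1000)}@example.com")
    batch = [emails.next() for _ in range(50)]
    print(len(batch), len(set(batch)))
```

A wrapper like this suits fields such as `Customer.email` with `unique=True`, where a duplicate would otherwise surface as an `IntegrityError` at insert time.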
Try It Out
pip install django-model-populator
# Quick test with your project
python manage.py populate yourapp --num 10
Links
- GitHub: mathiasag7/django_model_populator
- PyPI: pypi.org/project/django-model-populator
- License: MIT
- Python: 3.9-3.13
- Django: 3.2-5.2
Built With Gratitude
This package wouldn't exist without the incredible Faker library by Daniele Faraglia. django-model-populator is simply adding Django-aware relationship logic on top of Faker's excellent fake data generation.
Feedback Welcome
This is a v0.1.0 release. If you encounter issues with specific field types, relationships, or have ideas for improvement, I'd love to hear about it!
What problems have you faced when generating Django test data? How do you currently handle it?
Drop a comment below or open an issue on GitHub. 🚀
Keywords: Django testing, test data generation, fake data, Faker, database seeding, fixtures, development database