mathiasag7

Posted on Jan 1

Stop Creating 50 Users When You Only Need 5: Solving Django's Relationship Inflation Problem

#django #python #testing #opensource

How to generate realistic Django test data without bloating your relationships

GitHub: mathiasag7/django_model_populator
PyPI: pypi.org/project/django-model-populator

The Problem

One of the most common issues when populating a Django dev/test database is relationship inflation.

Here's what typically happens:

# Creating test data the "normal" way
from faker import Faker
fake = Faker()

for _ in range(50):
    user = User.objects.create(
        username=fake.user_name(),
        email=fake.email()
    )
    Profile.objects.create(
        user=user,
        bio=fake.text()
    )

Result: 50 Profile objects... and 50 unique User objects.

The rows are there, but the shape of the data is unrealistic. In production, you'd typically see natural clustering: one user with multiple profiles, posts, or orders. Instead, every parent row has exactly one child, a perfectly even distribution that rarely shows up in real applications.

Why This Matters

When testing features like:

User dashboards (showing "your" content)
Search and filtering (seeing realistic distributions)
Performance issues (joins across actual relationships)
Admin interfaces (pagination with realistic data)

...you need data that looks and behaves like production data. The relationship structure matters as much as the field values.

The Traditional Solutions (And Their Issues)

Option 1: Manual Loops with Random Selection

import random

# Assumes User rows already exist and `fake = Faker()` is defined as above
users = list(User.objects.all())
for _ in range(50):
    Profile.objects.create(
        user=random.choice(users),
        bio=fake.text()
    )

Problems:

Requires pre-existing users
Manual management of relationships
No guarantee of realistic distributions
Breaks with unique constraints (a OneToOneField rejects a user that gets picked twice)
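
The unique-constraint problem comes from sampling with replacement: `random.choice` can return the same user twice, which is fine for a plain ForeignKey but not for a one-to-one or otherwise unique relation. Here is a minimal sketch of the difference, using plain integers as stand-ins for user primary keys. The helper name `pick_related` is hypothetical and not part of any library.

```python
import random


def pick_related(pool, count, unique=False, rng=random):
    """Pick `count` related objects from `pool`.

    unique=False samples with replacement, so objects may repeat (fine for ForeignKey).
    unique=True samples without replacement, so each object is used at most once
    (what a one-to-one relation needs).
    """
    if not pool:
        raise ValueError("pool is empty: create the related objects first")
    if unique:
        if count > len(pool):
            raise ValueError(f"need {count} distinct objects, only {len(pool)} available")
        return rng.sample(pool, count)
    return [rng.choice(pool) for _ in range(count)]


user_ids = list(range(1, 6))            # stand-ins for 5 existing User rows
fk_picks = pick_related(user_ids, 50)   # 50 picks from 5 users, so repeats are certain
one_to_one_picks = pick_related(user_ids, 5, unique=True)  # each user exactly once
```

Asking for more unique picks than the pool holds raises an error up front instead of failing later with a database integrity error.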

Option 2: Factory Boy / Model Bakery (formerly Model Mommy)

Great tools, but:

Still creates new related objects by default
Requires extensive configuration for relationship reuse
More boilerplate for complex models

Option 3: Fixtures

# fixtures/data.json
[
  {"model": "auth.user", "pk": 1, "fields": {...}},
  {"model": "profiles.profile", "pk": 1, "fields": {"user": 1, ...}}
]

Problems:

Static data (not random/varied)
Brittle (breaks with model changes)
Hard to maintain
Not scalable

The Solution: django-model-populator

We built django-model-populator as a thin, intelligent wrapper on top of Faker.

Key insight: It still uses Faker to generate field data, but adds logic for relationship reuse instead of always creating new related objects.

Installation

pip install django-model-populator

Add to INSTALLED_APPS:

INSTALLED_APPS = [
    # ...
    'model_populator',
]

Usage

Basic: Generate 50 objects with intelligent relationship handling

python manage.py populate myapp --num 50

That's it. The package:

✅ Analyzes your models
✅ Generates appropriate fake data for each field
✅ Reuses existing ForeignKey relationships
✅ Randomly assigns ManyToMany relationships
✅ Shows progress bars for large datasets

Advanced Options

# Populate specific models only
python manage.py populate myapp --models User,Profile --num 100

# Populate all apps in project
python manage.py populate --all --num 50

# Control M2M relationship density
python manage.py populate myapp --num 50 --m2m 5

How It Works

Smart Field Mapping

The package recognizes common field patterns and generates appropriate data:

# Your model
class Author(models.Model):
    first_name = models.CharField(max_length=50)
    last_name = models.CharField(max_length=50)
    email = models.EmailField()
    phone_number = models.CharField(max_length=20)
    bio = models.TextField()

Generated data:

first_name → Real first name (via Faker)
email → Valid email address
phone_number → Formatted phone number
bio → Realistic paragraph text
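
One way to picture this kind of mapping is a lookup that tries the field name first and falls back to the field type. The tables below are illustrative assumptions, not the package's actual rules. The values are names of real Faker provider methods (`first_name`, `email`, `phone_number`, `paragraph`, `word`, `url`).

```python
# Hypothetical lookup table: model field name -> Faker provider method name.
NAME_TO_PROVIDER = {
    "first_name": "first_name",
    "last_name": "last_name",
    "email": "email",
    "phone_number": "phone_number",
    "bio": "paragraph",
}

# Fallback when the field name is not recognized: Django field class name -> provider.
TYPE_TO_PROVIDER = {
    "EmailField": "email",
    "TextField": "paragraph",
    "CharField": "word",
    "URLField": "url",
}


def resolve_provider(field_name, field_type):
    """Return the Faker provider name to use for a field, or None if unknown."""
    if field_name in NAME_TO_PROVIDER:
        return NAME_TO_PROVIDER[field_name]
    return TYPE_TO_PROVIDER.get(field_type)


# The Author model from above, reduced to {field name: field class name}.
author_fields = {
    "first_name": "CharField",
    "last_name": "CharField",
    "email": "EmailField",
    "phone_number": "CharField",
    "bio": "TextField",
}
plan = {name: resolve_provider(name, ftype) for name, ftype in author_fields.items()}
```

With Faker installed, each resolved name could then be called as `getattr(fake, provider)()` to produce the value.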

Intelligent Relationship Handling

class Book(models.Model):
    title = models.CharField(max_length=200)
    author = models.ForeignKey(Author, on_delete=models.CASCADE)
    publisher = models.ForeignKey(Publisher, on_delete=models.CASCADE)
    genres = models.ManyToManyField(Genre)

When you run:

python manage.py populate books --num 100

The package:

Checks if Author objects exist
Reuses existing authors instead of creating 100 new ones
Same for Publisher
Randomly assigns 1-5 Genre objects per book
Creates realistic clustering (some authors have many books, others few)
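
The steps above can be sketched with in-memory lists standing in for database tables. This is a simplified illustration of the reuse idea, not the package's code. `build_books` is a hypothetical name, and creating a single parent row when none exist is an assumption about the fallback behavior.

```python
import random
from collections import Counter


def build_books(count, authors, publishers, genres, rng):
    """Create `count` book dicts, reusing existing related rows instead of creating new ones."""
    if not authors:
        authors.append("author-1")        # assumption: nothing to reuse, so create one parent
    if not publishers:
        publishers.append("publisher-1")
    books = []
    for i in range(count):
        # ManyToMany: pick between 1 and 5 distinct genres (fewer if the table is small).
        genre_count = rng.randint(1, min(5, len(genres))) if genres else 0
        books.append({
            "title": f"Book {i}",
            "author": rng.choice(authors),         # ForeignKey: reuse, repeats allowed
            "publisher": rng.choice(publishers),   # ForeignKey: reuse, repeats allowed
            "genres": rng.sample(genres, genre_count),
        })
    return books


rng = random.Random(42)  # seeded so the run is reproducible
authors = [f"author-{n}" for n in range(1, 11)]
publishers = ["publisher-1", "publisher-2", "publisher-3"]
genres = ["fantasy", "crime", "poetry", "history", "sci-fi", "romance"]

books = build_books(100, authors, publishers, genres, rng)
books_per_author = Counter(book["author"] for book in books)
```

After the run there are still only 10 authors, and `books_per_author` shows how the 100 books are spread unevenly across them.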

Progress Visualization

For large datasets:

python manage.py populate myapp --num 10000

Generating Author objects: 100%|████████| 10000/10000
Generating Book objects: 100%|████████| 10000/10000
Setting up relationships: 100%|████████| 10000/10000

Real-World Example

Let's say you're building an e-commerce platform:

# models.py
class Customer(models.Model):
    email = models.EmailField(unique=True)
    name = models.CharField(max_length=100)

class Order(models.Model):
    customer = models.ForeignKey(Customer, on_delete=models.CASCADE)
    total = models.DecimalField(max_digits=10, decimal_places=2)
    created_at = models.DateTimeField(auto_now_add=True)

class OrderItem(models.Model):
    order = models.ForeignKey(Order, on_delete=models.CASCADE)
    product = models.ForeignKey(Product, on_delete=models.CASCADE)
    quantity = models.IntegerField()

Traditional approach:

# Creates 1000 customers, 1000 orders - each customer has exactly 1 order

With django-model-populator:

# First, create some customers
python manage.py populate myapp --models Customer --num 50

# Then create orders (reuses those 50 customers)
python manage.py populate myapp --models Order --num 500

# Now you have: 50 customers sharing 500 orders (10 each on average, unevenly spread)
# Much more realistic!
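
To get a feel for the spread, the two commands above can be simulated without a database. This sketch assumes each order picks its customer uniformly at random, which may not match the package's actual selection logic. The average is fixed at 500 / 50 = 10 orders per customer; only the spread around it varies.

```python
import random
from collections import Counter

NUM_CUSTOMERS = 50
NUM_ORDERS = 500

rng = random.Random(2024)  # fixed seed so the simulation is repeatable
customer_ids = range(1, NUM_CUSTOMERS + 1)

# Each order picks one existing customer uniformly at random (an assumption).
orders_per_customer = Counter(rng.choice(customer_ids) for _ in range(NUM_ORDERS))

# Include customers that received no orders at all.
counts = [orders_per_customer.get(cid, 0) for cid in customer_ids]
average = sum(counts) / NUM_CUSTOMERS
print(f"min={min(counts)} max={max(counts)} average={average:.1f}")
```

Under a uniform pick, counts cluster around the average. A heavier tail, with a few customers placing very many orders, would need weighted selection, for example `random.choices` with a `weights` argument.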

Configuration

Customize field generation in your Django settings:

# settings.py
MODEL_POPULATOR = {
    'FIELD_MAPPINGS': {
        'company_name': 'company',
        'website': 'url',
    }
}

What It Doesn't Do

Being transparent about limitations:

❌ Doesn't handle complex validation logic automatically
❌ Doesn't guarantee unique values for fields that aren't declared unique=True
❌ Won't populate fields requiring external services/APIs
❌ Not a replacement for proper test fixtures in unit tests

Use cases: Development databases, integration testing, demos, performance testing.

Technical Approach

Under the hood, django-model-populator:

Uses Django's app registry to discover models
Analyzes field types and relationships
Leverages Faker's extensive fake data generators
Implements a SafeUniqueProxy for handling unique constraints
Tracks object creation to enable relationship reuse
Uses tqdm for progress visualization
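
The unique-constraint handling in the list above can be pictured as a wrapper that remembers what it has already produced and retries on collisions. The class below is a hypothetical stand-in, not the package's `SafeUniqueProxy`, whose actual interface isn't shown in this post.

```python
import itertools


class UniqueValueGuard:
    """Wrap a value generator and never return the same value twice."""

    def __init__(self, generate, max_attempts=100):
        self._generate = generate          # zero-argument callable producing candidates
        self._max_attempts = max_attempts  # give up after this many collisions in a row
        self._seen = set()

    def next_value(self):
        for _ in range(self._max_attempts):
            value = self._generate()
            if value not in self._seen:
                self._seen.add(value)
                return value
        raise RuntimeError(f"no unused value after {self._max_attempts} attempts")


# A deliberately collision-prone generator: cycles through a short list with a repeat.
names = itertools.cycle(["ana", "ben", "ana", "cy"])
guard = UniqueValueGuard(lambda: f"{next(names)}@example.com", max_attempts=10)

emails = [guard.next_value() for _ in range(3)]
# emails == ["ana@example.com", "ben@example.com", "cy@example.com"]
```

A fourth call would raise `RuntimeError`, because the generator has only three distinct values to offer. Faker ships a comparable mechanism of its own through the `fake.unique` proxy (for example `fake.unique.email()`).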

It's intentionally lightweight (< 500 lines of core code) and relies heavily on Django's ORM and Faker's ecosystem.

Try It Out

pip install django-model-populator

# Quick test with your project
python manage.py populate yourapp --num 10

Built With Gratitude

This package wouldn't exist without the incredible Faker library by Daniele Faraglia. django-model-populator simply adds Django-aware relationship logic on top of Faker's excellent fake data generation.

Feedback Welcome

This is a v0.1.0 release. If you run into issues with specific field types or relationships, or have ideas for improvement, I'd love to hear about it!

What problems have you faced when generating Django test data? How do you currently handle it?

Drop a comment below or open an issue on GitHub. 🚀

Keywords: Django testing, test data generation, fake data, Faker, database seeding, fixtures, development database
