DEV Community

mathiasag7


Stop Creating 50 Users When You Only Need 5: Solving Django's Relationship Inflation Problem

How to generate realistic Django test data without bloating your relationships

The Problem

One of the most common issues when populating a Django dev/test database is relationship inflation.

Here's what typically happens:

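# Creating test data the "normal" way
from django.contrib.auth.models import User
from faker import Faker

from profiles.models import Profile  # assumed location of the Profile model

fake = Faker()

for _ in range(50):
    # Every iteration creates a brand-new User for its Profile
    user = User.objects.create(
        username=fake.unique.user_name(),  # username is unique on auth.User
        email=fake.email()
    )
    Profile.objects.create(
        user=user,
        bio=fake.text()
    )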

Result: 50 Profile objects... and 50 unique User objects.

Your database is full, yet it feels unrealistic. In production you'd typically see natural clustering: one user with several profiles, posts, or orders. Instead, every parent has exactly one child, a perfectly even 1:1 distribution that real applications rarely show.

Why This Matters

When testing features like:

  • User dashboards (showing "your" content)
  • Search and filtering (seeing realistic distributions)
  • Performance issues (joins across actual relationships)
  • Admin interfaces (pagination with realistic data)

...you need data that looks and behaves like production data. The relationship structure matters as much as the field values.

The Traditional Solutions (And Their Issues)

Option 1: Manual Loops with Random Selection

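import random
from faker import Faker

fake = Faker()
users = list(User.objects.all())  # must already exist; random.choice fails on an empty list
for _ in range(50):
    Profile.objects.create(
        user=random.choice(users),
        bio=fake.text()
    )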

Problems:

  • Requires pre-existing users
  • Manual management of relationships
  • No guarantee of realistic distributions
  • Breaks on unique constraints (for example, a OneToOneField allows only one Profile per User)
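
One way to address the "no guarantee of realistic distributions" point in a hand-written loop is to weight the random choice so that a few parents receive most of the children. The sketch below is only an illustration and is not taken from django-model-populator: `pick_owners` and its `skew` parameter are hypothetical names, and plain integers stand in for user primary keys.

```python
import random
from collections import Counter


def pick_owners(owner_ids, n_children, skew=1.0, seed=None):
    """Pick an owner for each of n_children rows, favouring earlier owners.

    Weights follow 1 / rank**skew, so skew=0 is uniform and larger values
    concentrate more children on fewer owners.
    """
    rng = random.Random(seed)
    weights = [1.0 / (rank ** skew) for rank in range(1, len(owner_ids) + 1)]
    return rng.choices(owner_ids, weights=weights, k=n_children)


# Integers stand in for the primary keys of users that already exist.
owners = list(range(1, 6))
assignments = pick_owners(owners, 50, skew=1.0, seed=42)
per_owner = Counter(assignments)  # owner id -> number of child rows
```

In a Django loop, each entry of `assignments` would become the `user_id` of one new child row. Passing a `seed` keeps the generated distribution reproducible between runs.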

Option 2: Factory Boy / Model Mommy (now Model Bakery)

Great tools, but they:

  • Still create new related objects by default
  • Require extensive configuration for relationship reuse
  • Need more boilerplate for complex models

Option 3: Fixtures

# fixtures/data.json
[
  {"model": "auth.user", "pk": 1, "fields": {...}},
  {"model": "profiles.profile", "pk": 1, "fields": {"user": 1, ...}}
]

Problems:

  • Static data (not random/varied)
  • Brittle (breaks with model changes)
  • Hard to maintain
  • Not scalable

The Solution: django-model-populator

We built django-model-populator as a thin, intelligent wrapper on top of Faker.

Key insight: It still uses Faker to generate field data, but adds logic for relationship reuse instead of always creating new related objects.

Installation

pip install django-model-populator

Add to INSTALLED_APPS:

INSTALLED_APPS = [
    # ...
    'model_populator',
]

Usage

Basic: Generate 50 objects with intelligent relationship handling

python manage.py populate myapp --num 50

That's it. The package:

  • ✅ Analyzes your models
  • ✅ Generates appropriate fake data for each field
  • ✅ Reuses existing ForeignKey relationships
  • ✅ Randomly assigns ManyToMany relationships
  • ✅ Shows progress bars for large datasets

Advanced Options

# Populate specific models only
python manage.py populate myapp --models User,Profile --num 100

# Populate all apps in project
python manage.py populate --all --num 50

# Control M2M relationship density
python manage.py populate myapp --num 50 --m2m 5

How It Works

Smart Field Mapping

The package recognizes common field patterns and generates appropriate data:

# Your model
class Author(models.Model):
    first_name = models.CharField(max_length=50)
    last_name = models.CharField(max_length=50)
    email = models.EmailField()
    phone_number = models.CharField(max_length=20)
    bio = models.TextField()

Generated data:

  • first_name → Real first name (via Faker)
  • email → Valid email address
  • phone_number → Formatted phone number
  • bio → Realistic paragraph text
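
The name-based mapping described above could look roughly like the following. This is a sketch under assumptions, not the package's actual lookup table: `NAME_HINTS`, `TYPE_FALLBACKS`, and `guess_provider` are hypothetical names. The string values, however, are real Faker method names (`first_name`, `email`, `phone_number`, `paragraph`, `word`, `url`, `pystr`).

```python
# Hypothetical name-to-provider table; the values are Faker method names.
NAME_HINTS = {
    "first_name": "first_name",
    "last_name": "last_name",
    "email": "email",
    "phone_number": "phone_number",
    "bio": "paragraph",
}

# Fallbacks keyed by Django field class name, used when the name gives no hint.
TYPE_FALLBACKS = {
    "EmailField": "email",
    "TextField": "paragraph",
    "CharField": "word",
    "URLField": "url",
}


def guess_provider(field_name, field_type):
    """Return the name of the Faker method to call for a model field."""
    if field_name in NAME_HINTS:
        return NAME_HINTS[field_name]
    # Unknown names fall back to the field type, then to a random string.
    return TYPE_FALLBACKS.get(field_type, "pystr")
```

With Faker installed, a value would then be produced by `getattr(Faker(), guess_provider("email", "EmailField"))()`. Checking the field name before the field type is what lets a plain CharField called `phone_number` receive a phone number rather than a random word.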

Intelligent Relationship Handling

class Book(models.Model):
    title = models.CharField(max_length=200)
    author = models.ForeignKey(Author, on_delete=models.CASCADE)
    publisher = models.ForeignKey(Publisher, on_delete=models.CASCADE)
    genres = models.ManyToManyField(Genre)

When you run:

python manage.py populate books --num 100

The package:

  1. Checks if Author objects exist
  2. Reuses existing authors instead of creating 100 new ones
  3. Does the same for Publisher
  4. Randomly assigns 1-5 Genre objects per book
  5. Creates realistic clustering (some authors have many books, others few)
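
The steps above can be sketched as a small "reuse first, create only when empty" pool. This is an illustration of the idea rather than the package's code: `RelatedPool`, `pick`, and `pick_many` are hypothetical names, and strings stand in for model instances so the example runs without a database.

```python
import random


class RelatedPool:
    """Hand out existing related objects, creating one only if none exist."""

    def __init__(self, existing, create, seed=None):
        self._items = list(existing)  # e.g. list(Author.objects.all())
        self._create = create         # e.g. a callable that creates one Author
        self._rng = random.Random(seed)
        self.created = 0              # how many objects had to be created

    def pick(self):
        """Return one related object for a ForeignKey."""
        if not self._items:
            self._items.append(self._create())
            self.created += 1
        return self._rng.choice(self._items)

    def pick_many(self, low=1, high=5):
        """Return between low and high distinct objects for a ManyToMany field."""
        if not self._items:
            self.pick()  # make sure the pool holds at least one object
        upper = min(high, len(self._items))
        k = self._rng.randint(min(low, upper), upper)
        return self._rng.sample(self._items, k)


authors = RelatedPool(["author-1", "author-2", "author-3"], create=lambda: "new-author", seed=1)
book_authors = [authors.pick() for _ in range(100)]  # 100 books share 3 authors
```

Because each pick is independent, some authors end up with more books than others, which is where the clustering comes from. `pick_many` caps its upper bound at the pool size so it never asks for more distinct genres than exist.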

Progress Visualization

For large datasets:

python manage.py populate myapp --num 10000

Generating Author objects: 100%|████████| 10000/10000
Generating Book objects: 100%|████████| 10000/10000
Setting up relationships: 100%|████████| 10000/10000

Real-World Example

Let's say you're building an e-commerce platform:

# models.py
class Customer(models.Model):
    email = models.EmailField(unique=True)
    name = models.CharField(max_length=100)

class Order(models.Model):
    customer = models.ForeignKey(Customer, on_delete=models.CASCADE)
    total = models.DecimalField(max_digits=10, decimal_places=2)
    created_at = models.DateTimeField(auto_now_add=True)

class OrderItem(models.Model):
    order = models.ForeignKey(Order, on_delete=models.CASCADE)
    product = models.ForeignKey(Product, on_delete=models.CASCADE)
    quantity = models.IntegerField()

Traditional approach:

# Creates 1000 customers, 1000 orders - each customer has exactly 1 order

With django-model-populator:

# First, create some customers
python manage.py populate myapp --models Customer --num 50

# Then create orders (reuses those 50 customers)
python manage.py populate myapp --models Order --num 500

# Now you have: 50 customers with varying numbers of orders (0-30+)
# Much more realistic!

Configuration

Customize field generation in your Django settings:

# settings.py
MODEL_POPULATOR = {
    'FIELD_MAPPINGS': {
        'company_name': 'company',
        'website': 'url',
    }
}

What It Doesn't Do

Being transparent about limitations:

  • ❌ Doesn't handle complex validation logic automatically
  • ❌ Doesn't guarantee unique values for non-unique fields
  • ❌ Won't populate fields requiring external services/APIs
  • ❌ Not a replacement for proper test fixtures in unit tests

Intended use cases: development databases, integration testing, demos, performance testing.

Technical Approach

Under the hood, django-model-populator:

  1. Uses Django's app registry to discover models
  2. Analyzes field types and relationships
  3. Leverages Faker's extensive fake data generators
  4. Implements a SafeUniqueProxy for handling unique constraints
  5. Tracks object creation to enable relationship reuse
  6. Uses tqdm for progress visualization

It's intentionally lightweight (< 500 lines of core code) and relies heavily on Django's ORM and Faker's ecosystem.
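
To give a feel for item 4, here is one generic way to keep generated values unique: remember what has been produced and retry on duplicates. This is not the package's SafeUniqueProxy, whose internals are not shown in this post. `unique_values` and `max_attempts` are hypothetical names, and a random integer generator stands in for a Faker call.

```python
import itertools
import random


def unique_values(generate, max_attempts=1000):
    """Yield values from generate() that have not been yielded before.

    Raises RuntimeError if max_attempts consecutive draws are all duplicates,
    which usually means the generator's value space is exhausted.
    """
    seen = set()
    while True:
        for _ in range(max_attempts):
            value = generate()
            if value not in seen:
                seen.add(value)
                yield value
                break
        else:
            # The inner loop finished without finding a new value.
            raise RuntimeError("could not generate a new unique value")


rng = random.Random(0)
codes = unique_values(lambda: rng.randint(1, 20))
first_five = list(itertools.islice(codes, 5))  # five distinct numbers in 1..20
```

Bounding the retries matters for fields with a small value space, where an unbounded loop would hang once every value is taken. Faker also ships its own `fake.unique` proxy, which covers the simple cases without any custom code.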

Try It Out

pip install django-model-populator

# Quick test with your project
python manage.py populate yourapp --num 10

Built With Gratitude

This package wouldn't exist without the incredible Faker library by Daniele Faraglia. django-model-populator simply adds Django-aware relationship logic on top of Faker's excellent fake data generation.


Feedback Welcome

This is a v0.1.0 release. If you encounter issues with specific field types, relationships, or have ideas for improvement, I'd love to hear about it!

What problems have you faced when generating Django test data? How do you currently handle it?

Drop a comment below or open an issue on GitHub. 🚀


Keywords: Django testing, test data generation, fake data, Faker, database seeding, fixtures, development database
