DEV Community

Yaniv

Why I Built an AI-Powered Test Data Generator (and When You Shouldn't Use AI for Fixtures)

Every test suite has the same dirty secret: name="Test User", email="test@test.com", bio="Lorem ipsum". Copy-pasted across 50 tests, never catching real edge cases, never feeling like production data.

I built FixtureForge to fix this — but along the way, I learned that AI is the wrong tool for most of the problem. Here's what I mean.

The Problem With "Just Use Faker"

Faker is great for structured fields — names, emails, phone numbers, addresses. But it can't generate a realistic user bio, a convincing product review, or an angry customer complaint that actually tests your edge cases.

# This is what most test data looks like:
user = User(name="Test User", email="test@test.com", bio="Lorem ipsum...")

# It doesn't catch real-world edge cases.
# It doesn't feel like production data.
# Writing 500 of them by hand? Not happening.

The obvious answer in 2026 is "use AI." But sending every field to an LLM is expensive, slow, and unnecessary. An email address doesn't need AI. An auto-incrementing ID definitely doesn't need AI.

The Insight: Only Semantic Fields Need AI

FixtureForge splits every model field into four tiers:

| Tier | Examples | Generator | API Cost |
|---|---|---|---|
| Structural | id, user_id, created_at | Internal counters / FK registry | Free |
| Standard | name, email, phone | Faker | Free |
| Computed | @computed_field properties | Pydantic | Free |
| Semantic | bio, description, review | LLM (batched) | API tokens |
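
As a rough illustration of the tier split, here is a minimal routing sketch. The name sets and the `classify_field` helper are assumptions made up for this example, not FixtureForge's actual rules, and the Computed tier is left out because Pydantic evaluates those properties itself.

```python
from enum import Enum


class Tier(Enum):
    STRUCTURAL = "structural"  # counters / FK registry
    STANDARD = "standard"      # Faker
    SEMANTIC = "semantic"      # LLM, batched


# Hypothetical name-based rules; the real routing logic may differ.
STRUCTURAL_NAMES = {"id", "created_at", "updated_at"}
STANDARD_NAMES = {"name", "email", "phone", "address"}


def classify_field(field_name: str, field_type: type) -> Tier:
    """Route a field to a tier from its name and type (illustrative only)."""
    if field_name in STRUCTURAL_NAMES or field_name.endswith("_id"):
        return Tier.STRUCTURAL
    if field_name in STANDARD_NAMES:
        return Tier.STANDARD
    if field_type is str:
        # Free-text fields with no known Faker provider go to the LLM.
        return Tier.SEMANTIC
    # Non-string fields with no matching rule are treated as standard.
    return Tier.STANDARD


user_fields = {"id": int, "name": str, "email": str, "bio": str}
routing = {name: classify_field(name, tp) for name, tp in user_fields.items()}
```

With these made-up rules, `id` lands in the structural tier, `name` and `email` in the standard tier, and `bio` is the only field that would cost API tokens.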

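The key: 100 users with 2 semantic fields = 2 API calls, not 200. FixtureForge batches all values for a semantic field into a single prompt and asks the LLM to return them as a JSON array, so the number of calls scales with the number of semantic fields, not the number of records.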
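
The batching idea can be sketched as follows. This is a simplified approximation under stated assumptions, not FixtureForge's code: `build_batch_prompt`, `generate_semantic_values`, and the `call_llm` callback are hypothetical names, and `fake_llm` stands in for a real client so the example runs offline.

```python
import json
from typing import Callable


def build_batch_prompt(field_name: str, count: int, context: str) -> str:
    """Ask for `count` values of one field as a single JSON array."""
    return (
        f"Context: {context}\n"
        f"Generate {count} distinct, realistic values for the field "
        f"'{field_name}'. Respond with only a JSON array of {count} strings."
    )


def generate_semantic_values(
    field_names: list[str],
    count: int,
    context: str,
    call_llm: Callable[[str], str],
) -> dict[str, list[str]]:
    """Make one LLM call per semantic field, regardless of record count."""
    values: dict[str, list[str]] = {}
    for field_name in field_names:
        raw = call_llm(build_batch_prompt(field_name, count, context))
        parsed = json.loads(raw)
        # Reject malformed responses instead of silently padding records.
        if not isinstance(parsed, list) or len(parsed) != count:
            raise ValueError(f"expected {count} values for {field_name!r}")
        values[field_name] = [str(v) for v in parsed]
    return values


# Stand-in for a real LLM client (an assumption, so the sketch runs offline).
calls: list[str] = []


def fake_llm(prompt: str) -> str:
    calls.append(prompt)
    return json.dumps([f"value {i}" for i in range(100)])


batch = generate_semantic_values(
    ["bio", "review"], 100, "SaaS platform users", fake_llm
)
```

Here `calls` ends up with two prompts for 100 records. Validating the array length matters in practice, because a model can return fewer items than requested.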

from fixtureforge import Forge
from pydantic import BaseModel

class User(BaseModel):
    id: int
    name: str
    email: str
    bio: str

forge = Forge()
users = forge.create_batch(User, count=50, context="SaaS platform users")

FixtureForge routes id to a counter, name and email to Faker, and only bio hits the AI — once, for all 50 records.

CI Mode: No AI, No Network, No Flakiness

This is the part that matters most. In CI, you don't want non-deterministic AI calls making your pipeline flaky. FixtureForge has a deterministic mode:

forge = Forge(use_ai=False, seed=42)
users = forge.create_batch(User, count=100)
# Identical output every run — no network calls

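seed=42 makes the output reproducible: the same seed produces the same records on every run and every machine, provided the FixtureForge and Faker versions are pinned (Faker's seeded output can change between releases). Faker handles the standard fields deterministically, and semantic fields fall back to template-based generation. No API key required.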
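
To make the "template-based fallback" concrete, here is a minimal sketch of seeded template generation using only the standard library. The templates, word lists, and `template_bios` function are invented for illustration and are not FixtureForge's fallback.

```python
import random

# Hypothetical templates; a real fallback would likely have many more.
BIO_TEMPLATES = [
    "Senior {role} with {years} years of experience in {domain}.",
    "Former {role}, now focused on {domain}.",
    "Part-time {role} and {domain} enthusiast.",
]
ROLES = ["engineer", "designer", "analyst", "founder"]
DOMAINS = ["fintech", "e-commerce", "healthcare", "developer tools"]


def template_bios(count: int, seed: int) -> list[str]:
    """Generate bios from templates using a private, seeded RNG."""
    rng = random.Random(seed)  # local RNG: unaffected by global random state
    bios = []
    for _ in range(count):
        template = rng.choice(BIO_TEMPLATES)
        bios.append(
            template.format(
                role=rng.choice(ROLES),
                years=rng.randint(1, 20),  # ignored by templates without {years}
                domain=rng.choice(DOMAINS),
            )
        )
    return bios
```

Using a local `random.Random(seed)` instead of the module-level functions keeps the output stable even if other test code touches the global random state.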

The Context Parameter Is Where It Gets Interesting

The real power isn't generating random data — it's generating data that tests specific scenarios:

# Review is assumed to be a Pydantic model defined elsewhere, with a free-text field.
angry_reviews = forge.create_batch(
    Review,
    count=20,
    context="1-star reviews from angry holiday shoppers"
)

Each bio or review field comes back with realistic frustration, specific complaints, edge-case formatting (ALL CAPS, emoji, long rants). This is the kind of data that catches bugs in text processing, truncation, rendering, and content moderation — bugs that "Lorem ipsum" never finds.
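
As one example of the kind of check this data enables, here is a sketch of a truncation helper run against messy review text. The `truncate` function and the sample strings are hand-written stand-ins for this illustration, not output from FixtureForge.

```python
def truncate(text: str, limit: int) -> str:
    """Shorten text to at most `limit` characters, ending in an ellipsis."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    if len(text) <= limit:
        return text
    return text[: limit - 1].rstrip() + "…"


# Hand-written stand-ins for the text an "angry shoppers" context aims for.
samples = [
    "WORST PURCHASE EVER. DO NOT BUY!!!",
    "Arrived late 😡😡😡 and the box was crushed 📦",
    "I ordered this three weeks before the holidays and " * 10,
    "",
]

previews = [truncate(s, 40) for s in samples]
```

Python slices strings by code point, so a helper like this can still split emoji that are built from several code points. Realistic inputs are what make that sort of bug visible.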

pytest Integration

In conftest.py:

from fixtureforge import forge_fixture
from myapp.models import User, Order

forge_fixture(User, count=50)
forge_fixture(Order, count=200)

In your tests:

def test_users_have_emails(users):
    assert all(u.email for u in users)

def test_order_count(orders):
    assert len(orders) == 200

The forge fixture is auto-available, and the tests above request the generated data by name as users and orders. No factory classes to maintain, no fixture files to update.

When You Should NOT Use This

I want to be honest about the limitations:

Don't use FixtureForge if:

  • Your tests only need IDs and emails — Faker alone is sufficient and simpler
  • You're in a strict air-gapped environment with no API access — CI mode works, but you lose the AI-generated quality
  • Your test data needs to match a specific production database schema exactly — use database dumps or migrations instead

Do use FixtureForge if:

  • You need realistic text content (bios, reviews, descriptions) at scale
  • You want to test edge cases in text processing without writing them by hand
  • You need deterministic CI with realistic dev-time data from one tool
  • You're tired of maintaining factory_boy factory classes for every model change

How It Compares

| Feature | FixtureForge | factory_boy | faker | hypothesis |
|---|---|---|---|---|
| AI-generated content | Yes | No | No | No |
| Deterministic seed | Yes | Yes | Yes | Yes |
| FK relationships | Auto | Manual | No | No |
| pytest plugin | Yes | Via pytest-factoryboy | Yes (bundled faker fixture) | Yes |
| Large datasets (100k+) | Yes | Manual loops | Manual loops | No |
| Zero config | Yes | Factory classes needed | Provider setup | Strategy setup |

FixtureForge isn't a replacement for Faker — it uses Faker internally. It's the layer between "I need data" and "I need it to feel real."

Try It

pip install fixtureforge

GitHub: Yaniv2809/fixtureforge
Docs: yaniv2809.github.io/fixtureforge

If you've built something similar or have opinions on AI-generated test data vs traditional fixtures, I'd like to hear about it.


Yaniv Metuku (yaniv2809) — QA Automation Engineer. Also building Financial-Integrity-Ecosystem and Failscope.
