Yaniv

Posted on Apr 17

Why I Built an AI-Powered Test Data Generator (and When You Shouldn't Use AI for Fixtures)

#ai #python #testing #opensource

Every test suite has the same dirty secret: name="Test User", email="test@test.com", bio="Lorem ipsum". Copy-pasted across 50 tests, never catching real edge cases, never feeling like production data.

I built FixtureForge to fix this — but along the way, I learned that AI is the wrong tool for most of the problem. Here's what I mean.

The Problem With "Just Use Faker"

Faker is great for structured fields — names, emails, phone numbers, addresses. But it can't generate a realistic user bio, a convincing product review, or an angry customer complaint that actually tests your edge cases.

# This is what most test data looks like:
user = User(name="Test User", email="test@test.com", bio="Lorem ipsum...")

# It doesn't catch real-world edge cases.
# It doesn't feel like production data.
# Writing 500 of them by hand? Not happening.

The obvious answer in 2026 is "use AI." But sending every field to an LLM is expensive, slow, and unnecessary. An email address doesn't need AI. An auto-incrementing ID definitely doesn't need AI.

The Insight: Only Semantic Fields Need AI

FixtureForge splits every model field into four tiers:

| Tier | Examples | Generator | API Cost |
| --- | --- | --- | --- |
| Structural | `id`, `user_id`, `created_at` | Internal counters / FK registry | Free |
| Standard | `name`, `email`, `phone` | Faker | Free |
| Computed | `@computed_field` properties | Pydantic | Free |
| Semantic | `bio`, `description`, `review` | LLM (batched) | API tokens |
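To make the routing idea concrete, here is a minimal sketch of name-based tier selection. It is not FixtureForge's actual code: the `Tier` enum, the name sets, and `classify_field` are hypothetical, the Computed tier is left out for brevity, and a real router would probably inspect field types as well as names.

```python
from enum import Enum


class Tier(Enum):
    STRUCTURAL = "structural"
    STANDARD = "standard"
    SEMANTIC = "semantic"


# Hypothetical lookup sets; chosen only to mirror the examples in the table.
STRUCTURAL_NAMES = {"id", "created_at", "updated_at"}
STANDARD_NAMES = {"name", "email", "phone", "address"}


def classify_field(field_name: str) -> Tier:
    """Pick a tier for a field from its name alone."""
    if field_name in STRUCTURAL_NAMES or field_name.endswith("_id"):
        return Tier.STRUCTURAL
    if field_name in STANDARD_NAMES:
        return Tier.STANDARD
    # Assumption: anything unrecognized is free text and goes to the LLM tier.
    return Tier.SEMANTIC


def plan(field_names: list[str]) -> dict[str, Tier]:
    """Map every field name to the tier that should generate it."""
    return {name: classify_field(name) for name in field_names}


routing = plan(["id", "name", "email", "bio"])
for field, tier in routing.items():
    print(f"{field}: {tier.value}")
```

Defaulting unknown names to the semantic tier is the expensive choice. A stricter sketch could fall back to a free generator instead and require semantic fields to be marked explicitly.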

The key: 100 users with 2 semantic fields = 2 API calls, not 200. FixtureForge batches all the values for a semantic field into a single prompt and asks the LLM to return a JSON array.

from fixtureforge import Forge
from pydantic import BaseModel

class User(BaseModel):
    id: int
    name: str
    email: str
    bio: str

forge = Forge()
users = forge.create_batch(User, count=50, context="SaaS platform users")

FixtureForge routes id to a counter, name and email to Faker, and only bio hits the AI — once, for all 50 records.
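The batching step can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: `build_batch_prompt`, `generate_field_values`, and `FakeLLM` are hypothetical names, and the fake client stands in for a real model call so the example runs offline.

```python
import json
from typing import Callable


def build_batch_prompt(field: str, count: int, context: str) -> str:
    """Ask for every value of one field in a single request."""
    return (
        f"Generate {count} distinct values for the field '{field}'. "
        f"Context: {context}. "
        "Respond with a JSON array of strings and nothing else."
    )


def generate_field_values(
    field: str, count: int, context: str, call_llm: Callable[[str], str]
) -> list[str]:
    """Make one model call and parse the JSON array it returns."""
    raw = call_llm(build_batch_prompt(field, count, context))
    values = json.loads(raw)
    # Models sometimes return the wrong shape, so validate before use.
    if not isinstance(values, list) or len(values) != count:
        raise ValueError(f"expected a JSON array of {count} items")
    return [str(v) for v in values]


class FakeLLM:
    """Offline stand-in for a real model client; counts how often it is called."""

    def __init__(self, count: int) -> None:
        self.count = count
        self.calls = 0

    def __call__(self, prompt: str) -> str:
        self.calls += 1
        return json.dumps([f"placeholder bio {i}" for i in range(self.count)])


llm = FakeLLM(count=50)
bios = generate_field_values("bio", 50, "SaaS platform users", llm)
print(len(bios), llm.calls)  # 50 values from 1 call
```

The length check matters in practice: a model that returns 49 items instead of 50 would otherwise silently misalign values and records.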

CI Mode: No AI, No Network, No Flakiness

This is the part that matters most. In CI, you don't want non-deterministic AI calls making your pipeline flaky. FixtureForge has a deterministic mode:

forge = Forge(use_ai=False, seed=42)
users = forge.create_batch(User, count=100)
# Identical output every run — no network calls

seed=42 guarantees byte-identical output across every run, every machine. Faker handles the standard fields deterministically, and semantic fields fall back to template-based generation. No API key required.
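For intuition, seeded template-based fallback can be sketched with the standard library alone. The templates, word lists, and `template_bios` below are hypothetical and are not how FixtureForge necessarily does it; the point is only that a private, seeded generator makes the output repeatable.

```python
import random

# Hypothetical templates and vocab for a "bio" field.
BIO_TEMPLATES = [
    "{role} with {years} years of experience in {area}.",
    "Works on {area} as a {role}; {years} years in the field.",
]
ROLES = ["engineer", "designer", "analyst"]
AREAS = ["billing", "search", "onboarding"]


def template_bios(count: int, seed: int) -> list[str]:
    """Fill templates using a private RNG so global random state is untouched."""
    rng = random.Random(seed)
    bios = []
    for _ in range(count):
        template = rng.choice(BIO_TEMPLATES)
        bios.append(
            template.format(
                role=rng.choice(ROLES),
                years=rng.randint(1, 20),
                area=rng.choice(AREAS),
            )
        )
    return bios


first = template_bios(5, seed=42)
second = template_bios(5, seed=42)
print(first == second)  # True: same seed, same sequence
```

One caveat for a sketch like this: Python only guarantees a stable sequence for `random.random()` across versions, so helpers such as `choice` and `randint` are safest when the interpreter version is pinned in CI.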

The Context Parameter Is Where It Gets Interesting

The real power isn't generating random data — it's generating data that tests specific scenarios:

# Review is assumed to be a Pydantic model with a free-text field, defined elsewhere
angry_reviews = forge.create_batch(
    Review,
    count=20,
    context="1-star reviews from angry holiday shoppers"
)

Each bio or review field comes back with realistic frustration, specific complaints, and edge-case formatting (ALL CAPS, emoji, long rants). This is the kind of data that catches bugs in text processing, truncation, rendering, and content moderation, the bugs that "Lorem ipsum" never finds.
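As one example of the truncation bugs this kind of data exposes, here is a small sketch of byte-limited truncation, the sort of helper that breaks when an emoji straddles the cut. The `truncate_utf8` function and the sample strings are hypothetical; the samples are hand-written stand-ins, not real generator output.

```python
def truncate_utf8(text: str, max_bytes: int) -> str:
    """Cut text to at most max_bytes of UTF-8 without splitting a character."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # errors="ignore" drops a trailing partial multi-byte sequence.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


# Hand-written stand-ins for the kind of text an "angry reviews" context yields.
samples = [
    "WORST PURCHASE EVER. NEVER AGAIN!!!",
    "Arrived broken 😡😡😡 and support ignored me",
    "I waited three weeks. " * 40,
]

truncated = [truncate_utf8(s, 20) for s in samples]
for original, short in zip(samples, truncated):
    print(len(original.encode("utf-8")), "->", len(short.encode("utf-8")))
```

With a 20-byte limit the cut lands inside the second emoji of the middle sample, so a naive `encoded[:20].decode("utf-8")` would raise `UnicodeDecodeError`. Plain ASCII fixtures never reach that branch.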

pytest Integration

In conftest.py:

from fixtureforge import forge_fixture
from myapp.models import User, Order

forge_fixture(User, count=50)
forge_fixture(Order, count=200)

In your tests:

def test_users_have_emails(users):
    assert all(u.email for u in users)

def test_order_count(orders):
    assert len(orders) == 200

The forge fixture is auto-available, and the users and orders fixtures used above come from the forge_fixture calls in conftest.py. No factory classes to maintain, no fixture files to update.

When You Should NOT Use This

I want to be honest about the limitations:

Don't use FixtureForge if:

Your tests only need IDs and emails — Faker alone is sufficient and simpler
You're in a strict air-gapped environment with no API access — CI mode works, but you lose the AI-generated quality
Your test data needs to match a specific production database schema exactly — use database dumps or migrations instead

Do use FixtureForge if:

You need realistic text content (bios, reviews, descriptions) at scale
You want to test edge cases in text processing without writing them by hand
You need deterministic CI with realistic dev-time data from one tool
You're tired of maintaining factory_boy factory classes for every model change

How It Compares

| Feature | FixtureForge | factory_boy | faker | hypothesis |
| --- | --- | --- | --- | --- |
| AI-generated content | Yes | No | No | No |
| Deterministic seed | Yes | Yes | Yes | Yes |
| FK relationships | Auto | Manual | No | No |
| pytest plugin | Yes | Via pytest-factoryboy | No | Yes |
| Large datasets (100k+) | Yes | Manual loops | Manual loops | No |
| Zero config | Yes | Factory classes needed | Provider setup | Strategy setup |

FixtureForge isn't a replacement for Faker — it uses Faker internally. It's the layer between "I need data" and "I need it to feel real."

Try It

pip install fixtureforge

GitHub: Yaniv2809/fixtureforge
Docs: yaniv2809.github.io/fixtureforge

If you've built something similar or have opinions on AI-generated test data vs traditional fixtures, I'd like to hear about it.

Yaniv Metuku (yaniv2809) — QA Automation Engineer. Also building Financial-Integrity-Ecosystem and Failscope.

Top comments (2)

AI Bug Slayer 🐞 • Apr 18

Smart approach — using Faker for deterministic fields and AI only for semantic/complex ones keeps costs low while still making test data feel realistic. The batching strategy is clever.

Yaniv • Apr 20

Thanks! The batching was actually the hardest part to get right. The naive approach (one API call per record per field) made 50 users with two semantic fields cost 100 calls, which is completely impractical. The trick was restructuring the prompt to ask for a JSON array of all values at once, so 50 bios = 1 call regardless of count.
The next thing I want to improve is caching — if you generate 50 users today and 50 tomorrow with the same context, the AI calls repeat unnecessarily. Thinking about hashing the context + model schema as a cache key so repeated runs in dev skip the API entirely.
Have you tried any AI-based data generation in your workflow, or mostly sticking with Faker/factory_boy?
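The cache-key idea from the reply above could look roughly like this. It is a sketch of the general technique, not a planned FixtureForge API: `cache_key` and the schema dict are hypothetical, and including `count` in the key is an added assumption beyond the "context + model schema" described.

```python
import hashlib
import json


def cache_key(context: str, schema: dict, count: int) -> str:
    """Hash the inputs that determine what the LLM would be asked for."""
    payload = json.dumps(
        {"context": context, "schema": schema, "count": count},
        sort_keys=True,          # key order must not change the hash
        separators=(",", ":"),   # fixed whitespace for a stable encoding
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Hypothetical schema description for the User model from the article.
user_schema = {"id": "int", "name": "str", "email": "str", "bio": "str"}
reordered_schema = dict(reversed(list(user_schema.items())))

key_today = cache_key("SaaS platform users", user_schema, 50)
key_tomorrow = cache_key("SaaS platform users", reordered_schema, 50)
key_other = cache_key("1-star reviews from angry holiday shoppers", user_schema, 50)

print(key_today == key_tomorrow)  # True: same inputs, field order ignored
print(key_today == key_other)     # False: different context
```

With Pydantic v2 the schema dict could plausibly come from `User.model_json_schema()`, so that adding or renaming a field invalidates the cached values automatically.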