Stop writing fake test data by hand — I built a library that generates it for you

#python #testing #pydantic #pytest

Every Python project I've worked on has the same problem in the test suite:

user = User(
    name="Test User",
    email="test@test.com",
    age=25,
    bio="Lorem ipsum dolor sit amet",
)

It's not realistic. It doesn't catch edge cases. And when you need 200 of them,
nobody writes them — you just copy-paste the same record and pretend it's a dataset.

I got tired of this and built FixtureForge.

The idea

Define a Pydantic model. Get realistic data.

from fixtureforge import Forge
from pydantic import BaseModel

class User(BaseModel):
    id: int
    name: str
    email: str
    bio: str

forge = Forge()
users = forge.create_batch(User, count=50, context="SaaS platform users")

FixtureForge routes each field to the right generator:

| Field | Generator | Cost |
| --- | --- | --- |
| `id` | Sequential counter | Free |
| `name`, `email` | Faker | Free |
| `bio` | LLM (batched) | 1 API call for all 50 |

Only semantic fields — descriptions, bios, reviews, messages — hit the AI.
Everything else is free.
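
To make the routing idea concrete, here is a minimal stdlib-only sketch. It is not FixtureForge's actual code: `route_field`, `fake_llm_batch`, the `SEMANTIC_FIELDS` set, and the dataclass standing in for the Pydantic model are all hypothetical, and the "LLM" is a stub. The point it illustrates is that a semantic field costs one batched request per batch, not one per record.

```python
import random
from dataclasses import dataclass, fields

# Hypothetical set of field names treated as free-text ("semantic") fields.
SEMANTIC_FIELDS = {"bio", "description", "review", "message"}

llm_calls = []  # records every simulated LLM request


@dataclass
class User:
    id: int
    name: str
    email: str
    bio: str


def route_field(field_name: str) -> str:
    """Decide which kind of generator a field should use."""
    if field_name == "id":
        return "structural"
    if field_name in SEMANTIC_FIELDS:
        return "semantic"
    return "faker"


def fake_llm_batch(field_name: str, count: int, context: str) -> list[str]:
    """Stand-in for one batched LLM request that returns `count` texts."""
    llm_calls.append((field_name, count))
    return [f"{context}: sample {field_name} #{i + 1}" for i in range(count)]


def create_batch(model, count: int, context: str = "", seed: int = 0) -> list:
    rng = random.Random(seed)
    routes = {f.name: route_field(f.name) for f in fields(model)}
    # One batched request per semantic field, regardless of `count`.
    semantic = {
        name: fake_llm_batch(name, count, context)
        for name, route in routes.items()
        if route == "semantic"
    }
    rows = []
    for i in range(count):
        values = {}
        for name, route in routes.items():
            if route == "structural":
                values[name] = i + 1  # sequential counter
            elif route == "semantic":
                values[name] = semantic[name][i]
            else:
                # Placeholder where a Faker provider would be called.
                values[name] = f"{name}_{rng.randrange(10_000)}"
        rows.append(model(**values))
    return rows


users = create_batch(User, count=50, context="SaaS platform users")
```

After this runs, `llm_calls` holds a single entry for `bio`, even though 50 users were created.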

CI mode: zero AI, fully deterministic

forge = Forge(use_ai=False, seed=42)
users = forge.create_batch(User, count=100)
# Same output on every machine, every run, forever

The seed= parameter controls both Faker and random generation at the instance level —
two Forge(seed=42) instances produce identical data without interfering with each other.
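
If instance-level seeding is unfamiliar, the standard library shows the same property. The sketch below uses plain `random.Random` objects; that FixtureForge is built this way internally is an assumption, not something the post states.

```python
import random

# Two independent generators created with the same seed.
rng_a = random.Random(42)
rng_b = random.Random(42)

seq_a = [rng_a.randint(1, 100) for _ in range(5)]

# Reseeding and draining the module-level generator in between
# does not touch the state held by rng_b.
random.seed(0)
_ = [random.random() for _ in range(1000)]

seq_b = [rng_b.randint(1, 100) for _ in range(5)]

same = seq_a == seq_b  # True: each instance owns its own state
```

A global `random.seed(42)` call, by contrast, is shared by everything in the process, so any other code that draws from it shifts your sequence.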

pytest plugin

# conftest.py
from fixtureforge import forge_fixture
from myapp.models import User, Order

forge_fixture(User, count=50)
forge_fixture(Order, count=200)

# test_users.py
def test_all_users_have_emails(users):
    assert all(u.email for u in users)

def test_order_count(orders):
    assert len(orders) == 200

No boilerplate. Fixtures are named automatically from the model
(User → users, OrderItem → order_items).
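
The naming rule is easy to picture as a small function. This is a sketch of one way to get from a CamelCase model name to a snake_case plural; `fixture_name` is a hypothetical helper, not part of the library's API.

```python
import re


def fixture_name(model_name: str) -> str:
    """Convert a CamelCase model name to a snake_case plural fixture name."""
    # Insert an underscore before every capital letter except the first.
    snake = re.sub(r"(?<!^)(?=[A-Z])", "_", model_name).lower()
    # Naive pluralisation: just append "s".
    return snake + "s"


fixture_name("User")       # → "users"
fixture_name("OrderItem")  # → "order_items"
```

This version does not handle irregular plurals (`Category` becomes `categorys`) or acronyms; the post doesn't say how FixtureForge treats those, so check the docs before relying on a generated name.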

Verbose mode

Not sure where a value came from? Turn on verbose:

forge = Forge(seed=42, verbose=True)
user = forge.create(User)

# [structural] id    = 1
# [faker]      name  = 'Allison Hill'
# [faker]      email = 'donaldgarcia@example.net'
# [ai]         bio   = 'Passionate developer with 8 years of experience...'

Foreign keys

customers = forge.create_batch(Customer, count=10)
orders = forge.create_batch(Order, count=100)
# order.customer_id always points to a real customer.id — automatically
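
One way foreign-key resolution like this could work is a registry of ids that have already been generated. The sketch below is an illustration under that assumption: `Registry`, its methods, and the rule of stripping the `_id` suffix to find the parent model are all hypothetical, and dataclasses stand in for the Pydantic models.

```python
import random
from dataclasses import dataclass


@dataclass
class Customer:
    id: int


@dataclass
class Order:
    id: int
    customer_id: int


class Registry:
    """Remembers the ids generated so far, keyed by model name."""

    def __init__(self, seed: int = 42) -> None:
        self._ids: dict[str, list[int]] = {}
        self._rng = random.Random(seed)

    def record(self, model_name: str, new_id: int) -> None:
        self._ids.setdefault(model_name, []).append(new_id)

    def pick(self, model_name: str) -> int:
        """Return an id that already exists for the referenced model."""
        pool = self._ids.get(model_name)
        if not pool:
            raise LookupError(f"no {model_name} rows generated yet")
        return self._rng.choice(pool)


registry = Registry()

# Parents first, so their ids are available to children.
customers = [Customer(id=i) for i in range(1, 11)]
for c in customers:
    registry.record("customer", c.id)

# `customer_id` -> strip the `_id` suffix to find the referenced model.
parent = "customer_id".removesuffix("_id")
orders = [Order(id=i, customer_id=registry.pick(parent)) for i in range(1, 101)]
```

In this design the parent batch has to exist before the child batch, which matches the order of the two `create_batch` calls above.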

Provider-agnostic

export GROQ_API_KEY=gsk_...      # Groq (free tier — 14,400 req/day)
export ANTHROPIC_API_KEY=sk-...  # Claude
export OPENAI_API_KEY=sk-...     # GPT
# No key? Falls back to Faker-only mode. CI never breaks.
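
The fallback behaviour amounts to "use the first key you find, otherwise skip AI." A minimal sketch of that lookup follows. The environment variable names come from the list above; `pick_provider` and the priority order are assumptions made for illustration.

```python
import os
from typing import Mapping, Optional

# Env var names are from the post; the priority order is an assumption.
PROVIDER_KEYS = (
    ("groq", "GROQ_API_KEY"),
    ("anthropic", "ANTHROPIC_API_KEY"),
    ("openai", "OPENAI_API_KEY"),
)


def pick_provider(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the first provider whose key is set, or None for Faker-only mode."""
    for provider, var in PROVIDER_KEYS:
        if env.get(var):
            return provider
    return None


pick_provider({})                             # → None (Faker-only)
pick_provider({"OPENAI_API_KEY": "sk-test"})  # → "openai"
```

Taking the environment as a parameter keeps the function easy to test without touching real credentials.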

What it's not

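This isn't a replacement for Faker. It uses Faker internally.
It's not a replacement for Hypothesis either. Hypothesis does property-based testing,
searching for inputs that break your code; FixtureForge produces realistic-looking records.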

It's the layer between "I need realistic data" and
"I need it to feel like production."

How to get it

pip install fixtureforge
pip install "fixtureforge[groq]"  # + AI support via Groq free tier

Docs: yaniv2809.github.io/fixtureforge
GitHub: github.com/Yaniv2809/fixtureforge

I'd genuinely like to hear: what's your current approach to test data?
factory_boy? raw Faker? just hardcoded dicts?
And is there a use case this doesn't cover that you'd want it to?