Every Python project I've worked on has the same problem in the test suite:
user = User(
    name="Test User",
    email="test@test.com",
    age=25,
    bio="Lorem ipsum dolor sit amet",
)
It's not realistic. It doesn't catch edge cases. And when you need 200 of them,
nobody writes them — you just copy-paste the same record and pretend it's a dataset.
I got tired of this and built FixtureForge.
The idea
Define a Pydantic model. Get realistic data.
from fixtureforge import Forge
from pydantic import BaseModel
class User(BaseModel):
    id: int
    name: str
    email: str
    bio: str
forge = Forge()
users = forge.create_batch(User, count=50, context="SaaS platform users")
FixtureForge routes each field to the right generator:
| Field | Generator | Cost |
|---|---|---|
| id | Sequential counter | Free |
| name, email | Faker | Free |
| bio | LLM (batched) | 1 API call for all 50 |
Only semantic fields — descriptions, bios, reviews, messages — hit the AI.
Everything else is free.
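To make the routing idea concrete, here is a minimal sketch of how a field could be sorted into one of the three buckets by name and type. This is not FixtureForge's actual code: `route_field` and `SEMANTIC_HINTS` are hypothetical names, and the matching rules are assumptions chosen to reproduce the table above.

```python
# Hypothetical hint list: field names that suggest free-form, semantic text.
SEMANTIC_HINTS = ("bio", "description", "review", "message")


def route_field(name: str, annotation: type) -> str:
    """Return the kind of generator a field would be sent to (assumed rules)."""
    # Primary and foreign keys are structural: a counter is enough.
    if name == "id" or name.endswith("_id"):
        return "structural"
    # String fields whose name looks semantic are the only ones sent to the LLM.
    if annotation is str and any(hint in name for hint in SEMANTIC_HINTS):
        return "ai"
    # Everything else can be produced locally at no cost.
    return "faker"


fields = {"id": int, "name": str, "email": str, "bio": str}
routes = {name: route_field(name, tp) for name, tp in fields.items()}
print(routes)
```

A real router would likely look at more than the field name, but the principle is the same: decide per field, and only pay for the ones that need meaning.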
CI mode: zero AI, fully deterministic
forge = Forge(use_ai=False, seed=42)
users = forge.create_batch(User, count=100)
# Same output on every machine, every run, forever
The seed= parameter controls both Faker and random generation at the instance level —
two Forge(seed=42) instances produce identical data without interfering with each other.
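Instance-level seeding is easy to illustrate with the standard library alone. The sketch below is an assumption about the general technique, not FixtureForge internals: `SeededGenerator` is a hypothetical class that owns a private `random.Random`, so touching the global RNG cannot change its output.

```python
import random


class SeededGenerator:
    """Hypothetical stand-in for a generator that owns its own RNG."""

    def __init__(self, seed: int) -> None:
        # A private Random instance: the module-level random state is never used.
        self._rng = random.Random(seed)

    def next_age(self) -> int:
        return self._rng.randint(18, 90)


a = SeededGenerator(seed=42)
b = SeededGenerator(seed=42)

ages_a = [a.next_age() for _ in range(5)]

# Disturb the global RNG between the two runs.
random.seed(0)
random.random()

ages_b = [b.next_age() for _ in range(5)]

# Same seed, separate instances, identical sequences.
assert ages_a == ages_b
```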
pytest plugin
# conftest.py
from fixtureforge import forge_fixture
from myapp.models import User, Order
forge_fixture(User, count=50)
forge_fixture(Order, count=200)
# test_users.py
def test_all_users_have_emails(users):
    assert all(u.email for u in users)

def test_order_count(orders):
    assert len(orders) == 200
No boilerplate. Fixtures are named automatically from the model
(User → users, OrderItem → order_items).
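The naming rule can be sketched in a few lines. `fixture_name` is a hypothetical helper, not the plugin's real function, and the pluralization is deliberately naive (it just appends an "s").

```python
import re


def fixture_name(model_name: str) -> str:
    """Convert a CamelCase model name to a plural snake_case fixture name."""
    # Insert an underscore before every capital letter except the first one.
    snake = re.sub(r"(?<!^)(?=[A-Z])", "_", model_name).lower()
    # Naive plural: fine for User and OrderItem, wrong for names like Category.
    return snake if snake.endswith("s") else snake + "s"


print(fixture_name("User"))
print(fixture_name("OrderItem"))
```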
Verbose mode
Not sure where a value came from? Turn on verbose:
forge = Forge(use_ai=False, seed=42, verbose=True)
user = forge.create(User)
# [structural] id = 1
# [faker] name = 'Allison Hill'
# [faker] email = 'donaldgarcia@example.net'
# [ai] bio = 'Passionate developer with 8 years of experience...'
Foreign keys
customers = forge.create_batch(Customer, count=10)
orders = forge.create_batch(Order, count=100)
# order.customer_id always points to a real customer.id — automatically
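One simple way to get that guarantee is to draw every foreign key from the ids that already exist. The sketch below uses plain dataclasses and a seeded `random.Random`; it shows the general technique under that assumption and does not claim to be how FixtureForge resolves relationships.

```python
import random
from dataclasses import dataclass


@dataclass
class Customer:
    id: int


@dataclass
class Order:
    id: int
    customer_id: int


rng = random.Random(42)

# Parents first, so their ids exist before any child is generated.
customers = [Customer(id=i) for i in range(1, 11)]
customer_ids = [c.id for c in customers]

# Each order picks its customer_id from the pool of real ids.
orders = [Order(id=i, customer_id=rng.choice(customer_ids)) for i in range(1, 101)]

valid_ids = set(customer_ids)
assert all(o.customer_id in valid_ids for o in orders)
```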
Provider-agnostic
export GROQ_API_KEY=gsk_... # Groq (free tier — 14,400 req/day)
export ANTHROPIC_API_KEY=sk-... # Claude
export OPENAI_API_KEY=sk-... # GPT
# No key? Falls back to Faker-only mode. CI never breaks.
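The fallback behaviour amounts to checking environment variables and returning nothing when none is set. A minimal sketch: the variable names come from the exports above, while `detect_provider` and the order of preference are assumptions made for illustration.

```python
import os
from typing import Mapping, Optional

# Env var names are from the post; the preference order is an assumption.
PROVIDER_ENV_VARS = (
    ("groq", "GROQ_API_KEY"),
    ("anthropic", "ANTHROPIC_API_KEY"),
    ("openai", "OPENAI_API_KEY"),
)


def detect_provider(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the first provider with a non-empty key, or None if there is none."""
    for provider, var in PROVIDER_ENV_VARS:
        if env.get(var):
            return provider
    # No key found: the caller would fall back to Faker-only generation.
    return None


print(detect_provider({"OPENAI_API_KEY": "sk-example"}))
print(detect_provider({}))
```

Returning `None` instead of raising is what keeps a CI job without secrets from failing.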
What it's not
This isn't a replacement for faker — it uses faker internally.
It's not a replacement for hypothesis — different problem.
It's the layer between "I need realistic data" and
"I need it to feel like production."
How to get it
pip install fixtureforge
pip install "fixtureforge[groq]" # + AI support via Groq free tier
Docs: yaniv2809.github.io/fixtureforge
GitHub: github.com/Yaniv2809/fixtureforge
I'd genuinely like to hear: what's your current approach to test data?
factory_boy? raw Faker? just hardcoded dicts?
And is there a use case this doesn't cover that you'd want it to?