Originally published at recca0120.github.io
Preparing fake data for tests is its own kind of overhead.
To test a user system, you need to build a User object — fill in id, email, name, created_at. If you're testing orders, you need an Order, which contains a list of OrderItem. Hand-crafting all of that often takes longer than writing the actual test logic.
polyfactory solves this: give it a class with type hints, and it generates conforming fake data automatically.
Install
pip install polyfactory
With Pydantic:
pip install polyfactory pydantic
Basic Usage: dataclass
from dataclasses import dataclass
from polyfactory.factories import DataclassFactory
@dataclass
class User:
id: int
name: str
email: str
is_active: bool
class UserFactory(DataclassFactory):
__model__ = User
user = UserFactory.build()
# User(id=42, name='vDjhqXt', email='KpLmn@example.com', is_active=True)
Every build() call produces different values — id is a random int, email is a random string, is_active is random True/False.
Pydantic v2
from pydantic import BaseModel
from polyfactory.factories.pydantic_factory import ModelFactory
class Order(BaseModel):
id: int
amount: float
status: str
items: list[str]
class OrderFactory(ModelFactory):
__model__ = Order
order = OrderFactory.build()
# Order(id=7, amount=3.14, status='aBcD', items=['x', 'y'])
Pydantic validators still run — polyfactory won't generate data that fails validation.
Overriding Specific Fields
Most of the time you only care about a few fields — just pass them in:
# only set status, everything else is auto-filled
order = OrderFactory.build(status="paid")
# or set defaults in the factory class
class PaidOrderFactory(OrderFactory):
status = "paid"
amount = 100.0
This is the pattern I use most: base factory handles the bulk of the data, override the fields that matter for the specific test. No need to hardcode the whole object every time.
Batch Generation
users = UserFactory.batch(10)
# gives you 10 different User objects
Useful when testing a list or pagination logic.
Combining With pytest Fixtures
# conftest.py
import pytest
from polyfactory.factories import DataclassFactory
@pytest.fixture
def user():
return UserFactory.build()
@pytest.fixture
def active_user():
return UserFactory.build(is_active=True)
@pytest.fixture
def users():
return UserFactory.batch(5)
Fixtures return factory-generated objects. The test doesn't need to know what fields User has — only the fields actually relevant to that test need to be specified.
def test_deactivate_user(active_user):
deactivate(active_user)
assert not active_user.is_active
def test_list_users(users):
result = get_user_list(users)
assert len(result) == 5
TypedDict and attrs
from typing import TypedDict
from polyfactory.factories import TypedDictFactory
class Config(TypedDict):
host: str
port: int
debug: bool
class ConfigFactory(TypedDictFactory):
__model__ = Config
config = ConfigFactory.build()
# {'host': 'abc', 'port': 8080, 'debug': False}
Nested Objects
polyfactory handles nested types recursively:
@dataclass
class Address:
city: str
country: str
@dataclass
class User:
name: str
address: Address # nested, auto-generated
class UserFactory(DataclassFactory):
__model__ = User
user = UserFactory.build()
# user.address is also auto-generated
No need to build AddressFactory separately and pass it in.
polyfactory vs Faker
Faker also generates fake data, but you tell it what you want:
from faker import Faker
fake = Faker()
email = fake.email()
name = fake.name()
polyfactory is "give me a class, I'll fill it":
user = UserFactory.build()
They're not competing — they serve different needs:
- Faker: You care what each field looks like (realistic emails, real city names, etc.)
- polyfactory: You just need the types to be correct, the specific values don't matter
Test logic usually doesn't care if an email looks real — it just needs to be a string. For that case, polyfactory is simpler.
Summary
polyfactory's core idea is one thing: types are the specification, factories generate data according to types.
Combined with pytest fixtures, getting test data down to one or two lines is straightforward — you can focus on what the test is actually checking.
Top comments (0)