DEV Community

Recca Tsai

Originally published at recca0120.github.io

polyfactory: Stop Hand-Writing Test Fixtures, Let Type Hints Do It


Preparing fake data for tests is its own kind of overhead.

To test a user system, you need to build a User object and fill in id, email, name, and created_at. If you're testing orders, you need an Order, which contains a list of OrderItem objects. Hand-crafting all of that often takes longer than writing the actual test logic.

polyfactory solves this: give it a class with type hints, and it generates conforming fake data automatically.

Install

pip install polyfactory

With Pydantic:

pip install polyfactory pydantic

Basic Usage: dataclass

from dataclasses import dataclass
from polyfactory.factories import DataclassFactory

@dataclass
class User:
    id: int
    name: str
    email: str
    is_active: bool

class UserFactory(DataclassFactory):
    __model__ = User

user = UserFactory.build()
# e.g. User(id=4821, name='vDjhqXtLmPq', email='KpLmnWxYzAb', is_active=True)

Every build() call produces different values — id is a random int, email is a random string, is_active is random True/False.

Pydantic v2

from pydantic import BaseModel
from polyfactory.factories.pydantic_factory import ModelFactory

class Order(BaseModel):
    id: int
    amount: float
    status: str
    items: list[str]

class OrderFactory(ModelFactory):
    __model__ = Order

order = OrderFactory.build()
# Order(id=7, amount=3.14, status='aBcD', items=['x', 'y'])

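build() creates the instance through the model's normal constructor, so Pydantic validation still runs. polyfactory reads declared constraints such as Field(gt=0) or max_length and generates values that satisfy them. It can't see inside custom validators, though. If a field_validator rejects the random value, build() raises a ValidationError, and you need to supply that field yourself.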

Overriding Specific Fields

Most of the time you only care about a few fields — just pass them in:

# only set status, everything else is auto-filled
order = OrderFactory.build(status="paid")

# or fix values in the factory class (build() kwargs still override them)
class PaidOrderFactory(OrderFactory):
    status = "paid"
    amount = 100.0

This is the pattern I use most: the base factory fills in the bulk of the data, and each test overrides only the fields that matter to it. No need to hardcode the whole object every time.

Batch Generation

users = UserFactory.batch(10)
# gives you 10 different User objects

Useful when testing list or pagination logic.

Combining With pytest Fixtures

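# conftest.py
import pytest

# Assumes the UserFactory from the dataclass example lives in tests/factories.py;
# adjust this import to match your project layout.
from tests.factories import UserFactory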

@pytest.fixture
def user():
    return UserFactory.build()

@pytest.fixture
def active_user():
    return UserFactory.build(is_active=True)

@pytest.fixture
def users():
    return UserFactory.batch(5)

Fixtures return factory-generated objects. The test doesn't need to know what fields User has — only the fields actually relevant to that test need to be specified.

def test_deactivate_user(active_user):
    deactivate(active_user)
    assert not active_user.is_active

def test_list_users(users):
    result = get_user_list(users)
    assert len(result) == 5

TypedDict and attrs

from typing import TypedDict
from polyfactory.factories import TypedDictFactory

class Config(TypedDict):
    host: str
    port: int
    debug: bool

class ConfigFactory(TypedDictFactory):
    __model__ = Config

config = ConfigFactory.build()
# {'host': 'abc', 'port': 8080, 'debug': False}

Nested Objects

polyfactory handles nested types recursively:

@dataclass
class Address:
    city: str
    country: str

@dataclass
class User:
    name: str
    address: Address  # nested, auto-generated

class UserFactory(DataclassFactory):
    __model__ = User

user = UserFactory.build()
# user.address is also auto-generated

No need to build AddressFactory separately and pass it in.

polyfactory vs Faker

Faker also generates fake data, but you tell it what you want:

from faker import Faker
fake = Faker()
email = fake.email()
name = fake.name()

polyfactory is "give me a class, I'll fill it":

user = UserFactory.build()

They're not competing — they serve different needs:

  • Faker: You care what each field looks like (realistic emails, real city names, etc.)
  • polyfactory: You just need the types to be correct, the specific values don't matter

Test logic usually doesn't care if an email looks real — it just needs to be a string. For that case, polyfactory is simpler.

Summary

polyfactory's core idea comes down to one thing: the types are the specification, and the factory generates data that matches them.

Combined with pytest fixtures, getting test data down to one or two lines is straightforward — you can focus on what the test is actually checking.
