Three years ago I wrote 47 edge case tests for a URL normalizer. Every test followed the same pattern: I thought of a weird input, wrote the test, confirmed the code handled it.
Hypothesis wrote 10,000 edge cases for the same function in 30 seconds and found a bug I'd never have thought to write a test for.
The bug: the normalizer silently returned an empty string when given a URL consisting entirely of whitespace. My 47 tests all used inputs that looked like URLs. Hypothesis found the whitespace case because it's not trying to think of inputs — it's trying to break your function by generating inputs at the boundary of your constraints.
That's the core shift. You stop describing examples and start describing properties.
What property-based testing actually is
A conventional test says: "given this specific input, I expect this specific output."
A property-based test says: "for any input matching this description, this invariant should always hold."
With Hypothesis, you write that invariant, and Hypothesis generates hundreds or thousands of inputs to try to falsify it. If it finds a failing case, it shrinks the input to the minimal example that still fails — so you get a small, debuggable failing case, not just "it broke on a 5,000-character string."
from hypothesis import given, strategies as st

# Conventional test
def test_round_trip_specific():
    assert decode(encode("hello world")) == "hello world"

# Property-based test
@given(st.text())
def test_round_trip_any_string(s):
    assert decode(encode(s)) == s
The second test runs hundreds of times with different inputs. If encode/decode has a bug with unicode characters, empty strings, null bytes, or strings that are exactly 256 characters long, Hypothesis will find it.
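To make the round-trip property concrete and runnable, here is the same test written against Python's built-in base64 codec, standing in for the article's hypothetical encode/decode pair:

```python
import base64

from hypothesis import given, strategies as st

def encode(data: bytes) -> str:
    # Encode arbitrary bytes as an ASCII base64 string
    return base64.b64encode(data).decode("ascii")

def decode(text: str) -> bytes:
    # Invert encode(): base64 string back to the original bytes
    return base64.b64decode(text.encode("ascii"))

@given(st.binary())
def test_b64_round_trip(data):
    # The round-trip invariant: decoding what we encoded gives back the input
    assert decode(encode(data)) == data
```

Calling `test_b64_round_trip()` runs the property across Hypothesis's generated byte strings, empty input included.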
Installing Hypothesis (no account required)
pip install hypothesis
That's it. No API key, no signup, no service to configure. Hypothesis is a pure Python library — it generates inputs locally, shrinks failures locally, stores its example database locally. The full power of property-based testing with zero external dependencies.
# If you're using pytest (which you should be):
pip install hypothesis pytest
The @given decorator integrates directly with pytest. Run your test suite the same way you always do — pytest picks up Hypothesis tests automatically.
Writing your first property
The hardest part of property-based testing is the mental shift from "what specific input should I test?" to "what should always be true?"
Three properties that apply to almost every function:
1. Round-trip invariants — encode/decode, serialize/deserialize, compress/decompress
@given(st.text())
def test_json_round_trip(data):
    # Any string that survives JSON serialization should survive a round trip
    import json
    try:
        serialized = json.dumps({"value": data})
        result = json.loads(serialized)
        assert result["value"] == data
    except (ValueError, TypeError):
        pass  # Some strings can't be JSON-serialized; that's expected
2. Idempotency — applying an operation twice produces the same result as applying it once
@given(st.text())
def test_normalize_idempotent(url):
    once = normalize_url(url)
    twice = normalize_url(once)
    assert once == twice
3. Size bounds — a filter never produces more items than it received, and a sort never changes the length of its input at all
@given(st.lists(st.integers()))
def test_filter_shrinks_list(items):
    result = [x for x in items if x > 0]
    assert len(result) <= len(items)
These are starting points. As you get comfortable, you'll start finding properties specific to your domain — and those domain-specific properties are where property-based testing earns its keep.
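As one example of a domain-specific property, sorting has a two-part specification: the output must contain exactly the input's elements, and it must be in order. A sketch, using Python's built-in `sorted` as the function under test:

```python
from collections import Counter

from hypothesis import given, strategies as st

@given(st.lists(st.integers()))
def test_sorted_is_ordered_permutation(items):
    result = sorted(items)
    # The output is a permutation of the input (same elements, same counts)
    assert Counter(result) == Counter(items)
    # The output is in non-decreasing order
    assert all(a <= b for a, b in zip(result, result[1:]))
```

Neither half alone is enough — `return items` passes the permutation check, and `return []` passes the ordering check — which is exactly why stating the full property is valuable.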
Hypothesis strategies: describing your input space
The strategies module (st) is how you describe what kind of inputs Hypothesis should generate. Some useful ones:
from hypothesis import strategies as st
# Basic types
st.integers() # any integer
st.integers(min_value=0) # non-negative integers
st.floats(allow_nan=False) # floats, excluding NaN
st.text() # any unicode text
st.text(alphabet=st.characters(whitelist_categories=('Lu', 'Ll', 'Nd'))) # alphanumeric
st.binary() # bytes
st.booleans()
# Collections
st.lists(st.integers()) # list of integers
st.lists(st.text(), min_size=1, max_size=50) # bounded list
st.dictionaries(st.text(), st.integers()) # dict with text keys, int values
st.tuples(st.integers(), st.text()) # fixed-structure tuple
# Composing strategies
st.one_of(st.text(), st.none()) # text or None
st.builds(MyDataClass, name=st.text(), age=st.integers(min_value=0, max_value=150))
The builds strategy is particularly useful — it generates instances of your data classes or Pydantic models by generating each field separately.
from hypothesis import given, strategies as st
from pydantic import BaseModel

class User(BaseModel):
    name: str
    age: int

@given(st.builds(User, name=st.text(min_size=1), age=st.integers(min_value=0, max_value=150)))
def test_user_serialization_round_trip(user):
    assert User.model_validate_json(user.model_dump_json()) == user
The shrinking superpower
When Hypothesis finds a failing input, it doesn't report the raw generated input. It shrinks it — tries progressively simpler inputs until it finds the minimal case that still triggers the failure.
This is the feature that makes property-based testing practical. Without shrinking, a failure report might be "failed on a 3,000-character string with these properties." With shrinking, it's "failed on the string '\x00'."
Falsifying example: test_normalize_idempotent(
    url='',  # <-- Hypothesis shrunk this to the minimal failing case
)
You don't need to do anything to enable shrinking — it's automatic for all built-in strategies. Custom strategies built with st.composite and standard combinators also shrink automatically.
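For instance, a custom strategy built with st.composite shrinks just like the built-ins. The `url_paths` strategy below is a made-up example that assembles slash-separated paths from generated segments; on failure, Hypothesis shrinks both the list of segments and each segment's text:

```python
from hypothesis import given, strategies as st

@st.composite
def url_paths(draw):
    # Build a path like "/a/b/c" from 1-5 generated segments
    segments = draw(st.lists(st.text(alphabet="abc", min_size=1), min_size=1, max_size=5))
    return "/" + "/".join(segments)

@given(url_paths())
def test_path_starts_with_slash(path):
    # Every path produced by the strategy should begin with a slash
    assert path.startswith("/")
```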
The example database
Hypothesis stores failing examples in a local .hypothesis/ directory. The next time you run the test suite, Hypothesis tries those failing cases first, before generating new ones. This means:
- Once you've found a bug, the regression test for that bug is implicit — Hypothesis will always try the minimal failing input on future runs.
- After you fix the bug and the test passes, Hypothesis removes the example from the database.
Commit .hypothesis/ to version control to share failing examples across the team.
# .gitignore — do NOT add .hypothesis to this
# .hypothesis/ <-- leave this line out
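If you go this route, you can also point Hypothesis at an explicit database location through a settings profile. The profile name and path below are illustrative, not required:

```python
from hypothesis import settings
from hypothesis.database import DirectoryBasedExampleDatabase

# Register a profile that stores examples at an explicit, repo-relative path
# ("shared-db" and the path are assumptions for illustration)
settings.register_profile(
    "shared-db",
    database=DirectoryBasedExampleDatabase(".hypothesis/examples"),
)
settings.load_profile("shared-db")
```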
Integrating with your existing pytest suite
Hypothesis tests look like pytest tests with an extra decorator. The integration is seamless:
# test_normalizer.py
from hypothesis import given, strategies as st
from myapp.normalizer import normalize_url

# Conventional test — still useful for documenting expected behavior
def test_normalize_removes_trailing_slash():
    assert normalize_url("https://example.com/") == "https://example.com"

# Property test — finds the bugs the conventional tests miss
@given(st.text())
def test_normalize_idempotent(url):
    """Normalizing twice should produce the same result as normalizing once."""
    assert normalize_url(normalize_url(url)) == normalize_url(url)

@given(st.text(min_size=1))
def test_normalize_never_empty_on_nonempty_input(url):
    """A non-empty input should never produce an empty normalized URL."""
    result = normalize_url(url)
    # It's okay to return a default URL, but not an empty string
    assert result != ""
Run with pytest as usual. Hypothesis tests are automatically collected and run.
Settings: controlling how hard Hypothesis tries
By default Hypothesis runs each test with up to 100 examples (it may stop early if it exhausts the input space). You can tune this:
from hypothesis import given, settings, strategies as st

@settings(max_examples=1000)  # Try 1,000 examples instead of 100
@given(st.text())
def test_important_property(s):
    ...

@settings(max_examples=50)  # Faster, for properties you're less worried about
@given(st.lists(st.integers()))
def test_basic_property(items):
    ...
In CI, you might want to run with max_examples=500 for important functions and the default elsewhere. The settings can also be configured globally via environment variable or profile.
# conftest.py — set a default for all Hypothesis tests in this suite
import os

from hypothesis import settings

settings.register_profile("ci", max_examples=500)
settings.register_profile("local", max_examples=100)
settings.load_profile(os.getenv("HYPOTHESIS_PROFILE", "local"))
Then in CI: HYPOTHESIS_PROFILE=ci pytest
Where property-based testing earns its keep
Property-based testing is not a replacement for conventional tests. The combination is stronger than either alone:
Use conventional tests for:
- Documenting expected behavior with specific examples
- Testing known edge cases you've already thought of
- Regression tests for specific bugs (the bug had a specific input — document it)
Use Hypothesis for:
- Functions that transform or process data (parsers, normalizers, serializers)
- Functions with invariants that should hold across all inputs (sort stability, round-trip correctness)
- Functions at system boundaries where the input space is large or unpredictable
- Finding the bugs you didn't know to look for
The ratio I've landed on: write the conventional tests first to document behavior, then add one or two @given tests per function that exercise an invariant. The Hypothesis tests don't replace the conventional tests — they find the failures the conventional tests can't anticipate.
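One way to combine the two styles in a single test is Hypothesis's @example decorator, which pins a specific known input onto a property test so it runs on every invocation alongside the generated inputs — a regression case and a property in one. A sketch with a stand-in normalizer (not the article's actual implementation):

```python
from hypothesis import example, given, strategies as st

def normalize_url(url: str) -> str:
    # Stand-in normalizer for illustration: strip whitespace, drop a trailing slash
    url = url.strip()
    return url[:-1] if url.endswith("/") else url

@given(st.text(min_size=1))
@example("   ")  # pin the known whitespace-only regression input
def test_normalize_handles_whitespace(url):
    # The pinned example is tried on every run, before any generated inputs
    result = normalize_url(url)
    assert isinstance(result, str)
```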
The three-line addition that catches the most bugs
If you do nothing else with Hypothesis, add this pattern to your parsers and data-processing functions:
@given(st.text())
def test_does_not_crash(s):
    """The function should handle any input without raising an unexpected exception."""
    try:
        result = my_function(s)
    except ValueError:
        return  # ValueError is expected for invalid inputs
    except Exception as e:
        # Anything else is a bug
        raise AssertionError(f"Unexpected exception for input {s!r}: {e}") from e
    # If it returns normally, the result should be a valid type
    assert isinstance(result, (str, type(None)))
This doesn't assert correctness — it just asserts that the function doesn't blow up with an unhandled exception on arbitrary input. It catches a surprising number of bugs in functions that were "only ever called with valid data" — right up until they weren't.
What you get: the actual coverage difference
My 47-test URL normalizer test suite covered 47 inputs. Running it with Hypothesis at the default 100 examples means roughly 150 inputs total: my 47 plus 100 generated ones.
That's not the point. The 100 generated inputs aren't random — Hypothesis specifically targets:
- Empty strings
- Strings with only whitespace
- Very long strings
- Strings with unicode characters outside ASCII
- Strings with null bytes
- Strings that look like numbers
- Strings at exact power-of-two lengths
It targets the boundary conditions that matter for a text-processing function. My 47 tests didn't include any of those, because I was writing example inputs, not adversarial ones.
The whitespace bug I mentioned at the start? Hypothesis found it in the first run. It had been there for eight months.
Getting started (30 minutes)
Pick one function in your codebase that:
- Takes a string or collection as input
- Returns a transformed version or True/False
- Has an invariant you can state in one sentence ("it should always return a non-empty string", "the output should be a subset of the input", "applying it twice should equal applying it once")
Write the Hypothesis test for that invariant. Run it. See what Hypothesis finds.
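As a worked example of the "output should be a subset of the input" invariant, here is a sketch against a hypothetical `dedupe` helper (both the function and its name are invented for illustration):

```python
from hypothesis import given, strategies as st

def dedupe(items):
    # Hypothetical function under test: remove duplicates, keep first occurrences
    seen = set()
    return [x for x in items if not (x in seen or seen.add(x))]

@given(st.lists(st.integers()))
def test_dedupe_output_subset_of_input(items):
    result = dedupe(items)
    assert set(result) <= set(items)   # every output element came from the input
    assert len(result) <= len(items)   # never longer than the input
```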
The learning curve is the mental shift from example thinking to property thinking. Once you've done it once, you start seeing properties everywhere.
pip install hypothesis
No account. No API key. No service to configure. Local, fast, and it catches the bugs you didn't know to look for.
Part 1 of this series: LocalStack Now Requires an Account — Here's How to Test AWS in Python Without One
Part 2 of this series: pytest fixtures that actually scale — coming April 7
The Automation Cookbook ($39) includes a companion section on test automation patterns — pytest fixtures, mocking strategies, and CI pipeline setup. Available on Gumroad.