Mike Lane

Posted on Jan 28

Your Tests Pass. But Would They Catch This Bug?

#python #testing #pytest #opensource

You have 90% code coverage, green CI, and you ship. A user reports that >= should have been >. Your tests executed that line but never verified the boundary mattered.

Code coverage counts executed lines. Mutation testing injects small bugs and checks whether your tests detect them. If tests still pass after changing >= to >, you found a gap.
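
As a minimal sketch of that gap, the snippet below pairs a comparison with a hand-written mutant. The function `can_withdraw` and both test helpers are hypothetical names made up for illustration; they are not taken from any real codebase or from pytest-gremlins.

```python
def can_withdraw(balance: int, amount: int) -> bool:
    """Original code: withdrawing the full balance is allowed."""
    return balance >= amount


def can_withdraw_mutant(balance: int, amount: int) -> bool:
    """Mutant: >= changed to >, so withdrawing the full balance is refused."""
    return balance > amount


def weak_test(fn) -> bool:
    """Executes the comparison, but only with values far from the boundary."""
    return fn(100, 50) is True and fn(10, 50) is False


def boundary_test(fn) -> bool:
    """Adds the one case that separates >= from >: balance == amount."""
    return weak_test(fn) and fn(50, 50) is True


if __name__ == "__main__":
    for name, fn in [("original", can_withdraw), ("mutant", can_withdraw_mutant)]:
        print(name, "weak:", weak_test(fn), "boundary:", boundary_test(fn))
```

The weak test covers the line in both versions and passes in both, so the mutant survives. Only the boundary test fails against the mutant.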

Why Mutation Testing Has Been Impractical

Traditional tools (mutmut, cosmic-ray) rewrite source files, reload modules, and run the full test suite per mutation. A codebase with 100 mutations and a 10-second test suite needs 100 × 10 s = 1,000 s, roughly 17 minutes, before counting any rewrite and reload overhead. That runtime kills feedback loops.

pytest-gremlins Architecture

pytest-gremlins achieves up to a 13.8x speedup over mutmut (parallel mode with a warm cache, per the benchmark below) through three mechanisms:

Mutation Switching: All mutations are embedded during a single instrumentation pass. Switching between mutations requires only an environment variable change, eliminating per-mutation file I/O and module reloads.
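
The idea can be sketched by hand. This is not pytest-gremlins' actual instrumentation output: the environment variable name `ACTIVE_GREMLIN`, the gremlin IDs, and the `is_adult` function are all assumptions chosen for illustration.

```python
import os

# Hypothetical variable name; the real plugin may use a different one.
ACTIVE_ENV = "ACTIVE_GREMLIN"


def is_adult(age: int) -> bool:
    """Instrumented form of `return age >= 18` with two mutants embedded."""
    gremlin = os.environ.get(ACTIVE_ENV, "")
    if gremlin == "g1":  # mutant: >= becomes >
        return age > 18
    if gremlin == "g2":  # mutant: >= becomes <
        return age < 18
    return age >= 18  # original behaviour when no gremlin is active


def run_with_gremlin(gremlin_id: str, age: int) -> bool:
    """Activate one mutant, call the code, then restore the environment."""
    previous = os.environ.get(ACTIVE_ENV)
    os.environ[ACTIVE_ENV] = gremlin_id
    try:
        return is_adult(age)
    finally:
        if previous is None:
            os.environ.pop(ACTIVE_ENV, None)
        else:
            os.environ[ACTIVE_ENV] = previous
```

Because every variant already lives in the loaded module, selecting a mutant is a dictionary write rather than a file rewrite plus reload.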

Coverage-Guided Test Selection: The plugin tracks which tests cover each line. When testing a mutation on line 42, it runs only the 3 tests that touch line 42 instead of all 200 tests.

Incremental Caching: Results are keyed by content hash of source and test files. Unchanged code skips mutation testing entirely on subsequent runs.
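
One way such a cache could be keyed, as a sketch only. SHA-256, the in-memory dictionary, and the `ResultCache` class are assumptions for illustration; the source does not say which hash or storage pytest-gremlins uses.

```python
import hashlib
from typing import Callable


def cache_key(source: bytes, tests: bytes) -> str:
    """Hash source and test content together into one key."""
    digest = hashlib.sha256()
    for blob in (source, tests):
        # Length prefix keeps the boundary between the two blobs unambiguous.
        digest.update(len(blob).to_bytes(8, "big"))
        digest.update(blob)
    return digest.hexdigest()


class ResultCache:
    """In-memory stand-in for a persistent mutation-result cache."""

    def __init__(self) -> None:
        self._results: dict[str, str] = {}

    def get_or_run(self, source: bytes, tests: bytes, run: Callable[[], str]) -> str:
        key = cache_key(source, tests)
        if key not in self._results:
            self._results[key] = run()  # only runs when content changed
        return self._results[key]
```

Editing either the source or its tests changes the key, so stale results are never reused; untouched files hit the cache and skip the run.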

Benchmark: pytest-gremlins vs mutmut

Measured on Python 3.12 in Docker:

Configuration	Time	vs. mutmut
mutmut	14.90s	baseline
pytest-gremlins (sequential)	17.79s	0.84x (1.2x slower)
pytest-gremlins (parallel)	3.99s	3.7x faster
pytest-gremlins (parallel + cache)	1.08s	13.8x faster

Sequential mode is slower than mutmut because pytest-gremlins runs additional mutation operators. Parallel mode, which is safe because mutation switching leaves no shared mutable state between workers, delivers the speedup. Cached runs are near-instant for unchanged code.
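
The "vs. mutmut" column is the mutmut time divided by each configuration's time. A quick check of that arithmetic, using only the timings from the table (the variable names are mine):

```python
MUTMUT_SECONDS = 14.90  # baseline from the table

gremlins_seconds = {
    "sequential": 17.79,
    "parallel": 3.99,
    "parallel + cache": 1.08,
}


def speedup(baseline: float, seconds: float) -> float:
    """Ratio above 1 means faster than the baseline, below 1 means slower."""
    return baseline / seconds


ratios = {name: speedup(MUTMUT_SECONDS, secs) for name, secs in gremlins_seconds.items()}

if __name__ == "__main__":
    for name, ratio in ratios.items():
        print(f"{name}: {ratio:.2f}x")
```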

Installation and Usage

pip install pytest-gremlins
pytest --gremlins --gremlin-parallel --gremlin-cache

Output identifies specific gaps:

================== pytest-gremlins mutation report ==================

Zapped: 142 gremlins (89%)
Survived: 18 gremlins (11%)

Top surviving gremlins:
  src/auth.py:42    >= → >     (boundary not tested)
  src/utils.py:17   + → -      (arithmetic not verified)
  src/api.py:88     True → False (return value unchecked)
=====================================================================

Each survivor lists the file and line number, the mutation applied, and the gap it reveals. Here, line 42 of src/auth.py has a boundary condition that no test verifies.
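
To see how an arithmetic survivor like `+ → -` can slip through, consider this sketch. The report does not show what is on src/utils.py:17, so `apply_adjustment` is a hypothetical stand-in, and the mutant is written out by hand.

```python
def apply_adjustment(total: int, adjustment: int) -> int:
    """Stand-in for the original line: addition."""
    return total + adjustment


def apply_adjustment_mutant(total: int, adjustment: int) -> int:
    """Mutant: + changed to -."""
    return total - adjustment


def weak_check(fn) -> bool:
    """Uses a zero adjustment, where + and - give the same result."""
    return fn(100, 0) == 100


def strong_check(fn) -> bool:
    """Uses a non-zero adjustment, so the operator actually matters."""
    return fn(100, 5) == 105


if __name__ == "__main__":
    print("weak check passes on mutant:", weak_check(apply_adjustment_mutant))
    print("strong check passes on mutant:", strong_check(apply_adjustment_mutant))
```

Killing this kind of survivor usually means picking inputs where the original and mutated operators disagree, not adding more tests that exercise the same line.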

Configuration

Add to pyproject.toml:

[tool.pytest-gremlins]
operators = ["comparison", "arithmetic", "boolean"]
paths = ["src"]
exclude = ["**/migrations/*"]
min_score = 80

Target specific files with --gremlin-targets=src/auth.py.

Try It On Your Code

Run this on your highest-coverage module:

pip install pytest-gremlins
pytest --gremlins --gremlin-parallel --gremlin-targets=src/your_critical_module.py

Survivors show exactly where your tests execute code without verifying that it is correct. Fix one and run again; with caching, the rerun in the benchmark above took about a second.

Links: PyPI | GitHub | Docs
