You have 90% code coverage, green CI, and you ship. A user reports that >= should have been >. Your tests executed that line but never verified the boundary mattered.
Code coverage counts executed lines. Mutation testing injects small bugs and checks whether your tests detect them. If tests still pass after changing >= to >, you found a gap.
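A minimal illustration of that gap, using a hypothetical `can_vote` function that is not taken from any real codebase. Both inputs in the weak test execute the comparison line, so coverage reports it as covered, yet nothing checks the value 18 itself.

```python
def can_vote(age: int) -> bool:
    """Original code: 18-year-olds are allowed to vote."""
    return age >= 18


def can_vote_mutant(age: int) -> bool:
    """The same function after a >= to > mutation."""
    return age > 18


def weak_tests(fn) -> bool:
    """Executes the comparison line but never checks age == 18."""
    return fn(30) is True and fn(5) is False


def boundary_test(fn) -> bool:
    """Checks the exact boundary value."""
    return fn(18) is True


# The weak tests pass for both versions, so the mutant survives.
print(weak_tests(can_vote), weak_tests(can_vote_mutant))
# The boundary test passes only for the original, so it kills the mutant.
print(boundary_test(can_vote), boundary_test(can_vote_mutant))
```

The first line prints `True True` and the second prints `True False`: only the boundary test can tell the two versions apart.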
Why Mutation Testing Has Been Impractical
Traditional tools (mutmut, cosmic-ray) rewrite source files, reload modules, and run the full test suite for each mutation. A codebase with 100 mutations and a 10-second test suite takes about 17 minutes (100 × 10 s = 1,000 s). That runtime kills feedback loops.
pytest-gremlins Architecture
pytest-gremlins achieves up to a 13.8x speedup over mutmut (parallel mode with a warm cache, per the benchmark below) through three mechanisms:
Mutation Switching: All mutations are embedded during a single instrumentation pass. Switching between mutations requires only an environment variable change, eliminating per-mutation file I/O and module reloads.
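The idea can be sketched in a few lines. This is an illustration of the technique only, not the plugin's actual instrumentation: the environment variable name `ACTIVE_GREMLIN`, the ids `g001` and `g002`, and the `is_adult` function are all hypothetical.

```python
import os
from typing import Optional


def active_gremlin() -> Optional[str]:
    """Return the id of the mutation switched on for this run, if any.

    ACTIVE_GREMLIN is a hypothetical variable name used for illustration.
    """
    return os.environ.get("ACTIVE_GREMLIN")


def is_adult(age: int) -> bool:
    """Instrumented form of `return age >= 18`.

    Every mutant of the comparison is embedded once; the environment
    variable picks which one executes.
    """
    gremlin = active_gremlin()
    if gremlin == "g001":  # mutant: >= becomes >
        return age > 18
    if gremlin == "g002":  # mutant: >= becomes <
        return age < 18
    return age >= 18  # original behaviour


os.environ.pop("ACTIVE_GREMLIN", None)
print(is_adult(18))  # original code path

os.environ["ACTIVE_GREMLIN"] = "g001"
print(is_adult(18))  # mutant path, with no file rewrite or module reload
```

Because the selector is read at call time, a worker process can be pointed at a different mutant just by changing its environment.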
Coverage-Guided Test Selection: The plugin tracks which tests cover each line. When testing a mutation on line 42, it runs only the tests that touch line 42 (say, 3 of them) instead of the whole suite (say, 200).
Incremental Caching: Results are keyed by content hash of source and test files. Unchanged code skips mutation testing entirely on subsequent runs.
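One way such a cache could work is sketched below. SHA-256 as the hash function, the in-memory dictionary, and the `ResultCache` class are assumptions for the example, not details confirmed by the plugin.

```python
import hashlib


def cache_key(source_text, test_texts):
    """Hash a source file together with the test files that cover it."""
    digest = hashlib.sha256()
    digest.update(source_text.encode("utf-8"))
    for text in sorted(test_texts):  # sort so the key ignores file ordering
        digest.update(b"\0")  # separator between files
        digest.update(text.encode("utf-8"))
    return digest.hexdigest()


class ResultCache:
    """In-memory stand-in for an on-disk cache of mutation results."""

    def __init__(self):
        self._results = {}
        self.runs = 0  # how many times mutation testing actually ran

    def get_or_run(self, key, run_mutations):
        if key not in self._results:
            self.runs += 1
            self._results[key] = run_mutations()
        return self._results[key]


cache = ResultCache()
source = "def f(x):\n    return x >= 1\n"
tests = ["def test_f():\n    assert f(1)\n"]

cache.get_or_run(cache_key(source, tests), lambda: "zapped")
cache.get_or_run(cache_key(source, tests), lambda: "zapped")  # cache hit
print(cache.runs)

edited = source.replace(">=", ">")
cache.get_or_run(cache_key(edited, tests), lambda: "survived")  # new key
print(cache.runs)
```

Editing either the source or a covering test changes the key, so only the affected work is repeated.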
Benchmark: pytest-gremlins vs mutmut
Measured on Python 3.12 in Docker:
| Configuration | Time | vs. mutmut |
|---|---|---|
| mutmut | 14.90s | baseline |
| pytest-gremlins (sequential) | 17.79s | 0.84x |
| pytest-gremlins (parallel) | 3.99s | 3.7x faster |
| pytest-gremlins (parallel + cache) | 1.08s | 13.8x faster |
Sequential mode is slower than mutmut (0.84x) because pytest-gremlins runs additional mutation operators. Parallel mode, which is safe because mutation switching involves no shared mutable state, delivers the speedup. Cached runs are near-instant for unchanged code.
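The ratios in the table follow directly from the timings: each is the mutmut time divided by the pytest-gremlins time. A quick check, using only the numbers reported above:

```python
MUTMUT_SECONDS = 14.90

timings = {
    "pytest-gremlins (sequential)": 17.79,
    "pytest-gremlins (parallel)": 3.99,
    "pytest-gremlins (parallel + cache)": 1.08,
}


def speedup(baseline_seconds, seconds):
    """Ratio above 1 means faster than the baseline, below 1 means slower."""
    return baseline_seconds / seconds


for name, seconds in timings.items():
    print(f"{name}: {speedup(MUTMUT_SECONDS, seconds):.2f}x")
```

Rounded, these come out to 0.84x, 3.7x, and 13.8x, matching the table.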
Installation and Usage
pip install pytest-gremlins
pytest --gremlins --gremlin-parallel --gremlin-cache
Output identifies specific gaps:
================== pytest-gremlins mutation report ==================
Zapped: 142 gremlins (89%)
Survived: 18 gremlins (11%)
Top surviving gremlins:
src/auth.py:42 >= → > (boundary not tested)
src/utils.py:17 + → - (arithmetic not verified)
src/api.py:88 True → False (return value unchecked)
=====================================================================
Each survivor lists the file and line number, the mutation applied, and the gap it reveals. Here, line 42 of src/auth.py has a boundary condition that no test verifies.
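The `+ → -` survivor is a common case: a test that only uses zero as an operand cannot tell addition from subtraction. A small sketch, with a hypothetical `total_with_fee` function standing in for whatever lives at src/utils.py:17 (the real code is not shown here):

```python
def total_with_fee(amount: int, fee: int) -> int:
    """Hypothetical stand-in for the arithmetic at src/utils.py:17."""
    return amount + fee


def total_with_fee_mutant(amount: int, fee: int) -> int:
    """The same function after a + to - mutation."""
    return amount - fee


def check_zero_fee(fn) -> bool:
    """Weak check: with fee == 0, amount + 0 equals amount - 0."""
    return fn(100, 0) == 100


def check_nonzero_fee(fn) -> bool:
    """Stronger check: a nonzero fee distinguishes + from -."""
    return fn(100, 5) == 105


# The weak check passes for both versions, so the gremlin survives.
print(check_zero_fee(total_with_fee), check_zero_fee(total_with_fee_mutant))
# The stronger check fails on the mutant, so the gremlin is zapped.
print(check_nonzero_fee(total_with_fee), check_nonzero_fee(total_with_fee_mutant))
```

Adding one assertion with a nonzero operand is usually enough to turn this kind of survivor into a zapped gremlin.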
Configuration
Add to pyproject.toml:
[tool.pytest-gremlins]
operators = ["comparison", "arithmetic", "boolean"]
paths = ["src"]
exclude = ["**/migrations/*"]
min_score = 80
Target specific files with --gremlin-targets=src/auth.py.
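For a sense of what `min_score = 80` would mean, the sketch below assumes the score is the percentage of gremlins zapped and that the threshold is a simple lower bound. Both are assumptions about the setting's semantics; the counts are taken from the sample report above.

```python
def mutation_score(zapped: int, survived: int) -> float:
    """Percentage of gremlins that the tests detected."""
    total = zapped + survived
    if total == 0:
        # No gremlins generated; treating this as a full score is an assumption.
        return 100.0
    return 100.0 * zapped / total


MIN_SCORE = 80  # mirrors min_score in the configuration above

score = mutation_score(zapped=142, survived=18)
print(f"{score:.2f}%")
print("meets threshold" if score >= MIN_SCORE else "below threshold")
```

With 142 zapped out of 160, the score is 88.75%, which the report rounds to 89% and which clears a threshold of 80.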
Try It On Your Code
Run this on your highest-coverage module:
pip install pytest-gremlins
pytest --gremlins --gremlin-parallel --gremlin-targets=src/your_critical_module.py
Survivors show exactly where your tests verify execution but not correctness. Fix one and run again; with caching enabled, re-runs in the benchmark above took about a second.