Mike Lane

Your Tests Pass. But Would They Catch This Bug?

You have 90% code coverage, green CI, and you ship. A user reports that >= should have been >. Your tests executed that line but never verified the boundary mattered.

Code coverage counts executed lines. Mutation testing injects small bugs and checks whether your tests detect them. If tests still pass after changing >= to >, you found a gap.
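To make the gap concrete, here is a minimal sketch. The function `is_adult` and both checks are hypothetical, invented for illustration: the weak check gives the comparison line full coverage yet cannot tell `>=` from `>`, while the boundary check can.

```python
def is_adult(age: int) -> bool:
    """Original code: ages 18 and over count as adult."""
    return age >= 18


def is_adult_mutant(age: int) -> bool:
    """The same function after the mutation >= -> >."""
    return age > 18


def weak_test(fn) -> bool:
    """Executes the comparison, but only with values far from the boundary."""
    return fn(30) is True and fn(5) is False


def boundary_test(fn) -> bool:
    """Adds a check at the boundary value itself."""
    return weak_test(fn) and fn(18) is True


# The weak test passes for both versions, so the mutant survives.
print(weak_test(is_adult), weak_test(is_adult_mutant))
# The boundary test passes for the original and fails for the mutant.
print(boundary_test(is_adult), boundary_test(is_adult_mutant))
```

A mutation testing tool automates this comparison: it generates the mutant, runs the tests, and reports the mutation when nothing fails.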

Why Mutation Testing Has Been Impractical

Traditional tools (mutmut, cosmic-ray) rewrite source files, reload modules, and run the full test suite per mutation. A codebase with 100 mutations and a 10-second test suite takes about 17 minutes (100 × 10 s = 1,000 s) before any rewrite or reload overhead. That runtime kills feedback loops.

pytest-gremlins Architecture

pytest-gremlins achieves up to a 13.8x speedup over mutmut (parallel execution with a warm cache) through three mechanisms:

Mutation Switching: All mutations are embedded during a single instrumentation pass. Switching between mutations requires only an environment variable change, eliminating per-mutation file I/O and module reloads.
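The idea of mutation switching can be sketched in a few lines. This illustrates the general technique, not pytest-gremlins' actual instrumentation: the variable name `ACTIVE_MUTATION`, the mutation ids, and the helper functions are all assumptions made for this example.

```python
import operator
import os

# Hypothetical environment variable name, chosen for this sketch only.
ENV_VAR = "ACTIVE_MUTATION"

# Mutation id -> replacement operator for the single comparison below.
COMPARISON_MUTANTS = {
    "cmp_1": operator.gt,  # >= becomes >
    "cmp_2": operator.lt,  # >= becomes <
}


def _compare(left, right):
    """Instrumented form of `left >= right`: the operator is chosen at call time."""
    active = os.environ.get(ENV_VAR)
    return COMPARISON_MUTANTS.get(active, operator.ge)(left, right)


def can_withdraw(balance: int, amount: int) -> bool:
    # Original source line: return balance >= amount
    return _compare(balance, amount)


def run_with_mutation(mutation_id, fn, *args):
    """Call fn with one mutation switched on, then restore the environment."""
    previous = os.environ.get(ENV_VAR)
    if mutation_id is None:
        os.environ.pop(ENV_VAR, None)
    else:
        os.environ[ENV_VAR] = mutation_id
    try:
        return fn(*args)
    finally:
        if previous is None:
            os.environ.pop(ENV_VAR, None)
        else:
            os.environ[ENV_VAR] = previous


print(run_with_mutation(None, can_withdraw, 100, 100))     # original behavior
print(run_with_mutation("cmp_1", can_withdraw, 100, 100))  # mutant behavior
```

Because every variant lives in the same loaded module, switching mutants costs one environment lookup instead of a file rewrite and a module reload.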

Coverage-Guided Test Selection: The plugin tracks which tests cover each line. When testing a mutation on line 42, it runs only the 3 tests that touch line 42 instead of all 200 tests.

Incremental Caching: Results are keyed by content hash of source and test files. Unchanged code skips mutation testing entirely on subsequent runs.
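Content-hash caching can be sketched as follows. This is an in-memory illustration under assumed names (`content_key`, `ResultCache`); the on-disk format and exact cache key used by pytest-gremlins are not described here.

```python
import hashlib


def content_key(source_text: str, test_text: str) -> str:
    """Derive a cache key from the contents of a source file and its tests."""
    digest = hashlib.sha256()
    digest.update(source_text.encode("utf-8"))
    digest.update(b"\0")  # separator so ("ab", "c") and ("a", "bc") differ
    digest.update(test_text.encode("utf-8"))
    return digest.hexdigest()


class ResultCache:
    """Stores mutation results by content key and counts real runs."""

    def __init__(self):
        self._results = {}
        self.runs = 0

    def get_or_run(self, source_text, test_text, run_mutations):
        key = content_key(source_text, test_text)
        if key not in self._results:
            self.runs += 1
            self._results[key] = run_mutations()
        return self._results[key]


cache = ResultCache()
source = "def f(x):\n    return x >= 0\n"
tests = "def test_f():\n    assert f(1)\n"

cache.get_or_run(source, tests, lambda: {"survived": 1})
cache.get_or_run(source, tests, lambda: {"survived": 1})  # unchanged: cache hit
cache.get_or_run(source + "# edit\n", tests, lambda: {"survived": 0})  # changed: rerun
print(cache.runs)
```

Keying on content rather than modification time means a file that is touched but not changed still hits the cache.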

Benchmark: pytest-gremlins vs mutmut

Measured on Python 3.12 in Docker:

| Configuration | Time | vs. mutmut |
| --- | --- | --- |
| mutmut | 14.90s | baseline |
| pytest-gremlins (sequential) | 17.79s | 0.84x |
| pytest-gremlins (parallel) | 3.99s | 3.7x faster |
| pytest-gremlins (parallel + cache) | 1.08s | 13.8x faster |

Sequential mode is slower than mutmut because pytest-gremlins runs additional mutation operators. The speedup comes from parallel mode, which is safe because mutation switching leaves no shared mutable state. With a warm cache, runs on unchanged code are near-instant.
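The "vs. mutmut" column is the mutmut time divided by each configuration's time. A quick check of that arithmetic, using only the timings from the benchmark above:

```python
# Timings in seconds, copied from the benchmark above.
MUTMUT_SECONDS = 14.90
gremlins_seconds = {
    "sequential": 17.79,
    "parallel": 3.99,
    "parallel + cache": 1.08,
}

# Speedup relative to mutmut; values below 1.0 mean slower than mutmut.
speedups = {
    name: MUTMUT_SECONDS / seconds for name, seconds in gremlins_seconds.items()
}

for name, factor in speedups.items():
    print(f"{name}: {factor:.2f}x")
```

A factor below 1.0, as in sequential mode, means the run took longer than the baseline.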

Installation and Usage

pip install pytest-gremlins
pytest --gremlins --gremlin-parallel --gremlin-cache

Output identifies specific gaps:

================== pytest-gremlins mutation report ==================

Zapped: 142 gremlins (89%)
Survived: 18 gremlins (11%)

Top surviving gremlins:
  src/auth.py:42    >= → >     (boundary not tested)
  src/utils.py:17   + → -      (arithmetic not verified)
  src/api.py:88     True → False (return value unchecked)
=====================================================================

Each survivor is reported with its file and line number, the mutation applied, and the gap it reveals. Here, line 42 of src/auth.py has a boundary condition that no test verifies.
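Arithmetic survivors such as `+ → -` often come from tests that use an operand that hides the difference. The contents of src/utils.py are not shown in the report, so the function below is a hypothetical stand-in: a test with a zero fee passes under both operators, while a non-zero fee kills the mutant.

```python
def total_with_fee(price: int, fee: int) -> int:
    """Original code (hypothetical stand-in for the line in the report)."""
    return price + fee


def total_with_fee_mutant(price: int, fee: int) -> int:
    """The same function after the mutation + -> -."""
    return price - fee


def zero_fee_test(fn) -> bool:
    """Executes the line, but 10 + 0 and 10 - 0 are both 10."""
    return fn(10, 0) == 10


def nonzero_fee_test(fn) -> bool:
    """A non-zero operand makes the two operators disagree."""
    return fn(10, 3) == 13


# Both versions pass the zero-fee test, so the mutant survives it.
print(zero_fee_test(total_with_fee), zero_fee_test(total_with_fee_mutant))
# Only the original passes the non-zero test, so the mutant is caught.
print(nonzero_fee_test(total_with_fee), nonzero_fee_test(total_with_fee_mutant))
```

The fix for a survivor is usually one extra assertion with an input that forces the original and the mutant to produce different results.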

Configuration

Add to pyproject.toml:

[tool.pytest-gremlins]
operators = ["comparison", "arithmetic", "boolean"]
paths = ["src"]
exclude = ["**/migrations/*"]
min_score = 80

Target specific files with --gremlin-targets=src/auth.py.

Try It On Your Code

Run this on your highest-coverage module:

pip install pytest-gremlins
pytest --gremlins --gremlin-parallel --gremlin-targets=src/your_critical_module.py

Survivors show exactly where your tests verify execution but not correctness. Fix one, run again in under 2 seconds with caching.

Links: PyPI | GitHub | Docs
