r4mimu

Posted on Apr 3

Your Go Tests Pass, But Do They Actually Test Anything? An Introduction to Mutation Testing

#ai #codequality #go #testing

GitHub Copilot, Cursor, Claude Code — these tools can generate hundreds of lines of Go in seconds. But there's a problem that doesn't get enough attention: AI-generated code ships with AI-generated confidence, not correctness.

Your test suite says "all green." Your coverage report says 85%. But how many of those tests actually catch real bugs? How many are just going through the motions — executing code paths without verifying behavior?

This is where mutation testing comes in. And this is why I built mutest.

The Hidden Cost of AI-Assisted Development

AI coding assistants are good at producing code that looks correct. They generate functions with proper signatures, idiomatic error handling, and even test files. But there's a gap: AI tends to produce tests that cover code paths rather than verify boundaries.

Consider this function:

func IsEligibleForDiscount(quantity int) bool {
    if quantity >= 10 {
        return true
    }
    return false
}

An AI might generate tests like:

func TestIsEligibleForDiscount(t *testing.T) {
    if got := IsEligibleForDiscount(15); got != true {
        t.Errorf("IsEligibleForDiscount(15) = %v, want true", got)
    }
    if got := IsEligibleForDiscount(3); got != false {
        t.Errorf("IsEligibleForDiscount(3) = %v, want false", got)
    }
}

Coverage: 100%. Every line is executed. CI is green. Ship it.

But what happens if someone accidentally changes quantity >= 10 to quantity > 10? Both tests still pass. The boundary case — IsEligibleForDiscount(10) — was never tested. A customer ordering exactly 10 items would silently lose their discount, and your test suite would never notice.

This is not a contrived example. Boundary-value and equality bugs are among the most common defects in production software, and they're the kind of bug that AI-generated tests routinely miss.

Mutation Testing: The Test for Your Tests

Mutation testing flips the question. Instead of "does my code pass the tests?", it asks "do my tests catch bugs?"

The concept is straightforward:

1. Take your source code
2. Introduce a small, deliberate change (a "mutation"), like swapping >= for >
3. Run your test suite against the mutated code
4. If the tests fail, the mutant is "killed": your tests caught the bug
5. If the tests pass, the mutant has "survived": you have a test gap

Every surviving mutant points to a specific line where a bug could hide undetected. Unlike line coverage, which just tells you what code ran, this tells you whether your tests actually noticed the change.
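To make the killed/survived distinction concrete, here is a minimal sketch that plays the role of a mutation tester by hand for the discount example. The names Original, Mutant, Case, Passes, and Killed are invented for this illustration and have nothing to do with how any real tool is implemented.

```go
package main

// Original is the code under test: the discount applies from 10 items up.
func Original(quantity int) bool { return quantity >= 10 }

// Mutant is the same function with >= swapped for >.
func Mutant(quantity int) bool { return quantity > 10 }

// Case is one input/expected-output pair from a test suite.
type Case struct {
	Quantity int
	Want     bool
}

// Passes reports whether impl satisfies every case in the suite.
func Passes(impl func(int) bool, suite []Case) bool {
	for _, c := range suite {
		if impl(c.Quantity) != c.Want {
			return false
		}
	}
	return true
}

// Killed reports whether the suite passes on the original code
// but fails on the mutant, which is what "killed" means.
func Killed(suite []Case) bool {
	return Passes(Original, suite) && !Passes(Mutant, suite)
}

// WeakSuite mirrors the tests shown earlier: values far from the boundary.
var WeakSuite = []Case{
	{Quantity: 15, Want: true},
	{Quantity: 3, Want: false},
}

// StrongSuite adds the boundary value itself.
var StrongSuite = []Case{
	{Quantity: 15, Want: true},
	{Quantity: 3, Want: false},
	{Quantity: 10, Want: true},
}
```

Killed(WeakSuite) is false (the mutant survives) and Killed(StrongSuite) is true, because only the case at exactly 10 can tell >= and > apart.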

Why This Matters Now

In the pre-AI era, developers wrote code slowly and typically had intuition about boundary conditions because they had reasoned through the logic themselves. AI-generated code skips that reasoning step. The code appears, the tests appear, and the developer reviews both — but the boundary considerations are often missing from both.

Mutation testing fills that gap. It's the automated version of a senior engineer asking, "But what happens when the value is exactly 10?"

The Problem with Existing Mutation Testing Tools

If mutation testing is so valuable, why isn't everyone doing it? Because existing tools make it impractical for real-world CI pipelines.

They're Too Slow

Traditional mutation testing tools generate thousands of mutants — mutating arithmetic operators, boolean returns, assignments, method calls, and more. For a medium-sized Go project, this easily means 5,000-10,000+ mutants. Each one requires recompilation and a full test run. Even with optimization, you're looking at 30-60 minutes of CI time.

Nobody runs a 45-minute mutation test on every pull request.

They're Too Noisy

More mutation operators mean more surviving mutants, but not all surviving mutants represent real test gaps. Changing x + y to x - y in an expression that only feeds a log message produces a surviving mutant, yet it doesn't point to a meaningful test gap. When your mutation report has 200 survivors and only 15 are actionable, developers stop reading the report.

Go-Specific Pain Points

The Go ecosystem has a few mutation testing tools, but they share common limitations:

Per-mutant recompilation: Most tools modify source files and rebuild for every single mutant. Go's compilation is fast, but N mutations * (compile + test) adds up quickly across a whole project.
Source file modification: Some tools directly modify .go files, creating race conditions with IDE file watchers, breaking gopls, and risking corrupted source if the process is interrupted mid-run.
Kitchen-sink mutation strategies: Trying to mutate everything — arithmetic, assignments, returns, conditionals — generates way too many mutants, most of which aren't useful.

The result: developers try mutation testing once, wait 20 minutes for a noisy report, and never run it again.

Introducing mutest

mutest is a mutation testing tool for Go that takes a different approach. Instead of mutating everything, it focuses on the operators that actually matter — and runs fast enough for CI.

$ mutest ./...
mutest: discovered 4 mutation points
mutest: testing with 10 workers, 30s timeout per mutant

--- SURVIVED: calc.go:5:7  > to >= (0.21s)
--- SURVIVED: calc.go:21:7  > to >= (0.21s)
--- SURVIVED: calc.go:18:7  < to <= (0.21s)
--- KILLED: calc.go:13:11  > to >= (0.63s)

===== Mutation Testing Summary =====
Total:     4
Killed:    1
Survived:  3
Score:     25.0%
Duration:  633ms

633 milliseconds. Not minutes. Milliseconds.

What Makes mutest Different

1. Focused Mutation Strategy

mutest targets Relational Operator Replacement (ROR) — comparison and equality operators only:

>  →  >=
>=  →  >
<  →  <=
<=  →  <
==  →  !=
!=  →  ==

This isn't a limitation; it's a deliberate choice. Off-by-one and equality bugs are some of the most common defects in production Go code, and these six mutations target exactly those. The academic literature on selective mutation supports the trade-off: relational operator replacement is regularly among the operators kept when mutation operator sets are cut down to a cost-effective core.

By limiting scope, mutest keeps the mutant count small — typically 10-50 per package instead of thousands. Every survivor is something you should actually look at.

2. Runtime Mutation Selection (Compile Once, Run Many)

Traditional tools follow this pattern:

For each mutant:
    1. Modify source file
    2. Compile package
    3. Run tests
    4. Restore source file

mutest does it differently:

For each package:
    1. Instrument all mutation points with generic helper functions
    2. Compile ONE test binary with all mutations embedded
For each mutant:
    3. Run the pre-built binary with MUTEST_ID=N

Under the hood, a comparison like a > b is replaced with a call to a generated generic helper function:

func _mutest_cmp_1[T cmp.Ordered](a, b T) bool {
    _mutest_init()
    if _mutest_active == 1 {
        return a >= b  // mutated
    }
    return a > b  // original
}

The package is compiled once with all these helpers embedded. Each mutant is then activated by running the same binary with a different environment variable: MUTEST_ID=1, MUTEST_ID=2, and so on.

The cost goes from N mutations * (compile + run) to P packages * compile + N mutations * run. For a package with 30 mutations, that's 1 compilation instead of 30.
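The selection mechanism can be sketched in a self-contained form. This is an illustration of the idea only: parseMutantID, activeMutant, greaterAt, and Greater are hypothetical names, and mutest's generated code may well be organized differently. The only detail taken from the description above is that the MUTEST_ID environment variable picks the active mutant.

```go
package main

import (
	"cmp"
	"os"
	"strconv"
	"sync"
)

// parseMutantID turns the raw MUTEST_ID value into a mutant ID.
// An empty or invalid value yields 0, meaning "no mutant active".
func parseMutantID(raw string) int {
	id, err := strconv.Atoi(raw)
	if err != nil || id < 0 {
		return 0
	}
	return id
}

// activeMutant reads MUTEST_ID once per process and caches the result,
// so each comparison pays only for an integer check.
var activeMutant = sync.OnceValue(func() int {
	return parseMutantID(os.Getenv("MUTEST_ID"))
})

// greaterAt stands in for `a > b` at mutation point `point`.
// It evaluates the mutated operator only when that point is active.
func greaterAt[T cmp.Ordered](active, point int, a, b T) bool {
	if active == point {
		return a >= b // mutated
	}
	return a > b // original
}

// Greater is what an instrumented `a > b` at mutation point 1 would call
// (the point number is hard-coded here purely for illustration).
func Greater[T cmp.Ordered](a, b T) bool {
	return greaterAt(activeMutant(), 1, a, b)
}
```

Because every mutation point is compiled into the same binary, switching mutants is a matter of re-running that binary with a different MUTEST_ID value. No rebuild is needed.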

3. Non-Destructive by Design

mutest never modifies your source files. All instrumented code lives in temporary files, and Go's -overlay flag tells the compiler to use them instead of the originals. If the process is killed, interrupted, or crashes — your source code is untouched. Your IDE stays happy. Your git status stays clean.
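For readers unfamiliar with -overlay: the go command reads a JSON file whose single Replace field maps each on-disk path to the file that should be compiled in its place. The sketch below generates such a file. The OverlayJSON helper and the example paths are hypothetical, and this is not a description of mutest's internals.

```go
package main

import (
	"encoding/json"
)

// overlay mirrors the JSON document accepted by the go command's -overlay
// flag: one "Replace" field mapping original paths to backing files.
type overlay struct {
	Replace map[string]string `json:"Replace"`
}

// OverlayJSON renders the overlay file contents for the given replacements.
// Keys are the original source paths; values are the instrumented copies.
func OverlayJSON(replacements map[string]string) ([]byte, error) {
	return json.Marshal(overlay{Replace: replacements})
}
```

Writing the result to a temporary file and passing it as go test -overlay /path/to/overlay.json ./... makes the toolchain compile the instrumented copies while the originals stay byte-for-byte unchanged.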

4. Smart Noise Reduction

mutest automatically skips mutations that would produce false positives:

len(x) > 0 to len(x) >= 0: len() never returns a negative value, so the mutated condition is always true. mutest treats this mutant as noise rather than a meaningful boundary. Skipped.
cap(x) > 0 to cap(x) >= 0: Same reasoning; cap() is always non-negative. Skipped.
if err != nil { return err }: Go's idiomatic error propagation. Mutating these generates noise without revealing meaningful test gaps. Skipped by default (configurable with -skip-err-propagation=false).

Comparisons against non-zero values (like len(s) > 1) are not skipped — the boundary between 1 and 2 matters and should be tested.
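The len and cap rules can be recognized purely from syntax. The following is a rough sketch using go/parser and go/ast, under the assumption that a check on the expression shape is enough. IsLenOrCapAboveZero is a hypothetical helper, not mutest's actual implementation, and it would be fooled by a locally shadowed len or cap.

```go
package main

import (
	"go/ast"
	"go/parser"
	"go/token"
)

// IsLenOrCapAboveZero reports whether src has the shape len(x) > 0 or
// cap(x) > 0. The check is syntactic: it looks at identifier names only
// and does no type checking.
func IsLenOrCapAboveZero(src string) (bool, error) {
	expr, err := parser.ParseExpr(src)
	if err != nil {
		return false, err
	}
	bin, ok := expr.(*ast.BinaryExpr)
	if !ok || bin.Op != token.GTR {
		return false, nil
	}
	// Left operand must be a call to the identifier len or cap.
	call, ok := bin.X.(*ast.CallExpr)
	if !ok {
		return false, nil
	}
	fn, ok := call.Fun.(*ast.Ident)
	if !ok || (fn.Name != "len" && fn.Name != "cap") {
		return false, nil
	}
	// Right operand must be the integer literal 0.
	lit, ok := bin.Y.(*ast.BasicLit)
	return ok && lit.Kind == token.INT && lit.Value == "0", nil
}
```

With this check, len(x) > 0 and cap(buf) > 0 are flagged for skipping, while len(s) > 1 and n > 0 remain ordinary mutation points.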

5. Zero External Dependencies

mutest is built entirely on the Go standard library — go/ast, go/parser, go/token, os/exec, and friends. No third-party dependencies. go install and you're done.

Getting Started with mutest

Installation

go install github.com/fchimpan/mutest@latest

Requires Go 1.24 or later. Pre-built binaries for Linux, macOS, and Windows are also available on the Releases page.

Basic Usage

# Run against all packages (same syntax as go test)
mutest ./...

# Target a specific package
mutest ./pkg/handler

# Show test output for each mutant
mutest -v ./...

# Preview mutations without running tests
mutest -dry-run ./...

The output format mirrors go test, so it feels familiar:

--- KILLED: handler.go:25:11  > to >= (0.42s)
--- SURVIVED: handler.go:44:9  < to <= (0.19s)

CI Integration

mutest is designed to work as a CI quality gate. Two practical patterns:

Pattern 1 — diff-only mutation testing. Only mutate lines changed in the current pull request:

mutest -diff origin/main -threshold 100 ./...

This says: "For every comparison operator I changed or added, there must be a test that covers the boundary." If any changed comparison survives mutation, the pipeline fails. If the diff contains no mutation targets (e.g., only comments or non-Go files changed), mutest exits 0 — no false failures.
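One way a diff-restricted mode could decide which lines count as "changed" is to read the hunk headers of a unified diff, which is a standard format. The sketch below is illustrative only and makes no claim about how mutest's -diff flag is implemented. NewLineRange and Touches are hypothetical helpers.

```go
package main

import (
	"regexp"
	"strconv"
)

// hunkRE matches a unified-diff hunk header such as "@@ -10,0 +11,2 @@"
// and captures the start line and optional length of the new-file range.
var hunkRE = regexp.MustCompile(`^@@ -\d+(?:,\d+)? \+(\d+)(?:,(\d+))? @@`)

// NewLineRange extracts the new-file start line and line count from a hunk
// header. In the unified format an omitted count means 1.
func NewLineRange(header string) (start, count int, ok bool) {
	m := hunkRE.FindStringSubmatch(header)
	if m == nil {
		return 0, 0, false
	}
	start, _ = strconv.Atoi(m[1]) // digits guaranteed by the regexp
	count = 1
	if m[2] != "" {
		count, _ = strconv.Atoi(m[2])
	}
	return start, count, true
}

// Touches reports whether line falls inside the range [start, start+count).
func Touches(start, count, line int) bool {
	return line >= start && line < start+count
}
```

When the diff is produced with zero context lines (for example git diff -U0), each new-file range covers exactly the added or modified lines, so a mutation point is in scope when Touches returns true for its line number.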

Pattern 2 — full-project threshold. Set a minimum mutation score for the entire project:

mutest -threshold 80 ./...
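The score behind a threshold gate is simple arithmetic: killed mutants divided by total mutants, as a percentage (1 of 4 killed gives the 25.0% shown in the earlier sample output). A minimal sketch follows, with two assumptions that are mine rather than documented behavior: the comparison against the threshold is inclusive, and a run with no mutants passes, matching the "exits 0" behavior described for diffs without mutation targets. Score and MeetsThreshold are hypothetical names.

```go
package main

// Score returns the mutation score as a percentage of killed mutants.
// With no mutants at all it returns 0.
func Score(killed, total int) float64 {
	if total == 0 {
		return 0
	}
	return float64(killed) / float64(total) * 100
}

// MeetsThreshold reports whether a run should pass the quality gate.
// Assumptions: an empty run passes, and a score equal to the threshold passes.
func MeetsThreshold(killed, total int, threshold float64) bool {
	if total == 0 {
		return true
	}
	return Score(killed, total) >= threshold
}
```

Under these assumptions, the sample run with 1 of 4 mutants killed fails a threshold of 80, and a threshold of 100 passes only when every mutant is killed.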

GitHub Actions Example

name: Mutation Testing
on: [pull_request]

jobs:
  mutest:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # Full history needed for -diff
      - uses: actions/setup-go@v5
        with:
          go-version: stable
      - run: go install github.com/fchimpan/mutest@latest
      - run: mutest -diff origin/main -threshold 100 ./...

That's it. Under a minute of CI time for most projects.

Controlling Scope with //mutest:skip

Not every comparison needs mutation testing. Use the //mutest:skip directive to exclude code:

// Skip an entire function
//mutest:skip
func legacyCompare(a, b int) bool {
    return a > b
}

// Skip a block (if/for/switch/select) — skips all nested statements too
if err != nil { //mutest:skip
    if errors.Is(err, context.Canceled) {
        return nil, ErrCanceled
    }
    return nil, fmt.Errorf("fetch: %w", err)
}

// Skip a single line
if a > b { //mutest:skip
    return 1
}

JSON Output for Tooling

# Single JSON summary
mutest -json ./...

# Streaming NDJSON (one line per mutant as results arrive)
mutest -json -v ./...

The JSON output makes it easy to integrate mutest with custom dashboards, Slack notifications, or code review tools.

Fixing a Survived Mutant

When mutest reports a survivor, it tells you exactly where to look:

--- SURVIVED: pricing.go:12:7  >= to > (0.21s)

This means mutest changed >= to > and no test noticed:

func IsEligibleForDiscount(quantity int) bool {
    if quantity >= 10 {  // mutest changed this to >, tests still passed
        return true
    }
    return false
}

The fix — add a test at the boundary:

func TestIsEligibleForDiscount_ExactlyTen(t *testing.T) {
    // This test kills the >= to > mutation because
    // IsEligibleForDiscount(10) returns true with >=, but false with >.
    if got := IsEligibleForDiscount(10); got != true {
        t.Errorf("IsEligibleForDiscount(10) = %v, want true", got)
    }
}

Re-run mutest and that mutation point will now show --- KILLED.

Real-World Impact

I've been running mutest in CI on Go projects. The most obvious thing it finds: off-by-one errors in pagination, boundary checks in rate limiting, equality comparisons that should be inequalities. Bugs that 100% line coverage misses entirely.

The less obvious effect: it changes how developers write tests. Once you start seeing surviving mutants, you naturally start adding boundary-value tests. You stop writing tests that just exercise the happy path.

Speed matters more than you'd think here. A mutation testing tool that takes 30 minutes gets disabled after the first sprint. mutest finishes in seconds, so it stays on. And with diff mode, you don't need to retroactively hit a high mutation score across your whole project — just enforce it on new and changed code.

Summary

AI coding assistants are making us more productive, but they're also making it easier to ship code with untested boundary conditions. Line coverage tells you what code ran; mutation testing tells you what code was actually verified.

Think of it this way: go vet catches code that compiles but is probably wrong. mutest catches tests that pass but probably aren't thorough enough. It's the next layer of quality assurance for your Go project.

mutest brings mutation testing to Go without the traditional pain points:

Focused scope — targets comparison and equality operators, the mutations that catch real bugs
Fast — compiles once per package, runs many; finishes in seconds, not minutes
Non-destructive — never touches your source files; uses Go's -overlay flag
CI-ready — -diff, -threshold, and -json flags built in
Zero dependencies — pure Go standard library

If your tests are passing but you're not sure they're actually testing, give mutest a try:

go install github.com/fchimpan/mutest@latest
mutest ./...

You might be surprised how many mutants survive.

mutest is open source under the MIT license. Contributions, issues, and stars are welcome at github.com/fchimpan/mutest.

DEV Community
