The Core Problem
When AI agents generate code, two opposing forces create tension:
- High coverage slows you down - More tests mean longer iteration cycles and higher token costs
- Low coverage risks regressions - Agents lack institutional memory; they don't know if they broke something previously built
Key Insight: Tests Are Institutional Memory
Human developers remember past bugs and context. Agents don't. Tests become the primary mechanism for regression detection, not a secondary safety net.
This shifts the value proposition: testing agent-generated code is MORE important than testing human-written code, because it's the only way agents learn what to preserve.
The Middle Ground: Tiered Coverage Strategy
| Risk Tier | Coverage Target | What Goes Here |
|---|---|---|
| Critical | 85-95% | Business logic, security, data integrity, money flows, API contracts |
| Core | 70-80% | Main workflows, state management, integrations |
| Low-risk | 50-60% | Getters/setters, DTOs, glue code, logging |
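One way to make the tiers operational is to encode them as path patterns with per-tier targets that tooling can check. A minimal sketch in Python; the glob patterns and repo layout are illustrative, not prescriptive:

```python
from fnmatch import fnmatch

# Illustrative tier map: glob pattern -> minimum line-coverage target.
COVERAGE_TIERS = {
    "src/billing/*": 0.85,     # Critical: money flows
    "src/auth/*": 0.85,        # Critical: security boundaries
    "src/workflows/*": 0.70,   # Core: main workflows
    "src/models/dto/*": 0.50,  # Low-risk: DTOs, glue code
}

def coverage_target(path: str, default: float = 0.70) -> float:
    """Return the minimum coverage target for a source file."""
    for pattern, target in COVERAGE_TIERS.items():
        if fnmatch(path, pattern):
            return target
    return default
```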
Principles
1. Intention-Based > High Percentage
Every test should answer: "What regression would this catch if it failed?"
Tests that can't name a concrete regression are coverage-chasing noise. Focus on:
- Core business rules
- Edge cases that have caused bugs before
- Integration points between components
- Security and data integrity boundaries
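For example, an intention-revealing test names the regression it guards against. A sketch with a hypothetical `Charge`/`RefundError` domain:

```python
import pytest

class RefundError(Exception):
    pass

class Charge:
    def __init__(self, amount_cents: int):
        self.amount_cents = amount_cents

    def refund(self, amount_cents: int) -> None:
        if amount_cents > self.amount_cents:
            raise RefundError("refund exceeds original charge")

def test_refund_never_exceeds_original_charge():
    # Concrete regression this catches: an over-refund that would
    # corrupt balances. If the guard is deleted, this test fails.
    with pytest.raises(RefundError):
        Charge(amount_cents=1_000).refund(amount_cents=1_500)
```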
2. The Real Cost Is Churn, Not Writing Tests
Implementation-coupled tests that break on refactors cause agent thrashing: the agent burns iterations repairing tests that failed for reasons unrelated to behavior. That churn is far more expensive than writing fewer, behavior-focused tests.
Behavior-focused tests verify observable outputs and side effects rather than implementation details. They test WHAT the code does, not HOW it does it - so they survive refactoring.
| Behavior-Focused | Implementation-Coupled |
|---|---|
| Tests inputs → outputs | Tests internal method calls |
| Breaks when behavior changes | Breaks when code is refactored |
| Mocks external dependencies only | Mocks internal collaborators |
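A side-by-side sketch with a hypothetical `PriceCalculator`: the first test survives renaming or inlining the internal helper; the second breaks on that refactor even though behavior is unchanged:

```python
from unittest.mock import patch

class PriceCalculator:
    def _base_price(self, quantity: int) -> int:
        return quantity * 100

    def total(self, quantity: int) -> int:
        return int(self._base_price(quantity) * 1.2)  # 20% tax

# Behavior-focused: inputs -> outputs. Survives refactoring.
def test_total_includes_tax():
    assert PriceCalculator().total(3) == 360

# Implementation-coupled: mocks an internal collaborator, so it
# breaks if _base_price is renamed or inlined.
def test_total_calls_base_price():
    with patch.object(PriceCalculator, "_base_price", return_value=300) as m:
        PriceCalculator().total(3)
        m.assert_called_once_with(3)
```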
3. Every Bug Becomes a Regression Test
When something breaks, the fix includes a test that would have caught it. This encodes institutional memory into the test suite.
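A sketch of what that looks like in practice; the rounding bug here is hypothetical:

```python
def split_evenly(total_cents: int, n: int) -> list[int]:
    """Split a charge across n payers without losing cents."""
    base = total_cents // n
    remainder = total_cents - base * n
    return [base + 1] * remainder + [base] * (n - remainder)

def test_split_preserves_every_cent():
    # Regression test for a (hypothetical) bug where splitting
    # 100 cents across 3 payers dropped a cent to rounding.
    assert sum(split_evenly(100, 3)) == 100
    assert split_evenly(100, 3) == [34, 33, 33]
```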
4. Coverage as Diagnostic, Not KPI
Use coverage reports to find untested critical paths, not to hit arbitrary numbers.
- 80% with meaningful tests beats 95% with brittle ones
- Low coverage in a critical module is a signal to investigate
- Improving code (deleting dead branches) can legitimately lower coverage while increasing safety
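As a sketch of the diagnostic mindset: coverage.py can emit a JSON report (`coverage json`), which a small script can mine for untested critical paths instead of gating on a single number. The patterns and 85% threshold below are assumptions, not fixed rules:

```python
import json
from fnmatch import fnmatch

CRITICAL_PATTERNS = ["src/billing/*", "src/auth/*"]  # illustrative

def untested_critical_files(report_path: str = "coverage.json"):
    """Flag critical-path files below target as 'investigate', not 'fail'."""
    with open(report_path) as f:
        report = json.load(f)
    flagged = []
    for path, data in report["files"].items():
        if any(fnmatch(path, p) for p in CRITICAL_PATTERNS):
            pct = data["summary"]["percent_covered"]
            if pct < 85.0:
                flagged.append((path, pct))
    return sorted(flagged, key=lambda item: item[1])  # worst first
```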
5. Fast Feedback Loop Is Non-Negotiable
If your test suite runs in under 3 minutes, run it on every agent iteration. The cost is worth the regression protection.
For longer suites:
- Run focused tests during iteration
- Run full suite before commit/merge
- Consider test parallelization
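One lightweight way to pick focused tests is to map changed source files to test files by naming convention, falling back to the full suite when the mapping misses. The `src/` → `tests/` convention here is an assumption:

```python
import subprocess
from pathlib import Path

def focused_tests(changed_files: list[str]) -> list[str]:
    """Map src/pkg/module.py -> tests/pkg/test_module.py (assumed layout)."""
    targets = []
    for changed in changed_files:
        p = Path(changed)
        if p.parts and p.parts[0] == "src" and p.suffix == ".py":
            candidate = Path("tests", *p.parts[1:-1], f"test_{p.name}")
            if candidate.exists():
                targets.append(str(candidate))
    return targets

def run_iteration_tests(changed_files: list[str]) -> int:
    """During iteration: run focused tests; fall back to the full suite."""
    targets = focused_tests(changed_files) or ["tests/"]
    return subprocess.run(["pytest", "-q", *targets]).returncode
```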
Diminishing Returns After 80%
Research consistently shows diminishing returns kick in after 80-90% coverage:
| Aspect | 80-90% Coverage | 100% Coverage |
|---|---|---|
| Bug-finding value | High early, then flattens | Minimal for final 10-20% |
| Engineering effort | Proportional to benefit | Disproportionately high |
| Test complexity | Moderate, behavior-focused | Often brittle, over-specified |
| Design impact | Encourages testable code | Can incentivize design compromises |
The last 10-20% requires:
- Intricate test setups
- Heavy mocking
- Tests tightly coupled to implementation
- Covering defensive/error paths rarely hit in practice
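For concreteness, here is the kind of test the last stretch tends to produce: a hypothetical service where covering the fallback branch means pinning internal collaborators, so any refactor of the caching strategy breaks it:

```python
from unittest.mock import MagicMock

class UserService:
    def __init__(self, cache, db):
        self.cache, self.db = cache, db

    def get_user(self, uid):
        try:
            return self.cache.get(uid)
        except TimeoutError:
            return self.db.fetch(uid)  # defensive fallback path

# Over-specified: asserts call counts and arguments on internals
# just to cover the fallback branch. Shown as an anti-pattern.
def test_fallback_hits_db_exactly_once():
    cache, db = MagicMock(), MagicMock()
    cache.get.side_effect = TimeoutError
    db.fetch.return_value = {"id": 1}
    assert UserService(cache, db).get_user(1) == {"id": 1}
    cache.get.assert_called_once_with(1)
    db.fetch.assert_called_once_with(1)
```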
Agent-Specific Practices
Minimum Rule
ALL agent-generated code must have at least one intention-revealing test before merge.
Not necessarily high coverage, but something that would fail if the core behavior broke.
Have Agents Write Tests Too
Since agents can generate tests, the marginal cost of test creation is low. Prompt agents to:
- Write tests alongside implementation
- Generate edge case tests for their own code
- Create regression tests when fixing bugs
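A sketch of how that might read as a fragment of an agent's instructions; the wording is illustrative, not a tested prompt:

```python
# Illustrative policy fragment appended to an agent's system prompt.
TEST_POLICY_PROMPT = """\
When you implement or modify code:
1. Write behavior-focused tests alongside the implementation.
2. Add edge-case tests for any boundaries you introduced.
3. When fixing a bug, first write a test that reproduces it,
   then make that test pass.
"""
```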
Run Tests in the Agent Loop
Include test execution as part of the agent's implementation cycle:
implement → run tests → fix failures → verify tests pass → commit
This catches regressions before they propagate.
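A minimal sketch of that cycle, shelling out to pytest and feeding failures back to a hypothetical `agent.fix(...)` step (the agent API and retry cap are assumptions):

```python
import subprocess

MAX_ATTEMPTS = 3  # illustrative cap to avoid unbounded thrashing

def agent_cycle(agent, task) -> bool:
    """implement -> run tests -> fix failures -> verify -> commit."""
    agent.implement(task)  # hypothetical agent API
    for _ in range(MAX_ATTEMPTS):
        result = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
        if result.returncode == 0:
            return True  # all tests pass; safe to commit
        agent.fix(failures=result.stdout)  # feed failures back in
    return False  # still failing; escalate to a human
```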
Test Known Issue Scenarios
Maintain a list of known system failures and ensure tests cover these scenarios. Agents don't know your system's historical pain points unless tests encode them.
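One way to encode those pain points is a scenario registry that parametrizes tests, so the list of historical failures stays explicit and greppable. The scenarios and `parse()` stub below are hypothetical:

```python
import pytest

def parse(payload: str) -> dict:
    """Stand-in parser; substitute your real entry point."""
    return {"raw": payload}

# Each entry names the incident it guards against.
KNOWN_ISSUE_SCENARIOS = [
    pytest.param("", id="empty-payload-crashed-parser"),
    pytest.param("a" * 10_000, id="oversized-input-oom"),
    pytest.param("naïve", id="non-ascii-broke-encoding"),
]

@pytest.mark.parametrize("payload", KNOWN_ISSUE_SCENARIOS)
def test_parser_survives_known_bad_inputs(payload):
    assert parse(payload) is not None
```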
Practical Recommendation
Aim for 75-85% coverage on critical code with behavior-focused tests. Accept 50-70% elsewhere. Always run tests in the agent loop.
The tradeoff between cost/duration and regression protection favors FAST, MEANINGFUL tests over comprehensive but slow ones.
When Higher Coverage Makes Sense
Push toward 95%+ in these contexts:
- Safety-critical or regulated domains (medical, fintech, automotive)
- Strong TDD culture where tests come naturally with design
- Public APIs with stability guarantees
- Security-sensitive code paths