I keep seeing the same failure mode with coding agents:
The agent writes a diff.
The human notices a broken test.
The human explains the failure.
The agent retries.
That is not review. That is babysitting.
The fix is to move cheap, machine-readable feedback into the agent loop before the work reaches a human reviewer.
The problem
A coding agent is a fast producer. The human reviewer is a slow consumer.
That mismatch creates pressure. If the only meaningful feedback comes from the human, every trivial failure becomes review work. The human becomes the compiler, the test runner, the linter, the UI checker, and the reviewer.
The agent can generate code quickly, but generation without feedback is open loop. It produces a diff without knowing whether that diff is correct, useful, safe, or ready.
In Don’t waste your back pressure, Moss Banay describes this as wasted human backpressure: spending attention on mistakes the agent should have been able to detect itself, such as missing imports, broken builds, and visual errors.
Marc Brooker makes a related point in What’s Easy Now? What’s Hard Now?: agents are feedback loops around a useful but flawed model. The key shift is moving feedback from the human loop into the agent loop: build, test, inspect, repair, and iterate.
So the question is not only:
How good is the model?
The better question is:
What feedback can the agent use without asking me?
The pattern
Backpressure is feedback that reaches the agent before the agent reaches the human.
The useful loop looks like this:
generate → verify → repair → repeat
Backpressure can come from a type checker, a test, a linter, a build, a browser check, logs, traces, structural rules, or evals. The source matters less than the loop: the agent sees the failure, repairs the work, and tries again.
Intent: Move cheap correctness checks out of human review and into the agent loop.
Context: A coding agent can edit files, run commands, read failures, and retry.
Problem: Agents produce code faster than humans can validate it.
Solution: Expose fast, machine-readable feedback sensors to the agent: types, tests, linters, builds, browser checks, logs, traces, and structural rules.
Result: The agent fixes mechanical failures before review. The human focuses on intent, design, and trade-offs.
What counts as backpressure
Backpressure is any signal that can push back on bad agent output before a human spends attention on it.
| Sensor | What it catches |
|---|---|
| Type checker | Invalid shapes, missing fields, bad contracts |
| Linter | Unsafe patterns, formatting noise, local conventions |
| Focused tests | Behavior regressions |
| Build | Integration failures |
| Browser check | Broken UI, missing elements, visual drift |
| Logs and traces | Runtime behavior |
| Structural rules | Architecture boundary violations |
| Evals | Meaning-level regressions |
Thoughtworks calls these feedback sensors for coding agents: deterministic quality gates wired into agent workflows so failures trigger self-correction.
That distinction matters.
A CI failure after the agent is done is a gate.
A failure the agent sees while working is backpressure.
A green build is not enough
A green build is useful, but narrow.
It tells you the code compiled, the formatter ran, and the existing tests passed. It does not prove the product rule survived. It does not prove the generated test asserts meaningful behavior. It does not prove the refactor respected the architecture.
So the goal is not just to run CI earlier.
The goal is to make feedback fast, specific, and actionable enough for the agent to repair the work.
Bad feedback:
The code is wrong.
Better feedback:
auth/session.test.ts failed.
Expected expired sessions to redirect to /login.
Received 200 from /dashboard.
Fix the session expiration path.
Do not change the test.
The second version gives the agent something to do.
Effective feedback beats more feedback
More output is not the same as better backpressure.
A recent arXiv preprint, Scaling Laws for Agent Harnesses via Effective Feedback Compute, makes this point in research terms. The authors argue that agent harnesses scale less with raw tokens, tool calls, wall time, or cost, and more with how efficiently they convert raw budget into useful feedback.
They call this Effective Feedback Compute: feedback that is informative, valid, non-redundant, and retained for later decisions.
That maps directly to coding agents.
A test failure is useful only if the agent can understand it, trust it, avoid repeating it, and use it to change the next attempt.
Useful backpressure has four properties:
| Property | Meaning |
|---|---|
| Informative | It tells the agent what failed and where |
| Valid | It comes from a trusted checker, test, tool, or observation |
| Non-redundant | It adds signal instead of repeating noise |
| Retained | It changes the agent’s next action |
That is why a thousand lines of logs can be worse than a short, structured failure.
The goal is not more agent activity. The goal is more effective feedback.
How to implement it
Start with the boring checks.
Put the exact commands in AGENTS.md, CLAUDE.md, or the equivalent project instruction file:
After changing TypeScript:
- run npm run typecheck
- run npm run lint
- run npm test -- --changed
- fix failures before opening a PR
Then make the output efficient.
Do not dump hundreds of lines of passing test output into the agent context. Passing checks should be tiny. Failing checks should show the useful detail.
HumanLayer describes this as context-efficient backpressure: replace passing test, build, and lint output with a small success signal, but expose full output when a command fails.
Example:
✓ typecheck
✓ lint
✗ auth tests
FAIL auth/session.test.ts
Expected redirect to /login for expired session.
Received 200 from /dashboard.
That is enough signal for the agent to continue.
The rule:
Success should be compressed. Failure should be actionable.
The trade-off
Backpressure is not free.
Fast sensors improve flow. Slow sensors choke it.
Do not put every possible check inside the inner loop. Layer them.
In session:
typecheck, lint, focused tests, build, screenshots
Before PR:
full tests, integration tests, structural rules, security checks
Scheduled or explicit:
mutation testing, fuzzing, semantic evals, architecture drift review
Birgitta Böckeler’s Maintainability sensors for coding agents is useful here because it separates sensors that run during the coding session from sensors that belong in the pipeline, on a schedule, or in production.
That is the right mental model: not every check belongs in the same loop.
The inner loop should be fast enough that the agent can run it repeatedly. Slower checks belong outside the session unless the task is risky enough to justify the cost.
The rule
When you correct the same agent mistake twice, turn it into backpressure.
A test.
A type.
A linter rule.
A build check.
A browser assertion.
A better error message.
A small script the agent can run before it asks for review.
The future skill is not asking agents to try harder.
It is designing feedback loops that make bad output harder to accept.
I write about similar AI-assisted coding patterns in my newsletter.


Top comments (0)