Bilgin Ibryam

Posted on Jun 7 • Originally published at generativeprogrammer.com

Stop Babysitting Your Coding Agent. Give It Backpressure.

I keep seeing the same failure mode with coding agents:

The agent writes a diff.
The human notices a broken test.
The human explains the failure.
The agent retries.

That is not review. That is babysitting.

The fix is to move cheap, machine-readable feedback into the agent loop before the work reaches a human reviewer.

The problem

A coding agent is a fast producer. The human reviewer is a slow consumer.

That mismatch creates pressure. If the only meaningful feedback comes from the human, every trivial failure becomes review work. The human becomes the compiler, the test runner, the linter, the UI checker, and the reviewer.

The agent can generate code quickly, but generation without feedback is open loop. It produces a diff without knowing whether that diff is correct, useful, safe, or ready.

In Don’t waste your back pressure, Moss Banay describes this as wasted human backpressure: spending attention on mistakes the agent should have been able to detect itself, such as missing imports, broken builds, and visual errors.

Marc Brooker makes a related point in What’s Easy Now? What’s Hard Now?: agents are feedback loops around a useful but flawed model. The key shift is moving feedback from the human loop into the agent loop: build, test, inspect, repair, and iterate.

So the question is not only:

How good is the model?

The better question is:

What feedback can the agent use without asking me?

The pattern

Backpressure is feedback that reaches the agent before the agent reaches the human.

The useful loop looks like this:

generate → verify → repair → repeat

Backpressure can come from a type checker, a test, a linter, a build, a browser check, logs, traces, structural rules, or evals. The source matters less than the loop: the agent sees the failure, repairs the work, and tries again.

Intent: Move cheap correctness checks out of human review and into the agent loop.

Context: A coding agent can edit files, run commands, read failures, and retry.

Problem: Agents produce code faster than humans can validate it.

Solution: Expose fast, machine-readable feedback sensors to the agent: types, tests, linters, builds, browser checks, logs, traces, and structural rules.

Result: The agent fixes mechanical failures before review. The human focuses on intent, design, and trade-offs.

What counts as backpressure

Backpressure is any signal that can push back on bad agent output before a human spends attention on it.

Sensor	What it catches
Type checker	Invalid shapes, missing fields, bad contracts
Linter	Unsafe patterns, formatting noise, local conventions
Focused tests	Behavior regressions
Build	Integration failures
Browser check	Broken UI, missing elements, visual drift
Logs and traces	Runtime behavior
Structural rules	Architecture boundary violations
Evals	Meaning-level regressions

Thoughtworks calls these feedback sensors for coding agents: deterministic quality gates wired into agent workflows so failures trigger self-correction.

That distinction matters.

A CI failure after the agent is done is a gate.

A failure the agent sees while working is backpressure.

A green build is not enough

A green build is useful, but narrow.

It tells you the code compiled, the formatter ran, and the existing tests passed. It does not prove the product rule survived. It does not prove the generated test asserts meaningful behavior. It does not prove the refactor respected the architecture.

So the goal is not just to run CI earlier.

The goal is to make feedback fast, specific, and actionable enough for the agent to repair the work.

Bad feedback:

The code is wrong.

Better feedback:

auth/session.test.ts failed.

Expected expired sessions to redirect to /login.
Received 200 from /dashboard.

Fix the session expiration path.
Do not change the test.

The second version gives the agent something to do.

Effective feedback beats more feedback

More output is not the same as better backpressure.

A recent arXiv preprint, Scaling Laws for Agent Harnesses via Effective Feedback Compute, makes this point in research terms. The authors argue that agent harnesses scale less with raw tokens, tool calls, wall time, or cost, and more with how efficiently they convert raw budget into useful feedback.

They call this Effective Feedback Compute: feedback that is informative, valid, non-redundant, and retained for later decisions.

That maps directly to coding agents.

A test failure is useful only if the agent can understand it, trust it, avoid repeating it, and use it to change the next attempt.

Useful backpressure has four properties:

Property	Meaning
Informative	It tells the agent what failed and where
Valid	It comes from a trusted checker, test, tool, or observation
Non-redundant	It adds signal instead of repeating noise
Retained	It changes the agent’s next action

That is why a thousand lines of logs can be worse than a short, structured failure.

The goal is not more agent activity. The goal is more effective feedback.

How to implement it

Start with the boring checks.

Put the exact commands in AGENTS.md, CLAUDE.md, or the equivalent project instruction file:

After changing TypeScript:
- run npm run typecheck
- run npm run lint
- run npm test -- --changed
- fix failures before opening a PR

Then make the output efficient.

Do not dump hundreds of lines of passing test output into the agent context. Passing checks should be tiny. Failing checks should show the useful detail.

HumanLayer describes this as context-efficient backpressure: replace passing test, build, and lint output with a small success signal, but expose full output when a command fails.

Example:

✓ typecheck
✓ lint
✗ auth tests

FAIL auth/session.test.ts
Expected redirect to /login for expired session.
Received 200 from /dashboard.

That is enough signal for the agent to continue.

The rule:

Success should be compressed. Failure should be actionable.

The trade-off

Backpressure is not free.

Fast sensors improve flow. Slow sensors choke it.

Do not put every possible check inside the inner loop. Layer them.

In session:
typecheck, lint, focused tests, build, screenshots

Before PR:
full tests, integration tests, structural rules, security checks

Scheduled or explicit:
mutation testing, fuzzing, semantic evals, architecture drift review

Birgitta Böckeler’s Maintainability sensors for coding agents is useful here because it separates sensors that run during the coding session from sensors that belong in the pipeline, on a schedule, or in production.

That is the right mental model: not every check belongs in the same loop.

The inner loop should be fast enough that the agent can run it repeatedly. Slower checks belong outside the session unless the task is risky enough to justify the cost.

The rule

When you correct the same agent mistake twice, turn it into backpressure.

A test.
A type.
A linter rule.
A build check.
A browser assertion.
A better error message.
A small script the agent can run before it asks for review.

The future skill is not asking agents to try harder.

It is designing feedback loops that make bad output harder to accept.

I write about similar AI-assisted coding patterns in my newsletter.