The Problem
Coding agents are only as good as the context you give them.
The better your prompt, the better the output. Most advice focuses on prompt engineering — clearer instructions, better few-shot examples, smarter questions. That all helps.
But no matter how good your prompt is, there's a ceiling — because the repository itself is missing the context the agent needs.
When a human joins a new project, they get a briefing: product intent, architecture decisions, what "done" looks like, team conventions, what not to touch, who to ask when stuck. They might get a doc, a walkthrough, a Slack thread from six months ago — but they get something.
When a coding agent joins a project, it gets a task description and a pile of files.
The result is predictable: agents make assumptions. They ship something that technically works but breaks a caching layer they didn't know existed. They undo an architectural decision without understanding why it was made. They write tests that don't match what the team actually considers solid work. You spend more time reviewing and fixing than if you'd just done it yourself.
This isn't a model problem. It's a repository problem.
The Five-Layer Context Stack
Every coding context sits on a stack:
Layer 1 — Task prompt What to do right now
Layer 2 — Session context What was already decided this session
Layer 3 — Repo instructions What the repo knows about itself
Layer 4 — Decision history Why things are the way they are
Layer 5 — Validation What "good enough" looks like
Most repositories only have Layer 1. The agent's session provides Layer 2. But Layers 3, 4, and 5 — the repo-level context that took years to accumulate — are invisible to the agent.
This is why the same agent that writes clean, thoughtful code in one project produces awkward, context-blind work in another. The model didn't change. The repo did.
The Solution: A Repository Harness
A repository harness fills the missing layers. It's not a framework or a library. It's a structured layer of repo-level instructions — files that agents can read, right alongside the code.
AGENTS.md — What kind of project this is, where to start, who to ask. Think of it as the agent's first-day briefing, always available.
Feature intake — Classify work by risk before touching production. Is this a quick patch or a structural change? Does it need a design discussion first?
Architecture discovery — What the modules are, what the contracts between them are, what data flows where. The agent stops treating the codebase as undifferentiated text.
Test matrices — Not just "do the tests pass" but "what does solid look like for this kind of change?" Different work has different standards.
Decision records — Why previous choices were made. The context that lives in the lead developer's head, written down and made permanent.
Story packets — Story-sized units that fit the team's review bandwidth. The agent stops generating three-week PRs no reviewer wants to touch.
The goal isn't more agent autonomy. The goal is inspectable AI-assisted work: humans still steer, agents have enough context to make good decisions independently.
It Works With Any Agent
Claude Code, Codex, Cursor — any agent that reads files benefits from a repo that knows how to brief itself. You don't change the agent. You change the repository.
Try It
If you're running coding agents on your own repos, you already know what I'm describing.
repository-harness is a starting point — fork it and adapt the pattern to your project.
What context do you wish your agents had?

Top comments (4)
The five-layer stack matches my experience, and the "Layer 4: decision history" is the one that nobody has solved yet. AGENTS.md handles Layer 3 well. The session gives Layer 2. Tests give Layer 5 (sort of). But Layer 4 — why we picked this design, what we tried and dropped, what the next experiment was — lives in Slack threads, PR comments, and the heads of the people who have been there longest.
The thing I keep coming back to is that decisions are not static. They get re-litigated every time a new person (or agent) joins the project. The same question — "why is the cache invalidation this way?" — gets answered slightly differently every time, until eventually the code drifts from any of the original reasons.
The cheapest improvement I have found: a per-decision entry in a
decisions/folder, written the day the decision is made, with three lines: the question, the chosen answer, and the one alternative that was seriously considered. Two years later you can grep "why auth" and get back the right answer, not the rumor of the right answer.That doesn't replace the agent's context window, and it doesn't fix "the model broke a layer we didn't know existed". But it moves Layer 4 from "ask the senior engineer" to "ask the file", and that is a real win for agent work because the senior engineer is rarely in the loop when the agent is at 2am running tests on its own.
One more thing would be having another skill that writes the decision - periodically run on demand. But if aiming for the Agents, then it must be done.
感谢分享,repo harness 这一层确实是关键,最近在 Claude Code 上加 hooks 的最大教训就是「不写到 CLAUDE.md / hooks 里就等于没约束」。
Some comments may only be visible to logged-in visitors. Sign in to view all comments.