synthaicode

Posted on Apr 3

Why Your AI Agents Are Only Half as Smart as They Could Be

#ai #management #systemdesign

You hand an AI agent a GitHub Issue. It reads it, writes code, opens a PR, and passes CI. Impressive. You feel productive.

Then a new engineer joins. They read every PR for two weeks. They still don't understand why the system is shaped the way it is. They ask you. You explain. The explanation disappears into Slack.

This is not an onboarding problem. It is a structural problem. And AI agents make it worse.

The Invisible Starting Point

There is a post going around about a startup that built 21 AI agents in two months. GitHub Issue triggers a label. Label triggers an agent. Agent writes code, opens PR, passes review, merges. The human writes the Issue and goes to sleep. By morning, the PR is ready.

It reads like the future. And in many ways it is.

But one thing is missing from the entire article: where does the Issue come from?

Someone's head. Specifically, one person's head. That person holds the product strategy, the architectural decisions, the things that were tried and abandoned, the reason this feature exists at all. None of that is in the repository. All of it is in one human.

The 21 agents are fast. The one human is the bottleneck. And unlike the agents, the human gets tired. Gets older. Might leave.

This is not a bug in their system. It is the design. And it is the design of most software teams today.

Two Contexts, One Confused Repository

The common answer is: put everything in docs/.

"If it's not in the repository, it doesn't exist." Reasonable-sounding principle. Wrong diagnosis.

The problem is that there are two fundamentally different kinds of context, and they do not belong in the same place.

System context answers: how is this built?
Architecture diagrams. API contracts. Data models. Dependency graphs. Most of this is already expressed in the source code or can be generated from it. Writing it again in docs/architecture.md just creates a second source of truth that will drift.

Business context answers: why does this exist?
What customer problem this solves. What was tried before this and why it failed. What constraints shaped the decision. What the team is optimizing for. What was deliberately left out.

Business context has a different lifecycle than system context. Code changes constantly. The reason a feature exists may not change for years. Putting business context inside a code repository ties it to the wrong clock. It gets treated like code — versioned when changed, ignored when not.

Worse, when you have three repositories (mobile, backend, admin panel), business context gets split across all three. The reason a feature exists is not a mobile concern or a backend concern. It belongs to the business, which is one thing, not three.

Diffs All the Way Down

The deeper problem is cultural.

Pull Request culture inherits from a specific tradition: Linus Torvalds needed a way to manage patches from thousands of distributed contributors. The unit of work was a diff. The record was a diff. The review was a diff.

This is technically elegant. It is also philosophically narrow. A diff records what changed. It does not record why the change was worth making, what alternatives were considered, or what the change assumes about the future.

When Agile arrived, it added a layer of permission: "Working software over comprehensive documentation." Which is reasonable in context. But it was interpreted as: "We don't have to write down why we made decisions." Which is not the same thing.

MBA culture reinforced this from the other direction. What can be measured can be managed. PRs can be counted. Commit frequency can be graphed. Velocity can be reported. The reason for a decision cannot be put in a dashboard. So it stopped being tracked.

Three forces — distributed version control, Agile, and management by metrics — converged on the same outcome: organizations that are very good at recording what changed, and very bad at preserving why.

The Human Pillar

Someone always knows. In every team, there is at least one person who holds the whole picture. They know why the authentication is built the way it is. They know what the database schema looked like before the migration. They know which customers drove which decisions.

This person becomes a pillar. Everyone leans on them. They answer the same questions repeatedly. They review PRs not just for correctness but to ensure nothing violates the unwritten assumptions they carry.

This is called a "bus factor" problem, but that framing is too narrow. It implies the risk is that the person gets hit by a bus. The more common risk is slower: the person drifts away. Promotion, burnout, gradual disengagement. Or simply: the codebase grows faster than one person can track, and the pillar starts to crack under the load.

AI agents accelerate this. If your pipeline can produce 600 PRs a month, the human who validates the starting point of each one is not keeping up. Speed has been optimized. The bottleneck has been ignored.

What Structured Information Actually Means

There is a difference between AI that moves fast and AI that reasons well.

Moving fast requires a queue of well-formed tasks. The agents in the article move fast. The tasks come pre-formed from one person's judgment, handed down as Issues.

Reasoning well requires a broader context. It requires knowing not just what to build but why this, why now, what constraints are real, what can be traded off, what cannot.

If you hand an AI agent a well-formed Issue stripped of context, it will execute the Issue. If the Issue is wrong — based on a misunderstood constraint, an outdated assumption, a decision that was reversed three months ago — the agent will execute the wrong thing efficiently.

The quality of the output is bounded by the quality of the input. And the input is bounded by what one human can keep in their head and translate into an Issue template.

The solution is not better prompt engineering on the Issue. The solution is a separate, structured layer that holds business context independently of the code, linkable from any PR or Issue in any repository.

When an AI agent can read, "This feature exists because of constraint X, alternative Y was rejected for reason Z, and the current design rests on assumption W, which should be verified before changing this," the agent is no longer just executing instructions. It is reasoning within a context. The output quality changes.
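As a rough illustration of what one entry in such a layer might look like, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `ContextRecord` shape, the field names, the `BC-0042` identifier format, and the sample content are hypothetical, not something the article above or any particular tool prescribes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RejectedAlternative:
    option: str   # what was considered
    reason: str   # why it was not chosen


@dataclass(frozen=True)
class ContextRecord:
    """One unit of business context, stored outside any code repository."""
    record_id: str                                  # hypothetical stable ID, e.g. "BC-0042"
    why: str                                        # the reason the feature exists
    constraints: tuple[str, ...] = ()               # constraints that shaped the decision
    rejected: tuple[RejectedAlternative, ...] = ()  # alternatives and why they lost
    assumptions: tuple[str, ...] = ()               # things to verify before changing it


def render_for_agent(record: ContextRecord) -> str:
    """Flatten a record into plain text that can be attached to an Issue or prompt."""
    lines = [f"[{record.record_id}] Why this exists: {record.why}"]
    lines += [f"Constraint: {c}" for c in record.constraints]
    lines += [f"Rejected: {a.option} (reason: {a.reason})" for a in record.rejected]
    lines += [f"Assumption to verify before changing: {a}" for a in record.assumptions]
    return "\n".join(lines)


# Illustrative content only.
example = ContextRecord(
    record_id="BC-0042",
    why="Enterprise customers need invoices in their local currency.",
    constraints=("Exchange rates must come from a single audited source.",),
    rejected=(RejectedAlternative("Convert on the client", "rates could not be audited"),),
    assumptions=("Fewer than ten currencies are needed.",),
)
print(render_for_agent(example))
```

The point of the shape is that "why", "what was rejected", and "what is assumed" are separate fields rather than one free-text blob, so an agent (or a human) can ask for exactly the part it needs.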

Permanence

There is a question no one in these discussions seems to ask: what does this look like in ten years?

A diff-based repository accumulates. Five years of PRs is a mountain of diffs. The mountain records the surface of every change. It does not record the shape of the reasoning that produced those changes.

New engineers do not read five years of PRs. They read the current code and the current docs and they ask the pillar. The pillar explains. The pillar's explanation reflects their current understanding, filtered through years of accumulated context that was never written down.

The organization's effective memory is the pillar's memory. When the pillar leaves, the organization loses a piece of its history that no amount of git log can recover.

Business context recorded separately, with stable identifiers that can be referenced from code commits and PR descriptions, does not have this problem. The decision made in 2024 that still shapes the architecture in 2030 is findable. The constraint that looked temporary but became permanent is visible. The engineer joining in 2031 can understand not just what was built but why.
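To make "stable identifiers that can be referenced from code commits" concrete, here is a small Python sketch. It assumes a hypothetical convention in which context records are named `BC-` followed by four digits and are mentioned in commit messages or PR descriptions. The ID format and both function names are invented for illustration.

```python
import re

# Hypothetical ID convention: "BC-" followed by exactly four digits.
CONTEXT_ID = re.compile(r"\bBC-\d{4}\b")


def referenced_ids(message: str) -> list[str]:
    """Return the context IDs mentioned in a commit or PR message, in first-seen order."""
    seen: list[str] = []
    for match in CONTEXT_ID.findall(message):
        if match not in seen:
            seen.append(match)
    return seen


def unresolved_ids(message: str, known_ids: set[str]) -> list[str]:
    """Return referenced IDs that do not exist in the context layer (dangling links)."""
    return [i for i in referenced_ids(message) if i not in known_ids]


message = "Add currency conversion (BC-0042, BC-0007). Follows up on BC-0042."
print(referenced_ids(message))              # ['BC-0042', 'BC-0007']
print(unresolved_ids(message, {"BC-0042"}))  # ['BC-0007']
```

Because the identifier lives in the context layer and not in any one repository, the same `BC-0042` can be cited from the mobile, backend, and admin repositories and still resolve to a single record.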

This is not a new problem. Libraries and archives have been solving it for centuries. The software industry reinvented source control and forgot to reinvent institutional memory.

The Practitioner's Test

Here is a simple test for any AI agent pipeline:

Can a new agent, with no human explanation, understand why the most important architectural decision in your codebase was made?

Not what it is. Why.

If the answer is no — if that understanding lives only in one person's head, accessible only through conversation — then your pipeline is faster than before, but not smarter. You have automated execution. You have not automated judgment.

Automating judgment requires giving the system something to judge with. That something is business context: structured, stable, cross-referenced, and preserved across personnel changes and repository reorganizations.
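Part of the practitioner's test can be checked mechanically. The sketch below assumes decisions are recorded somewhere as simple entries with a rationale and an optional pointer to whatever replaced them. The `Decision` type, its fields, and the `audit` function are hypothetical names for this example. It cannot judge whether a rationale is any good, only whether one exists and whether the decision is still current.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Decision:
    decision_id: str                      # hypothetical stable ID
    summary: str                          # what was decided
    why: str = ""                         # recorded rationale; empty means it lives in someone's head
    superseded_by: Optional[str] = None   # ID of the decision that reversed this one, if any


def audit(decisions: list[Decision]) -> dict[str, str]:
    """Map each problematic decision ID to the reason it fails the test."""
    problems: dict[str, str] = {}
    for d in decisions:
        if not d.why.strip():
            problems[d.decision_id] = "no recorded rationale"
        elif d.superseded_by is not None:
            problems[d.decision_id] = f"superseded by {d.superseded_by}"
    return problems


# Illustrative entries only.
decisions = [
    Decision("BC-0001", "Use event sourcing for orders", why="Auditors need full history."),
    Decision("BC-0002", "Single-region deployment"),
    Decision("BC-0003", "Sessions stored in cookies", why="No server state.", superseded_by="BC-0009"),
]
print(audit(decisions))
```

An empty result does not mean the pipeline is smart. A non-empty one does tell you which Issues an agent would be executing on the strength of someone's memory, or on a decision that has since been reversed.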

The agents are ready. The context layer is the missing piece.