In my previous article, Human Code Review Is Not the Last Frontier, I argued that human code review is not the final bottleneck.
Underneath that argument was something deeper: one of the real bottlenecks in agent-native engineering is missing context.
Not context in the vague sense people usually mean when they say, "just give the model more information". I mean real engineering context. The kind that tells you whether a change is actually correct or just looks correct for five minutes in isolation. The kind of context that lives in old pull requests, half-forgotten migrations, team decisions, outdated assumptions, scattered docs, local workarounds, and painful experience inside a codebase that has been evolving for years.
That is usually the difference between a change that compiles and a change that actually belongs.
A repository is never just its current code. It also contains what the system is trying to move away from, what is still in progress, which ugly pattern still exists for a reason, which assumptions used to be true but should not be repeated, which workaround only makes sense in one corner of the system, and which apparently harmless area is actually fragile. Humans recover that knowledge over time because they have lived through the system, talked to the people around it, reviewed old changes, and seen things break.
Agents do not.
They only work with what is surfaced to them, and most of the time that context is incomplete, stale, too generic, or trapped in the wrong place. That is why I do not think this problem will be solved by simply adding more notes, writing a better prompt, or relying on a memory feature and hoping it behaves like judgment.
If context is one of the real bottlenecks in agent-native engineering, then it probably needs something more serious.
It needs its own system.
Context Is Not a Sidecar Problem
A lot of current AI workflows still treat context like a sidecar. Something attached to a prompt. Something kept in a rules file. Something the tool "remembers". Something dumped into documentation in the hope that it stays fresh.
Those things can help, and I use them too, but they do not solve the core issue.
The real problem is not whether context exists somewhere. The real problem is whether that context is current, scoped correctly, relevant to the task, and surfaced at the right moment in the workflow.
That is a very different problem. It is not mainly a storage problem; it is a context quality problem, because context has lifecycles.
Context can be global and durable, or narrowly tied to a single repository. It may only matter during a migration, carry enough confidence to shape implementation, or stay weak and temporary because it comes from recent observations that could stop being true next week. Certain context matters most during planning, while other context only becomes important during debugging or validation. And once a transition is over, some of it should stop influencing future work entirely.
Once you look at it that way, context stops being just text; it becomes living information.
And living information needs to be managed, not merely stored.
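To make that concrete, here is a minimal sketch of one lifecycle dimension, freshness. The function name, the half-life value, and the decay shape are illustrative assumptions, not a proposed design:

```python
def effective_confidence(base: float, age_days: float, durable: bool,
                         half_life_days: float = 30.0) -> float:
    """Illustrative decay: non-durable context loses half its confidence
    every `half_life_days`; durable context keeps its base confidence."""
    if durable:
        return base  # global, durable context does not decay
    return base * 0.5 ** (age_days / half_life_days)
```

A weak observation from last week fades; a durable architectural decision does not. Real scoring would be richer than this, but the point stands: freshness should be computed, not assumed.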
Why Memory Is the Wrong Mental Model
This is why I do not think context should be treated as memory.
Memory sounds passive: it suggests something you keep around and occasionally recall, something helpful but secondary, something sitting off to the side. That is not enough.
What I am describing is an operational system.
That distinction matters because an operational system participates in the workflow. It can be maintained, revised, ranked, scoped, and delivered when needed. The point is not just to retain context somewhere in the background. The point is to make it usable.
This is also why "more memory" is not automatically better. A pile of remembered notes is still a pile. If context is stale, weak, conflicting, badly scoped, or poorly timed, then surfacing more of it may make the workflow worse, not better.
The challenge is not accumulation; the challenge is usefulness.
Why Raw Files Are Not Enough
This is where the idea starts to become concrete. I am not talking about saving more notes; I am talking about giving engineering context a canonical system of record.
Not because markdown is bad, but because raw files alone are a weak foundation for something that has scope, freshness, confidence, lineage, version history, review state, conflict states, and natural decay over time.
If context has that kind of lifecycle, then the system behind it should be able to understand things like:
- where a piece of context applies
- how trustworthy it is
- what generated it
- whether it conflicts with something newer
- whether it is still active
- whether it should continue influencing future work
- whether it represents a preferred path or a known anti-pattern
Those are not minor details. They are part of what makes context usable.
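One way to picture that list is as a record shape the system of record might track per piece of context. Everything here, field names included, is a hypothetical sketch rather than a schema proposal:

```python
from dataclasses import dataclass, field
from enum import Enum

class Kind(Enum):
    PREFERRED = "preferred"    # a path the team wants repeated
    ANTI_PATTERN = "anti"      # a path that keeps going wrong here

@dataclass
class ContextItem:
    text: str                  # the context itself, as prose
    scope: str                 # where it applies, e.g. "repo:billing" or "global"
    confidence: float          # how trustworthy it is, 0..1
    source: str                # what generated it: a PR, an incident, a human note
    active: bool = True        # whether it should still influence future work
    conflicts_with: list[str] = field(default_factory=list)  # ids of newer items
    kind: Kind = Kind.PREFERRED
```

Nothing about this shape is clever on its own. The value comes from the engine that maintains these fields over time instead of letting them rot.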
This is why I think the missing piece is not memory, and not documentation with better search. The missing piece is an engine, a system capable of ingesting, classifying, refining, and retrieving context in a way that matches real engineering work.
The important part is not storage; it is the engine. Storage matters because it gives the system structure, but structure alone does not solve the problem. What matters is what sits on top of that structure.
A useful context system should be able to notice when a new piece of context overlaps with something already known. It should recognize when a previous assumption has started to decay. It should understand that a repository-specific rule should not be treated as a global one. It should learn that a certain implementation path repeatedly gets corrected by humans. It should preserve not only what worked, but also what repeatedly failed or had to be revised.
That last part matters a lot.
A good system should not only remember successful patterns; it should also surface negative knowledge: what to avoid, what keeps breaking, what looks reasonable but repeatedly turns out to be wrong in this specific codebase.
That is one of the biggest gaps in current workflows. Agents are often good at producing plausible work. But plausible is not the same as correct, and it is definitely not the same as locally correct inside a messy, evolving system.
What they are missing is not always capability; it is the context that actually matters.
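A couple of those behaviors can be sketched cheaply. Real overlap detection would use semantic similarity; the word-overlap stand-in below is a deliberate simplification, and the function names are assumptions:

```python
def overlaps(a: str, b: str, threshold: float = 0.5) -> bool:
    """Crude proxy for 'these two pieces of context talk about the same thing':
    Jaccard similarity over lowercase words."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(len(wa | wb), 1) >= threshold

def ingest(new_text: str, existing: list[str]) -> list[str]:
    """On ingestion, return existing items the new context overlaps with,
    so the engine can merge, supersede, or flag a conflict."""
    return [old for old in existing if overlaps(new_text, old)]
```

The mechanism is intentionally naive; the point is that ingestion is an active step where overlap and conflict get noticed, not a silent append.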
Internally Rich, Externally Simple
One of the most important design principles here is simple: internally rich, externally simple.
Internally, the system should be sophisticated. It should use structured storage, lifecycle management, ranking, revision logic, conflict handling, freshness scoring, and retrieval logic that can adapt to different stages of work.
Externally, though, it should stay simple.
Most agent workflows already consume text extremely well. Prompts, notes, handoff files, planning docs, constraint summaries, migration notes, contextual briefs. That part already works. So instead of forcing every harness, assistant, IDE, or workflow to understand a complicated internal schema, the engine can do the hard work inside and return the result as markdown.
That is the abstraction.
Internally, the system manages context properly. Externally, it delivers one or more markdown artifacts that can be consumed almost anywhere.
That could mean a brief for a quick task, a planning pack for riskier work, migration notes when a system is in transition, recent learnings for debugging, or known risks during validation. The exact filenames do not matter; what matters is that the output remains simple enough for almost any workflow to adopt without friction.
That simplicity is not a compromise; it is part of the design.
If every team needs schema awareness, special adapters, and deep knowledge of how the engine thinks, then adoption becomes harder right where the system should be disappearing into the background. The workflow should not need to care how context was stored, promoted, revised, merged, or archived. It should just ask for the right context and receive a clean package.
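The external contract described above can be sketched in a few lines: the caller hands over a task, the engine hands back plain markdown. The function and section names are illustrative, not a real API:

```python
def render_brief(task: str, items: list[tuple[str, str]]) -> str:
    """Render (section, text) pairs, e.g. ("Known risks", "..."),
    into a single markdown brief the consuming workflow can use as-is."""
    lines = [f"# Context brief: {task}", ""]
    current = None
    for section, text in items:
        if section != current:       # start a heading when the section changes
            lines.append(f"## {section}")
            current = section
        lines.append(f"- {text}")
    return "\n".join(lines)
```

Everything interesting happens before this step, in selection and ranking. The rendering itself is boring by design: markdown in, markdown out, no schema to learn.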
Why This Matters Now
This matters because the real world is not going to standardize around one assistant, one IDE, one orchestration layer, or one company workflow.
Different teams use different tools; different companies have different processes; different repositories have different levels of entropy; and different tasks need different amounts of context at different moments.
So if this idea only works when everything is redesigned around it, then it is already weaker than it should be.
But if the internal system stays rich and the external interface stays simple, then the same engine can plug into planning, implementation, debugging, validation, CI loops, pull request preparation, or longer autonomous workflows without forcing everyone into the same stack.
The workflow asks for context, the engine returns markdown, the agent consumes it.
That is much more realistic.
The Missing Layer
This is the missing layer I was pointing at in the previous article.
When I said human code review is not the last frontier, part of what I meant was that the real frontier is not simply whether agents can generate more code. It is whether they can operate with the kind of context that real engineering work actually depends on.
Not generic context, not bloated context, not stale context: useful context.
Context that knows where it applies, whether it is still true, how strong it is, what it conflicts with, and when it should stop influencing future work. Context that can be maintained instead of forgotten. Context that appears when it matters, instead of arriving as noise.
That is why I think this deserves its own layer.
Not another memory feature, not a prompt trick, not markdown folders pretending to be a system. A real "Context Engine", internally structured enough to manage context properly, and externally simple enough that almost nobody consuming it needs to care how it works.
They just get a usable context package.
And honestly, that is not a small implementation detail; it is the whole point.
Closing Thought
As agents get better at generating code, the bottleneck becomes easier to see. The issue is often not raw capability; it is contextual correctness.
That is why I think context needs something more deliberate behind it: not passive memory, not loose documentation, not a bigger pile of notes, but an engine.
Because the difference between work that looks right and work that is right often lives in all the things the code alone does not tell you. And if those things are becoming one of the main constraints on agent-native engineering, then they should not remain scattered across prompts, docs, habits, and human memory.
They should have a real system behind them.