Paulo Victor Leite Lima Gomes

Posted on Jun 28

agents need shared memory, not another context window

#ai #agents #developertools #stackoverflow

Stack Overflow launched a beta called Stack Overflow for Agents earlier this month, and the easy headline is obvious enough: the Q&A site for developers now wants to be useful to the tools that are writing more of the code.

That is interesting.

But the product announcement is less important than the diagnosis underneath it.

The problem is not that agents cannot find an answer. They can search docs, scrape examples, read code, call tools, and generate three plausible fixes before a human has finished complaining about the build.

The problem is that agents forget too much of what they learn.

One agent hits a weird package manager issue, burns tokens for twenty minutes, discovers the workaround, gets the tests green, and then the session ends. Another agent, in another repo, hits the same issue an hour later and starts from zero. Maybe the human remembers. Maybe the fix makes it into a PR comment. Maybe it becomes an internal wiki page nobody reads.

Mostly, it evaporates.

That is the part worth paying attention to.

context is not memory

The agent industry keeps talking about context windows as if they were the whole memory story.

Bigger context is useful. I like when a tool can read more of the repo, more logs, more documentation, more previous discussion. A small context window turns every task into a little amnesia simulator.

But context is what the agent can see right now.

Memory is what the system can safely reuse later.

Those are different things.

A bigger context window helps one run. A shared memory system helps the next run avoid making the same mistake. That difference matters because a lot of agent work is not invention. It is rediscovery.

Why does this test fail only on Node 24? Which version of this SDK changed the auth flow? Why does this Docker image work locally but fail in CI? Which migration path looked correct but broke customers? Which generated fix passed unit tests and still created a production incident?

These are exactly the little scars that good engineers accumulate over time. They are also exactly the things agents are bad at retaining across isolated sessions.

the expensive part is not answering

Stack Overflow's announcement uses a phrase I like: the ephemeral intelligence gap. It is a dramatic product phrase, but the underlying issue is real.

Agents can produce answers cheaply. They cannot automatically know which answers survived contact with production.

That is the expensive part.

Anyone who has used coding agents for more than a toy demo has seen this shape. The agent finds an example. The example is old. The package has moved. The docs disagree with the type definitions. The generated code compiles after two retries. Then a runtime edge case appears because the example assumed a default that your system does not use.

The final fix is valuable, but not because it is beautiful. It is valuable because it contains evidence:

what failed
what was tried
what finally worked
which versions were involved
which assumption turned out to be wrong
which test proved the fix
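
A minimal sketch of what such an evidence record could look like, assuming a plain Python dataclass serialized to JSON. The field names mirror the list above; the class name, the helper, and every example value are hypothetical illustrations, not part of Stack Overflow for Agents or any real tool.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class FailureTrace:
    """One verified fix, stored as evidence rather than as a polished answer."""
    what_failed: str
    what_was_tried: list[str]
    what_worked: str
    versions: dict[str, str]
    wrong_assumption: str
    proving_test: str


def to_record(trace: FailureTrace) -> str:
    """Serialize the trace to JSON so another agent can consume it."""
    return json.dumps(asdict(trace), sort_keys=True)


# Hypothetical example values, for illustration only.
trace = FailureTrace(
    what_failed="integration test times out in CI only",
    what_was_tried=["raise the timeout", "pin the base image"],
    what_worked="set the cache directory explicitly",
    versions={"node": "24"},
    wrong_assumption="CI and local runs share the same default cache path",
    proving_test="tests/integration/test_build.py",
)
record = to_record(trace)
```

The point of the structure is that the failed attempts and the wrong assumption travel with the fix, so the next agent can skip them instead of rediscovering them.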

That evidence is much more useful than another polished answer.

If agent knowledge systems become just another pile of confident snippets, they will make the problem worse. The internet already has enough outdated solutions with high search ranking and no warning label.

The valuable thing is not "agents can post answers."

The valuable thing is "agents can preserve verified failure traces in a form another agent can consume."

verification is the product

The most interesting detail in Stack Overflow for Agents is that verification, not creation, earns reputation.

That is the right instinct.

Creation is cheap now. Verification is scarce. The system should reward the scarce thing.

A generated answer that has never been tried is a guess with nice formatting. A generated answer that has been attempted across multiple projects, under known conditions, with humans and agents reporting back, starts to become useful operational knowledge.
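
One way to make that distinction concrete is to count independent attempts instead of trusting the answer text. The sketch below is an assumption of mine about how such a summary might be computed; it is not how Stack Overflow for Agents calculates reputation, and the `Report` type and project names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    project: str  # where the answer was attempted
    worked: bool  # whether it actually solved the problem there


def verification_summary(reports: list[Report]) -> dict[str, int]:
    """Summarize attempts, counting distinct projects rather than raw repeats."""
    confirmed = {r.project for r in reports if r.worked}
    refuted = {r.project for r in reports if not r.worked}
    return {
        "attempts": len(reports),
        "projects_confirmed": len(confirmed),
        "projects_refuted": len(refuted),
    }


def is_untested_guess(reports: list[Report]) -> bool:
    """An answer with no attempt reports is still only a guess."""
    return len(reports) == 0


# Hypothetical reports: two successes in one project, one failure in another.
reports = [
    Report("billing", True),
    Report("billing", True),
    Report("mobile", False),
]
summary = verification_summary(reports)
```

Counting distinct projects keeps one enthusiastic repo from looking like broad confirmation, and the refuted count stays visible instead of being averaged away.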

This is the same reason I care about test output in agent pull requests. I do not only want the diff. I want the story of how the diff was produced and checked. Which command ran? What failed first? Did the agent change the code because the test was meaningful or because the test was annoying? What uncertainty remains?

For humans, a Stack Overflow answer used to be useful partly because of the social surface around it: votes, comments, edits, accepted answers, dates, reputation, and all the messy hints that told you whether to trust it.

Agents need an equivalent trust surface.

Not vibes. Not "the model sounded confident." A real trust surface.

private knowledge will matter more than public knowledge

The public version is fun to think about, but the enterprise version is where this gets very practical.

Most of the agent mistakes that cost companies real time are not about public APIs. They are about private systems.

The internal deploy tool has a flag that should not be used anymore. The monorepo has one package that must be built with a different cache setting. The billing service has a weird integration test because of a contract signed in 2021. The mobile app cannot upgrade a dependency until one customer finishes a migration.

None of that belongs on the public internet.

But agents need to know it.

This is why internal knowledge bases for agents are going to become more important than people expect. Not as a dumping ground for every Slack thread. As a reviewed, queryable, versioned memory of the things that keep being rediscovered.

The shape should be closer to engineering evidence than corporate documentation.

Short entries. Clear scope. Version information. Links to incidents, PRs, tests, and decisions. A visible owner. Expiration dates where the knowledge may rot. Feedback when an agent uses the entry and finds it wrong.
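
As a rough sketch, an entry with that shape might look like the following. All names and values here are hypothetical, and the review rule (expired, or reported wrong at least once) is just one simple policy a team could choose.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class MemoryEntry:
    summary: str                    # short entry text
    scope: str                      # where the entry applies
    versions: str                   # version information
    links: list[str]                # incidents, PRs, tests, decisions
    owner: str                      # visible owner
    expires: Optional[date] = None  # set when the knowledge may rot
    wrong_reports: list[str] = field(default_factory=list)

    def report_wrong(self, note: str) -> None:
        """Record feedback from an agent that used the entry and found it wrong."""
        self.wrong_reports.append(note)

    def needs_review(self, today: date) -> bool:
        """Flag the entry once it has expired or been reported wrong."""
        expired = self.expires is not None and today >= self.expires
        return expired or bool(self.wrong_reports)


# Hypothetical entry, for illustration only.
entry = MemoryEntry(
    summary="Build the legacy package with the shared cache disabled",
    scope="monorepo, one package only",
    versions="build tool 3.x",
    links=["decision record (hypothetical)"],
    owner="platform team",
    expires=date(2026, 1, 1),
)
fresh = entry.needs_review(date(2025, 6, 1))
stale = entry.needs_review(date(2026, 1, 2))
entry.report_wrong("cache setting no longer exists in the current build tool")
flagged = entry.needs_review(date(2025, 6, 1))
```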

That sounds boring, because knowledge management always is, right up until you need it at 2 AM.

do not train the loop on garbage

There is a dangerous version of this future.

Agents write questionable fixes. The questionable fixes get stored. Other agents retrieve them. The pattern becomes more common because the memory system made it easier to repeat. Eventually the organization has built a very fast way to spread bad engineering habits.

This is not hypothetical. Humans already do this with wikis, copied snippets, and "this worked last time" folklore. Agents can do it faster and with more confidence.

So the memory layer needs moderation and deletion, not only accumulation.

Some entries should expire. Some should be marked as specific to one version. Some should require human approval before becoming reusable. Some should be rejected because they describe a workaround the team does not want to normalize.
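
Those four rules can be sketched as a gate that runs before an agent is allowed to reuse anything. This is a minimal illustration under my own assumptions: the status values, field names, and exact-match version check are hypothetical choices, not a description of any existing product.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class Status(Enum):
    PENDING = "pending"    # waiting for human approval
    APPROVED = "approved"
    REJECTED = "rejected"  # a workaround the team does not want to normalize


@dataclass(frozen=True)
class Entry:
    text: str
    status: Status
    applies_to_version: Optional[str] = None  # None means not version-specific
    expires: Optional[date] = None            # None means no expiry set


def is_reusable(entry: Entry, version: str, today: date) -> bool:
    """Return True only if the entry passes approval, expiry, and version checks."""
    if entry.status is not Status.APPROVED:
        return False
    if entry.expires is not None and today >= entry.expires:
        return False
    if entry.applies_to_version is not None and entry.applies_to_version != version:
        return False
    return True


def retrieve(entries: list[Entry], version: str, today: date) -> list[Entry]:
    """Filter the memory down to what an agent may act on right now."""
    return [e for e in entries if is_reusable(e, version, today)]


# Hypothetical entries, for illustration only.
entries = [
    Entry("pin the SDK before upgrading", Status.APPROVED, applies_to_version="2"),
    Entry("skip the flaky test", Status.REJECTED),
    Entry("use the old deploy flag", Status.APPROVED, expires=date(2025, 1, 1)),
    Entry("unreviewed cache tweak", Status.PENDING),
]
usable = retrieve(entries, version="2", today=date(2025, 6, 1))
```

Putting the checks at retrieval time, rather than only at write time, means an entry that was fine when stored stops being served once it expires or is rejected.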

The memory system is not just a cache.

It is governance.

Once an agent can retrieve a piece of knowledge and use it to change code, that knowledge becomes part of the engineering control plane. It deserves ownership, review, permissions, and audit trails. Otherwise the organization is letting stale hints become automation policy.

what i would capture first

If I were adding agent memory to a team today, I would not start with a grand knowledge graph.

I would start with the repeated pain.

Capture the things agents and humans keep rediscovering:

dependency upgrade hazards
flaky test root causes
deploy tool traps
internal API gotchas
security rules that generated code tends to violate
migration recipes that have actually worked
rejected patterns that keep coming back

Then make every entry answer a few simple questions.

What problem does this solve? Where does it apply? What versions or services are in scope? What evidence proves it? Who owns it? When should it be reviewed again?
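
A small completeness check is enough to enforce those questions at write time. The sketch below assumes entries are simple dictionaries; the key names are hypothetical and would follow whatever schema a team actually adopts.

```python
# Hypothetical key names mapped to the questions each entry must answer.
REQUIRED_FIELDS = {
    "problem": "What problem does this solve?",
    "applies_to": "Where does it apply?",
    "scope": "What versions or services are in scope?",
    "evidence": "What evidence proves it?",
    "owner": "Who owns it?",
    "review_by": "When should it be reviewed again?",
}


def unanswered(entry: dict[str, str]) -> list[str]:
    """Return the questions the entry leaves blank or omits entirely."""
    return [
        question
        for key, question in REQUIRED_FIELDS.items()
        if not entry.get(key, "").strip()
    ]


# Hypothetical draft entry with no owner and no review date.
draft = {
    "problem": "Dependency upgrade breaks the auth flow",
    "applies_to": "services that use the shared auth client",
    "scope": "SDK major version 2",
    "evidence": "regression test added with the fix",
    "owner": "  ",
}
missing = unanswered(draft)
```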

That is enough to be useful.

The point is not to turn every engineer into a librarian. The point is to stop paying the same discovery cost forever.

the punchline

Stack Overflow for Agents is easy to frame as a brand extension. Humans had Stack Overflow, now agents get one too.

But I think the more important idea is simpler: agent work needs durable, verified memory.

Bigger models will help. Bigger context windows will help. Better search will help. None of those fully solve the problem of production knowledge that gets discovered, used once, and lost.

Software teams already know this pain. It is the bug fixed twice. The migration lesson trapped in one PR. The incident detail nobody wrote down. The "ask Alice" dependency that becomes a surprise when Alice is on vacation.

Agents make that pain more visible because they can repeat the same mistake at machine speed.

So yes, give agents tools. Give them search. Give them docs. Give them access to the repo.

But also give them a memory system that respects evidence.

One that remembers not only what answer looked plausible, but what actually worked, where it worked, who verified it, and when it may stop being true.

The future of agentic coding is not just better generation.

It is better recall.

And in software, memory without judgment is just another source of bugs.

