Two Kinds of Agent Memory: OKF Bundles vs. Codebase Knowledge Graphs

#ai #mcp #llm #programming

Half of the memory you are about to hand-write for your agent is already sitting in your codebase. The other half, no indexer will ever find.

Both gaps feel identical from the agent's side. It opens every session knowing nothing about your systems, so the instinct is to give it one memory store and move on. The two gaps are not the same. One is derivable. One is not. The tool that closes the first does nothing for the second.

Watch an agent open a repo it has seen ten times. It greps. It reads the same forty files. It rebuilds the same call graph it built yesterday, spending a few hundred thousand tokens to relearn what the code already states. Then it asks you which database is the source of truth, because the code does not say.

The part the code already knows

Most of what an agent relearns is structural. Who calls ProcessOrder. What breaks if you change this signature. Which routes are dead. That knowledge is true whether or not anyone wrote it down, because it is encoded in the source.

So derive it. A code knowledge graph parses the repo once and answers structural questions from a persistent index. The one I have been testing, codebase-memory-mcp, builds that graph with tree-sitter across 158 languages and serves it to any agent over MCP. The agent stops grepping and starts querying: trace the callers of a function, map the blast radius of a diff, list dead code. These are questions grep cannot answer at any speed. I run it behind a small trust gate, pi-codegraph, so an agent only queries repos I have vetted.
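To make the three queries concrete, here is a minimal sketch of what "callers", "blast radius", and "dead code" mean over a call graph. It is not how codebase-memory-mcp is implemented. The edge data, the function names, and the entry-point set are all invented for illustration, and a real index would get its edges from a parser pass rather than a hand-written dict.

```python
from collections import defaultdict, deque

# Hypothetical call edges (caller -> callees). Illustrative names only,
# not taken from any real repo or from codebase-memory-mcp.
CALLS = {
    "handle_checkout": ["ProcessOrder", "send_receipt"],
    "retry_worker": ["ProcessOrder"],
    "ProcessOrder": ["charge_card", "reserve_stock"],
    "charge_card": [],
    "reserve_stock": [],
    "send_receipt": [],
    "legacy_export": ["reserve_stock"],
}
# Assumed roots that something outside the code (a route, a cron job) invokes.
ENTRY_POINTS = {"handle_checkout", "retry_worker"}


def reverse_edges(calls):
    """Invert caller -> callee into callee -> set of direct callers."""
    callers = defaultdict(set)
    for caller, callees in calls.items():
        for callee in callees:
            callers[callee].add(caller)
    return callers


def blast_radius(calls, changed):
    """Every function that reaches `changed` through some call chain."""
    callers = reverse_edges(calls)
    seen, queue = set(), deque([changed])
    while queue:
        node = queue.popleft()
        for caller in callers.get(node, ()):
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return seen


def dead_code(calls, entry_points):
    """Functions that no entry point can reach."""
    seen, queue = set(entry_points), deque(entry_points)
    while queue:
        node = queue.popleft()
        for callee in calls.get(node, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return set(calls) - seen
```

The point of the sketch is the shape of the work: each answer is a graph traversal over edges computed once, which is why a persistent index can return it without rereading files.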

The token savings are real, but read the measured number. The project's preprint reports roughly 10x fewer tokens and 83% answer quality across 31 repositories. The README's "99%" comes from a hand-picked query set. The honest figure is still a strong figure. You do not need to inflate it.

The part no graph will find

Now the half that is not in any AST. Which of three users tables is canonical. Why the payments service must never touch the legacy billing API. That the staging cluster reports latency it does not actually have. None of this is structure. It is judgment, history, and consequence. It lives in people, and people leave.

OKF is the format for writing that down. Open Knowledge Format, an open spec Google published in June 2026, is a directory of markdown files with YAML frontmatter. One concept per file. A folder of concepts is a bundle. You version it in git, review it in pull requests, and serve it to any agent over MCP as resources. It is boring on purpose. If you can cat a file, you can read it. If you can git clone, you can ship it. The reader and curator I point my agents at, with the same trust gate, is pi-okf.
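As a rough illustration of how little machinery a markdown-plus-frontmatter file needs, here is a sketch of a reader. The field names (id, title, status) and the table name in the body are my own placeholders, not taken from the OKF spec, and the parser handles only flat key: value lines. A real reader would hand the frontmatter to a proper YAML library.

```python
# Hypothetical concept file. The field names and the table name
# `accounts.users` are illustrative assumptions, not from the OKF spec.
CONCEPT = """---
id: users-canonical-table
title: Canonical users table
status: reviewed
---
Of the three users tables, treat `accounts.users` as the source of truth.
"""


def split_frontmatter(text):
    """Split a markdown document into (metadata dict, body string).

    Handles only flat `key: value` lines; nested YAML is out of scope here.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    try:
        end = lines.index("---", 1)
    except ValueError:
        return {}, text  # no closing delimiter: treat everything as body
    meta = {}
    for line in lines[1:end]:
        if ":" in line:
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).strip()
    return meta, body
```

Calling split_frontmatter(CONCEPT) yields the metadata as a dict and the prose as a string, which is all an agent needs to decide whether a concept is relevant before reading it.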

The mistake is using one for the other

Point a graph indexer at tribal knowledge and you get silence, because there is no edge in the AST for "deprecated, do not call." Hand-write an OKF concept for every function's callers and you are transcribing by hand what the graph returns in a millisecond, and your copy is wrong by the next commit.

So stop asking which memory tool to install. Ask whether the knowledge your agent lacks is authored-only or derivable. Get that backwards and you pay twice: once to write down what the code already states, again when your hand-written copy goes stale and quietly misleads the agent you built it for.

Which one for which team

A developer dropped into a large or unfamiliar codebase needs derived knowledge. The questions are structural and the code holds the answers. Reach for the graph.

An operator whose agent reaches across many systems, the MCP-heavy setup with data platforms, internal APIs, and ops runbooks, needs authored knowledge. The value sits between the systems, not inside any one of them, and a single-repo parser is blind to it. Reach for OKF bundles.

Most real setups need both, and the two compose better than either alone. Let an enrichment agent walk the code graph and emit OKF concepts for the architecture it can derive. Then a human edits in the parts the graph cannot see: the canonical, the why, the never. The graph keeps the bundle honest about structure. The human keeps it honest about intent.
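One way the hand-off could look, sketched under assumptions: the enrichment step fills in the structural section from graph output and leaves the intent section as explicit blanks for a human. The draft_concept helper, the frontmatter fields, and the section headings are all hypothetical. Neither OKF nor any graph tool prescribes this layout.

```python
def draft_concept(name, callers, callees):
    """Render a concept draft: derived structure filled in, intent left blank.

    Hypothetical layout; the fields and section names are illustrative only.
    """
    lines = [
        "---",
        f"id: {name.lower()}",
        f"title: {name}",
        "status: draft",
        "---",
        "## Structure (derived, regenerate on change)",
        f"Called by: {', '.join(sorted(callers)) or 'nothing'}",
        f"Calls: {', '.join(sorted(callees)) or 'nothing'}",
        "",
        "## Intent (authored, human only)",
        "Canonical: TODO",
        "Why: TODO",
        "Never: TODO",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(draft_concept("ProcessOrder",
                        callers={"retry_worker", "handle_checkout"},
                        callees=["charge_card", "reserve_stock"]))
```

Keeping the two sections separate means the derived half can be regenerated on every commit without touching what the human wrote.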

Derive what the code knows. Author what only people do. One half is nearly free. The other is the actual job.

So before you bolt another memory server onto your agent, sort the knowledge into the two piles first. That split is the first thing I set up when I build a production agent. Two questions decide it:

What does your agent keep relearning that the code already states?
What does it keep guessing because nobody ever wrote it down?

I write field notes from real builds: AI integration, cron-driven automation, and the parts that break in production. New posts every two weeks. If this one was useful, the companion guide is "Agent memory from your task manager."