I have been working on Vexdo for a while now, trying to build an autonomous system that can ship code with as little human intervention as possible.
Some of that work ended up in earlier write-ups:
- I built a local AI dev pipeline that reviews its own code before opening a PR
- I let agents write my code. They got stuck in a loop and argued with each other
- I needed a workflow engine for AI agents. None of them fit, so I built one
OmnethDB came out of a pretty simple thought within that broader Vexdo journey.
If I want agents to work on a codebase with less and less human supervision, it would be really useful if they could accumulate project memory in roughly the same way people do.
A person who has been on a project for a long time is usually much more effective than a newcomer. They know the weird edge cases, the old migrations, the intentional tradeoffs that look like bugs, the decisions that were reversed, and the things that are technically possible but architecturally wrong.
I wanted something closer to that.
Project link: github.com/ubcent/omnethdb
Most agent memory systems are optimized for demos.
They can retrieve semantically similar notes, summarize recent context, and make an assistant feel like it "remembers." That is enough to look impressive in a prototype.
It is not enough to build something trustworthy.
So I started building OmnethDB from a stricter premise: memory for agents should be treated as a serious system primitive, not as a vague cache wrapped around embeddings.
The bar is higher than "it retrieved something relevant."
The bar is:
- can we inspect why this memory exists?
- can we see whether it was superseded?
- can we tell whether it is a stable fact or a historical event?
- can we audit what changed and why?
- can an agent retrieve current truth without silently mixing it with stale truth?
That is the problem I want OmnethDB to solve.
The Real Problem With "Memory"
A lot of systems treat memory as one undifferentiated blob:
- architecture facts
- implementation details
- temporary incidents
- outdated decisions
- inferred patterns
- random notes from previous runs
Everything gets embedded. Everything becomes retrievable. And then the agent is expected to "figure it out."
That sounds flexible, but in practice it creates ambiguity.
When a fact changes, both the old and new versions often remain in the corpus with no explicit semantic difference. Retrieval might surface either one. Sometimes it surfaces both. The agent gets contaminated context and has to guess what is current.
That is not a retrieval problem. It is a memory semantics problem.
Agents do not just need more memory. They need memory with explicit rules around:
- versioning
- lineage
- lifecycle
- provenance
- relation semantics
- current-vs-historical truth
What OmnethDB Is
OmnethDB is a versioned, governed, inspectable memory primitive for autonomous agents.
At the architecture level, it is intentionally opinionated:
- memories have kinds such as `Static`, `Episodic`, and `Derived`
- memory updates are explicit, not implicit
- lineage is preserved
- old memories are not deleted
- forgetting is a lifecycle mark, not silent removal
- relations are typed
- retrieval is designed to return the current version of knowledge, not a probabilistic blend of history and present
That last part matters a lot.
In the OmnethDB architecture, if memory A updates memory B, that is not just metadata for humans to inspect later. It changes the active truth of the lineage. There is exactly one latest memory in a lineage at any point in time.
That gives agents a much stronger contract than "here are some similar snippets, good luck."
Why This Matters In Practice
The dangerous failure mode in agent systems is not forgetting.
It is remembering the wrong thing with high confidence.
If an agent is helping with debugging, migrations, architecture work, or product decisions, stale memory is often worse than missing memory. Missing memory usually creates uncertainty. Stale memory creates false certainty.
That is why I treat memory in OmnethDB as something that must be inspectable and auditable, not just searchable.
Where This Corpus Came From
The corpus behind these examples was not invented for the article.
I connected OmnethDB to Claude Code as an MCP server and used it inside a real pet project for about a week.
During that time, the memory corpus accumulated the kind of facts that actually show up in day-to-day engineering work:
- architectural boundaries
- infra edge cases
- intentional tradeoffs that look like bugs without context
- superseded plans
- implementation details that matter operationally
That matters because the interesting question is not whether a memory system can store polished examples.
The interesting question is whether it stays useful when the knowledge is messy, evolving, and grounded in real work.
That is the environment these examples came from.
Also, one small warning before the examples: names like `mulder`, `palantir`, `gringotts`, and `chronicle` are just internal service names from my pet project. I have a bad habit of giving services weird names and then making future-me work harder to remember what any of them actually do.
Corpus Example 1: An Intentional Auth Decision
Here is the kind of memory that benefits from strong semantics:
`rotateRefreshToken: false` in an OIDC config was explicitly recorded as intentional, not a bug.
```
[static] gringotts: rotateRefreshToken: false in configOIDC.ts is intentional, not a bug.
Reason: default oidc-provider v8 rotates refresh tokens on every use. With
parallel refresh requests, reuse detection can revoke the whole grant, including
newly issued tokens, leading to permanent 401 failures.
```
The memory did not just store the final conclusion. It captured the operational reason:
- default refresh rotation marked tokens as consumed
- parallel refresh requests could trigger token reuse detection
- reuse detection revoked the whole grant
- users could receive fresh tokens that were already dead
- the result was permanent `401` failures
This is exactly the kind of fact that agents routinely mishandle if memory is fuzzy.
Without disciplined memory, a future agent might see `rotateRefreshToken: false` and "fix" it back to `true` because rotating refresh tokens sounds more secure in the abstract.
With governed memory, the system can preserve the actual local truth:
- this was a deliberate tradeoff
- the rationale is known
- the memory is stable until superseded
That is much closer to how strong engineering teams actually reason.
Corpus Example 2: Nginx, Subdomains, And The Difference Between A Symptom And A Cause
Another memory in the corpus captured a subtle but high-impact behavior in nginx routing for subdomains.
The observed issue was simple: relative links were broken on artist subdomains.
```
[static] client/prod.nginx.conf.sigil: subdomain block rewrites location / to
/user/$username.
Critical nginx behavior: proxy_pass with URI replaces the matched location
prefix. Request /foo becomes /user/$usernamefoo, so relative links break.
Only the root / works correctly.
```
But the memory did not stop at the symptom. It preserved the real mechanism:
- `location /` rewrote traffic to `/user/$username`
- `proxy_pass` with a URI replaces the matched location prefix
- requests like `/users` became `/user/artistusers`
- only the root path worked correctly
That memory then pointed to the practical fix:
- use a top-level app URL
- generate absolute internal links
- avoid relying on relative navigation from the rewritten subdomain path
This is a good example of memory that is not merely descriptive. It is operationally useful because it encodes causality, not just observed breakage.
Corpus Example 3: Why Lineage Matters More Than Similarity
One of the clearest examples in the corpus is a calendar-related architectural shift.
At one point, memory reflected a plan involving a separate chronicle service emitting calendar:event:changed.
A later memory updated that reality: calendar functionality lives inside palantir, not a standalone chronicle service.
v1:

```
[static] New pattern: calendar:event:changed from chronicle (port 3007)
```

v2:

```
[static] CalendarModule is implemented inside palantir (port 3005) - a separate
chronicle service is not created.
```
If your system only does semantic retrieval, both memories may look relevant forever.
That is the core problem.
They are both about calendar architecture.
They are both high-similarity.
They are both "useful context."
But only one is the current truth.
OmnethDB's lineage model is designed precisely for this case. The past remains auditable, but the present remains explicit. Historical memory is still available for inspection without silently driving live decisions.
That distinction is one of the main reasons we think memory needs stronger primitives than vector search alone.
Corpus Example 4: Structured Memory For Retrieval Boundaries
Another good example comes from search architecture.
The corpus records that mulder indexes profiles into OpenSearch, but the client does not query mulder directly. Public search flows still go through CMS GraphQL and PostgreSQL unless a specific public search API is added.
```
[static] mulder fully indexes profiles into OpenSearch, but the client never
queries mulder directly - all search goes through CMS GraphQL -> PostgreSQL.
OpenSearch is currently a "dead" index without a public search API.
```
That sounds like a small implementation detail, but it is actually a product and architecture boundary.
If an agent misses that distinction, it may:
- propose the wrong integration point
- wire a client directly into the wrong service
- assume OpenSearch is already serving user-facing search traffic
The memory is useful because it tells the agent not just what exists, but what role it currently plays in the system.
Again: explicit behavior beats ambient context.
Not Anti-Embedding, Anti-Hand-Waving
To be clear, this is not an anti-embedding argument.
Embeddings are useful.
Similarity search is useful.
Semantic retrieval is useful.
But embeddings alone do not give you:
- supersession semantics
- lifecycle control
- derivation provenance
- auditability
- current-version guarantees
Similarity can tell you what is related.
It cannot tell you what is canonical.
That is why we think memory systems for agents need stronger structure than "store chunks, embed them, and retrieve the top K."
Advisory As Memory Lint
Another part of the idea that I find increasingly important is the advisory layer around memory quality.
One useful way to think about it is: advisory is a bit like lint for memory.
A linter does not usually rewrite your whole program for you. It points at suspicious structure, inconsistent style, dead code, or likely mistakes and asks you to make an explicit decision.
I think memory systems need something similar.
Over time, a corpus accumulates:
- stale facts that should probably be superseded
- duplicate memories that should be merged or retired
- weak derived patterns with shaky provenance
- memories that are still retrievable but no longer belong on the hot path
That is not exactly retrieval, and it is not exactly storage either. It is memory hygiene.
So one direction I care about in OmnethDB is an advisory layer that can surface these issues the way a linter surfaces code smells: not by pretending to know product truth automatically, but by making memory quality problems visible and actionable.
That feels like an important missing piece. A serious memory system should not just remember. It should also help you keep what it remembers legible, current, and worth trusting.
The Design Standard
The standard we care about is not "good enough to demo."
It is:
- semantic correctness
- explicit behavior over hidden magic
- inspectable state transitions
- durable provenance
- retrieval that respects current truth
- enough structure that a strong engineer can trust the system under scrutiny
If agent memory is going to become core infrastructure, it should be built with the seriousness we apply to databases, queues, and auth systems.
Not as a toy.
Not as a vibe.
As infrastructure.
Where This Gets Interesting
The exciting part is not just that agents can remember more.
It is that they can remember in a way that supports disciplined reasoning:
- what is current
- what changed
- what was superseded
- what is historical but still worth inspecting
- what was derived from multiple sources
- what should remain visible without being allowed to silently control decisions
That is the path from "agent memory" as a demo feature to memory as a trustworthy primitive.
That is what I am trying to build with OmnethDB.
Closing
If we want agents that can operate safely in real codebases and real systems, memory has to become more than retrieval sugar.
It has to become something we can inspect, govern, version, and audit.
That is the bet behind OmnethDB:
memory should be queryable, but it should also be legible.
And when the truth changes, the system should know the difference between history and the present.