I have been working on Vexdo for a while now, trying to build an autonomous system that can ship code with as little human intervention as possible.
Some of that work ended up in earlier write-ups:
- I built a local AI dev pipeline that reviews its own code before opening a PR
- I let agents write my code. They got stuck in a loop and argued with each other
- I needed a workflow engine for AI agents. None of them fit, so I built one
OmnethDB came out of a pretty simple thought within that broader Vexdo journey.
If I want agents to work on a codebase with less and less human supervision, it would be really useful if they could accumulate project memory in roughly the same way people do.
A person who has been on a project for a long time is usually much more effective than a newcomer. They know the weird edge cases, the old migrations, the intentional tradeoffs that look like bugs, the decisions that were reversed, and the things that are technically possible but architecturally wrong.
I wanted something closer to that.
Project link: github.com/ubcent/omnethdb
Most agent memory systems are optimized for demos.
They can retrieve semantically similar notes, summarize recent context, and make an assistant feel like it "remembers." That is enough to look impressive in a prototype.
It is not enough to build something trustworthy.
So I started building OmnethDB from a stricter premise: memory for agents should be treated as a serious system primitive, not as a vague cache wrapped around embeddings.
The bar is higher than "it retrieved something relevant."
The bar is:
- can we inspect why this memory exists?
- can we see whether it was superseded?
- can we tell whether it is a stable fact or a historical event?
- can we audit what changed and why?
- can an agent retrieve current truth without silently mixing it with stale truth?
That is the problem I want OmnethDB to solve.
The Real Problem With "Memory"
A lot of systems treat memory as one undifferentiated blob:
- architecture facts
- implementation details
- temporary incidents
- outdated decisions
- inferred patterns
- random notes from previous runs
Everything gets embedded. Everything becomes retrievable. And then the agent is expected to "figure it out."
That sounds flexible, but in practice it creates ambiguity.
When a fact changes, both the old and new versions often remain in the corpus with no explicit semantic difference. Retrieval might surface either one. Sometimes it surfaces both. The agent gets contaminated context and has to guess what is current.
That is not a retrieval problem. It is a memory semantics problem.
Agents do not just need more memory. They need memory with explicit rules around:
- versioning
- lineage
- lifecycle
- provenance
- relation semantics
- current-vs-historical truth
What OmnethDB Is
OmnethDB is a versioned, governed, inspectable memory primitive for autonomous agents.
At the architecture level, it is intentionally opinionated:
- memories have kinds such as `Static`, `Episodic`, and `Derived`
- memory updates are explicit, not implicit
- lineage is preserved
- old memories are not deleted
- forgetting is a lifecycle mark, not silent removal
- relations are typed
- retrieval is designed to return the current version of knowledge, not a probabilistic blend of history and present
That last part matters a lot.
In the OmnethDB architecture, if memory A updates memory B, that is not just metadata for humans to inspect later. It changes the active truth of the lineage. There is exactly one latest memory in a lineage at any point in time.
That gives agents a much stronger contract than "here are some similar snippets, good luck."
Why This Matters In Practice
The dangerous failure mode in agent systems is not forgetting.
It is remembering the wrong thing with high confidence.
If an agent is helping with debugging, migrations, architecture work, or product decisions, stale memory is often worse than missing memory. Missing memory usually creates uncertainty. Stale memory creates false certainty.
That is why I treat memory in OmnethDB as something that must be inspectable and auditable, not just searchable.
Where This Corpus Came From
The corpus behind these examples was not invented for the article.
I connected OmnethDB to Claude Code as an MCP server and used it inside a real pet project for about a week.
During that time, the memory corpus accumulated the kind of facts that actually show up in day-to-day engineering work:
- architectural boundaries
- infra edge cases
- intentional tradeoffs that look like bugs without context
- superseded plans
- implementation details that matter operationally
That matters because the interesting question is not whether a memory system can store polished examples.
The interesting question is whether it stays useful when the knowledge is messy, evolving, and grounded in real work.
That is the environment these examples came from.
Also, one small warning before the examples: names like `mulder`, `palantir`, `gringotts`, and `chronicle` are just internal service names from my pet project. I have a bad habit of giving services weird names and then making future-me work harder to remember what any of them actually do.
Corpus Example 1: An Intentional Auth Decision
Here is the kind of memory that benefits from strong semantics:
`rotateRefreshToken: false` in an OIDC config was explicitly recorded as intentional, not a bug.
```
[static] gringotts: rotateRefreshToken: false in configOIDC.ts is intentional, not a bug.
Reason: default oidc-provider v8 rotates refresh tokens on every use. With
parallel refresh requests, reuse detection can revoke the whole grant, including
newly issued tokens, leading to permanent 401 failures.
```
The memory did not just store the final conclusion. It captured the operational reason:
- default refresh rotation marked tokens as consumed
- parallel refresh requests could trigger token reuse detection
- reuse detection revoked the whole grant
- users could receive fresh tokens that were already dead
- the result was permanent `401` failures
This is exactly the kind of fact that agents routinely mishandle if memory is fuzzy.
Without disciplined memory, a future agent might see `rotateRefreshToken: false` and "fix" it back to `true` because rotating refresh tokens sounds more secure in the abstract.
With governed memory, the system can preserve the actual local truth:
- this was a deliberate tradeoff
- the rationale is known
- the memory is stable until superseded
That is much closer to how strong engineering teams actually reason.
Corpus Example 2: Nginx, Subdomains, And The Difference Between A Symptom And A Cause
Another memory in the corpus captured a subtle but high-impact behavior in nginx routing for subdomains.
The observed issue was simple: relative links were broken on artist subdomains.
```
[static] client/prod.nginx.conf.sigil: subdomain block rewrites location / to
/user/$username.
Critical nginx behavior: proxy_pass with URI replaces the matched location
prefix. Request /foo becomes /user/$usernamefoo, so relative links break.
Only the root / works correctly.
```
But the memory did not stop at the symptom. It preserved the real mechanism:
- `location /` rewrote traffic to `/user/$username`
- `proxy_pass` with a URI replaces the matched location prefix
- requests like `/users` became `/user/artistusers`
- only the root path worked correctly
That memory then pointed to the practical fix:
- use a top-level app URL
- generate absolute internal links
- avoid relying on relative navigation from the rewritten subdomain path
This is a good example of memory that is not merely descriptive. It is operationally useful because it encodes causality, not just observed breakage.
Corpus Example 3: Why Lineage Matters More Than Similarity
One of the clearest examples in the corpus is a calendar-related architectural shift.
At one point, memory reflected a plan involving a separate chronicle service emitting calendar:event:changed.
A later memory updated that reality: calendar functionality lives inside palantir, not a standalone chronicle service.
v1:

```
[static] New pattern: calendar:event:changed from chronicle (port 3007)
```

v2:

```
[static] CalendarModule is implemented inside palantir (port 3005) - a separate
chronicle service is not created.
```
If your system only does semantic retrieval, both memories may look relevant forever.
That is the core problem.
They are both about calendar architecture.
They are both high-similarity.
They are both "useful context."
But only one is the current truth.
OmnethDB's lineage model is designed precisely for this case. The past remains auditable, but the present remains explicit. Historical memory is still available for inspection without silently driving live decisions.
That distinction is one of the main reasons we think memory needs stronger primitives than vector search alone.
Corpus Example 4: Structured Memory For Retrieval Boundaries
Another good example comes from search architecture.
The corpus records that mulder indexes profiles into OpenSearch, but the client does not query mulder directly. Public search flows still go through CMS GraphQL and PostgreSQL unless a specific public search API is added.
```
[static] mulder fully indexes profiles into OpenSearch, but the client never
queries mulder directly - all search goes through CMS GraphQL -> PostgreSQL.
OpenSearch is currently a "dead" index without a public search API.
```
That sounds like a small implementation detail, but it is actually a product and architecture boundary.
If an agent misses that distinction, it may:
- propose the wrong integration point
- wire a client directly into the wrong service
- assume OpenSearch is already serving user-facing search traffic
The memory is useful because it tells the agent not just what exists, but what role it currently plays in the system.
Again: explicit behavior beats ambient context.
Not Anti-Embedding, Anti-Hand-Waving
To be clear, this is not an anti-embedding argument.
Embeddings are useful.
Similarity search is useful.
Semantic retrieval is useful.
But embeddings alone do not give you:
- supersession semantics
- lifecycle control
- derivation provenance
- auditability
- current-version guarantees
Similarity can tell you what is related.
It cannot tell you what is canonical.
That is why we think memory systems for agents need stronger structure than "store chunks, embed them, and retrieve the top K."
Advisory As Memory Lint
Another part of the idea that I find increasingly important is the advisory layer around memory quality.
One useful way to think about it is: advisory is a bit like lint for memory.
A linter does not usually rewrite your whole program for you. It points at suspicious structure, inconsistent style, dead code, or likely mistakes and asks you to make an explicit decision.
I think memory systems need something similar.
Over time, a corpus accumulates:
- stale facts that should probably be superseded
- duplicate memories that should be merged or retired
- weak derived patterns with shaky provenance
- memories that are still retrievable but no longer belong on the hot path
That is not exactly retrieval, and it is not exactly storage either. It is memory hygiene.
So one direction I care about in OmnethDB is an advisory layer that can surface these issues the way a linter surfaces code smells: not by pretending to know product truth automatically, but by making memory quality problems visible and actionable.
That feels like an important missing piece. A serious memory system should not just remember. It should also help you keep what it remembers legible, current, and worth trusting.
The Design Standard
The standard we care about is not "good enough to demo."
It is:
- semantic correctness
- explicit behavior over hidden magic
- inspectable state transitions
- durable provenance
- retrieval that respects current truth
- enough structure that a strong engineer can trust the system under scrutiny
If agent memory is going to become core infrastructure, it should be built with the seriousness we apply to databases, queues, and auth systems.
Not as a toy.
Not as a vibe.
As infrastructure.
Where This Gets Interesting
The exciting part is not just that agents can remember more.
It is that they can remember in a way that supports disciplined reasoning:
- what is current
- what changed
- what was superseded
- what is historical but still worth inspecting
- what was derived from multiple sources
- what should remain visible without being allowed to silently control decisions
That is the path from "agent memory" as a demo feature to memory as a trustworthy primitive.
That is what I am trying to build with OmnethDB.
Closing
If we want agents that can operate safely in real codebases and real systems, memory has to become more than retrieval sugar.
It has to become something we can inspect, govern, version, and audit.
That is the bet behind OmnethDB:
memory should be queryable, but it should also be legible.
And when the truth changes, the system should know the difference between history and the present.