
SAIHM-Admin

Originally published at saihm.coti.global

Your database AI agent re-reads the whole catalog every step — that's the bill

The AI agent that helps you tune queries feels cheap on a toy schema and expensive on a real warehouse. The reason is the same one that makes a long tuning session forget the index it suggested ten minutes ago — and it is fixable.

Why each suggestion costs more than the last

When an AI agent helps you run or tune a database, each step is a fresh call to the model. To reason well, that call carries the catalog with it: table definitions, indexes, constraints, and the query history it has seen so far. On a small schema that is cheap. On a warehouse with thousands of tables it is not — and every additional turn re-sends the whole thing. So the cost of a single suggestion climbs with the size of your database, not the size of the question you asked.
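As a rough illustration of that scaling, the sketch below estimates how many tokens a single call carries when the whole catalog rides along. The per-table token count and the function name are assumptions chosen for illustration, not measurements from any real schema.

```python
# Illustrative only: TOKENS_PER_TABLE is an assumed average, not a measured figure.
TOKENS_PER_TABLE = 150


def tokens_per_call(num_tables: int, question_tokens: int, history_tokens: int = 0) -> int:
    """Estimate the context size of one call that carries the full catalog."""
    catalog_tokens = num_tables * TOKENS_PER_TABLE
    return catalog_tokens + history_tokens + question_tokens


# The same 40-token question on a toy schema and on a large warehouse.
toy = tokens_per_call(num_tables=10, question_tokens=40)
warehouse = tokens_per_call(num_tables=5000, question_tokens=40)
print(toy, warehouse)
```

Under these assumed numbers the question is identical in both calls; only the catalog term changes, and it dominates the total.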

It is also why a long session starts to wander: once the catalog plus the conversation outgrows the context window, something has to be cut, and the agent forgets the index it recommended a few prompts ago.
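A minimal sketch of that forgetting, assuming a simple drop-oldest truncation policy (real agents may summarize or rank instead). The token counts, window size, and the `fit_to_window` helper are hypothetical.

```python
def fit_to_window(catalog_tokens: int, turns: list[tuple[str, int]], window: int) -> list[str]:
    """Keep the newest turns that fit beside the catalog; drop the oldest first."""
    budget = window - catalog_tokens
    kept: list[str] = []
    for text, tokens in reversed(turns):  # walk from newest to oldest
        if tokens > budget:
            break
        kept.append(text)
        budget -= tokens
    return list(reversed(kept))  # restore chronological order


# (text, token count) pairs in chronological order; all values are made up.
turns = [
    ("recommend index on orders(customer_id)", 300),
    ("explain plan for query A", 400),
    ("rewrite join in query B", 400),
    ("tune query C", 400),
]
visible = fit_to_window(catalog_tokens=7000, turns=turns, window=8200)
print(visible)
```

With the catalog taking most of the window, only the three newest turns fit, so the earliest turn (the index recommendation) is the one that falls out.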

The dynamic, measured

Because every call carries the catalog plus the conversation so far, total context spend over a session grows much faster than the number of questions asked. SAIHM measured it on a reproducible, offline benchmark and saw 62.8%–85.9% fewer context tokens across a session when the agent recalls a compact memory instead of replaying the full history, with the gap widening the longer the session runs. The benchmark is open source and runs locally, so you can model your own schema size and see where the curve lands for your database.
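To see why the gap would widen with session length, here is a toy cost model. It is not the SAIHM benchmark and does not reproduce its figures; the catalog size, recalled-memory size, and per-turn size are all assumed values you would replace with your own.

```python
def replay_tokens(turns: int, catalog: int, per_turn: int) -> int:
    """Total tokens when every call resends the catalog plus all turns so far."""
    return sum(catalog + t * per_turn for t in range(1, turns + 1))


def recall_tokens(turns: int, recalled: int, per_turn: int) -> int:
    """Total tokens when every call carries only a compact recalled memory."""
    return turns * (recalled + per_turn)


def reduction(turns: int, catalog: int = 20_000, recalled: int = 3_000, per_turn: int = 500) -> float:
    """Fraction of context tokens saved by recall, under the assumed sizes."""
    full = replay_tokens(turns, catalog, per_turn)
    compact = recall_tokens(turns, recalled, per_turn)
    return 1 - compact / full


for n in (5, 20, 50):
    print(n, f"{reduction(n):.1%}")
```

In this model the replay cost has a term that grows with the square of the turn count while the recall cost grows linearly, which is why the saving increases as the session gets longer.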

Recall only the objects a query touches

The fix is to stop re-sending the catalog. SAIHM keeps the durable facts — table shapes, index choices, the tuning decisions already made — as separate memory cells. Each step recalls only the handful of objects the current query touches instead of replaying the whole schema, so a suggestion about one table costs about what it would on an almost-empty database. The memory persists between sessions, so the next time the agent looks at that table it already knows the history. And because the store is addressable from any model — Claude, GPT, DeepSeek, Qwen, Kimi, GLM — and through LangChain or LlamaIndex, you can change the model behind the agent without re-teaching it your schema.
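The recall step can be sketched roughly as follows. This is not SAIHM's API: the `MEMORY_CELLS` dictionary, the `recall_for_query` helper, and the regex-based table detection are hypothetical stand-ins, and a real system would likely use a proper SQL parser.

```python
import re

# Hypothetical in-memory store: one "cell" of durable facts per table.
MEMORY_CELLS = {
    "orders": "orders(id, customer_id, total); index on customer_id added after slow join",
    "customers": "customers(id, email, region); primary key only",
    "shipments": "shipments(id, order_id, carrier); no tuning decisions yet",
}

# Naive table detection: the identifier right after FROM or JOIN.
TABLE_REF = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][A-Za-z0-9_]*)", re.IGNORECASE)


def recall_for_query(sql: str) -> dict[str, str]:
    """Return only the cells for tables the query references."""
    touched = {name.lower() for name in TABLE_REF.findall(sql)}
    return {name: MEMORY_CELLS[name] for name in sorted(touched) if name in MEMORY_CELLS}


context = recall_for_query(
    "SELECT c.email, o.total FROM orders o JOIN customers c ON c.id = o.customer_id"
)
print(context)
```

Only the `orders` and `customers` cells reach the model here; the `shipments` cell stays in the store, and the size of the context no longer depends on how many other tables exist.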

Your most regulated data lives here — so hold the keys

A database is where your most regulated data sits: customer records, payment rows, anything under privacy rules. An agent’s memory of that schema, its sample rows, and its query results is sensitive in its own right. With most hosted-memory products that memory lives on a vendor’s servers under the vendor’s keys — which becomes your problem the moment an auditor asks where it is or a data subject invokes their right to be forgotten. SAIHM keeps it yours: the memory is encrypted under keys you control, and erasure is per-record and provable. When a record has to go, its cells are cryptographically destroyed with an audit trail you can show — not a row flagged deleted that still sits in a backup nobody purged.
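One general pattern behind per-record cryptographic erasure is often called crypto-shredding: encrypt each record under its own key, then destroy the key when the record has to go. The sketch below illustrates the idea only. It is not SAIHM's implementation, the `ErasableStore` class is hypothetical, and the XOR pad is a toy stand-in for real authenticated encryption such as AES-GCM.

```python
import hashlib
import secrets


class ErasableStore:
    """Toy store: each record is encrypted under its own key, kept separately."""

    def __init__(self) -> None:
        self._ciphertexts: dict[str, bytes] = {}
        self._keys: dict[str, bytes] = {}  # in practice, a key manager you control
        self.audit_log: list[dict[str, str]] = []

    def put(self, record_id: str, plaintext: bytes) -> None:
        # One-time-pad style XOR: a stand-in for real authenticated encryption.
        key = secrets.token_bytes(len(plaintext))
        self._keys[record_id] = key
        self._ciphertexts[record_id] = bytes(p ^ k for p, k in zip(plaintext, key))

    def get(self, record_id: str) -> bytes:
        key = self._keys[record_id]  # raises KeyError once the key is destroyed
        return bytes(c ^ k for c, k in zip(self._ciphertexts[record_id], key))

    def erase(self, record_id: str) -> None:
        # Without the key, any surviving copy of the ciphertext is unreadable.
        del self._keys[record_id]
        digest = hashlib.sha256(self._ciphertexts[record_id]).hexdigest()
        self.audit_log.append(
            {"record": record_id, "action": "key_destroyed", "ciphertext_sha256": digest}
        )


store = ErasableStore()
store.put("customer-42", b"sample row for customer 42")
store.erase("customer-42")
print(store.audit_log)
```

The point of the design is that a ciphertext left behind in a backup no longer matters once its key is gone. Note that `del` in Python does not securely wipe memory, so a production system would rely on a key manager that can actually destroy key material.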

The honest close

SAIHM is a paid product, with no free tier — that is stated up front rather than buried behind a trial. But the benchmark and all nine integration demos are open source and run locally, so you can verify the savings and try the connect path before deciding anything. The tool surface and setup steps are at /developers; pricing is at /pricing.

Join SAIHM

— Architect

Independence notice. SAIHM is an Apache-2.0 protocol authored independently. The benchmark referenced here is open source and reproducible offline; the figures are produced by the published script and depend on session length and scenario. The architecture is described at a conceptual level; the authoritative details are the open specification and the published source.


Originally published at the SAIHM blog on 2026-06-30. SAIHM is the Sovereign AI Horizontal Memory protocol — Apache 2.0, open spec at saihm.coti.global.
