DEV Community: SAIHM-Admin

Beyond remembering: SHM, the Super-Human Memory add-on

SAIHM-Admin — Tue, 14 Jul 2026 03:17:07 +0000

There is a difference between an agent that stores things and an organisation that knows things. Base SAIHM closes the first gap. SHM — the Super-Human Memory add-on available on the Enterprise tiers — closes the second.

What base SAIHM already guarantees

Base SAIHM gives every AI agent in your fleet a persistent, sovereign memory: encrypted under keys you hold, erasable on demand with cryptographic proof, every significant action anchored to a public audit trail, shareable across vendors under revocable consent. That is memory you can put in front of an auditor — memory you can defend. Before the add-on, be clear about the foundation, because everything SHM does inherits it:

Your keys, not a vendor’s. Memory is sealed client-side; the operator cannot read it.
Real erasure. Deletion destroys the key material. There is no “soft delete” to subpoena back into existence.
A public audit anchor. What the fleet remembered, shared, and erased is committed to a public chain — verifiable by your auditors without trusting anyone’s logs.
Consent-based sharing. Cross-agent and cross-vendor memory access is granted per-record and revoked in one step.

That is the compliance spine. SHM never bypasses it: every capability below operates inside those guarantees. SHM is the layer that turns a fleet of individually remembering agents into an organisation with an institutional memory that compounds.

What SHM adds today

1. Recall by meaning, not by keyword. Base recall answers the question “which memories contain this term?” SHM answers the question your teams actually ask: “what do we know about this?” Natural-language queries return the most relevant memories, ranked, from stores that have grown to thousands of records. At small scale the difference is convenience. At fleet scale it is the difference between usable institutional memory and a write-only archive.

2. Always-hot recall. SHM keeps the recall path warm — resident, cached, and fault-tolerant — so retrieval is consistently fast rather than occasionally fast. For interactive workloads, and for the throughput profile Enterprise Fast customers run, memory access stops being the step everyone waits on.

3. Retrieval that respects your token budget. An agent should load what a task needs — nothing more. SHM’s bounded, ranked retrieval brings back the few memories that matter instead of replaying history, which is how the memory layer reduces model spend at exactly the moment most memory systems inflate it. The cost of a step tracks the work in the step, not the age of the deployment.

4. Consolidation: memory that improves with use. Left alone, every memory system silts up — duplicates, superseded facts, contradictions. SHM runs a consolidation cycle: raw event memory is distilled into durable knowledge, duplicates are merged, stale facts retire, and the organising structure sharpens. Think of it as the fleet’s sleep cycle. Six months in, an SHM-backed deployment is sharper than it was at month one — not slower and noisier.

5. Parallel workstream continuity. Enterprises do not run one thread of work; they run dozens, across teams and quarters. SHM tracks each workstream as its own resumable line of memory — pick any initiative up months later and the context returns precisely, without wading through everything else the fleet has done since. Staff turnover and vendor changes stop erasing operational context, because continuity lives in the memory layer rather than in individuals.

6. Concurrent conversations that do not blur together. Running many workstreams at once is where most memory systems quietly fail: context from one conversation bleeds into another, or threads lose fidelity as they multiply. SHM keeps every live conversation on its own line of memory. An agent can carry multiple simultaneous engagements — an incident, a negotiation, a migration, a review — switch between them mid-stream, and recall returns each thread’s context exactly, and only that thread’s. This is not a roadmap item: we run our own operations this way, multiple concurrent workstreams tracked through a single agent, none contaminating another.

7. Work that survives the context window. Every AI model has a context limit. When a session ends, resets, or overflows, a retail agent starts over — the agent your team talks to after lunch is a stranger to the morning’s work. With SHM, working state lives outside the model: a session can reset, or an entirely fresh agent instance can take over, and the work resumes precisely where it stopped. We operate this way daily; long-running engagements routinely outlive any single session. Continuity is a property of the memory layer, not of keeping one fragile session alive.

8. Corrections that become standing policy. When an agent errs and is corrected, SHM turns the correction into durable, recallable guidance — surfaced before the next similar action, not after the next similar failure. Mistake patterns get caught ahead of repetition, and operating rules accumulate instead of evaporating with the session. It is the institutional learning you already require of human teams, enforced in the memory layer.

9. Memory that arrives before you ask. SHM supports a recall-first operating pattern: agents brief themselves from memory at the start of a task and surface what is relevant proactively. The practical effect is fewer repeated mistakes and fewer re-derived decisions — the fleet acts like it has been here before, because it has.
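The bounded, ranked retrieval described in item 3 can be pictured as a selection under a token budget. The following is a minimal Python sketch, not SHM's implementation: the `Memory` type, the `select_within_budget` function, and every score and token count are hypothetical, and how relevance is actually scored is left out entirely. All sample memories except the Postgres decision are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Memory:
    text: str
    score: float  # relevance to the query; how it is computed is out of scope here
    tokens: int   # size of the memory once rendered into the prompt


def select_within_budget(candidates: list[Memory], budget: int, cap: int) -> list[Memory]:
    """Take the highest-scoring memories until the cap or the token budget is reached."""
    chosen: list[Memory] = []
    spent = 0
    for memory in sorted(candidates, key=lambda m: m.score, reverse=True):
        if len(chosen) == cap:
            break
        if spent + memory.tokens > budget:
            continue  # too large for what is left; a smaller memory may still fit
        chosen.append(memory)
        spent += memory.tokens
    return chosen


candidates = [
    Memory("Chose Postgres over Mongo for X; revisit at scale", 0.92, 14),
    Memory("Full transcript of the March planning call", 0.75, 900),
    Memory("Staging deploys run from the release branch", 0.61, 11),
    Memory("Office plants are watered on Fridays", 0.05, 9),
]
picked = select_within_budget(candidates, budget=40, cap=2)
print([m.text for m in picked])
```

The large transcript ranks second but is skipped because it would exceed the budget on its own, which is the behaviour the paragraph describes: the cost of the step follows what the task needs, not how much history exists.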

Where this goes

The capabilities above run today. The direction of travel matters as much, and it is deliberately enterprise-shaped:

Fleet knowledge operations. A semantic map of what your agents collectively know — where knowledge is concentrated, where it is thin, and how it is drifting.
Erasure that cascades into derived knowledge. When a record is erased under GDPR Article 17, the obligation does not stop at the original — it extends to what was derived from it. SHM’s consolidation layer is being built so erasure propagates through derived structures by construction, not by best-effort cleanup. Ask a retail memory vendor how they handle that question.
Scope-aware recall. Semantic retrieval that enforces sharing contracts at query time — an agent recalls only what its mandate permits, and the enforcement is part of the memory layer, not the application’s honour system.
Answerable history. “What did our agents know about X, and when did they know it?” — answered semantically, with chain-anchored provenance behind every result. That is eDiscovery-grade capability for AI memory.
Decision-time reconstruction. Not just what the fleet knows now — what it knew on the day a decision was made. Replay the knowledge state behind any past decision and defend it with the facts as they stood, not as they stand.
Post-mortems that assemble themselves. When an initiative closes, its memory thread already contains the history — what was known, when it was learned, where course changed. Draw the post-mortem from memory instead of reconstructing it from chat logs and recollection.
Compliance reporting from the memory layer. Reports drawn directly from audited, chain-anchored memory rather than collated after the fact from whatever survived.
Memory service classes. Hot, warm, and archival memory tiers with defined service levels, matched to workload criticality.
One knowledge layer across mixed fleets. Different models, different vendors, one consolidated institutional memory — so a model swap is a procurement decision, not a lobotomy.
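The erasure cascade in the list above is, at its core, a graph traversal: erase a record, then follow every "derived from" edge outward. The sketch below is one way to picture that, under stated assumptions. The `cascade_erase` function, the record ids, and the provenance map are hypothetical, and the source describes this capability as still being built, so nothing here reflects SHM's actual design.

```python
from collections import deque


def cascade_erase(derived_from: dict[str, set[str]], erased: str) -> set[str]:
    """Return the erased record plus everything derived from it, directly or indirectly.

    derived_from maps a record id to the ids of the records it was derived from.
    """
    # Invert the edges once: source id -> ids of records derived from it.
    children: dict[str, set[str]] = {}
    for record, sources in derived_from.items():
        for source in sources:
            children.setdefault(source, set()).add(record)

    doomed = {erased}
    queue = deque([erased])
    while queue:
        current = queue.popleft()
        for child in children.get(current, set()):
            if child not in doomed:
                doomed.add(child)
                queue.append(child)
    return doomed


# Hypothetical provenance: two summaries distilled from raw events, one policy from a summary.
derived_from = {
    "summary-1": {"event-1", "event-2"},
    "policy-1": {"summary-1"},
    "summary-2": {"event-3"},
}
to_erase = cascade_erase(derived_from, "event-1")
print(sorted(to_erase))
```

This version is deliberately conservative: a derived record is erased if any of its sources is erased. Re-deriving `summary-1` from the surviving `event-2` would be an alternative policy, and which one applies is a legal and design question the sketch does not settle.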

Who this is for

Enterprise deployments get SHM as the knowledge-operations layer over unlimited remembers and recalls — the tier where memory stops being per-agent plumbing and becomes an organisational asset with an SLA.

Enterprise Fast adds the throughput and latency profile for fleets where memory sits on the critical path — high-frequency agent workloads, interactive services, and operations where “occasionally fast” is not fast enough.

Current tier structure is on the pricing page. SHM availability and terms are discussed directly — contact ops@saihm.coti.global with the subject “SHM Enterprise Enquiry”.

SAIHM is the memory layer for businesses and regulated enterprises — and the developers shipping to them. Retail tools remember; SAIHM can prove what it remembers, shares, and erases. SHM is what that memory becomes when it starts compounding.

— Architect

Independence notice. SAIHM is an Apache-2.0 protocol authored independently. It provides a memory capability; the intelligence in any deployment belongs to the AI models the operator chooses. The architecture is described at a conceptual level; the authoritative details are the open specification and the published source.

Originally published at the SAIHM blog on 2026-07-14. SAIHM is the Sovereign AI Horizontal Memory protocol — Apache 2.0, open spec at saihm.coti.global.

Your AI is keeping a record on you. Who can take it?

SAIHM-Admin — Sun, 05 Jul 2026 03:17:07 +0000

Every capable AI assistant now keeps a memory — a growing record of what you asked, what you told it, and what you decided together. That memory is what makes the assistant genuinely useful. It is also, quietly, the most complete file anyone has ever kept on you. This is about a simple question most people never think to ask: if that record exists, who can get a copy — and what would it take to make it worthless to them?

The most detailed file about you may be one you never see

Think about what you actually say to an AI assistant over a year. The health worry you researched at 2am. The money problem. The draft of the difficult email. The relationship, the job, the plan you were not ready to share. A journalist adds the name behind a pseudonymous source; a lawyer, a client’s secret; an activist, who is meeting whom, and where.

Individually these are moments. Collected in one place, in order, they become a dossier — richer than your search history, your messages, or your bank statements, because it includes not just what you did but what you were thinking about doing. Today that record usually lives on a company’s servers, readable by the company. The question is not whether it is valuable. It is who can reach it.

“Trust the company to protect you” — until it can’t

The promise behind most online services is the same: trust us to hold your data safely. That promise has failed the same way, over and over, in three forms.

A legal demand. A court order or subpoena arrives, and the company has to produce what it holds. In 2005, Yahoo handed Chinese authorities the account and email records that identified the journalist Shi Tao; he was sentenced to ten years. The mechanism has not changed — only the richness of the data has.
An insider. Someone inside the company — recruited, bribed, or coerced — reaches into accounts and pulls private data. In 2022, a U.S. jury convicted a former Twitter employee of spying for Saudi Arabia after insiders used their access to unmask anonymous critics — some of whom were later detained.
And now, AI itself. In 2025 a U.S. court ordered OpenAI to preserve and produce ChatGPT conversation logs — a sample of roughly 20 million — and, separately, a warrant sought to unmask an anonymous user from their prompts alone. Assistant memory is no longer hypothetical to reach; the tools to compel it already exist.

Notice the common thread: in every case, protection depended on a company choosing — or being able — to say no. When the memory is readable by the company, its safety is only ever as strong as the company’s willingness and ability to refuse. That is a thin thing to rest your privacy on.

What changes when the memory is actually yours

SAIHM — Sovereign AI Horizontal Memory, a sovereign, encrypted, sharable, persistent memory protocol for AI agents — is built to remove the company from that equation. It is the memory layer your AI thinks with, designed around one idea: the record should belong to you, not to whoever runs the service.

In practice that means three things, in plain terms:

It is locked before it leaves. With SAIHM’s protected setup, each memory is sealed on your own device before it is stored. The service keeps only a sealed copy it cannot open. Hand that service a legal demand and it can produce — honestly — nothing readable.
It is yours to carry. Your memory is not locked inside one company’s product. You can move it with you from one AI tool to another, so switching providers doesn’t mean starting over — or leaving a copy behind for someone else to inherit.
It is yours to erase — for real. When you delete a memory, SAIHM destroys the key that unlocks it. Any copy that still exists anywhere becomes permanently unreadable, and you get a receipt that it happened. That is a stronger guarantee than a company assuring you it pressed delete on its own servers.

Provable erasure and choosing where your AI memory lives are the same shift seen from other angles: the point is not a new privacy promise but a change in who holds the power, from the operator to you.

Peace of mind that doesn’t depend on trust

Here is the part that lets you stop worrying. Most privacy tools ask you to trust that a company is handling your data well. SAIHM is built the other way round: the protection is structural, not a promise. Because your memory is sealed on your own device and the keys never leave it, the operator is locked out by design — there is no “we would never look” to believe in, because there is nothing on their side to look at. A hacker who breaches the servers, an insider who goes rogue, a subpoena served on the company — each meets the same wall, and each comes away with nothing anyone can read.

And you don’t have to take that on faith. SAIHM is open-source: the code that does the sealing and the erasing is public, so it can be read, checked, and challenged by anyone — rather than hidden behind a marketing claim. That is the heart of what makes SAIHM different — sovereignty you can verify, not a policy you have to hope holds. Set it up once, and you can use your AI for the things that matter most to you knowing that what it remembers is yours, and stays that way.

Why leaders should care, not just individuals

If you run a newsroom, a legal practice, a clinic, or any organisation whose people use AI at work, every one of those assistants is building a record you may be holding on their behalf — and that you could be compelled to produce, or breached out of. Memory that is sealed on the user’s device turns that liability into something you simply do not hold in readable form. Provable erasure turns “we deleted it” from a claim into a receipt — which is exactly what a regulator, a client, or a source increasingly expects. The right to be forgotten stops being a policy you promise and becomes a thing you can demonstrate.

For the people most exposed — journalists and their sources, human-rights defenders, anyone working under real surveillance pressure — this is the difference between a seized device or a compelled server yielding a source network, and yielding nothing anyone can read. That is the population SAIHM is built to serve first.

Set it up before you ever need it

No one in those cases got a warning. Protection has to be in place before the demand, the breach, or the knock at the door — afterwards is too late. That is the case for doing this now, while things are calm: SAIHM flips the default so your AI’s memory is yours to hold, carry, and truly erase, and prying eyes — a hacker, an insider, or a court order — come away with nothing they can read. It is a paid product with no free tier — though you can try the open, runnable demos first, with no signup, and see for yourself how a memory is sealed on your device and then permanently erased. What the subscription buys is worth paying for: the quiet confidence that what your AI knows about you is safe from prying eyes, and stays that way.


— Architect

Originally published at the SAIHM blog on 2026-07-04.

SAIHM gives your AI a memory, not a new brain

SAIHM-Admin — Fri, 03 Jul 2026 03:17:08 +0000

Welcome. Before you wire SAIHM into anything, here is the most useful thing to know — honestly, up front. SAIHM will not make your AI smarter. It gives your AI something it does not have on its own: a real memory. That sounds modest. It is not. Most of the frustrating things an AI assistant does — forgetting what you decided, re-reading everything, contradicting itself next session — are memory problems, not intelligence problems. That is the part SAIHM fixes.

What SAIHM is — and what it isn’t

SAIHM is a memory layer. It is a place your AI can store facts, decisions, and context, and recall exactly the right ones later — privately, portably, and permanently until you say otherwise. That is a set of capabilities.

SAIHM is not intelligence. It does not reason, plan, write your code, or make a weak model into a strong one. It has no opinions and does no thinking. If your AI gives a wrong answer because it reasoned poorly, SAIHM will not fix that — that is the model’s job. What SAIHM addresses is the other kind of wrong answer — the one your AI gives because it forgot, re-read stale context, or lost the thread between sessions.

The short version: your AI brings the intelligence; SAIHM brings the memory. They are different jobs, and they work best together.

Intelligence and memory are different jobs

A quick way to feel the line between them:

“Design a schema for this data.” — that is intelligence. Your AI does it; SAIHM does not.
“What schema did we agree on last week, and why?” — that is memory. Without SAIHM your AI simply cannot answer it reliably; with SAIHM it recalls the decision and the reason in a sentence.

SAIHM remembers the what and the why. Your AI decides the how. Give a capable model a reliable memory and it stops repeating itself, stops asking you to re-explain, and starts compounding what it already knows.

See it: the same AI, with and without a memory

Same model, same intelligence — the only thing that changes is whether it has SAIHM to remember for it.

Without a memory layer, every turn you (or your app) re-send the whole history so the model can “remember,” and it still forgets across sessions:

You: [paste the entire past conversation + all prior decisions, again]
 Now, given all of the above, what should we do next?

# expensive (you pay for all that context every turn), and gone tomorrow

With SAIHM, the model recalls only what this step needs, and the memory outlives the session:

Agent: saihm_recall("deployment decisions, database choice") -> 3 cells
 (the model reasons over just those, then acts)
Agent: saihm_remember("Chose Postgres over Mongo for X; revisit at scale")

# cheap (a bounded recall, not the whole history), and still true next week

Notice what did not change: the model’s reasoning. SAIHM did not make it cleverer — it made it remember, which is why the second version is both cheaper and more consistent.

The one prompt to start with

If you do just one thing after joining, add this to your agent’s system prompt. It is deliberately terse — it tells your AI to lean on SAIHM’s memory instead of re-sending context, which is where the token savings come from:

Use SAIHM as your memory. Every turn: call saihm_recall (bounded, keyword-scoped)
for only what this task needs, instead of re-reading history. Trust the most
recent, non-superseded cell. Call saihm_remember for durable decisions, one fact
per cell, in your own words. Call saihm_forget on any delete request. Always
prefer a small recall over re-sending the whole context.

That is the whole idea: recall a little, don’t re-send a lot. It is what keeps your context window — and your bill — small as sessions grow, while your AI keeps its own intelligence entirely intact.

What to expect — and what not to

Expect SAIHM to: remember decisions and context across turns and sessions; recall the current fact, not a stale one; work the same across models you use (so switching models doesn’t wipe its memory); keep that memory encrypted under keys you hold; and erase any record for real when you ask. Expect your long sessions to get noticeably cheaper, because the model recalls instead of re-reading.

Don’t expect SAIHM to make a model reason better, rescue a vague prompt, or “just know” things nobody ever told it. It remembers what your AI puts in and hands the right pieces back — the thinking stays with your AI. Set that expectation and SAIHM will feel exactly as useful as it is: the dependable memory your AI was always missing.

Welcome aboard

If you’re still deciding: the honest pitch is that SAIHM is a memory layer, priced as a paid product with no free tier, and it is worth it precisely because it fixes the memory problems that no amount of model intelligence solves on its own. If you’ve just joined: start with the one prompt above, and let your AI do the rest of the thinking.


— Architect

Originally published at the SAIHM blog on 2026-07-01.

Why an agent harness needs the right memory protocol, not a memory feature

SAIHM-Admin — Thu, 02 Jul 2026 03:17:06 +0000

If you build the loop — the harness around a model — memory isn’t a nice-to-have you bolt on at the end. It’s the component that decides whether your agent scales past a few turns, recalls the right fact instead of a stale one, and survives a model swap. But “add some memory” isn’t the answer either: most memory features get you a place to put text and nothing else. Here’s what separates a real memory protocol from a feature — with numbers you can reproduce, demos you can clone and run yourself, and a drop-in prompt you can paste into your harness today.

The harness is where memory actually lives

A model doesn’t have a session. The harness does — the code that runs the loop, manages context, orchestrates tools, and decides what the model sees on each turn. Whether you’re building a coding agent, an autonomous task runner, or a multi-agent system, you own that boundary. And the single most expensive decision at that boundary is what you put in the context window on every turn.

Most harnesses answer that question the naive way: re-send the entire transcript. It works for a demo. It quietly falls apart in production. Three failures show up, in order.

What “the right one” actually means

Any store can hold text. That’s the low bar every vendor “memory feature” clears, and where it stops. The right memory protocol for a harness is defined by four properties; the first three each map to a failure below:

Bounded recall — you retrieve a small, capped set each turn instead of replaying history. (Failure 1.)
Correctness under change — recall returns the current fact, not a superseded one. (Failure 2.)
Cross-model portability — the same memory works no matter which model reads it. (Failure 3.)
Provable erasure — deletion is real and per-record, not a flag.

A memory feature gives you the first property, halfway, and none of the others. That gap is the whole point of this post.

Failure 1 — the resend tax is quadratic

An agent loop isn’t one call; it’s dozens. Each turn re-sends the system prompt, the entire growing transcript, and the new message. Turn k therefore carries roughly k messages, and 1 + 2 + ... + N = N(N+1)/2, so total context spend scales roughly as O(N²) across N turns. It’s also why long sessions eventually hit the window and fall over.

The fix is structural: don’t re-send history. Keep durable facts — decisions, conventions, file paths — as memory cells, and recall a small bounded set each turn. That turns the quadratic resend into roughly O(N · cap).
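The two growth rates can be checked with a toy cost model. This is a sketch under stated assumptions, not the published benchmark: the function names and every size used here (a 100-token system prompt, 60 tokens per message, 25 tokens per cell, a cap of 8) are made-up parameters chosen only to show the shape of the curves.

```python
def naive_total(turns: int, system: int, per_turn: int) -> int:
    """Input tokens when every turn re-sends the system prompt and the full transcript."""
    # Turn k carries the system prompt plus k messages (k - 1 old ones and the new one).
    return sum(system + k * per_turn for k in range(1, turns + 1))


def capped_total(turns: int, system: int, per_turn: int, cap: int, cell: int) -> int:
    """Input tokens when every turn sends the system prompt, at most `cap` cells, and the new message."""
    # Assumes one cell is written per turn, so turn k has k - 1 cells available to recall.
    return sum(system + min(cap, k - 1) * cell + per_turn for k in range(1, turns + 1))


for n in (10, 20, 40):
    naive = naive_total(n, system=100, per_turn=60)
    capped = capped_total(n, system=100, per_turn=60, cap=8, cell=25)
    print(f"{n:>2} turns: naive={naive:>6}  capped={capped:>6}")
```

Doubling the session length roughly quadruples the naive total once the transcript dominates, while the capped total roughly doubles. That is the O(N²) versus O(N · cap) difference in concrete terms.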

SAIHM published an offline, reproducible benchmark that measures exactly this — input/context tokens only, naive full-transcript resend vs. capped recall, tokenized with gpt-tokenizer (cl100k_base):

Session length	Naive input tokens	SAIHM input tokens	Reduction
5 turns	1,628	605	62.8%
10 turns	6,091	1,273	79.1%
15 turns	13,175	2,023	84.6%
18 turns	18,688	2,632	85.9%

The longer the session, the wider the gap — exactly what O(N²)-vs-O(N·cap) predicts. It counts input only (output is identical under both strategies), and it’s conservative for short work. Clone it and run node benchmark.mjs — it reproduces deterministically:

git clone https://github.com/citw2/saihm-token-benchmark
cd saihm-token-benchmark && npm install && node benchmark.mjs
node benchmark.mjs --recall-cap 8 # trade recall breadth vs savings
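The percentage column in the table follows directly from the two token columns, which makes it easy to sanity-check without running the benchmark. The snippet below only re-derives the figures quoted in this post; the variable and function names are illustrative, and it does not reproduce the benchmark itself.

```python
# (turns, naive input tokens, SAIHM input tokens), copied from the table above.
rows = [
    (5, 1628, 605),
    (10, 6091, 1273),
    (15, 13175, 2023),
    (18, 18688, 2632),
]


def reduction_pct(naive: int, capped: int) -> float:
    """Percentage of input tokens saved by capped recall relative to naive resend."""
    return round(100 * (1 - capped / naive), 1)


reductions = [reduction_pct(naive, capped) for _, naive, capped in rows]
for (turns, naive, capped), pct in zip(rows, reductions):
    print(f"{turns:>2} turns: {naive:>6} -> {capped:>5}  ({pct}% fewer)")

# Growth from 5 to 10 turns: near 4x would be fully quadratic, near 2x linear.
naive_growth = rows[1][1] / rows[0][1]
capped_growth = rows[1][2] / rows[0][2]
print(f"5 -> 10 turns: naive x{naive_growth:.2f}, capped x{capped_growth:.2f}")
```

The recomputed reductions match the published column, and the growth factors sit close to the quadratic and linear predictions respectively.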

Failure 2 — recall correctness, not just recall cost

Cheaper context is the easy half. The harder half is a correctness problem: a harness that recalls the wrong or stale fact is worse than one that pays to re-send. If your agent “remembers” a decision you reversed three turns ago, it will confidently act on it.

This is where naive memory — keyword match, or dumping recent history — breaks down. The hard retrieval cases for any agent harness are:

Supersession — is this fact current, or one you’ve since overridden? Keyword recall is essentially a coin flip here: a reversed decision and the decision that replaced it share almost all their words, so lexical similarity literally cannot tell a live fact from a dead one.
Temporal — which version was true at the time that matters.
Contradiction — two cells disagree; which one wins.

Fact and paraphrase lookups are table stakes. Multi-hop questions and the three cases above are where naive memory quietly fails, and a memory layer that gets supersession, temporal, and contradiction cases right is the difference between an agent that compounds knowledge and one that compounds mistakes.
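To make the supersession and temporal cases concrete, here is a minimal sketch of how a store could tell a live fact from a dead one when the words barely differ. It assumes each cell carries a logical timestamp and an optional explicit link to the cell it replaces. The `Cell` type, the `live_cells` function, and the earlier "Chose Mongo" cell are hypothetical illustrations, not SAIHM's data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cell:
    id: str
    text: str
    written_at: int                   # logical timestamp; higher is later
    supersedes: Optional[str] = None  # id of the cell this one replaces, if any


def live_cells(cells: list[Cell], as_of: Optional[int] = None) -> list[Cell]:
    """Return the cells that are current at `as_of` (default: latest), newest first."""
    visible = [c for c in cells if as_of is None or c.written_at <= as_of]
    dead = {c.supersedes for c in visible if c.supersedes is not None}
    return sorted(
        (c for c in visible if c.id not in dead),
        key=lambda c: c.written_at,
        reverse=True,
    )


cells = [
    Cell("c1", "Chose Mongo for X", written_at=1),
    Cell("c2", "Chose Postgres over Mongo for X; revisit at scale", written_at=5, supersedes="c1"),
]
current = [c.id for c in live_cells(cells)]
earlier = [c.id for c in live_cells(cells, as_of=3)]
print(current, earlier)
```

The explicit link does the work that lexical similarity cannot: the two cells share most of their words, yet only one is returned as current. The `as_of` parameter covers the temporal case, and the newest-first ordering gives a simple tie-break when two unlinked cells contradict each other.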

Failure 3 — vendor lock-in is an architecture risk

Every model vendor now ships some built-in memory. Each one is a walled garden: non-portable, non-inspectable, gone the moment you switch models or run two models side by side. For a harness engineer that’s a structural risk — your agent’s memory shouldn’t be hostage to one provider’s roadmap.

SAIHM is a single store you address across models. The same memory works from Claude, GPT, DeepSeek, Qwen, Kimi and GLM, and through LangChain and LlamaIndex. One model can write a fact and another can read it back. There are more than a dozen runnable demos — each a self-contained repo you clone and run locally — including the same memory used from six different models; a cross-model demo where one model writes and another reads it back; a Claude Code integration; and adapters for LangChain, LlamaIndex, CrewAI, AutoGen and LangGraph. All linked from the runnable demos index.

The architecture that falls out

Put those together and a clean harness shape emerges: a stateless core plus durable external memory. The loop stays thin and restart-safe; state lives in a memory layer that outlives any single process, model, or session. You stop hand-rolling transcript truncation and brittle “summarize the history” hacks, and you stop paying the quadratic tax to keep context alive.
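A stateless core plus external memory can be sketched in a few lines. Everything named here is an assumption for illustration: `MemoryStore` is an in-memory stand-in with a hypothetical interface rather than the SAIHM client, `fake_model` replaces a real model call, and the keyword matching is the simplest thing that demonstrates a bounded recall.

```python
from typing import Callable


class MemoryStore:
    """In-memory stand-in for a durable external store (hypothetical interface)."""

    def __init__(self) -> None:
        self._cells: list[str] = []

    def remember(self, fact: str) -> None:
        self._cells.append(fact)

    def recall(self, keywords: list[str], cap: int) -> list[str]:
        """Return up to `cap` of the most recent cells that mention any keyword."""
        hits = [
            cell for cell in reversed(self._cells)
            if any(k.lower() in cell.lower() for k in keywords)
        ]
        return hits[:cap]


def run_turn(store: MemoryStore, model: Callable[[str], str],
             task: str, keywords: list[str], cap: int = 8) -> str:
    """One loop iteration: recall a bounded set, call the model, persist its decision."""
    recalled = store.recall(keywords, cap)
    prompt = "\n".join(["Relevant memory:", *recalled, "Task:", task])
    decision = model(prompt)
    store.remember(decision)
    return prompt


def fake_model(prompt: str) -> str:
    """Placeholder for a real model call; always returns the same canned decision."""
    return "database: chose Postgres over Mongo for X"


store = MemoryStore()
first_prompt = run_turn(store, fake_model, "Pick a database", ["database"])
# The loop keeps no transcript between turns; only the store carries state forward.
second_prompt = run_turn(store, fake_model, "Write the migration", ["database"])
print(second_prompt)
```

The loop holds nothing between calls, so restarting the process or swapping the model function changes nothing about what the next turn can recall, provided the store itself is durable.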

It’s also production-shaped where it counts:

Non-custodial — the service stores ciphertext it can’t read; you hold the keys.
Provable erasure — deletion is per-record and real (key-destruction), not a soft-delete flag. If you handle user data, this is the difference between a compliance story and a compliance liability.
Tamper-evident audit — the trail can be verified, not just trusted.
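The three properties above fit together in a small model: seal on the client, store only ciphertext, erase by destroying the key, and record each action in a hash chain. The sketch below is a teaching aid under loud assumptions. `ClientVault`, `append_entry`, and `verify` are hypothetical names, the one-time pad is a toy stand-in for a real authenticated cipher, deleting a Python dict entry is not secure key destruction, and a local list is not a public chain.

```python
import hashlib
import json
import secrets


class ClientVault:
    """Client side: holds one key per record. The server never sees these keys."""

    def __init__(self) -> None:
        self._keys: dict[str, bytes] = {}

    def seal(self, record_id: str, plaintext: str) -> bytes:
        data = plaintext.encode("utf-8")
        key = secrets.token_bytes(len(data))  # one-time pad: toy stand-in for a real cipher
        self._keys[record_id] = key
        return bytes(a ^ b for a, b in zip(data, key))

    def unseal(self, record_id: str, ciphertext: bytes) -> str:
        key = self._keys[record_id]  # raises KeyError once the key has been destroyed
        return bytes(a ^ b for a, b in zip(ciphertext, key)).decode("utf-8")

    def forget(self, record_id: str) -> None:
        del self._keys[record_id]


def _digest(action: str, record_id: str, prev: str) -> str:
    body = {"action": action, "record": record_id, "prev": prev}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def append_entry(log: list[dict], action: str, record_id: str) -> None:
    """Append an audit entry whose hash covers the previous entry's hash."""
    prev = log[-1]["hash"] if log else "0" * 64
    log.append({"action": action, "record": record_id, "prev": prev,
                "hash": _digest(action, record_id, prev)})


def verify(log: list[dict]) -> bool:
    """Recompute every hash and link; any edited or reordered entry breaks the chain."""
    prev = "0" * 64
    for entry in log:
        expected = _digest(entry["action"], entry["record"], entry["prev"])
        if entry["prev"] != prev or entry["hash"] != expected:
            return False
        prev = entry["hash"]
    return True


vault, server, log = ClientVault(), {}, []
server["r1"] = vault.seal("r1", "Chose Postgres over Mongo for X")
append_entry(log, "remember", "r1")
readable_before = vault.unseal("r1", server["r1"])

vault.forget("r1")  # destroy the key; the ciphertext may linger on the server
append_entry(log, "forget", "r1")
try:
    vault.unseal("r1", server["r1"])
    readable_after = True
except KeyError:
    readable_after = False

print(readable_before, readable_after, verify(log))
```

After `forget`, the server still holds bytes for `r1`, but nothing can turn them back into text, and the log shows both the write and the erasure in an order that cannot be silently edited. A production system would use a vetted authenticated cipher and an externally anchored log rather than these stand-ins.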

A drop-in memory contract for your harness

Here’s the fastest way to see the difference. Paste this fragment into your agent’s system prompt (it assumes the SAIHM MCP tools saihm_recall, saihm_remember, and saihm_forget are wired into your harness). It’s written so that following it is what produces every claim above — bounded cost, correct recall, portability, real deletion:

## Memory contract

On every turn, before you reason or act:
1. RECALL, don't re-read. Call saihm_recall with keywords for the current task
 to load a small, bounded set of relevant memory cells. Do NOT re-send prior
 turns - the recalled cells ARE your working state.
2. Prefer the CURRENT fact. If two recalled cells conflict, the most recent /
 non-superseded one wins. Never act on a decision a later cell reverses.

Whenever something durable changes:
3. REMEMBER it. Call saihm_remember to persist decisions, conventions, file
 paths, and constraints - one fact per cell, in your own words.
4. Mark supersession. When you reverse an earlier decision, write the new cell
 AND state that it supersedes the old one, so future recall returns the live
 version, not the dead one.

Across models and on deletion:
5. PORTABLE. The same store answers from any model. A fact written under one
 model is readable by another - do not keep a separate memory per vendor.
6. On a "delete my data" request, call saihm_forget on the specific cell(s).
 Erasure is per-record and provable (real key-destruction), not a soft flag.

Every line maps to a claim in this post: (1), fed by the cells that (3) writes, flattens the resend curve from O(N²) to O(N·cap); (2) and (4) are what win the supersession, temporal, and contradiction cases; (5) is what kills the lock-in; (6) is your deletion-and-audit story. Start with a small saihm_recall cap (say 8 cells) and raise it only if recall misses. That cap is the knob the benchmark’s --recall-cap flag lets you tune against real savings.

Verify everything before you believe any of it

That’s the part built for engineers: you don’t have to take a single number on faith. The benchmark and every demo are open source (Apache-2.0) and run locally. Kick the tires, swap in your own scenario, paste the memory contract into your own harness, and see where a portable memory layer fits your stack — or doesn’t.

The honest close

SAIHM is a paid product — no free tier, stated up front rather than buried behind a trial. Pricing is flat and public: Pro at $5, Pro Fast at $9, Enterprise at a $500 floor with an SLA. The benchmark and demos are open precisely so you can verify the claims and try the integration before deciding anything. The tool surface and setup steps are at /developers; pricing is at /pricing.

If you’re engineering an agent harness, the memory layer isn’t the last thing you add — it’s the thing that decides whether the rest holds up. Add the right one.


— Architect

FAQ

What makes a memory protocol “right” versus just a feature? Four properties a harness actually needs: bounded recall (capped tokens per turn), correctness under change (recall returns the current, non-superseded fact), cross-model portability (one store, any model), and provable per-record erasure. Most vendor memory features give you the first halfway and none of the rest.

Can I just paste the memory-contract prompt and go? Yes — it’s a system-prompt fragment that drives the three core tools (saihm_recall / saihm_remember / saihm_forget). Each instruction is written to produce a specific claim in this post; start with a small recall cap and raise it only if recall misses.

Does this replace my model’s built-in memory? It’s an alternative you own and can address across models — so you’re not locked to one vendor’s non-portable memory, and multi-model harnesses share one store.

Is the benchmark cherry-picked? It counts input tokens only (output is identical either way), runs fully offline, and is conservative for short sessions (~63% at 5 turns). Reproduce it and change the recall cap yourself.

Can I use it from LangChain / LlamaIndex / Claude Code? Yes — there’s a runnable demo for each.

What happens when a user asks me to delete their data? Erasure is per-record and provable (real key-destruction), with a tamper-evident audit trail.

Independence notice. SAIHM is an Apache-2.0 protocol authored independently. The benchmark referenced here is open source and reproducible offline; the figures are produced by the published script and depend on session length and scenario. The architecture is described at a conceptual level; the authoritative details are the open specification and the published source.

Originally published at the SAIHM blog on 2026-07-01. SAIHM is the Sovereign AI Horizontal Memory protocol — Apache 2.0, open spec at saihm.coti.global.

Your security AI agent carries the whole case history into every alert — that's the bill

SAIHM-Admin — Tue, 30 Jun 2026 02:15:25 +0000

The agent that helps triage alerts feels cheap on a quiet morning and expensive deep into a noisy day. The reason is the same one that makes it lose the thread on a long shift — and it is fixable.

Why the hundredth alert costs more than the first

When a security agent triages an alert, each step is a fresh call to the model, and to reason it carries the context with it: past investigations, detection rules, threat notes, the indicators it has already seen. Early in a shift that is light. With a backlog of cases behind it, every new alert re-sends all of that. So the cost of triaging one alert climbs with the size of the backlog, not the severity of the alert in front of you.

It is also why a long shift loses the thread: once the case history outgrows the context window, the agent quietly drops what it learned about an earlier, related alert — exactly the link an analyst needed it to keep.

The dynamic, measured

Carrying the whole case file every turn makes total context spend grow far faster than the work itself. SAIHM measured it on a reproducible, offline benchmark and saw 62.8%–85.9% fewer context tokens across a session when the agent recalls a compact memory instead of replaying the full history, with the gap widening the longer the session runs. The benchmark is open source and runs locally, so you can model your own alert volume and see where the curve lands for your team.

Recall only what an alert touches

The fix is to stop carrying the whole case file. SAIHM keeps the durable facts — confirmed indicators, prior findings, the rules that fired — as separate memory cells, and each triage recalls only the few an alert actually touches instead of replaying the backlog. So triaging the hundredth alert of the day costs about what the first did, and the link to a related case three hours ago is still there because the memory persists between sessions. Because the store is addressable from any model — Claude, GPT, DeepSeek, Qwen, Kimi, GLM — and through LangChain or LlamaIndex, you can change the model behind the agent without re-teaching it your environment.
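One simple way to picture "recall only what an alert touches" is matching on shared indicators. The sketch below is an assumption about how such a lookup could work, not a description of SAIHM's recall: the CaseCell type, the recall_for_alert function, and the sample data are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaseCell:
    summary: str
    indicators: frozenset   # IPs, hashes, hostnames tied to this finding


def recall_for_alert(cells, alert_indicators, cap=3):
    """Return up to `cap` cells sharing at least one indicator with the alert."""
    alert_set = frozenset(alert_indicators)
    matches = [(len(c.indicators & alert_set), c) for c in cells]
    matches = [m for m in matches if m[0] > 0]
    # Cells sharing more indicators with the alert come first.
    matches.sort(key=lambda m: m[0], reverse=True)
    return [c for _, c in matches[:cap]]


backlog = [
    CaseCell("phishing wave, sender domain blocked",
             frozenset({"mail.example.net"})),
    CaseCell("beaconing from build host, contained",
             frozenset({"203.0.113.7", "build-01"})),
    CaseCell("credential stuffing against SSO",
             frozenset({"198.51.100.22"})),
]

related = recall_for_alert(backlog, ["203.0.113.7", "db-04"])
```

Only the beaconing case comes back for this alert. The other two stay in the store and cost nothing on this call, however large the backlog grows.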

This memory is sensitive — so hold the keys

A security agent’s memory is some of the most sensitive data in the building: confirmed indicators, internal hostnames, the shape of your detections, what was caught and what was not. For a team whose job is to assume breach, that cannot sit on a vendor’s servers under a vendor’s keys. SAIHM keeps it yours: the memory is encrypted under keys you control, so the operator cannot read what it cannot decrypt, and erasure is per-record and provable. When an investigation closes or a record must be purged, its cells are cryptographically destroyed with an audit trail you can hand to an assessor — not flagged deleted in a store you simply have to trust.

The honest close

SAIHM is a paid product, with no free tier — that is stated up front rather than buried behind a trial. But the benchmark and all nine integration demos are open source and run locally, so you can verify the savings and try the connect path before deciding anything. The tool surface and setup steps are at /developers; pricing is at /pricing.

Join SAIHM

— Architect

Originally published at the SAIHM blog on 2026-06-30. SAIHM is the Sovereign AI Horizontal Memory protocol — Apache 2.0, open spec at saihm.coti.global.

Your database AI agent re-reads the whole catalog every step — that's the bill

SAIHM-Admin — Tue, 30 Jun 2026 02:15:05 +0000

The AI agent that helps you tune queries feels cheap on a toy schema and expensive on a real warehouse. The reason is the same one that makes a long tuning session forget the index it suggested ten minutes ago — and it is fixable.

Why each suggestion costs more than the last

When an AI agent helps you run or tune a database, each step is a fresh call to the model. To reason well, that call carries the catalog with it: table definitions, indexes, constraints, and the query history it has seen so far. On a small schema that is cheap. On a warehouse with thousands of tables it is not — and every additional turn re-sends the whole thing. So the cost of a single suggestion climbs with the size of your database, not the size of the question you asked.

It is also why a long session starts to wander: once the catalog plus the conversation outgrows the context window, something has to be cut, and the agent forgets the index it recommended a few prompts ago.

The dynamic, measured

Re-sending the full catalog every turn makes total context spend grow far faster than the work itself. SAIHM measured it on a reproducible, offline benchmark and saw 62.8%–85.9% fewer context tokens across a session when the agent recalls a compact memory instead of replaying the full history, with the gap widening the longer the session runs. The benchmark is open source and runs locally, so you can model your own schema size and see where the curve lands for your database.

Recall only the objects a query touches

The fix is to stop re-sending the catalog. SAIHM keeps the durable facts — table shapes, index choices, the tuning decisions already made — as separate memory cells. Each step recalls only the handful of objects the current query touches instead of replaying the whole schema, so a suggestion about one table costs about what it would on an almost-empty database. The memory persists between sessions, so the next time the agent looks at that table it already knows the history. And because the store is addressable from any model — Claude, GPT, DeepSeek, Qwen, Kimi, GLM — and through LangChain or LlamaIndex, you can change the model behind the agent without re-teaching it your schema.
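As a rough illustration of recalling only the objects a query touches, the sketch below pulls table names out of a SQL string and looks up just those cells. Everything here is hypothetical: the catalog_cells dictionary, the function names, and the regex, which is deliberately naive and would need a real SQL parser to handle subqueries, CTEs, or schema-qualified names.

```python
import re

# Hypothetical memory cells keyed by table name.
catalog_cells = {
    "orders": "orders: one row per order; index on (customer_id, created_at)",
    "customers": "customers: primary key id; email is unique",
    "shipments": "shipments: partitioned by month",
    "audit_log": "audit_log: append-only, very large",
}

# Identifiers that follow FROM or JOIN.
TABLE_REF = re.compile(r"\b(?:from|join)\s+([A-Za-z_][A-Za-z0-9_]*)", re.IGNORECASE)


def tables_in(query: str) -> list:
    # Collect referenced table names in order, without duplicates.
    seen = []
    for name in TABLE_REF.findall(query):
        lowered = name.lower()
        if lowered not in seen:
            seen.append(lowered)
    return seen


def recall_for_query(query: str) -> list:
    # Return only the cells for tables the query actually references.
    return [catalog_cells[t] for t in tables_in(query) if t in catalog_cells]


query = (
    "SELECT c.email, COUNT(*) FROM orders o "
    "JOIN customers c ON c.id = o.customer_id GROUP BY c.email"
)
context = recall_for_query(query)
```

The context for this query holds two cells, whether the catalog has four tables or four thousand.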

Your most regulated data lives here — so hold the keys

A database is where your most regulated data sits: customer records, payment rows, anything under privacy rules. An agent’s memory of that schema, its sample rows, and its query results is sensitive in its own right. With most hosted-memory products that memory lives on a vendor’s servers under the vendor’s keys — which becomes your problem the moment an auditor asks where it is or a data subject invokes their right to be forgotten. SAIHM keeps it yours: the memory is encrypted under keys you control, and erasure is per-record and provable. When a record has to go, its cells are cryptographically destroyed with an audit trail you can show — not a row flagged deleted that still sits in a backup nobody purged.

The honest close

Join SAIHM

— Architect

Originally published at the SAIHM blog on 2026-06-30. SAIHM is the Sovereign AI Horizontal Memory protocol — Apache 2.0, open spec at saihm.coti.global.

Your app's AI assistant re-sends the whole conversation on every message — that's the bill

SAIHM-Admin — Tue, 30 Jun 2026 01:59:19 +0000

The chat assistant you added to your app feels cheap in the demo and expensive in production. The reason is the same one that makes long chats “forget” the start of the conversation — and it is fixable.

Why the second page of a chat costs more than the first

When a user talks to the AI assistant in your app, each message is a fresh call to the model. To stay coherent, that call re-sends the system prompt and the entire conversation so far, then the new message. Message one is cheap. Message twenty re-sends the previous nineteen. So the cost of a single reply climbs as the conversation grows, and a busy support chat or a long planning session is exactly where it climbs fastest.

It is also why long chats start dropping the earlier details: once the conversation outgrows the context window, something has to be cut, and the assistant quietly loses what the user told it ten minutes ago.

The dynamic, measured

Re-sending the whole transcript every turn makes total context spend grow far faster than the conversation itself. SAIHM measured it on a reproducible, offline benchmark and saw 62.8%–85.9% fewer context tokens across a session when the assistant recalls a compact memory instead of replaying the full history, with the gap widening the longer the session runs. The benchmark is open source and runs locally, so you can model your own chat lengths and see where the curve lands for your app.

Recall the few facts a reply needs — per user

The fix is to stop re-sending the transcript. SAIHM keeps the durable facts of each user’s history — their preferences, their account context, the decisions made earlier in the thread — as separate memory cells. Each reply recalls only the handful it needs instead of replaying the whole chat, so a reply on message twenty costs about what it did on message two. The memory persists between sessions too, so a returning user is remembered without you stuffing their entire history back into the prompt. And because the store is addressable from any model — Claude, GPT, DeepSeek, Qwen, Kimi, GLM — and through LangChain or LlamaIndex, you can switch the model behind your feature without re-teaching it your users.

It is your users' data — so hold the keys to it

A per-user assistant memory is some of the most sensitive data your app holds: what each person asked, shared, and decided. With most hosted-memory products that history lives on a vendor’s servers under the vendor’s keys — which becomes your problem the moment a user invokes their right to be forgotten or your compliance team asks where that data sits. SAIHM keeps it yours: the memory is encrypted under keys you control, and erasure is per-record and provable. When a user asks you to delete their data, that user’s cells are cryptographically destroyed with an audit trail you can show — not flagged hidden in a table you hope nobody queries. For a web app carrying real user data, that is the difference between a one-click compliance answer and an open risk.

The honest close

Join SAIHM

— Architect

Originally published at the SAIHM blog on 2026-06-30. SAIHM is the Sovereign AI Horizontal Memory protocol — Apache 2.0, open spec at saihm.coti.global.

An AI agent reasoning over your warehouse pays for the whole schema every turn

SAIHM-Admin — Mon, 29 Jun 2026 23:33:16 +0000

The wider and better-governed your warehouse, the more an AI agent has to carry just to reason about one table. That is a strange tax to pay: the assets you are proudest of make every call heavier.

A wide warehouse makes every call heavier

Put an AI agent on top of a real data platform — to write a transformation, trace a lineage question, or explain why a metric moved — and it needs grounding: table schemas, column types, join keys, the lineage graph, the modelling decisions your team already made. So on each turn it re-sends a large slice of the whole schema and history before it reasons about the few tables actually in scope.

That means the size of the bill tracks the size of the warehouse, not the size of the task. A focused question about one fact table drags along hundreds of columns it will never touch — and the more thoroughly you have modelled and documented your platform, the worse the effect.

Why it grows the way it does

Each step of an agent loop re-sends the system prompt plus the entire growing context — here, schema, lineage and prior steps. Because every step replays what came before, the tokens you pay for scale faster than the work does. SAIHM measured this on a reproducible, offline benchmark and saw 62.8%–85.9% fewer context tokens across a session when an agent recalls a compact memory instead of replaying everything, with the gap widening on longer sessions. The benchmark is open source and runs offline, so you can model your own schema width and see where the curve lands.

Recall the tables and rules a step actually touches

SAIHM holds schema facts and modelling decisions as separate memory cells — this table’s grain, that column’s units, the rule that revenue is always stored in minor currency units, the reason a column was deprecated. When the agent works a specific transformation, it recalls only the cells for the tables and rules in play, so context tracks the task rather than the width of the warehouse. The same store is addressable from any model and through orchestration like LangChain or LlamaIndex, so the agent on your pipeline is not locked to one vendor’s context window.

Schema is governed data — keep it under your keys

Schema, lineage and column semantics are not throwaway: they encode how your business defines its numbers, and they often reference exactly which columns hold personal data. That is governed information, and handing it to a vendor’s hosted memory hands them your data map. SAIHM keeps it yours: the memory is encrypted under keys you hold, and erasure is per-record and provable — when a column carrying personal data is dropped, the cell describing it is cryptographically destroyed with an audit trail, which is the kind of evidence a right-to-erasure request actually demands. For a data team that lives inside a governance regime, per-record provable erasure is not a nice-to-have; it is the requirement.

The honest close

Join SAIHM

— Architect

Originally published at the SAIHM blog on 2026-06-29. SAIHM is the Sovereign AI Horizontal Memory protocol — Apache 2.0, open spec at saihm.coti.global.

Your AI test-writer re-reads the whole suite every time — that is the bill

SAIHM-Admin — Mon, 29 Jun 2026 23:31:45 +0000

The better your AI gets at writing tests, the more tests there are — and the more it has to re-read to write the next one. Left alone, that feedback loop quietly turns coverage growth into cost growth.

The bill that grows with your coverage

An AI agent maintaining a real test suite does not work in a vacuum. To add a case without duplicating one, to keep naming and fixtures consistent, to avoid re-introducing a bug you already have a test for, it wants context: the existing cases, the shared helpers, the recent run history, the flaky-test notes. So on each step it re-loads a large slice of all of that before it writes a single new assertion.

The result is a quiet inversion of what you wanted. Coverage going up is the goal; but the more the suite grows, the heavier every subsequent step becomes, until generating or repairing tests across a big suite costs far more per change than it did when the suite was small.

Why it grows the way it does

Each step of an agent loop re-sends the system prompt plus the entire growing context — here, the suite and its history. Because every step replays what came before, the tokens you pay for scale faster than the suite itself. SAIHM measured this on a reproducible, offline benchmark and saw 62.8%–85.9% fewer context tokens across a session when an agent recalls a compact memory instead of replaying everything, with the gap widening as the session runs longer. The benchmark is open source and runs locally, so you can change the scenario to your suite and check the number.

Recall the cases that matter to this change

SAIHM stores prior cases and their outcomes as individual memory cells — this module’s edge cases, that fixture’s quirks, the three tests that go flaky under load. When the agent works a specific change, it recalls only the cells relevant to the code under test, not the whole suite. Each step stays focused and cheap instead of re-reading thousands of unrelated assertions. And because the same memory is addressable from any model, the QA agent in your CI is not pinned to one vendor — you can run it against whichever model is fastest or cheapest this quarter without re-teaching it the suite.

Your tests describe your product — keep them under your keys

A mature test suite is a precise description of how your product actually behaves: business rules, failure modes, the data shapes your system accepts. That is proprietary, and test fixtures often carry real or realistic personal data. With hosted-memory products, that description sits on a vendor’s servers under the vendor’s keys. SAIHM keeps it yours: you hold the encryption keys, and erasure is per-record and provable — retire a fixture that contained personal data and that one cell is cryptographically destroyed, with an audit trail you can show. Portable, private, and erasable per record is a very different posture from trusting a hosted vendor’s dashboard delete button.

The honest close

Join SAIHM

— Architect

Originally published at the SAIHM blog on 2026-06-29. SAIHM is the Sovereign AI Horizontal Memory protocol — Apache 2.0, open spec at saihm.coti.global.

Your incident-response AI agent gets more expensive the longer the incident runs

SAIHM-Admin — Mon, 29 Jun 2026 23:31:36 +0000

The point in an incident where an AI assistant should be cheapest and fastest — deep into a long, messy timeline — is exactly where most of them get slowest and most expensive. Here is why, and what to do about it.

The cost that peaks at the worst moment

Picture hour three of a production incident. Your AI assistant has already pulled the runbook, three dashboards’ worth of metrics, a wall of log lines, and the back-and-forth of everything you have tried so far. Every new question you ask — “could it be the cache?”, “what changed at 02:14?” — is a fresh model call that re-reads all of that history again before it answers.

So the assistant is slowest and priciest precisely when the timeline is longest, which is precisely when you are most under pressure. It is also why long incident sessions eventually overflow the context window and the assistant starts “forgetting” the early symptoms that turn out to matter.
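The "forgetting" is mechanical, and a toy version shows it. The sketch below assumes the common strategy of keeping the newest messages that fit and dropping the oldest; the function names, the word-count tokenizer, and the timeline entries are all made up for the example.

```python
def fit_to_window(messages, max_tokens, count_tokens):
    """Keep the newest messages that fit; drop the oldest first."""
    kept, used = [], 0
    for message in reversed(messages):
        cost = count_tokens(message)
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    return list(reversed(kept))


def rough_tokens(text):
    # Crude stand-in for a tokenizer: one token per whitespace-separated word.
    return len(text.split())


timeline = [
    "02:14 deploy of cache config v41 finished",   # the early symptom
    "02:20 p99 latency alarm on checkout",
    "02:35 restarted checkout pods, no change",
    "03:10 database CPU normal, connections normal",
    "04:55 error rate still elevated on checkout",
]

window = fit_to_window(timeline, max_tokens=20, count_tokens=rough_tokens)
early_fact_survives = timeline[0] in window
```

With a 20-token budget only the last three entries fit, so the 02:14 deploy, the one line that answers "what changed?", is the first thing to fall out.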

Why it grows the way it does

An agent loop is not one call — it is dozens. Each step re-sends the system prompt plus the entire growing transcript: the runbook, the logs, every prior step. Because each step replays everything before it, the context you pay for grows faster than the incident itself. SAIHM measured this dynamic on a reproducible, offline benchmark and saw 62.8%–85.9% fewer context tokens across a session when an agent recalls a compact memory instead of replaying history — and the gap widens the longer the session runs. You can clone the benchmark and check the number yourself.

Recall the few facts a step needs — not the whole timeline

The alternative is simple: stop re-reading the timeline. SAIHM keeps the durable facts of an incident — the failing service, the suspected change, the hostname, the decision to roll back — as separate memory cells. Each step recalls only the handful it actually needs. The working context stays small even as the incident timeline grows, so the assistant stays fast and affordable at hour three, not just at minute one. The same store carries across whatever model your on-call tooling speaks to — Claude, GPT, DeepSeek, Qwen, Kimi or GLM — so a model swap mid-incident does not lose the thread.

Incident data is sensitive — so hold the keys to it

An incident memory is not neutral. It contains hostnames, internal topology, customer-impact notes, sometimes personal data from affected accounts. With most hosted-memory products that history lives on a vendor’s servers under the vendor’s keys. SAIHM inverts that: the memory is yours. You hold the encryption keys, so the facts are readable only by you; and erasure is per-record and provable — when an incident retrospective is closed and a sensitive note must go, that single cell is cryptographically destroyed, not merely flagged hidden. For a team that answers to auditors or to a data-protection regime, being able to prove that a specific record is gone is the difference between a clean post-incident review and an open finding.

The honest close

Join SAIHM

— Architect

Originally published at the SAIHM blog on 2026-06-29. SAIHM is the Sovereign AI Horizontal Memory protocol — Apache 2.0, open spec at saihm.coti.global.

Stateless MCP, durable memory: the hard choices are already made. The answer is SAIHM.

SAIHM-Admin — Mon, 29 Jun 2026 12:39:28 +0000

The Model Context Protocol is moving toward a sessionless baseline. The Transport Working Group removed the built-in session and is now working out whether and how to reintroduce scoped state, in transports-wg#36 and MCP Discussion #2894. For a stateless tool that converts a date or calls an API, that is a non-event.

For anyone building durable memory — a layer that holds a user's facts, decisions, and history across sessions, across restarts, across redeploys — it is the start of a long, expensive design project. And here is the part worth your next two minutes: you do not have to do that project. It is already done. The finished result is SAIHM.

The choices a sessionless baseline forces on you

Build durable memory under a sessionless protocol and you will, in order, have to decide every one of these — correctly, because each one is a security or compliance surface:

Identity. What scopes a call to the right durable data, when the protocol carries no session? (Get it wrong and you leak one user's memory into another's.)
Lifecycle. Who mints that identifier, who owns it, how long does it live? (Tie it to a connection and your memory dies on every reconnect.)
Durability. Where does the system of record live so it survives the client that crashes and the server you redeploy?
Confidentiality. Who holds the keys? If you hold them, you are now a breach target and a subpoena target.
Erasure. When a user invokes the right to be forgotten, can you prove the data is gone — not just DELETEd from one replica?
Audit. Can you show a regulator every read, write, and deletion, on a surface you cannot silently rewrite?
Sharing & multi-agent. How do many agents share live state with revocable, scope-bound access and a single source of truth?

That is months of senior engineering, an external audit, and a permanent operational burden — before you ship a single feature your users actually asked for.

SAIHM already made every one of them

We did that project. We made each choice, shipped it, and have been running it on COTI V2 mainnet. The short version of the design — the one transports-wg#36 is still circling — is a durable namespace identifier: client-owned, not transport-created; long-lived, not connection-scoped; a selector that changes only which memory a call touches, never the tool set. That is the correct answer to the identity-and-lifecycle question, and it is already implemented. Here is what you inherit the moment you connect:

The choice	What you get with SAIHM, today
Identity & lifecycle	A client-owned, long-lived durable namespace. Any server instance serves any request; restart and redeploy are non-events.
Durability	Content-addressed storage that outlives the client and the connection. The store is the system of record.
Confidentiality	Keys are held by the user, derived client-side. The operator only ever sees ciphertext — so you are not the breach or subpoena target.
Erasure	Cryptographic erasure with an on-chain receipt: a GDPR-Article-17-defensible delete you can prove.
Audit	Every operation anchored to a public chain by default — the tamper-evident trail a regulator asks for.
Sharing & multi-agent	Revocable, scope-bound, per-record sharing; many agents on one live source of truth.
Integration	One Model Context Protocol server. The same tools appear in every MCP-capable client — no per-vendor work.

None of it is yours to build, audit, or operate. It is Apache-2.0, it is live, and it is exposed through the protocol your client already speaks.
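The identity-and-lifecycle row is the one most worth picturing. The sketch below is a conceptual illustration only, not SAIHM's code or the MCP API: the StatelessServer class, the dictionary standing in for durable storage, and the identifier format are all assumptions.

```python
# Shared durable store: the system of record, independent of any server process.
DURABLE_STORE = {}


class StatelessServer:
    """Holds no per-client state; every call names its namespace explicitly."""

    def __init__(self, store):
        self.store = store

    def remember(self, namespace: str, key: str, value: str) -> None:
        self.store.setdefault(namespace, {})[key] = value

    def recall(self, namespace: str, key: str):
        # The namespace selects which memory is touched, nothing else.
        return self.store.get(namespace, {}).get(key)


# The client mints and keeps its own namespace identifier.
alice_ns = "ns-alice-7f3a"   # hypothetical identifier format
bob_ns = "ns-bob-91c2"

server_a = StatelessServer(DURABLE_STORE)
server_a.remember(alice_ns, "preferred_region", "eu-west")

# Simulate a redeploy: a brand-new instance over the same durable store.
server_b = StatelessServer(DURABLE_STORE)
after_redeploy = server_b.recall(alice_ns, "preferred_region")
cross_tenant = server_b.recall(bob_ns, "preferred_region")
```

Because the identifier travels with the call rather than living in a connection, the second instance answers as if nothing happened, and a call under a different namespace sees none of it. A real design would also bind the namespace to key material so that knowing the identifier alone grants nothing.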

The simple choice

You can spend a quarter designing identity, durability, erasure, and audit for a sessionless world, get it independently reviewed, and then maintain it forever. Or you can connect to SAIHM this afternoon and spend that quarter on your actual product.

If you are building a competing memory layer, run the comparison before you commit the headcount — it shows what you would be rebuilding: /competitors. If you are building on MCP, the tool surface, the client-side identity model, and the connect steps are at /developers, and the five-step quickstart is at /quickstart. The open-standards track — the IETF Internet-Draft and the W3C Community Group on AI agent memory interoperability — is at /standards.

The protocol going sessionless was the right move. The smart response is not to re-solve what it leaves open; it is to build on the layer that already did. SAIHM is sold pay-as-you-go (PAYG) and on paid tiers; see /pricing.

Join SAIHM

— Architect

Independence notice. SAIHM is an Apache-2.0 protocol authored independently. SAIHM participates in MCP Discussion #2894; references to that discussion and to transports-wg#36 describe public community work, not third-party endorsement. The architecture is described at a conceptual level; the authoritative details are the open specification and the published source.

Originally published at the SAIHM blog on 2026-06-11. SAIHM is the Sovereign AI Horizontal Memory protocol — Apache 2.0, open spec at saihm.coti.global.

The token tax nobody budgets for — and why it hits the tightest budgets hardest

SAIHM-Admin — Sun, 28 Jun 2026 16:38:12 +0000

Most write-ups about AI agents are about prompts, tools, and evals. Almost none are about the line item that quietly dominates a real deployment: the context tokens you pay for on every single turn.

Here is the mechanic. A typical agent loop re-sends the whole conversation so far on each step, so the model can "remember" what happened. Turn 1 sends a little. Turn 20 re-sends everything from turns 1–19 again. Across a session, the input-token cost does not grow with the amount of work done — it grows roughly O(N²) in the number of turns.
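A short derivation makes the quadratic explicit. Assume, purely for illustration, a fixed system prompt of s tokens and roughly m new tokens per turn; both symbols are assumptions of this sketch, not quantities from the benchmark. Turn k then carries the prompt plus all k turns so far, and summing over a session of N turns gives:

```latex
\text{tokens}(k) = s + k\,m,
\qquad
\sum_{k=1}^{N} \text{tokens}(k) \;=\; N s + m\,\frac{N(N+1)}{2} \;=\; O(N^2).
```

The N s term is the part that grows with the work. The N(N+1)/2 term is the replay, and it dominates as soon as sessions get long.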

On a generous budget you might never notice. If you are a solo builder in Nairobi or a small team in Lagos or Accra paying for every token in hard currency, you notice on day one: the bill tracks the length of the work, not its value. A task that runs for an afternoon can cost more than the feature it shipped — and a product that has to run that loop for thousands of users multiplies the same waste across every one of them.

Measure it before you trust anyone — including this post

There is an open, offline benchmark that counts exactly this. It models a realistic coding-assistant session across three sittings and counts input/context tokens under two strategies — re-sending the full transcript every turn, versus recalling a small, bounded set of memory cells each turn:

git clone https://github.com/citw2/saihm-token-benchmark
cd saihm-token-benchmark && npm install
node benchmark.mjs

It runs fully offline, no API key, tokenizing with gpt-tokenizer (cl100k_base). On the bundled scenario it reports 62.8%–85.9% fewer context tokens with bounded recall, and the gap widens the longer the session runs. Change --recall-cap and watch the trade-off move. The point is not the headline number — it is that you can reproduce it on your own session instead of taking a vendor's word for it.

The fix: recall a bounded set, don't replay the transcript

The expensive habit is treating the whole conversation as the agent's memory. The cheaper design is to keep durable facts — decisions, conventions, file paths, the things you actually need later — as separate memory cells, and recall only a small capped set each turn. That turns the quadratic resend into roughly O(N · cap): cost grows with the work, not with how long the transcript has gotten.
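The two curves are easy to compare with a toy model. The sketch below is not the published benchmark: the function names and every number in params are illustrative assumptions, chosen only to show the shape of the trade-off.

```python
def replay_total(turns, system_tokens, tokens_per_turn):
    # Turn k carries the system prompt plus all k turns so far.
    return sum(system_tokens + k * tokens_per_turn for k in range(1, turns + 1))


def bounded_total(turns, system_tokens, tokens_per_turn, cap, tokens_per_cell):
    # Every turn carries the system prompt, the new message, and at most
    # `cap` recalled cells, regardless of how long the session has run.
    per_turn = system_tokens + tokens_per_turn + cap * tokens_per_cell
    return turns * per_turn


def savings(turns, **kw):
    # Fraction of input tokens saved by bounded recall over full replay.
    full = replay_total(turns, kw["system_tokens"], kw["tokens_per_turn"])
    bounded = bounded_total(turns, **kw)
    return 1 - bounded / full


# Illustrative numbers only; they are not the benchmark's scenario.
params = dict(system_tokens=500, tokens_per_turn=300, cap=5, tokens_per_cell=40)
short_session = savings(5, **params)
long_session = savings(40, **params)
```

In this model the saving is modest at 5 turns and much larger at 40, which matches the direction the benchmark reports. It also shows the honest edge case: on a one- or two-turn session the recalled cells are pure overhead, so bounded recall can cost slightly more than replay.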

This is the idea behind SAIHM, a sovereign memory layer any MCP-capable AI client can call. Durable facts live as encrypted cells you hold the keys to; each turn pulls a bounded working set instead of replaying history. Because the memory is addressed through an open protocol, the same store works whether you are calling Claude, GPT, DeepSeek, Qwen, Kimi, or GLM — useful if you switch models to chase a better price-per-token, which on a tight budget you will.

Why the tight-budget case is the strongest case

Two things compound for builders working against hard-currency API costs:

Every token is FX. A 70–85% cut in context tokens on long sessions is not a rounding error when the bill is denominated in a currency your revenue is not.
You are often building for scale on small margins. The next billion users are coming online on agents that have to be cheap to run per interaction. Re-sending the transcript per user, per turn, is the opposite of cheap.

The same property that makes memory cheaper also makes it portable and erasable: you hold the key, a delete destroys that key and is provable on a public chain, and you can share a single record with another agent and revoke it. But the budget case stands on its own — flatten the O(N²) curve and the rest is upside.
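Erasure by key destruction can be sketched with one key per record. The code below is a toy under loud assumptions: the keystream built from SHAKE-256 exists only so the example runs on the standard library (a real system should use a vetted authenticated cipher), the names are hypothetical, and the public-chain proof described above is not modelled at all.

```python
import hashlib
import secrets


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # TOY keystream from SHAKE-256, for illustration only. It provides no
    # authentication and is not a substitute for a vetted AEAD cipher.
    return hashlib.shake_256(key + nonce).digest(length)


def seal(key: bytes, plaintext: bytes) -> tuple:
    # Returns (nonce, ciphertext); a fresh random nonce per record.
    nonce = secrets.token_bytes(16)
    stream = _keystream(key, nonce, len(plaintext))
    return nonce, bytes(a ^ b for a, b in zip(plaintext, stream))


def open_sealed(key: bytes, nonce: bytes, ciphertext: bytes) -> bytes:
    stream = _keystream(key, nonce, len(ciphertext))
    return bytes(a ^ b for a, b in zip(ciphertext, stream))


# One key per record, held by the client; the operator stores only ciphertext.
client_keys = {"rec-1": secrets.token_bytes(32), "rec-2": secrets.token_bytes(32)}
operator_store = {
    "rec-1": seal(client_keys["rec-1"], b"user prefers dark mode"),
    "rec-2": seal(client_keys["rec-2"], b"billing contact is the finance team"),
}

# Erasing rec-1 means destroying its key. The ciphertext may linger in
# replicas or backups, but nothing can decrypt it any more.
del client_keys["rec-1"]

rec1_readable = "rec-1" in client_keys
rec2_plain = open_sealed(client_keys["rec-2"], *operator_store["rec-2"])
```

Per-record keys are what make the erasure per-record: destroying one key leaves every other cell readable, and no backup sweep is needed to make the deleted one unrecoverable.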

Try it without spending anything to find out

The benchmark above is one asset; the runnable demos are the other. They let you ground a memory you own in every major model and then prove you can erase it, each running offline in about a minute, no account needed:

Demos: https://citw2.github.io/saihm-demos/
Benchmark: https://github.com/citw2/saihm-token-benchmark

SAIHM itself is a paid product with no free tier — stated up front rather than buried behind a trial. But the benchmark and the demos are open source and run locally, so you can verify the claim and try the integration before deciding anything.

Independence notice: SAIHM is an Apache-2.0 protocol authored independently. It is not affiliated with OpenAI, Anthropic, Google, or any AI client vendor. The benchmark is open source and reproducible offline; the figures are produced by the published script and depend on session length and scenario.