Your security AI agent carries the whole case history into every alert — that's the bill

The agent that helps triage alerts feels cheap on a quiet morning and expensive deep into a noisy day. The reason is the same one that makes it lose the thread on a long shift — and it is fixable.

Why the hundredth alert costs more than the first

When a security agent triages an alert, each step is a fresh call to the model, and the model can only reason over the context sent with that call: past investigations, detection rules, threat notes, the indicators it has already seen. Early in a shift that context is light. Once a backlog of cases has built up, every new alert re-sends all of it. So the cost of triaging one alert climbs with the size of the backlog, not with the severity of the alert in front of you.
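
To make the arithmetic concrete, here is a toy cost model in Python. Every constant in it is an assumption picked for illustration (prompt size, tokens added per past case, size of a compact recall); none of them comes from SAIHM or from its benchmark, so the totals it prints are not the measured figures.

```python
# Toy cost model: tokens sent per alert when the full case history is
# replayed, versus when only a bounded memory recall is attached.
# All constants are illustrative assumptions, not measurements.

BASE_PROMPT_TOKENS = 800       # assumed: system prompt plus the alert itself
TOKENS_PER_PAST_CASE = 400     # assumed: history each earlier case adds
RECALL_BUDGET_TOKENS = 1_200   # assumed: upper bound on a compact recall


def replay_cost(alert_index: int) -> int:
    """Tokens sent for the n-th alert (1-based) when all prior cases are replayed."""
    return BASE_PROMPT_TOKENS + TOKENS_PER_PAST_CASE * (alert_index - 1)


def recall_cost(alert_index: int) -> int:
    """Tokens sent for the n-th alert when the recalled memory is capped."""
    history = TOKENS_PER_PAST_CASE * (alert_index - 1)
    return BASE_PROMPT_TOKENS + min(RECALL_BUDGET_TOKENS, history)


def session_total(cost_fn, alerts: int) -> int:
    """Total tokens sent across a session of `alerts` alerts."""
    return sum(cost_fn(i) for i in range(1, alerts + 1))


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(f"alert {n:>3}: replay={replay_cost(n):>6}  recall={recall_cost(n):>5}")
    print("session of 100, replay:", session_total(replay_cost, 100))
    print("session of 100, recall:", session_total(recall_cost, 100))
```

Under these assumptions the per-alert replay cost grows linearly with the backlog, so the session total grows roughly with its square, while the capped recall stays flat after the first few alerts. Swap in your own constants to see where your team's curve lands.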

It is also why the agent loses the thread on a long shift: once the case history outgrows the context window, the agent quietly drops what it learned about an earlier, related alert, which is exactly the link an analyst needed it to keep.
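
A minimal sketch of how that loss can happen, assuming the agent trims its history with a drop-oldest policy (one common approach; a given agent may summarize or fail outright instead). The function name, the case IDs, the host name and the token counts are all hypothetical.

```python
from collections import deque


def fit_to_window(history, window_tokens):
    """Keep the most recent entries that fit the window; drop the oldest.

    `history` is a list of dicts with a "tokens" count, oldest first.
    Nothing is raised or logged when entries fall off, which is the
    'quiet' part of the failure.
    """
    kept = deque()
    used = 0
    for entry in reversed(history):          # walk from newest to oldest
        if used + entry["tokens"] > window_tokens:
            break                            # everything older is dropped
        kept.appendleft(entry)
        used += entry["tokens"]
    return list(kept)


# Hypothetical shift: an early finding about db-07, nineteen unrelated
# cases, then a new alert that touches the same host.
history = [{"id": "case-001", "note": "beacon from db-07 to 203.0.113.9", "tokens": 500}]
history += [{"id": f"case-{i:03d}", "note": "unrelated", "tokens": 500} for i in range(2, 21)]
history.append({"id": "case-021", "note": "new admin login on db-07", "tokens": 500})

visible = fit_to_window(history, window_tokens=4000)
visible_ids = [entry["id"] for entry in visible]
print("case-001 still visible:", "case-001" in visible_ids)
```

With a 4,000-token window and 500-token entries, only the eight most recent cases survive, so the earlier db-07 finding is gone by the time the related alert arrives.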

The dynamic, measured

Carrying the whole case file every turn makes total context spend grow far faster than the work itself. SAIHM measured it on a reproducible, offline benchmark and saw 62.8%–85.9% fewer context tokens across a session when the agent recalls a compact memory instead of replaying the full history, with the gap widening the longer the session runs. The benchmark is open source and runs locally, so you can model your own alert volume and see where the curve lands for your team.

Recall only what an alert touches

The fix is to stop carrying the whole case file. SAIHM keeps the durable facts — confirmed indicators, prior findings, the rules that fired — as separate memory cells, and each triage recalls only the few an alert actually touches instead of replaying the backlog. So triaging the hundredth alert of the day costs about what the first did, and the link to a related case three hours ago is still there because the memory persists between sessions. Because the store is addressable from any model — Claude, GPT, DeepSeek, Qwen, Kimi, GLM — and through LangChain or LlamaIndex, you can change the model behind the agent without re-teaching it your environment.
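
The idea of recalling only the cells an alert touches can be sketched as below. This is not SAIHM's API: `MemoryCell`, `MemoryStore`, `remember` and `recall` are hypothetical names, and plain indicator overlap stands in for whatever retrieval the real protocol specifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryCell:
    """One durable fact, tagged with the indicators it concerns (hypothetical shape)."""
    cell_id: str
    fact: str
    indicators: frozenset


class MemoryStore:
    """In-memory stand-in for a persistent cell store."""

    def __init__(self):
        self._cells = {}

    def remember(self, cell: MemoryCell) -> None:
        self._cells[cell.cell_id] = cell

    def recall(self, alert_indicators, limit: int = 3):
        """Return up to `limit` cells sharing at least one indicator with the alert."""
        wanted = set(alert_indicators)
        hits = []
        for cell in self._cells.values():
            overlap = len(cell.indicators & wanted)
            if overlap > 0:
                hits.append((overlap, cell))
        # Most overlapping first; cell_id breaks ties so results are stable.
        hits.sort(key=lambda pair: (-pair[0], pair[1].cell_id))
        return [cell for _, cell in hits[:limit]]


store = MemoryStore()
store.remember(MemoryCell("c1", "db-07 beaconed to 203.0.113.9; confirmed C2",
                          frozenset({"db-07", "203.0.113.9"})))
store.remember(MemoryCell("c2", "Phishing wave against finance; rule R-114 fired",
                          frozenset({"R-114", "finance-mailbox"})))
store.remember(MemoryCell("c3", "203.0.113.9 also seen from web-02",
                          frozenset({"web-02", "203.0.113.9"})))

# A new alert mentions db-07 and the same external address.
recalled = store.recall({"db-07", "203.0.113.9"})
context = "\n".join(cell.fact for cell in recalled)
print(context)
```

The recalled context is plain text, so it can be prepended to a prompt for whichever model sits behind the agent. The size of that context depends on `limit`, not on how many cells the store holds.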

This memory is sensitive — so hold the keys

A security agent’s memory is some of the most sensitive data in the building: confirmed indicators, internal hostnames, the shape of your detections, what was caught and what was not. For a team whose job is to assume breach, that cannot sit on a vendor’s servers under a vendor’s keys. SAIHM keeps it yours: the memory is encrypted under keys you control, so the operator cannot read what it cannot decrypt, and erasure is per-record and provable. When an investigation closes or a record must be purged, its cells are cryptographically destroyed with an audit trail you can hand to an assessor — not flagged deleted in a store you simply have to trust.
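
Per-record cryptographic erasure is often implemented as crypto-shredding: each record gets its own key, and destroying the key leaves the ciphertext unreadable. The sketch below shows that general pattern only. It is not SAIHM's implementation; the class and field names are hypothetical, and the cipher is a toy (XOR with a SHAKE-256 keystream, no authentication) standing in for a real authenticated cipher so the example runs on the standard library alone.

```python
import hashlib
import secrets
from datetime import datetime, timezone


def _keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    # TOY cipher for illustration only. Use a vetted AEAD in practice.
    stream = hashlib.shake_256(key + nonce).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


class ShreddableStore:
    """Hypothetical store with one key per record and key-destruction erasure."""

    def __init__(self):
        self._ciphertexts = {}  # record_id -> (nonce, ciphertext); may sit with an operator
        self._keys = {}         # record_id -> key; assumed to be held by the customer
        self.audit_log = []

    def put(self, record_id: str, plaintext: str) -> None:
        key = secrets.token_bytes(32)
        nonce = secrets.token_bytes(16)
        self._keys[record_id] = key
        self._ciphertexts[record_id] = (nonce, _keystream_xor(key, nonce, plaintext.encode()))

    def get(self, record_id: str) -> str:
        key = self._keys[record_id]  # raises KeyError once the key is destroyed
        nonce, ciphertext = self._ciphertexts[record_id]
        return _keystream_xor(key, nonce, ciphertext).decode()

    def shred(self, record_id: str) -> None:
        """Destroy the record's key and log the erasure."""
        _, ciphertext = self._ciphertexts[record_id]
        # Note: `del` drops the reference; it does not securely wipe memory.
        del self._keys[record_id]
        self.audit_log.append({
            "record_id": record_id,
            "action": "key_destroyed",
            "ciphertext_sha256": hashlib.sha256(ciphertext).hexdigest(),
            "at": datetime.now(timezone.utc).isoformat(),
        })
```

After `shred("case-001")`, a call to `get("case-001")` fails while other records stay readable, and the audit entry ties the erasure to a specific ciphertext. A production design would also need secure key storage and a tamper-evident log, which this sketch leaves out.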

The honest close

SAIHM is a paid product, with no free tier — that is stated up front rather than buried behind a trial. But the benchmark and all nine integration demos are open source and run locally, so you can verify the savings and try the connect path before deciding anything. The tool surface and setup steps are at /developers; pricing is at /pricing.

— Architect

Independence notice. SAIHM is an Apache-2.0 protocol authored independently. The benchmark referenced here is open source and reproducible offline; the figures are produced by the published script and depend on session length and scenario. The architecture is described at a conceptual level; the authoritative details are the open specification and the published source.

Originally published at the SAIHM blog on 2026-06-30. SAIHM is the Sovereign AI Horizontal Memory protocol — Apache 2.0, open spec at saihm.coti.global.