The better your AI gets at writing tests, the more tests there are — and the more it has to re-read to write the next one. Left alone, that feedback loop quietly turns coverage growth into cost growth.
The bill that grows with your coverage
An AI agent maintaining a real test suite does not work in a vacuum. To add a case without duplicating an existing one, to keep naming and fixtures consistent, and to avoid re-introducing a bug you already have a test for, it needs context: the existing cases, the shared helpers, the recent run history, the flaky-test notes. So on each step it re-loads a large slice of all that before it writes a single new assertion.
The result is a quiet inversion of what you wanted. Rising coverage is the goal, but the larger the suite grows, the heavier every subsequent step becomes, until generating or repairing tests across a big suite costs far more per change than it did when the suite was small.
Why it grows the way it does
Each step of an agent loop re-sends the system prompt plus the entire growing context: here, the suite and its history. Because every step replays everything that came before, the cumulative tokens you pay for grow faster than the suite itself. If the context gains a roughly fixed amount per step, the total over a session grows roughly with the square of the number of steps. SAIHM measured this on a reproducible, offline benchmark and saw 62.8%–85.9% fewer context tokens across a session when an agent recalls a compact memory instead of replaying everything, with the gap widening as the session runs longer. The benchmark is open source and runs locally, so you can adapt the scenario to your own suite and check the figures yourself.
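To see why replay outgrows the suite, here is a back-of-the-envelope model. It is a sketch under made-up assumptions, not the SAIHM benchmark: a 1,000-token system prompt, a 20,000-token starting context that grows by 2,000 tokens per step, and a compact recall of a fixed 4,000 tokens per step. The function names and every number are hypothetical.

```python
# Illustrative token model; all sizes are hypothetical, not SAIHM benchmark inputs.
SYSTEM_PROMPT = 1_000    # tokens re-sent on every step
BASE_CONTEXT = 20_000    # suite + history at the start of the session
GROWTH_PER_STEP = 2_000  # new cases/output appended to the history each step
RECALLED = 4_000         # fixed-size slice of relevant memory recalled per step


def replay_tokens(steps: int) -> int:
    """Total tokens when every step re-sends the prompt plus the full, growing context."""
    total = 0
    context = BASE_CONTEXT
    for _ in range(steps):
        total += SYSTEM_PROMPT + context
        context += GROWTH_PER_STEP  # this step's output joins what the next step replays
    return total


def recall_tokens(steps: int) -> int:
    """Total tokens when every step sends the prompt plus a bounded recall."""
    return steps * (SYSTEM_PROMPT + RECALLED)


def savings(steps: int) -> float:
    """Fraction of replay tokens avoided by recalling instead of replaying."""
    return 1 - recall_tokens(steps) / replay_tokens(steps)


if __name__ == "__main__":
    for steps in (5, 10, 40):
        print(f"{steps:>3} steps: replay={replay_tokens(steps):>9,} "
              f"recall={recall_tokens(steps):>7,} savings={savings(steps):.1%}")
```

The replay total sums a linearly growing context, so it grows with roughly the square of the step count, while the recall total grows linearly. That is why the savings in this toy model rise from 80.0% at 5 steps to 91.7% at 40. The percentages depend entirely on the assumed sizes; the 62.8%–85.9% figures above come from the published benchmark, not from this sketch.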
Recall the cases that matter to this change
SAIHM stores prior cases and their outcomes as individual memory cells: this module’s edge cases, that fixture’s quirks, the three tests that go flaky under load. When the agent works on a specific change, it recalls only the cells relevant to the code under test, not the whole suite. Each step stays focused and cheap instead of re-reading thousands of unrelated assertions. And because the same memory is addressable from any model, the QA agent in your CI is not pinned to one vendor: you can run it against whichever model is fastest or cheapest this quarter without re-teaching it the suite.
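The recall step can be pictured as a filter over tagged cells. The sketch below is hypothetical: `MemoryCell`, `recall`, and tag-overlap scoring are stand-ins chosen for illustration, not SAIHM's API or its actual relevance method. Each cell carries tags for the files or fixtures it concerns, and a change recalls only the cells whose tags overlap the files it touches.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryCell:
    """Hypothetical memory cell: one prior case and its outcome."""
    cell_id: str
    tags: frozenset[str]  # files or fixtures this cell is about
    note: str             # compact summary of the case and what happened


def recall(cells: list[MemoryCell], changed_paths: set[str], limit: int = 5) -> list[MemoryCell]:
    """Return up to `limit` cells whose tags overlap the change, most overlap first."""
    scored = []
    for cell in cells:
        overlap = len(cell.tags & changed_paths)
        if overlap:  # cells unrelated to the change are never sent to the model
            scored.append((overlap, cell.cell_id, cell))
    # Sort by overlap (descending), then by id so ties are deterministic.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [cell for _, _, cell in scored[:limit]]


cells = [
    MemoryCell("c1", frozenset({"billing/invoice.py"}),
               "Negative totals must raise ValueError"),
    MemoryCell("c2", frozenset({"billing/invoice.py", "fixtures/customers.json"}),
               "Customer fixture has no VAT id; tests must not assume one"),
    MemoryCell("c3", frozenset({"search/index.py"}),
               "test_reindex is flaky under load"),
]

if __name__ == "__main__":
    change = {"billing/invoice.py", "fixtures/customers.json"}
    for cell in recall(cells, change):
        print(cell.cell_id, cell.note)
```

Tag overlap is deliberately simple; a real system might rank by other signals, but the cost property is the same: each step pays for a bounded number of relevant cells rather than the whole suite.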
Your tests describe your product — keep them under your keys
A mature test suite is a precise description of how your product actually behaves: business rules, failure modes, the data shapes your system accepts. That is proprietary, and test fixtures often carry real or realistic personal data. With hosted-memory products, that description sits on a vendor’s servers under the vendor’s keys. SAIHM keeps it yours: you hold the encryption keys, and erasure is per-record and provable. Retire a fixture that contained personal data and that one cell is cryptographically destroyed, with an audit trail you can show. Keeping memory portable, private, and erasable per record is a very different posture from trusting a hosted vendor’s dashboard delete button.
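Per-record erasure of this kind is commonly implemented as crypto-shredding: each record is encrypted under its own key, and destroying that key leaves the ciphertext permanently unreadable while every other record stays intact. The sketch below shows the idea with a hypothetical `ShreddableStore` class. Its SHAKE-256 keystream is a toy so the example needs only the standard library; it is not authenticated encryption and must not be used for real data. It is not SAIHM's implementation, and the audit list is a simple stand-in for whatever proof of erasure SAIHM produces.

```python
import hashlib
import secrets
from datetime import datetime, timezone


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # TOY ONLY: SHAKE-256 output used as a keystream. Not authenticated;
    # use a vetted AEAD (e.g. AES-GCM) for real data.
    return hashlib.shake_256(key + nonce).digest(length)


class ShreddableStore:
    """Hypothetical store where each record is encrypted under its own key."""

    def __init__(self) -> None:
        self._ciphertexts = {}  # record_id -> (nonce, ciphertext); may sit on untrusted storage
        self._keys = {}         # record_id -> key; held only by the data owner
        self.audit_log = []     # append-only list of erasure events

    def put(self, record_id: str, plaintext: bytes) -> None:
        key, nonce = secrets.token_bytes(32), secrets.token_bytes(16)
        stream = _keystream(key, nonce, len(plaintext))
        ciphertext = bytes(p ^ s for p, s in zip(plaintext, stream))
        self._ciphertexts[record_id] = (nonce, ciphertext)
        self._keys[record_id] = key

    def get(self, record_id: str) -> bytes:
        key = self._keys.get(record_id)
        if key is None:
            raise KeyError(f"{record_id} has been erased")
        nonce, ciphertext = self._ciphertexts[record_id]
        stream = _keystream(key, nonce, len(ciphertext))
        return bytes(c ^ s for c, s in zip(ciphertext, stream))

    def erase(self, record_id: str) -> None:
        # Destroying this record's key makes its ciphertext unreadable;
        # every other record keeps its own key. In Python, `del` only drops
        # the reference; real systems destroy keys in a KMS or HSM.
        del self._keys[record_id]
        self.audit_log.append({
            "record_id": record_id,
            "event": "key_destroyed",
            "at": datetime.now(timezone.utc).isoformat(),
        })
```

The ciphertext of an erased record can remain in backups or on a vendor's disk; the erasure holds because the only key to it was yours and no longer exists.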
The honest close
SAIHM is a paid product with no free tier, and that is stated up front rather than buried behind a trial. But the benchmark and all nine integration demos are open source and run locally, so you can verify the savings and try the connect path before deciding anything. The tool surface and setup steps are at /developers; pricing is at /pricing.
— Architect
Independence notice. SAIHM is an Apache-2.0 protocol authored independently. The benchmark referenced here is open source and reproducible offline; the figures are produced by the published script and depend on session length and scenario. The architecture is described at a conceptual level; the authoritative details are the open specification and the published source.
Originally published at the SAIHM blog on 2026-06-29. SAIHM is the Sovereign AI Horizontal Memory protocol — Apache 2.0, open spec at saihm.coti.global.