Memory With Receipts: Evidence-Linked Recall for AI Agents
Most agent memory demos stop at one question:
Can the agent remember this later?
That's useful, but it's not the bar I care about.
For long-running agent sessions, memory without provenance is hard to trust. If an agent says "we decided X," I want to know where that came from: which transcript, which event, which source record. The raw transcript should stay authoritative. The memory layer should make important facts easier to find and reuse, not replace the record.
That's the idea behind ArgoBrain-Memory.
ArgoBrain-Memory is a small, portable memory core for AI agent sessions. It extracts typed facts from transcripts and gives each fact an evidence pointer back to the source event that produced it.
The pitch is simple:
Every useful memory should carry its receipt.
The problem
A basic memory store might keep something like this:
Memory: "use SQLite over PostgreSQL"
That might be useful, but it is not auditable. You cannot tell where it came from, whether the wording changed, or whether it is still true.
ArgoBrain-Memory keeps the raw transcript as the source of truth and writes memory as derived artifacts:
Fact: "use SQLite over PostgreSQL"
Type: decision
Evidence: session-2026-05-15.jsonl#event=42
Status: active
Now there is a trail. You can go back to the original event and inspect the conversation that produced the memory.
That is the difference between "the agent says so" and "the agent can show its work."
What it extracts
The current v0.1.0 release extracts five practical fact types from session transcripts:
| Type | Captures |
|---|---|
decision |
Choices made during the session |
blocker |
Unresolved problems |
preference |
User or workflow preferences |
path |
Files, directories, and artifact locations |
command |
Commands that were run or should be rerun |
The extractor is intentionally simple right now: deterministic, regex-based, and dependency-free. That makes it easy to test locally before adding heavier model-assisted extraction later.
The demo fixture
The repo includes a synthetic fixture benchmark. It does not require OpenAI, Ollama, Hermes, or any API key.
git clone https://github.com/KeyArgo/argobrain-memory.git
cd argobrain-memory
PYTHONPATH=src python3 -m argobrain_memory.cli benchmark \
--fixture fixtures/demo-session \
--mode all \
--output /tmp/argobrain-memory-demo
Current result on the included fixture:
Mode Retention Matched
off 0.0% 0/5
baseline 100.0% 5/5
cortex 100.0% 5/5
external 100.0% 5/5
That is a tiny fixture, not a broad benchmark suite. The important part is not "5/5 means solved." It does not.
The useful part is the artifact model.
In cortex mode, ArgoBrain-Memory writes a memory root like this:
memory-root/
active-derived-summary.md
facts.jsonl
recall.sqlite
episodes/
<episode-id>.json
facts.jsonl is the typed fact ledger. episodes/ preserves compaction episodes. recall.sqlite is a rebuildable FTS5 search index. The transcript remains authoritative.
How this fits Hermes Agent
Hermes Agent work often involves long sessions, compression, continuation, and handoff between agents. That is exactly where silent memory drift hurts.
ArgoBrain-Memory is meant to be the portable memory artifact layer for that world:
- ingest a Hermes-style session transcript
- extract typed facts before or after compression
- preserve evidence pointers back to source events
- build a derived summary for quick continuation
- build a local SQLite FTS5 recall index
- compare memory modes with replay fixtures
The live Hermes MemoryProvider wrapper is not done yet. This release is the clean core: portable artifacts, deterministic extraction, tests, and a path toward live integration.
I would rather ship the small auditable layer than pretend the whole memory stack is finished.
Code
GitHub: https://github.com/KeyArgo/argobrain-memory
ArgoBrain-Memory v0.1.0 is MIT licensed, Python 3.10+, and has zero runtime dependencies.
What's next
- live Hermes
MemoryProviderwrapper - stronger contradiction and supersession handling
- confidence evolution based on repeated evidence
- richer extraction beyond deterministic heuristics
- fresh-clone release gate before PyPI publication
The ask
If you are building agents with memory, try the benchmark against your own transcript model.
For every memory the agent uses, can you trace it back to the exact source event?
If the answer is no, that is the gap ArgoBrain-Memory is trying to close.
Memory is useful.
Memory with receipts is something you can actually debug.
Top comments (0)