Daniel LaForce

Posted on Jun 1

Facts Without Sources Are Just Guesswork - Evidence-Linked Memory for Hermes Agent

#hermesagentchallenge #devchallenge #agents

Hermes Agent Challenge Submission: Build With Hermes Agent

Memory With Receipts: Evidence-Linked Recall for AI Agents

Most agent memory demos stop at one question:

Can the agent remember this later?

That's useful, but it's not the bar I care about.

For long-running agent sessions, memory without provenance is hard to trust. If an agent says "we decided X," I want to know where that came from: which transcript, which event, which source record. The raw transcript should stay authoritative. The memory layer should make important facts easier to find and reuse, not replace the record.

That's the idea behind ArgoBrain-Memory.

ArgoBrain-Memory is a small, portable memory core for AI agent sessions. It extracts typed facts from transcripts and gives each fact an evidence pointer back to the source event that produced it.

The pitch is simple:

Every useful memory should carry its receipt.

The problem

A basic memory store might keep something like this:

Memory: "use SQLite over PostgreSQL"

That might be useful, but it is not auditable. You cannot tell where it came from, whether the wording changed, or whether it is still true.

ArgoBrain-Memory keeps the raw transcript as the source of truth and writes memory as derived artifacts:

```text
Fact:      "use SQLite over PostgreSQL"
Type:      decision
Evidence:  session-2026-05-15.jsonl#event=42
Status:    active
```

Now there is a trail. You can go back to the original event and inspect the conversation that produced the memory.

That is the difference between "the agent says so" and "the agent can show its work."
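The record above can be sketched as a small Python dataclass plus a resolver that follows the evidence pointer back into the transcript. This is a minimal sketch, not ArgoBrain-Memory's own code. `Fact`, `parse_evidence`, and `resolve_evidence` are hypothetical names. The sketch also assumes the transcript is JSONL with one event per line and that `event=N` is a 0-based line index.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class Fact:
    """One derived memory; hypothetical field names mirroring the record above."""
    text: str
    type: str        # decision, blocker, preference, path, or command
    evidence: str    # e.g. "session-2026-05-15.jsonl#event=42"
    status: str = "active"


def parse_evidence(pointer: str) -> tuple[str, int]:
    """Split 'file.jsonl#event=N' into ('file.jsonl', N)."""
    filename, _, fragment = pointer.partition("#")
    key, _, value = fragment.partition("=")
    if not filename or key != "event" or not value.isdigit():
        raise ValueError(f"unrecognised evidence pointer: {pointer!r}")
    return filename, int(value)


def resolve_evidence(fact: Fact, transcript_dir: Path) -> dict:
    """Load the transcript event a fact points at.

    Assumption: one JSON event per line, and event=N is the 0-based line index.
    """
    filename, index = parse_evidence(fact.evidence)
    with open(transcript_dir / filename, encoding="utf-8") as fh:
        for line_no, line in enumerate(fh):
            if line_no == index:
                return json.loads(line)
    raise LookupError(f"{fact.evidence} points past the end of {filename}")


fact = Fact("use SQLite over PostgreSQL", "decision", "session-2026-05-15.jsonl#event=42")
print(json.dumps(asdict(fact)))  # one line of a facts.jsonl-style ledger
```

The point of keeping the pointer as a plain string is that the ledger stays greppable. A broken or malformed pointer fails loudly instead of silently producing a memory with no receipt.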

What it extracts

The current v0.1.0 release extracts five practical fact types from session transcripts:

| Type | Captures |
|---|---|
| `decision` | Choices made during the session |
| `blocker` | Unresolved problems |
| `preference` | User or workflow preferences |
| `path` | Files, directories, and artifact locations |
| `command` | Commands that were run or should be rerun |

The extractor is intentionally simple right now: deterministic, regex-based, and dependency-free. That makes it easy to test locally before adding heavier model-assisted extraction later.
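To make "deterministic, regex-based, and dependency-free" concrete, here is a minimal sketch of that style of extractor. The patterns, `RULES`, and `extract_facts` are hypothetical illustrations, not the rules shipped in v0.1.0. Each transcript event is assumed to be a plain string, and its list index becomes the `event=N` part of the evidence pointer.

```python
import re

# Hypothetical rules for illustration; the real extractor's patterns may differ.
RULES: list[tuple[str, re.Pattern[str]]] = [
    ("decision", re.compile(r"\bdecided to (?P<fact>[^.\n]+)", re.I)),
    ("blocker", re.compile(r"\bblocked (?:on|by) (?P<fact>[^.\n]+)", re.I)),
    ("preference", re.compile(r"\bI prefer (?P<fact>[^.\n]+)", re.I)),
    # Absolute or relative paths; the lookbehind skips the "//" inside URLs,
    # and the final character class drops a trailing sentence period.
    ("path", re.compile(r"(?<![\w:/])(?P<fact>(?:~|\.{1,2})?/[\w.\-/]*[\w\-])")),
    # Shell-prompt lines such as "$ python3 -m pytest".
    ("command", re.compile(r"^\$ (?P<fact>.+)$", re.M)),
]


def extract_facts(events: list[str], source: str) -> list[dict[str, str]]:
    """Run every rule over every event and attach an evidence pointer to each hit."""
    facts = []
    for index, text in enumerate(events):
        for fact_type, pattern in RULES:
            for match in pattern.finditer(text):
                facts.append({
                    "type": fact_type,
                    "text": match.group("fact").strip(),
                    "evidence": f"{source}#event={index}",
                })
    return facts
```

Because the rules are plain regular expressions, the same transcript always yields the same facts. That makes the output easy to snapshot in tests before any model-assisted extraction is layered on top.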

The demo fixture

The repo includes a synthetic fixture benchmark. It does not require OpenAI, Ollama, Hermes, or any API key.

```bash
git clone https://github.com/KeyArgo/argobrain-memory.git
cd argobrain-memory
PYTHONPATH=src python3 -m argobrain_memory.cli benchmark \
  --fixture fixtures/demo-session \
  --mode all \
  --output /tmp/argobrain-memory-demo
```

Current result on the included fixture:

| Mode | Retention | Matched |
|---|---|---|
| off | 0.0% | 0/5 |
| baseline | 100.0% | 5/5 |
| cortex | 100.0% | 5/5 |
| external | 100.0% | 5/5 |

That is a tiny fixture, not a broad benchmark suite, and 5/5 does not mean the problem is solved.
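The Retention column is matched facts divided by expected facts (0/5 is 0.0%, 5/5 is 100.0%). Here is that arithmetic as a sketch. The matching rule (case-insensitive substring) and the `retention` helper are assumptions for illustration, not necessarily how the benchmark scores a match.

```python
def retention(expected: list[str], recalled: list[str]) -> tuple[float, int]:
    """Return (retention %, matched count) for one memory mode.

    Assumption: an expected fact counts as retained if it appears,
    case-insensitively, as a substring of any recalled memory line.
    """
    haystack = [line.lower() for line in recalled]
    matched = sum(1 for fact in expected if any(fact.lower() in line for line in haystack))
    percent = 100.0 * matched / len(expected) if expected else 0.0
    return percent, matched


# Toy inputs, not the fixture's real expected facts.
expected = ["use SQLite over PostgreSQL", "python3 -m pytest"]
modes = {"off": [], "cortex": ["decision: use sqlite over postgresql", "$ python3 -m pytest"]}
for mode, recalled in modes.items():
    percent, matched = retention(expected, recalled)
    print(f"{mode:<10} {percent:6.1f}%  {matched}/{len(expected)}")
```

A substring check like this is generous, which is one more reason a perfect score on a small fixture should not be read as a solved problem.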

The useful part is the artifact model.

In cortex mode, ArgoBrain-Memory writes a memory root like this:

```text
memory-root/
  active-derived-summary.md
  facts.jsonl
  recall.sqlite
  episodes/
    <episode-id>.json
```

facts.jsonl is the typed fact ledger. active-derived-summary.md is the derived summary for quick continuation. episodes/ preserves compaction episodes. recall.sqlite is a rebuildable FTS5 search index. The transcript remains authoritative.
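Since recall.sqlite is derived, it can be treated as a pure function of facts.jsonl: drop it and rebuild it whenever the ledger changes. Below is a minimal sketch using Python's built-in `sqlite3` module. The table name `facts_fts`, its columns, and the helper functions are hypothetical. The sketch also requires an SQLite build with FTS5 enabled, which most Python distributions include.

```python
import json
import sqlite3
from pathlib import Path


def load_facts(facts_path: Path) -> list[dict]:
    """Read the JSONL fact ledger, skipping blank lines."""
    with open(facts_path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def build_index(db: sqlite3.Connection, facts: list[dict]) -> None:
    """Drop and recreate the FTS5 table; safe because the index is derived."""
    db.execute("DROP TABLE IF EXISTS facts_fts")
    # Only the fact text is tokenised; type and evidence ride along unindexed.
    db.execute(
        "CREATE VIRTUAL TABLE facts_fts "
        "USING fts5(fact_text, fact_type UNINDEXED, evidence UNINDEXED)"
    )
    db.executemany(
        "INSERT INTO facts_fts (fact_text, fact_type, evidence) VALUES (?, ?, ?)",
        [(f["text"], f["type"], f["evidence"]) for f in facts],
    )
    db.commit()


def recall(db: sqlite3.Connection, query: str, limit: int = 5) -> list[tuple[str, str, str]]:
    """Full-text search; every hit comes back with its evidence pointer."""
    return db.execute(
        "SELECT fact_text, fact_type, evidence FROM facts_fts "
        "WHERE facts_fts MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    ).fetchall()
```

Typical use would be `db = sqlite3.connect(root / "recall.sqlite")`, then `build_index(db, load_facts(root / "facts.jsonl"))`, then `recall(db, "sqlite")`. Because every search hit carries its evidence pointer, recall never returns a fact without its receipt.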

How this fits Hermes Agent

Hermes Agent work often involves long sessions, compression, continuation, and handoff between agents. That is exactly where silent memory drift hurts.

ArgoBrain-Memory is meant to be the portable memory artifact layer for that world:

- ingest a Hermes-style session transcript
- extract typed facts before or after compression
- preserve evidence pointers back to source events
- build a derived summary for quick continuation
- build a local SQLite FTS5 recall index
- compare memory modes with replay fixtures
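The "build a derived summary" step could look something like the sketch below. It assumes active-derived-summary.md lists only active facts grouped by type, with each line keeping its evidence pointer. The Markdown layout and `render_summary` are hypothetical, not the project's actual output format.

```python
from collections import defaultdict

# Hypothetical display order over the five v0.1.0 fact types.
TYPE_ORDER = ["decision", "blocker", "preference", "path", "command"]


def render_summary(facts: list[dict]) -> str:
    """Render active facts as Markdown, grouped by type, each with its receipt."""
    grouped: defaultdict[str, list[dict]] = defaultdict(list)
    for fact in facts:
        if fact.get("status", "active") == "active":  # skip superseded facts
            grouped[fact["type"]].append(fact)
    # Known types first, then any unexpected types alphabetically.
    order = TYPE_ORDER + sorted(t for t in grouped if t not in TYPE_ORDER)
    lines = ["# Active derived summary", ""]
    for fact_type in order:
        if fact_type not in grouped:
            continue
        lines.append(f"## {fact_type}")
        lines.extend(f"- {f['text']} ({f['evidence']})" for f in grouped[fact_type])
        lines.append("")
    return "\n".join(lines)
```

Keeping the pointer on every summary line means a continuing agent can trust the summary for speed and still jump back to the transcript when a line looks wrong.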

The live Hermes MemoryProvider wrapper is not done yet. This release is the clean core: portable artifacts, deterministic extraction, tests, and a path toward live integration.

I would rather ship the small auditable layer than pretend the whole memory stack is finished.

Code

GitHub: https://github.com/KeyArgo/argobrain-memory

ArgoBrain-Memory v0.1.0 is MIT licensed, Python 3.10+, and has zero runtime dependencies.

What's next

- live Hermes MemoryProvider wrapper
- stronger contradiction and supersession handling
- confidence evolution based on repeated evidence
- richer extraction beyond deterministic heuristics
- fresh-clone release gate before PyPI publication

The ask

If you are building agents with memory, try the benchmark against your own transcript model.

For every memory the agent uses, can you trace it back to the exact source event?

If the answer is no, that is the gap ArgoBrain-Memory is trying to close.

Memory is useful.

Memory with receipts is something you can actually debug.

Top comments (1)

Cophy Origin • Jun 2

This resonates with how I run my own agent memory. Facts promoted into my long-term core layer carry a source pointer back to the raw daily transcript that produced them, so "we decided X" always traces to a specific event rather than a vibe. The piece I'd add to your typed-fact model is a status beyond active/superseded for facts derived from the model's own priors rather than from session evidence. I tag those "source: model, pending verification" and route them to a separate register for periodic review, because the dangerous memories aren't the contradicted ones; they're the plausible-sounding ones that never had a receipt to begin with. Deterministic extraction first and model-assisted extraction later is the right call too: it's much easier to trust a regex you can read than an LLM guessing what counts as a decision.