Daniel LaForce

Facts Without Sources Are Just Guesswork - Evidence-Linked Memory for Hermes Agent

Hermes Agent Challenge Submission: Build With Hermes Agent

Memory With Receipts: Evidence-Linked Recall for AI Agents

Most agent memory demos stop at one question:

Can the agent remember this later?

That's useful, but it's not the bar I care about.

For long-running agent sessions, memory without provenance is hard to trust. If an agent says "we decided X," I want to know where that came from: which transcript, which event, which source record. The raw transcript should stay authoritative. The memory layer should make important facts easier to find and reuse, not replace the record.

That's the idea behind ArgoBrain-Memory.

ArgoBrain-Memory is a small, portable memory core for AI agent sessions. It extracts typed facts from transcripts and gives each fact an evidence pointer back to the source event that produced it.

The pitch is simple:

Every useful memory should carry its receipt.

The problem

A basic memory store might keep something like this:

Memory: "use SQLite over PostgreSQL"

That might be useful, but it is not auditable. You cannot tell where it came from, whether the wording changed, or whether it is still true.

ArgoBrain-Memory keeps the raw transcript as the source of truth and writes memory as derived artifacts:

Fact:      "use SQLite over PostgreSQL"
Type:      decision
Evidence:  session-2026-05-15.jsonl#event=42
Status:    active

Now there is a trail. You can go back to the original event and inspect the conversation that produced the memory.
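
To make "go back to the original event" concrete, here is a minimal sketch of following an evidence pointer. It assumes the pointer has the form `<transcript>#event=<n>` as in the record above, that the transcript holds one JSON event per line, and that `n` is a zero-based line index; the project may number events differently. The names `Fact`, `parse_evidence`, and `resolve_evidence` are hypothetical and are not taken from the ArgoBrain-Memory code.

```python
import json
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Fact:
    """Hypothetical fact record mirroring the four fields shown above."""
    text: str
    type: str
    evidence: str  # e.g. "session-2026-05-15.jsonl#event=42"
    status: str = "active"


def parse_evidence(pointer: str) -> tuple[str, int]:
    """Split "<file>#event=<n>" into (file, n). The format is an assumption."""
    filename, sep, fragment = pointer.partition("#event=")
    if not sep or not filename or not fragment.isdecimal():
        raise ValueError(f"unrecognized evidence pointer: {pointer!r}")
    return filename, int(fragment)


def resolve_evidence(fact: Fact, transcript_dir: Path) -> dict:
    """Load the source event that a fact points at.

    Assumes one JSON event per line and zero-based event numbering.
    """
    filename, index = parse_evidence(fact.evidence)
    with (transcript_dir / filename).open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle):
            if line_number == index:
                return json.loads(line)
    raise LookupError(f"{filename} has no event {index}")
```

A pointer that fails to parse or resolve raises instead of returning a default, so a fact with a broken receipt is surfaced rather than silently trusted.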

That is the difference between "the agent says so" and "the agent can show its work."

What it extracts

The current v0.1.0 release extracts five practical fact types from session transcripts:

Type         Captures
decision     Choices made during the session
blocker      Unresolved problems
preference   User or workflow preferences
path         Files, directories, and artifact locations
command      Commands that were run or should be rerun

The extractor is intentionally simple right now: deterministic, regex-based, and dependency-free. That makes it easy to test locally before adding heavier model-assisted extraction later.
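
For a sense of what deterministic, regex-based extraction can look like, here is a rough sketch using only the standard library. The patterns, the `PATTERNS` table, and the `extract_facts` function are illustrative assumptions; the project's actual rules are not shown in this post, and only three of the five types are covered here to keep it short.

```python
import re
from typing import Iterator

# Illustrative patterns only (assumed, not the project's real rules).
PATTERNS: dict[str, re.Pattern[str]] = {
    "decision": re.compile(r"\b(?:we decided to|decision:)\s*(?P<text>.+)", re.IGNORECASE),
    "blocker": re.compile(r"\b(?:blocked on|blocker:)\s*(?P<text>.+)", re.IGNORECASE),
    "preference": re.compile(r"\b(?:i prefer|preference:)\s*(?P<text>.+)", re.IGNORECASE),
}


def extract_facts(events: list[str], source: str) -> Iterator[dict]:
    """Yield one typed fact per pattern hit, tagged with its source event.

    `events` is the transcript as a list of message strings; the list
    index doubles as the event number in the evidence pointer.
    """
    for index, message in enumerate(events):
        for fact_type, pattern in PATTERNS.items():
            match = pattern.search(message)
            if match:
                yield {
                    "text": match.group("text").strip().rstrip("."),
                    "type": fact_type,
                    "evidence": f"{source}#event={index}",
                    "status": "active",
                }
```

Because there is no model call and no randomness, the same transcript always yields the same facts, which is what makes the output easy to pin down in tests.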

The demo fixture

The repo includes a synthetic fixture benchmark. It does not require OpenAI, Ollama, Hermes, or any API key.

git clone https://github.com/KeyArgo/argobrain-memory.git
cd argobrain-memory
PYTHONPATH=src python3 -m argobrain_memory.cli benchmark \
  --fixture fixtures/demo-session \
  --mode all \
  --output /tmp/argobrain-memory-demo

Current result on the included fixture:

Mode       Retention   Matched
off           0.0%     0/5
baseline     100.0%    5/5
cortex       100.0%    5/5
external     100.0%    5/5

That is a tiny fixture, not a broad benchmark suite. A 5/5 score here does not mean the problem is solved.

The useful part is the artifact model.

In cortex mode, ArgoBrain-Memory writes a memory root like this:

memory-root/
  active-derived-summary.md
  facts.jsonl
  recall.sqlite
  episodes/
    <episode-id>.json

facts.jsonl is the typed fact ledger. episodes/ preserves compaction episodes. recall.sqlite is a rebuildable FTS5 search index. The transcript remains authoritative.
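
"Rebuildable" here means the index can be thrown away and regenerated from the ledger at any time. A minimal sketch of that idea with Python's built-in `sqlite3` module follows. The table name `facts_fts`, the column names, and the JSON keys (`text`, `type`, `evidence`) are assumptions for illustration, not the project's actual schema, and the code needs an SQLite build with FTS5 enabled.

```python
import json
import sqlite3
from pathlib import Path


def rebuild_index(facts_path: Path, db_path: Path) -> int:
    """Drop and recreate the search index from the fact ledger.

    Schema is illustrative. Requires SQLite compiled with FTS5.
    """
    with facts_path.open(encoding="utf-8") as handle:
        facts = [json.loads(line) for line in handle if line.strip()]

    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commit on success, roll back on error
            conn.execute("DROP TABLE IF EXISTS facts_fts")
            conn.execute(
                "CREATE VIRTUAL TABLE facts_fts "
                "USING fts5(body, kind UNINDEXED, evidence UNINDEXED)"
            )
            conn.executemany(
                "INSERT INTO facts_fts (body, kind, evidence) VALUES (?, ?, ?)",
                [(f["text"], f["type"], f["evidence"]) for f in facts],
            )
    finally:
        conn.close()
    return len(facts)


def search(db_path: Path, query: str) -> list[tuple[str, str, str]]:
    """Full-text search; every hit comes back with its evidence pointer."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(
            "SELECT body, kind, evidence FROM facts_fts "
            "WHERE facts_fts MATCH ? ORDER BY rank",
            (query,),
        ).fetchall()
    finally:
        conn.close()
```

Only the fact text is indexed for search. The type and evidence columns are stored but marked `UNINDEXED`, so a query cannot match on pointer strings by accident, yet each result still carries its receipt.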

How this fits Hermes Agent

Hermes Agent work often involves long sessions, compression, continuation, and handoff between agents. That is exactly where silent memory drift hurts.

ArgoBrain-Memory is meant to be the portable memory artifact layer for that world:

  • ingest a Hermes-style session transcript
  • extract typed facts before or after compression
  • preserve evidence pointers back to source events
  • build a derived summary for quick continuation
  • build a local SQLite FTS5 recall index
  • compare memory modes with replay fixtures
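
As one example from the list above, a derived summary for continuation could be rendered straight from the fact records. This is a sketch under stated assumptions: the Markdown layout, the `render_summary` name, and the idea that any status other than `active` is left out are all hypothetical, since the post does not show the real `active-derived-summary.md` format.

```python
from collections import defaultdict


def render_summary(facts: list[dict]) -> str:
    """Render active facts as Markdown, grouped by type, with evidence.

    Layout and status handling are assumptions for illustration.
    """
    grouped: dict[str, list[dict]] = defaultdict(list)
    for fact in facts:
        if fact.get("status", "active") == "active":
            grouped[fact["type"]].append(fact)

    lines = ["# Active derived summary", ""]
    for fact_type in sorted(grouped):
        lines.append(f"## {fact_type}")
        for fact in grouped[fact_type]:
            # Keep the evidence pointer next to every line of the summary.
            lines.append(f"- {fact['text']} (evidence: {fact['evidence']})")
        lines.append("")
    return "\n".join(lines)
```

The summary stays a derived artifact: every line still names the source event, so a continuing agent can check the transcript instead of taking the summary on faith.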

The live Hermes MemoryProvider wrapper is not done yet. This release is the clean core: portable artifacts, deterministic extraction, tests, and a path toward live integration.

I would rather ship the small auditable layer than pretend the whole memory stack is finished.

Code

GitHub: https://github.com/KeyArgo/argobrain-memory

ArgoBrain-Memory v0.1.0 is MIT licensed, Python 3.10+, and has zero runtime dependencies.

What's next

  • live Hermes MemoryProvider wrapper
  • stronger contradiction and supersession handling
  • confidence evolution based on repeated evidence
  • richer extraction beyond deterministic heuristics
  • fresh-clone release gate before PyPI publication

The ask

If you are building agents with memory, try the benchmark against your own transcript model.

For every memory the agent uses, can you trace it back to the exact source event?

If the answer is no, that is the gap ArgoBrain-Memory is trying to close.

Memory is useful.

Memory with receipts is something you can actually debug.
