Agent memory and context that never leaves your machine

Thomas Connally — Tue, 30 Jun 2026 21:54:31 +0000

Most "agent memory" and "agent context" tools today require sending your data to someone else's cloud. If you operate in a regulated, air-gapped, or simply privacy-conscious environment, that rules them out before you've even tried them. I build the opposite: two MIT-licensed, local-first MCP servers that do this work entirely on your own hardware.

The problem

Agent memory and context assembly are converging on a cloud-only default. That's a non-starter for defense, healthcare, finance, legal, and any team that can't or won't let agent context leave their VPC. It's also just slower and less deterministic than it needs to be: agents re-discover the same facts about your repo and services every session, burning tokens and turns before doing any real work.

Mimir: persistent memory, fully offline

Mimir is a single ~8MB Rust binary. It encrypts everything at rest with AES-256-GCM, and it works with no API key, no model download, and no network access at all, because the embeddings used for dense search are bundled directly into the binary. It's bi-temporal: every fact carries a validity window, so you can query memory "as of" any past point and supersede facts without deleting history. It exposes 43 MCP tools, with SQLite + FTS5 hybrid search under the hood.
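The validity-window idea fits in a few lines of SQL. The sketch below uses Python's built-in sqlite3 with a hypothetical `facts` table and hypothetical `record_fact` / `as_of` helpers. It is not Mimir's schema or any of its MCP tools, and it models only valid time (a full bi-temporal store would also record when each row was written).

```python
import sqlite3
from typing import Optional

# Hypothetical schema for illustration; not Mimir's actual storage layout.
# valid_to IS NULL means the fact is still current.
SCHEMA = """
CREATE TABLE facts (
    id INTEGER PRIMARY KEY,
    subject TEXT NOT NULL,
    value TEXT NOT NULL,
    valid_from INTEGER NOT NULL,
    valid_to INTEGER
)
"""


def record_fact(db: sqlite3.Connection, subject: str, value: str, at: int) -> None:
    """Close the current fact for `subject` (if any) and open a new one at `at`.

    The old row is superseded by setting valid_to; it is never deleted.
    """
    with db:  # one transaction for both statements
        db.execute(
            "UPDATE facts SET valid_to = ? WHERE subject = ? AND valid_to IS NULL",
            (at, subject),
        )
        db.execute(
            "INSERT INTO facts (subject, value, valid_from) VALUES (?, ?, ?)",
            (subject, value, at),
        )


def as_of(db: sqlite3.Connection, subject: str, at: int) -> Optional[str]:
    """Return the value that was valid for `subject` at time `at`, or None."""
    row = db.execute(
        "SELECT value FROM facts WHERE subject = ? AND valid_from <= ? "
        "AND (valid_to IS NULL OR valid_to > ?)",
        (subject, at, at),
    ).fetchone()
    return row[0] if row else None


db = sqlite3.connect(":memory:")
db.execute(SCHEMA)
record_fact(db, "api_port", "3000", at=100)
record_fact(db, "api_port", "3001", at=200)  # supersedes; the 3000 row remains
print(as_of(db, "api_port", 150))  # 3000
print(as_of(db, "api_port", 250))  # 3001
```

Both rows stay in the table, so an "as of" query for any earlier time still finds the value that was current then.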

One honest tradeoff worth naming: the FTS5 index needed for fast keyword search currently sits over plaintext, even though the underlying record is encrypted at rest. We're upfront about this in the docs rather than overstating the encryption story.
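The post doesn't say how Mimir combines FTS5 keyword hits with dense-embedding hits. One common way to fuse two ranked lists is reciprocal rank fusion (RRF). The sketch below assumes that technique, with made-up fact IDs, purely to show what "hybrid" ranking can mean.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of IDs into one ranking.

    Each ID scores sum(1 / (k + rank)) over the lists it appears in, with
    rank starting at 1; k = 60 is the constant commonly used for RRF.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the order is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical ranked results for one query.
keyword_hits = ["fact-7", "fact-2", "fact-9"]  # e.g. from an FTS5 MATCH query
dense_hits = ["fact-2", "fact-4", "fact-7"]    # e.g. from embedding similarity
fused = reciprocal_rank_fusion([keyword_hits, dense_hits])
print(fused)  # ['fact-2', 'fact-7', 'fact-4', 'fact-9']
```

Items that both retrievers rank highly rise to the top, while an item found by only one retriever still survives in the fused list.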

Perseus: compile-before-context

Perseus takes a different approach to context than runtime tool-call discovery. Instead of letting an agent rediscover your git state, running services, and test status through a chain of tool calls every session, it compiles all of that into a ready briefing the moment a session starts. The result is deterministic and byte-stable: the same repo state always produces the same compiled context.
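One simple way to get byte-stable output is to make the renderer independent of discovery order. The hypothetical `compile_briefing` below sorts resolved facts by key before emitting markdown, then compares SHA-256 digests of two renders. Perseus's actual renderer may work differently.

```python
import hashlib


def compile_briefing(facts: dict[str, str]) -> str:
    """Render resolved facts as markdown in a fixed order.

    Sorting by key means the output does not depend on the order in which
    facts were gathered, so identical inputs yield identical bytes.
    """
    lines = ["# Briefing"]
    for key in sorted(facts):
        lines.append(f"- {key}: {facts[key]}")
    return "\n".join(lines) + "\n"


def digest(text: str) -> str:
    """SHA-256 of the rendered text, as a quick byte-equality check."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Same (hypothetical) repo state, gathered in two different orders.
a = {"branch": "main", "tests": "597 passing", "mongo-dev": "Up"}
b = {"mongo-dev": "Up", "branch": "main", "tests": "597 passing"}
print(digest(compile_briefing(a)) == digest(compile_briefing(b)))  # True
```

Values that change on their own, such as service uptimes, still change the bytes; the sketch treats them as part of the state being compiled.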

Honest, reproducible benchmarks

On paraphrased queries, Mimir's bundled offline embeddings hit 91.7% recall@1 and 100% recall@5, versus 4.2% recall@1 for naive keyword search. Perseus holds full answer coverage at a fixed, deterministic context size where tuned RAG baselines start dropping facts at the same budget. Both benchmark harnesses are offline and re-runnable: run them yourself rather than taking our word for it.
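For readers re-running the harnesses, recall@k here means the share of queries whose correct answer appears in the top k results. A minimal sketch follows, assuming one correct item per query; the actual harness may score multi-answer queries differently.

```python
def recall_at_k(results: list[list[str]], expected: list[str], k: int) -> float:
    """Fraction of queries whose expected item appears in the top-k results.

    results[i] is the ranked list returned for query i; expected[i] is the
    single correct item for that query.
    """
    if len(results) != len(expected):
        raise ValueError("results and expected must have the same length")
    hits = sum(1 for ranked, gold in zip(results, expected) if gold in ranked[:k])
    return hits / len(expected)


# Three hypothetical paraphrased queries.
results = [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"]]
expected = ["a", "e", "x"]
print(recall_at_k(results, expected, 1))  # 1 of 3 queries hit at k=1
print(recall_at_k(results, expected, 5))  # 2 of 3 queries hit at k=5
```

Recall@5 can never be lower than recall@1, which is why a retriever can score 100% at k=5 while missing some top-1 answers.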

Try it in two minutes

# Mimir (memory)
docker pull ghcr.io/perseus-computing-llc/mimir:2.7.0

# Perseus (context)
pip install perseus-ctx

Both work with any MCP client: Claude Code, Cursor, Cline, or a custom agent. Both are listed on the official MCP Registry.

If your team is building agents somewhere cloud-only memory is a non-starter, we take on a small number of integration pilots: we deploy Mimir and Perseus into your environment and prove recall quality on your own data in 2 to 4 weeks.

Mimir on GitHub: https://github.com/Perseus-Computing-LLC/mimir
Perseus on GitHub: https://github.com/Perseus-Computing-LLC/perseus

The Hidden Tax of AI-Assisted Development (And How I Fixed It)

Thomas Connally — Sun, 24 May 2026 19:59:45 +0000

Every AI coding session starts the same way. You open your editor, the assistant says hello, and you spend the first five minutes orienting it.

"What branch am I on?"

"What services are running?"

"Where did we leave off last session?"

"Is the test suite green?"

It's a tax you pay on every session. Multiply that by days, weeks, a whole team — it adds up to a real cost in both time and attention. And tokens, if you're paying by the token.

The Industry's Answer: Runtime Tool Calls

The standard solution is to let the assistant figure it out at runtime. MCP servers, function calling, Claude Code hooks — the assistant asks "what's running?" mid-conversation, and something answers. Repeat for every fact it needs.

This works. It's also one round-trip per fact. 50 facts = 50 round-trips. If you're paying for Claude Opus or GPT-5.5 by the token, every one of those orientation questions burns tokens. Quickly.

A Different Bet: Resolve Before They Read

I built Perseus to go the other direction. Instead of the assistant discovering facts at runtime, you resolve them at render time — before the assistant ever reads them.

You write a context file with directives:

@perseus v0.8

# Current State
@query "git status --short"
@query "git log --oneline -5"

# Services
@services

# Last Session
@waypoint ttl=86400

# Ports
@read .env key="API_PORT" fallback="3001"

Perseus runs those directives, resolves them to live values, and outputs a plain markdown document. Your assistant reads facts, not instructions to go find facts.
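To make "resolve, then emit markdown" concrete, here is a deliberately tiny resolver for two of the directives above. It is a hypothetical sketch, not Perseus's implementation: `@query` runs the quoted shell command and inlines its stdout, `@read` pulls one key from a KEY=VALUE file with a fallback, and every other line (including unsupported directives) passes through unchanged.

```python
import shlex
import subprocess
import tempfile
from pathlib import Path


def read_env_key(path: Path, key: str, fallback: str) -> str:
    """Return `key` from a simple KEY=VALUE file, or `fallback` if absent."""
    if not path.is_file():
        return fallback
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == key:
            return value.strip().strip("\"'")
    return fallback


def resolve_line(line: str, root: Path) -> str:
    """Replace one directive line with its live value; pass other lines through."""
    if not line.startswith("@"):
        return line
    name, *args = shlex.split(line)
    if name == "@query" and args:
        # Run the quoted command from the project root and inline its stdout.
        proc = subprocess.run(args[0], shell=True, cwd=root,
                              capture_output=True, text=True)
        return proc.stdout.rstrip("\n")
    if name == "@read" and args:
        opts = dict(arg.split("=", 1) for arg in args[1:])
        value = read_env_key(root / args[0], opts["key"], opts.get("fallback", ""))
        return f"{opts['key']}: {value}"
    return line  # other directives are out of scope for this sketch


def render(template: str, root: Path) -> str:
    """Resolve every line of a context template into plain markdown."""
    return "\n".join(resolve_line(ln, root) for ln in template.splitlines())


# Demo against a throwaway directory with a hypothetical .env file.
template = '# Ports\n@read .env key="API_PORT" fallback="3001"'
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp)
    no_env = render(template, project)  # no .env yet: the fallback applies
    (project / ".env").write_text("API_PORT=4000\n", encoding="utf-8")
    with_env = render(template, project)
print(no_env)
print(with_env)
```

The output contains only resolved values, so the assistant reading it never sees the directives themselves.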

Without Perseus                     With Perseus
────────────────────────────────    ──────────────────────────────
"Port is 3001 (check .env)"    →   Port: 3001
"47 tests (may be stale)"      →   Tests: 597 passing (run 8s ago)
"Check docker ps first"        →   mongo-dev: Up 4h 12m
"Where did we leave off?"      →   Checkpoint: webhook done, pending test run

The Speed Story

The delta is structural, not incremental:

1 directive via runtime tool call: ~50ms (one round-trip)
10,000 directives via Perseus: 0.36 seconds (total, rendered once)
At ~50ms per call, 10,000 sequential tool calls would take ~500 seconds, so that's roughly 1,400× faster for large directive counts
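The ratio can be checked from the two figures above, assuming the runtime tool calls run one after another at ~50ms each:

```python
# Figures quoted in the post.
round_trip_s = 0.050     # ~50 ms per runtime tool call
directives = 10_000
perseus_total_s = 0.36   # one render covering all 10,000 directives

# Sequential round-trips: total time grows linearly with the fact count.
tool_call_total_s = directives * round_trip_s
speedup = tool_call_total_s / perseus_total_s
print(f"{tool_call_total_s:.0f} s vs {perseus_total_s} s -> ~{speedup:,.0f}x")
```

Parallel tool calls would shrink the runtime side of this comparison, so the figure applies to the sequential case.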

With caching (@cache ttl=300), the warm path resolves 500 directives in 0.28 seconds — 40× faster than cold. For a typical project context file (20-50 directives), Perseus finishes before you notice it ran.
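A TTL cache is the usual mechanism behind a warm path like this. The sketch below assumes per-directive caching keyed on the directive text, with an injectable clock so expiry can be tested; the post doesn't describe how Perseus's `@cache ttl=300` actually scopes or stores entries, and `TTLCache` is a hypothetical name.

```python
import time
from typing import Callable


class TTLCache:
    """Cache directive results for `ttl` seconds (hypothetical, not Perseus's code)."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._entries: dict[str, tuple[float, str]] = {}

    def get_or_compute(self, key: str, compute: Callable[[], str]) -> str:
        now = self.clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]      # warm path: nothing is re-run
        value = compute()      # cold path: resolve the directive
        self._entries[key] = (now, value)
        return value


calls: list[str] = []
now = [0.0]  # fake clock value, advanced by hand below


def run_suite() -> str:
    calls.append("ran")
    return "597 passing"


cache = TTLCache(ttl=300, clock=lambda: now[0])
key = '@query "pytest -q"'  # hypothetical directive used as the cache key
first = cache.get_or_compute(key, run_suite)   # cold: runs the command
now[0] = 120.0
second = cache.get_or_compute(key, run_suite)  # warm: within 300 s
now[0] = 301.0
third = cache.get_or_compute(key, run_suite)   # expired: runs again
print(len(calls))  # 2
```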

Multi-Agent: The Swarm Demo

Perseus has a coordination layer called Agora. Multiple agents can write to the same task board simultaneously using filesystem-based atomic locks.
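A common filesystem-only lock relies on `os.open` with `O_CREAT | O_EXCL`, which creates the file only if it doesn't already exist, as a single atomic step. The sketch below assumes that approach (Agora's real lock may differ) and uses hypothetical `file_lock` / `append_task` helpers. It doesn't handle stale locks left by a crashed process, and `O_EXCL` has historically been unreliable on some network filesystems such as older NFS versions.

```python
import os
import tempfile
import threading
import time
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def file_lock(lock_path: Path, timeout: float = 10.0, poll: float = 0.005):
    """Hold an exclusive lock by atomically creating `lock_path`.

    O_CREAT | O_EXCL makes os.open raise FileExistsError if the file
    already exists, so only one contender holds the lock at a time.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() > deadline:
                raise TimeoutError(f"could not acquire {lock_path}")
            time.sleep(poll)
    try:
        os.write(fd, str(os.getpid()).encode())  # note the holder, for debugging
        yield
    finally:
        os.close(fd)
        os.unlink(lock_path)  # release: the next contender can now create it


def append_task(board: Path, entry: str) -> None:
    """Append one line to a shared markdown task board while holding the lock."""
    with file_lock(board.with_name(board.name + ".lock")):
        with board.open("a", encoding="utf-8") as f:
            f.write(entry + "\n")


# Demo: 150 concurrent writers (threads here, standing in for agents).
with tempfile.TemporaryDirectory() as tmp:
    board = Path(tmp) / "board.md"
    writers = [
        threading.Thread(target=append_task, args=(board, f"- [ ] task {i}"))
        for i in range(150)
    ]
    for w in writers:
        w.start()
    for w in writers:
        w.join()
    entries = board.read_text(encoding="utf-8").splitlines()
print(len(entries), len(set(entries)))  # 150 150
```

Because every write happens inside the lock, no two entries interleave, and the board ends up with exactly one intact line per writer.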

To stress-test this, I ran a 120-agent swarm — all 120 agents writing to the same task board, 150 concurrent writes. Result: 9.7 seconds, zero collisions.

No server. No database. Just @agora and @inbox directives resolved to plain markdown.

What Ships

20 directives — @query, @services, @waypoint, @agora, @inbox, @memory, @read, @env, @skills, @session, @date, @health, @agent, @tree, @list, @include, @if/@else/@endif, @constraint, @validate, @cache
Assistant-agnostic — outputs plain markdown. Works with Claude Code, Cursor, Codex, Rovo Dev, and anything else that reads a file
CLAUDE.md / AGENTS.md targets — perseus render --format agents-md outputs an AGENTS.md file that many tools already read
MCP server — 13 tools for any MCP-compatible assistant: perseus mcp serve
Single file, one dependency — perseus.py (~12,000 lines) + pyyaml
Nearly 600 tests, MIT license

Why Not Just Use AGENTS.md?

AGENTS.md is your project's bio. Perseus is your project's heartbeat. One is static text you write once. The other resolves live state every time you render it.

They compose. Perseus can render to AGENTS.md — keep your static instructions, add live state, one file your assistant already reads.

Why Not Just Use MCP?

MCP is runtime. One fact per tool call. Perseus is compile-time — N facts in one file. They compose too: Perseus has its own MCP server that exposes 13 directive tools for assistants that prefer the runtime model.

The right question isn't "MCP or Perseus?" — it's "which facts should arrive before the assistant speaks, and which should it discover on demand?" Perseus handles the first category. MCP handles the second.

Quick Start

pip install perseus-ctx
perseus init                     # scaffold .perseus/context.md
perseus render --format agents-md  # your first live briefing

For Claude Code users:

perseus install --target claude-code  # auto-inject context at session start

Then set up a cron job to re-render every 5 minutes — your assistants always start briefed.

Bottom Line

I built Perseus because I was tired of every AI session starting with "what branch am I on? what's running? where were we?" The assistant should know before it says hello.

If you've felt the same frustration, give it a try. It's MIT licensed, one dependency, and takes 30 seconds to set up. If it saves you even one orientation exchange per session, it's paid for itself.

github.com/tcconnally/perseus | perseus.observer

What's your cold-start routine? Do you use AGENTS.md, Claude hooks, or just re-explain every session? I'm curious how others are solving this.

DEV Community: Thomas Connally