<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Thomas Connally</title>
    <description>The latest articles on DEV Community by Thomas Connally (@perseusai).</description>
    <link>https://dev.to/perseusai</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3949520%2Fe90cbc25-4bd3-4821-bcf1-367fcfa6ed0e.jpg</url>
      <title>DEV Community: Thomas Connally</title>
      <link>https://dev.to/perseusai</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/perseusai"/>
    <language>en</language>
    <item>
      <title>Agent memory and context that never leaves your machine</title>
      <dc:creator>Thomas Connally</dc:creator>
      <pubDate>Tue, 30 Jun 2026 21:54:31 +0000</pubDate>
      <link>https://dev.to/perseusai/agent-memory-and-context-that-never-leaves-your-machine-44j</link>
      <guid>https://dev.to/perseusai/agent-memory-and-context-that-never-leaves-your-machine-44j</guid>
      <description>&lt;p&gt;Most "agent memory" and "agent context" tools today require sending your data to someone else's cloud. If you operate in a regulated, air-gapped, or simply privacy-conscious environment, that rules them out before you've even tried them. I build the opposite: two MIT-licensed, local-first MCP servers that do this work entirely on your own hardware.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem
&lt;/h2&gt;

&lt;p&gt;Agent memory and context assembly are converging on a cloud-only default. That's a non-starter for defense, healthcare, finance, legal, and any team that can't or won't let agent context leave their VPC. It's also just slower and less deterministic than it needs to be: agents re-discover the same facts about your repo and services every session, burning tokens and turns before doing any real work.&lt;/p&gt;

&lt;h2&gt;
  
  
  Mimir: persistent memory, fully offline
&lt;/h2&gt;

&lt;p&gt;Mimir is a single ~8 MB Rust binary. It encrypts everything at rest with AES-256-GCM, and it works with no API key, no model download, and no network access at all, because the embeddings used for dense search are bundled directly into the binary. It's bi-temporal: every fact carries a validity window, so you can query memory "as of" any past point and supersede facts without deleting history. It exposes 43 MCP tools and uses SQLite + FTS5 hybrid search under the hood.&lt;/p&gt;
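
&lt;p&gt;To make the "as of" idea concrete, here is a minimal Python sketch of a validity-window lookup. It is not Mimir's schema or API: the &lt;code&gt;FactHistory&lt;/code&gt; class, its method names, and the sample dates are all hypothetical, and it models only the validity axis, with each window running from its start until the next value takes over.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class FactHistory:
    """Every value one fact has held, ordered by when it became valid (hypothetical)."""
    starts: list = field(default_factory=list)  # valid-from timestamps, ascending
    values: list = field(default_factory=list)  # value that took effect at each start

    def supersede(self, valid_from, value):
        """Record a new value without deleting the ones it replaces."""
        i = bisect_right(self.starts, valid_from)
        self.starts.insert(i, valid_from)
        self.values.insert(i, value)

    def as_of(self, when):
        """Return the value whose validity window contains `when`, or None."""
        i = bisect_right(self.starts, when)
        return self.values[i - 1] if i else None


# ISO-8601 date strings sort chronologically, so plain string comparison works.
port = FactHistory()
port.supersede("2026-01-01", "3001")
port.supersede("2026-03-15", "8080")

print(port.as_of("2026-02-01"))  # value in force before the change
print(port.as_of("2026-06-30"))  # current value
print(port.as_of("2025-12-31"))  # nothing was known yet
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Superseding appends to the history instead of overwriting it, which is what keeps older "as of" queries answerable.&lt;/p&gt;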

&lt;p&gt;One honest tradeoff worth naming: the FTS5 index needed for fast keyword search currently sits over plaintext, even though the underlying record is encrypted at rest. We're upfront about this in the docs rather than overstating the encryption story.&lt;/p&gt;

&lt;h2&gt;
  
  
  Perseus: compile-before-context
&lt;/h2&gt;

&lt;p&gt;Perseus takes a different approach to context than runtime tool-call discovery. Instead of letting an agent rediscover your git state, running services, and test status through a chain of tool calls every session, it compiles all of that into a ready briefing the moment a session starts. The result is deterministic and byte-stable: the same repo state always produces the same compiled context.&lt;/p&gt;
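
&lt;p&gt;One way to picture "byte-stable" is the sketch below, which serializes a state snapshot canonically and compares hashes. It is an illustration under stated assumptions, not Perseus's renderer: Perseus emits markdown, while this uses JSON for brevity, and &lt;code&gt;compile_briefing&lt;/code&gt; and the sample state are hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import hashlib
import json


def compile_briefing(state):
    """Render a state snapshot to bytes with no run-to-run variation (hypothetical)."""
    # sort_keys fixes key order; explicit separators fix whitespace.
    body = json.dumps(state, sort_keys=True, separators=(",", ":"), ensure_ascii=True)
    return body.encode("utf-8")


# The same facts, gathered in a different order on two hypothetical runs.
run_a = {"branch": "main", "tests": "passing", "services": ["api", "mongo-dev"]}
run_b = {"services": ["api", "mongo-dev"], "tests": "passing", "branch": "main"}

digest_a = hashlib.sha256(compile_briefing(run_a)).hexdigest()
digest_b = hashlib.sha256(compile_briefing(run_b)).hexdigest()
print(digest_a == digest_b)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The same reasoning suggests keeping wall-clock timestamps and unordered collections out of the compiled output, since either would change the bytes without any change in repo state.&lt;/p&gt;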

&lt;h2&gt;
  
  
  Honest, reproducible benchmarks
&lt;/h2&gt;

&lt;p&gt;On paraphrased queries, Mimir's bundled offline embeddings hit 91.7% recall@1 and 100% recall@5, versus 4.2% recall@1 for naive keyword search. Perseus holds full answer coverage at a fixed, deterministic context size where tuned RAG baselines start dropping facts at the same budget. Both benchmark harnesses are offline and re-runnable: run them yourself rather than taking our word for it.&lt;/p&gt;
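
&lt;p&gt;For readers unfamiliar with the metric, recall@k is the share of queries whose expected result appears among the top k returned. The sketch below shows that calculation only. The function name and the sample rankings are made up for illustration and are not taken from either benchmark harness.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def recall_at_k(ranked_ids, gold_ids, k):
    """Fraction of queries whose gold document appears in the top k results."""
    hits = sum(1 for ranked, gold in zip(ranked_ids, gold_ids) if gold in ranked[:k])
    return hits / len(gold_ids)


# Hypothetical results for three paraphrased queries (not benchmark data).
ranked = [["m7", "m2", "m9"], ["m4", "m1", "m3"], ["m5", "m8", "m6"]]
gold = ["m7", "m1", "m6"]

print(recall_at_k(ranked, gold, 1))  # only the first query ranks its target first
print(recall_at_k(ranked, gold, 3))  # every target is somewhere in the top three
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;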

&lt;h2&gt;
  
  
  Try it in two minutes
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Mimir (memory)&lt;/span&gt;
docker pull ghcr.io/perseus-computing-llc/mimir:2.7.0

&lt;span class="c"&gt;# Perseus (context)&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;perseus-ctx
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Both work with any MCP client: Claude Code, Cursor, Cline, or a custom agent. Both are listed on the official MCP Registry.&lt;/p&gt;

&lt;p&gt;If your team is building agents somewhere cloud-only memory is a non-starter, &lt;a href="https://perseus.observer/services/" rel="noopener noreferrer"&gt;we take on a small number of integration pilots&lt;/a&gt;: we deploy Mimir and Perseus into your environment and prove recall quality on your own data in 2 to 4 weeks.&lt;/p&gt;

&lt;p&gt;Mimir on GitHub: &lt;a href="https://github.com/Perseus-Computing-LLC/mimir" rel="noopener noreferrer"&gt;https://github.com/Perseus-Computing-LLC/mimir&lt;/a&gt;&lt;br&gt;
Perseus on GitHub: &lt;a href="https://github.com/Perseus-Computing-LLC/perseus" rel="noopener noreferrer"&gt;https://github.com/Perseus-Computing-LLC/perseus&lt;/a&gt;&lt;/p&gt;

</description>
      <category>rust</category>
      <category>ai</category>
      <category>opensource</category>
      <category>mcp</category>
    </item>
    <item>
      <title>The Hidden Tax of AI-Assisted Development (And How I Fixed It)</title>
      <dc:creator>Thomas Connally</dc:creator>
      <pubDate>Sun, 24 May 2026 19:59:45 +0000</pubDate>
      <link>https://dev.to/perseusai/the-hidden-tax-of-ai-assisted-development-and-how-i-fixed-it-286h</link>
      <guid>https://dev.to/perseusai/the-hidden-tax-of-ai-assisted-development-and-how-i-fixed-it-286h</guid>
      <description>&lt;p&gt;Every AI coding session starts the same way. You open your editor, the assistant says hello, and you spend the first five minutes orienting it.&lt;/p&gt;

&lt;p&gt;"What branch am I on?"&lt;br&gt;&lt;br&gt;
"What services are running?"&lt;br&gt;&lt;br&gt;
"Where did we leave off last session?"&lt;br&gt;&lt;br&gt;
"Is the test suite green?"&lt;/p&gt;

&lt;p&gt;It's a tax you pay on every session. Multiply that by days, weeks, and a whole team, and it adds up to a real cost in both time and attention. And tokens, if you're paying by the token.&lt;/p&gt;
&lt;h2&gt;
  
  
  The Industry's Answer: Runtime Tool Calls
&lt;/h2&gt;

&lt;p&gt;The standard solution is to let the assistant figure it out at runtime. With MCP servers, function calling, or Claude Code hooks, the assistant asks "what's running?" mid-conversation, and something answers. Repeat for every fact it needs.&lt;/p&gt;

&lt;p&gt;This works. It's also one round-trip per fact. 50 facts = 50 round-trips. If you're paying for Claude Opus or GPT-5.5 by the token, every one of those orientation questions burns tokens. Quickly.&lt;/p&gt;
&lt;h2&gt;
  
  
  A Different Bet: Resolve Before They Read
&lt;/h2&gt;

&lt;p&gt;I built &lt;strong&gt;Perseus&lt;/strong&gt; to go the other direction. Instead of the assistant discovering facts at runtime, you resolve them at render time, before the assistant ever reads them.&lt;/p&gt;

&lt;p&gt;You write a context file with directives:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;@perseus v0.8

&lt;span class="gh"&gt;# Current State&lt;/span&gt;
@query "git status --short"
@query "git log --oneline -5"

&lt;span class="gh"&gt;# Services&lt;/span&gt;
@services

&lt;span class="gh"&gt;# Last Session&lt;/span&gt;
@waypoint ttl=86400

&lt;span class="gh"&gt;# Ports&lt;/span&gt;
@read .env key="API_PORT" fallback="3001"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Perseus runs those directives, resolves them to live values, and outputs a plain markdown document. Your assistant reads &lt;em&gt;facts&lt;/em&gt;, not instructions to go find facts.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Without Perseus                     With Perseus
────────────────────────────────    ──────────────────────────────
"Port is 3001 (check .env)"    →   Port: 3001
"47 tests (may be stale)"      →   Tests: 597 passing (run 8s ago)
"Check docker ps first"        →   mongo-dev: Up 4h 12m
"Where did we leave off?"      →   Checkpoint: webhook done, pending test run
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
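
&lt;p&gt;The resolve-then-read flow can be sketched in a few lines of Python. This is a toy under stated assumptions, not Perseus's parser: it handles only a simplified form of the &lt;code&gt;@read&lt;/code&gt; line shown earlier, takes file contents from an in-memory dict instead of disk, and the &lt;code&gt;render&lt;/code&gt; and &lt;code&gt;parse_env&lt;/code&gt; helpers and the output format are hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import re

# Matches a simplified, assumed form of the syntax: @read FILE key="K" fallback="V"
READ = re.compile(r'^@read (\S+) key="([^"]*)" fallback="([^"]*)"$')


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blanks and comments."""
    pairs = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            pairs[key.strip()] = value.strip()
    return pairs


def render(template, files):
    """Replace each @read line with its resolved value; pass other lines through."""
    out = []
    for line in template.splitlines():
        match = READ.match(line.strip())
        if match:
            path, key, fallback = match.groups()
            values = parse_env(files.get(path, ""))
            out.append(f"{key}: {values.get(key, fallback)}")
        else:
            out.append(line)
    return "\n".join(out)


template = '# Ports\n@read .env key="API_PORT" fallback="3001"'
print(render(template, {".env": "API_PORT=8080\n"}))  # value found in the file
print(render(template, {}))                           # file missing, fallback used
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The point of the design is visible even at this scale: the output contains only resolved values, so the assistant never sees the directive or has to act on it.&lt;/p&gt;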



&lt;h2&gt;
  
  
  The Speed Story
&lt;/h2&gt;

&lt;p&gt;The delta is structural, not incremental:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;1 directive via runtime tool call: ~50ms (one round-trip)&lt;/li&gt;
&lt;li&gt;10,000 directives via Perseus: &lt;strong&gt;0.36 seconds&lt;/strong&gt; (total, rendered once)&lt;/li&gt;
&lt;li&gt;That's &lt;strong&gt;~23,000× faster&lt;/strong&gt; for large directive counts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With caching (&lt;code&gt;@cache ttl=300&lt;/code&gt;), the warm path resolves 500 directives in 0.28 seconds, 40× faster than cold. For a typical project context file (20 to 50 directives), Perseus finishes before you notice it ran.&lt;/p&gt;

&lt;h2&gt;
  
  
  Multi-Agent: The Swarm Demo
&lt;/h2&gt;

&lt;p&gt;Perseus has a coordination layer called Agora. Multiple agents can write to the same task board simultaneously using filesystem-based atomic locks.&lt;/p&gt;

&lt;p&gt;To stress-test this, I ran a 120-agent swarm: all 120 agents wrote to the same task board, for 150 concurrent writes. Result: &lt;strong&gt;9.7 seconds, zero collisions.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;No server. No database. Just &lt;code&gt;@agora&lt;/code&gt; and &lt;code&gt;@inbox&lt;/code&gt; directives resolved to plain markdown.&lt;/p&gt;
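
&lt;p&gt;For readers who have not used filesystem locks, the sketch below shows one common pattern: creating a lock file with an exclusive-create flag so that only one writer proceeds at a time. It is an assumption about the general technique, not Agora's actual code. The &lt;code&gt;append_with_lock&lt;/code&gt; helper, the &lt;code&gt;.lock&lt;/code&gt; suffix, and the retry interval are hypothetical, and it assumes a local POSIX-style filesystem.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import os
import tempfile
import threading
import time


def append_with_lock(board, line):
    """Append one line to `board`, serialized by an exclusive lock file."""
    lock = board + ".lock"
    while True:
        try:
            # O_CREAT | O_EXCL fails if the file exists, so only one writer wins.
            fd = os.open(lock, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            time.sleep(0.001)  # another writer holds the lock; retry shortly
    try:
        with open(board, "a", encoding="utf-8") as handle:
            handle.write(line + "\n")
    finally:
        os.close(fd)
        os.remove(lock)  # release the lock for the next writer


# Twenty threads stand in for agents writing to one shared task board.
board = os.path.join(tempfile.mkdtemp(), "board.md")
workers = [
    threading.Thread(target=append_with_lock, args=(board, f"- task from agent {n}"))
    for n in range(20)
]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()

with open(board, encoding="utf-8") as handle:
    lines = handle.read().splitlines()
print(len(lines), len(set(lines)))  # lines written, distinct lines
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A production version would also need to handle stale locks left behind by a crashed writer, for example by recording an owner and a timestamp in the lock file.&lt;/p&gt;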

&lt;h2&gt;
  
  
  What Ships
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;20 directives&lt;/strong&gt;: &lt;code&gt;@query&lt;/code&gt;, &lt;code&gt;@services&lt;/code&gt;, &lt;code&gt;@waypoint&lt;/code&gt;, &lt;code&gt;@agora&lt;/code&gt;, &lt;code&gt;@inbox&lt;/code&gt;, &lt;code&gt;@memory&lt;/code&gt;, &lt;code&gt;@read&lt;/code&gt;, &lt;code&gt;@env&lt;/code&gt;, &lt;code&gt;@skills&lt;/code&gt;, &lt;code&gt;@session&lt;/code&gt;, &lt;code&gt;@date&lt;/code&gt;, &lt;code&gt;@health&lt;/code&gt;, &lt;code&gt;@agent&lt;/code&gt;, &lt;code&gt;@tree&lt;/code&gt;, &lt;code&gt;@list&lt;/code&gt;, &lt;code&gt;@include&lt;/code&gt;, &lt;code&gt;@if/@else/@endif&lt;/code&gt;, &lt;code&gt;@constraint&lt;/code&gt;, &lt;code&gt;@validate&lt;/code&gt;, &lt;code&gt;@cache&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Assistant-agnostic&lt;/strong&gt;: outputs plain markdown. Works with Claude Code, Cursor, Codex, Rovo Dev, and anything else that reads a file&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CLAUDE.md / AGENTS.md targets&lt;/strong&gt;: &lt;code&gt;perseus render --format agents-md&lt;/code&gt; writes the AGENTS.md file that AGENTS.md-aware tools already read&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MCP server&lt;/strong&gt;: 13 tools for any MCP-compatible assistant, started with &lt;code&gt;perseus mcp serve&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Single file, one dependency&lt;/strong&gt;: &lt;code&gt;perseus.py&lt;/code&gt; (~12,000 lines) + pyyaml&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Nearly 600 tests, MIT license&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why Not Just Use AGENTS.md?
&lt;/h2&gt;

&lt;p&gt;AGENTS.md is your project's bio. Perseus is your project's heartbeat. One is static text you write once. The other resolves live state every time you render it.&lt;/p&gt;

&lt;p&gt;They compose. Perseus can render &lt;em&gt;to&lt;/em&gt; AGENTS.md: keep your static instructions, add live state, and get one file your assistant already reads.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Not Just Use MCP?
&lt;/h2&gt;

&lt;p&gt;MCP is runtime: one fact per tool call. Perseus is compile-time: N facts in one file. They compose too. Perseus has its own MCP server that exposes 13 directive tools for assistants that prefer the runtime model.&lt;/p&gt;

&lt;p&gt;The right question isn't "MCP or Perseus?" but "which facts should arrive before the assistant speaks, and which should it discover on demand?" Perseus handles the first category. MCP handles the second.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick Start
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;perseus-ctx
perseus init                     &lt;span class="c"&gt;# scaffold .perseus/context.md&lt;/span&gt;
perseus render &lt;span class="nt"&gt;--format&lt;/span&gt; agents-md  &lt;span class="c"&gt;# your first live briefing&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For Claude Code users:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;perseus &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--target&lt;/span&gt; claude-code  &lt;span class="c"&gt;# auto-inject context at session start&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then set up a cron job to re-render every 5 minutes, so your assistants always start briefed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bottom Line
&lt;/h2&gt;

&lt;p&gt;I built Perseus because I was tired of every AI session starting with "what branch am I on? what's running? where were we?" The assistant should know before it says hello.&lt;/p&gt;

&lt;p&gt;If you've felt the same frustration, give it a try. It's MIT-licensed, has one dependency, and takes 30 seconds to set up. If it saves you even one orientation exchange per session, it's paid for itself.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/tcconnally/perseus" rel="noopener noreferrer"&gt;github.com/tcconnally/perseus&lt;/a&gt;&lt;/strong&gt; | &lt;strong&gt;&lt;a href="https://perseus.observer" rel="noopener noreferrer"&gt;perseus.observer&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;What's your cold-start routine? Do you use AGENTS.md, Claude hooks, or just re-explain every session? I'm curious how others are solving this.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>python</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
