There's a moment every developer who builds multi-session AI agents hits. You've spent time crafting a workflow. Your agent is doing real work — analyzing code, researching topics, investigating a problem. It's surfacing insights you genuinely couldn't get to as fast on your own.
Then the session ends.
You start a new one the next day, and your agent introduces itself like it's never met you. It has no memory of what it found. The work it did yesterday is completely gone — not archived, not summarized, just gone. You're back to square one.
I hit this wall enough times that I stopped treating it as a minor inconvenience and started treating it as a fundamental design problem. This post is the origin story of what I'm building to fix it — and the beginning of a series where I'll share everything I'm learning along the way.
The Problem Has a Name: Agent Amnesia
I started calling it agent amnesia, because that's exactly what it is.
Modern LLM-based agents are stateless by default. Each session starts with a fresh context window. Anything the agent learned in a previous session — patterns it identified, facts it discovered, analyses it ran — lives only in that session's context. When the session ends, the context is gone.
This wouldn't matter much if agents were just answering one-off questions. But the direction agentic AI is moving is toward long-running, multi-session, multi-agent workflows. Agents that analyze codebases across many sessions. Agents that do research over days or weeks. Teams of agents that each specialize in something and need to share what they find.
Agent amnesia breaks all of these patterns. And it breaks them silently, in ways that compound.
Here are the three specific scenarios that made me decide to actually do something about it.
Scenario 1: The Re-Analysis Loop
I was using an agent to build up a map of a large codebase — understanding module dependencies, authentication patterns, data models. The agent was good at it. In session one, it mapped out the auth module. In session two, it re-analyzed the same auth module from scratch.
Then in session three, it did it again.
Every session, because it had no memory of what it had already found, it started the same analysis from zero. The token cost was wasted. The time was wasted. And the "knowledge" it had built up across sessions was entirely in my head, not in the system.
Scenario 2: Multi-Agent Knowledge Isolation
I had a two-agent pipeline: Agent-1 scanned code for security issues, Agent-2 wrote the security report.
Agent-1 found a real issue — a SQL injection vector in a specific endpoint. It was in Agent-1's context. When Agent-2 ran to write the report, it had no access to Agent-1's findings unless I manually passed them as part of the prompt. Which I sometimes forgot to do, or did incompletely.
The two agents were islands. There was no shared memory pool. Critical information lived in one context and was invisible to everything else.
Scenario 3: The Long-Running Research Problem
I ran an agent over several days to survey a technical topic — reading papers, summarizing findings, connecting ideas. By day two, the agent had accumulated so many notes that its context window was filling up. As I added new content, the oldest context was being truncated.
The agent started forgetting its own earlier work mid-task. The longer it ran, the less it knew about where it started. Compounding, not accumulating.
What I Tried First (What Didn't Work)
Before building anything new, I tried the obvious fixes.
Dump everything to a text file. This sounds simple and is, for a few sessions. At scale it becomes a blob of unstructured, unranked text that the agent can't navigate efficiently. Relevance is lost in the noise.
Add a vector database. This is the common answer — embed everything, store in Qdrant or Chroma, retrieve by semantic similarity. And it works. But it also requires a persistent service running alongside your agent. For local or offline workflows, it means spinning up infrastructure. And most hosted options meant data leaving my machine.
Make the context window bigger. The context window is not the memory. Bigger context just delays the problem — it doesn't solve it. The agent still loses older context when newer content pushes it out. And you're paying token cost for everything in the window every turn.
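The per-turn cost point is easy to make concrete with back-of-envelope arithmetic. The numbers below are illustrative assumptions (a 200k-token window, a hypothetical $3 per million input tokens), not measurements:

```python
# Back-of-envelope cost of replaying a full context window every turn.
# All numbers here are illustrative assumptions, not real pricing.

def session_token_cost(context_tokens: int, turns: int, price_per_mtok: float) -> float:
    """Cost of a session where every turn re-sends the whole context."""
    return context_tokens * turns * price_per_mtok / 1_000_000

# A 200k-token window replayed over 50 turns at a hypothetical $3 / 1M input tokens:
cost = session_token_cost(context_tokens=200_000, turns=50, price_per_mtok=3.0)
print(f"${cost:.2f}")  # → $30.00 — every turn pays for the whole window again
```

Doubling the window doubles this bill without changing the failure mode: old content still falls off the end eventually.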
None of these felt like they addressed the root cause.
The Reframe That Changed How I Thought About It
Here's the thing I eventually landed on: agent amnesia is not a retrieval problem. It's a memory architecture problem.
The tools that exist are mostly retrieval tools. They assume you've already solved the question of what to store, when to store it, how to structure it, and how to make it available across agents and sessions. They give you a fast way to find things in a pile. But they don't give you the pile.
What agents actually need isn't just a fast search index. They need a memory system that:
- Persists across sessions — survives context window resets
- Promotes observations into knowledge — raw findings consolidate into durable insights over time
- Supports multi-agent sharing — what one agent learns, others can access
- Works without external dependencies — no service to run, no cloud to authenticate against
- Is inspectable — you should be able to see what your agent knows, not just trust that the embeddings are right

When I listed these out, I realized I was describing something that doesn't quite exist yet. Not as a standalone, local-first tool.
The Idea: What If the Filesystem IS the Memory?
The design I landed on is a little unusual.
Instead of adding a database as a dependency, I made the filesystem the memory layer. Agent memories are JSON files, stored in a structured .afs/ directory. Indexing uses SQLite FTS5 and HNSW vector indices — both embedded, no server. Relationships between memories are stored as msgpack graph edges in the same directory.
.afs/
├── agents/
│ └── my-agent/
│ ├── memories/
│ │ ├── working/ ← recent observations (< 24h)
│ │ ├── episodic/ ← full history, searchable
│ │ └── semantic/ ← auto-consolidated knowledge
│ └── indices/
├── sessions/
└── system/logs/
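To make the files-plus-embedded-index idea concrete, here's a minimal sketch. None of this is AFS's real API — `remember`, `recall`, the `fts.db` filename, and the `mem` table are all invented for illustration — but it shows the core move: one JSON file per memory, indexed by SQLite's built-in FTS5, no server anywhere.

```python
import json
import sqlite3
import time
import uuid
from pathlib import Path

# Layout mirrors the directory tree above; function names are hypothetical.
AFS = Path(".afs/agents/my-agent")

def remember(text: str, tier: str = "working") -> Path:
    """Write one observation as a JSON file and index it with SQLite FTS5."""
    mem = {"id": uuid.uuid4().hex, "text": text, "ts": time.time(), "tier": tier}
    path = AFS / "memories" / tier / f"{mem['id']}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(mem, indent=2))  # inspectable with cat / jq / grep

    # FTS5 ships with CPython's bundled SQLite on common platforms.
    (AFS / "indices").mkdir(parents=True, exist_ok=True)
    db = sqlite3.connect(AFS / "indices" / "fts.db")
    db.execute("CREATE VIRTUAL TABLE IF NOT EXISTS mem USING fts5(id, text)")
    db.execute("INSERT INTO mem VALUES (?, ?)", (mem["id"], text))
    db.commit()
    db.close()
    return path

def recall(query: str) -> list[str]:
    """Full-text search over everything the agent has stored."""
    db = sqlite3.connect(AFS / "indices" / "fts.db")
    rows = db.execute("SELECT text FROM mem WHERE mem MATCH ?", (query,)).fetchall()
    db.close()
    return [r[0] for r in rows]

remember("found JWT expiry of 24h in auth module")
print(recall("expiry"))
```

The real system adds HNSW vector indices for semantic search alongside FTS5, but the storage principle is the same: the JSON file is the source of truth, and the indices are rebuildable derivatives.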
The three-tier structure — working, episodic, semantic — loosely mirrors how cognitive science models human memory:
- Working memory is fresh and fast. Raw observations your agent made today.
- Episodic memory is the complete searchable history — every finding with its full provenance.
- Semantic memory is the interesting one: durable knowledge synthesized automatically from episodic memories by a background scheduler. This is where "found JWT expiry of 24h in auth module" becomes "auth system uses short-lived JWT, no refresh token — stateless by design."

The agent stores observations. The system extracts the knowledge.
Why Files?
I want to be transparent about this choice, because it's the one that will get the most pushback.
Files are not the obvious answer. The ecosystem default is a vector DB. There are good reasons for that default.
But here's what files give you that a vector DB doesn't:
Inspectability. You can cat a memory. You can jq it. You can grep through all of them. When your agent behaves unexpectedly, you can actually look at what it knows. This turned out to matter more than I expected for debugging.
Versionability. git add .afs/ && git commit -m "agent knowledge after day 1" is a valid workflow. You can snapshot your agent's knowledge at any point in time. Diff it. Roll it back.
Portability. rsync .afs/ user@other-machine:./ and it just works. No migration scripts. No schema exports.
Zero dependencies. No database server process. No cloud service. No API key. Works offline, works air-gapped, works on a homelab, works on a flight.
The trade-off is scale. If you need millions of memories across thousands of agents with millisecond latency, a dedicated vector DB will outperform this. I've tested to around 100k memories per agent at under 100ms search latency, and that covers the use cases I care about. If you're operating at a different scale, the answer might be different.
Where AFS Stands Now
I've been building this under the name AFS. It's open source and under active development.
⚠️ I want to be upfront: this is early software. APIs change frequently. I'm not sharing it because it's finished — I'm sharing it because I want to build in public, get feedback from people hitting the same problems, and figure out what I'm missing.
What it can do today:
- Persistent agent memory across sessions (working → episodic → semantic, automatic lifecycle)
- Full-text + vector search (FTS5 + HNSW hybrid)
- Multi-agent swarm knowledge sharing
- Auto-built knowledge graph (memories connect automatically)
- Token-budgeted session management with auto-compression
- Audit trail with standardized operation logging
- CLI + Python API, framework-agnostic

What I'm still figuring out:
- How to handle conflicting memories across agents
- Consolidation correctness at edge cases
- The right balance between compression and fidelity in semantic synthesis
Repo: https://github.com/thompson0012/project-afs