Last week I was three sessions deep into refactoring a hybrid retrieval ranker. Two days of decisions: which scoring strategies I'd rejected and why, the FTS5 + RRF math, the source-weight invariants we'd locked in. I closed the laptop. The next morning I asked Claude Code to keep going.
It didn't remember any of it.
It re-suggested the exact ranking approach we'd ruled out 48 hours earlier. Not similar — the same one, with the same flaw I'd already explained.
That was the moment I stopped pretending the AI agent memory problem was a tooling preference and started treating it as a missing layer.
The actual problem with Claude Code memory (and every other agent)
If you live inside Claude Code, Codex, or any MCP client, you've hit at least one of these:
- Session amnesia. Every fresh session starts at zero. Decisions, rejections, constraints — gone.
- RAG returns noise. Vector search surfaces the chunk that looks like your query, not the chunk that records the decision you made.
- Cloud lock-in. Slick managed memory services want your code, your decisions, and your project history living on someone else's box.
The Claude Code built-in MEMORY.md helps a little, but it has a hard practical limit before it stops working — and it's one flat file per project. Once you're juggling three projects, it's not enough.
What I tried before building anything
I'm not in the "build it from scratch" camp by default. I tried the cloud memory services first — the mem0 / Zep / hosted-graph category. They work. They're impressive. They also park your context outside your machine, charge for it, and assume "good retrieval equals good memory." For a solo dev with private repos, that's the wrong trade on every axis.
I also tried treating MEMORY.md as the answer. It's not. A flat markdown file isn't a memory system; it's a sticky note with delusions of grandeur.
So I built waypath.
What waypath actually does — local-first AI memory in one CLI
waypath is a local-first external brain for AI agents. One npm install, one SQLite file, no cloud. It bootstraps Claude Code or Codex sessions, exposes a native MCP server for any MCP-compatible client, and gives you a CLI to recall, page, promote, and govern memory.
Four design choices make it different from "another memory tool."
1. Truth and archive are two different things
Most memory systems mash everything into one store and rank it. waypath keeps them physically separate.
- Truth kernel — canonical decisions, entities, preferences, with temporal validity and supersede.
- Archive kernel — raw evidence, content-hash dedup, FTS5 full-text index.
When the agent asks "what's the current decision on X," it hits truth. When it asks "why did we decide that," it hits archive. The same row never plays both roles. That single split kills a whole class of "the agent is now confidently wrong" failures.
2. Promotion is an explicit gate, not a vibe
Auto-extracting "facts" from conversations sounds great until your memory store is full of confidently-wrong claims an LLM inferred at 2am with no human in the loop.
waypath makes promotion a real verb:
waypath page --subject "hybrid ranker v2 design"
waypath promote --subject "hybrid ranker v2 design"
waypath review-queue --json
Nothing becomes long-term truth without passing a review queue. The agent can suggest. You ratify. That boundary is the difference between a notebook you trust and one you stop opening.
3. Recall is graph-aware, not chunk-aware
Pure vector RAG is keyword soup with extra steps. waypath runs FTS5 + reciprocal rank fusion + graph expansion — when you ask about a decision, it walks entity → relation → related decision, not just "find me a chunk that lexically matches."
Default expansion patterns: project_context, person_context, system_reasoning, contradiction_lookup. The last one is the killer: if your new claim contradicts an old one, it surfaces both, side by side. No silent overwrite.
4. One facade, every host
I didn't want a separate memory tool per agent. waypath is a single facade with 14 verbs behind it, exposed as 26 CLI commands. The Codex shim, the Claude Code shim, and the built-in Codex MCP server are thin adapters on top of the same kernel.
flowchart LR
H[Codex / Claude Code / MCP client] --> F[Facade]
F --> TK[Truth Kernel]
F --> AK[Archive Kernel]
F --> ON[Ontology / graph]
F --> PR[Promotion engine]
The same ~/.waypath/my-project.db file follows you across every agent host — no re-import, no dual write.
60-second quick start
npm install -g waypath
waypath --help
Bootstrap a Codex session with persistent context:
waypath codex --json \
--project my-project \
--objective "ship v2 of the retrieval pipeline" \
--task "refactor hybrid ranker" \
--store-path ~/.waypath/my-project.db
Recall what you and the agent decided last time:
waypath recall --query "hybrid ranker decisions" --json
Run as an MCP server for Claude Code, Cursor, or anything MCP-compatible:
waypath mcp-server --store-path ~/.waypath/my-project.db
That's the funnel. Make it past those four commands and you have an external brain for your agents that survives every restart, panic-quit, and "let me start a fresh session."
What's not done yet (because lying about a v0.1.1 is how trust dies)
waypath is v0.1.1, first public release on 2026-04-17, MIT-licensed, currently sitting at a humble star count and a handful of open issues. Honest list of what's still rough:
- No hosted deployment. Local-first first. Multi-user sync is on the roadmap, not in this release.
- No adaptive ranking feedback loop. Retrieval weights are configurable but not self-tuning yet.
- More source adapters wanted. Notion, Linear, Obsidian — these would be excellent first PRs.
Tests passing at the time of writing: 131 (unit + integration + benchmark). Runtime needs Node ≥ 22; native node:sqlite is used on 22.5+, with auto-fallback to better-sqlite3 (the one optional dependency). Zero required runtime services.
I want your worst use case
If you've ever closed Claude Code mid-session and silently mourned the context you were about to lose — waypath is for you.
npm install -g waypath
GitHub: TheStack-ai/waypath
Three asks, ranked by how much they help me ship the next version:
- Star the repo if "local-first AI agent memory" is a thing you want to exist. It's the strongest signal a maintainer can act on.
- Open an issue with the worst memory failure your agent has ever inflicted on you. I want fixes #1, #2, and #3 designed around real pain, not synthetic benchmarks.
- Tell me in the comments below what you'd plug into the MCP server first. Notion? Obsidian? A team PR archive? Your local Logseq? I'll pick the next adapter based on what shows up here.
Forgetting is a feature for goldfish. Your coding agent should be doing better.
Top comments (1)
Some comments may only be visible to logged-in visitors. Sign in to view all comments.