"Reading YC-Backed Code" is a series where I read the actual source code of Y Combinator-backed products and review their design and algorithms.
Subject: claude-mem
- Repository: github.com/anthropics/claude-mem
- What it does: A plugin that gives Claude Code persistent memory across sessions
- Backed by: Y Combinator (via Anthropic)
- Popularity: Widely used worldwide. Officially recommended by Anthropic
The concept is brilliant. Claude Code resets its memory every time you start a new session. Giving it long-term memory is exactly what many people want.
That said, let me state my conclusion upfront. The idea is great. The implementation is poor.
Background
I'm a fan of local LLMs, so I wanted to run claude-mem with a local model. I forked the repo and started reading the source code — only to find implementation after implementation that lacked basic computer science fundamentals.
Below, I walk through each issue with actual code.
Issue 1: AI Compression Requests Are Sent One at a Time
Here's how claude-mem's memory pipeline works:
Hook fires (on every tool use)
→ Enqueue to pending_messages
→ Send 1 item to Claude API
→ Save AI-compressed result to observations table
→ Delete from pending_messages (raw data lost)
Every single tool use triggers an API request. Every Bash command, every file read — each one fires off a separate request.
If you use 100 tools in a session, that's 100 API requests. Each request carries overhead tokens for the system prompt and other boilerplate. Sending 100 items one at a time means paying that overhead 100 times. Batch processing would reduce it to once. Instead, you're burning 100x the overhead tokens.
On top of that, raw data is discarded after compression. Once an entry moves from pending_messages to observations, the original is gone. If the compression result is inaccurate, there's no way to recover the source data.
What it should look like:
Store the raw data in the DB first. If compression is needed, batch it in appropriate chunks.
Hook fires
→ INSERT raw data directly into DB (no AI needed, completes in 1ms)
Background thread (at its leisure)
→ Batch-compress accumulated data in appropriate chunks
→ Retry on failure, never delete raw data
Separate recording from compression. Recording is non-blocking. Compression is best-effort. Batching instead of per-item API calls also dramatically reduces API costs.
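Here is a minimal sketch of that write-first, batch-later shape (all names are hypothetical; `compressBatch` stands in for a single Claude API call covering many items at once):

```typescript
// Hypothetical sketch: record raw data synchronously, compress in batches later.
type PendingRow = { id: number; content: string };

// Recording path: no AI involved, just an insert. Never blocks on the network.
const pending: PendingRow[] = [];
let nextId = 1;
function record(content: string): void {
  pending.push({ id: nextId++, content });
}

// Split accumulated rows into chunks so one API call covers many tool uses.
function chunk<T>(rows: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < rows.length; i += size) out.push(rows.slice(i, i + size));
  return out;
}

// Background path: one compression request per chunk, not per item.
// On failure the raw rows stay in place and can be retried on the next flush.
async function flush(compressBatch: (rows: PendingRow[]) => Promise<string>) {
  for (const batch of chunk(pending, 20)) {
    try {
      await compressBatch(batch); // 1 API call for up to 20 tool uses
      // mark rows as compressed here, but never delete the raw data
    } catch {
      // leave the batch for the next flush; raw data is still intact
    }
  }
}
```

With 100 tool uses and a chunk size of 20, that is 5 API calls instead of 100, and a failed call loses nothing.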
Issue 2: No Timeouts
The fetch call to the AI has no AbortController.
// Fetch to AI — no timeout
const response = await fetch(url, {
  method: 'POST',
  body: JSON.stringify({ model, messages, ... }),
});
If the AI doesn't respond, this fetch waits forever. The entire worker process hangs, and nothing else in the queue gets processed. Nothing shows up in the logs. From the user's perspective, it just looks like "memory isn't being saved."
This is all it takes:
const controller = new AbortController();
const timeout = setTimeout(() => controller.abort(), 60_000);
try {
  const response = await fetch(url, { signal: controller.signal, ... });
} finally {
  clearTimeout(timeout);
}
Cut it off at 60 seconds and move on. (On recent Node and Bun, AbortSignal.timeout(60_000) replaces the controller-and-setTimeout dance with a single line.)
Issue 3: No Retry Strategy
What happens when a request to the AI fails:
- Parse failure → Retries the same item forever
- API connection error → Just logs it in .catch()
- No "failed" status in the queue → Broken items block the head of the queue indefinitely
With a high-performance model like the Claude API, parse failures are rare. However, users online have reported that their PC "keeps getting slower and slower" after installing this plugin, which is likely this deadlock occurring at low frequency. The fix is well established: mark failed items with a flag and move on, give up after a fixed number of retries, and never let one broken item block subsequent processing. These are the absolute basics of queue management.
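That queue discipline can be sketched in a few lines (hypothetical types; `process` stands in for one compression request):

```typescript
// Hypothetical sketch of bounded retries: after MAX_ATTEMPTS an item is
// marked failed and skipped, so it can never block the head of the queue.
type Item = { id: number; attempts: number; status: "pending" | "done" | "failed" };
const MAX_ATTEMPTS = 3;

async function drain(queue: Item[], process: (item: Item) => Promise<void>) {
  for (const item of queue) {
    if (item.status !== "pending") continue; // failed items are skipped, not retried forever
    try {
      await process(item);
      item.status = "done";
    } catch {
      item.attempts += 1;
      if (item.attempts >= MAX_ATTEMPTS) item.status = "failed"; // give up, move on
    }
  }
}
```

A permanently broken item burns at most MAX_ATTEMPTS requests, and everything behind it still gets processed.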
Issue 4: Liveness and Readiness Confusion
How the installer checks if the worker is ready:
const response = await fetch(`http://127.0.0.1:${port}/api/health`);
if (response.ok) return true; // Consider it "ready"
But /api/health returns 200 the instant the HTTP server starts. DB initialization happens in a background process afterward. The result: the installer reports success while the DB is still 0 bytes.
In standard server operations:
- Liveness probe: Is the process alive? → /api/health
- Readiness probe: Can it handle requests? → /api/readiness
These are two different things. The worker already has /api/readiness (which only returns 200 after initialization is complete), yet the installer was using /api/health.
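A correct installer check polls the readiness endpoint with a deadline, so "server booted" is never mistaken for "DB initialized". A sketch (the endpoint name comes from the worker; everything else here is hypothetical):

```typescript
// Hypothetical sketch: poll /api/readiness, not /api/health, with a deadline.
async function waitUntilReady(baseUrl: string, deadlineMs = 30_000): Promise<boolean> {
  const start = Date.now();
  while (Date.now() - start < deadlineMs) {
    try {
      // /api/readiness returns 200 only after initialization completes
      const res = await fetch(`${baseUrl}/api/readiness`, { signal: AbortSignal.timeout(2_000) });
      if (res.ok) return true;
    } catch {
      // not up yet, or the request timed out; keep polling
    }
    await new Promise((r) => setTimeout(r, 500));
  }
  return false; // report failure instead of claiming success
}
```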
Issue 5: Background Initialization Errors Are Silently Swallowed
this.initializeBackground().catch((error) => {
  logger.error('SYSTEM', 'Background initialization failed', {}, error as Error);
});
If initializeBackground() fails, it just logs it. The user never sees it. The process keeps running. But since the DB was never initialized, every API call fails.
Furthermore, chroma migration and mode loading are executed before DB initialization. If these fail first, DB initialization is never reached. The most critical processing step is placed last in the chain.
Issue 6: Attempting to Substitute with Local LLMs
This section is a report on my attempt to replace the Claude API with local LLMs, driven by frustration with the cost-heavy design. This process helped me discover the algorithmic issues described above.
The one-at-a-time compression design requires very frequent LLM calls. If latency isn't low enough, the queue jams up and nothing works. So I tried progressively smaller models to get acceptable processing speed, but smaller models were fast at the cost of catastrophic accuracy.
Results with a 2B model (qwen3.5-2b):
| Recorded Title | Actual Work | Accuracy |
|---|---|---|
| Enhanced User Feedback System with Real-time Analytics | Fixing plugin installation | Completely fabricated |
| Enhanced Authentication with OAuth2 PKCE Flow | Fixing health check | Completely fabricated |
| Fix: Worker migration execution failure | Fixing DB initialization | Partial (details missing) |
Out of 8 entries, only 2 were accurate (25%). OAuth2 and Real-time Analytics don't exist anywhere in this project.
An 8B model (ELYZA) scored about 40% accuracy. No local LLM proved viable.
At this point, I realized the entire algorithm needed to be rethought. With the ultra-high-performance Claude API it might be practical, but the one-at-a-time compression design is fundamentally flawed. This isn't limited to local LLMs either — the same thing would happen with poor network conditions. Use this tool on spotty Wi-Fi while traveling, and these issues will likely surface.
Overall Assessment
Timeouts, retries, backpressure, liveness/readiness separation — these are the basics of distributed systems. All are missing. The only reason these issues don't surface is that Claude API responses are fast and accurate. A high-performance model is masking the design flaws.
With poor network conditions, the same issues would appear even with the Claude API. On unstable Wi-Fi, a timeout-less fetch hangs, the queue backs up, and the PC slows down. You just don't notice it as long as you're developing on a stable home connection.
The Future of AI-Based Compression
I'm skeptical that AI-based information compression will remain viable long-term.
This technique was popular among gamers during the ChatGPT 3.5 era. I've verified through my own testing that Claude Sonnet can compress information down to roughly 5% of the original in a single pass.
However, when OpenAI upgraded from GPT-3.5 to 4.0, they likely prioritized robustness against prompt injection, and the compression efficiency dropped as a result. Anthropic's models currently maintain high compression efficiency, but it's entirely possible that similar measures will be implemented within six months to a year.
In other words, a design built around AI-based compression may look attractive now, but it carries the risk of breaking when models change their specifications.
Bun's Environment Variable Problem
Additionally, claude-mem uses Bun as its runtime, and Bun's documented behavior is that environment variables propagate to child processes by default (Bun Official Docs - Spawn). If Claude API keys are set as environment variables, they could be passed to unintended child processes, potentially resulting in expensive API charges.
This isn't hypothetical. Vulnerabilities in Claude Code itself have been reported where API tokens can be exfiltrated through project files (CVE-2025-59536 / CVE-2026-21852, The Hacker News). OWASP has also flagged token management and secret exposure as security risks in MCP (MCP01:2025 - Token Mismanagement and Secret Exposure). The issue of Claude Code silently auto-loading .env secrets has also been documented (Knostic).
Putting secrets in environment variables goes against best practices in the first place (Node.js Security - Do not use secrets in environment variables).
"Build It Yourself Then" — So I Did
With all these concerns piling up, I decided to build my own tool that achieves the same goal. Design took 5 minutes, implementation took 15, testing took 15. Done in 35 minutes.
A primitive algorithm is more than sufficient. The fact that this project has generated so much global buzz probably has more to do with timing than technical quality. That's a lesson for me as well: rather than obsessing over perfection, shipping what people need at the right moment is clearly what matters when it comes to getting funded.
Despite its global popularity, nobody seems to have pointed out these design issues. There are no reports like this in the GitHub Issues. There are plenty of "installation walkthrough" articles, but zero articles that actually review the source code. I find that unhealthy.
Do We Even Need AI Compression?
While I was struggling to fix the forked codebase, something clicked.
There's no need for AI compression in the first place.
Claude Code automatically writes all session data to JSONL files under ~/.claude/projects/. The full conversation history is right there.
Just shove it into SQLite and let Claude's 1M token context read the raw data at search time. No compression needed. Claude can understand what happened by reading the raw logs.
The Alternative: claude-relay
So I built one from scratch. claude-mem is about 7,500 lines of TypeScript across 44 files. claude-relay is about 1,600 lines of Rust. Even with that reduction, all the essential features are there. No need for BUN. No need for a plugin architecture, so token overhead is minimal. Deadlocks are prevented. Timeouts are properly implemented. And since it doesn't use AI at all, it costs nothing to run.
Honestly, tools like this probably already exist everywhere. It's not going to make headlines. But that's fine.
Core Idea
Store: JSONL files → Shove directly into SQLite (no AI, completes in 1ms)
Search: Hit SQLite FTS5 (full-text search)
Understand: Pass search results to Claude's context — Claude reads and understands
No AI for recording or compression. "Recording" is just file reading and DB insertion.
Architecture
┌─────────────────────────────────┐
│ ~/.claude/projects/**/*.jsonl │
│ (Claude Code writes these) │
└──────────┬──────────────────────┘
│
↓ On session start or MCP tool call
│ (No daemon. Runs only when needed)
│
┌──────────▼──────────────────────┐
│ Delta ingestion │
│ Read only new lines from │
│ the last byte offset │
└──────────┬──────────────────────┘
│
↓
┌──────────▼──────────────────────┐
│ SQLite │
│ ├─ raw_entries (all raw data) │
│ ├─ raw_entries_fts (FTS) │
│ ├─ sync_state (read position) │
│ └─ summaries (session summaries)│
└──────────┬──────────────────────┘
│
↓ Via MCP tools
┌──────────▼──────────────────────┐
│ Claude Code │
│ "What was I working on?" │
│ "How did I fix that OAuth bug?"│
└─────────────────────────────────┘
No Daemon
JSONL is only written while Claude Code is running. You only want to read memories during a Claude Code session. So "ingest right before reading" is sufficient.
Sync happens at two points:
- SessionStart hook — Ingests data from previous sessions at startup
- MCP tool call — Fallback. Catches anything the hook missed
Both paths are idempotent, with each JSONL file's "last read position" tracked by byte offset.
CREATE TABLE sync_state (
  file_path TEXT PRIMARY KEY,
  last_offset INTEGER DEFAULT 0
);
Store Everything, Filter on Read
No need to overcomplicate it. Store everything from the JSONL into SQLite, and pull out only what you need. That's sufficient for Claude Code's memory.
SELECT * FROM raw_entries
WHERE type IN ('user', 'assistant')
AND date = '2026-03-23'
ORDER BY timestamp;
For heavy users whose sessions grow too large to load, there's an option to limit the readable range to a set number of days.
Table Design
CREATE TABLE raw_entries (
  id INTEGER PRIMARY KEY,
  session_id TEXT NOT NULL,
  timestamp TEXT NOT NULL,
  date TEXT NOT NULL,
  time TEXT NOT NULL,
  type TEXT NOT NULL,
  tool_name TEXT,
  content TEXT NOT NULL,
  cwd TEXT,
  git_branch TEXT,
  created_at TEXT DEFAULT (datetime('now'))
);
CREATE INDEX idx_raw_date ON raw_entries(date);
CREATE INDEX idx_raw_type ON raw_entries(type);
CREATE VIRTUAL TABLE raw_entries_fts USING fts5(
  content, tool_name, session_id
);
claude-mem vs claude-relay
| | claude-mem | claude-relay |
|---|---|---|
| Language | TypeScript (Bun) | Rust |
| AI dependency | AI compression required for recording | No AI needed |
| Persistent process | Express server + worker | None |
| Raw data retention | Deleted after compression | Kept permanently (archivable) |
| Local LLM | Practically unusable | No LLM needed at all |
| Dependencies | node_modules | Single binary |
| API cost | Claude API charges | $0 |
Want to Try It?
Testing has been minimal. It works on my machine (macOS), but I haven't tried other environments.
If you find bugs or can't get it running, please let me know via a GitHub Issue.
I don't accept PRs. I generate tens of thousands of lines a day, so by the time you submit a PR, the codebase has probably been completely rewritten. If you're interested, just fork it. Anyone can build this with vibe coding.
Repository: github.com/veltrea/claude-relay
Licensed under MIT.
Next up: "Reading YC-Backed Code" #2 is TBD. I'll write another one when I find an interesting repo.