DEV Community

Nate Nelson

Why I built a lossless alternative to AI memory summarization

Every AI memory tool I tried summarized my sessions before giving them back to me.

I'd spend an hour debugging a gnarly webhook bug with Claude Code. A week later I'd come back, ask about it, and get a three-sentence LLM summary. The actual fix? Gone. The reasoning trace? Gone. The five wrong attempts before the right one? Summarized into "you worked on webhook authentication."

Summarization is a lossy decision disguised as a convenience. An LLM decides what's worth remembering, and I never get to see what it threw away.

I built Longhand because I didn't want that tradeoff anymore.

The industry is racing in the wrong direction

The mainstream answer to AI memory is "make the context window bigger." 1M tokens. 2M tokens. "Infinite" context. Every model lab is pushing the same axis: make the model carry more state.

This is the wrong abstraction. The model doesn't need to carry the memory. The disk does.

Storage is a solved problem. SQLite shipped in 2000. Chroma shipped in 2023. Both run on a laptop. The "AI memory crisis" is artificial: an industry-wide assumption that memory must live where inference happens, even though it makes the whole system more expensive, less private, and more vendor-locked.

The state of the world, unfiltered

Here's what most people don't realize: Claude Code already writes rich logs of every session. Every tool call. Every file edit. Every thinking block. All of it, verbatim, to JSONL files in ~/.claude/projects/.

Those files contain a forensic-level record of your entire collaboration with the model. Nothing is lossy. Nothing is summarized. It's just sitting there on your disk, right now, for every session you've ever had.
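
Reading those logs verbatim takes a few lines of Python. Each line of a `.jsonl` file is one standalone JSON event; the one-directory-per-project glob below matches how `~/.claude/projects/` is laid out today, but treat that layout as an assumption:

```python
import json
from pathlib import Path

def read_session(path: Path) -> list[dict]:
    """Read one Claude Code session log verbatim: one JSON object per line."""
    events = []
    with path.open() as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines, keep everything else untouched
                events.append(json.loads(line))
    return events

# Example: walk every session log for every project (commented out
# because the path only exists on a machine that has run Claude Code):
# projects_dir = Path.home() / ".claude" / "projects"
# for log in projects_dir.glob("*/*.jsonl"):
#     print(log.name, len(read_session(log)), "events")
```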

The problem is twofold.

First, Claude Code rotates those files off disk after a few weeks. If you don't capture them, they're gone.

Second, every memory tool that tries to "use" them does so by summarizing: asking another LLM to compress the session into a paragraph before handing it back. That's exactly the lossy move I was trying to avoid in the first place.

The architecture

Longhand takes the opposite path. It reads the JSONL files verbatim and indexes them into two local stores:

  • SQLite for structured events — every tool call, edit, commit, thinking block as a typed row with a timestamp and session ID
  • ChromaDB for semantic search — vector embeddings of episode summaries and conversation segments
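
To make the structured side concrete, here's a minimal sketch of what such an event table could look like. The schema is illustrative only; Longhand's actual tables may differ:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE events (
        id         INTEGER PRIMARY KEY,
        session_id TEXT NOT NULL,
        ts         TEXT NOT NULL,   -- ISO-8601 timestamp
        kind       TEXT NOT NULL,   -- 'tool_call' | 'edit' | 'commit' | 'thinking'
        payload    TEXT NOT NULL    -- verbatim JSON from the source log
    )
""")
# Most queries are "everything in this session, in order", so index on that.
conn.execute("CREATE INDEX idx_events_session ON events (session_id, ts)")

conn.execute(
    "INSERT INTO events (session_id, ts, kind, payload) VALUES (?, ?, ?, ?)",
    ("abc123", "2025-01-07T15:17:00Z", "edit", '{"file": "auth.ts"}'),
)
rows = conn.execute(
    "SELECT kind, payload FROM events WHERE session_id = ? ORDER BY ts",
    ("abc123",),
).fetchall()
```

Because the payload column stores the original JSON verbatim, nothing is lost even if the typed columns turn out to be the wrong decomposition later.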

Auto-ingestion runs via a SessionEnd hook that Claude Code fires after every session. A one-off backfill ingests your existing history on install. The data persists from then on: even after Claude Code rotates the source JSONL off disk, Longhand has its own copy.
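
For the curious, Claude Code hooks are registered in its settings file. This is a sketch of what that registration could look like (the exact hook schema varies across Claude Code versions, and the `longhand ingest --latest-session` command is a hypothetical placeholder; `longhand setup` writes the real entry for you):

```json
{
  "hooks": {
    "SessionEnd": [
      {
        "hooks": [
          { "type": "command", "command": "longhand ingest --latest-session" }
        ]
      }
    ]
  }
}
```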

Recall is exposed as an MCP server. Claude Code itself gets 17 tools:

  • recall — fuzzy natural-language query ("that stripe webhook fix from last week")
  • search_in_context — find text across sessions, with surrounding conversation
  • get_session_timeline — chronological replay of a session
  • replay_file — reconstruct the exact state of a file at any point in any session
  • find_commits, get_file_history, recall_project_status, and 10 more
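
Under the hood, a recall tool like this works in two stages: rank sessions semantically, then pull the verbatim rows for the winners. A toy sketch of that shape, where keyword overlap stands in for the real vector search, and the `episodes`/`events` table names are my assumptions rather than Longhand's actual schema:

```python
import sqlite3

def recall(conn: sqlite3.Connection, query: str, limit: int = 3) -> list[tuple]:
    """Toy stand-in for a recall tool: rank sessions by naive keyword
    overlap against stored episode summaries (the real tool would use
    vector embeddings), then return the verbatim events for the
    best-matching sessions, in chronological order."""
    terms = [t.lower() for t in query.split()]
    scored = []
    for sid, summary in conn.execute("SELECT session_id, summary FROM episodes"):
        score = sum(summary.lower().count(t) for t in terms)
        if score:
            scored.append((score, sid))
    scored.sort(reverse=True)

    hits = []
    for _, sid in scored[:limit]:
        hits.extend(conn.execute(
            "SELECT ts, kind, payload FROM events WHERE session_id = ? ORDER BY ts",
            (sid,),
        ).fetchall())
    return hits
```

The key property is the second stage: whatever ranks the sessions, what comes back is the stored record itself, not a paraphrase of it.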

When you ask Claude "do you remember when we fixed X?" it doesn't hallucinate from the last 10K tokens of context. It queries its own history on disk and returns the actual event.

The numbers

After testing against 107 real Claude Code sessions (53,668 events, 665 git operations, 376 problem→fix episodes, 299 conversation segments across 37 projects):

  • Semantic recall across 100+ sessions: ~126ms
  • Storage footprint: ~1GB for a heavy power user, 200–400MB typical
  • API calls per query: zero
  • Summarization per query: zero
  • Network requests: zero
  • Works offline: yes

170 unit tests. Security-audited, zero critical findings. Published on PyPI as longhand. Registered in the official MCP Registry.

What this unlocks

The interesting part isn't the speed. It's what becomes possible once memory lives on your disk instead of in a vendor's context window.

Cross-model portability. Your history isn't locked to any model version. When Claude Opus 5 ships tomorrow, the same Longhand database works unchanged. Switch to a different model entirely? The data is yours.

Privacy by default. Nothing leaves your machine. For regulated workflows, client work under NDA, or anyone who just doesn't want their session history flowing through someone else's servers, this is the only architecture that actually fits.

Forensic replay. Not just "what did we discuss" but "what was the exact state of auth.ts on line 42 at 3:17pm last Tuesday?" — answerable deterministically, because every edit is in the record.
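
A minimal sketch of why replay can be deterministic: if every edit event carries a timestamp plus the text it replaced and the text it wrote, applying them in order up to any cutoff reproduces the file at that moment. The `{'ts', 'old', 'new'}` event shape below is illustrative, not Longhand's actual format:

```python
def replay_file(edits: list[dict], until_ts: str) -> str:
    """Rebuild a file's contents by applying every recorded edit event
    up to (and including) a cutoff timestamp, in chronological order."""
    content = ""
    for edit in sorted(edits, key=lambda e: e["ts"]):
        if edit["ts"] > until_ts:
            break
        if edit["old"]:
            # Replace the span this edit was based on.
            content = content.replace(edit["old"], edit["new"], 1)
        else:
            # An empty 'old' marks a fresh write of the whole file.
            content = edit["new"]
    return content
```

Because the inputs are an append-only record, running this twice with the same cutoff always yields the same bytes; there's no model in the loop to drift.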

Offline work. Airplane, remote location, air-gapped environment. Your memory works. Because it's a SQLite file.

What Longhand doesn't try to do

It's not a general-purpose AI memory system. It's specific to Claude Code's JSONL format.

It won't help you with ChatGPT, Cursor, or any other client that doesn't write per-session logs to disk. (Though the architectural pattern — verbatim capture, local indexing, semantic recall — generalizes cleanly to anything that produces a rich session log.)

It's also not trying to replace the context window. The window is still useful for the current conversation. Longhand handles the rest — the 107 sessions that came before.

Install

```shell
pip install longhand
longhand setup
```

The setup command backfills your existing Claude Code history, installs the auto-ingest hook, and registers as an MCP server. Takes about two minutes on a laptop with a year of sessions. Safe to re-run.

Then try it:

```shell
longhand recall "that webhook fix from last week"
```

Why I'm sharing this

The memory crisis in AI was an artificial constraint — a default that everyone inherited without questioning. I wanted to see what fell out if you rejected the constraint entirely and asked: what if the disk carries the memory, and the model just queries it?

What fell out is Longhand. 336 unique developers have cloned it in the last 14 days. 733 PyPI installs in the same window. 193 weekly visitors on PulseMCP. The curve is bending up, not flattening.

If that resonates, the repo is here: https://github.com/Wynelson94/longhand

MIT licensed. Python 3.10+. 170 tests. Zero API calls. Yours.
