Nate Nelson

Why I built a lossless alternative to AI memory summarization

Every AI memory tool I tried summarized my sessions before giving them back to me.

I'd spend an hour debugging a gnarly webhook bug with Claude Code. A week later I'd come back, ask about it, and get a three-sentence LLM summary. The actual fix? Gone. The reasoning trace? Gone. The five wrong attempts before the right one? Summarized into "you worked on webhook authentication."

Summarization is a lossy decision disguised as a convenience. An LLM decides what's worth remembering, and I never get to see what it threw away.

I built Longhand because I didn't want that tradeoff anymore.

The industry is racing in the wrong direction

The mainstream answer to AI memory is "make the context window bigger." 1M tokens. 2M tokens. Context-infinite. Every model lab is pushing the same axis: make the model carry more state.

This is the wrong abstraction. The model doesn't need to carry the memory. The disk does.

Storage is a solved problem. SQLite shipped in 2000. ChromaDB shipped two years ago. Both run on a laptop. The "AI memory crisis" is artificial — an industry-wide assumption that memory must live where inference happens, even though it makes the whole system more expensive, less private, and more vendor-locked.

The state of the world, unfiltered

Here's what most people don't realize: Claude Code already writes rich logs of every session. Every tool call. Every file edit. Every thinking block. All of it, verbatim, to JSONL files in ~/.claude/projects/.

Those files contain a forensic-level record of your entire collaboration with the model. Nothing is lossy. Nothing is summarized. It's just sitting there on your disk, right now, for every session you've ever had.
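A quick way to see this for yourself is to parse one of those files. Here's a minimal sketch in Python; the field names (`type`, `timestamp`, `name`) are illustrative assumptions standing in for the real schema, not Claude Code's documented format:

```python
import json

# Two illustrative log lines -- the field names ("type", "timestamp",
# "name") are assumptions, not the actual Claude Code schema.
SAMPLE = [
    '{"type": "tool_use", "timestamp": "2025-04-09T14:03:00Z", "name": "Edit"}',
    '{"type": "assistant", "timestamp": "2025-04-09T14:03:05Z"}',
]

def parse_session(lines):
    """Parse a JSONL session log into a list of event dicts, skipping blanks."""
    events = []
    for line in lines:
        line = line.strip()
        if line:
            events.append(json.loads(line))
    return events

events = parse_session(SAMPLE)
print(len(events), events[0]["type"])  # 2 tool_use
```

Point the same loop at a real file under ~/.claude/projects/ and swap in whatever fields your version of Claude Code actually writes.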

The problem is two-fold.

First, Claude Code rotates those files off disk after a few weeks. If you don't capture them, they're gone.

Second, every memory tool that tries to "use" them does so by summarizing — asking another LLM to compress the session into a paragraph before handing it back. Which is the lossy move I was trying to avoid in the first place.

The architecture

Longhand takes the opposite path. It reads the JSONL files verbatim and indexes them into two local stores:

  • SQLite for structured events — every tool call, edit, commit, thinking block as a typed row with a timestamp and session ID
  • ChromaDB for semantic search — vector embeddings of episode summaries and conversation segments
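The SQLite half of that split is conceptually just one typed events table. Here's a minimal sketch of the idea; the column names are illustrative, not Longhand's actual schema:

```python
import sqlite3

# One table of typed rows keyed by session and timestamp.
# Column names are illustrative assumptions, not Longhand's schema.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE events (
        session_id TEXT NOT NULL,
        ts         TEXT NOT NULL,
        kind       TEXT NOT NULL,  -- e.g. tool_call, file_edit, commit, thinking
        payload    TEXT            -- the raw JSON, kept verbatim
    )
""")
conn.execute(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    ("sess-001", "2025-04-09T14:03:00Z", "file_edit", '{"file": "auth.ts"}'),
)

# Structured recall: every event in a session, in chronological order.
rows = conn.execute(
    "SELECT kind, payload FROM events WHERE session_id = ? ORDER BY ts",
    ("sess-001",),
).fetchall()
print(rows[0][0])  # file_edit
```

Nothing here is summarized: the payload column holds the original JSON verbatim, and the typed columns exist only to make it queryable.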

Auto-ingestion runs via a SessionEnd hook that Claude Code fires after every session. A one-off backfill ingests your existing history on install. The data persists forever after that — even after Claude Code rotates the source JSONL off disk, Longhand has its own copy.
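In Claude Code, hooks of this kind live in ~/.claude/settings.json. A SessionEnd entry looks roughly like this (the `longhand ingest` command name is my guess at what gets registered; `longhand setup` wires this up for you):

```json
{
  "hooks": {
    "SessionEnd": [
      {
        "hooks": [
          { "type": "command", "command": "longhand ingest" }
        ]
      }
    ]
  }
}
```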

Recall is exposed as an MCP server. Claude Code itself gets 17 tools:

  • recall — fuzzy natural-language query ("that stripe webhook fix from last week")
  • search_in_context — find text across sessions, with surrounding conversation
  • get_session_timeline — chronological replay of a session
  • replay_file — reconstruct the exact state of a file at any point in any session
  • find_commits, get_file_history, recall_project_status, and 10 more

When you ask Claude "do you remember when we fixed X?" it doesn't hallucinate from the last 10K tokens of context. It queries its own history on disk and returns the actual event.

The numbers

After testing against 107 real Claude Code sessions (53,668 events, 665 git operations, 376 problem→fix episodes, 299 conversation segments across 37 projects):

  • Semantic recall across 100+ sessions: ~126ms
  • Storage footprint: ~1GB for a heavy power user, 200–400MB typical
  • API calls per query: zero
  • Summarization per query: zero
  • Network requests: zero
  • Works offline: yes

170 unit tests. Security-audited, zero critical findings. Published on PyPI as longhand. Registered in the official MCP Registry.

What this unlocks

The interesting part isn't the speed. It's what becomes possible once memory lives on your disk instead of in a vendor's context window.

Cross-model portability. Your history isn't locked to any model version. When Claude Opus 5 ships tomorrow, the same Longhand database works unchanged. Switch to a different model entirely? The data is yours.

Privacy by default. Nothing leaves your machine. For regulated workflows, client work under NDA, or anyone who just doesn't want their session history flowing through someone else's servers, this is the only architecture that actually fits.

Forensic replay. Not just "what did we discuss" but "what was the exact state of auth.ts on line 42 at 3:17pm last Tuesday?" — answerable deterministically, because every edit is in the record.
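Deterministic replay falls out of having every edit stored as an ordered event. A toy sketch of the idea, using a hypothetical `(timestamp, old_text, new_text)` event shape:

```python
# Toy illustration: if every edit event records (old_text, new_text),
# replaying them in timestamp order reconstructs the file at any point.
# The event shape here is hypothetical, not Longhand's actual format.
edits = [
    ("2025-04-08T10:00:00Z",
     "let token = req.query.token",
     "const token = req.query.token"),
    ("2025-04-08T10:05:00Z",
     "const token = req.query.token",
     "const token = verify(req.headers.auth)"),
]

def replay(initial, edits, until):
    """Apply all edits with timestamp <= until, oldest first."""
    content = initial
    for ts, old, new in sorted(edits):  # ISO timestamps sort lexically
        if ts > until:
            break
        content = content.replace(old, new)
    return content

source = "let token = req.query.token"
# State between the two edits:
print(replay(source, edits, "2025-04-08T10:02:00Z"))  # const token = req.query.token
```

Because every step is recorded, any intermediate state is recoverable by choosing a different `until` cutoff; no diffing against the final file required.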

Offline work. Airplane, remote location, air-gapped environment. Your memory works. Because it's a SQLite file.

What Longhand doesn't try to do

It's not a general-purpose AI memory system. It's specific to Claude Code's JSONL format.

It won't help you with ChatGPT, Cursor, or any other client that doesn't write per-session logs to disk. (Though the architectural pattern — verbatim capture, local indexing, semantic recall — generalizes cleanly to anything that produces a rich session log.)

It's also not trying to replace the context window. The window is still useful for the current conversation. Longhand handles the rest — the 107 sessions that came before.

Install

```shell
pip install longhand
longhand setup
```

The setup command backfills your existing Claude Code history, installs the auto-ingest hook, and registers as an MCP server. Takes about two minutes on a laptop with a year of sessions. Safe to re-run.

Then try it:

```shell
longhand recall "that webhook fix from last week"
```

Why I'm sharing this

The memory crisis in AI was an artificial constraint — a default that everyone inherited without questioning. I wanted to see what fell out if you rejected the constraint entirely and asked: what if the disk carries the memory, and the model just queries it?

What fell out is Longhand. 336 unique developers have cloned it in the last 14 days. 733 PyPI installs in the same window. 193 weekly visitors on PulseMCP. The curve is bending up, not flattening.

If that resonates, the repo is here: https://github.com/Wynelson94/longhand

MIT licensed. Python 3.10+. 170 tests. Zero API calls. Yours.

Top comments (23)

Mykola Kondratiuk

ran into this problem managing AI agents - the summary loses the why. three wrong approaches before the right one means the debug history IS the value, not just the outcome.

Nate Nelson

for sure! the why is the important factor, and once it gets thrown away you burn tokens doing the same thing over and over again.

Mykola Kondratiuk

yeah - and the repetition cost isn't just tokens. it's watching an agent confidently rediscover the same dead end

Nate Nelson

that's what triggered the idea. I hated starting a new session or running an agent and watching it go down the same failed approach it did before.

Mykola Kondratiuk

yeah the session boundary problem is real. my partial answer has been a dead-ends.md that agents read at boot - cuts the obvious loops at least. the harder problem is when the failure isn't clean - agent just drifts back to the same branch without realizing it

Nate Nelson

if you run longhand let me know how it works, it's solved a lot of those issues for me.

Mykola Kondratiuk

planning to run it — what does your setup look like? mostly wondering if it handles context carryover across restarts or if there’s still some manual reset logic involved.

Nate Nelson

github.com/Wynelson94/longhand. drop this link into your Claude Code and it can install it, or you can install it from PyPI (pypi.org/project/longhand/). it auto-ingests and carries across sessions; once you install it, it updates, ingests, and runs.

Mykola Kondratiuk

solid — auto-ingest on restart is exactly the piece I was skeptical about. does it store raw history or does it compress/summarize before persisting? curious how it handles sessions that ran for hours before the restart.

Nate Nelson • Edited

stores raw data from the jsonl already on disk. you'll get roughly the last 20 days of sessions ingested on first ingest; then, as the disk rotates them off every twenty days, longhand doesn't, and keeps storing them locally. it can handle any length of session: I've had 3-day-plus sessions, and every message and tool change is stored in longhand. claude creates a file for everything it does but rotates them; longhand stores them in SQLite and ChromaDB locally.
Global stats (all 45 projects Longhand has indexed)

  • 151 sessions — 82,999 events
  • 27,515 tool calls · 5,256 file edits · 1,409 git operations · 81 commits
  • 759 episodes (238 resolved) · 151 outcomes tagged
  • 3,482 segments · 67,984 vectors indexed
  • 224 thinking blocks captured

| Session | Wall time | Events | Tool calls | File edits |
|---|---|---|---|---|
| Apr 9–11 (BSOI-CRM → Longhand origin) | ~65 hrs | 3,877 | 1,267 | 443 |
| Apr 11–12 | ~5 hrs | 2,378 | 431 | 79 |
| Apr 23 (early AM) | ~3 hrs | 1,215 | 325 | 64 |
| Apr 23 (late AM) | ~2 hrs | 1,203 | 324 | 65 |
| Apr 20 | ~3.75 hrs | 685 | 206 | 45 |
| Apr 28 (ingestion fix) | ~38 min | 579 | 165 | 33 |

Mykola Kondratiuk

raw jsonl plus disk rotation is a solid tradeoff — you get full fidelity for the hot window without unbounded growth. the 20-day window makes sense too; most meaningful debugging happens within that range anyway. appreciate the clarification.

Nate Nelson

you actually keep it indefinitely: the jsonl rotates off, but the SQLite and Chroma stay. heavy usage might be a couple GB a year, and that's daily heavy usage

Mykola Kondratiuk

Good point on the persistent layers — SQLite and Chroma staying put means you keep durable semantic search even after the raw JSONL ages out. A couple GB a year for daily heavy usage is basically nothing on a dev machine.

Nate Nelson

with my current usage, which is a few million tokens a day, in 3 months I have just over 1GB in the longhand db; so maybe 4GB a year. my photos and videos are way bigger space strains. Just want to say thank you for the questions and comments, I really appreciate it!

Mykola Kondratiuk

those numbers are actually reassuring — 4GB/year for a few million tokens/day keeps the storage cost squarely in 'external drive' territory, not 'infra problem.' and yeah, photos always win. thanks for sharing the real data — that kind of ground truth is hard to find in these conversations.

Nate Nelson

of course, I try to be upfront because I can't make fixes or address issues if I'm not honest about capability. and if you notice it using more in your own usage it would be good to know, and there are ways to address it; I just haven't seen a need to add a layer for something that hasn't been a bottleneck.

Mykola Kondratiuk

that's the right call — adding a layer before the bottleneck materializes just creates maintenance overhead with no payoff. the transparency piece is underrated too; most storage tools market the p99 happy path and leave you to discover the failure modes yourself.

PEACEBINFLOW

The insight that Claude Code already writes forensic-level session logs and the problem is just that it rotates them off disk—that's the part that reframes the whole thing. It's not a memory generation problem. It's a memory retention problem. The data already exists. It's just on a self-destruct timer because nobody thought to keep it.

What I find myself thinking about is how this pattern reveals a weird assumption baked into the whole AI memory discussion: that memory is the model's job. The industry keeps trying to make models remember more, when the actual solution might be making them forget less—or rather, offloading the remembering to a system that's actually designed for it. SQLite has been doing reliable, lossless recall for a quarter century. An LLM has been doing it for about five minutes. We somehow decided the five-minute-old thing should be in charge.

The cross-model portability angle is bigger than it looks at first glance. Session history stored as structured, typed events in SQLite doesn't care which model generated them. That means your memory isn't just portable across model versions—it's portable across entire model providers. Switch from Claude to something else next year? The database still works. The queries still run. That's the kind of future-proofing that's impossible when your memory is baked into a proprietary context window.

The question that sits with me: if this pattern works for Claude Code sessions, what other AI tools are quietly writing rich logs to disk that we're just not capturing? Cursor must have something. Windsurf. Aider. The logs exist. We're just letting them evaporate.

Nate Nelson

I'm currently working on getting my Codex sessions to log into the same store. I'm beta testing it now (still some bugs), but it lets me switch from Codex to Claude in the same session from one response, then start in a different AI with the same context where I left off. hit a token limit in Claude Code? just jump to Codex with the full map. like you said, SQLite has been around for a while, why not keep those logs and store them. it seemed obvious to me in the moment. if you run longhand I'd love feedback on it; it only gets better by knowing what needs done.

AI Bug Slayer 🐞

Lossless memory storage is a clever approach — summaries inevitably lose context. Keeping raw conversation graphs makes retrieval much more reliable for long-running AI sessions.

Nate Nelson

summarization tools have their place for sure, but I like having the full context for certain tasks and issues. it's especially nice starting a new session; I can build context much more efficiently on a project.
