DEV Community: Nate Nelson

My memory tool said "no session history." The session had 2,526 lines.

Nate Nelson — Thu, 23 Apr 2026 06:13:02 +0000

Source of truth for this post is the repo: github.com/Wynelson94/longhand/blob/main/docs/devto-dogfood-post.md. Edits go through git.

Yesterday I asked Claude Code to pull up where we'd left off on a project I'd been working on a few hours earlier. It's a project called bsoi-mesh-kit — a local STL validator I'm building for a service bureau. The recall tool I built, Longhand, is supposed to handle exactly this question.

The response came back:

recall_project_status("bsoi-mesh-kit") → "No session history found for this project."

Except: there were four JSONL transcripts on disk for that project, including a 2,526-line work session from earlier that day where I'd shipped three version bumps, invited a collaborator, and patched a Pantheon Slicer config bug. The session was real. Longhand had captured none of it.

The rest of this post is the diagnosis and the two releases that came out of it. It's written as a self-contained case study in building a tool that can catch itself in a lie.

What Longhand is, in one paragraph

Longhand is a Python CLI + MCP server that reads Claude Code's session transcripts (~/.claude/projects/**/*.jsonl), indexes every tool call / file edit / thinking block into SQLite + ChromaDB, and exposes semantic recall via MCP tools. Zero API calls. Local-only. The pitch in one line: the model doesn't need to carry the memory; the disk does. The longer pitch is here. Available on PyPI: pip install longhand.

Step 1: confirm the failure is real

First thing I checked was the raw file system:

$ ls ~/.claude/projects/-Users-natenelson/ | grep -E "823dd358|002f6297|e6a3b13f"
002f6297-129e-4d09-b112-c48bd777e3ba.jsonl
823dd358-f32f-4d73-a481-38a05b378966.jsonl
e6a3b13f-3912-4ee3-b9aa-fa4fc509cb29.jsonl

$ wc -l ~/.claude/projects/-Users-natenelson/823dd358*.jsonl
    2526 /Users/natenelson/.claude/projects/-Users-natenelson/823dd358-f32f-4d73-a481-38a05b378966.jsonl

2,526 lines on disk. Now what does SQLite have?

$ sqlite3 ~/.longhand/longhand.db "
    SELECT session_id, project_path, project_id
    FROM sessions
    WHERE transcript_path LIKE '%823dd358%'
       OR transcript_path LIKE '%002f6297%'
       OR transcript_path LIKE '%e6a3b13f%';"

e6a3b13f-3912-4ee3-b9aa-fa4fc509cb29 | /Users/natenelson |
002f6297-129e-4d09-b112-c48bd777e3ba | /Users/natenelson |

Two things jumped out:

The big session (823dd358) isn't in the sessions table at all. Never ingested.
The two shorter sessions are ingested but have project_id = NULL and a project_path of /Users/natenelson — my home directory, not the project.

Two distinct failure modes in one dataset. Time to understand each.

Root cause A: SessionEnd hook didn't fire on the big session

Longhand ingests new sessions via a Claude Code SessionEnd hook that runs longhand ingest-session. The hook was installed and pointed to the right binary. But 823dd358 — the most important session of the day — never got captured by it.

I don't know exactly why the hook didn't fire (Claude Code's exit paths are varied, and a few of them skip SessionEnd). What I know is there was no retry, no log, no detection mechanism. If a hook silently fails, the only way to notice is to manually query something that should have been there and find it missing.

That's the dogfood failure in one sentence: the tool that was supposed to give me observability into my past work silently lost an entire work session, and I only noticed because I happened to ask about that specific session the next day.

Root cause B: project inference was using the first-event cwd

For the two sessions that did get ingested, the project_id was NULL because project_path was /Users/natenelson. Why?

Claude Code launched from my home directory. So the transcript's first event had cwd=/Users/natenelson. Later events — after I cd'd into the project — had cwd=/Users/natenelson/Projects/bsoi-mesh-kit. But Longhand's ingest pipeline only looked at the first event.

A quick scan of the big session confirmed the multi-cwd pattern:

import json

cwds = set()
with open('823dd358-....jsonl') as f:
    for line in f:
        obj = json.loads(line)
        if c := obj.get('cwd'):
            cwds.add(c)

# => {
#   '/Users/natenelson',
#   '/Users/natenelson/Projects/bsoi-mesh-kit',
#   '/Users/natenelson/Projects/bsoi-ops',
# }

Any session where I cd between repos mid-session got misattributed. And since recall_project_status filters WHERE project_id = ?, NULL-project rows are invisible to it.
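Why NULL-project rows vanish from an equality filter is standard SQL behavior: NULL = ? is never true, even when the parameter is itself NULL. A minimal sketch, assuming a stripped-down sessions table (the two-column schema and the helper names here are illustrative, not Longhand's actual code):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (session_id TEXT, project_id TEXT)")
conn.executemany(
    "INSERT INTO sessions VALUES (?, ?)",
    [("a", "bsoi-mesh-kit"), ("b", None), ("c", None)],
)


def sessions_for(project_id):
    """Equality filter, the shape recall_project_status is described as using."""
    rows = conn.execute(
        "SELECT session_id FROM sessions WHERE project_id = ? ORDER BY session_id",
        (project_id,),
    )
    return [r[0] for r in rows]


def orphaned():
    """Rows with no project can only be found with IS NULL."""
    rows = conn.execute(
        "SELECT session_id FROM sessions WHERE project_id IS NULL ORDER BY session_id"
    )
    return [r[0] for r in rows]


print(sessions_for("bsoi-mesh-kit"))  # only the attributed session
print(sessions_for(None))             # empty: NULL = NULL is not true in SQL
print(orphaned())                     # the two misattributed sessions
```

Any query that should surface misattributed sessions has to ask for IS NULL explicitly.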

The v0.6.0 fix

Four changes shipped together:

1. Mode-of-cwd project inference. Tally every event's cwd, filter out $HOME and any path that doesn't walk up to a project marker (.git, pyproject.toml, package.json, …), pick the mode. Multi-project sessions get attributed to the repo where most of the work happened.

from collections import Counter
from pathlib import Path

def _pick_best_project_cwd(events):
    """Return the project root where most events ran, or None if none qualify."""
    home_resolved = Path.home().resolve()
    counts = Counter()
    resolved_cache = {}  # raw cwd string -> project root (str), or None if rejected
    for e in events:
        cwd = e.cwd
        if not cwd:
            continue
        if cwd not in resolved_cache:
            p = Path(cwd).resolve()
            if p == home_resolved:
                root = None  # $HOME never counts as a project
            else:
                root = find_project_root_strict(p)  # returns None if no marker
            resolved_cache[cwd] = str(root) if root else None
        if resolved_cache[cwd]:
            counts[resolved_cache[cwd]] += 1
    return counts.most_common(1)[0][0] if counts else None
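The find_project_root_strict helper called above isn't shown in this post. A minimal sketch of what such a helper could look like, assuming it walks up from the cwd and returns the nearest directory holding a marker (the marker tuple and the body are my assumptions, not the real implementation):

```python
import tempfile
from pathlib import Path
from typing import Optional

# Assumed marker list; the post names these three and then trails off with "…".
PROJECT_MARKERS = (".git", "pyproject.toml", "package.json")


def find_project_root_strict(start: Path) -> Optional[Path]:
    """Return the nearest ancestor (or start itself) containing a marker, else None."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if any((candidate / marker).exists() for marker in PROJECT_MARKERS):
            return candidate
    return None


# Demo: a cwd nested two levels inside a repo still resolves to the repo root.
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp) / "bsoi-mesh-kit"
    nested = repo / "src" / "validators"
    nested.mkdir(parents=True)
    (repo / "pyproject.toml").touch()
    found = find_project_root_strict(nested)
    found_matches_repo = found == repo.resolve()
```

"Strict" here means there is no fallback: a path with no marker anywhere above it yields None instead of being treated as its own project.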

2. A new longhand reconcile [--fix] command. Walks ~/.claude/projects/*/*.jsonl, diffs against the sessions table, buckets into:

Fully indexed
Ingested but project_id IS NULL
Missing from sessions entirely

With --fix it re-ingests the problem buckets. Idempotent (upsert + size-check skip). This is the safety net that was missing.
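The bucketing step can be sketched in a few lines. This is an illustration under stated assumptions, not the real reconcile: the function name is made up, and the sessions table is reduced to the columns that appear in the query earlier in this post.

```python
import sqlite3
import tempfile
from pathlib import Path


def reconcile(projects_dir: Path, conn: sqlite3.Connection) -> dict:
    """Diff on-disk transcripts against the sessions table and bucket them."""
    known = dict(conn.execute("SELECT transcript_path, project_id FROM sessions"))
    buckets = {"indexed": [], "null_project": [], "missing": []}
    for transcript in sorted(projects_dir.glob("*/*.jsonl")):
        key = str(transcript)
        if key not in known:
            buckets["missing"].append(transcript.name)
        elif known[key] is None:
            buckets["null_project"].append(transcript.name)
        else:
            buckets["indexed"].append(transcript.name)
    return buckets


# Demo: three transcripts on disk, two rows in the DB, one of them unattributed.
with tempfile.TemporaryDirectory() as tmp:
    project_dir = Path(tmp) / "projects" / "-Users-example"
    project_dir.mkdir(parents=True)
    for name in ("aaa.jsonl", "bbb.jsonl", "ccc.jsonl"):
        (project_dir / name).touch()

    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sessions (session_id TEXT, transcript_path TEXT, project_id TEXT)"
    )
    conn.executemany(
        "INSERT INTO sessions VALUES (?, ?, ?)",
        [
            ("aaa", str(project_dir / "aaa.jsonl"), "bsoi-mesh-kit"),
            ("bbb", str(project_dir / "bbb.jsonl"), None),
        ],
    )
    report = reconcile(project_dir.parent, conn)
```

A --fix pass would then re-ingest everything in the null_project and missing buckets.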

3. A stale flag on recall_project_status. So the next time a caller queries a project with un-ingested transcripts, they see stale: true and a reason string pointing at reconcile --fix — not silence.

4. Fixed a pre-existing bug in discover_sessions. It was rglob-ing all JSONLs under ~/.claude/projects, including subagent transcripts (in */subagents/ subdirs) and pytest temp dirs. On my machine this inflated the "missing" count from 28 to 650. The fix is three lines and one regret about not catching it sooner.
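The over-counting is easy to reproduce with pathlib. The sketch below contrasts a recursive walk with a one-level glob; it shows one way to exclude nested transcripts and is not necessarily the three-line fix that shipped (both function names are hypothetical):

```python
import tempfile
from pathlib import Path


def discover_recursive(projects_dir: Path) -> list:
    """Recursive walk: also picks up anything nested below a project dir."""
    return sorted(
        p.relative_to(projects_dir).as_posix() for p in projects_dir.rglob("*.jsonl")
    )


def discover_top_level(projects_dir: Path) -> list:
    """One level only: <project-dir>/<session>.jsonl, nothing deeper."""
    return sorted(
        p.relative_to(projects_dir).as_posix() for p in projects_dir.glob("*/*.jsonl")
    )


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "proj" / "subagents").mkdir(parents=True)
    (root / "proj" / "main.jsonl").touch()
    (root / "proj" / "subagents" / "agent.jsonl").touch()
    recursive = discover_recursive(root)
    top_level = discover_top_level(root)

print(recursive)  # includes the subagent transcript
print(top_level)  # only the real session
```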

Then I ran longhand reconcile --fix against my own live DB. 33 sessions re-ingested, 0 errors. The 2,526-line 823dd358 session got correctly attributed to bsoi-mesh-kit. recall_project_status started returning a real narrative. 182 tests passing. Tagged v0.6.0 and pushed; PyPI Trusted Publishing does the release:

git push --follow-tags origin main
# ... 45 seconds later ...
pip install longhand==0.6.0  # live

Step 2: audit the fix

I then asked Claude — in the same session — to give me a "full audit full honesty" of what I'd just shipped. This is the part that matters.

Claude wrote back a multi-page critique. Some of it was flattering (release pipeline, test discipline). Some of it was not:

"The narrative generator leaks garbage into authoritative-looking output. Look at what recall_project_status("bsoi-mesh-kit") returned after I fixed everything:
Outcome: **fixed** · can you pull my bsoi-ops from my git and review the whole program

Recent commits (10)
- cc5f72f no message (today)
- `` no message (today)    ← blank commit hash
- `` no message (today)
... (8 more blanks)
The 'fix summary' is pulling a raw user question. The commit list has nine empty entries. Agents will read this as ground truth."

And another:

"Drift detection is 2.3 seconds per recall_project_status. On every call, we scan all 59 JSONLs looking for cwd matches. That's going to bite at 500+ sessions."

Four classes of issue came out of that audit. Four more fixes — all traceable to the audit's specific findings — shipped as v0.7.0 within the same session:

Narrative cleanup. Commits with empty hashes now get dropped at the extractor (no row written), in SQL (filter), AND in the narrative (render-time guard). The "last fix" trailer now sources from the most-recent episode's fix_summary instead of the outcome classifier's buggy summary field.
longhand doctor grew a "Recent ingest (7d)" row that compares the number of on-disk JSONLs from the last week against sessions-table rows and emits a red ✗ with a reconcile --fix hint when the ratio of ingested sessions to on-disk files falls below 0.5. Catches the next silent hook failure the moment the user runs doctor.
A filesystem-backed drift cache. _detect_project_drift now reads (transcript_path, mtime) → set[canonical_paths] from ~/.longhand/cache/jsonl_project_map.json, keyed on mtime so file edits invalidate automatically. Warm recall_project_status on my live DB dropped from 2,333ms to 68ms, a 34× speedup.
search auto-scopes when the query names a project. If the query hits a known project at fuzzy-match score ≥0.8 and the caller didn't pass a project filter, the search is pre-scoped to that project's events. The response wraps in {auto_scoped_to, auto_scope_hint, hits} so agents can tell the filter applied (and override it if wrong).
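The mtime-keyed cache from the third fix is the most mechanical of the four, so here is a rough sketch of the idea. The JSON layout, function names, and the stats counter are assumptions for illustration; only the (path, mtime) → set-of-paths shape comes from the description above.

```python
import json
import os
import tempfile
from pathlib import Path


def scan_cwds(transcript: Path) -> set:
    """Slow path: read every event in the transcript and collect its cwd."""
    cwds = set()
    with transcript.open() as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            cwd = json.loads(line).get("cwd")
            if cwd:
                cwds.add(cwd)
    return cwds


def cached_cwds(transcript: Path, cache_file: Path, stats: dict) -> set:
    """Return the transcript's cwds, rescanning only when its mtime has changed."""
    cache = json.loads(cache_file.read_text()) if cache_file.exists() else {}
    key = str(transcript)
    mtime = transcript.stat().st_mtime_ns
    entry = cache.get(key)
    if entry and entry["mtime"] == mtime:
        stats["hits"] += 1
        return set(entry["cwds"])
    stats["misses"] += 1
    cwds = scan_cwds(transcript)
    cache[key] = {"mtime": mtime, "cwds": sorted(cwds)}
    cache_file.write_text(json.dumps(cache))
    return cwds


with tempfile.TemporaryDirectory() as tmp:
    transcript = Path(tmp) / "session.jsonl"
    cache_file = Path(tmp) / "jsonl_project_map.json"
    stats = {"hits": 0, "misses": 0}

    transcript.write_text('{"cwd": "/home/u"}\n{"cwd": "/home/u/proj"}\n')
    first = cached_cwds(transcript, cache_file, stats)   # cold: scans the file
    second = cached_cwds(transcript, cache_file, stats)  # warm: served from cache

    # Simulate a later edit: append an event and push the mtime forward.
    with transcript.open("a") as fh:
        fh.write('{"cwd": "/home/u/other"}\n')
    st = transcript.stat()
    os.utime(transcript, ns=(st.st_atime_ns, st.st_mtime_ns + 10_000_000_000))
    third = cached_cwds(transcript, cache_file, stats)   # mtime changed: rescans
```

Keying on mtime means there is no explicit invalidation step; a changed file simply fails the comparison and is rescanned.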

git push --follow-tags → PyPI → v0.7.0 live. 197 tests passing. 45 seconds.

The meta point

Two meaningful releases in one session. Both were driven by a failure the tool itself surfaced. Both were audited by the tool itself after shipping. The tool is its own test harness.

This is the shape I didn't expect when I started. I thought I was building a memory tool — something that stores and retrieves past work. What I actually ended up with is a memory tool that can audit its own memory. When it fails, it fails loudly enough (or I can make it fail loudly enough, on demand) that the failure itself becomes a seed for the next fix.

The industry pitch is "bigger context windows will solve memory." I keep arguing the inverse: the disk already has the memory; you just need a tool that reads it honestly. The last two days have been me testing "reads it honestly" against its own bugs. The tool passed — but only because I forced it to audit itself.

What's still broken

Since this is a dev.to post and not marketing copy, here's the list of things v0.7.0 doesn't fix. These will probably be v0.8.0:

fix_summary still looks rough upstream. The narrative now pulls from episode.fix_summary correctly, but that field itself contains raw thinking-block text with "Intent:" prefixes and mid-code truncations. Fix is ~20 lines in the episode extractor.
Hook is still a single point of failure. doctor now flags silent failures, but only when the user thinks to run doctor. A recall-first user never sees it. Should be inlined into recall and recall_project_status.
Multi-project sessions are winner-takes-all. A session that spent 51% in project A and 49% in project B attributes only to A. Many-to-many attribution is the right shape; it's not built yet.
Auto-scope threshold is a magic 0.8. Not calibrated across ambiguous queries yet.
22 CLI commands + 16 MCP tools is too many. Needs a v1.0 prep pass.
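For the winner-takes-all item, many-to-many attribution could start as something this small. It is a sketch of one possible shape, not a design that exists in Longhand; the function name and the min_share cutoff are hypothetical.

```python
from collections import Counter


def attribute_session(project_roots, min_share=0.1):
    """Map each project root to its share of events, dropping tiny shares.

    project_roots holds one entry per event: the resolved project root,
    or None when the event's cwd did not belong to any project.
    """
    counts = Counter(root for root in project_roots if root)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {
        root: n / total
        for root, n in counts.most_common()
        if n / total >= min_share
    }


# The 51/49 case from the list above, plus a few events outside any project.
roots = ["project-a"] * 51 + ["project-b"] * 49 + [None] * 5
shares = attribute_session(roots)
print(shares)
```

With shares stored per session, a project-status query could include any session above a threshold instead of only the sessions a project "won".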

Try it

pip install longhand==0.7.0
longhand setup         # ingest existing Claude Code history + install hook + register MCP
longhand recall "that bug I fixed last week"

If you're already on an older version:

pip install --upgrade longhand
longhand reconcile --fix   # replay historical sessions with corrected attribution

The source is at github.com/Wynelson94/longhand (MIT). Issues and discussions welcome. If you install it and find a silent failure of your own, please file it — that's the feedback loop that made these two releases happen.

If you built a tool that stores AI session history, how would you test that it's not lying to you? That's the problem Longhand is trying to solve. v0.7.0 is the third time it caught itself; it probably won't be the last.

Why I built a lossless alternative to AI memory summarization

Nate Nelson — Sat, 18 Apr 2026 00:10:40 +0000

Every AI memory tool I tried summarized my sessions before giving them back to me.

I'd spend an hour debugging a gnarly webhook bug with Claude Code. A week later I'd come back, ask about it, and get a three-sentence LLM summary. The actual fix? Gone. The reasoning trace? Gone. The five wrong attempts before the right one? Summarized into "you worked on webhook authentication."

Summarization is a lossy decision disguised as a convenience. An LLM decides what's worth remembering, and I never get to see what it threw away.

I built Longhand because I didn't want that tradeoff anymore.

The industry is racing in the wrong direction

The mainstream answer to AI memory is "make the context window bigger." 1M tokens. 2M tokens. Context-infinite. Every model lab is pushing the same axis: make the model carry more state.

This is the wrong abstraction. The model doesn't need to carry the memory. The disk does.

Storage is a solved problem. SQLite shipped in 2000. ChromaDB has been available for a few years. Both run on a laptop. The "AI memory crisis" is artificial: an industry-wide assumption that memory must live where inference happens, even though that makes the whole system more expensive, less private, and more vendor-locked.

The state of the world, unfiltered

Here's what most people don't realize: Claude Code already writes rich logs of every session. Every tool call. Every file edit. Every thinking block. All of it, verbatim, to JSONL files in ~/.claude/projects/.

Those files contain a forensic-level record of your entire collaboration with the model. Nothing is lossy. Nothing is summarized. It's just sitting there on your disk, right now, for every session you've ever had.

The problem is two-fold.

First, Claude Code rotates those files off disk after a few weeks. If you don't capture them, they're gone.

Second, every memory tool that tries to "use" them does so by summarizing — asking another LLM to compress the session into a paragraph before handing it back. Which is the lossy move I was trying to avoid in the first place.

The architecture

Longhand takes the opposite path. It reads the JSONL files verbatim and indexes them into two local stores:

SQLite for structured events — every tool call, edit, commit, thinking block as a typed row with a timestamp and session ID
ChromaDB for semantic search — vector embeddings of episode summaries and conversation segments

Auto-ingestion runs via a SessionEnd hook that Claude Code fires after every session. A one-off backfill ingests your existing history on install. The data persists forever after that: even after Claude Code rotates the source JSONL off disk, Longhand has its own copy.
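To make the "typed row per event" idea concrete, here is a minimal ingest sketch. The table layout and the "type" and "timestamp" field names are assumptions for illustration; Longhand's real schema is richer than this.

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS events (
    session_id TEXT NOT NULL,
    line_no    INTEGER NOT NULL,
    timestamp  TEXT,
    event_type TEXT,
    raw_json   TEXT NOT NULL,   -- the verbatim line, nothing summarized
    PRIMARY KEY (session_id, line_no)
)
"""


def ingest_lines(conn, session_id, lines):
    """Store each non-blank JSONL line as one row; re-running is idempotent."""
    conn.execute(SCHEMA)
    stored = 0
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        conn.execute(
            "INSERT OR REPLACE INTO events VALUES (?, ?, ?, ?, ?)",
            (session_id, line_no, event.get("timestamp"), event.get("type"), line),
        )
        stored += 1
    conn.commit()
    return stored


conn = sqlite3.connect(":memory:")
sample = [
    '{"type": "user", "timestamp": "2026-04-17T10:00:00Z", "cwd": "/home/u/proj"}',
    "",
    '{"type": "assistant", "timestamp": "2026-04-17T10:00:05Z"}',
]
first_pass = ingest_lines(conn, "demo-session", sample)
second_pass = ingest_lines(conn, "demo-session", sample)  # same rows, no duplicates
row_count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
event_types = [r[0] for r in conn.execute("SELECT event_type FROM events ORDER BY line_no")]
```

Keeping the raw line alongside the typed columns is what makes the store lossless: the structured fields are an index, not a replacement.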

Recall is exposed as an MCP server. Claude Code itself gets 17 tools:

recall — fuzzy natural-language query ("that stripe webhook fix from last week")
search_in_context — find text across sessions, with surrounding conversation
get_session_timeline — chronological replay of a session
replay_file — reconstruct the exact state of a file at any point in any session
find_commits, get_file_history, recall_project_status, and 10 more

When you ask Claude "do you remember when we fixed X?" it doesn't hallucinate from the last 10K tokens of context. It queries its own history on disk and returns the actual event.

The numbers

After testing against 107 real Claude Code sessions (53,668 events, 665 git operations, 376 problem→fix episodes, 299 conversation segments across 37 projects):

Semantic recall across 100+ sessions: ~126ms
Storage footprint: ~1GB for a heavy power user, 200–400MB typical
API calls per query: zero
Summarization per query: zero
Network requests: zero
Works offline: yes

170 unit tests. Security-audited, zero critical findings. Published on PyPI as longhand. Registered in the official MCP Registry.

What this unlocks

The interesting part isn't the speed. It's what becomes possible once memory lives on your disk instead of in a vendor's context window.

Cross-model portability. Your history isn't locked to any model version. When Claude Opus 5 ships tomorrow, the same Longhand database works unchanged. Switch to a different model entirely? The data is yours.

Privacy by default. Nothing leaves your machine. For regulated workflows, client work under NDA, or anyone who just doesn't want their session history flowing through someone else's servers, this is the only architecture that actually fits.

Forensic replay. Not just "what did we discuss" but "what exactly was on line 42 of auth.ts at 3:17pm last Tuesday?" That is answerable deterministically, because every edit is in the record.
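Deterministic replay follows from having every edit in order. A minimal sketch, assuming each recorded edit carries a timestamp plus an old_string/new_string pair (the Edit dataclass and replay_file signature here are illustrative, not Longhand's actual API):

```python
from dataclasses import dataclass


@dataclass
class Edit:
    timestamp: str   # ISO 8601 UTC, so string order matches chronological order
    old_string: str
    new_string: str


def replay_file(initial: str, edits: list, until: str) -> str:
    """Apply edits in time order up to and including `until`."""
    content = initial
    for edit in sorted(edits, key=lambda e: e.timestamp):
        if edit.timestamp > until:
            break
        if edit.old_string not in content:
            raise ValueError(f"edit at {edit.timestamp} does not apply cleanly")
        content = content.replace(edit.old_string, edit.new_string, 1)
    return content


initial = "const retries = 1;\n"
edits = [
    Edit("2026-04-14T15:10:00Z", "retries = 1", "retries = 3"),
    Edit("2026-04-14T16:00:00Z", "retries = 3", "retries = 5"),
]

before_any = replay_file(initial, edits, "2026-04-14T15:00:00Z")
at_317pm = replay_file(initial, edits, "2026-04-14T15:17:00Z")
latest = replay_file(initial, edits, "2026-04-15T00:00:00Z")
```

Because the result depends only on the initial content and the ordered edit log, the same question returns the same answer every time.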

Offline work. Airplane, remote location, air-gapped environment. Your memory works. Because it's a SQLite file.

What Longhand doesn't try to do

It's not a general-purpose AI memory system. It's specific to Claude Code's JSONL format.

It won't help you with ChatGPT, Cursor, or any other client that doesn't write per-session logs to disk. (Though the architectural pattern — verbatim capture, local indexing, semantic recall — generalizes cleanly to anything that produces a rich session log.)

It's also not trying to replace the context window. The window is still useful for the current conversation. Longhand handles the rest — the 107 sessions that came before.

Install

pip install longhand
longhand setup

The setup command backfills your existing Claude Code history, installs the auto-ingest hook, and registers as an MCP server. Takes about two minutes on a laptop with a year of sessions. Safe to re-run.

Then try it:

longhand recall "that webhook fix from last week"

Why I'm sharing this

The memory crisis in AI was an artificial constraint — a default that everyone inherited without questioning. I wanted to see what fell out if you rejected the constraint entirely and asked: what if the disk carries the memory, and the model just queries it?

What fell out is Longhand. 336 unique developers have cloned it in the last 14 days. 733 PyPI installs in the same window. 193 weekly visitors on PulseMCP. The curve is bending up, not flattening.

If that resonates, the repo is here: https://github.com/Wynelson94/longhand

MIT licensed. Python 3.10+. 170 tests. Zero API calls. Yours.