I recently wrote about a session distiller for Claude Code that cuts 70MB sessions down to 7MB. That post covered the what and the how. This one covers the why, with data.
Before I wrote a single line of code, I needed to answer three questions:
- What's actually inside a 70MB session?
- What's safe to throw away?
- How do I prove I'm not losing anything that matters?
Dissecting a 70MB Session
I opened a real 70MB session JSONL and categorized every byte. Here's the breakdown:
- JSON envelope (sessionId, cwd, version, gitBranch): ~54%
- Tool results (Read, Bash, Edit, Write, Agent): ~25%
- Base64 images (screenshots, UI captures): ~12%
- Thinking blocks (internal reasoning): ~4%
- Actual conversation text: ~3%
- Progress lines, file-history-snapshots: ~2%
That first line is the surprise. Every single JSONL line repeats the same envelope fields: sessionId, userType, cwd, version, gitBranch, entrypoint. Same values, every line. On a 70MB file with thousands of lines, that's 38MB of identical JSON keys and values repeated over and over.
The actual conversation, the words you and Claude exchanged, is 3% of the file. Everything else is either redundant metadata or tool output that served its purpose hours ago.
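If you want to reproduce this kind of breakdown on your own sessions, a rough sketch is below. It assumes standard Claude Code JSONL, where each line is a JSON object carrying envelope fields alongside a `message` field; the field list mirrors the breakdown above and the function names are mine, not the distiller's.

```python
import json

# Envelope fields that repeat on every JSONL line (per the breakdown above)
ENVELOPE_FIELDS = {"sessionId", "userType", "cwd", "version", "gitBranch", "entrypoint"}

def measure_line(raw: str) -> dict:
    """Approximate byte counts for one JSONL line, split into
    envelope overhead vs. total size."""
    obj = json.loads(raw)
    envelope = {k: v for k, v in obj.items() if k in ENVELOPE_FIELDS}
    return {
        "total": len(raw.encode("utf-8")),
        "envelope": len(json.dumps(envelope).encode("utf-8")),
    }

def measure_session(lines) -> dict:
    """Sum the per-line measurements across a whole session."""
    totals = {"total": 0, "envelope": 0}
    for raw in lines:
        m = measure_line(raw)
        totals["total"] += m["total"]
        totals["envelope"] += m["envelope"]
    return totals
```

Point `measure_session(open("session.jsonl"))` at a real file and divide `envelope` by `total` to see your own overhead ratio.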
Why Tool Results Are Safe to Strip
Not all tool results are equal. Each tool type has different re-obtainability:
| Tool | Safe to strip? | Why |
|---|---|---|
| Read | Yes | The file is still on disk. Claude can Read it again in 50ms. Keeping 28MB of file contents that haven't changed is pure waste. |
| Bash | Mostly | Build outputs, test runs, git log results. Stale the moment they were captured. Keep head 5 + tail 5 lines: the command itself and whether it succeeded or failed. |
| Edit | Partially | The file path and what changed matter. But we don't need the full file content. Keep old_string and new_string previews (200 chars each), enough to remember the intent. |
| Write | Partially | Same idea as Edit. Keep file path and a head/tail preview. |
| Agent | Keep more | Research reports and analysis from subagents contain synthesized knowledge. Up to 2000 chars preserved. |
| Screenshots | Yes | Base64 images from hours ago showing yesterday's UI state. Claude can't even display them after a session grows past certain limits. |
There's a research finding that backs this up. A JetBrains NeurIPS 2025 study tested two approaches to handling tool outputs in coding agents: observation masking (replacing results with placeholders) vs LLM summarization. Task performance was identical. The model doesn't need the raw output once it's already processed it and responded. The response is the knowledge.
Or as one researcher put it: "When Claude reads 847 lines and responds 'this uses JWT with refresh tokens in httpOnly cookies,' that sentence is the knowledge. The 847 lines were consumed to produce it."
The Senior Engineer Audit
Before shipping, I had an AI agent role-play as a senior engineer and audit the entire distiller design. The audit found real bugs and real design problems, categorized by severity.
P0 — Breaks resume:
- Dangling tool_results. 164 MCP tool messages (chrome-devtools, TodoWrite) were being dropped, but their corresponding `tool_result` blocks in user messages weren't. Resume would show orphan results with no matching tool call. Claude gets confused.
- MCP tool inputs missing. Fields like `selector`, `text`, and `key` from chrome-devtools weren't in the generic fallback handler. Click and type actions became empty hints, then got dropped entirely.
P1 — Silent data corruption:
- Parallel tool call mismatch. The distiller tracked a global `lastToolName` variable. When Claude fires 3 parallel tool calls in one message, only the last tool name gets remembered, and all 3 results get the wrong distillation rule applied. Fix: match results by `tool_use_id` instead of by position.
- 93% reduction too aggressive. The initial approach stripped too much content to hit a target number. The audit revealed that redundant JSON envelope fields account for 54% of file size. Strip the envelope instead of stripping content, and you hit 60-80% reduction while preserving more meaningful data.
P2 — Quality issues:
- Agent result limit too low (600 chars → raised to 2000)
- Bash results only kept tail (headers disappear) → changed to head 5 + tail 5
- Thinking blocks all dropped → now preserve first 200 chars of decision rationale
The parallel tool call bug was the scariest. It would have silently applied Read-stripping rules to Bash results and vice versa. No error, no crash, just wrong data in the output. The kind of bug you don't notice until someone resumes a distilled session and Claude has amnesia about half the commands that were run.
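The fix is worth sketching. Instead of remembering the last tool name, build a map from `tool_use_id` to tool name while processing the assistant message, then look each result up by id. The block field names (`type`, `id`, `name`, `tool_use_id`) follow the Anthropic message format; the surrounding structure is simplified:

```python
def index_tool_calls(assistant_content: list) -> dict:
    """Map tool_use_id -> tool name for every tool_use block in one
    assistant message, so parallel calls can't shadow each other."""
    return {
        block["id"]: block["name"]
        for block in assistant_content
        if block.get("type") == "tool_use"
    }

def route_results(user_content: list, call_index: dict) -> list:
    """Pair each tool_result with the tool that produced it, via its id.
    Results whose call is unknown are flagged as orphans so they can be
    dropped too (avoiding the dangling tool_result bug from the P0 list)."""
    routed = []
    for block in user_content:
        if block.get("type") != "tool_result":
            continue
        tool = call_index.get(block.get("tool_use_id"), "<orphan>")
        routed.append((tool, block))
    return routed
```

With id matching, three parallel calls each get their own rule, and an orphaned result is detected instead of silently passed through.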
Proving Information Retention
I ran the distiller against a real session and then had two independent agents reconstruct what happened in the session from (a) the original and (b) the distilled version. Then I compared:
| Content | Original | Distilled | Missed? |
|---|---|---|---|
| Project objectives and goals | Yes | Yes | — |
| Market research findings | Yes | Yes | — |
| Code architecture decisions | Yes | Yes | — |
| File changes (which files, what changed) | Yes | Yes | — |
| Feature modes and configurations | Yes | Yes (via tooltip/cinematic/plain) | — |
| FFmpeg multi-threading bug details | Yes (detailed) | No | Distilled missed |
| Zoom merge logic bug specifics | Yes (detailed) | No | Distilled missed |
| Event timing bug (end < start) | Yes (detailed) | No | Distilled missed |
| New file creation (tooltip.js, bubble.js) | No | Yes | Original missed |
| Git push SSH configuration | No | Yes | Original missed |
| Demo GIF file sizes | No | Yes (sizes noted) | Original missed |
| API tool switch (Tavily → Exa) | No | Yes | Original missed |
Overall information retention: 85-90%. The distilled version actually caught some things the original-based reconstruction missed, because the distiller preserves conversation text verbatim while the original is so large that even an AI agent skims past details buried in thousands of lines of tool output.
The 10-15% loss is debugging process details, specific error messages buried deep in Bash outputs. These are things you'd need the backup for, which is why the distiller always creates one.
Why Not LLM Summarization?
Claude Code itself uses LLM summarization for its auto-compact feature. There's a good reason for that in their context, and a good reason I chose differently.
Why CC uses summarization:
- Prompt cache economics. Summary goes at the start of the conversation as a stable prefix. Every subsequent API call gets a cache hit on that prefix. Extractive filtering changes the content each time, invalidating the cache. The token savings from distillation get eaten by cache misses.
- API constraints. Claude's API requires strict alternating user/assistant turns. Extractive filtering can easily break that alternation if you're not careful about which messages you keep.
Why I chose extractive:
- Zero cost. No API calls, no tokens burned. The distiller runs in under 2 seconds on a 70MB file using pure JSON parsing.
- Zero hallucination risk. LLM summarization can lose details or introduce inaccuracies. Extractive filtering keeps original words or drops them. Nothing in between.
- Different use case. CC auto-compact runs during live sessions where cache hits matter. Distillation runs offline, after the session, where cache economics don't apply.
The trade-off is real. Summarization gives you better compression. Extractive gives you perfect fidelity on what it keeps. For a tool whose job is "make old sessions resumable without losing context," I'll take fidelity.
Four Versions to Get It Right
The distiller went through four iterations. Here's how each one improved:
| Feature | V1 | V2 | V3 | V4 |
|---|---|---|---|---|
| Output size | 7.2M (90%) | 5.3M (93%) | 7.2M (90%) | 7.1M (90%) |
| Edit handling | Path only | old/new 200ch | old/new 200ch | old/new 200ch |
| Thinking blocks | All dropped | All dropped | First 200ch | First 200ch |
| MCP tool messages | Partial regex | Partial drop | All preserved | All preserved |
| Bash results | Head 300ch | Tail 8 lines | Head 5 + tail 5 | Head 5 + tail 5 |
| Agent results | 300 chars | 600 chars | 2000 chars | 2000 chars |
| Parallel tool safety | Bug | Bug | tool_use_id match | tool_use_id match |
| Envelope stripping | None | None | Strip redundant | Strip redundant |
| Backup + index | — | — | — | Yes |
V2 achieved 93% reduction but was too aggressive. V3 fixed the critical bugs (parallel tools, MCP inputs) and backed off on compression. V4 added the backup/index system so nothing is ever truly lost.
The counterintuitive lesson: more compression isn't better. V2's 93% broke things. V3/V4's 90% preserves everything that matters. The extra 3% bought correctness.
The Envelope Insight
The single biggest optimization wasn't about tool results at all. It was about the JSON envelope.
Every JSONL line in a Claude Code session looks like this:
```json
{"sessionId":"abc-123","userType":"external","version":4,"cwd":"/home/user/project","gitBranch":"main","entrypoint":"cli","message":{...}}
```
That's roughly 150-200 bytes of identical metadata per line. On a session with 15,000 lines, that's 2-3MB of pure repetition. The distiller strips these after the first occurrence: if the sessionId and cwd haven't changed (they never do), subsequent lines only contain the message field.
This one change accounts for a significant chunk of the reduction without touching any content at all. No information lost. Just deduplication.
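A minimal sketch of that dedup is below. The field names come from the example envelope above; the real distiller presumably handles more fields and edge cases, but the core idea fits in one pass:

```python
import json

ENVELOPE_FIELDS = ("sessionId", "userType", "cwd", "version", "gitBranch", "entrypoint")

def strip_envelopes(jsonl_lines):
    """Drop envelope fields whose value matches the first occurrence.
    If a value ever changes, it is kept, so no information is lost."""
    seen = {}
    for raw in jsonl_lines:
        obj = json.loads(raw)
        slim = {}
        for key, value in obj.items():
            if key in ENVELOPE_FIELDS:
                if seen.get(key) == value:
                    continue             # identical to a prior line: drop
                seen[key] = value        # first occurrence (or changed): keep
            slim[key] = value
        yield json.dumps(slim)
```

Because the drop condition is equality against a recorded value, a session whose `gitBranch` actually changes mid-way keeps both values.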
Real Numbers Across Sessions
| Session | Original | Distilled | Reduction |
|---|---|---|---|
| 4-hour coding session (Pagecast) | 70MB | 7.1MB | 90% |
| 2-hour refactor session | 17MB | 353KB | 98% |
| Short debugging session | 465KB | 321KB | 31% |
The pattern: longer sessions with more tool calls benefit more. A short session that's mostly conversation barely shrinks. A marathon session with hundreds of file reads and screenshots drops 90-98%.
CC Auto-Compact Destroys the Original
One detail that forced the backup design: Claude Code's auto-compact is an in-place rewrite. When compact triggers, the original JSONL gets a compact_boundary marker inserted. Old conversation turns are replaced with a summary. Original tool results and thinking blocks disappear. Same file, overwritten.
If you distill a session and later CC auto-compacts the original, the distilled version's backup is the only complete copy that exists. That's not duplication. That's preservation.
70MB in 2026 is nothing. 50 sessions backed up is 3.5GB. One external drive holds thousands of sessions. The cost of keeping everything is negligible. The cost of losing something you needed is not.
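Given that, it can be worth checking whether an original file has already been compacted before trusting it as the source of truth. The exact JSON shape of the marker entry is an assumption on my part, so this sketch deliberately uses a loose substring scan for the `compact_boundary` token mentioned above:

```python
def has_compact_boundary(jsonl_lines) -> bool:
    """Crude check: scan each line for the compact_boundary marker.
    A substring match is deliberately loose, since the exact JSON shape
    of the marker entry may vary across Claude Code versions."""
    return any("compact_boundary" in line for line in jsonl_lines)
```

If this returns True, the pre-compact content is already gone from the original, and the distiller's backup is the complete copy.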
Try It
```shell
npx @mcpware/claude-code-organizer --distill <session.jsonl>
```
The distiller ships with CCO v0.17.0. Dashboard button, CLI flag, API endpoint. Backup is automatic, index is generated, and the original is never modified until you explicitly choose to replace it.
About Me
CS dropout. Building tools for the Claude Code ecosystem. github.com/ithiria894
⭐ Star the repo if you've ever lost context in a bloated session.