TL;DR
Claude Code stores conversations locally in ~/.claude/projects/ as JSONL files. Every message you send hits Anthropic's servers with the full conversation history attached. Running /compact creates a summary checkpoint, shrinking API payloads by ~85%, but you lose granular context. I ran an experiment to see exactly how this works.
Prerequisites
Let's start by installing the tools we'll use to dig in:
brew install mitmproxy
brew install jq
pip install tiktoken
mitmproxy is a free, open source HTTPS proxy that lets you intercept and inspect HTTPS traffic.
I ran mitmproxy once to generate SSL certificates:
mitmproxy
# Press q to quit after it starts
Certs get created at ~/.mitmproxy/
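Before pointing Claude Code at the proxy, it's worth confirming the cert actually exists. A small sketch (the helper name is mine; ~/.mitmproxy is mitmproxy's default location):

```python
from pathlib import Path
from typing import Optional

def mitmproxy_ca_cert(base: str = "~/.mitmproxy") -> Optional[Path]:
    """Return the path to mitmproxy's CA certificate, or None if not generated yet."""
    cert = Path(base).expanduser() / "mitmproxy-ca-cert.pem"
    return cert if cert.is_file() else None

# Example: point NODE_EXTRA_CA_CERTS at this path if it is not None
```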
Setting up the experiment
I created a fresh directory:
mkdir ~/compact-experiment && cd ~/compact-experiment
Started mitmproxy in one terminal:
mitmproxy
Once it was running, I pressed f and set the filter to api.anthropic.com so only requests to Anthropic's servers show up.
In another terminal, configured Claude Code to route through the proxy:
export HTTPS_PROXY=http://127.0.0.1:8080
export HTTP_PROXY=http://127.0.0.1:8080
export NODE_EXTRA_CA_CERTS=~/.mitmproxy/mitmproxy-ca-cert.pem
cd ~/compact-experiment
claude
Building conversation history
I ran 10 prompts to build up a realistic conversation:
1. Create a Python FastAPI server with a health endpoint at /health
2. Add a POST /users endpoint that accepts name and email with pydantic validation
3. Add rate limiting using slowapi with max 100 requests per 15 minutes per IP
4. Write pytest unit tests for the rate limiter
5. Add centralized exception handling
6. Add request logging middleware
7. Create a /users GET endpoint with pagination
8. Add input sanitization using bleach
9. Add a config module using pydantic settings
10. Add graceful shutdown handling
At this point, mitmproxy showed each request going to api.anthropic.com/v1/messages with increasingly large payloads.
Capturing pre-compaction state
Found my JSONL file:
ls ~/.claude/projects/
Copied it:
cp ~/.claude/projects/<project-folder>/*.jsonl ~/compact-experiment/pre-compact.jsonl
Checked the size:
$ wc -l pre-compact.jsonl
24 pre-compact.jsonl
$ ls -lh pre-compact.jsonl
-rw-r--r-- 1 user staff 38K pre-compact.jsonl
Looked at the structure:
$ head -1 pre-compact.jsonl | jq .
User message entry:
{
"uuid":"msg_01XK7Qv...",
"type":"human",
"message":{
"role":"user",
"content":"Add rate limiting using slowapi with max 100 requests per 15 minutes per IP"
},
"timestamp":"2025-01-17T10:23:45.123Z",
"sessionId":"sess_abc123..."
}
Assistant message entry:
{
"uuid":"msg_02YL8Rw...",
"type":"assistant",
"message":{
"role":"assistant",
"content":"I'll add rate limiting using slowapi..."
},
"timestamp":"2025-01-17T10:23:52.456Z",
"sessionId":"sess_abc123..."
}
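With those two entry shapes in mind, a quick way to get an overview of a session file is to tally entries by their top-level type field, one JSON document per line (a sketch; tally_entry_types is my own helper, not part of Claude Code):

```python
import json
from collections import Counter

def tally_entry_types(path: str) -> Counter:
    """Count JSONL entries by their top-level 'type' field."""
    counts = Counter()
    with open(path) as f:
        for line in f:
            if line.strip():
                counts[json.loads(line).get("type", "unknown")] += 1
    return counts

# Example: tally_entry_types("pre-compact.jsonl")
```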
Capturing API request payload
In mitmproxy, I selected the latest POST request, pressed Enter to view it, then e → request body, and saved it to ~/compact-experiment/pre-compact-request.json
Counting tokens
tiktoken is OpenAI's tokenizer library. Anthropic uses its own tokenizer, so the counts are approximate, but cl100k_base is close enough for relative comparisons.
count_tokens.py:
import json
import os
import sys

import tiktoken

# cl100k_base is OpenAI's encoding; Anthropic's differs, but it's a usable approximation
enc = tiktoken.get_encoding("cl100k_base")

def text_of(content):
    # Anthropic message content can be a plain string or a list of content blocks
    if isinstance(content, str):
        return content
    return " ".join(b.get("text", "") for b in content if isinstance(b, dict))

filepath = sys.argv[1]
with open(filepath) as f:
    messages = json.load(f).get("messages", [])

tokens = sum(len(enc.encode(text_of(m.get("content", "")))) for m in messages)
print(f"Messages: {len(messages)}")
print(f"Tokens: {tokens}")
print(f"Bytes: {os.path.getsize(filepath)}")
$ python count_tokens.py pre-compact-request.json
Messages: 21
Tokens: 14280
Bytes: 41847
Running compaction
In Claude Code:
/compact
Waited a few seconds for confirmation.
Capturing post-compaction state
Copied the updated JSONL:
cp ~/.claude/projects/<project-folder>/*.jsonl ~/compact-experiment/post-compact.jsonl
Compared:
$ wc -l pre-compact.jsonl post-compact.jsonl
24 pre-compact.jsonl
27 post-compact.jsonl
$ ls -lh pre-compact.jsonl post-compact.jsonl
-rw-r--r-- 1 user staff 38K pre-compact.jsonl
-rw-r--r-- 1 user staff 41K post-compact.jsonl
The post-compact file is larger. Compaction doesn't delete history; it appends a compact-boundary record.
The compact boundary
$ grep -i "compact_boundary" post-compact.jsonl | jq .
{
"parentUuid":null,
"logicalParentUuid":"24aae814-9669-4858-aefc-6c12cb1b023f",
"isSidechain":false,
"userType":"external",
"cwd":"/Users/<username>/srccode/compact-conv-experiment",
"sessionId":"596539c1-3d3d-46c4-af79-346807029a3d",
"version":"2.1.11",
"gitBranch":"",
"slug":"idempotent-herding-marshmallow",
"type":"system",
"subtype":"compact_boundary",
"content":"Conversation compacted",
"isMeta":false,
"timestamp":"2026-01-17T16:47:51.316Z",
"uuid":"d79a9fff-b6f9-4877-99d0-1c88267f26b6",
"level":"info",
"compactMetadata":{
"trigger":"manual",
"preTokens":37418
}
}
This is the compact boundary. A few things stand out:
- `type` is `system` and `subtype` is `compact_boundary`
- `compactMetadata.trigger` shows it was manual (via the `/compact` command)
- `compactMetadata.preTokens` captures the token count before compaction (37,418 in my case)
- `logicalParentUuid` links to the last message before compaction
Future API requests use the summary generated at this boundary instead of the full history.
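The boundary is easy to locate programmatically too. A sketch that scans a session file for compact_boundary records (find_compact_boundaries is a hypothetical helper of mine; the type/subtype field names match the record above):

```python
import json

def find_compact_boundaries(path: str) -> list:
    """Return every compact_boundary entry in a Claude Code session file."""
    found = []
    with open(path) as f:
        for line in f:
            if not line.strip():
                continue
            entry = json.loads(line)
            if entry.get("type") == "system" and entry.get("subtype") == "compact_boundary":
                found.append(entry)
    return found

# Example:
# for b in find_compact_boundaries("post-compact.jsonl"):
#     meta = b.get("compactMetadata", {})
#     print(meta.get("trigger"), meta.get("preTokens"))
```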
Capturing the post-compaction request
Sent one more message:
Add a /metrics endpoint that returns request count and average response time
Saved the new request payload from mitmproxy to post-compact-request.json
$ python count_tokens.py post-compact-request.json
Messages: 3
Tokens: 1820
Bytes: 6234
Comparing payloads
compare.py:
import json
import os

import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def text_of(content):
    # content can be a plain string or a list of content blocks
    if isinstance(content, str):
        return content
    return " ".join(b.get("text", "") for b in content if isinstance(b, dict))

def analyze(path):
    """Return (message count, approx token count, file size) for a request payload."""
    with open(path) as f:
        msgs = json.load(f).get("messages", [])
    tokens = sum(len(enc.encode(text_of(m.get("content", "")))) for m in msgs)
    return len(msgs), tokens, os.path.getsize(path)

pre = analyze("pre-compact-request.json")
post = analyze("post-compact-request.json")

print(f"{'Metric':<20} {'Pre':>10} {'Post':>10} {'Reduction':>12}")
print("=" * 55)
print(f"{'Messages':<20} {pre[0]:>10} {post[0]:>10} {pre[0]-post[0]:>12}")
print(f"{'Tokens':<20} {pre[1]:>10} {post[1]:>10} {(pre[1]-post[1])/pre[1]*100:>11.1f}%")
print(f"{'Bytes':<20} {pre[2]:>10} {post[2]:>10} {(pre[2]-post[2])/pre[2]*100:>11.1f}%")
$ python compare.py
Metric Pre Post Reduction
=======================================================
Messages 21 3 18
Tokens 14280 1820 87.3%
Bytes 41847 6234 85.1%
Results summary
JSONL file
| Metric | Pre compact | Post compact |
|---|---|---|
| Lines | 24 | 27 |
| Size | 38 KB | 41 KB |
Local history grows because compact adds a boundary record; it doesn't remove anything.
API request payload
| Metric | Pre compact | Post compact | Reduction |
|---|---|---|---|
| Messages | 21 | 3 | 86% |
| Tokens | 14,280 | 1,820 | 87% |
| Bytes | 41.8 KB | 6.2 KB | 85% |
What's in the post-compact payload
$ jq -r '.messages[] | .role' post-compact-request.json
user
assistant
user
System context now contains the summary. Only messages after the compact boundary are sent as individual entries.
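You can cross-check this against the local file: only entries after the last boundary correspond to the messages sent individually. A sketch under that assumption (entries_after_last_boundary is a hypothetical helper):

```python
import json

def entries_after_last_boundary(path: str) -> list:
    """Return entries after the last compact_boundary line; the whole file if none."""
    with open(path) as f:
        entries = [json.loads(line) for line in f if line.strip()]
    last = -1
    for i, entry in enumerate(entries):
        if entry.get("subtype") == "compact_boundary":
            last = i
    return entries[last + 1:]

# Example: entries_after_last_boundary("post-compact.jsonl")
```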
Files generated
~/compact-experiment/
├── pre-compact.jsonl
├── post-compact.jsonl
├── pre-compact-request.json
├── post-compact-request.json
├── count_tokens.py
└── compare.py
Takeaways
- Local JSONL keeps everything; compact doesn't delete history
- The compact boundary is a system record with `subtype: compact_boundary` that marks the checkpoint
- `compactMetadata` tracks the trigger type and pre-compaction token count
- Post-compact requests send only the summary plus recent messages to Anthropic
- ~85% reduction in payload size
- The summary is lossy: it captures what was built but loses implementation details
Use /compact when switching tasks or hitting context limits. Avoid it mid task when you need to reference earlier decisions.
Curious about how other coding assistants manage long conversation history.