<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Dibyanshu kumar</title>
    <description>The latest articles on DEV Community by Dibyanshu kumar (@dibyanshu_kumar).</description>
    <link>https://dev.to/dibyanshu_kumar</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3839151%2F584d08f6-a9e7-4a37-bc46-f26cd6e8f3cd.jpg</url>
      <title>DEV Community: Dibyanshu kumar</title>
      <link>https://dev.to/dibyanshu_kumar</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/dibyanshu_kumar"/>
    <language>en</language>
    <item>
      <title>How I Made Claude Code Finish Tasks That Outlast Its Memory</title>
      <dc:creator>Dibyanshu kumar</dc:creator>
      <pubDate>Thu, 02 Apr 2026 15:44:40 +0000</pubDate>
      <link>https://dev.to/dibyanshu_kumar/how-i-made-claude-code-finish-tasks-that-outlast-its-memory-39ji</link>
      <guid>https://dev.to/dibyanshu_kumar/how-i-made-claude-code-finish-tasks-that-outlast-its-memory-39ji</guid>
      <description>&lt;h1&gt;
  
  
  How I Made Claude Code Finish Tasks That Outlast Its Memory
&lt;/h1&gt;

&lt;p&gt;You know the moment. You're 45 minutes into a big task — processing a batch of files, refactoring a module, running a multi-phase workflow. Claude has been crushing it. Then the responses get vague. It starts repeating itself. It forgets what it already did. Context window: full.&lt;/p&gt;

&lt;p&gt;You start a new conversation. Re-explain everything. Hope it picks up where it left off. It doesn't — it redoes half the work, skips the other half, and misses the files that were tricky the first time around.&lt;/p&gt;

&lt;p&gt;I got tired of being the relay mechanism between conversations. So I built one.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Idea: Tag-Team Relay
&lt;/h2&gt;

&lt;p&gt;The concept is stolen from wrestling. When a wrestler is gassed, they tag in a fresh partner. The fresh partner knows the game plan, knows what's been tried, and picks up where the last one left off.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;/tag-team&lt;/code&gt; does this for Claude Code. A lightweight dispatcher spawns worker agents one at a time. Each worker makes as much progress as possible, then writes a structured handoff file before its context fills up. The next worker reads that file and continues.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/tag-team Extract JSON docs from all 200 files in /tmp/batch.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. Three workers later, all 200 files are processed. Zero re-prompting.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Actually Happens
&lt;/h2&gt;

&lt;p&gt;The dispatcher — your main conversation — stays lean. It never does the real work. It just:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Spawns Worker 1 with the task&lt;/li&gt;
&lt;li&gt;Worker 1 processes files until context gets high, writes a handoff file, returns&lt;/li&gt;
&lt;li&gt;Dispatcher reads the result, spawns Worker 2 with a pointer to the handoff&lt;/li&gt;
&lt;li&gt;Worker 2 reads the handoff, continues from exactly where Worker 1 stopped&lt;/li&gt;
&lt;li&gt;Repeat until done or max iterations reached&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The key insight: the Agent tool returns only a short summary to the dispatcher. So even after 10 iterations, the dispatcher's context has barely grown. It can keep spawning workers indefinitely.&lt;/p&gt;
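
&lt;p&gt;As a sketch, the whole relay fits in a few lines of Python. Here &lt;code&gt;spawn_worker&lt;/code&gt; is a hypothetical stand-in for the Agent tool call, and the prompt strings are illustrative, not the skill's exact wording:&lt;/p&gt;

```python
def run_relay(task, spawn_worker, max_iterations=10):
    """Relay loop sketch: each worker returns only a short summary,
    so the dispatcher's context barely grows per iteration."""
    iteration = 0
    while iteration != max_iterations:
        if iteration == 0:
            prompt = f"Task: {task}"
        else:
            prompt = f"Read handoff-{iteration:03d}.md and continue the task."
        summary = spawn_worker(prompt)  # short string, not a full transcript
        if summary.startswith("ALL_DONE"):
            return summary
        iteration += 1
    return "MAX_ITERATIONS: resume with /tag-team resume"
```

&lt;p&gt;Because each call hands back a one-line summary rather than a transcript, ten iterations cost the dispatcher a handful of short strings.&lt;/p&gt;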

&lt;h2&gt;
  
  
  The Part That Actually Matters: Handoff Files
&lt;/h2&gt;

&lt;p&gt;The handoff file is the entire trick. Without it, this is just "start a new conversation and hope for the best." With it, each worker has perfect context about what happened before — without needing conversation history.&lt;/p&gt;

&lt;p&gt;Here's what a real handoff looks like (abbreviated):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Mission&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Goal: Extract JSON documentation from all 200 API endpoint files
&lt;span class="p"&gt;-&lt;/span&gt; Status: 40% complete (80 of 200 files processed)
&lt;span class="p"&gt;-&lt;/span&gt; Next step: Continue from file 81 (src/api/payments/refund.ts)

&lt;span class="gu"&gt;## Key Decisions&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Decisions made: Using the compact schema format, not the verbose one.
  Output goes to docs/api/{module}/{endpoint}.json
&lt;span class="p"&gt;-&lt;/span&gt; Dead ends: Tried batch-reading 20 files at once — hit context limits
  fast. Switched to batches of 6 with interleaved writes.

&lt;span class="gu"&gt;## Progress&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Completed: Files 1-80 (auth/&lt;span class="ge"&gt;*, users/*&lt;/span&gt;, orders/&lt;span class="err"&gt;*&lt;/span&gt;)
&lt;span class="p"&gt;-&lt;/span&gt; In progress: None (clean handoff)
&lt;span class="p"&gt;-&lt;/span&gt; Remaining: Files 81-200 (payments/&lt;span class="ge"&gt;*, inventory/*&lt;/span&gt;, shipping/&lt;span class="ge"&gt;*, admin/*&lt;/span&gt;)

&lt;span class="gu"&gt;## Resume Instructions&lt;/span&gt;
Read the file list from /tmp/batch.txt. Skip to line 81. Process files
in batches of 6: read 6 files, write their JSON docs, repeat. Output
path pattern: docs/api/{module}/{endpoint}.json. Use the compact schema
format — see docs/api/auth/login.json for reference.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice the &lt;strong&gt;Dead ends&lt;/strong&gt; section. Worker 1 tried batch-reading 20 files and it blew up. Worker 2 doesn't repeat that mistake — it already knows to use batches of 6. This is the thing that separates tag-team from naive restarts. Each worker inherits not just the progress, but the &lt;em&gt;lessons&lt;/em&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Context Warning Protocol
&lt;/h2&gt;

&lt;p&gt;Workers don't just blindly run until they crash. A monitoring proxy tracks context usage and injects warnings at configurable thresholds:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;70%&lt;/strong&gt; — Finish your current file. Don't start new large operations. Start composing your handoff mentally.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;80%&lt;/strong&gt; — Stop immediately. Write the handoff file. Return.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;90%&lt;/strong&gt; — Emergency. Dump whatever state you have and get out.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There's also a safety net: if a worker makes 50+ tool calls without seeing a context warning, it hands off anyway. Belt and suspenders.&lt;/p&gt;
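
&lt;p&gt;As a Python sketch (the names are mine, not the skill's), the worker-side decision boils down to a threshold table plus the tool-call safety net:&lt;/p&gt;

```python
# Worker-side sketch (hypothetical names): map context usage to an action,
# with the 50-tool-call safety net for when no warning arrives.
THRESHOLDS = [
    (90, "emergency_stop"),   # dump whatever state exists and exit
    (80, "force_handoff"),    # stop now, write the handoff file
    (70, "prepare_handoff"),  # finish current item, start nothing new
]

def action_for(context_pct, tool_calls, soft_limit=50):
    for pct, action in THRESHOLDS:
        if context_pct >= pct:
            return action
    if tool_calls >= soft_limit:
        return "force_handoff"
    return "continue"
```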

&lt;h2&gt;
  
  
  Wrapping Other Skills
&lt;/h2&gt;

&lt;p&gt;The part I didn't expect to be useful — but turned out to be the most useful — is the &lt;code&gt;--skill&lt;/code&gt; flag:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/tag-team PROJ-12345 --skill develop
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This wraps my &lt;code&gt;/develop&lt;/code&gt; skill (a 12-phase Jira-to-PR orchestrator) inside the tag-team relay. If &lt;code&gt;/develop&lt;/code&gt; runs out of context mid-implementation, it doesn't die — it hands off to a fresh worker who picks up at the same phase.&lt;/p&gt;

&lt;p&gt;Any skill that might outlive a single context window becomes automatically resilient. The skill doesn't need to know about tag-team. Tag-team doesn't need to know about the skill. They compose.&lt;/p&gt;
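
&lt;p&gt;Under the hood the composition is just a prompt rewrite. A minimal sketch (function and parameter names are mine, and the exact wording differs):&lt;/p&gt;

```python
def build_worker_task(args, skill=None):
    # Hypothetical sketch: with --skill, the task handed to each worker
    # is rewritten to invoke the wrapped skill inside the relay.
    if skill is None:
        return args
    return (
        f"Run /{skill} {args}. This is a tag-team relay: follow the "
        "worker instructions for context warnings and handoff."
    )
```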

&lt;h2&gt;
  
  
  Resume and Status
&lt;/h2&gt;

&lt;p&gt;Sessions survive crashes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;/tag-team resume     &lt;span class="c"&gt;# Pick up from the latest handoff file&lt;/span&gt;
/tag-team status     &lt;span class="c"&gt;# Show progress log and list all handoffs&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Old sessions get archived automatically when you start a new one, so you never lose historical state.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Numbers
&lt;/h2&gt;

&lt;p&gt;For a real batch of 200 files:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Without tag-team&lt;/strong&gt;: Manual restart 4-5 times, re-explaining context each time, ~30% of work redone across restarts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;With tag-team&lt;/strong&gt;: 3-4 workers, fully automatic, zero rework, progress file showing exactly what happened at each stage&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The raw time is similar. The difference is I walked away and came back to a completed task instead of babysitting context windows.&lt;/p&gt;

&lt;h2&gt;
  
  
  When to Use It
&lt;/h2&gt;

&lt;p&gt;Tag-team shines for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Batch processing&lt;/strong&gt;: Extracting, transforming, or generating files from a large input set&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Long workflows&lt;/strong&gt;: Multi-phase skills that might exhaust context&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Refactoring at scale&lt;/strong&gt;: Renaming, restructuring, or updating patterns across many files&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Any task where you'd normally restart the conversation mid-way&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It's overkill for tasks that fit in a single context window. If your task completes in one worker, the dispatcher overhead is wasted. The sweet spot is work that would take 2-10 context windows.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;

&lt;p&gt;The skill is four files — a dispatcher (&lt;code&gt;SKILL.md&lt;/code&gt;), worker instructions, cross-phase policies, and a config. Drop them into &lt;code&gt;.claude/skills/tag-team/&lt;/code&gt; and you have relay-capable Claude Code.&lt;/p&gt;

&lt;p&gt;The full implementation is in my &lt;a href="https://github.com/anthropics/claude-code-skills-demo/tree/main/skills/tag-team" rel="noopener noreferrer"&gt;skills repo&lt;/a&gt;. The deep dive into the architecture — handoff format, config options, how the dispatcher loop works — is in the &lt;a href="https://dev.to/dibyanshu_kumar/tag-team-deep-dive-architecture-technical-reference-55a0"&gt;companion technical reference&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This is part of a series on scaling Claude Code for enterprise workflows. Previously: &lt;a href="https://dev.to/dibyanshu_kumar/how-i-stopped-losing-work-to-context-window-overflow-in-claude-code-1hll"&gt;How I Stopped Losing Work to Context Window Overflow&lt;/a&gt;, &lt;a href="https://dev.to/dibyanshu_kumar/how-i-taught-an-ai-agent-to-save-its-own-progress-2d58"&gt;How I Taught an AI Agent to Save Its Own Progress&lt;/a&gt;, and &lt;a href="//blog-3-centralized-skill-management.md"&gt;Centralized Skill Management for Claude Code&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>claudecode</category>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Tag-Team Deep Dive: Architecture &amp; Technical Reference</title>
      <dc:creator>Dibyanshu kumar</dc:creator>
      <pubDate>Thu, 02 Apr 2026 15:44:14 +0000</pubDate>
      <link>https://dev.to/dibyanshu_kumar/tag-team-deep-dive-architecture-technical-reference-55a0</link>
      <guid>https://dev.to/dibyanshu_kumar/tag-team-deep-dive-architecture-technical-reference-55a0</guid>
      <description>&lt;h1&gt;
  
  
  Tag-Team Deep Dive: Architecture &amp;amp; Technical Reference
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;This is the companion technical reference for "How I Made Claude Code Finish Tasks That Outlast Its Memory." That post covers the problem and the concept — this one covers the full architecture, configuration, and implementation details.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Architecture Overview
&lt;/h2&gt;

&lt;p&gt;Tag-team has three layers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────────────────────────┐
│  Dispatcher (SKILL.md)              │  ← Your conversation. Stays lean.
│  - Parses arguments                 │
│  - Manages the relay loop           │
│  - Tracks progress                  │
├─────────────────────────────────────┤
│  Workers (spawned agents)           │  ← Disposable. One at a time.
│  - Do the actual work               │
│  - Monitor their own context        │
│  - Write handoff files on exit      │
├─────────────────────────────────────┤
│  Handoff Files (.claude/tag-team/)  │  ← Persistent state on disk.
│  - Structured markdown              │
│  - Self-contained resume context    │
│  - Append-only progress log         │
└─────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The dispatcher never reads large files or does real work. It spawns agents, reads their short return messages, and appends to a progress log. This is why it can run 10+ iterations without hitting its own context limit.&lt;/p&gt;

&lt;h2&gt;
  
  
  File Structure
&lt;/h2&gt;

&lt;p&gt;After a 3-worker relay, your &lt;code&gt;.claude/tag-team/&lt;/code&gt; directory looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;.claude/tag-team/
├── progress.md          &lt;span class="c"&gt;# Append-only log of all iterations&lt;/span&gt;
├── handoff-001.md       &lt;span class="c"&gt;# Worker 1's state when it handed off&lt;/span&gt;
├── handoff-002.md       &lt;span class="c"&gt;# Worker 2's state when it handed off&lt;/span&gt;
└── archive-20260401-143022/  &lt;span class="c"&gt;# Previous session (if any)&lt;/span&gt;
    ├── progress.md
    └── handoff-001.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Configuration
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;config.json&lt;/code&gt; controls relay behavior:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"max_iterations"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"handoff_dir"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;".claude/tag-team"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"handoff_prefix"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"handoff"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"progress_file"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"progress.md"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"context_thresholds"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"prepare_handoff"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;70&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"force_handoff"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;80&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"emergency_stop"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;90&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"worker"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"tool_call_soft_limit"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"mode"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"bypassPermissions"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Field&lt;/th&gt;
&lt;th&gt;What it does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;max_iterations&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Hard cap on relay workers. Prevents runaway loops.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;context_thresholds.prepare_handoff&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;At this %, worker finishes current item and stops starting new work.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;context_thresholds.force_handoff&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;At this %, worker stops immediately and writes handoff.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;context_thresholds.emergency_stop&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;At this %, worker dumps partial state and exits.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;worker.tool_call_soft_limit&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;If a worker makes this many tool calls without a context warning, it hands off proactively. Safety net for when the proxy monitoring is delayed.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;worker.mode&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Permission mode for worker agents. &lt;code&gt;bypassPermissions&lt;/code&gt; avoids approval prompts mid-relay.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  The Dispatcher Loop (Pseudocode)
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;parse arguments → (fresh | resume | status)

if status:
    read progress.md, list handoff files, display, stop

if resume:
    find highest-numbered handoff file
    set iteration = that number

if fresh:
    archive old session if exists
    create .claude/tag-team/
    write initial progress.md
    set iteration = 0

load worker-instructions.md

while iteration &amp;lt; max_iterations:
    build prompt:
        if iteration == 0: worker instructions + task description
        if iteration &amp;gt; 0:  worker instructions + "read handoff-{N}.md and continue"

    spawn agent(prompt, mode=bypassPermissions)

    parse result:
        "ALL_DONE: ..." → break
        "HANDOFF: ..." → increment iteration, continue
        else → check for handoff file, error if missing

    append to progress.md

display final report
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
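
&lt;p&gt;For concreteness, here's the same loop as runnable Python. This is a sketch: &lt;code&gt;spawn_agent&lt;/code&gt; is a hypothetical stand-in for the Agent tool, and the progress-log entry is abbreviated:&lt;/p&gt;

```python
from pathlib import Path

def relay(task, spawn_agent, workdir=".claude/tag-team", max_iterations=10):
    """Runnable sketch of the loop above. spawn_agent is a hypothetical
    stand-in for the Agent tool and returns a short summary string."""
    root = Path(workdir)
    root.mkdir(parents=True, exist_ok=True)
    iteration = 0
    while iteration != max_iterations:
        if iteration == 0:
            prompt = task
        else:
            prompt = f"Read {root}/handoff-{iteration:03d}.md and continue."
        result = spawn_agent(prompt)
        with (root / "progress.md").open("a") as log:  # append-only log
            log.write(f"## Iteration {iteration + 1}\n- Summary: {result}\n")
        if result.startswith("ALL_DONE"):
            return "COMPLETED"
        iteration += 1
    return "MAX_ITERATIONS"
```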



&lt;h2&gt;
  
  
  Handoff File Format (Complete)
&lt;/h2&gt;

&lt;p&gt;Every handoff file follows this structure. The sections are mandatory — workers are instructed to fill all of them.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Mission&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Goal: &lt;span class="nt"&gt;&amp;lt;original&lt;/span&gt; &lt;span class="na"&gt;task&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt; &lt;span class="na"&gt;copied&lt;/span&gt; &lt;span class="na"&gt;verbatim&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Status: &lt;span class="nt"&gt;&amp;lt;percentage&lt;/span&gt; &lt;span class="na"&gt;done&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt; &lt;span class="na"&gt;items&lt;/span&gt; &lt;span class="na"&gt;completed&lt;/span&gt;&lt;span class="err"&gt;/&lt;/span&gt;&lt;span class="na"&gt;total&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Next step: &lt;span class="nt"&gt;&amp;lt;single&lt;/span&gt; &lt;span class="na"&gt;most&lt;/span&gt; &lt;span class="na"&gt;important&lt;/span&gt; &lt;span class="na"&gt;thing&lt;/span&gt; &lt;span class="na"&gt;for&lt;/span&gt; &lt;span class="na"&gt;the&lt;/span&gt; &lt;span class="na"&gt;next&lt;/span&gt; &lt;span class="na"&gt;worker&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;

&lt;span class="gu"&gt;## Technical State&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Files modified: &lt;span class="nt"&gt;&amp;lt;full&lt;/span&gt; &lt;span class="na"&gt;paths&lt;/span&gt; &lt;span class="na"&gt;of&lt;/span&gt; &lt;span class="na"&gt;every&lt;/span&gt; &lt;span class="na"&gt;file&lt;/span&gt; &lt;span class="na"&gt;changed&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Files created: &lt;span class="nt"&gt;&amp;lt;full&lt;/span&gt; &lt;span class="na"&gt;paths&lt;/span&gt; &lt;span class="na"&gt;of&lt;/span&gt; &lt;span class="na"&gt;every&lt;/span&gt; &lt;span class="na"&gt;file&lt;/span&gt; &lt;span class="na"&gt;created&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Active entities: &lt;span class="nt"&gt;&amp;lt;repos&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt; &lt;span class="na"&gt;directories&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt; &lt;span class="na"&gt;configs&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt; &lt;span class="na"&gt;temp&lt;/span&gt; &lt;span class="na"&gt;files&lt;/span&gt; &lt;span class="na"&gt;in&lt;/span&gt; &lt;span class="na"&gt;play&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Working directory: &lt;span class="nt"&gt;&amp;lt;cwd&lt;/span&gt; &lt;span class="na"&gt;if&lt;/span&gt; &lt;span class="na"&gt;relevant&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;

&lt;span class="gu"&gt;## Key Decisions&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Decisions made: &lt;span class="nt"&gt;&amp;lt;choices&lt;/span&gt; &lt;span class="na"&gt;that&lt;/span&gt; &lt;span class="na"&gt;the&lt;/span&gt; &lt;span class="na"&gt;next&lt;/span&gt; &lt;span class="na"&gt;worker&lt;/span&gt; &lt;span class="na"&gt;MUST&lt;/span&gt; &lt;span class="na"&gt;preserve&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Dead ends: &lt;span class="nt"&gt;&amp;lt;approaches&lt;/span&gt; &lt;span class="na"&gt;that&lt;/span&gt; &lt;span class="na"&gt;FAILED&lt;/span&gt; &lt;span class="err"&gt;—&lt;/span&gt; &lt;span class="na"&gt;saves&lt;/span&gt; &lt;span class="na"&gt;the&lt;/span&gt; &lt;span class="na"&gt;next&lt;/span&gt; &lt;span class="na"&gt;worker&lt;/span&gt; &lt;span class="na"&gt;from&lt;/span&gt; &lt;span class="na"&gt;repeating&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Constraints: &lt;span class="nt"&gt;&amp;lt;rules&lt;/span&gt; &lt;span class="na"&gt;from&lt;/span&gt; &lt;span class="na"&gt;the&lt;/span&gt; &lt;span class="na"&gt;original&lt;/span&gt; &lt;span class="na"&gt;task&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt; &lt;span class="na"&gt;copied&lt;/span&gt; &lt;span class="na"&gt;exactly&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;

&lt;span class="gu"&gt;## Progress&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Completed: &lt;span class="nt"&gt;&amp;lt;numbered&lt;/span&gt; &lt;span class="na"&gt;list&lt;/span&gt; &lt;span class="na"&gt;with&lt;/span&gt; &lt;span class="na"&gt;brief&lt;/span&gt; &lt;span class="na"&gt;descriptions&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; In progress: &lt;span class="nt"&gt;&amp;lt;what&lt;/span&gt; &lt;span class="na"&gt;was&lt;/span&gt; &lt;span class="na"&gt;being&lt;/span&gt; &lt;span class="na"&gt;worked&lt;/span&gt; &lt;span class="na"&gt;on&lt;/span&gt; &lt;span class="na"&gt;at&lt;/span&gt; &lt;span class="na"&gt;handoff&lt;/span&gt; &lt;span class="err"&gt;—&lt;/span&gt; &lt;span class="na"&gt;include&lt;/span&gt; &lt;span class="na"&gt;partial&lt;/span&gt; &lt;span class="na"&gt;state&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Remaining: &lt;span class="nt"&gt;&amp;lt;numbered&lt;/span&gt; &lt;span class="na"&gt;list&lt;/span&gt; &lt;span class="na"&gt;of&lt;/span&gt; &lt;span class="na"&gt;what&lt;/span&gt;&lt;span class="err"&gt;'&lt;/span&gt;&lt;span class="na"&gt;s&lt;/span&gt; &lt;span class="na"&gt;left&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;

&lt;span class="gu"&gt;## Resume Instructions&lt;/span&gt;
&amp;lt;Written as a briefing for a new teammate who has never seen this task.
Explicit and actionable. Includes file paths, exact commands, specific
next steps. The next worker reads ONLY this section — it doesn't have
the conversation history.&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Why Each Section Exists
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Mission&lt;/strong&gt;: Prevents goal drift across workers. The task description is copied verbatim so Worker 5 is solving the same problem as Worker 1.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Technical State&lt;/strong&gt;: Workers need to know what's on disk. Without this, they waste tool calls re-discovering the file layout.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Key Decisions / Dead ends&lt;/strong&gt;: The highest-value section. Dead ends are the difference between a smart relay and a dumb restart. If Worker 1 learned that batch-reading 20 files blows up context, Worker 2 shouldn't rediscover that.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Progress&lt;/strong&gt;: Explicit item-level tracking. "Completed files 1-80" is actionable. "Made good progress" is not.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resume Instructions&lt;/strong&gt;: Self-contained. Written for someone with zero context. This is what the next worker actually executes from.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Worker Rules
&lt;/h2&gt;

&lt;p&gt;Workers follow five operational rules designed to prevent the most common failure modes:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Interleave reads and writes
&lt;/h3&gt;

&lt;p&gt;The #1 cause of context exhaustion: reading 50 files into context, then trying to write outputs for all of them. By the time you start writing, you've forgotten the early files.&lt;/p&gt;

&lt;p&gt;Instead: read 5-8 files, write their outputs, repeat. Each batch is a self-contained unit of work that survives a handoff.&lt;/p&gt;
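
&lt;p&gt;A Python sketch of that batching pattern (the &lt;code&gt;read&lt;/code&gt;, &lt;code&gt;transform&lt;/code&gt;, and &lt;code&gt;write&lt;/code&gt; callables are placeholders for whatever the task does per file):&lt;/p&gt;

```python
def process_in_batches(paths, read, transform, write, batch_size=6):
    # Rule 1 as code: small read burst, then write everything from that
    # burst to disk before touching the next batch.
    done = []
    for start in range(0, len(paths), batch_size):
        batch = paths[start:start + batch_size]
        contents = [read(p) for p in batch]
        for path, content in zip(batch, contents):
            write(path, transform(content))
            done.append(path)
    return done
```

&lt;p&gt;Each completed batch is durable on disk, so a handoff in the middle loses at most one batch of reads.&lt;/p&gt;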

&lt;h3&gt;
  
  
  2. Save work to disk frequently
&lt;/h3&gt;

&lt;p&gt;Every file written to disk is progress that survives a handoff. Unwritten work that's only in the conversation context is lost when the worker exits. Write early, write often.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Prefer small committed units
&lt;/h3&gt;

&lt;p&gt;Finish one item completely before starting the next. A half-processed file is harder to resume than an unstarted one — the next worker has to figure out what's done and what isn't.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Track your own progress
&lt;/h3&gt;

&lt;p&gt;Workers maintain a mental count of items completed vs. total. This count goes into the handoff file and the return message. Without it, the dispatcher can't report meaningful progress.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Don't re-read what the previous worker summarized
&lt;/h3&gt;

&lt;p&gt;If the handoff file says "file X contains a REST endpoint for user creation," trust it. Only re-read source files when you need to generate output from them. This saves context for actual work.&lt;/p&gt;

&lt;h2&gt;
  
  
  The --skill Flag
&lt;/h2&gt;

&lt;p&gt;The composition model is simple: when &lt;code&gt;--skill&lt;/code&gt; is provided, the task description is rewritten to invoke that skill:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Without --skill&lt;/span&gt;
/tag-team "Process all 200 files"
→ Task sent to worker: "Process all 200 files"

&lt;span class="gh"&gt;# With --skill&lt;/span&gt;
/tag-team PROJ-12345 --skill develop
→ Task sent to worker: "Run /develop PROJ-12345. This is a tag-team relay
   — follow the worker instructions for context warning handling and handoff."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The wrapped skill doesn't know it's inside a relay. It just runs normally. If it exhausts context, the worker's context-warning protocol kicks in, writes a handoff, and the next worker resumes the skill from the handoff state.&lt;/p&gt;

&lt;p&gt;This means any long-running skill becomes context-resilient without modifying the skill itself.&lt;/p&gt;

&lt;h2&gt;
  
  
  Error Handling
&lt;/h2&gt;

&lt;p&gt;Three failure modes and how each is handled:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Failure&lt;/th&gt;
&lt;th&gt;Detection&lt;/th&gt;
&lt;th&gt;Response&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Worker exits without &lt;code&gt;ALL_DONE&lt;/code&gt; or &lt;code&gt;HANDOFF&lt;/code&gt; prefix&lt;/td&gt;
&lt;td&gt;Dispatcher checks if handoff file exists at expected path&lt;/td&gt;
&lt;td&gt;If file exists: treat as handoff. If not: log error, stop relay, show full output to user.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Worker crashes mid-handoff&lt;/td&gt;
&lt;td&gt;Handoff file is incomplete or missing expected sections&lt;/td&gt;
&lt;td&gt;Dispatcher warns user and stops. Suggests &lt;code&gt;/tag-team resume&lt;/code&gt; after manual inspection.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Max iterations reached&lt;/td&gt;
&lt;td&gt;Loop counter&lt;/td&gt;
&lt;td&gt;Dispatcher reports progress and suggests &lt;code&gt;/tag-team resume&lt;/code&gt; to continue.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Workers are instructed to write a handoff file even on errors — partial state is better than no state.&lt;/p&gt;
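
&lt;p&gt;The first row of that table reduces to a small classifier. A Python sketch (names are illustrative, not the skill's internals):&lt;/p&gt;

```python
from pathlib import Path

def classify_result(result, expected_handoff):
    # Dispatcher-side sketch: a well-formed prefix wins; otherwise the
    # presence of the handoff file on disk decides.
    if result.startswith("ALL_DONE"):
        return "done"
    if result.startswith("HANDOFF"):
        return "handoff"
    if Path(expected_handoff).exists():
        return "handoff"  # worker forgot the prefix but left valid state
    return "error"        # stop the relay and surface the full output
```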

&lt;h2&gt;
  
  
  Session Management
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Fresh Start with Existing Session
&lt;/h3&gt;

&lt;p&gt;If &lt;code&gt;.claude/tag-team/&lt;/code&gt; already has files from a previous run, the dispatcher warns:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Found existing tag-team session with 3 handoff files.
Use `/tag-team resume` to continue, or confirm to start fresh (will archive old files).
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On confirmation, existing files are moved to &lt;code&gt;.claude/tag-team/archive-{timestamp}/&lt;/code&gt;. Nothing is deleted.&lt;/p&gt;
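
&lt;p&gt;A sketch of that archive step in Python (assuming the session files are the &lt;code&gt;.md&lt;/code&gt; files at the top level of the directory):&lt;/p&gt;

```python
import time
from pathlib import Path

def archive_session(root=".claude/tag-team"):
    # Move every top-level session file into a timestamped archive
    # directory; nothing is deleted.
    root = Path(root)
    session_files = list(root.glob("*.md"))
    if not session_files:
        return None
    dest = root / time.strftime("archive-%Y%m%d-%H%M%S")
    dest.mkdir()
    for path in session_files:
        path.rename(dest / path.name)
    return dest
```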

&lt;h3&gt;
  
  
  Resume Detection
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;/tag-team resume&lt;/code&gt; globs for &lt;code&gt;handoff-*.md&lt;/code&gt;, sorts numerically, takes the highest number, validates the file has the expected sections, then continues the loop from that iteration.&lt;/p&gt;
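
&lt;p&gt;In Python, that detection logic looks roughly like this (the section names follow the handoff format; which sections count as required is my assumption):&lt;/p&gt;

```python
import re
from pathlib import Path

REQUIRED = ("## Mission", "## Progress", "## Resume Instructions")

def latest_handoff(root=".claude/tag-team"):
    # Highest-numbered handoff wins; a handoff missing required
    # sections is treated as corrupt rather than silently resumed.
    best, best_n = None, 0
    for path in Path(root).glob("handoff-*.md"):
        match = re.search(r"handoff-(\d+)\.md", path.name)
        if match and int(match.group(1)) > best_n:
            best, best_n = path, int(match.group(1))
    if best is None:
        return None, 0
    text = best.read_text()
    if all(section in text for section in REQUIRED):
        return best, best_n
    raise ValueError(f"{best} is missing required handoff sections")
```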

&lt;h3&gt;
  
  
  Progress File
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;progress.md&lt;/code&gt; is an append-only log. After each worker:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Iteration 2 (Worker-2)&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Completed: 2026-04-02T14:35:22Z
&lt;span class="p"&gt;-&lt;/span&gt; Result: HANDOFF
&lt;span class="p"&gt;-&lt;/span&gt; Summary: Processed files 81-150. Handed off at context 78%. Next: file 151...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
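&lt;p&gt;Append-only means nothing earlier is ever rewritten: each worker just adds its own block. A sketch of that append step, with field names mirroring the entry format above (the path and function signature are assumptions):&lt;/p&gt;

```python
from datetime import datetime, timezone

def log_iteration(iteration, worker, result, summary,
                  progress_path=".claude/tag-team/progress.md"):
    """Append one iteration record; earlier entries are never touched."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    entry = (
        f"## Iteration {iteration} (Worker-{worker})\n"
        f"- Completed: {stamp}\n"
        f"- Result: {result}\n"
        f"- Summary: {summary}\n\n"
    )
    with open(progress_path, "a") as f:  # "a" mode makes this append-only
        f.write(entry)
```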



&lt;p&gt;After relay completion:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Relay Complete&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Total iterations: 3
&lt;span class="p"&gt;-&lt;/span&gt; Final result: COMPLETED
&lt;span class="p"&gt;-&lt;/span&gt; Completed: 2026-04-02T14:52:08Z
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  When Tag-Team Adds Overhead vs. Value
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Overhead wins&lt;/strong&gt; (don't use tag-team):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Task fits in one context window&lt;/li&gt;
&lt;li&gt;Task requires deep cross-file reasoning where splitting work loses coherence&lt;/li&gt;
&lt;li&gt;Task is interactive and needs human input at multiple points&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Value wins&lt;/strong&gt; (use tag-team):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Batch processing 20+ files&lt;/li&gt;
&lt;li&gt;Multi-phase workflows that routinely exhaust context&lt;/li&gt;
&lt;li&gt;Any task where you've restarted a conversation more than once&lt;/li&gt;
&lt;li&gt;Wrapping skills that sometimes fail due to context limits&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The break-even point is roughly when you'd need 2+ manual restarts without tag-team.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This is part of a series on scaling Claude Code for enterprise workflows. Previously: &lt;a href="https://dev.to/dibyanshu_kumar/how-i-stopped-losing-work-to-context-window-overflow-in-claude-code-1hll"&gt;How I Stopped Losing Work to Context Window Overflow&lt;/a&gt;, &lt;a href="https://dev.to/dibyanshu_kumar/how-i-taught-an-ai-agent-to-save-its-own-progress-2d58"&gt;How I Taught an AI Agent to Save Its Own Progress&lt;/a&gt;, and &lt;a href="https://dev.to/dibyanshu_kumar/why-i-built-a-centralized-skill-registry-instead-of-using-claude-code-plugins-cla"&gt;Why I Built a Centralized Skill Registry Instead of Using Claude Code Plugins&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>claudecode</category>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Why I Built a Centralized Skill Registry Instead of Using Claude Code Plugins</title>
      <dc:creator>Dibyanshu kumar</dc:creator>
      <pubDate>Wed, 25 Mar 2026 03:46:28 +0000</pubDate>
      <link>https://dev.to/dibyanshu_kumar/why-i-built-a-centralized-skill-registry-instead-of-using-claude-code-plugins-cla</link>
      <guid>https://dev.to/dibyanshu_kumar/why-i-built-a-centralized-skill-registry-instead-of-using-claude-code-plugins-cla</guid>
      <description>&lt;h1&gt;
  
  
  Why I Built a Centralized Skill Registry Instead of Using Claude Code Plugins
&lt;/h1&gt;

&lt;p&gt;Claude Code has a plugin system. It's well-designed — namespaced skills, marketplace distribution, versioned releases. So why did I build my own centralized skill management layer on top of plain &lt;code&gt;.claude/skills/&lt;/code&gt; directories?&lt;/p&gt;

&lt;p&gt;Because plugins solve the &lt;em&gt;distribution&lt;/em&gt; problem. I needed to solve the &lt;em&gt;coordination&lt;/em&gt; problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Setup
&lt;/h2&gt;

&lt;p&gt;I work across multiple repos — different tech stacks, different teams, different build systems. Each repo needs Claude Code skills tailored to its architecture: how to run tests, how to structure PRs, what patterns to follow in code review.&lt;/p&gt;

&lt;p&gt;At first, I copied skill files into each repo's &lt;code&gt;.claude/skills/&lt;/code&gt; directory. Within a week, the copies drifted. I'd fix a bug in one repo's &lt;code&gt;/review-pr&lt;/code&gt; skill and forget to propagate it. Worse, some skills — like &lt;code&gt;/develop&lt;/code&gt; (a 12-phase Jira-to-PR orchestrator) — span 16 files and 4,000+ lines. Copy-pasting that across repos was not sustainable.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Not Plugins?
&lt;/h2&gt;

&lt;p&gt;Claude Code plugins were the obvious answer. They're designed for exactly this: package skills into a distributable unit, install across projects. But when I evaluated them against my requirements, several gaps emerged.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Namespacing Adds Friction to Muscle Memory
&lt;/h3&gt;

&lt;p&gt;Plugin skills are namespaced: &lt;code&gt;/my-plugin:develop&lt;/code&gt; instead of &lt;code&gt;/develop&lt;/code&gt;. This is a good design decision for preventing conflicts in the ecosystem, but when you're the sole consumer of your skills across your own repos, the namespace is overhead. I want to type &lt;code&gt;/develop PROJ-123&lt;/code&gt; everywhere, not &lt;code&gt;/my-plugin:develop PROJ-123&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Standalone &lt;code&gt;.claude/skills/&lt;/code&gt; directories give you bare &lt;code&gt;/skill-name&lt;/code&gt; invocations. I wanted that simplicity &lt;em&gt;and&lt;/em&gt; centralized management.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. No Built-in Multi-Repo Coordination
&lt;/h3&gt;

&lt;p&gt;Plugins are install-and-forget — you install them per project and they work independently. But my workflow requires &lt;em&gt;cross-repo awareness&lt;/em&gt;. When a Jira ticket comes in, I need to figure out &lt;em&gt;which&lt;/em&gt; repo it belongs to before executing any skill.&lt;/p&gt;

&lt;p&gt;I built a &lt;code&gt;/dispatch&lt;/code&gt; skill that scores Jira tickets against a registry of projects using weighted signals:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Component match: +10 points&lt;/li&gt;
&lt;li&gt;Label match: +5 points&lt;/li&gt;
&lt;li&gt;Keyword match: +2 points (capped at +10)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If the top score is &amp;gt;= 10 and &amp;gt;= 2x the runner-up, it auto-routes. Otherwise, it asks the user. Plugins have no concept of this — they don't know about other repos or how to route work between them.&lt;/p&gt;
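&lt;p&gt;The scoring rule is easy to express directly. A minimal sketch: the ticket and registry-entry shapes here are assumptions, but the weights and the auto-route threshold come straight from the rule above:&lt;/p&gt;

```python
def score_project(ticket, project):
    """Weighted match score between a Jira ticket and a registry entry."""
    score = 0
    if ticket.get("component") in project.get("components", []):
        score += 10                      # component match: +10
    for label in ticket.get("labels", []):
        if label in project.get("labels", []):
            score += 5                   # label match: +5
    keyword_points = 0
    text = ticket.get("summary", "").lower()
    for kw in project.get("keywords", []):
        if kw.lower() in text:
            keyword_points += 2          # keyword match: +2
    score += min(keyword_points, 10)     # keyword points capped at +10
    return score

def route(ticket, registry):
    """Auto-route if the top score is at least 10 and at least 2x the runner-up."""
    ranked = sorted(((score_project(ticket, p), p["name"]) for p in registry),
                    reverse=True)
    top = ranked[0]
    runner_up = ranked[1] if len(ranked) > 1 else (0, None)
    if top[0] >= 10 and top[0] >= 2 * runner_up[0]:
        return top[1]   # confident match: auto-route
    return None         # ambiguous: fall back to asking the user
```

&lt;p&gt;Returning &lt;code&gt;None&lt;/code&gt; for the ambiguous case keeps the human in the loop exactly when the signals disagree.&lt;/p&gt;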

&lt;h3&gt;
  
  
  3. Per-Project Configuration Without Forking
&lt;/h3&gt;

&lt;p&gt;Each repo has different thresholds for when a PR gets a full multi-agent review vs. a quick single-agent pass. Different build commands. Different branch naming conventions. Different default branches.&lt;/p&gt;

&lt;p&gt;Plugins handle this with plugin-level &lt;code&gt;settings.json&lt;/code&gt;, but that configuration lives &lt;em&gt;inside&lt;/em&gt; the plugin. If two repos need different thresholds for the same skill, you'd need either:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Two separate plugins (defeats the purpose of sharing)&lt;/li&gt;
&lt;li&gt;Configuration logic inside the skill that reads from some external source&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I went with the second approach directly: a &lt;code&gt;registry.json&lt;/code&gt; that stores per-project metadata, and a &lt;code&gt;config.json&lt;/code&gt; per project profile that tunes skill behavior. The skills are shared; the configuration is project-specific. The central repo has a directory per project profile, each with its own skills, agents, and config. A top-level &lt;code&gt;dispatch/&lt;/code&gt; skill handles cross-repo routing.&lt;/p&gt;
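&lt;p&gt;To make the split concrete, here is what one registry entry might look like. The field names are an assumption for illustration, not a published schema; the point is the kinds of metadata involved (repo path, profile directory, routing signals, skills, build commands):&lt;/p&gt;

```json
{
  "projects": [
    {
      "name": "payments-service",
      "repoPath": "~/work/payments-service",
      "profile": "profiles/payments",
      "stack": ["kotlin", "gradle"],
      "routing": {
        "components": ["payments", "billing"],
        "labels": ["team-payments"],
        "keywords": ["invoice", "checkout", "refund"]
      },
      "skills": ["develop", "review-pr"],
      "buildCommand": "./gradlew build"
    }
  ]
}
```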

&lt;h3&gt;
  
  
  4. Symlinks &amp;gt; Install Cycles
&lt;/h3&gt;

&lt;p&gt;When I update a skill, the change should be live &lt;em&gt;immediately&lt;/em&gt; in every repo. No reinstall, no version bump, no marketplace push.&lt;/p&gt;

&lt;p&gt;Symlinks do this. Each project's &lt;code&gt;.claude/skills&lt;/code&gt; directory is a symlink pointing to the corresponding profile in the central repo. Edit a skill file in the central repo, and every project sees it instantly. This is critical during active development — when I'm iterating on a skill, I don't want a publish-install cycle between each test.&lt;/p&gt;

&lt;p&gt;Plugins require either re-running &lt;code&gt;--plugin-dir&lt;/code&gt; or doing &lt;code&gt;/reload-plugins&lt;/code&gt;. With symlinks and &lt;code&gt;--add-dir&lt;/code&gt;, Claude Code's live change detection picks up edits automatically.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. The Registry as a Metadata Layer
&lt;/h3&gt;

&lt;p&gt;The real power isn't just shared skills — it's the &lt;code&gt;registry.json&lt;/code&gt; that sits above them. Each project entry contains its repo path, profile directory, tech stack, Jira routing signals (components, keywords, labels), available skills, and build commands. This enables:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Dynamic routing&lt;/strong&gt;: &lt;code&gt;/dispatch&lt;/code&gt; reads the registry to score and route tickets&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Skill validation&lt;/strong&gt;: Before executing, verify the target project actually supports that skill&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Status checks&lt;/strong&gt;: &lt;code&gt;/dispatch status&lt;/code&gt; verifies all symlinks are intact and repos exist&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Project discovery&lt;/strong&gt;: &lt;code&gt;/dispatch list&lt;/code&gt; shows all registered projects in a table&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of this exists in the plugin model because plugins don't need it — they're scoped to a single project. A centralized registry is only valuable when you're managing skills &lt;em&gt;across&lt;/em&gt; projects.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Setup Script
&lt;/h2&gt;

&lt;p&gt;A setup script reads the registry and creates all the symlinks — each project's &lt;code&gt;.claude/skills&lt;/code&gt; and &lt;code&gt;.claude/agents&lt;/code&gt; directories point back to the central repo's profile for that project.&lt;/p&gt;

&lt;p&gt;Key decisions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Symlink subdirectories, not the entire &lt;code&gt;.claude/&lt;/code&gt;&lt;/strong&gt; — this preserves each project's local &lt;code&gt;settings.local.json&lt;/code&gt; and session artifacts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Backup before replacing&lt;/strong&gt; — if a project already has a &lt;code&gt;skills/&lt;/code&gt; directory, back it up with a timestamp&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Idempotent&lt;/strong&gt; — running it twice is safe; it skips correct symlinks&lt;/li&gt;
&lt;/ul&gt;
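&lt;p&gt;The three decisions above fit in a short function. A sketch, assuming the central repo keeps one profile directory per project (the layout and names are illustrative, not the actual script):&lt;/p&gt;

```python
import os
import shutil
from datetime import datetime
from pathlib import Path

def link_profile(repo_path, central_profile, subdirs=("skills", "agents")):
    """Symlink a project's .claude subdirectories to its central profile.

    Mirrors the three decisions: link subdirectories only (so
    settings.local.json stays local), back up anything replaced, and
    skip links that already point to the right place (idempotent).
    """
    for name in subdirs:
        target = Path(central_profile, name).resolve()
        link = Path(repo_path, ".claude", name)
        link.parent.mkdir(parents=True, exist_ok=True)
        if link.is_symlink() and link.resolve() == target:
            continue  # already correct: running twice is safe
        if link.exists() and not link.is_symlink():
            stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
            shutil.move(str(link), str(link) + ".bak-" + stamp)  # backup first
        elif link.is_symlink():
            link.unlink()  # stale symlink pointing somewhere else
        os.symlink(target, link, target_is_directory=True)
```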

&lt;p&gt;New team member onboarding: clone the central repo, run the setup script, done. Every project gets the latest skills.&lt;/p&gt;

&lt;h2&gt;
  
  
  When Plugins Are the Right Choice
&lt;/h2&gt;

&lt;p&gt;This approach isn't universally better than plugins. Plugins win when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;You're distributing to the community&lt;/strong&gt; — namespacing and marketplaces matter&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You want versioned releases&lt;/strong&gt; — semver, changelogs, controlled rollouts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Skills are self-contained&lt;/strong&gt; — no cross-repo coordination needed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You need to bundle MCP servers or hooks&lt;/strong&gt; — plugins package these alongside skills&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multiple consumers with different needs&lt;/strong&gt; — the marketplace model handles this well&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;My approach wins when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;You control all the target repos&lt;/strong&gt; — no need for marketplace discovery&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Skills need cross-repo awareness&lt;/strong&gt; — routing, shared configuration, project metadata&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You want instant propagation&lt;/strong&gt; — symlinks over install cycles&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bare skill names matter&lt;/strong&gt; — &lt;code&gt;/develop&lt;/code&gt; over &lt;code&gt;/plugin:develop&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per-project config must be separate from skill logic&lt;/strong&gt; — registry + config.json pattern&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Hybrid Path
&lt;/h2&gt;

&lt;p&gt;These aren't mutually exclusive. You could package a centralized skill registry &lt;em&gt;as&lt;/em&gt; a plugin that manages symlinks and registry state. Or use plugins for truly standalone skills (like a generic &lt;code&gt;/explain-code&lt;/code&gt;) while using the registry pattern for workflow skills that need cross-repo context.&lt;/p&gt;

&lt;p&gt;The point isn't that plugins are wrong. It's that "how do I share skills across repos?" and "how do I coordinate AI workflows across repos?" are different problems. Plugins answer the first. A centralized registry with dispatch routing answers the second.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This is part 3 of a series on scaling Claude Code for enterprise workflows. Previously: &lt;a href="https://dev.to/dibyanshu_kumar/how-i-stopped-losing-work-to-context-window-overflow-in-claude-code-1hll"&gt;How I Stopped Losing Work to Context Window Overflow&lt;/a&gt; and &lt;a href="https://dev.to/dibyanshu_kumar/how-i-taught-an-ai-agent-to-save-its-own-progress-2d58"&gt;How I Taught an AI Agent to Save Its Own Progress&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>claudecode</category>
      <category>ai</category>
      <category>devtools</category>
      <category>plugins</category>
    </item>
    <item>
      <title>How I Taught an AI Agent to Save Its Own Progress</title>
      <dc:creator>Dibyanshu kumar</dc:creator>
      <pubDate>Mon, 23 Mar 2026 03:20:14 +0000</pubDate>
      <link>https://dev.to/dibyanshu_kumar/how-i-taught-an-ai-agent-to-save-its-own-progress-2d58</link>
      <guid>https://dev.to/dibyanshu_kumar/how-i-taught-an-ai-agent-to-save-its-own-progress-2d58</guid>
      <description>&lt;p&gt;AI coding agents are stateless. Every time you start a new session, the agent has no memory of what happened before. If the session crashes, if you close the terminal, if context runs out — everything the agent knew is gone.&lt;/p&gt;

&lt;p&gt;I needed my agent to handle multi-hour development workflows. So I built a checkpoint system that lets the AI save and restore its own progress.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem With Long Workflows
&lt;/h2&gt;

&lt;p&gt;I use Claude Code for full development cycles — not just "write a function" tasks, but the whole thing: read a Jira ticket, write a design document, get it reviewed, implement across multiple modules, run tests, create PRs.&lt;/p&gt;

&lt;p&gt;That's a lot of steps. And any one of them can fail:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The session crashes mid-implementation&lt;/li&gt;
&lt;li&gt;Context window fills up during code review&lt;/li&gt;
&lt;li&gt;I close my laptop and come back the next day&lt;/li&gt;
&lt;li&gt;A reviewer agent times out&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without checkpoints, I'd restart from scratch every time. Read the ticket again. Regenerate the design. Redo work that was already done.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;I broke the development workflow into phases with two types of boundaries: &lt;strong&gt;automatic checkpoints&lt;/strong&gt; (the AI saves state on its own) and &lt;strong&gt;human gates&lt;/strong&gt; (the AI stops and waits for my approval).&lt;/p&gt;

&lt;p&gt;The workflow looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Gather Context → Write Design → Review Design
    → CHECKPOINT 1: I approve or edit the design →
Implement → Review Code
    → CHECKPOINT 2: I approve or reject the code →
Fix Issues → Commit → Create PR → Respond to PR Comments
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each phase saves its status and artifacts to persistent storage. When a session dies, the next session picks up where the last one left off.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Checkpoints Work
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Saving State
&lt;/h3&gt;

&lt;p&gt;After each phase completes, the agent writes a checkpoint — a record of what was done, what was produced, and what comes next. The checkpoint includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Phase name and status (completed, in-progress, failed)&lt;/li&gt;
&lt;li&gt;Artifacts produced (design doc path, review report, branch names)&lt;/li&gt;
&lt;li&gt;Context needed for resumption (which modules are done, which review round we're on)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This isn't conversation history. It's structured metadata about the workflow's progress.&lt;/p&gt;

&lt;h3&gt;
  
  
  Resuming
&lt;/h3&gt;

&lt;p&gt;When I start a new session and say "resume," the agent runs a reconciliation step:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Check persistent storage for saved checkpoints&lt;/li&gt;
&lt;li&gt;Scan disk for artifacts (does the design doc exist? are there feature branches?)&lt;/li&gt;
&lt;li&gt;Reconcile — disk is the source of truth, checkpoints are supplementary&lt;/li&gt;
&lt;li&gt;Determine the first incomplete phase and jump to it&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The key insight: &lt;strong&gt;disk artifacts are more reliable than metadata.&lt;/strong&gt; If a design document exists on disk but the checkpoint says the design phase is "in progress," trust the disk. The file is there. The phase is done.&lt;/p&gt;
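&lt;p&gt;As a sketch, the reconciliation step might look like this. The phase names and artifact paths are assumptions; what matters is the order of trust: an artifact on disk marks a phase done regardless of what any checkpoint metadata says:&lt;/p&gt;

```python
from pathlib import Path

# Ordered phases mapped to the on-disk artifact that proves completion.
# Names and paths are illustrative, not the real skill's layout.
PHASES = [
    ("design", "docs/design.md"),
    ("design-review", "docs/design-review.md"),
    ("implement", "docs/impl-complete.marker"),
]

def first_incomplete_phase(workdir):
    """Return the first phase to (re)run, trusting disk over metadata.

    If the artifact exists, the phase is done, even if a checkpoint
    record still says "in-progress". The file is there; the phase is done.
    """
    for phase, artifact in PHASES:
        if not Path(workdir, artifact).exists():
            return phase  # no artifact on disk: resume here
    return None  # every artifact exists: workflow already complete
```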

&lt;h3&gt;
  
  
  Human Gates
&lt;/h3&gt;

&lt;p&gt;Two points in the workflow require my explicit approval:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;After design review&lt;/strong&gt; — the agent presents the design document, review findings, and asks: approve, edit, or reject? If I say "edit," it applies my changes to the design doc and automatically re-runs the review. This loops until I approve.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;After code review&lt;/strong&gt; — same pattern. Approve, fix issues, or reject. If there are critical findings, the agent auto-fixes them before I even see the checkpoint.&lt;/p&gt;

&lt;p&gt;These gates exist because some decisions shouldn't be automated. The agent can write code all day, but I decide whether the design makes sense.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Makes This Work
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Phase-Level Granularity
&lt;/h3&gt;

&lt;p&gt;I don't checkpoint every tool call or every message. I checkpoint at phase boundaries — after "gather context" is done, after "write design" is done, after each module is implemented. This keeps the checkpoint data small and meaningful.&lt;/p&gt;

&lt;h3&gt;
  
  
  Module-Level Progress
&lt;/h3&gt;

&lt;p&gt;Implementation can span five or six modules. The checkpoint tracks which modules are completed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Implementation progress (2/5 modules):
  [DONE] module-a
  [DONE] module-b
  [    ] module-c  ← resuming here
  [    ] module-d
  [    ] module-e
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the session dies after module 2, the next session skips straight to module 3.&lt;/p&gt;

&lt;h3&gt;
  
  
  Timeout Recovery
&lt;/h3&gt;

&lt;p&gt;Sometimes a reviewer agent times out — it hits its turn limit before finishing. Instead of re-running everything, the checkpoint records which reviewers completed and which didn't. On resume, I can choose to re-run just the failed reviewer and merge its findings into the existing report.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Checkpoints should be boring.&lt;/strong&gt; They're not a feature users interact with. They're infrastructure that makes everything else reliable. The best checkpoint system is one you never think about — sessions crash, you resume, and it just works.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Disk is a better source of truth than a database.&lt;/strong&gt; Files on disk are visible, auditable, and survive any kind of failure. A database record that says "design phase complete" is useless if the design file doesn't exist. Check the artifacts, not the metadata.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Human gates are the real value.&lt;/strong&gt; Automatic checkpointing is nice, but the ability to pause the workflow, inspect the output, and say "go back and fix this" — that's what makes the difference between an AI assistant and an AI that runs off and does whatever it wants.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AI agents need state management, not just prompts.&lt;/strong&gt; We spend a lot of time crafting perfect prompts, but the hard problem isn't getting the AI to write good code. It's getting the AI to pick up where it left off without losing context, repeating work, or forgetting decisions that were already made.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;— DK&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agentai</category>
      <category>workflow</category>
      <category>productivity</category>
    </item>
    <item>
      <title>How I Stopped Losing Work to Context Window Overflow in Claude Code</title>
      <dc:creator>Dibyanshu kumar</dc:creator>
      <pubDate>Mon, 23 Mar 2026 02:54:31 +0000</pubDate>
      <link>https://dev.to/dibyanshu_kumar/how-i-stopped-losing-work-to-context-window-overflow-in-claude-code-1hll</link>
      <guid>https://dev.to/dibyanshu_kumar/how-i-stopped-losing-work-to-context-window-overflow-in-claude-code-1hll</guid>
      <description>&lt;p&gt;If you use Claude Code for long coding sessions, you've probably experienced this: you're 40 minutes in, deep in a complex refactor, and the model starts forgetting things. It repeats itself. It loses track of what files it already edited. Then the session just dies — context window full, conversation over, work lost.&lt;/p&gt;

&lt;p&gt;I got tired of it and built a proxy to fix it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;LLM coding tools like Claude Code send everything — system prompts, tool definitions, project context, and your entire conversation history — in every API request. As the conversation grows, the payload approaches the model's context limit silently. There's no progress bar. No warning. The tool doesn't tell you "hey, you're at 80%, maybe wrap up."&lt;/p&gt;

&lt;p&gt;When it finally overflows, you lose the session. Whatever the model was working on, whatever context it had built up — gone. You start a new conversation from scratch.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Tried First
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Manual summarization&lt;/strong&gt; — I'd try to remember to ask the model to write a summary before context ran out. But I'd forget, or misjudge how much room was left.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Shorter sessions&lt;/strong&gt; — Breaking work into tiny chunks defeats the purpose of having an AI coding assistant handle complex, multi-step tasks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prompt caching&lt;/strong&gt; — I built an entire cache optimization layer with volatility-based decomposition. Six layers, hash-based change detection, provider-specific cache hints. It was elegant in theory. In practice, it didn't meaningfully reduce costs or prevent overflows. I disabled it.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Actually Worked
&lt;/h2&gt;

&lt;p&gt;I built a local HTTP proxy called Prefixion that sits between Claude Code and the Anthropic API:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Claude Code → Prefixion (localhost:8080) → api.anthropic.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It doesn't modify your prompts for caching. It doesn't try to be clever. It does two things well:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Context Window Warnings
&lt;/h3&gt;

&lt;p&gt;Every request passes through the proxy. Prefixion estimates token usage from the payload size and tracks where you are relative to the model's context limit.&lt;/p&gt;

&lt;p&gt;When you cross a threshold, it injects a warning directly into the conversation — appended to your last message so the model sees it as an urgent instruction:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;At 70%&lt;/strong&gt; — a gentle alert:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"This conversation has used 72% of its context window. Write a conversation summary and suggest starting a new conversation."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;At 80%&lt;/strong&gt; — a firm warning:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"STOP. BEFORE responding to the user, write a conversation summary. Tell the user to start a new conversation."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;At 90%&lt;/strong&gt; — an emergency stop:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"STOP ALL WORK IMMEDIATELY. Do not make any more tool calls. Write a conversation summary. This conversation must end now."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The warnings escalate per conversation, so you only see each level once. And because they're injected into the user message (not the system prompt), they don't break any existing cache prefixes.&lt;/p&gt;
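&lt;p&gt;A sketch of the tiering and injection logic: the thresholds come from above, but the rough 4-bytes-per-token estimate and the message shape are assumptions about how a proxy like this might work, not Prefixion's actual code:&lt;/p&gt;

```python
import json

# Escalating warnings, highest threshold first; each fires at most once.
TIERS = [
    (0.90, "STOP ALL WORK IMMEDIATELY. Do not make any more tool calls. "
           "Write a conversation summary. This conversation must end now."),
    (0.80, "STOP. BEFORE responding to the user, write a conversation "
           "summary. Tell the user to start a new conversation."),
    (0.70, "This conversation has used most of its context window. Write a "
           "conversation summary and suggest starting a new conversation."),
]

def maybe_inject_warning(request, context_limit_tokens, fired_levels):
    """Append the highest applicable warning to the last user message."""
    # Rough token estimate from payload size (about 4 bytes per token).
    payload_bytes = len(json.dumps(request).encode())
    usage = payload_bytes / (context_limit_tokens * 4)
    for level, text in TIERS:
        if usage >= level and level not in fired_levels:
            fired_levels.add(level)  # each level only fires once
            for msg in reversed(request["messages"]):
                if msg["role"] == "user":
                    msg["content"] += "\n\n[CONTEXT WARNING] " + text
                    return True
    return False
```

&lt;p&gt;Mutating the last user message, rather than the system prompt, is what keeps any cached prefix intact.&lt;/p&gt;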

&lt;p&gt;The result: the model writes a summary file — what was accomplished, current status, open items, key files modified — before the session dies. When you start a new conversation, you have full context to pick up where you left off.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Everything Gets Tracked
&lt;/h3&gt;

&lt;p&gt;Every turn is logged to a local SQLite database with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Input/output token counts&lt;/li&gt;
&lt;li&gt;Cache read/write tokens (from the API response)&lt;/li&gt;
&lt;li&gt;Calculated cost in USD&lt;/li&gt;
&lt;li&gt;Guard events that fired (which warnings triggered, when)&lt;/li&gt;
&lt;/ul&gt;
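&lt;p&gt;A minimal version of that per-turn log, assuming a schema roughly like this (the column names are illustrative; the cache token field names follow the Anthropic API's usage block):&lt;/p&gt;

```python
import sqlite3

def init_db(path):
    """Open the database and create the turns table if needed."""
    conn = sqlite3.connect(path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS turns (
            id INTEGER PRIMARY KEY,
            conversation_id TEXT,
            input_tokens INTEGER,
            output_tokens INTEGER,
            cache_read_tokens INTEGER,
            cache_write_tokens INTEGER,
            cost_usd REAL,
            guard_events TEXT
        )""")
    return conn

def log_turn(conn, conversation_id, usage, cost_usd, guard_events=""):
    """Record one request/response turn from the API response's usage block."""
    conn.execute(
        "INSERT INTO turns (conversation_id, input_tokens, output_tokens, "
        "cache_read_tokens, cache_write_tokens, cost_usd, guard_events) "
        "VALUES (?, ?, ?, ?, ?, ?, ?)",
        (conversation_id,
         usage.get("input_tokens", 0),
         usage.get("output_tokens", 0),
         usage.get("cache_read_input_tokens", 0),
         usage.get("cache_creation_input_tokens", 0),
         cost_usd,
         guard_events))
    conn.commit()
```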

&lt;p&gt;There's a web dashboard where you can browse conversations, see per-turn token breakdowns, and check guard efficiency metrics. It's useful for understanding how your sessions actually behave — which ones cost the most, where context fills up fastest, how often you hit the wall.&lt;/p&gt;

&lt;h2&gt;
  
  
  How It's Set Up
&lt;/h2&gt;

&lt;p&gt;Point Claude Code at &lt;code&gt;http://localhost:8080&lt;/code&gt; as the API base URL and start the proxy. That's it. Auth headers pass through untouched. Streaming works. If the proxy fails for any reason, it forwards the original request unmodified — the "do no harm" principle.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The real problem isn't cost — it's session reliability.&lt;/strong&gt; I started this project trying to optimize prompt caching and reduce API bills. That turned out to be the wrong problem. The thing that actually hurt was losing work. A $2 session that crashes is worse than a $4 session that finishes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Warnings need to be injected, not displayed.&lt;/strong&gt; A notification in a sidebar doesn't help. The model needs to see the warning as an instruction it can act on. Injecting it into the conversation is crude but effective.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;LLM tools will probably build these features natively.&lt;/strong&gt; Context awareness, session handoff — these should be built into Claude Code and Cursor and Aider. Until they are, a proxy is a clean way to add them without forking anything.&lt;/p&gt;

&lt;h2&gt;
  
  
  Should You Build One?
&lt;/h2&gt;

&lt;p&gt;Honestly — probably not. If you're a casual user, shorter sessions and manual summaries work fine. If you're a power user running 60-minute sessions on complex codebases, the context overflow problem is real and a proxy like this helps.&lt;/p&gt;

&lt;p&gt;But the ideas are what matter more than the code:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Monitor context usage&lt;/strong&gt; and intervene before it's too late&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Inject warnings as model instructions&lt;/strong&gt;, not UI notifications&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Always write a summary&lt;/strong&gt; before a session ends, not after&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are patterns any tool can implement. The proxy approach is just one way to do it.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;— DK&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claudecode</category>
      <category>llm</category>
      <category>productivity</category>
    </item>
  </channel>
</rss>
