Stef Antonio Virgil

The Wipe & Inject Pattern: Full Context for Implementation After Long Planning Sessions

If you use Claude Code (or any agentic tool) for serious development, you have hit "The Wall."

The Scenario:

Phase 1 (Planning): You spend 45 minutes debating the architecture. You ask Claude to read 20 files, check dependencies, and plan the auth system.

Cost: ~150k tokens.
Result: A perfect plan.

Phase 2 (Implementation): You say: "Great, write the code."

The Crash: Claude responds: "I need to compact my memory to proceed."

The Workaround Everyone Uses:

  1. "OK, save the plan to a .md file first"
  2. Claude compacts (loses context)
  3. "Now read the .md file you just created"
  4. Claude re-explores files mentioned in the plan
  5. "Update the .md with your progress as you go"
  6. Repeat steps 2-5 every time context fills up

The Problem: When the agent "compacts" context, it summarizes the WHAT ("We are building auth") but loses the WHY ("We chose cookies over headers because of XSS concerns"). The implementation phase starts with a "low-resolution" brain. The agent forgets constraints, re-asks questions, and writes buggy code.

The Solution: The "Wipe & Inject" Pattern
We faced this daily while building Grov. To fix it, we built an orchestration flow called Planning CLEAR.

It changes the lifecycle of a session from a "Run-on Sentence" to a "Chapter Book."

How it works (The Logic)
Instead of letting the context window fill up with chatty "back and forth," we use a local proxy to enforce a hard reset between Planning and Coding.

  1. Detect Completion: We use a small model (Claude Haiku) to monitor the session. When it detects the task_type has switched from "Planning" to "Implementation," it triggers the CLEAR event.

  2. Extract the Signal: Before wiping the memory, we extract two specific data points into a JSON structure:

Key Decisions: What did we agree on? (e.g., "Use Zod for validation").

Reasoning Trace: Why did we agree on it? (e.g., "Because Joi doesn't support type inference").

  3. The "Wipe" (Reset): We empty the messages[] array completely.

Old Context Usage: 150,000 tokens.

New Context Usage: 0 tokens.

  4. The "Inject": We inject the structured Summary directly into the system_prompt of the new session.
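The four steps above can be sketched in a few lines. This is a minimal illustration, not Grov's actual internals: the field names (`key_decisions`, `reasoning_trace`) match the JSON structure described in step 2, but the function and session shape are invented for the example.

```python
import json

def wipe_and_inject(session: dict) -> dict:
    """Sketch of the Wipe & Inject reset (illustrative, not Grov's API).

    `session` holds the old messages[] plus the two data points
    extracted before the wipe.
    """
    # Step 2 -- Extract the Signal into a JSON structure.
    summary = {
        "key_decisions": session["key_decisions"],      # what we agreed on
        "reasoning_trace": session["reasoning_trace"],  # why we agreed on it
    }
    return {
        # Step 3 -- The "Wipe": the new session starts with zero history.
        "messages": [],
        # Step 4 -- The "Inject": the distilled plan rides in the system prompt.
        "system_prompt": (
            "You are resuming an implementation session.\n"
            "Planning summary:\n" + json.dumps(summary, indent=2)
        ),
    }

planning_session = {
    "messages": ["...150k tokens of planning back-and-forth..."],
    "key_decisions": ["Use Zod for validation"],
    "reasoning_trace": ["Joi doesn't support type inference"],
}
fresh = wipe_and_inject(planning_session)
print(len(fresh["messages"]))           # 0 -- context fully reset
print("Zod" in fresh["system_prompt"])  # True -- the decision survives
```

The key property: the decisions *and* the reasoning survive the reset, so the implementation session never has to re-derive why Zod won.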

The Result
When you start typing code, you aren't fighting for the last 50k tokens of space. You have a fresh ~195k context window, but the agent still has "Full Recall" of the architectural constraints.

Bonus: The "Heartbeat" (Solving the 5-minute timeout)
Even with a fresh context window, there is another "invisible tax": Cache Expiry. Anthropic's prompt cache expires after 5 minutes of inactivity.

If you take a 10-minute break to grab coffee, your "Warm Cache" dies. The next prompt is billed at the full input-token price (no cache-read discount) and takes longer to process.

The Fix: We added a --extended-cache flag to Grov. It runs a background heartbeat that sends a minimal token (literally just a .) to the API every 4 minutes if you are idle.

Cost: ~$0.002 per keep-alive request (roughly every 4 minutes when idle).
Value: Keeps the session "Hot" indefinitely.
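The timing logic behind a heartbeat like this is simple enough to sketch. The function below is illustrative only (Grov's `--extended-cache` internals may differ): ping whenever the session has been idle longer than the heartbeat interval, which is set safely below the 5-minute cache expiry.

```python
CACHE_TTL = 5 * 60            # Anthropic's prompt cache expires after ~5 min idle
HEARTBEAT_INTERVAL = 4 * 60   # ping before the cache can go cold

def next_ping_due(last_activity: float, last_ping: float, now: float) -> bool:
    """Return True when an idle session needs a keep-alive ping.

    Illustrative sketch: a real daemon would run this check on a timer
    and send the minimal "." request when it returns True.
    """
    # Idle time is measured from whichever happened last:
    # a real user prompt or a previous keep-alive ping.
    idle_for = now - max(last_activity, last_ping)
    return idle_for >= HEARTBEAT_INTERVAL

# 10-minute coffee break: without the heartbeat, the cache dies at minute 5.
t0 = 0.0
assert not next_ping_due(t0, t0, t0 + 3 * 60)  # 3 min idle: still warm, no ping
assert next_ping_due(t0, t0, t0 + 4 * 60)      # 4 min idle: send the "."
```

Because each ping resets `last_ping`, the 4-minute cycle repeats indefinitely while you are away, which is what keeps the session "Hot."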

Try it out (Open Source)

We built these workflows into Grov, our open-source proxy for Claude Code.
If you are tired of running out of tokens or losing context mid-implementation, give it a shot.

Repo: github.com/TonyStef/Grov
Install: npm install -g grov

Let me know if this pattern helps your workflow!
