<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Stef Antonio Virgil</title>
    <description>The latest articles on DEV Community by Stef Antonio Virgil (@stef_antoniovirgil_5958d).</description>
    <link>https://dev.to/stef_antoniovirgil_5958d</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3652464%2F65249785-b564-4f41-9459-d986666d96cd.png</url>
      <title>DEV Community: Stef Antonio Virgil</title>
      <link>https://dev.to/stef_antoniovirgil_5958d</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/stef_antoniovirgil_5958d"/>
    <language>en</language>
    <item>
      <title>The Wipe &amp; Inject Pattern: Full Context for Implementation After Long Planning Sessions</title>
      <dc:creator>Stef Antonio Virgil</dc:creator>
      <pubDate>Wed, 10 Dec 2025 23:30:45 +0000</pubDate>
      <link>https://dev.to/stef_antoniovirgil_5958d/the-wipe-inject-pattern-full-context-for-implementation-after-long-planning-sessions-2gji</link>
      <guid>https://dev.to/stef_antoniovirgil_5958d/the-wipe-inject-pattern-full-context-for-implementation-after-long-planning-sessions-2gji</guid>
      <description>&lt;p&gt;If you use Claude Code (or any agentic tool) for serious development, you have hit "The Wall."&lt;/p&gt;

&lt;p&gt;The Scenario:&lt;/p&gt;

&lt;p&gt;Phase 1 (Planning): You spend 45 minutes debating the architecture. You ask Claude to read 20 files, check dependencies, and plan the auth system.&lt;/p&gt;

&lt;p&gt;Cost: ~150k tokens.&lt;br&gt;
Result: A perfect plan.&lt;/p&gt;

&lt;p&gt;Phase 2 (Implementation): You say: "Great, write the code."&lt;/p&gt;

&lt;p&gt;The Crash: Claude responds: "I need to compact my memory to proceed."&lt;/p&gt;

&lt;p&gt;The Workaround Everyone Uses:&lt;/p&gt;

&lt;blockquote&gt;
&lt;ol&gt;
&lt;li&gt;"OK, save the plan to a .md file first"&lt;/li&gt;
&lt;li&gt;Claude compacts (loses context)&lt;/li&gt;
&lt;li&gt;"Now read the .md file you just created"&lt;/li&gt;
&lt;li&gt;Claude re-explores files mentioned in the plan&lt;/li&gt;
&lt;li&gt;"Update the .md with your progress as you go"&lt;/li&gt;
&lt;li&gt;Repeat steps 2-5 every time context fills up&lt;/li&gt;
&lt;/ol&gt;
&lt;/blockquote&gt;

&lt;p&gt;The Problem: When the agent "compacts" context, it summarizes the WHAT ("We are building auth") but loses the WHY ("We chose cookies over headers because of XSS concerns"). The implementation phase starts with a "low-resolution" brain. The agent forgets constraints, re-asks questions, and writes buggy code.&lt;/p&gt;

&lt;p&gt;The Solution: The "Wipe &amp;amp; Inject" Pattern&lt;br&gt;
We faced this daily while building &lt;a href="https://grov.dev" rel="noopener noreferrer"&gt;Grov&lt;/a&gt;. To fix it, we built an orchestration flow called Planning CLEAR.&lt;/p&gt;

&lt;p&gt;It changes the lifecycle of a session from a "Run-on Sentence" to a "Chapter Book."&lt;/p&gt;

&lt;p&gt;How it works (The Logic)&lt;br&gt;
Instead of letting the context window fill up with chatty "back and forth," we use a local proxy to enforce a hard reset between Planning and Coding.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Detect Completion: We use a small model (Claude Haiku) to monitor the session. When it detects the task_type has switched from "Planning" to "Implementation," it triggers the CLEAR event.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Extract the Signal: Before wiping the memory, we extract two specific data points into a JSON structure:&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Key Decisions: What did we agree on? (e.g., "Use Zod for validation").&lt;/p&gt;

&lt;p&gt;Reasoning Trace: Why did we agree on it? (e.g., "Because Joi doesn't support type inference").&lt;/p&gt;
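
&lt;p&gt;As a sketch, the extracted payload could be modeled like this. The interface and field names here are hypothetical, not Grov's actual schema:&lt;/p&gt;

```typescript
// Hypothetical shape of the extracted planning summary.
// Field names are illustrative; Grov's real schema may differ.
interface PlanningSummary {
  keyDecisions: string[];   // what was agreed
  reasoningTrace: string[]; // why it was agreed
}

const summary: PlanningSummary = {
  keyDecisions: ["Use Zod for validation"],
  reasoningTrace: ["Joi does not support type inference"],
};
```

&lt;p&gt;The point is that decisions and their reasoning travel as a pair, so the "why" survives the reset.&lt;/p&gt;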

&lt;ol start="3"&gt;
&lt;li&gt;The "Wipe" (Reset): We empty the messages[] array completely.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Old Context Usage: 150,000 tokens.&lt;/p&gt;

&lt;p&gt;New Context Usage: 0 tokens.&lt;/p&gt;

&lt;ol start="4"&gt;
&lt;li&gt;The "Inject": We inject the structured summary directly into the system_prompt of the new session.&lt;/li&gt;
&lt;/ol&gt;
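
&lt;p&gt;The whole Wipe and Inject step fits in a few lines, assuming an Anthropic-style request body with system and messages[] fields (applyWipeAndInject is a hypothetical helper, not Grov's API):&lt;/p&gt;

```typescript
// Minimal sketch of the Wipe and Inject step, assuming an
// Anthropic-style request body. applyWipeAndInject is a
// hypothetical helper, not Grov's actual API.
interface ChatRequest {
  system: string;
  messages: { role: string; content: string }[];
}

function applyWipeAndInject(req: ChatRequest, summaryJson: string): ChatRequest {
  return {
    // Inject: prepend the structured summary into the system prompt
    system: "Architectural decisions from planning:\n" + summaryJson + "\n\n" + req.system,
    // Wipe: start the implementation session with an empty history
    messages: [],
  };
}
```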

&lt;p&gt;The Result&lt;br&gt;
When you start typing code, you aren't fighting for the last 50k tokens of space. You have a fresh ~195k context window, but the agent still has "Full Recall" of the architectural constraints.&lt;/p&gt;

&lt;p&gt;Bonus: The "Heartbeat" (Solving the 5-minute timeout)&lt;br&gt;
Even with a fresh context window, there is another "invisible tax": Cache Expiry. Anthropic's prompt cache expires after 5 minutes of inactivity.&lt;/p&gt;

&lt;p&gt;If you take a 10-minute break to grab coffee, your "Warm Cache" dies. The next prompt is billed at the full input-token rate instead of the discounted cache-read rate, and takes longer to process.&lt;/p&gt;

&lt;p&gt;The Fix: We added a --extended-cache flag to Grov. It runs a background heartbeat that sends a minimal token (literally just a .) to the API every 4 minutes if you are idle.&lt;/p&gt;

&lt;p&gt;Cost: ~$0.002 per keep-alive request (roughly every 4 minutes when idle).&lt;br&gt;
Value: Keeps the session "Hot" indefinitely.&lt;/p&gt;
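
&lt;p&gt;A minimal version of that heartbeat loop looks like this. sendKeepAlive and isIdle are hypothetical stand-ins for Grov's internals:&lt;/p&gt;

```typescript
// Sketch of the idle heartbeat. sendKeepAlive() would post a
// single "." to the API; isIdle() reports whether the user has
// typed recently. Both are hypothetical helper names.
const FOUR_MINUTES_MS = 4 * 60 * 1000;

function startHeartbeat(
  sendKeepAlive: () => void,
  isIdle: () => boolean,
  intervalMs: number = FOUR_MINUTES_MS,
) {
  return setInterval(() => {
    // Only ping while idle; active traffic refreshes the cache anyway.
    if (isIdle()) {
      sendKeepAlive();
    }
  }, intervalMs);
}
```

&lt;p&gt;The interval sits just under the 5-minute expiry, so each ping lands while the cache is still warm.&lt;/p&gt;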

&lt;p&gt;Try it out (Open Source)&lt;/p&gt;

&lt;p&gt;We built these workflows into Grov, our open-source proxy for Claude Code.&lt;br&gt;
If you are tired of running out of tokens or losing context mid-implementation, give it a shot.&lt;/p&gt;

&lt;p&gt;Repo: github.com/TonyStef/Grov&lt;br&gt;
Install: npm install -g grov&lt;/p&gt;

&lt;p&gt;Let me know if this pattern helps your workflow!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>typescript</category>
      <category>llm</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Collective AI Memory for Engineering Teams (Open Source)</title>
      <dc:creator>Stef Antonio Virgil</dc:creator>
      <pubDate>Mon, 08 Dec 2025 21:01:31 +0000</pubDate>
      <link>https://dev.to/stef_antoniovirgil_5958d/collective-ai-memory-for-engineering-teams-open-source-2jlg</link>
      <guid>https://dev.to/stef_antoniovirgil_5958d/collective-ai-memory-for-engineering-teams-open-source-2jlg</guid>
      <description>&lt;p&gt;I've been using Claude Code heavily for a long period of time now. It’s incredible, but I noticed two massive hidden costs that were eating my budget (and my patience).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. The Context Tax&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Every time I started a new session, I had to watch the agent re-explore my codebase. It would read the same auth.ts file it read yesterday, re-analyze the same dependencies, and burn thousands of input tokens just to get back to "baseline."&lt;/p&gt;

&lt;p&gt;It felt like hiring a senior engineer who gets amnesia every morning.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. "Context Drift"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This was the bigger problem. My co-founder would establish an architectural pattern in Session A (e.g., "Always use the Service pattern for database calls"). Two days later, in Session B, my agent would ignore that constraint and write direct SQL queries.&lt;/p&gt;

&lt;p&gt;The agent didn't know what my team decided yesterday. This led to regression bugs and "Drift," where the agent slowly deviates from the project's goals.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Solution: Grov (Team Memory)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;I built Grov to give agents a persistent, shared brain.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;It acts as a local proxy that sits between your terminal and the LLM. It captures reasoning traces (&lt;strong&gt;why a change was made&lt;/strong&gt;) and syncs them to the cloud.&lt;/p&gt;

&lt;p&gt;If developer A explains the Auth system to their agent, developer B's agent automatically knows about it 10 minutes later.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How we built "Anti-Drift"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We didn't just want a vector database dump. We needed active protection against hallucination. We implemented a real-time drift detection system inside the proxy.&lt;/p&gt;

&lt;p&gt;Here is the logic flow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Intercept&lt;/strong&gt;: The proxy captures every proposed action from Claude (edit, write, bash).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Score&lt;/strong&gt;: We use a fast, cheap model (Claude 4.5 Haiku) to score the action on a scale of 1-10.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Verify&lt;/strong&gt;: Haiku checks the action against:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The Original Goal&lt;/li&gt;
&lt;li&gt;The Current User Instruction (takes priority!)&lt;/li&gt;
&lt;li&gt;Established Project Constraints&lt;/li&gt;
&lt;li&gt;Logical Coherence (current vs. historical reasoning)&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Correct&lt;/strong&gt;: If the score indicates drift (deviation from the goal or constraints), Grov intercepts the request and injects a correction &lt;strong&gt;before&lt;/strong&gt; Claude commits the code.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This effectively gives the agent a "Reviewer" that sits on its shoulder and says, "Hey, remember we decided to use the Service pattern? Don't write that SQL query."&lt;/p&gt;
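
&lt;p&gt;The gating step can be sketched like this (gateAction, DRIFT_THRESHOLD, and the score plumbing are illustrative names, not Grov's real implementation):&lt;/p&gt;

```typescript
// Sketch of the drift gate. The score is assumed to come from a
// cheap reviewer model (e.g. Haiku) rating the action from 1 to 10;
// all names here are hypothetical.
const DRIFT_THRESHOLD = 6;

interface ProposedAction {
  tool: string; // "edit", "write", or "bash"
  description: string;
}

function gateAction(
  action: ProposedAction,
  score: number, // 1-10 alignment score from the reviewer model
  buildCorrection: (a: ProposedAction) => string,
): { allowed: boolean; correction?: string } {
  // High enough score: the action stays aligned with the goal,
  // the current user instruction, and the project constraints.
  if (score >= DRIFT_THRESHOLD) {
    return { allowed: true };
  }
  // Drift detected: block the action and inject a corrective
  // reminder before any code is committed.
  return { allowed: false, correction: buildCorrection(action) };
}
```

&lt;p&gt;In this sketch, a blocked action never executes; the correction string is what gets injected back into the conversation.&lt;/p&gt;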

&lt;p&gt;&lt;strong&gt;Open Source&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I built this because my co-founder and I were tired of repeating ourselves to our AI.&lt;/p&gt;

&lt;p&gt;It is fully Open Source (Apache 2.0).&lt;/p&gt;

&lt;p&gt;Data: Reasoning traces are captured locally. &lt;em&gt;Team Sync is opt-in&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Repo&lt;/strong&gt;: github.com/TonyStef/Grov&lt;br&gt;
&lt;strong&gt;NPM&lt;/strong&gt;: npm install -g grov&lt;br&gt;
&lt;em&gt;(Star the repo to follow updates!)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;We are currently in v0.5 Public Beta. If you are running an engineering team and want to test the "Shared Memory" implementation, I'd &lt;strong&gt;love your feedback&lt;/strong&gt; on the product so far. &lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>typescript</category>
      <category>npm</category>
    </item>
  </channel>
</rss>
