If you are building autonomous AI agents right now using OpenAI, Anthropic, or local models, you have probably run into the exact same wall I did.
You build a smart agent. You give it access to a few API tools (web search, database queries, a CRM integration). You set it loose.
It works great for about four turns. Then, suddenly, it forgets its core instructions. It starts hallucinating. And when you check your API dashboard the next morning, your token usage has spiked so high you think your API key got leaked.
What happened? API Data Bloat.
The 40KB JSON Problem
Here is the dirty secret of agentic workflows: APIs were built for traditional software, not for LLM context windows.
When your AI agent decides to call a tool—let's say it searches for a user profile in a database—the API doesn't just return the user's name and email. It returns a massive 40KB wall of raw JSON containing timestamps, nested metadata, tracking IDs, and null fields.
Your AI only needed about 120 bytes of that data to answer the user's question. But because of how most agent frameworks operate, the entire 40KB payload gets dumped directly into the active context window.
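To make that size gap concrete, here is a quick sketch. The payload shape and field names are invented for illustration, not taken from any real API, but the ratio is representative:

```python
import json

# Hypothetical bloated API response of the kind a tool call returns.
# Every field name here is made up for illustration.
raw_payload = {
    "id": "usr_8f2c",
    "name": "Ada Lovelace",
    "email": "ada@example.com",
    "created_at": "2021-07-14T09:30:00Z",
    "updated_at": "2024-01-02T11:45:00Z",
    "tracking": {"request_id": "req_" + "x" * 512, "trace": ["hop"] * 200},
    "metadata": {"nulls": [None] * 300,
                 "flags": {f"flag_{i}": False for i in range(100)}},
}

full = json.dumps(raw_payload)
# The agent only needed two fields to answer the user's question.
needed = json.dumps({"name": raw_payload["name"],
                     "email": raw_payload["email"]})

print(len(full), len(needed))  # the full blob dwarfs the fields actually used
```

Everything in `full` except those two fields is pure token burn once it lands in the context window.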
This causes two massive problems:
The Cost: You are paying for tens of thousands of useless tokens on every single tool call.
Context Compaction: LLMs have a finite context window. When you shove 40KB of junk JSON into the chat history, the agent framework is forced to truncate or push out the original system prompt and early conversation history to make room. The agent gets "dumb" because its working memory is full of tracking IDs.
The Flawed Solution: "Just use a cheaper model"
When developers see their API bills explode, their first instinct is to swap out GPT-4o or Claude 3.5 Sonnet for a cheaper, smaller model to save money.
But cheap models deliver cheap reasoning. The problem isn't that the smart models are too expensive; the problem is that you are feeding them garbage data they didn't ask for.
I got tired of this, so I built a middleware fix.
Enter: The OpenClaw Context Saver
I built and open-sourced a drop-in tool called the OpenClaw Context Saver. It is pure Python, has zero external dependencies, and acts as a protective shield for your LLM's context window.
It cuts agent token usage by 70% to 98% by solving the data bloat problem before the data ever reaches the AI.
Here is how it works under the hood:
Sandboxed Execution (ctx_run)
Instead of the LLM calling the API directly and eating the response, the LLM calls my ctx_run sandbox. The sandbox executes the API call in an isolated layer.
Intent-Driven Filtering
Before passing the data back to the LLM, the Context Saver intercepts the massive JSON payload. It shrinks it down, extracting only the specific data points the agent actually needs to complete its current reasoning step.
Session Continuity (The Magic Trick)
What if the agent needs the rest of that data later?
Instead of throwing the extra data away, the Context Saver indexes the full payload in a lightweight background database (SQLite). It passes a tiny, 120-byte summary into the active context window, along with a reference ID. If the agent realizes it needs more details three turns later, it can instantly retrieve them from the background index without re-running the API call.
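Here is a minimal sketch of that indexing pattern using an in-memory SQLite table. The `ctx_store`/`ctx_fetch` helper names and the schema are my illustration of the idea, not the actual Context Saver API:

```python
import json
import sqlite3
import uuid

# Illustrative sketch only: index full payloads, return tiny summaries.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE payloads (ref TEXT PRIMARY KEY, body TEXT)")

def ctx_store(payload: dict, summary_fields: list[str]) -> dict:
    """Index the full payload; hand the LLM only a summary plus a reference ID."""
    ref = uuid.uuid4().hex[:8]
    db.execute("INSERT INTO payloads VALUES (?, ?)", (ref, json.dumps(payload)))
    summary = {k: payload[k] for k in summary_fields if k in payload}
    return {"ref": ref, "summary": summary}

def ctx_fetch(ref: str) -> dict:
    """Turns later, pull the full payload back without re-running the API call."""
    (body,) = db.execute(
        "SELECT body FROM payloads WHERE ref = ?", (ref,)
    ).fetchone()
    return json.loads(body)

big = {"name": "Ada", "email": "ada@example.com", "audit_log": ["entry"] * 500}
slim = ctx_store(big, ["name", "email"])
print(slim)  # only this small dict ever reaches the context window
```

The key design choice is that the expensive part (the API call) happens once, while retrieval from the local index is essentially free.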
The Real-World Impact
Let's look at the difference on a standard background agent task:
❌ WITHOUT Context Saver:
Agent calls API ➔ 20 KB raw JSON floods context.
Agent calls API again ➔ 30 KB raw JSON floods context.
Result: Session memory maxes out, working state is lost, and you burn ~750,000 tokens a day just on background noise.
✅ WITH Context Saver:
Agent calls ctx_run ➔ 120-byte summary enters context (full data indexed in the background).
Agent calls ctx_batch ➔ 500-byte combined summary enters context.
Result: Massive cost savings, perfect memory retention, and you can afford to keep using the smartest models available.
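For intuition on the batch flow, here is a toy sketch of what a ctx_batch-style call could look like. `fake_search` and `summarize` are stand-ins I invented for this example, not part of the real tool:

```python
import json

# Toy sketch: run several tool calls, summarize each result, and hand the
# LLM one compact combined digest instead of several raw payloads.
def fake_search(query: str) -> dict:
    # Pretend API call that returns a bloated payload.
    return {
        "query": query,
        "top_hit": f"result for {query}",
        "debug": {"trace": ["hop"] * 100},  # noise the agent never needs
    }

def summarize(payload: dict) -> dict:
    # Keep only the fields relevant to the current reasoning step.
    return {"query": payload["query"], "top_hit": payload["top_hit"]}

def ctx_batch(queries: list[str]) -> str:
    # One combined summary enters the context instead of N raw payloads.
    return json.dumps([summarize(fake_search(q)) for q in queries])

combined = ctx_batch(["pricing", "refunds", "SLA"])
print(combined)
```

Three tool calls collapse into a few hundred bytes of context instead of three raw JSON dumps.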
Stop burning tokens.
If you are optimizing AI agents, building autonomous systems, or just looking to drastically reduce your LLM API costs without sacrificing reasoning quality, drop this into your architecture today.
💻 I just open-sourced it. You can grab the code, check out the examples, and star the repo here:
https://github.com/tlancas25/openclaw-context-saver
I'm a solo dev, so I'd love to hear your feedback. Drop a comment if you've been struggling with context limits, or open an issue on GitHub if you want to see a specific feature!