Solving the 25K Response Token Limit Wall: Introducing mcp-cache

Swapnil Surdi — Tue, 30 Sep 2025 20:12:52 +0000

I've been working with Claude and MCP servers extensively—building web automation, analyzing codebases, automating testing workflows. But I kept hitting the same frustrating wall:

Error: Response exceeds maximum allowed tokens (25,000)

The Problem

Modern applications generate massive responses:

Web page DOMs: 1.3MB+ (154K tokens)
GitHub PR diffs: 36K tokens (44% over limit)
Figma exports: 351K tokens (1,300% over)
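
The "over limit" percentages above are plain arithmetic against the 25,000-token ceiling. A quick sketch to check them (the function name is illustrative, not part of mcp-cache):

```typescript
// Limit quoted in the error message above.
const TOKEN_LIMIT = 25000;

// Percentage by which a response exceeds the limit, rounded to a whole number.
function percentOverLimit(tokens: number, limit: number = TOKEN_LIMIT): number {
  return Math.round(((tokens - limit) / limit) * 100);
}

console.log(percentOverLimit(36000));  // 44   (GitHub PR diff)
console.log(percentOverLimit(351000)); // 1304 (Figma export, roughly 1,300%)
console.log(percentOverLimit(154000)); // 516  (web page DOM)
```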

Every time I asked Claude to analyze a real web page, the request failed. Not because the AI couldn't handle it, but because MCP had a hard ceiling at 25,000 tokens.

The Real-World Impact

Looking at GitHub issues across popular MCP servers, I found hundreds of developers facing identical problems:

Chrome MCP: "screenshot always gives 'exceeds maximum tokens' error"
GitHub MCP: "get_pull_request_diff fails for any substantial PR"
Playwright MCP: "DOM content returns 'Conversation Too Long' error"

The pattern was clear: MCP works beautifully for toy examples but breaks on real-world complexity.

The Solution: mcp-cache

I built mcp-cache—a universal response manager that wraps any MCP server and solves the token limit automatically.

How it works:

Claude Desktop
    ↓
mcp-cache (transparent proxy)
├─ Intercepts large responses
├─ Caches full data locally
├─ Returns summary + query tools
└─ AI searches cached data on demand
    ↓
Target MCP Server (unchanged)
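
The flow in the diagram can be sketched roughly as follows. This is a minimal illustration, not mcp-cache's actual code: the names (`interceptResponse`, `CachedSummary`), the in-memory `Map`, and the 4-characters-per-token estimate are all assumptions made for the example.

```typescript
// Hypothetical summary shape returned in place of an oversized response.
interface CachedSummary {
  responseId: string;
  sizeChars: number;
  preview: string;
  hint: string;
}

const TOKEN_LIMIT = 25000;  // ceiling from the error message
const CHARS_PER_TOKEN = 4;  // rough heuristic (assumption), not a real tokenizer

// In-memory stand-in for the local cache.
const cache = new Map<string, string>();
let nextId = 0;

function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Pass small responses through untouched; cache large ones and
// hand back a summary the AI can follow up on with queries.
function interceptResponse(body: string): string | CachedSummary {
  if (estimateTokens(body) <= TOKEN_LIMIT) {
    return body;
  }
  nextId += 1;
  const responseId = `resp_${nextId}`;
  cache.set(responseId, body);
  return {
    responseId,
    sizeChars: body.length,
    preview: body.slice(0, 200),
    hint: `Use query_response('${responseId}', <query>) to search the full data.`,
  };
}
```

With this sketch, a short response comes back unchanged, while a response over the estimated limit is stored under an ID and replaced by a small summary object.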

Before mcp-cache:

→ "Get the DOM and find payment forms"
❌ Error: Response exceeds maximum length

After mcp-cache:

→ "Get the DOM and find payment forms"
✅ Cached as resp_xyz (1.2MB)
→ "Show forms with 'payment' in action"
✅ Found 3 forms

Zero Configuration

The best part? It's completely transparent:

# Instead of:
npx @playwright/mcp@latest

# Just add mcp-cache:
npx @hapus/mcp-cache npx @playwright/mcp@latest

That's it. No server modifications. No client changes.
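
For clients configured through a JSON file with `command` and `args` per server (Claude Desktop's `mcpServers` block, for example), the same wrapping amounts to prefixing the existing command. A small sketch, assuming mcp-cache needs no extra flags in that setting (the source only shows the shell form, so check the README); `wrapWithCache` is a hypothetical helper:

```typescript
// Shape of one server entry under "mcpServers" in a client config.
interface ServerEntry {
  command: string;
  args: string[];
}

// Prefix an existing entry with mcp-cache, mirroring the shell command above.
function wrapWithCache(entry: ServerEntry): ServerEntry {
  return {
    command: "npx",
    args: ["@hapus/mcp-cache", entry.command, ...entry.args],
  };
}

const playwright: ServerEntry = { command: "npx", args: ["@playwright/mcp@latest"] };
const config = { mcpServers: { playwright: wrapWithCache(playwright) } };

// Prints a config whose args are:
// ["@hapus/mcp-cache", "npx", "@playwright/mcp@latest"]
console.log(JSON.stringify(config, null, 2));
```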

Works with ANY MCP server:

✅ Playwright, Chrome, GitHub, Filesystem
✅ Python, Node.js, Go, Rust servers
✅ Your custom MCP servers

Real Results

Since integrating mcp-cache:

E-Commerce Testing:

✅ Full accessibility trees cached (was: 250K token errors)
✅ AI queries specific elements from 1.2MB+ responses
✅ Complex multi-page flows automated successfully

Performance:

⚡ <10ms overhead for normal responses
⚡ <200ms for cached queries
⚡ 90%+ cache hit rate

What's Next

Current: Local file-based caching
Coming: Redis-backed distributed caching for teams
Vision: Vector embeddings + semantic search

Imagine:

🏢 Organization-wide shared cache
🔍 Semantic search: "Find pages similar to our checkout flow"
📊 Compliance audit trails
🧠 Knowledge graphs from cached responses

Key Technical Highlights

Client-Aware Intelligence:

Auto-detects client (Claude Desktop, Cursor, Cline)
Adjusts token limits accordingly
No manual configuration needed
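
One way such detection could work: the MCP `initialize` request carries a `clientInfo` object with the client's name and version, which a proxy can map to a limit. The sketch below is an assumption about the approach, not mcp-cache's real logic. The substring matches and every number other than 25,000 are placeholders.

```typescript
// Subset of the clientInfo object sent in an MCP initialize request.
interface ClientInfo {
  name: string;
  version: string;
}

// Placeholder table: only the 25,000 figure comes from the article.
const LIMITS: Array<{ match: string; maxTokens: number }> = [
  { match: "claude", maxTokens: 25000 },
  { match: "cursor", maxTokens: 20000 }, // hypothetical value
  { match: "cline", maxTokens: 20000 },  // hypothetical value
];

// Assumption: unknown clients fall back to the article's 25,000 figure.
const DEFAULT_MAX_TOKENS = 25000;

// Pick a limit by case-insensitive substring match on the client name.
function tokenLimitFor(client: ClientInfo): number {
  const name = client.name.toLowerCase();
  const hit = LIMITS.find((entry) => name.includes(entry.match));
  return hit ? hit.maxTokens : DEFAULT_MAX_TOKENS;
}
```

The exact names each client reports may differ from these substrings, so a real table would need to be checked against what the clients actually send.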

Powerful Query Interface:

// Text search
query_response('resp_id', 'submit button')

// JSONPath for structured data
query_response('resp_id', '$.div[?(@.class=="navbar")]')

// Regex patterns
query_response('resp_id', '/href=".*\\.pdf"/')
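
To make the three query styles concrete, here is a rough sketch of how a query string might be classified and applied to cached text. The dispatch rules (leading `$` for JSONPath, `/.../` for regex, anything else as text) are inferred from the examples above, not taken from mcp-cache's source, and JSONPath evaluation is left out because it needs a dedicated evaluator.

```typescript
type QueryKind = "jsonpath" | "regex" | "text";

// Assumed convention, inferred from the three example queries.
function classifyQuery(query: string): QueryKind {
  if (query.startsWith("$")) return "jsonpath";
  if (query.length > 2 && query.startsWith("/") && query.endsWith("/")) return "regex";
  return "text";
}

// Return the lines of a cached response that match a text or regex query.
function searchLines(cached: string, query: string): string[] {
  const kind = classifyQuery(query);
  if (kind === "jsonpath") {
    throw new Error("JSONPath needs a dedicated evaluator; not covered by this sketch");
  }
  const lines = cached.split("\n");
  if (kind === "regex") {
    // Strip the surrounding slashes before compiling the pattern.
    const pattern = new RegExp(query.slice(1, -1));
    return lines.filter((line) => pattern.test(line));
  }
  // Plain text: case-insensitive substring match.
  const needle = query.toLowerCase();
  return lines.filter((line) => line.toLowerCase().includes(needle));
}
```

One limitation of this convention: a plain-text search that happens to start with `$` (a price, say) would be misread as JSONPath, so a real interface would likely want an explicit mode parameter.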

Try It Today

npm install -g @hapus/mcp-cache

# Or use directly:
npx @hapus/mcp-cache <your-server-command>

Links:

⭐ GitHub: https://github.com/swapnilsurdi/mcp-cache
📦 npm: @hapus/mcp-cache

What I'm Looking For

✅ Testers - Try it with your MCP workflows
✅ Feedback - What features would help you most?
✅ Contributors - Interested in building Redis/vector DB layers?
✅ Use cases - What are you trying to automate?

This started as a side project to scratch my own itch. Now I'm hoping it helps others facing the same problem.

DEV Community: Swapnil Surdi