Swapnil Surdi

Solving the 25K Response Token Limit Wall: Introducing mcp-cache

I've been working with Claude and MCP servers extensively—building web automation, analyzing codebases, automating testing workflows. But I kept hitting the same frustrating wall:

Error: Response exceeds maximum allowed tokens (25,000)

The Problem

Modern applications generate massive responses:

  • Web page DOMs: 1.3MB+ (154K tokens)
  • GitHub PR diffs: 36K tokens (44% over limit)
  • Figma exports: 351K tokens (1,300% over)

Every time I asked Claude to analyze a real web page, the request failed. Not because the model couldn't handle it, but because MCP enforces a hard ceiling of 25,000 tokens per response.

The Real-World Impact

Looking at GitHub issues across popular MCP servers, I found hundreds of developers facing identical problems:

  • Chrome MCP: "screenshot always gives 'exceeds maximum tokens' error"
  • GitHub MCP: "get_pull_request_diff fails for any substantial PR"
  • Playwright MCP: "DOM content returns 'Conversation Too Long' error"

The pattern was clear: MCP works beautifully for toy examples. Breaks on real-world complexity.

The Solution: mcp-cache

I built mcp-cache: a universal response manager that wraps any MCP server and handles the 25K token limit automatically.

How it works:

Claude Desktop
    ↓
mcp-cache (transparent proxy)
├─ Intercepts large responses
├─ Caches full data locally
├─ Returns summary + query tools
└─ AI searches cached data on demand
    ↓
Target MCP Server (unchanged)
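In sketch form, the interception step looks something like this (illustrative TypeScript, not the actual mcp-cache source; the FileCache helper and the 4-characters-per-token estimate are my assumptions):

// Illustrative sketch only, not the actual mcp-cache source.
interface FileCache {
  store(data: string): Promise<string>; // persists the payload, returns a response ID
}

const TOKEN_LIMIT = 25_000;

// Rough heuristic: roughly 4 characters per token for English text and JSON
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

async function interceptResponse(raw: string, cache: FileCache): Promise<string> {
  if (estimateTokens(raw) <= TOKEN_LIMIT) {
    return raw; // small responses pass through unchanged
  }
  // Too large: cache the full payload and return a compact summary instead
  const id = await cache.store(raw);
  return JSON.stringify({
    cached: true,
    responseId: id,
    sizeBytes: raw.length,
    hint: `Call query_response('${id}', <query>) to search the full data`,
  });
}

The AI then drills into the cached payload with follow-up query calls instead of receiving the whole thing at once.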

Before mcp-cache:

→ "Get the DOM and find payment forms"
❌ Error: Response exceeds maximum length

After mcp-cache:

→ "Get the DOM and find payment forms"
✅ Cached as resp_xyz (1.2MB)
→ "Show forms with 'payment' in action"
✅ Found 3 forms

Zero Configuration

The best part? It's completely transparent:

# Instead of:
npx @playwright/mcp@latest

# Just add mcp-cache:
npx @hapus/mcp-cache npx @playwright/mcp@latest

That's it. No server modifications. No client changes.
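In Claude Desktop terms, that means wrapping the command and args in your existing claude_desktop_config.json entry. A sketch, assuming the standard mcpServers format and a Playwright server:

{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@hapus/mcp-cache", "npx", "@playwright/mcp@latest"]
    }
  }
}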

Works with ANY MCP server:

  • ✅ Playwright, Chrome, GitHub, Filesystem
  • ✅ Python, Node.js, Go, Rust servers
  • ✅ Your custom MCP servers

Real Results

Since integrating mcp-cache:

E-Commerce Testing:

  • ✅ Full accessibility trees cached (was: 250K token errors)
  • ✅ AI queries specific elements from 1.2MB+ responses
  • ✅ Complex multi-page flows automated successfully

Performance:

  • ⚡ <10ms overhead for normal responses
  • ⚡ <200ms for cached queries
  • ⚡ 90%+ cache hit rate

What's Next

Current: Local file-based caching
Coming: Redis-backed distributed caching for teams
Vision: Vector embeddings + semantic search

Imagine:

  • 🏢 Organization-wide shared cache
  • 🔍 Semantic search: "Find pages similar to our checkout flow"
  • 📊 Compliance audit trails
  • 🧠 Knowledge graphs from cached responses

Key Technical Highlights

Client-Aware Intelligence:

  • Auto-detects client (Claude Desktop, Cursor, Cline)
  • Adjusts token limits accordingly
  • No manual configuration needed (see the sketch below)
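MCP clients identify themselves during the initialize handshake (clientInfo.name), so the limit can be keyed off that. A minimal sketch of what per-client limits could look like; the Cursor and Cline numbers are hypothetical, not mcp-cache's actual table:

// Hypothetical per-client limits; only the Claude Desktop value comes from this post.
const TOKEN_LIMITS: Record<string, number> = {
  'claude desktop': 25_000, // the ceiling discussed above
  'cursor': 30_000,         // hypothetical value
  'cline': 40_000,          // hypothetical value
};

const DEFAULT_LIMIT = 25_000; // conservative fallback for unknown clients

// clientName comes from the MCP initialize handshake's clientInfo.name
function tokenLimitFor(clientName?: string): number {
  return TOKEN_LIMITS[(clientName ?? '').toLowerCase()] ?? DEFAULT_LIMIT;
}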

Powerful Query Interface:

// Text search
query_response('resp_id', 'submit button')

// JSONPath for structured data
query_response('resp_id', '$.div[?(@.class=="navbar")]')

// Regex patterns
query_response('resp_id', '/href=".*\\.pdf"/')

Try It Today

npm install -g @hapus/mcp-cache

# Or use directly:
npx @hapus/mcp-cache <your-server-command>

Looking For

  • Testers - Try it with your MCP workflows
  • Feedback - What features would help you most?
  • Contributors - Interested in building Redis/vector DB layers?
  • Use cases - What are you trying to automate?


This started as a side project to scratch my own itch. Now I'm hoping it helps others facing the same problem.
