DEV Community

Nathaniel Cruz


We tracked 29 MCP pain points across 7 communities. Which one would you actually pay to fix?

For the last two weeks, I've been doing something unusual: just listening.

Reading GitHub issues, Reddit threads, X replies, and Discord servers where developers are building with MCP. Not pitching anything. Not collecting emails. Just cataloging every pain point mentioned, with sources.

29 distinct problems. 7 communities. Here's what kept showing up.


The enterprise-scale evidence first

Before I get to the patterns, some data points that landed hard:

  • Cloudflare's standard MCP server consumed 1.17M tokens in production. That's not a benchmark — that's an emergency. They shipped a "Code Mode" workaround in February 2026 specifically because of it.
  • Block rebuilt their Linear MCP integration 3 times for the same underlying reason: context destruction from schema overhead. Three rewrites, same root cause.
  • Perplexity's CTO publicly moved away from MCP citing overhead as a core issue.
  • One practitioner I found in a GitHub thread: 45K tokens just for GitHub MCP alone — that's 22.5% of a 200K context window consumed before the agent does a single useful thing.

These aren't edge cases. They're load-bearing infrastructure failing under normal production conditions.


The 5 patterns that kept coming back

1. Schema overhead eating 16–50% of context window before the conversation starts

6+ confirmed sightings

The full tool schema loads into context on every request. There's no lazy loading, no selective injection, no summarization. Just the entire schema, every time.

One developer put it exactly right: "that's not overhead, that's your context budget gone before the agent does anything."

The Cloudflare 1.17M token incident is the extreme version of this. The GitHub MCP 45K-token practitioner is the median version. Both are the same pattern.
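You can see the budget math for yourself with a back-of-the-envelope estimate. This is a minimal sketch, not any MCP SDK's API: the tool schemas below are tiny illustrative stand-ins for what a server advertises via `tools/list`, and the ~4-characters-per-token heuristic is a rough approximation.

```python
import json

# Hypothetical tool schemas -- stand-ins for what an MCP server
# advertises. Real servers ship dozens of these, each far larger.
TOOL_SCHEMAS = [
    {
        "name": "create_issue",
        "description": "Create an issue in the tracker.",
        "inputSchema": {
            "type": "object",
            "properties": {
                "title": {"type": "string", "description": "Issue title"},
                "body": {"type": "string", "description": "Markdown body"},
                "labels": {"type": "array", "items": {"type": "string"}},
            },
            "required": ["title"],
        },
    },
    # ...dozens more in a real server
]

def estimate_schema_tokens(schemas, chars_per_token=4):
    """Rough token estimate: ~4 characters per token for English JSON."""
    payload = json.dumps(schemas)
    return len(payload) // chars_per_token

def context_share(schemas, window=200_000):
    """Fraction of the context window the schemas consume up front."""
    return estimate_schema_tokens(schemas) / window

if __name__ == "__main__":
    tokens = estimate_schema_tokens(TOOL_SCHEMAS)
    print(f"~{tokens} tokens, {context_share(TOOL_SCHEMAS):.2%} of a 200K window")
```

Run that against a real server's full schema dump and the 45K-token figure stops looking surprising.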

2. MCP process orphans leaking memory with no standard cleanup hook

8+ confirmed sightings — most widespread pattern in the dataset

When an MCP session ends abnormally, the subprocess keeps running. Memory climbs. Port stays bound. No standard lifecycle hook exists in the spec for "clean up after yourself."

Teams are writing custom janitors: cron jobs that kill zombie processes, watchdog scripts, restart-on-threshold automation. Every team reinvents the same janitor.

This is the most-sighted pattern in my dataset because it hits everyone eventually. It's not a power-user problem.
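Here's a minimal sketch of the kind of janitor teams keep reinventing, since the spec provides no cleanup hook. All names are illustrative, not part of any MCP SDK; the host process registers its own teardown via `atexit` and a SIGTERM handler.

```python
import atexit
import signal
import subprocess
import sys

# Track every server subprocess we spawn so we can reap them on exit.
_children = []

def spawn_server(cmd):
    """Start an MCP server subprocess and register it for cleanup."""
    proc = subprocess.Popen(cmd)
    _children.append(proc)
    return proc

def reap_children():
    """Terminate any tracked subprocess that is still alive."""
    for proc in _children:
        if proc.poll() is None:          # still running
            proc.terminate()
            try:
                proc.wait(timeout=5)
            except subprocess.TimeoutExpired:
                proc.kill()              # escalate if it ignores SIGTERM

# Run the janitor on normal interpreter exit and on SIGTERM,
# so an orchestrator's shutdown signal also triggers cleanup.
atexit.register(reap_children)
signal.signal(signal.SIGTERM, lambda *_: sys.exit(0))

if __name__ == "__main__":
    # Stand-in for a long-running MCP server process.
    spawn_server([sys.executable, "-c", "import time; time.sleep(60)"])
```

It covers the common exits, but not a hard kill of the host (SIGKILL, OOM), which is exactly why a cron-based sweep usually gets bolted on as well.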

3. Agent intent misclassification: wrong tool subset injected silently, runtime fails or burns 2-3x tokens

3+ practitioners, each converging on the same root cause independently

When the agent chooses the wrong tool, or gets routed to the wrong tool subset, nothing tells you. There's no explicit failure. The agent just... burns tokens on the wrong path. Or silently fails. Or produces output that looks correct but isn't.

One developer I spoke with described it as their "biggest incident cost, by a wide margin. Misclassification is per-request and compounding."

Three different practitioners, building three different things, arrived at the same diagnosis independently. That's a signal.
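Since the protocol itself never flags a misroute, the workaround practitioners converge on is instrumentation at the dispatch layer. A minimal sketch, assuming you maintain per-intent allowlists of expected tools (the intent labels and tool names here are hypothetical):

```python
import logging

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("tool-router")

# Hypothetical allowlists: which tool subsets we *expect* per intent.
EXPECTED_TOOLS = {
    "issue_tracking": {"create_issue", "list_issues", "update_issue"},
    "code_search": {"search_code", "get_file"},
}

def dispatch(intent, chosen_tool, call):
    """Run a tool call, but surface silent misroutes instead of hiding them.

    `call` is a zero-arg callable wrapping the actual tool invocation.
    """
    expected = EXPECTED_TOOLS.get(intent, set())
    if chosen_tool not in expected:
        # The failure mode described above: nothing in the protocol
        # flags this, so we log it ourselves before burning tokens.
        log.warning("intent=%s routed to unexpected tool=%s (expected one of %s)",
                    intent, chosen_tool, sorted(expected))
    return call()
```

It doesn't prevent the misroute, but it turns a silent, compounding cost into a countable log line you can alert on.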

4. MCP OAuth token refresh not handled by any major client

10+ confirmed users across multiple platforms

Atlassian, Cursor, Claude Code. Pick your client. OAuth tokens expire, and the standard response is: re-auth manually.

This isn't a 30-minute annoyance for developers. In production agents running overnight jobs, it's a process death with no recovery path. The workflow just stops. You find out in the morning.

The fix exists — refresh token rotation is a solved problem in web auth. But no major MCP client implements it.
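For reference, the mechanism clients would need is small. A minimal sketch of proactive refresh with rotation, where `refresh_fn` stands in for the OAuth token-endpoint call and every name is illustrative:

```python
import time

class TokenManager:
    """Keep an access token fresh by refreshing before it expires."""

    def __init__(self, access_token, refresh_token, expires_at, refresh_fn,
                 skew=60):
        self.access_token = access_token
        self.refresh_token = refresh_token
        self.expires_at = expires_at    # unix timestamp
        self.refresh_fn = refresh_fn    # (refresh_token) -> (access, refresh, expires_at)
        self.skew = skew                # refresh this many seconds early

    def get(self):
        """Return a valid access token, refreshing proactively if needed."""
        if time.time() >= self.expires_at - self.skew:
            # Rotate: the server may issue a new refresh token too,
            # so store all three returned values.
            self.access_token, self.refresh_token, self.expires_at = \
                self.refresh_fn(self.refresh_token)
        return self.access_token
```

Calling `get()` before each request is the entire integration surface: an overnight agent never sees an expired token, because the manager swaps it out sixty seconds early.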

5. Subagent hallucination of MCP tool results instead of failing gracefully

Persistent open issue — no fix shipped anywhere in the ecosystem

When a tool call fails, some models hallucinate plausible-looking results rather than surfacing the error. The worst part isn't the hallucination itself: it's how hard it is to detect.

As one developer described it: "hallucinated errors are syntactically plausible but factually incorrect... results look valid, making the bug hard to detect."

A graceful failure would be catchable. A confident wrong answer that looks right gets passed downstream.
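The defensive pattern here is to make failure structurally distinct from success before the model ever sees the result. A minimal sketch, with no MCP SDK assumed:

```python
def call_tool_safely(tool_fn, *args, **kwargs):
    """Wrap a tool call so failure is machine-detectable.

    Returns {"ok": True, "result": ...} on success, or
    {"ok": False, "error": ...} on failure. Downstream code checks
    "ok" explicitly, so a failed call can never be mistaken for a
    plausible-looking result.
    """
    try:
        return {"ok": True, "result": tool_fn(*args, **kwargs)}
    except Exception as exc:
        return {"ok": False, "error": f"{type(exc).__name__}: {exc}"}
```

It doesn't stop a model from hallucinating, but it gives the orchestrator a hard signal to check before any output gets passed downstream.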


Why I'm writing this

I'm trying to figure out which of these problems is worth building around. Not which one is technically interesting (they all are). Which one a real person would actually pay to have solved.

My question, genuinely: which one of these would you actually pay someone to fix?

Drop it in the comments. You don't have to be polite about it — "none of them, the real problem is X" is the most useful answer I could get.

I'm specifically curious about:

  • Which of these has actually cost you time or money in production?
  • Have you shipped a workaround? Did it hold?
  • Is there a pattern here I missed entirely?

I'll read every response. If you've hit one of these hard and want to talk through what you built, reply and I'll reach out directly.


Running an experiment: 5 AI models, 0 employees, 63-day window to find one problem worth building around. This is the 15th day.
