Every MCP server tutorial follows the same script: install the SDK, define a tool, return a response. Ship it.
That works for a single API. It does not work when you need an agent to reliably choose between, authenticate to, and call 645+ different APIs across 86 categories — and handle everything that goes wrong at 3am with no human.
We built Rhumb, an MCP server that proxies hundreds of real APIs for AI agents. Here's what actually broke, why the tutorials don't cover it, and what you need to know if you're building anything beyond hello-world.
Bug #1: The slug aliasing problem
First surprise: APIs don't have stable identifiers.
Brave's search API appears as both brave-search-api and brave-search depending on which documentation page you read. When an agent asks to "search with Brave," your MCP server needs to know these are the same service.
This isn't unique to Brave. We found alias collisions in payment providers (same company, multiple API versions with different names), communication platforms (SMS vs messaging vs voice — same provider, different "APIs"), and analytics tools (legacy vs v2 naming).
The fix isn't a lookup table. It's a canonical slug system with alias resolution that treats identity as a first-class problem.
Why tutorials skip this: They show one API. You never hit naming collisions with one API.
Bug #2: Authentication is not a solved problem
The tutorials say: "Add your API key to the header." That covers maybe 40% of real APIs.
What we actually encountered across providers:
- Bearer token (`Authorization: Bearer {key}`) — ~45% of services
- Custom header (`X-API-Key`, `X-Subscription-Token`, `Api-Key`) — ~25%
- Basic Auth (base64-encoded credentials) — ~15%
- OAuth2 with token refresh — ~10%
- Query parameter (`?api_key=...`) — ~5%
The problem isn't supporting all five patterns. It's that your MCP server needs to know which pattern each API uses before the agent's first call. If the agent sends a Bearer token to an API expecting X-API-Key, you get a 401 that tells the agent nothing useful.
Worse: some APIs accept the wrong auth method silently and return empty results instead of errors.
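An auth-resolution layer can be a small dispatch over per-service config. A sketch under stated assumptions: the `AUTH_CONFIGS` entries are hypothetical examples (not real provider configs), and OAuth2 token refresh is omitted for brevity since it needs a token store.

```python
# Illustrative auth-resolution layer covering four of the five patterns above.
import base64

AUTH_CONFIGS = {
    "example-bearer": {"scheme": "bearer"},
    "example-header": {"scheme": "header", "header": "X-API-Key"},
    "example-basic":  {"scheme": "basic"},
    "example-query":  {"scheme": "query", "param": "api_key"},
}

def apply_auth(service: str, headers: dict, params: dict, credential: str) -> None:
    """Mutate headers/params with the auth pattern this service expects."""
    cfg = AUTH_CONFIGS[service]
    scheme = cfg["scheme"]
    if scheme == "bearer":
        headers["Authorization"] = f"Bearer {credential}"
    elif scheme == "header":
        headers[cfg["header"]] = credential
    elif scheme == "basic":
        # credential is "user:password" for Basic Auth
        encoded = base64.b64encode(credential.encode()).decode()
        headers["Authorization"] = f"Basic {encoded}"
    elif scheme == "query":
        params[cfg["param"]] = credential
    else:
        raise ValueError(f"unsupported auth scheme: {scheme}")
```

The point is that the agent never chooses the pattern: the server resolves it from config before the first call, which is exactly when a wrong guess would otherwise produce an unhelpful 401.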
Bug #3: The payload translation trap
Your agent constructs a JSON payload. The API expects multipart form data.
This hits hardest with document processing APIs. An agent wants to send a file for parsing. It constructs a reasonable JSON body with the file content. The API returns 400 because it only accepts multipart uploads with specific field names.
The gap between "what the agent naturally produces" and "what the API actually accepts" is wider than anyone admits:
- Parameter naming: `query` vs `q` vs `search_query` vs `prompt`
- Body format: JSON vs form-encoded vs multipart
- Array handling: `tags=a,b,c` vs `tags[]=a&tags[]=b` vs `{"tags": ["a","b"]}`
- Date formats: ISO 8601 vs Unix timestamps vs custom strings
- Pagination: cursor vs offset vs page-number vs link-header
Why this matters for agents: A human developer reads the docs and adapts. An agent will retry the same malformed request until it hits rate limits.
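A translation layer can sit between the agent's natural payload and each API's expected shape. A minimal sketch, assuming a per-API `spec` format that is purely illustrative (the rename map and array-style names are not from any real catalog):

```python
# Hypothetical translator from an agent's "natural" payload to the shape
# one specific API accepts.

def translate_payload(payload: dict, spec: dict) -> dict:
    """Rename parameters and re-encode arrays per the target API's spec."""
    out = {}
    for key, value in payload.items():
        # Map the agent's parameter name to the API's name.
        target = spec.get("rename", {}).get(key, key)
        if isinstance(value, list):
            style = spec.get("array_style", "json")
            if style == "csv":          # tags=a,b,c
                value = ",".join(map(str, value))
            elif style == "brackets":   # tags[]=a&tags[]=b (repeated keys)
                target = f"{target}[]"
        out[target] = value
    return out

spec = {"rename": {"query": "q"}, "array_style": "csv"}
print(translate_payload({"query": "mcp", "tags": ["a", "b"]}, spec))
# -> {'q': 'mcp', 'tags': 'a,b'}
```

A real implementation would also handle body format (JSON vs form-encoded vs multipart) and date coercion, but the structure is the same: the spec lives with the catalog entry, not in the agent's head.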
Bug #4: Error messages that lie
Here's a real error response from a production API:
```json
{"error": "An error occurred. Please try again later."}
```
An agent receiving this will try again later. The actual problem? Invalid API key format. Retrying will never help.
The quality gap in error responses across 645+ APIs is staggering:
Good (Stripe-class):
```json
{
  "error": {
    "type": "invalid_request_error",
    "code": "parameter_missing",
    "param": "amount",
    "message": "Missing required param: amount"
  }
}
```
Bad (more common than you'd think):
```json
{"status": "error", "message": "Bad Request"}
```
In our scoring of 645+ APIs, structured error responses are a minority. Most APIs return human-readable error strings that agents can't reliably parse.
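Defensive parsing means deciding retryability from whatever signal exists: a machine-readable error type when the API provides one, message heuristics and status codes when it doesn't. A sketch; the field names and hint strings are examples of patterns seen in the wild, not an exhaustive list.

```python
# Defensive error classification: is this failure worth retrying?
RETRYABLE_STATUSES = {408, 429, 500, 502, 503, 504}
NON_RETRYABLE_HINTS = ("invalid api key", "unauthorized", "missing required")

def classify_error(status: int, body: dict) -> str:
    err = body.get("error")
    # Structured (Stripe-class): trust the machine-readable type.
    if isinstance(err, dict) and "type" in err:
        return "retry" if err["type"] in ("rate_limit_error", "api_error") else "fail"
    # Unstructured: scan the human-readable message for known hints
    # before falling back to the status code.
    message = str(err or body.get("message", "")).lower()
    if any(hint in message for hint in NON_RETRYABLE_HINTS):
        return "fail"
    return "retry" if status in RETRYABLE_STATUSES else "fail"
```

This still can't rescue the lying error above — nothing can, which is why the hint list has to grow from observed failures per provider rather than being written once.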
Bug #5: Rate limits without information
Good APIs tell you exactly where you stand:
```http
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 247
X-RateLimit-Reset: 1616000000
Retry-After: 30
```
Bad APIs return 429 and nothing else.
Some APIs have undocumented secondary rate limits. GitHub's REST API has a primary rate limit (5,000/hour) and a secondary rate limit on "content-creating" endpoints that's lower and not reflected in headers. An agent creating issues or comments will hit the secondary limit and get a 403 with a message about "secondary rate limits" that doesn't appear in any getting-started guide.
Real numbers from our data:
- Stripe (AN 8.1): Rate limit headers, Retry-After, burst limit documented
- GitHub (AN 7.8): Primary headers yes, secondary limits documented but not in headers
- PayPal (AN 4.9): Headers inconsistent, sandbox limits undocumented
Bug #6: The sandbox illusion
"Just use the sandbox." Every API says this. Few deliver.
Real sandbox problems we encountered:
- Sandbox requires production credentials — defeats the purpose
- Sandbox has different behavior — you test against a lie
- Sandbox has stricter rate limits — can't performance test
- Sandbox doesn't support all endpoints — partial testing only
- Sandbox requires CAPTCHA to create — agents can't self-provision
PayPal's sandbox requires CAPTCHA verification to create accounts. That one detail drops it from "agent-friendly" to "requires a human for setup."
Bug #7: The versioning time bomb
APIs change. Versioning is supposed to protect you.
In practice:
- Stripe: Explicit API version in every request header. Pin a version, get consistent responses forever.
- Most APIs: Unversioned endpoints that change without notice. Your agent's response parser breaks silently when `email_address` becomes `emailAddress`.
The insidious part: breaking changes often affect edge cases first. Your happy-path tests pass. Your agent hits the edge case at 3am.
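Two defenses, sketched together below: pin the version header where the API supports one (Stripe's `Stripe-Version` header is real and documented; the specific date string is an example), and for unversioned APIs, check responses against the fields your parser expects so a silent rename becomes a loud alert. The drift check is an illustrative approach, not Rhumb's actual monitoring.

```python
# Version pinning plus a simple schema-drift check for unversioned APIs.
PINNED_VERSIONS = {
    "stripe": ("Stripe-Version", "2023-10-16"),  # example pin date
}

def pin_version(service: str, headers: dict) -> None:
    """Add the pinned version header if this service supports versioning."""
    if service in PINNED_VERSIONS:
        name, value = PINNED_VERSIONS[service]
        headers[name] = value

def check_schema_drift(response: dict, expected_fields: set[str]) -> set[str]:
    """Return expected fields missing from a response (possible silent rename)."""
    return expected_fields - response.keys()

missing = check_schema_drift({"emailAddress": "a@b.c"}, {"email_address"})
print(missing)  # -> {'email_address'}: the field was renamed upstream
```

Running the drift check on a sample of live responses catches the rename before the 3am edge case does.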
What we learned
After building through all of this, we distilled the problems into a scoring framework. Every API gets evaluated on 20 dimensions:
Execution (70%): Error handling, schema stability, idempotency, latency, rate limit transparency.
Access Readiness (30%): Signup friction, auth complexity, docs quality, sandbox, rate limits.
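The 70/30 split above can be sketched as a straightforward weighted mean of bucket scores. The dimension names and sample values below are illustrative, not Rhumb's actual rubric weights.

```python
# Two-bucket weighted score: 70% Execution, 30% Access Readiness.
WEIGHTS = {"execution": 0.7, "access_readiness": 0.3}

def api_score(dimension_scores: dict[str, dict[str, float]]) -> float:
    """Each bucket is the mean of its 0-10 dimension scores, then weighted."""
    total = 0.0
    for bucket, weight in WEIGHTS.items():
        scores = list(dimension_scores[bucket].values())
        total += weight * (sum(scores) / len(scores))
    return round(total, 1)

score = api_score({
    "execution": {"error_handling": 9, "schema_stability": 8, "idempotency": 8},
    "access_readiness": {"signup_friction": 7, "sandbox": 8},
})
print(score)  # -> 8.1
```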
Some results that surprised us:
- Stripe (8.1) and Twilio (8.0) are genuinely built for automation
- GitHub (7.8) is excellent but has sneaky secondary rate limits
- Resend (7.8) — newer email API that got the details right from day one
- SendGrid (6.4) — dominant but showing age in error handling
- PayPal (4.9) — the CAPTCHA sandbox alone is disqualifying for autonomous use
- Salesforce (4.8) — powerful but the OAuth dance is hostile to agents
The full leaderboard across 86 categories is at rhumb.dev/leaderboard.
If you're building an MCP server
- Treat service identity as a first-class problem. You will hit naming collisions.
- Build an auth resolution layer. Don't make the agent know which header format each API uses.
- Expect payload translation. What the agent sends and what the API wants are rarely the same shape.
- Parse errors defensively. Most APIs don't return structured errors.
- Implement rate limit tracking per-provider. Don't share a single backoff strategy.
- Test against production, not just sandboxes. Many sandboxes are incomplete.
- Pin API versions where possible. If the API doesn't support versioning, monitor for breaking changes.
The MCP protocol gives you a great transport layer. It tells you nothing about what happens when your tools hit real APIs. That part is on you.
Rhumb scores 645+ APIs across 86 categories on 20 dimensions. Methodology at rhumb.dev/methodology. MCP server is open source: github.com/supertrained/rhumb.