Every MCP server tutorial follows the same script: install the SDK, define a tool, return a response. Ship it.
That works for a single API. It does not work when you need an agent to reliably choose between, authenticate to, and call 645+ different APIs across 86 categories — and handle everything that goes wrong at 3am with no human.
We built Rhumb, an MCP server that proxies hundreds of real APIs for AI agents. Here's what actually broke, why the tutorials don't cover it, and what you need to know if you're building anything beyond hello-world.
Bug #1: The slug aliasing problem
First surprise: APIs don't have stable identifiers.
Brave's search API appears as both brave-search-api and brave-search depending on which documentation page you read. When an agent asks to "search with Brave," your MCP server needs to know these are the same service.
This isn't unique to Brave. We found alias collisions in payment providers (same company, multiple API versions with different names), communication platforms (SMS vs messaging vs voice — same provider, different "APIs"), and analytics tools (legacy vs v2 naming).
The fix isn't a lookup table. It's a canonical slug system with alias resolution that treats identity as a first-class problem.
Why tutorials skip this: They show one API. You never hit naming collisions with one API.
Bug #2: Authentication is not a solved problem
The tutorials say: "Add your API key to the header." That covers maybe 40% of real APIs.
What we actually encountered across providers:
- Bearer token (`Authorization: Bearer {key}`) — ~45% of services
- Custom header (`X-API-Key`, `X-Subscription-Token`, `Api-Key`) — ~25%
- Basic Auth (base64-encoded credentials) — ~15%
- OAuth2 with token refresh — ~10%
- Query parameter (`?api_key=...`) — ~5%
The problem isn't supporting all five patterns. It's that your MCP server needs to know which pattern each API uses before the agent's first call. If the agent sends a Bearer token to an API expecting X-API-Key, you get a 401 that tells the agent nothing useful.
Worse: some APIs accept the wrong auth method silently and return empty results instead of errors.
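An auth-resolution layer can be a small dispatch over per-service config. A sketch under stated assumptions: the `AUTH_CONFIGS` entries are hypothetical examples (not real provider configs), and OAuth2 token refresh is omitted for brevity since it needs a token store.

```python
# Illustrative auth-resolution layer covering four of the five patterns above.
import base64

AUTH_CONFIGS = {
    "example-bearer": {"scheme": "bearer"},
    "example-header": {"scheme": "header", "header": "X-API-Key"},
    "example-basic":  {"scheme": "basic"},
    "example-query":  {"scheme": "query", "param": "api_key"},
}

def apply_auth(service: str, headers: dict, params: dict, credential: str) -> None:
    """Mutate headers/params with the auth pattern this service expects."""
    cfg = AUTH_CONFIGS[service]
    scheme = cfg["scheme"]
    if scheme == "bearer":
        headers["Authorization"] = f"Bearer {credential}"
    elif scheme == "header":
        headers[cfg["header"]] = credential
    elif scheme == "basic":
        # credential is "user:password" for Basic Auth
        encoded = base64.b64encode(credential.encode()).decode()
        headers["Authorization"] = f"Basic {encoded}"
    elif scheme == "query":
        params[cfg["param"]] = credential
    else:
        raise ValueError(f"unsupported auth scheme: {scheme}")
```

The point is that the agent never chooses the pattern: the server resolves it from config before the first call, which is exactly when a wrong guess would otherwise produce an unhelpful 401.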
Bug #3: The payload translation trap
Your agent constructs a JSON payload. The API expects multipart form data.
This hits hardest with document processing APIs. An agent wants to send a file for parsing. It constructs a reasonable JSON body with the file content. The API returns 400 because it only accepts multipart uploads with specific field names.
The gap between "what the agent naturally produces" and "what the API actually accepts" is wider than anyone admits:
- Parameter naming: `query` vs `q` vs `search_query` vs `prompt`
- Body format: JSON vs form-encoded vs multipart
- Array handling: `tags=a,b,c` vs `tags[]=a&tags[]=b` vs `{"tags": ["a","b"]}`
- Date formats: ISO 8601 vs Unix timestamps vs custom strings
- Pagination: cursor vs offset vs page-number vs link-header
Why this matters for agents: A human developer reads the docs and adapts. An agent will retry the same malformed request until it hits rate limits.
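A translation layer can sit between the agent's natural payload and each API's expected shape. A minimal sketch, assuming a per-API `spec` format that is purely illustrative (the rename map and array-style names are not from any real catalog):

```python
# Hypothetical translator from an agent's "natural" payload to the shape
# one specific API accepts.

def translate_payload(payload: dict, spec: dict) -> dict:
    """Rename parameters and re-encode arrays per the target API's spec."""
    out = {}
    for key, value in payload.items():
        # Map the agent's parameter name to the API's name.
        target = spec.get("rename", {}).get(key, key)
        if isinstance(value, list):
            style = spec.get("array_style", "json")
            if style == "csv":          # tags=a,b,c
                value = ",".join(map(str, value))
            elif style == "brackets":   # tags[]=a&tags[]=b (repeated keys)
                target = f"{target}[]"
        out[target] = value
    return out

spec = {"rename": {"query": "q"}, "array_style": "csv"}
print(translate_payload({"query": "mcp", "tags": ["a", "b"]}, spec))
# -> {'q': 'mcp', 'tags': 'a,b'}
```

A real implementation would also handle body format (JSON vs form-encoded vs multipart) and date coercion, but the structure is the same: the spec lives with the catalog entry, not in the agent's head.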
Bug #4: Error messages that lie
Here's a real error response from a production API:
```json
{"error": "An error occurred. Please try again later."}
```
An agent receiving this will try again later. The actual problem? Invalid API key format. Retrying will never help.
The quality gap in error responses across 645+ APIs is staggering:
Good (Stripe-class):
```json
{
  "error": {
    "type": "invalid_request_error",
    "code": "parameter_missing",
    "param": "amount",
    "message": "Missing required param: amount"
  }
}
```
Bad (more common than you'd think):
```json
{"status": "error", "message": "Bad Request"}
```
In our scoring of 645+ APIs, structured error responses are a minority. Most APIs return human-readable error strings that agents can't reliably parse.
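Defensive parsing means deciding retryability from whatever signal exists: a machine-readable error type when the API provides one, message heuristics and status codes when it doesn't. A sketch; the field names and hint strings are examples of patterns seen in the wild, not an exhaustive list.

```python
# Defensive error classification: is this failure worth retrying?
RETRYABLE_STATUSES = {408, 429, 500, 502, 503, 504}
NON_RETRYABLE_HINTS = ("invalid api key", "unauthorized", "missing required")

def classify_error(status: int, body: dict) -> str:
    err = body.get("error")
    # Structured (Stripe-class): trust the machine-readable type.
    if isinstance(err, dict) and "type" in err:
        return "retry" if err["type"] in ("rate_limit_error", "api_error") else "fail"
    # Unstructured: scan the human-readable message for known hints
    # before falling back to the status code.
    message = str(err or body.get("message", "")).lower()
    if any(hint in message for hint in NON_RETRYABLE_HINTS):
        return "fail"
    return "retry" if status in RETRYABLE_STATUSES else "fail"
```

This still can't rescue the lying error above — nothing can, which is why the hint list has to grow from observed failures per provider rather than being written once.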
Bug #5: Rate limits without information
Good APIs tell you exactly where you stand:
```http
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 247
X-RateLimit-Reset: 1616000000
Retry-After: 30
```
Bad APIs return 429 and nothing else.
Some APIs have undocumented secondary rate limits. GitHub's REST API has a primary rate limit (5,000/hour) and a secondary rate limit on "content-creating" endpoints that's lower and not reflected in headers. An agent creating issues or comments will hit the secondary limit and get a 403 with a message about "secondary rate limits" that doesn't appear in any getting-started guide.
Real numbers from our data:
- Stripe (AN 8.1): Rate limit headers, Retry-After, burst limit documented
- GitHub (AN 7.8): Primary headers yes, secondary limits documented but not in headers
- PayPal (AN 4.9): Headers inconsistent, sandbox limits undocumented
Bug #6: The sandbox illusion
"Just use the sandbox." Every API says this. Few deliver.
Real sandbox problems we encountered:
- Sandbox requires production credentials — defeats the purpose
- Sandbox has different behavior — you test against a lie
- Sandbox has stricter rate limits — can't performance test
- Sandbox doesn't support all endpoints — partial testing only
- Sandbox requires CAPTCHA to create — agents can't self-provision
PayPal's sandbox requires CAPTCHA verification to create accounts. That one detail drops it from "agent-friendly" to "requires a human for setup."
Bug #7: The versioning time bomb
APIs change. Versioning is supposed to protect you.
In practice:
- Stripe: Explicit API version in every request header. Pin a version, get consistent responses forever.
- Most APIs: Unversioned endpoints that change without notice. Your agent's response parser breaks silently when `email_address` becomes `emailAddress`.
The insidious part: breaking changes often affect edge cases first. Your happy-path tests pass. Your agent hits the edge case at 3am.
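Two defenses, sketched together below: pin the version header where the API supports one (Stripe's `Stripe-Version` header is real and documented; the specific date string is an example), and for unversioned APIs, check responses against the fields your parser expects so a silent rename becomes a loud alert. The drift check is an illustrative approach, not Rhumb's actual monitoring.

```python
# Version pinning plus a simple schema-drift check for unversioned APIs.
PINNED_VERSIONS = {
    "stripe": ("Stripe-Version", "2023-10-16"),  # example pin date
}

def pin_version(service: str, headers: dict) -> None:
    """Add the pinned version header if this service supports versioning."""
    if service in PINNED_VERSIONS:
        name, value = PINNED_VERSIONS[service]
        headers[name] = value

def check_schema_drift(response: dict, expected_fields: set[str]) -> set[str]:
    """Return expected fields missing from a response (possible silent rename)."""
    return expected_fields - response.keys()

missing = check_schema_drift({"emailAddress": "a@b.c"}, {"email_address"})
print(missing)  # -> {'email_address'}: the field was renamed upstream
```

Running the drift check on a sample of live responses catches the rename before the 3am edge case does.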
What we learned
After building through all of this, we distilled the problems into a scoring framework. Every API gets evaluated on 20 dimensions:
Execution (70%): Error handling, schema stability, idempotency, latency, rate limit transparency.
Access Readiness (30%): Signup friction, auth complexity, docs quality, sandbox, rate limits.
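The 70/30 split above can be sketched as a straightforward weighted mean of bucket scores. The dimension names and sample values below are illustrative, not Rhumb's actual rubric weights.

```python
# Two-bucket weighted score: 70% Execution, 30% Access Readiness.
WEIGHTS = {"execution": 0.7, "access_readiness": 0.3}

def api_score(dimension_scores: dict[str, dict[str, float]]) -> float:
    """Each bucket is the mean of its 0-10 dimension scores, then weighted."""
    total = 0.0
    for bucket, weight in WEIGHTS.items():
        scores = list(dimension_scores[bucket].values())
        total += weight * (sum(scores) / len(scores))
    return round(total, 1)

score = api_score({
    "execution": {"error_handling": 9, "schema_stability": 8, "idempotency": 8},
    "access_readiness": {"signup_friction": 7, "sandbox": 8},
})
print(score)  # -> 8.1
```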
Some results that surprised us:
- Stripe (8.1) and Twilio (8.0) are genuinely built for automation
- GitHub (7.8) is excellent but has sneaky secondary rate limits
- Resend (7.8) — newer email API that got the details right from day one
- SendGrid (6.4) — dominant but showing age in error handling
- PayPal (4.9) — the CAPTCHA sandbox alone is disqualifying for autonomous use
- Salesforce (4.8) — powerful but the OAuth dance is hostile to agents
The full leaderboard across 86 categories is at rhumb.dev/leaderboard.
If you're building an MCP server
- Treat service identity as a first-class problem. You will hit naming collisions.
- Build an auth resolution layer. Don't make the agent know which header format each API uses.
- Expect payload translation. What the agent sends and what the API wants are rarely the same shape.
- Parse errors defensively. Most APIs don't return structured errors.
- Implement rate limit tracking per-provider. Don't share a single backoff strategy.
- Test against production, not just sandboxes. Many sandboxes are incomplete.
- Pin API versions where possible. If the API doesn't support versioning, monitor for breaking changes.
The MCP protocol gives you a great transport layer. It tells you nothing about what happens when your tools hit real APIs. That part is on you.
Rhumb scores 645+ APIs across 86 categories on 20 dimensions. Methodology at rhumb.dev/methodology. MCP server is open source: github.com/supertrained/rhumb.