Sahajmeet Kaur

Posted on Jul 1

MCP Authentication: How We Secured 12 MCP Servers Without Losing Our Minds

#ai #security #mcp #devops

TL;DR: MCP authentication is genuinely more complex than regular API auth because you're managing credentials across many servers, for many agents, under many user identities - often all at once. The approaches range from static API keys (fast, insecure at scale) to OAuth 2.1 with PKCE (spec-compliant, more setup) to a centralized gateway that handles all downstream auth for you. We went through all three stages. This post covers what we learned.

Eight months ago our MCP auth story was: shared API key in a .env file, every developer had access to everything, fingers crossed nothing bad happened.

Two near-misses later - one agent that almost deleted production data via a misconfigured write tool, one contractor whose MCP access wasn't revoked after they left - we got serious about it.

Here's the setup we landed on, what each part solved, and what it cost in setup time.

Why MCP auth is harder than regular API auth

Regular API auth has one credential relationship: your application authenticates to a service. MCP auth has three:

Client → Gateway/Server: how your agent proves its identity to the MCP infrastructure
Gateway → Downstream service: how the MCP server authenticates to GitHub, Jira, Slack, or whatever backend it wraps
User delegation: when an agent acts on behalf of a specific human (post to Slack as a user, not as a bot), how that user's identity flows through the call chain

Managing all three manually, per server, per developer, per agent is where the complexity explodes. Most MCP auth problems are coordination problems, not cryptography problems.

The authentication methods, in order of complexity

Static API keys / Bearer tokens

The simplest option. The MCP server expects a static Bearer token in the Authorization header. You set it once in the server config and again in the client config. Done.

{
  "mcpServers": {
    "my-server": {
      "url": "https://my-mcp-server.internal/mcp",
      "headers": {
        "Authorization": "Bearer your-static-token-here"
      }
    }
  }
}

Where it works: Internal servers with a single operator, dev environments, quick prototypes.

Where it breaks: Rotation. Every time the token changes, every client that uses it needs updating. With 15 developers and 8 servers, token rotation becomes a coordination nightmare. And if the token is in a .env file in a repo, it's eventually in git history.

The real risk: Static tokens have no user identity attached. When an agent calls a tool using a static token, there's no way to know which developer or workflow triggered it. Audit trails become "token X called tool Y at time Z" - useless for compliance or incident response.
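To make the identity gap concrete, here is a minimal sketch of what a static-token check looks like on the server side. The `EXPECTED_TOKEN` value and the `is_authorized` helper are illustrative assumptions, not part of any MCP SDK.

```python
import hmac
from typing import Optional

# Hypothetical value for illustration; in practice this comes from a secret store.
EXPECTED_TOKEN = "your-static-token-here"


def is_authorized(authorization_header: Optional[str]) -> bool:
    """Return True if the header carries the expected static Bearer token."""
    if not authorization_header:
        return False
    scheme, _, token = authorization_header.partition(" ")
    if scheme.lower() != "bearer":
        return False
    # Constant-time comparison avoids leaking the token through timing differences.
    return hmac.compare_digest(token.encode(), EXPECTED_TOKEN.encode())
```

The function can only answer yes or no. Nothing in the request says who the caller is, which is why the audit trail stops at "token X called tool Y".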

Environment variables for stdio servers

Stdio-based MCP servers run as local processes. Auth happens outside the MCP protocol — you pass credentials as environment variables that the server reads at startup.

{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": {
        "GITHUB_PERSONAL_ACCESS_TOKEN": "${env:GITHUB_TOKEN}"
      }
    }
  }
}

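The ${env:VAR} interpolation shown here is the form Cursor uses; Claude Code uses plain ${VAR} in its config, so check your client's docs for the exact syntax. Either way, the value is pulled from your shell environment rather than hardcoded in the config file, which keeps credentials out of version control.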

Where it works: Local development with stdio servers where each developer authenticates with their own credentials.

Where it breaks: It doesn't scale. Each developer manages their own credentials per server. There's no centralized revocation. When a developer leaves, you're hoping they cleared their local environment (they often haven't).
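On the server side, reading the credential at startup might look like the sketch below. It assumes a Python stdio server; `load_github_token` is a hypothetical helper, and only the `GITHUB_PERSONAL_ACCESS_TOKEN` variable name comes from the config above.

```python
import os
from typing import Mapping


def load_github_token(environ: Mapping[str, str] = os.environ) -> str:
    """Read the token once at startup and fail fast if it is missing."""
    token = environ.get("GITHUB_PERSONAL_ACCESS_TOKEN", "").strip()
    # An unexpanded placeholder such as "${env:GITHUB_TOKEN}" means the client
    # did not substitute the variable, so treat it as missing.
    if not token or token.startswith("${"):
        raise RuntimeError("GITHUB_PERSONAL_ACCESS_TOKEN is not set")
    return token
```

Failing at startup is a deliberate choice here: a missing credential surfaces immediately instead of on the first tool call.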

OAuth 2.1 with PKCE for remote servers

The MCP spec standardized OAuth 2.1 with PKCE in its March 2025 revision. This is the correct long-term answer for remote MCP servers because it ties tool calls to real user identities through your existing identity provider.

The flow:

Agent initiates an MCP connection
Server redirects to your IdP (Okta, Azure AD, Auth0)
User authenticates in the browser
IdP issues an authorization code
Client exchanges the code for an access token (PKCE binds the exchange to the client that started the flow, so an intercepted code can't be redeemed on its own)
Token is attached to all subsequent MCP calls
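The PKCE part of the flow above is small enough to sketch. This assumes the S256 challenge method from RFC 7636; the function names are illustrative, and a real client would normally rely on its OAuth library rather than hand-rolling this.

```python
import base64
import hashlib
import secrets


def challenge_for(verifier: str) -> str:
    """Derive the S256 code challenge: base64url(SHA-256(verifier)), no padding."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def make_pkce_pair() -> tuple:
    """Return (code_verifier, code_challenge) for one authorization request."""
    # 64 random bytes encode to 86 URL-safe characters, within the 43-128 limit.
    verifier = secrets.token_urlsafe(64)
    return verifier, challenge_for(verifier)
```

The client sends the challenge with the authorization request and the verifier with the token exchange. Someone who intercepts only the authorization code has neither the verifier nor a way to derive it from the challenge.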

What this gives you: Tool calls are tied to the authenticated user, not a shared service credential. Tokens expire and auto-refresh. Revoking a user's access in your IdP cuts off their MCP access too, as soon as their current access token expires or is revoked.

What this costs: More setup. Your MCP server needs to implement the OAuth resource server side. Your client needs to handle the browser redirect flow. Not all MCP clients have fully implemented the November 2025 spec revision yet - check your specific client's OAuth support before depending on it.

PATs and VATs for service accounts

Not every caller can complete a browser redirect. For those cases the pattern is token-based: Personal Access Tokens (PATs) for individual users and Virtual Account Tokens (VATs) for service accounts, which is what production agents running without a human in the loop use.

PAT: bound to a specific user's identity, appropriate for development workflows and user-delegated actions
VAT: a service account credential with defined permissions, appropriate for automated agents running in production without a human user attached

The distinction matters for audit trails: PAT calls show as coming from the specific developer; VAT calls show as coming from the named service account.
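A minimal sketch of how that distinction could surface in code. The token strings, the `TOKENS` table, and the `Principal` type are all hypothetical; a real gateway would look tokens up in its own store, ideally by hash.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Principal:
    kind: str  # "user" for a PAT, "service" for a VAT
    name: str

    @property
    def audit_actor(self) -> str:
        """Label written to the audit trail for this caller."""
        return f"{self.kind}:{self.name}"


# Hypothetical token table for illustration only.
TOKENS = {
    "pat_alice_example": Principal("user", "alice"),
    "vat_triage_example": Principal("service", "nightly-triage-agent"),
}


def resolve(token: str) -> Principal:
    """Map a presented token to the identity it was issued for."""
    try:
        return TOKENS[token]
    except KeyError:
        raise PermissionError("unknown or revoked token") from None
```

With this shape, revoking access is a matter of deleting one table entry, and every log line carries either a named developer or a named service account.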

The setup we landed on

After the two near-misses, we didn't want to manage individual OAuth flows per server per developer. The maintenance surface was too large. What we implemented instead was a centralized gateway that handles all downstream auth for us.

How it works:

One token per developer, one token per service account. Developers authenticate to the gateway with a single PAT. Service agents authenticate with a VAT. The gateway manages every downstream credential (GitHub OAuth tokens, Jira API keys, Confluence tokens, internal API service accounts) and auto-refreshes them before they expire.
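The refresh-before-expiry behaviour can be sketched as a small cache. This is an illustration of the general technique, not the gateway's actual implementation; `DownstreamTokenCache`, the `fetch` callback, and the 60-second skew are assumptions.

```python
import time
from typing import Callable, Dict, Tuple


class DownstreamTokenCache:
    """Cache one token per downstream service and refresh it shortly before expiry."""

    def __init__(
        self,
        fetch: Callable[[str], Tuple[str, float]],  # returns (token, expires_at)
        skew_seconds: float = 60.0,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._fetch = fetch
        self._skew = skew_seconds
        self._clock = clock
        self._tokens: Dict[str, Tuple[str, float]] = {}

    def get(self, service: str) -> str:
        cached = self._tokens.get(service)
        # Refresh when nothing is cached or the token is within the skew window.
        if cached is None or cached[1] - self._clock() <= self._skew:
            cached = self._fetch(service)
            self._tokens[service] = cached
        return cached[0]
```

Callers only ever ask for `cache.get("github")`. When and how the token is renewed stays inside the gateway, which is what lets agents avoid handling downstream credentials at all.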

RBAC at the tool level. We can say: "team type A can call search_issues and create_issue in Jira but not delete_issue." Defined per-server, per-tool, per-role in the gateway config. The agent never sees tools it isn't authorized to call - tools/list returns a filtered set based on the caller's identity.
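As a rough illustration of tool-level RBAC, the sketch below filters a tool list by role and re-checks on every call. The `POLICY` structure and function names are hypothetical and do not reflect any particular gateway's config format; only the Jira tool names come from the example above.

```python
from typing import Dict, List, Set

# Hypothetical policy: role -> server -> allowed tool names.
POLICY: Dict[str, Dict[str, Set[str]]] = {
    "team-a": {"jira": {"search_issues", "create_issue"}},
}


def allowed_tools(role: str, server: str) -> Set[str]:
    """Tools the role may call on the server; empty if no policy matches."""
    return POLICY.get(role, {}).get(server, set())


def visible_tools(role: str, server: str, all_tools: List[str]) -> List[str]:
    """Filter a tools/list response down to what the caller may see."""
    permitted = allowed_tools(role, server)
    return [tool for tool in all_tools if tool in permitted]


def authorize_call(role: str, server: str, tool: str) -> bool:
    """Re-check at call time; hiding a tool from the list is not enough."""
    return tool in allowed_tools(role, server)
```

The default is deny: a role or server with no policy entry sees nothing. Checking again at call time matters because a client could still send a call for a tool name it learned elsewhere.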

OAuth 2.0 for user-delegated actions. For tool calls that should act on behalf of a real user - posting to Slack as a specific person, creating a Jira ticket attributed to the right engineer - we use OAuth 2.0 with our Okta setup. The gateway handles token exchange and refresh. Agents don't manage OAuth flows directly.

Audit log for every call. Every tool invocation logged: which agent, which user identity, which tool, what parameters, what response, timestamp. This was non-negotiable for our security team and it's also genuinely useful for debugging production agent failures.
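One possible shape for such a record, as a sketch. The field names are assumptions, and the redaction of secret-looking parameters is an addition for illustration rather than something described above.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, Optional

# Assumed list of parameter names that should never be written to the log.
REDACT_KEYS = {"token", "password", "authorization", "api_key"}


def audit_record(
    agent: str,
    user: str,
    tool: str,
    params: Dict[str, Any],
    response_summary: str,
    when: Optional[datetime] = None,
) -> str:
    """Serialize one tool invocation as a single JSON log line."""
    when = when or datetime.now(timezone.utc)
    safe_params = {
        key: ("[REDACTED]" if key.lower() in REDACT_KEYS else value)
        for key, value in params.items()
    }
    record = {
        "timestamp": when.isoformat(),
        "agent": agent,
        "user": user,
        "tool": tool,
        "params": safe_params,
        "response": response_summary,
    }
    # default=str keeps the logger from failing on non-JSON values.
    return json.dumps(record, sort_keys=True, default=str)
```

One JSON object per line keeps the log easy to search during an incident, for example filtering on `user` and `tool` to reconstruct what an agent did.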

We implemented this using TrueFoundry's MCP Gateway. The Okta integration took about a day to configure. The RBAC setup took roughly a day per server to define policies properly. The time investment paid back in the first month - we had one offboarding event and MCP access was fully revoked in a single dashboard action rather than hunting down six separate credentials.

The CVE worth knowing about

In early 2026, JFrog Security Research disclosed a vulnerability in a popular MCP OAuth implementation (versions before v0.1.16) where the package forwarded OAuth authorization endpoint URLs to system handlers without sanitization. A malicious MCP server could construct a URL that executed arbitrary OS commands on the developer's machine.

The fix shipped in v0.1.16. But the broader lesson: OAuth flows in MCP clients are relatively new and the spec is still settling (the November 2025 revision introduced Client ID Metadata Documents as the preferred registration method, replacing Dynamic Client Registration in most cases). Check your MCP client's patch level and the spec compliance version it's implementing before depending on OAuth for production workloads.
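The class of bug is easy to guard against in principle. The sketch below shows the kind of check a client could run on a server-supplied authorization URL before handing it to the browser; it is an illustration of the idea, not the actual patch, and `is_safe_authorization_url` is a hypothetical helper.

```python
from urllib.parse import urlsplit


def is_safe_authorization_url(url: str) -> bool:
    """Accept only plain https URLs with a host before opening a browser."""
    # Reject whitespace and control characters outright.
    if any(ch.isspace() or ord(ch) < 0x20 for ch in url):
        return False
    try:
        parts = urlsplit(url)
    except ValueError:
        return False
    # Anything other than https (file:, custom schemes, shell-like strings) is refused.
    if parts.scheme != "https":
        return False
    return bool(parts.hostname)
```

An allowlist on the scheme is the important part: the URL comes from the remote server, so it should be treated as untrusted input rather than passed straight to an OS-level "open" handler.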

What's your current MCP auth setup, and what forced you to take it seriously? The "near-miss with a write tool" pattern seems to be a common forcing function - interested to hear what others have hit. Drop it in the comments.

Top comments (2)

Alex Shev • Jul 1

Tool-level RBAC is the part I would not skip. A single MCP server can expose harmless reads and dangerous writes under the same umbrella, so auth needs to understand the operation, not just the server. Audit trails per call also make incident review much less theatrical.