The runnable companion to my AgentCon HK 2026 talk, "Empower Team-Wide Vibe Coding with LLM Gateway and Security-First MCPs." The talk argued per-user OAuth is what turns a shared god-token into safe, auditable agent access. This is the wiring — explained at the architecture level. The full template.yaml, Lambda, and scripts live in the public repo linked at the end.
The gap nobody fills: your identity in front of a shared-key API
By 2026 most big SaaS vendors have shipped an official MCP server, and many of those servers support OAuth. So "my tool has no MCP" and "my tool has no OAuth" are both fading problems. Chasing the vendors that still lack one is a losing game, because that list shrinks every week.
Tavily — the search API I use here — is actually a good citizen: its remote MCP supports both a shared key in the URL (https://mcp.tavily.com/mcp/?tavilyApiKey=<KEY>) and an OAuth flow. So why wrap it at all? Because even a vendor's OAuth answers a different question than the one a security team is asking:
- Shared keys are the default, and cost is why. Plenty of these tools price per seat. So the cheapest integration — the one teams actually ship — is one account's API key wired into a single shared MCP for the whole group. The instant that key is shared, per-user identity is gone: every call is the same caller, and "who ran this?" has no answer. Vendor OAuth only helps if everyone pays for their own seat, which is exactly the bill teams are dodging.
- Claimed ≠ enforced. Even when a tool does attribute per user, it often rides on a value the client supplies; Tavily's X-Human-Id header is exactly this. The client asserts it, so nothing stops a caller from sending someone else's: the attribution is a courtesy, not a control. Trustworthy attribution has to come from a token your own gateway cryptographically verified, not a string the client typed.
And here's the part that doesn't shrink at all: the upstreams that matter most to you will never ship OAuth. Your internal billing API, the legacy claims system nobody wants to touch, the compliance-mandated third-party tool with a single team API key. For every one of those, the only auth is a shared key — shared by nature. And it's not only internal systems: plenty of public vendors are the same shape — Brave Search's own official MCP server, for instance, authenticates with a single BRAVE_API_KEY and no OAuth at all. One key, the whole team behind it.
That's the real, durable gap: no front door that authenticates callers as your identity, scopes them, and audits them — before handing off to a shared-key upstream. That's what this post builds. Tavily is just a free, public stand-in you can actually run; mentally swap it for your own shared-key API.
💡 The one idea: put a real OAuth 2.0 / OIDC authorization server bound to your identity (Amazon Cognito) — running the PKCE and client-credentials flows the MCP spec expects — in front of a shared-key upstream, in a single Lambda. Each caller gets their own scoped, cryptographically-verified, audited identity. The shared key never leaves the server.
Want the 30-second version first? Walk through the interactive demo — click through the whole OAuth flow, the scoped tool call, and the 403 when a caller reaches past its scope.
The architecture
Two pictures tell the whole story. The shared-key path — one key for everyone, whether it sits in a URL or a config file:
With the wrapper, each caller arrives as themselves — verified against your own identity provider — and the key is locked away server-side:
Three managed AWS pieces do the work, each with one clear job:
Amazon Cognito — the bouncer who issues the wristbands. It's the OAuth 2.0 / OIDC authorization server (with PKCE — the piece OAuth 2.1 and the MCP spec lean on). A caller proves who they are to Amazon Cognito, which hands back a short-lived signed token stamped with a scope (here, tavily-mcp/search). It handles both kinds of caller from the same pool: a backend agent authenticates machine-to-machine, a human logs in through a hosted login page with PKCE. Crucially, Amazon Cognito is your identity layer — swapping the demo's username/password for real corporate SSO (SAML, OIDC, or social logins like Google, Microsoft, Apple) is a configuration change, not a rebuild. The tokens, scopes, and everything downstream stay identical.
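For the machine caller, "proves who they are" is a single form POST to the user pool's token endpoint. Below is a minimal standard-library sketch of that request, assuming a confidential app client with a secret and the tavily-mcp/search scope from this post. The domain, client ID, and secret are placeholders, not values from the repo:

```python
import base64
import json
import urllib.parse
import urllib.request


def build_token_request(domain: str, client_id: str, client_secret: str,
                        scope: str) -> urllib.request.Request:
    """Build (but don't send) a client_credentials request to Cognito's /oauth2/token."""
    # Confidential clients authenticate with HTTP Basic: base64("client_id:client_secret").
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    body = urllib.parse.urlencode(
        {"grant_type": "client_credentials", "scope": scope}).encode()
    return urllib.request.Request(
        url=f"{domain}/oauth2/token",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


def fetch_access_token(req: urllib.request.Request) -> str:
    """Send the request; the JSON reply carries access_token, expires_in, token_type."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["access_token"]
```

Usage would look like fetch_access_token(build_token_request("https://<your-prefix>.auth.<region>.amazoncognito.com", client_id, client_secret, "tavily-mcp/search")). The human flow differs in how it reaches the token (browser login plus PKCE), not in the scope the token ends up carrying.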
API Gateway HTTP API + its JWT authorizer — the wristband scanner at the door. Every request to the MCP endpoint — a plain JSON-RPC call over HTTP POST (the Streamable HTTP transport; this wrapper uses request/response POSTs only, not the transport's optional SSE streaming channel) — hits API Gateway first. Its built-in JWT authorizer — a native feature of the HTTP API (v2) flavour; the older REST API would need a Cognito or custom Lambda authorizer instead — checks the token's signature, issuer, and expiry at the edge, before a single line of your code runs. No token, expired token, forged token: rejected with a 401 right there. This is pure authentication: are you who the token says you are? Nothing reaches your logic until that passes. (One footgun worth flagging: Cognito access tokens carry no aud claim, so the authorizer's audience must be set to the Cognito app client ID — a mismatch here is the most common source of silent 401s in this stack.)
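The audience rule is easier to see in code. Below is a plain-Python sketch of a subset of the claim checks the authorizer performs (iss, exp, and aud with its client_id fallback). It assumes the signature has already been verified against the user pool's JWKS, which is omitted here. This illustrates the documented behaviour; it is not API Gateway's implementation:

```python
import time
from typing import List, Optional


def claims_pass_authorizer(claims: dict, issuer: str, audiences: List[str],
                           now: Optional[float] = None) -> bool:
    """Mirror some of the claim checks API Gateway's JWT authorizer performs.

    Assumes the token signature was already verified against the issuer's JWKS.
    """
    now = time.time() if now is None else now
    if claims.get("iss") != issuer:          # token from a different user pool
        return False
    if float(claims.get("exp", 0)) <= now:   # expired
        return False
    aud = claims.get("aud")
    if aud is not None:                      # e.g. ID tokens, most non-Cognito JWTs
        values = aud if isinstance(aud, list) else [aud]
        return any(v in audiences for v in values)
    # Cognito access tokens: no `aud`, so `client_id` must be a configured audience.
    return claims.get("client_id") in audiences
```

For a Cognito pool the issuer is https://cognito-idp.<region>.amazonaws.com/<user-pool-id>. An authorizer whose audience list lacks the app client ID fails on that last line for every caller, which is the silent 401.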
Lambda — the room you're actually allowed into. Once the token is valid, the Lambda does the authorization: it re-reads the scope to confirm this identity may call this specific tool, writes an audit line tying the action to that caller, then — and only then — reaches into Secrets Manager for the shared upstream key and calls Tavily. The key lives only inside this function's execution role; it is never sent back to the client.
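Here is a minimal sketch of that ordering as a Python Lambda handler behind an HTTP API (payload format 2.0), where the verified claims arrive in requestContext.authorizer.jwt.claims. The tool name search, the TOOL_SCOPES map, and the injected fetch_key / call_upstream hooks are hypothetical stand-ins for the repo's code. The point is the sequence: scope check, audit line, then the secret.

```python
import json
from typing import Callable, Dict

# Hypothetical tool-to-scope map; this post only names the tavily-mcp/search scope.
TOOL_SCOPES: Dict[str, str] = {"search": "tavily-mcp/search"}


def make_handler(fetch_key: Callable[[], str],
                 call_upstream: Callable[[str, dict], dict]):
    """Return a Lambda handler that authorizes, audits, then touches the secret."""

    def handler(event: dict, context=None) -> dict:
        # Claims the JWT authorizer already verified (HTTP API payload format 2.0).
        claims = event["requestContext"]["authorizer"]["jwt"]["claims"]
        granted = set(claims.get("scope", "").split())
        caller = claims.get("username") or claims.get("client_id", "unknown")
        rpc = json.loads(event.get("body") or "{}")
        params = rpc.get("params") or {}
        tool = params.get("name")
        needed = TOOL_SCOPES.get(tool)
        if needed is None or needed not in granted:
            # Denials are audited too; the secret is never read on this path.
            print(json.dumps({"audit": "denied", "caller": caller, "tool": tool}))
            return {"statusCode": 403,
                    "body": json.dumps({"error": "insufficient_scope"})}
        print(json.dumps({"audit": "allowed", "caller": caller, "tool": tool}))
        key = fetch_key()  # read the shared key only after authorization passes
        result = call_upstream(key, params.get("arguments") or {})
        # The key stays server-side: only the upstream result goes back.
        return {"statusCode": 200,
                "body": json.dumps({"jsonrpc": "2.0", "id": rpc.get("id"),
                                    "result": result})}

    return handler
```

In a deployment you'd export something like handler = make_handler(fetch_key=..., call_upstream=...) and point the function's handler setting at it. Injecting the two hooks keeps the authorization logic testable without AWS credentials or a live Tavily key.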
The split is the whole point: API Gateway answers "is this a real, valid token?" and Lambda answers "is this identity allowed to do this, and let's record that they did." Authentication at the edge, authorization in your code, the secret sealed behind both.
Why type: http, not type: stdio
There's a reason this wrapper is a remote MCP server and not a local one — and it's the same reason the talk put OAuth-fronted remote servers at the centre of the architecture. An MCP client config gives you three transport choices, and the choice quietly decides your entire security story:
- stdio: the client spawns a local process (npx some-mcp, a Python script) and talks to it over stdin/stdout. The catch: that process needs the upstream credential on the developer's machine. So the key lands in mcp.json, in an env block, in shell history, in a dotfile that gets synced to who-knows-where. Every laptop is now a copy of the shared key. This is the exact shape of the BYOAI / shared-key problem: one credential, sprayed everywhere, impossible to attribute or revoke cleanly.
- sse: the older remote transport. Remote is the right instinct, but plain SSE was only ever a transport. It never carried an auth story of its own, so in practice people bolted a static bearer token onto it and landed right back at "one shared secret."
- http (Streamable HTTP): a remote endpoint the client reaches over plain HTTPS, and the transport the 2025 MCP spec standardised on. Crucially, the spec defines the MCP server as an OAuth resource server, so authentication is a first-class part of the transport, not an afterthought. The client config holds no secret at all; it just points at a URL and lets OAuth do the rest. (With an OAuth-native client the browser login is automatic; clients that don't yet drive the flow themselves lean on a small local helper to do it. More on that below.)
That last line is the whole pitch. Compare the two configs a developer actually writes:
```jsonc
// stdio: the secret lives on every laptop
{ "tavily": { "type": "stdio", "command": "npx",
              "args": ["tavily-mcp"],
              "env": { "TAVILY_API_KEY": "tvly-SHARED-KEY-everyone-has-this" } } }

// http: no secret, OAuth per user
{ "tavily": { "type": "http", "url": "https://…/mcp" } }
```
The http version has nothing to leak. The credential never reaches the client; identity is established per-user through Cognito; and the server — not the laptop — is the only thing that ever touches the upstream key. Whenever an upstream can be reached as a remote OAuth-fronted http server, it should be. This wrapper exists precisely to turn a stdio-shaped shared-key tool into an http-shaped one.
One endpoint, two kinds of caller
A backend agent and a human engineer authenticate through completely different OAuth flows — client_credentials for the machine, authorization_code + PKCE for the person. But they end up carrying the same scope and hitting the same endpoint. The server treats them identically; the only difference is what lands in the audit log — a client ID for the machine, an email for the human.
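The audit difference falls out of the token's claims. Below is a sketch that assumes Cognito's usual token shapes: client_credentials access tokens carry client_id but no username claim, while user access tokens carry both. Cognito access tokens don't include email by default, so the email branch assumes you've added it yourself (for example with a pre token generation trigger); otherwise the username is what gets logged. Nothing here reads a request header such as X-Human-Id. The identity comes only from claims the authorizer already verified.

```python
from typing import Dict


def audit_principal(claims: Dict[str, str]) -> Dict[str, str]:
    """Describe who is behind a verified Cognito access token, for the audit log."""
    if "username" in claims:
        # authorization_code + PKCE: a person signed in through the hosted login.
        # `email` is assumed to be added to the access token; Cognito omits it by default.
        return {"kind": "human",
                "who": claims.get("email", claims["username"]),
                "client": claims.get("client_id", "")}
    # client_credentials: there is no user, so the app client itself is the identity.
    return {"kind": "machine", "who": claims["client_id"], "client": claims["client_id"]}
```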
The last mile most "MCP + OAuth" posts skip
One piece of the human flow is easy to gloss over, and it's exactly where most "remote MCP + OAuth" walkthroughs quietly wave their hands: a static MCP client config (the claude_desktop_config.json kind) can't pop a browser for a Cognito login on its own. The fix is a thin local helper — registered as the client's stdio command — that checks for a cached token, triggers the PKCE browser login when it's missing or expired, stashes the result, and forwards each request with the Authorization: Bearer … header. This is exactly what tools like mcp-remote exist to do — and the rough edges around that login flow are a recurring source of confusion when people first wire up a remote MCP server. It's a small shim, but without it the "human logs in with PKCE" line is doing a lot of unspoken work. The repo includes a runnable PKCE script you can wire in as that helper.
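The PKCE half of such a helper is small. Below is a sketch of its pieces, assuming a public Cognito app client with a registered loopback redirect URI. The URI, domain, client ID, and the cache shape are placeholders. The challenge follows RFC 7636's S256 method, and the endpoints are the user pool's /oauth2/authorize and /oauth2/token. Listening on the loopback port, opening the browser, and writing the cache are left out:

```python
import base64
import hashlib
import secrets
import urllib.parse
from typing import Optional, Tuple


def s256_challenge(verifier: str) -> str:
    """RFC 7636 S256: BASE64URL(SHA256(verifier)) with the '=' padding stripped."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def pkce_pair() -> Tuple[str, str]:
    verifier = secrets.token_urlsafe(64)  # 86 URL-safe chars, inside the 43..128 range
    return verifier, s256_challenge(verifier)


def authorize_url(domain: str, client_id: str, redirect_uri: str,
                  scope: str, challenge: str, state: str) -> str:
    """The URL the helper opens in the browser to start the hosted login."""
    query = urllib.parse.urlencode({
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,
        "code_challenge": challenge,
        "code_challenge_method": "S256",
        "state": state,
    })
    return f"{domain}/oauth2/authorize?{query}"


def code_exchange_body(code: str, verifier: str, client_id: str, redirect_uri: str) -> bytes:
    """Form body POSTed to /oauth2/token once the browser redirects back with ?code=."""
    return urllib.parse.urlencode({
        "grant_type": "authorization_code",
        "client_id": client_id,
        "code": code,
        "redirect_uri": redirect_uri,
        "code_verifier": verifier,
    }).encode()


def needs_login(cached: Optional[dict], now: float, skew: float = 60.0) -> bool:
    """Hypothetical cache shape: {"access_token": ..., "expires_at": epoch seconds}."""
    return cached is None or cached["expires_at"] - skew <= now
```

The verifier never leaves the laptop until the code exchange, which is what stops an intercepted authorization code from being redeemed by anyone else.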
The honest limitation
The Lambda is the final enforcement point — but the security of the whole scheme rests on the entire chain: the JWT authorizer validating token integrity at the edge, and the Lambda enforcing scope behind it. The upstream still sees one shared key; it has no idea which human is behind a call. All the per-user identity, scoping, and audit live in your layer. So the scheme reduces to a few disciplines:
- Lock the secret to only the Lambda's execution role. Anyone who can read it gets full upstream access.
- Make sure every route to the function goes through the JWT authorizer. A second unprotected trigger would bypass the whole thing.
- Machine callers blur the human behind them. When a backend agent authenticates with client_credentials, the audit log names the machine client, not the person who prompted it. For a human-triggered agent you've narrowed "who" to one service identity, not one person. Closing that last gap means propagating the user's own token down to the agent (OAuth 2.0 token exchange, RFC 8693) instead of falling back to a shared machine credential. Worth knowing before you lean on M2M audit lines as proof of who did something.
- Mind the 30-second ceiling. API Gateway HTTP APIs cap each integration at 30 seconds. A search call returns in well under a second, so it's a non-issue here. But if you wrap a slower upstream (a heavy query, a multi-step scrape, a long-running agent tool), a call that overruns hits a 504 at the gateway. That's the point where you'd reach for an async submit-then-poll pattern, or for the transport's SSE streaming channel served from something that can actually stream (HTTP APIs buffer the integration response), instead of a single blocking POST.
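For reference, the token exchange named in the machine-caller item is just another form POST to a token endpoint. Below is a sketch of the RFC 8693 request body, assuming an authorization server that supports this grant. Amazon Cognito's token endpoint documents authorization_code, client_credentials, and refresh_token grants, not this one, so with Cognito this would need another broker or a custom flow. The audience and scope values are placeholders:

```python
import urllib.parse

TOKEN_EXCHANGE = "urn:ietf:params:oauth:grant-type:token-exchange"
ACCESS_TOKEN_TYPE = "urn:ietf:params:oauth:token-type:access_token"


def token_exchange_body(user_access_token: str, audience: str, scope: str) -> bytes:
    """RFC 8693 request: trade the human's token for one scoped to the agent's hop."""
    return urllib.parse.urlencode({
        "grant_type": TOKEN_EXCHANGE,
        "subject_token": user_access_token,     # the person's own token
        "subject_token_type": ACCESS_TOKEN_TYPE,
        "requested_token_type": ACCESS_TOKEN_TYPE,
        "audience": audience,                   # placeholder downstream identifier
        "scope": scope,                         # ideally no wider than the user's scope
    }).encode()
```

The returned token still names the user as its subject, so the audit line can name the person rather than the machine client.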
Get those right and you've genuinely converted a shared god-token into per-user delegation. Get them wrong and you've just added a proxy in front of the same shared key. This is also why the audit log matters more than it looks: it's the only place "who ran what" is recoverable at all — Tavily's logs will forever show one key.
It works
Deployed to a real AWS account, the end-to-end path is exactly what you'd hope. A caller fetches an Amazon Cognito token, calls the MCP tool, and a real Tavily answer comes back through the wrapper — the client never touches the upstream key:
```text
Answer: The Model Context Protocol (MCP) is a standardized framework
enabling AI models to access external data sources and tools securely…
1. What is the Model Context Protocol (MCP)?
https://www.databricks.com/blog/what-is-model-context-protocol
```
And the boundary holds: a request with no token, or a forged one, gets a 401 at the edge — rejected by API Gateway before the Lambda ever runs. Meanwhile every successful call writes an audit line naming the caller — client 59psk… or demo@example.com — something the upstream's shared-key logs physically cannot produce. (Commands and full output are in the repo.)
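To reproduce the boundary check yourself, the call is one JSON-RPC tools/call message POSTed to the endpoint. Below is a standard-library sketch. The tool name search and the endpoint URL are assumptions (use whatever the wrapper's tools/list returns), and a strict MCP client would send initialize before any tools/call. The Accept header lists both response types, as Streamable HTTP clients are expected to:

```python
import json
import urllib.error
import urllib.request
from typing import Any, Optional, Tuple


def mcp_tool_call(url: str, access_token: str, tool: str,
                  arguments: dict, request_id: int = 1) -> urllib.request.Request:
    """Build one Streamable HTTP POST carrying a JSON-RPC tools/call message."""
    payload = {"jsonrpc": "2.0", "id": request_id, "method": "tools/call",
               "params": {"name": tool, "arguments": arguments}}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
            "Accept": "application/json, text/event-stream",
        },
    )


def send(req: urllib.request.Request) -> Tuple[int, Optional[Any]]:
    """Return (status, parsed body); assumes a plain JSON reply, not an SSE stream."""
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status, json.load(resp)
    except urllib.error.HTTPError as err:
        return err.code, None  # 401 from the edge authorizer, 403 from the Lambda
```

A request built with a bogus token should come back as 401 without the Lambda running. A valid token that lacks the tool's scope should come back as 403 from the Lambda.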
Why this generalizes
Swap Tavily for a legacy internal API, a SaaS whose own OAuth logs into its identity instead of yours, or a machine credential you don't want sprayed across developer laptops — the Lambda is the only thing that changes, and the payoff is the same. A leaked Amazon Cognito access token is short-lived and expires on its own, and you cut a compromised identity off at the source by revoking its refresh token (and disabling the user/app client) so it can't mint new ones — versus a leaked shared key, which means rotate-and-redeploy for the whole team. (Worth knowing: with the edge JWT authorizer, an already-issued access token stays valid until it expires; for instant kill-switch revocation you'd add a server-side check in the Lambda.) It ties straight back to the Golden Rule from the talk — if a human can't do it in the UI, the agent can't do it via MCP — because the agent inherits exactly the caller's scope, nothing more. That's what makes team-wide AI-assisted coding safe to roll out: developers plug a powerful shared-key tool straight into their own IDE, and security doesn't have to say no, because there's no shared credential on the laptop to leak — just the developer's own scoped, revocable session.
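The server-side kill switch can be as small as a per-subject cutoff time. Below is a sketch assuming a hypothetical revoked_at store (a DynamoDB table or a parameter you maintain) that maps a token's sub to the moment you cut it off. Any token issued before that moment is refused, even though its signature and expiry still pass at the edge:

```python
from typing import Mapping


def still_allowed(claims: Mapping, revoked_at: Mapping[str, float]) -> bool:
    """Refuse a token whose subject was cut off after the token was issued.

    `revoked_at` is a hypothetical store mapping a token's `sub` to the epoch
    second you revoked it; tokens issued strictly later (for example after the
    identity was reinstated) pass.
    """
    cutoff = revoked_at.get(claims.get("sub"))
    if cutoff is None:
        return True
    return float(claims.get("iat", 0)) > cutoff
```

Run it right after reading the claims in the Lambda and reject the call when it returns False. The price is one lookup per request, which is what buys revocation that takes effect before the access token expires.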
What does all this cost? Almost nothing.
Here's the part that makes this an easy yes: the entire control plane is serverless and pay-per-use, so for a team's real workload the bill is a rounding error. Amazon Cognito is free up to 50,000 monthly active users. Lambda's free tier covers a million requests a month. An HTTP API costs nothing while idle and about $1 per million requests once traffic arrives.
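To put numbers on that, here is the arithmetic for a hypothetical team. It uses the HTTP API and Lambda free-tier figures above, Lambda's standard $0.20 per million requests beyond the free tier, and one Secrets Manager secret at about $0.40/month. It ignores Lambda compute time, Secrets Manager's per-API-call charge, and any Cognito charges outside the free tier, so treat it as an order-of-magnitude sketch rather than a quote:

```python
# Prices quoted in this post, plus Lambda's standard per-request price.
HTTP_API_PER_MILLION = 1.00          # USD per million HTTP API requests
LAMBDA_FREE_REQUESTS = 1_000_000     # free-tier Lambda requests per month
LAMBDA_PER_MILLION = 0.20            # USD per million requests beyond the free tier
SECRET_PER_MONTH = 0.40              # USD for one Secrets Manager secret


def monthly_cost(developers: int, calls_per_day: int, working_days: int = 22):
    """Return (requests, USD) for a hypothetical team; ignores compute duration."""
    requests = developers * calls_per_day * working_days
    http_api = requests / 1e6 * HTTP_API_PER_MILLION
    billable_lambda = max(0, requests - LAMBDA_FREE_REQUESTS)
    lambda_cost = billable_lambda / 1e6 * LAMBDA_PER_MILLION
    return requests, round(http_api + lambda_cost + SECRET_PER_MONTH, 2)
```

For example, 20 developers making 200 tool calls each working day comes to 88,000 requests and about $0.49 a month, most of which is the secret.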
The one real decision is where the shared upstream key lives. An encrypted Lambda environment variable is free but readable by anyone with lambda:GetFunctionConfiguration and baked into the deployment — fine for a demo, not for production. SSM Parameter Store SecureString is also free and gives the same at-rest protection for a static key (KMS-encrypted, IAM-scoped reads) — the sweet spot for most single-key wrappers. AWS Secrets Manager costs ~$0.40/month but adds the operational layer you may actually need: automatic rotation, resource policies, cross-account sharing. I used Secrets Manager in the reference because it makes the lockdown story explicit, but Parameter Store swaps in with a few lines. Either way the boundary is identical: IAM grants read to only the Lambda's execution role, and the key never leaves the server.
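The swap really is a few lines. Below is a sketch of a key reader that supports both stores. The store and the parameter or secret name come from hypothetical UPSTREAM_KEY_SOURCE / UPSTREAM_KEY_NAME environment variables. get_parameter(Name=..., WithDecryption=True) and get_secret_value(SecretId=...) are the standard boto3 calls:

```python
import os
from functools import lru_cache

# Hypothetical environment variables naming where the shared key lives.
KEY_SOURCE = os.environ.get("UPSTREAM_KEY_SOURCE", "secretsmanager")  # or "ssm"
KEY_NAME = os.environ.get("UPSTREAM_KEY_NAME", "tavily-api-key")


def read_upstream_key(source: str, name: str, client=None) -> str:
    """Fetch the shared key; IAM should grant read to the Lambda's role only."""
    if client is None:
        import boto3  # available in the AWS Lambda Python runtime
        client = boto3.client("ssm" if source == "ssm" else "secretsmanager")
    if source == "ssm":
        # SecureString parameters need WithDecryption=True to return plaintext.
        return client.get_parameter(Name=name, WithDecryption=True)["Parameter"]["Value"]
    return client.get_secret_value(SecretId=name)["SecretString"]


@lru_cache(maxsize=1)
def upstream_key() -> str:
    # Cached per warm execution environment, so only cold starts pay for the fetch.
    return read_upstream_key(KEY_SOURCE, KEY_NAME)
```

One trade-off of the cache: a rotated key is only picked up when a new execution environment starts, so with Secrets Manager rotation you'd add a short TTL instead of caching forever.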
You also get measurement nearly for free: because every call carries a real identity, your Lambda can log structured per-user, per-tool fields, and CloudWatch Logs Insights (or an EMF metric) turns them into per-identity rate limits, anomaly alerts, or usage you bill back to a team. The shared-key world gives you one undifferentiated blob of traffic and none of that.
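Here is a sketch of the EMF route. Printing one JSON line in CloudWatch's Embedded Metric Format from the Lambda turns each call into a ToolCalls count, with no PutMetricData call. The namespace and field names are hypothetical. Watch the cardinality: a Caller dimension creates one custom metric per identity. That's fine for a team but costly for thousands of callers, and at that scale a Logs Insights query such as stats count(*) by Caller, Tool over the same fields does the job without new metrics:

```python
import json
import time


def emf_record(caller: str, tool: str, namespace: str = "McpWrapper") -> str:
    """One Embedded Metric Format log line: a ToolCalls count per tool and per caller."""
    return json.dumps({
        "_aws": {
            "Timestamp": int(time.time() * 1000),  # milliseconds since the epoch
            "CloudWatchMetrics": [{
                "Namespace": namespace,
                "Dimensions": [["Tool"], ["Caller", "Tool"]],
                "Metrics": [{"Name": "ToolCalls", "Unit": "Count"}],
            }],
        },
        # Dimension values and the metric value live at the top level of the record.
        "Caller": caller,
        "Tool": tool,
        "ToolCalls": 1,
    })
```

In the handler, print(emf_record(caller, tool)) after the authorization check is all it takes; CloudWatch extracts the metric from the Lambda's log stream.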
Full source — template.yaml, the Lambda, both flow scripts, and a one-command teardown — is on GitHub. Clone it, point it at your own shared-key API, and you've got per-user OAuth in an afternoon.
Further reading
- The talk this builds on — "Empower Team-Wide Vibe Coding with LLM Gateway and Security-First MCPs" (Gabriel Koo & Rakshit Jain, AgentCon HK 2026).
- Interactive demo: click through the full OAuth flow (login, scoped call, 403 on out-of-scope tool, and the stdio vs http config contrast).
- Runnable demo & full source: github.com/gabrielkoo/tavily-oauth-mcp-wrapper