DEV Community

h6o

Securely Exposing a Stateful MCP Server on Cloud Run (n8n Playwright MCP Example)

TL;DR

  • I wanted to operate pages that require Google login from n8n via Playwright MCP
  • The sidecar approach is easy, but has gaps from the perspectives of authentication and team isolation
  • I built defense-in-depth with ingress: internal + IAM (roles/run.invoker) + service-to-service auth via ID tokens + a Go auth-proxy + Secret Manager
  • For stateful MCP, set maxScale=1 to stop scale-out and prevent sessions from jumping to another instance

Intended Audience

  • People who want to run an MCP server on Cloud Run
  • People who want to automate operations on pages that require Google login using Playwright MCP
  • People who share n8n across multiple teams and want to handle pages requiring per-team Google logins via Playwright MCP
  • People who want to set up Cloud Run service-to-service authentication (ID tokens + IAM) in a practical way

Background

The starting point was: I wanted to operate and capture pages that require Google login, like Looker Studio, from n8n workflows. Playwright MCP looked like it could make this work, so I tried it. But once I tried to put it into operation, I ran into the following challenges.

  • Since n8n is shared across multiple teams, I want to switch login states per team account
  • I don’t want a Playwright MCP endpoint that "anyone can hit" in the first place

The Problem: Gaps in the Sidecar Setup

The first thing that comes to mind is running playwright-mcp as a sidecar in the same Cloud Run instance as n8n. It's easy, but it has gaps with respect to the challenges above.

[Before: dangerous setup]
┌─ Cloud Run instance ─────────────────────────┐
│  n8n (port 5678)                             │
│        │ localhost:3000 (no auth required)   │
│        ▼                                     │
│  playwright-mcp (port 3000)                  │
│        ※ holds Google-logged-in session      │
└──────────────────────────────────────────────┘

Since it's a sidecar, playwright-mcp isn't visible from outside. However, n8n inside the same instance can hit localhost:3000 without any authentication.

Because playwright-mcp holds the Google-logged-in state (Storage State), this has two consequences:

  • Anyone who can build n8n workflows can use the shared login credentials
  • It can't accommodate use cases where each team needs a different account

Splitting playwright-mcp out into a separate Cloud Run service decouples it from n8n, but splitting alone leaves two questions open: should the service be exposed to the internet or only inside the VPC, and how should callers authenticate?

Solution Architecture

In the end, I adopted the following setup.

[After: defense-in-depth setup]

n8n (Cloud Run, ingress=internal)
 │  Mcp-Auth-Key: <per-team API key>
 │  Path: /playwright-mcp-team-a/...
 │
 │  ※ n8n has vpc-access-egress=all-traffic, so traffic
 │    is routed to internal Cloud Run via the VPC
 ▼
auth-proxy (Cloud Run, ingress=internal)
 │  - Verifies Mcp-Auth-Key
 │  - Picks the backend by the first URL segment
 │  - Attaches its own service account's ID token and forwards
 ▼
playwright-mcp-team-a (Cloud Run, ingress=internal, maxScale=1)
   - IAM: roles/run.invoker is granted only to the auth-proxy SA
   - On receipt, Cloud Run verifies the ID token
   - Storage State is mounted via Secret Manager

There are four defensive layers, stacked in series like gates. Each layer covers a gap the others leave open, so if any one of them is breached, the protection offered by the rest is weakened.

| Layer | Role |
|---|---|
| Network | Block direct access from outside with ingress: internal |
| IAM | Grant roles/run.invoker only to the auth-proxy SA, excluding other principals |
| Service-to-service auth | The auth-proxy presents a Google-signed ID token in the Authorization header |
| Application | The auth-proxy verifies the Mcp-Auth-Key and routes to the backend for each team |

I'll cover how the ID token and IAM mesh together in detail in the service-to-service authentication section below.

Persisting Google Login State: storage-state

Playwright has a --storage-state option that lets you save and reuse logged-in session info (cookies and localStorage) as a file. Storing this in Secret Manager and mounting it as a Cloud Run volume lets you keep the login state even after a cold start.

volumes:
  - name: playwright-storage-state
    secret:
      secretName: PLAYWRIGHT_STORAGE_STATE
      items:
        - key: "1"
          path: storage-state.json
containers:
  - name: playwright-mcp
    args:
      - "--storage-state=/etc/playwright/storage-state.json"
    volumeMounts:
      - name: playwright-storage-state
        mountPath: /etc/playwright
        readOnly: true

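To update the login state, log in again on another machine and register the new storage-state.json as a new version in Secret Manager. If the version is pinned as in the manifest above (key: "1"), point the key at the new version (or at latest) and deploy a new revision so that the service picks it up.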
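Before uploading a refreshed file, it can help to check how long the saved cookies will last. The following is a minimal sketch, not part of the original setup: it assumes the storage-state JSON has a top-level cookies array whose entries carry an expires field in Unix seconds (with -1 for session cookies), and the helper name earliestExpiry is hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"time"
)

// storageState mirrors only the storage-state fields inspected here.
type storageState struct {
	Cookies []struct {
		Name    string  `json:"name"`
		Domain  string  `json:"domain"`
		Expires float64 `json:"expires"` // Unix seconds; -1 marks a session cookie
	} `json:"cookies"`
}

// earliestExpiry (hypothetical helper) returns the soonest expiry among
// persistent cookies. The bool is false when no persistent cookie exists.
func earliestExpiry(raw []byte) (time.Time, bool, error) {
	var st storageState
	if err := json.Unmarshal(raw, &st); err != nil {
		return time.Time{}, false, fmt.Errorf("parse storage state: %w", err)
	}
	var earliest time.Time
	found := false
	for _, c := range st.Cookies {
		if c.Expires <= 0 {
			continue // session cookie: no fixed expiry to compare
		}
		exp := time.Unix(int64(c.Expires), 0)
		if !found || exp.Before(earliest) {
			earliest, found = exp, true
		}
	}
	return earliest, found, nil
}

func main() {
	raw, err := os.ReadFile("storage-state.json")
	if err != nil {
		fmt.Println("read:", err)
		return
	}
	exp, ok, err := earliestExpiry(raw)
	switch {
	case err != nil:
		fmt.Println("error:", err)
	case !ok:
		fmt.Println("no persistent cookies found")
	default:
		fmt.Println("first cookie expires in about", time.Until(exp).Round(time.Hour))
	}
}
```

The earliest expiry is only a rough signal, since some cookies are short-lived by design, but a file whose cookies have all expired is clearly not worth registering as a new secret version.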

Implementing mcp-auth-proxy (Go)

I implemented a lightweight service handling authentication and reverse proxying in Go. The reverse-proxy foundation is just the standard library's net/http/httputil.ReverseProxy, and I only add google.golang.org/api/idtoken for getting ID tokens.

Backend List

This is the only place you touch when adding a new team.

var routeSpecs = []struct {
    PathID        string
    APIKeyEnv     string
    BackendURLEnv string
}{
    {
        PathID:        "playwright-mcp-team-a",
        APIKeyEnv:     "TEAM_A_PLAYWRIGHT_MCP_KEY",
        BackendURLEnv: "TEAM_A_PLAYWRIGHT_MCP_URL",
    },
    // To add team B, add one element here
}
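The article does not show how this list becomes the routes map used during authentication, so here is one possible sketch. The route struct, the routeSpec type name, and buildRoutes are assumptions made for illustration; only the three spec fields and the apiKey field come from the original code. Failing fast on a missing variable avoids starting a proxy with an empty API key.

```go
package main

import (
	"fmt"
	"net/url"
	"os"
)

// routeSpec names the anonymous struct from the backend list (assumed name).
type routeSpec struct{ PathID, APIKeyEnv, BackendURLEnv string }

// route is a hypothetical resolved form of a spec.
type route struct {
	apiKey     string
	backendURL *url.URL
}

// buildRoutes resolves every spec through getenv and rejects incomplete config.
func buildRoutes(specs []routeSpec, getenv func(string) string) (map[string]*route, error) {
	routes := make(map[string]*route, len(specs))
	for _, s := range specs {
		key := getenv(s.APIKeyEnv)
		if key == "" {
			return nil, fmt.Errorf("route %s: %s is not set", s.PathID, s.APIKeyEnv)
		}
		u, err := url.Parse(getenv(s.BackendURLEnv))
		if err != nil || u.Scheme == "" || u.Host == "" {
			return nil, fmt.Errorf("route %s: %s must be an absolute URL", s.PathID, s.BackendURLEnv)
		}
		routes[s.PathID] = &route{apiKey: key, backendURL: u}
	}
	return routes, nil
}

func main() {
	specs := []routeSpec{
		{"playwright-mcp-team-a", "TEAM_A_PLAYWRIGHT_MCP_KEY", "TEAM_A_PLAYWRIGHT_MCP_URL"},
	}
	routes, err := buildRoutes(specs, os.Getenv)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Println("loaded routes:", len(routes))
}
```

Passing getenv as a function keeps the loader easy to exercise in tests without touching the process environment.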

Authentication

The mechanism is simple: it compares the value of the Mcp-Auth-Key header against the API key loaded from an environment variable, using subtle.ConstantTimeCompare to avoid timing attacks.

const authHeader = "Mcp-Auth-Key"

func authenticate(r *http.Request, routes map[string]*route) (string, *route, bool) {
    providedKey := r.Header.Get(authHeader)
    if providedKey == "" {
        return "", nil, false
    }

    // Pick the backend by the first segment of the URL
    routeID := strings.TrimPrefix(r.URL.Path, "/")
    if i := strings.IndexByte(routeID, '/'); i > 0 {
        routeID = routeID[:i]
    }

    matched, found := routes[routeID]
    if !found {
        return routeID, nil, false
    }

    if subtle.ConstantTimeCompare([]byte(matched.apiKey), []byte(providedKey)) != 1 {
        return routeID, nil, false
    }

    return routeID, matched, true
}
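One detail worth knowing: subtle.ConstantTimeCompare returns immediately when the two slices differ in length, so the comparison above can still reveal the key length. A possible variation, which is my own sketch and not the author's code, hashes both values first so the compared inputs always have the same size. The name keysEqual is hypothetical.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"fmt"
)

// keysEqual (hypothetical helper) compares fixed-size SHA-256 digests,
// so the comparison time does not depend on the length of either key.
func keysEqual(expected, provided string) bool {
	e := sha256.Sum256([]byte(expected))
	p := sha256.Sum256([]byte(provided))
	return subtle.ConstantTimeCompare(e[:], p[:]) == 1
}

func main() {
	fmt.Println(keysEqual("team-a-key", "team-a-key"))
	fmt.Println(keysEqual("team-a-key", "team-a-key-but-longer"))
}
```

The empty-header check in authenticate is still needed, since two empty strings would compare as equal here.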

How Cloud Run Service-to-Service Authentication Works

When the auth-proxy calls a backend Cloud Run service, it uses Cloud Run service-to-service authentication. This is a two-stage mechanism: "the caller proves who they are with a Google-signed ID token, and the receiving Cloud Run service compares it against the IAM policy to decide whether to admit it."

[auth-proxy SA] ── Authorization: Bearer <ID token (aud=backend URL)> ──▶ [Cloud Run frontend]
                                                                            ① Verify ID token
                                                                            ② Check via IAM whether
                                                                               the issuing principal
                                                                               has roles/run.invoker
                                                                            ③ OK → route to container
                                                                               Denied → return 403

A common misunderstanding is that sending an ID token is enough to call the service. It isn't: the actual decision lives on the IAM side. The ID token only proves "who is calling"; "whether to let that identity in" is determined by who has been granted roles/run.invoker.

| Grantee of roles/run.invoker | Behavior |
|---|---|
| allUsers | Anyone who can reach the service can call it without an ID token (open to the internet unless ingress restricts it) |
| A specific service account | Only requests carrying an ID token for that SA are admitted |
| Not granted | No one can call it |

In this setup, I grant roles/run.invoker only to the auth-proxy's service account. ingress: internal blocks direct external access, and IAM additionally blocks direct hits from other services inside the VPC.

What Happens If You Set allUsers

A common antipattern is "it doesn't work, so I'll just grant invoker to allUsers." If you do that:

  • Even with ingress: internal left in place, any resource inside the VPC can hit it without an ID token
  • If you have ingress: all, anyone on the internet can hit it without an ID token

In other words, playwright-mcp effectively becomes an unauthenticated API. Since the Storage State carries Google-logged-in credentials, the damage isn't limited to data leakage; it can extend to every resource that account is permitted to operate. It's worth re-checking who holds roles/run.invoker at every step of the implementation.

Where ID Tokens Come From and How They're Refreshed

The official documentation lists several ways to obtain an ID token, but when a Cloud Run service calls another Cloud Run service, in practice they all come down to one source: the metadata server.

  • Querying the metadata server (http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/identity?audience=...) returns an ID token for the service account bound to the instance
  • The token's lifetime is about 1 hour, and you need to fetch a new one before it expires
  • Google's official auth libraries (in Go, google.golang.org/api/idtoken) hit the same metadata server internally and handle retrieval, caching, and refresh for you

The other options listed in the official docs (Workload Identity Federation and downloaded service account keys) are mechanisms for calling Cloud Run "from outside Google Cloud." In our case, where we run on Cloud Run, the metadata server is directly usable, so there's no reason to adopt them. Distributing SA keys as files in particular brings in the separate operational headache of key storage and rotation, which is even more reason to avoid it.

In implementation terms, the choice boils down to "hit the metadata server yourself" or "delegate to the auth library," and there's little reason to choose the former. The library already handles token expiry and keeps its cache consistent under concurrent requests, so leaning on it leaves less room for mistakes.

Caller Code

On the calling side, you create a client by passing the audience (the receiving service's origin URL) to idtoken.NewClient. Specify https://<service>.run.app of the destination Cloud Run as audience. This is the value placed in the ID token's aud claim, which the receiving Cloud Run uses to determine "is this token addressed to me?"

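client, err := idtoken.NewClient(ctx, audience) // audience = "https://<backend>.run.app"
if err != nil {
    // Without a token source the proxy cannot authenticate to the backend.
    log.Fatalf("create ID token client for %s: %v", audience, err)
}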
prefix := "/" + spec.PathID                    // e.g. "/playwright-mcp-team-a"

proxy := &httputil.ReverseProxy{
    Director: func(r *http.Request) {
        r.URL.Scheme = backendURL.Scheme
        r.URL.Host = backendURL.Host
        r.Host = backendURL.Host
        r.URL.Path = strings.TrimPrefix(r.URL.Path, prefix)
        r.Header.Del(authHeader) // don't forward the API key to the backend
    },
    Transport: client.Transport, // automatically attaches and refreshes ID tokens
}

The key is passing client.Transport to ReverseProxy.Transport. With just this, every request the auth-proxy relays automatically gets an ID token (fetched from the metadata server) attached and refreshed. ReverseProxy can also pass through long-lived streaming responses like SSE as-is, so it pairs well with Streamable HTTP MCP.

A Stateful Caveat: maxScale=1

The authentication gaps are now plugged, but Playwright MCP has a separate operational constraint: it fundamentally can't scale out.

Why Streamable HTTP Is Stateful

The transport currently recommended for MCP is Streamable HTTP. To make sense of why this is stateful, you need to grasp two things: "the difference from a regular POST" and "what MCP is actually exchanging."

The Difference Between a Regular POST and SSE

Roughly speaking:

  • A regular HTTP POST is "exchanging letters." The client sends one letter, the server writes one reply, and that's it.
  • SSE (Server-Sent Events) is "a phone call." Once connected, the server can speak as many times as it wants, whenever it wants. The line stays open.

For example, consider asking Playwright MCP to "take a screenshot of this page." The internal processing is "navigate to page → wait for load → scroll → capture → encode," which takes a fair amount of time.

With a regular POST, what the client sees is something like:

client ──"take a shot"──▶ server
(10 seconds pass; nothing happens)
client ◀──"here's your shot (image data)"── server

Until the entire body is complete, nothing reaches the client. Meanwhile, you can't even tell whether it's "dead or working," so it's not suited for long-running jobs.

With SSE, the same processing looks like:

client ──"take a shot"──▶ server
client ◀──"navigated to page"── server    (connection still open)
client ◀──"waiting for load"── server
client ◀──"scrolled"── server
client ◀──"here's your shot (image data)"── server
(server closes here)

The actual response body is Content-Type: text/event-stream, with text appended bit by bit, like this:

data: {"jsonrpc":"2.0","method":"notifications/progress","params":{"progress":30}}

data: {"jsonrpc":"2.0","method":"notifications/progress","params":{"progress":70}}

data: {"jsonrpc":"2.0","id":1,"result":{"image":"..."}}


A data: line plus one blank line is the boundary for one message. The client can process each message incrementally as it reads the response body.

MCP's spec says "if the response is short, you may return regular application/json" and "if you want to return multiple messages, you may use SSE," and the server switches based on the situation.

What Sessions Are For

That covers "one request's worth," but there's another concept one level above between MCP clients and servers: the session. The reason is that MCP itself is a stateful protocol. Specifically:

  • When a connection is opened, the client first sends initialize, negotiating each side's capabilities. It's here that "what tools this server has" and "what notifications it supports" are determined and assumed thereafter
  • The subscribe state of resources (like "notify me when this file changes") is also remembered by the server

The ID that links these states to "which client they belong to" is the Mcp-Session-Id header. The server issues it in the initialize response, and the client includes the same value in every subsequent request. It’s easier to picture as a cookie translated into an HTTP header.
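From the client's side, the handshake can be sketched as follows. This is an illustration under assumptions: mcpClient and stubServer are hypothetical, the stub only mimics the header exchange and ignores the request body, and the session value "session-123" is made up. A real server may answer a missing or unknown session ID with different status codes.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

const sessionHeader = "Mcp-Session-Id"

// mcpClient (hypothetical) remembers the session ID the server issues.
type mcpClient struct {
	endpoint  string
	hc        *http.Client
	sessionID string
}

// post sends one JSON-RPC message and echoes the stored session ID, if any.
func (c *mcpClient) post(body string) (int, error) {
	req, err := http.NewRequest(http.MethodPost, c.endpoint, strings.NewReader(body))
	if err != nil {
		return 0, err
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Accept", "application/json, text/event-stream")
	if c.sessionID != "" {
		req.Header.Set(sessionHeader, c.sessionID)
	}
	resp, err := c.hc.Do(req)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	if id := resp.Header.Get(sessionHeader); id != "" {
		c.sessionID = id // issued in the initialize response
	}
	return resp.StatusCode, nil
}

// stubServer issues a session on the first request and rejects unknown IDs.
func stubServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		switch r.Header.Get(sessionHeader) {
		case "":
			w.Header().Set(sessionHeader, "session-123")
		case "session-123":
			// known session: fall through to the normal response
		default:
			http.Error(w, "Session not found", http.StatusNotFound)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprint(w, `{"jsonrpc":"2.0","id":1,"result":{}}`)
	}))
}

func main() {
	srv := stubServer()
	defer srv.Close()

	c := &mcpClient{endpoint: srv.URL, hc: srv.Client()}
	if _, err := c.post(`{"jsonrpc":"2.0","id":1,"method":"initialize","params":{}}`); err != nil {
		fmt.Println("initialize:", err)
		return
	}
	fmt.Println("session:", c.sessionID)

	status, err := c.post(`{"jsonrpc":"2.0","id":2,"method":"tools/list"}`)
	fmt.Println("follow-up status:", status, err)
}
```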

What Playwright MCP Carries

A Playwright MCP session, on top of the MCP protocol state above, is tied to a live Chromium process + open pages + cookies + ongoing operations. These are live process state stuck to a particular Cloud Run instance's memory and OS resources, so transferring them to another instance isn't realistic.

The key point is that saving the session ID somewhere does not let another instance pick up where the first one left off.

Compatibility With Cloud Run Scaling

Cloud Run adds instances as request load grows. If a Streamable HTTP client's second or later request lands on a different instance, no session exists there, and the request fails with Session not found.

The most reliable countermeasure is to not add instances at all.
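The failure is easy to reproduce in miniature. The sketch below is a toy model, not Playwright MCP itself: each hypothetical instance keeps its sessions in its own memory, just as a real instance keeps its Chromium process, and a session created on one is unknown to the other.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
)

// instance models one container with purely in-memory session state.
type instance struct {
	name     string
	mu       sync.Mutex
	sessions map[string]bool
}

func newInstance(name string) *instance {
	return &instance{name: name, sessions: map[string]bool{}}
}

func (i *instance) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	i.mu.Lock()
	defer i.mu.Unlock()
	id := r.Header.Get("Mcp-Session-Id")
	if id == "" { // treated as initialize in this toy model
		id = i.name + "-session-1"
		i.sessions[id] = true
		w.Header().Set("Mcp-Session-Id", id)
		fmt.Fprintln(w, "initialized on", i.name)
		return
	}
	if !i.sessions[id] {
		http.Error(w, "Session not found", http.StatusNotFound)
		return
	}
	fmt.Fprintln(w, "ok on", i.name)
}

// call sends one request straight to a handler, as a load balancer would.
func call(h http.Handler, sessionID string) (status int, issued string) {
	req := httptest.NewRequest(http.MethodPost, "/mcp", nil)
	if sessionID != "" {
		req.Header.Set("Mcp-Session-Id", sessionID)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code, rec.Header().Get("Mcp-Session-Id")
}

func main() {
	a, b := newInstance("a"), newInstance("b")
	_, id := call(a, "")          // initialize lands on instance a
	sameStatus, _ := call(a, id)  // next request lands on a again
	otherStatus, _ := call(b, id) // next request lands on b instead
	fmt.Println(sameStatus, otherStatus) // prints: 200 404
}
```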

metadata:
  annotations:
    autoscaling.knative.dev/maxScale: "1"
    autoscaling.knative.dev/minScale: "0" # collapse to zero when not in use

For internal batch use cases or interactive use by a handful of people, one instance is usually enough. Leaving minScale=0 lets the service scale to zero when idle, so the price of idling is just a cold start on the next request.

Note: the auth-proxy itself is stateless and could scale out, but in this case the caller is limited to a single n8n service and traffic is light, so I match it with maxScale=1.

"Why Not Session Affinity?"

You might think, "Rather than fixing the instance, can't we just stick the same client to the same instance?" Cloud Run does have session affinity, and it looks like it could work. But it doesn't help Playwright MCP. There are two reasons.

  1. Affinity is best-effort, and doesn't keep instances alive. The official docs explicitly say "do not use it to store server-side session data that needs to persist across requests and cannot easily be reconstructed." Affinity can break on scale-in, or when an instance hits its maximum concurrency or CPU limits, and at that moment you lose the live Chromium process along with it. A session holding state that can't be reconstructed, which is exactly our case, is the very use case the official docs advise against.
  2. The identification paths don't line up. Cloud Run affinity identifies clients via a cookie that Cloud Run issues, but MCP's session identifier is the Mcp-Session-Id header. The two are unrelated, and there's no guarantee an MCP client retains and sends back that cookie.

The conventional approach to surviving scale-out is "offload state to an external store like Redis, and keep instances themselves stateless." MCP's protocol state (capabilities and subscribe state) can be externalized this way, but a live Chromium process + open pages + ongoing operations isn't something you can serialize and offload. Affinity is an optimization for apps that can "rebuild state when broken." For Playwright MCP, where the live process itself is the session, it remains only an optimization and never a fix.

In the end, since you can't externalize the state, the only sure move is not to grow instances. When you need to scale, expand by "adding services per team" rather than "adding more replicas of one service."

Summary

| Issue | Solution |
|---|---|
| Sidecar is reachable from n8n without authentication | Switch to going via auth-proxy and authenticate with Mcp-Auth-Key |
| Direct reach from the internet | Eliminate external entry points with ingress: internal |
| Direct hits from other services in the VPC | Grant roles/run.invoker only to the auth-proxy SA |
| Identity proof for service-to-service traffic | Auto-attach and auto-refresh ID tokens with idtoken.NewClient |
| Session isolation between teams | Route by the first URL segment and per-team API keys |
| Maintaining Google login state | Mount Storage State via Secret Manager |
| Stateful and unscalable | Pin instances with maxScale=1 (scale per team) |

Plugging things at the four layers of network, IAM, service-to-service auth, and application closes off attack surfaces that no single layer could cover on its own. This setup should work as a general-purpose pattern for safely exposing stateful MCP servers on Cloud Run, not just Playwright MCP. I hope it's useful for teams that want to expose an MCP server internally but are unsure how to wire up authentication.
