DEV Community

Damla Hamurcu

How I Solved WebSocket Authentication in FastAPI (And Why Depends() Wasn't Enough)

I'm a developer transitioning from enterprise systems to building my own products. This is one of the real problems I solved along the way.


The Problem

I was building a multi-tenant RAG chatbot, with one backend serving AI chat widgets across multiple client websites. Each widget needed to identify itself to the backend, and I assumed I'd handle that the way I always had: with a header.

The stack was simple: a FastAPI backend and a Vite frontend widget. That's it. I also wanted real-time streaming so users could watch the AI response being generated word by word and stop it mid-stream if needed, just like we all do with ChatGPT. That meant WebSocket connections instead of plain REST APIs.

To manage chat history, the knowledge base, leads, and automations, I needed three pieces of information from the start: a tenant ID to keep the bot in character and scope its knowledge base, a session ID to track the chat session, and an anonymous user ID to identify returning users. That, my friends, is what I'll be calling tenant context from now on.

As you'd expect, establishing tenant context starts with validating the tenant. The original plan was straightforward — send the API key in a custom header, capture it with FastAPI's Header() and Depends(), let the backend do its magic and establish the connection.

However, the browser's WebSocket API doesn't let you set custom headers. If you were connecting server-to-server, headers would work fine. But from a browser widget? No-go.

So there I was. On one hand, I wanted to stick with WebSocket connections to achieve the streaming experience. On the other hand... security.


The Naive Approach

The most obvious workaround is using a query parameter instead of a header:

```
ws://api.example.com/chat?api_key=pk_live_abc123
```

This works. But look at what's sitting in that URL. Your API key, right there in server logs, browser history, and every proxy between the client and your backend. That's a security problem.

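And it gets worse. The WebSocket handshake is an HTTP GET upgrade request with no request body, so you can't sneak credentials into the body either. Cookies do ride along with the handshake, but for a widget embedded on other people's websites they're third-party cookies, which browsers increasingly block. (Smuggling a token through the Sec-WebSocket-Protocol header via the constructor's protocols argument is a known hack, but it abuses a header meant for subprotocol negotiation.) That leaves query params as basically your only practical option for passing data during the handshake.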

Even if I accepted that risk, there was a second problem: timing.

By the time my WebSocket endpoint received the connection, the client was already in. The connection was established. But my backend still needed to validate the API key, look up the tenant, set the session, and build the full context. All of that was happening after the handshake.

The result? A chatbot that looked broken. The widget would connect, but the user couldn't actually chat for a few seconds while the backend scrambled to set things up behind the scenes. Not a great first impression.

There's also a real security concern here. Once you call await websocket.accept(), that client has an open connection consuming server resources such as memory, a file descriptor, and a slot in your connection manager. If you accept first and validate later, anyone can open thousands of connections with garbage credentials and exhaust your server. Classic denial-of-service vector.

It became clear pretty quickly that trying to do everything — authentication, context setup, and chat — within a single WebSocket connection was limiting and convoluted.


The "Aha" Moment

I split the process into two separate API calls.

Step 1: A plain REST POST request. The widget sends the API key in a header (where it belongs). The backend validates the tenant, creates or refreshes the session, and returns a short-lived JWT, valid for maybe 5 minutes: just long enough to establish the WebSocket connection.

Step 2: The widget opens a WebSocket connection with that token in the query.

```
POST https://api.example.com/session             →  returns token
 WS  wss://api.example.com/ws?token=eyJhbGciOi…  →  establishes connection
```

Now the API key never appears in the WebSocket URL. The token in the query is short-lived, scoped to a specific session, and doesn't expose the actual credential. Even if someone intercepts it, it expires in minutes.

And here's a side benefit I didn't expect. While the frontend waits for the token from that first REST call, the UI can show a "Connecting..." indicator. Then, when the WebSocket opens, the bot is immediately ready to chat with no awkward delay. The perceived latency actually dropped compared to the single-connection approach.

Two calls. More secure. Feels faster. I'll take it.


The Solution, Step by Step

Let me walk you through the actual implementation. I'm showing simplified versions here to keep the focus on the pattern. In production, you'll want to add UUID validation, session expiry handling, and tenant ownership checks on top of this.

Here's the flow:

```
Widget                          Backend
  │                                │
  │── POST /session ──────────────>│  (API key in header)
  │   Body: { session_id?, anonymous_user_id? }
  │<── { token, session_id } ──────│
  │                                │
  │── WS /ws?token=eyJhbG... ─────>│  (validate token before accept)
  │<════════ connected ════════════│
  │                                │
  │── { message: "hello" } ───────>│
  │<── { chunk: "Hi! How can..." } │  (streaming response)
```

Step 1: The REST Endpoint

When the widget loads, it makes a POST request to create or refresh a session.

```python
from uuid import UUID

from fastapi import APIRouter, Depends

# SessionCreateRequest/Response, Tenant, ChatSessionService and the
# get_* dependencies are project-specific and defined elsewhere.
router = APIRouter()


@router.post("/session")
async def create_or_refresh_session(
    request: SessionCreateRequest,
    tenant: Tenant = Depends(get_tenant_from_publishable_key),
    session_service: ChatSessionService = Depends(get_chat_session_service),
):
    # UUID() raises ValueError on malformed IDs; validate them in the
    # request model in production.
    if request.session_id:
        # Returning user: refresh the existing session
        result = await session_service.refresh_session(
            session_id=UUID(request.session_id),
            tenant_id=tenant.id,
        )
    else:
        # New visitor: create a fresh session
        result = await session_service.create_session(
            tenant_id=tenant.id,
            anonymous_user_id=(
                UUID(request.anonymous_user_id)
                if request.anonymous_user_id else None
            ),
        )
    return SessionCreateResponse(**result)  # includes the JWT
```

Notice what's happening here. Depends(get_tenant_from_publishable_key) does the heavy lifting. It pulls the API key from the header, validates it against the database, and returns the tenant. Classic FastAPI dependency injection, clean and testable.

Here's what that dependency looks like under the hood:

```python
from typing import Annotated

from fastapi import Depends, Header, HTTPException


async def get_publishable_key_from_header(
    # alias="api-key" matches the header the widget sends; without it, FastAPI
    # would derive "x-api-key" from the parameter name.
    x_api_key: Annotated[str | None, Header(alias="api-key")] = None,
) -> str:
    if not x_api_key:
        raise HTTPException(
            status_code=401,
            detail="Missing API key. Include api-key header.",
        )
    return x_api_key


async def get_tenant_from_publishable_key(
    api_key: str = Depends(get_publishable_key_from_header),
    tenant_service: TenantService = Depends(get_tenant_service),  # project-specific
) -> Tenant:
    tenant = await tenant_service.validate_key(api_key)
    if not tenant:
        raise HTTPException(status_code=401, detail="Invalid API key")
    return tenant
```

The response includes a JWT containing the session id, tenant id, and anonymous user id — everything the WebSocket endpoint needs to establish context without another database lookup.

Step 2: The WebSocket Endpoint

Once the frontend has the token, it opens the WebSocket connection:

```python
import logging

from fastapi import Depends, Query, WebSocket, WebSocketDisconnect

logger = logging.getLogger(__name__)
# handle_message, cleanup_session and the get_* dependencies are project-specific.


@router.websocket("/ws")
async def websocket_endpoint(
    websocket: WebSocket,
    token: str = Query(..., description="JWT from POST /session"),
    session_service: ChatSessionService = Depends(get_chat_session_service),
    connection_manager: ConnectionManager = Depends(get_connection_manager),
):
    # Validate BEFORE accepting the connection
    session_context = validate_and_decode_token(token)

    if not session_context:
        # Closing before accept() rejects the handshake. Per the ASGI spec the
        # server answers with HTTP 403, so browsers usually report close code
        # 1006 rather than 4001.
        await websocket.close(code=4001, reason="Invalid or expired token")
        return

    # Token is valid, so now we accept
    await websocket.accept()

    try:
        while True:
            data = await websocket.receive_json()
            # Handle messages; all business logic is delegated to handlers
            await handle_message(data, session_context, websocket)

    except WebSocketDisconnect:
        logger.info("Disconnected: %s", session_context.session_id)

    finally:
        await cleanup_session(session_context.session_id, websocket)
```

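The key line is right at the top: validate_and_decode_token(token) runs before websocket.accept(). If the token is invalid or expired, the handshake is rejected and the client never gets in, so a bad token never holds an open connection or a slot in your connection manager. It won't stop someone from hammering the endpoint with handshake attempts (that's what rate limiting is for), but it shuts down the cheap attack of parking thousands of accepted connections.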

Here's what the token validation looks like:

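```python
import jwt  # PyJWT; SECRET_KEY and SessionContext come from elsewhere in the app


def validate_and_decode_token(token: str) -> SessionContext | None:
    try:
        payload = jwt.decode(
            token,
            SECRET_KEY,
            algorithms=["HS256"],  # pin the algorithm explicitly
            # Require exp so a token minted without one can't live forever,
            # and the context claims so a missing one can't raise KeyError.
            options={"require": ["exp", "session_id", "tenant_id"]},
        )
        return SessionContext(
            session_id=payload["session_id"],
            tenant_id=payload["tenant_id"],
            anonymous_user_id=payload.get("anonymous_user_id"),
        )
    except jwt.ExpiredSignatureError:
        return None  # outlived its short lifetime
    except jwt.InvalidTokenError:
        return None  # bad signature, malformed, or missing a required claim
```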

Simple. Decode the JWT, extract the context, return it. If anything is wrong, return None and the WebSocket endpoint handles the rejection.

What the Frontend Does

For completeness, here's what the widget side looks like in essence:

```javascript
// Step 1: Get the session token
const res = await fetch("https://api.example.com/session", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "api-key": "pk_live_abc123",  // sent as a header, so it stays out of the URL
  },
  body: JSON.stringify({
    session_id: existingSessionId || undefined,
    anonymous_user_id: visitorId || undefined,
  }),
});
if (!res.ok) throw new Error(`Session request failed: ${res.status}`);
const { token } = await res.json();

// Step 2: Connect WebSocket with the token
const ws = new WebSocket(
  `wss://api.example.com/ws?token=${encodeURIComponent(token)}`
);

ws.onopen = () => {
  console.log("Connected and ready to chat");
};
```

Two requests. First one authenticates and sets up the session. Second one connects. By the time the WebSocket opens, the backend already knows exactly who it's talking to.


Why This Pattern Matters Beyond My Project

When I first split authentication into two calls, it felt like I was overcomplicating things. Two requests where one should work? Surely there's a simpler way.

There isn't. And once you see why, it becomes obvious.

This is just separation of concerns applied to WebSocket authentication. REST does what REST is good at: request-response with full HTTP features like headers, status codes, and standard error handling. WebSocket does what it's good at: persistent bidirectional communication. Each protocol handles the part it was designed for.

Once you see it that way, putting authentication inside the WebSocket connection feels like using a screwdriver to hammer a nail. It technically works. But you're fighting the tool.

This pattern also gives you reconnection for free. WebSocket connections drop. They just do. Mobile networks, laptop sleep, flaky Wi-Fi. When that happens, the widget calls the REST endpoint again, gets a fresh token, and reconnects. The user sees "Reconnecting..." for a second and then they're back. No re-authentication headache.
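
The reconnection flow is just the same two calls in a loop. Here it is sketched in Python with asyncio to match the rest of this post (the widget would do the equivalent in JavaScript). fetch_token and connect are hypothetical callables standing in for POST /session and opening /ws, and the exponential backoff is an addition so a backend outage doesn't turn into a reconnect storm.

```python
import asyncio
from collections.abc import Awaitable, Callable


async def run_with_reconnect(
    fetch_token: Callable[[], Awaitable[str]],  # wraps POST /session
    connect: Callable[[str], Awaitable[None]],  # opens /ws; returns on a deliberate close
    max_attempts: int = 5,
    base_delay: float = 0.5,
) -> int:
    """Repeat both steps after each drop; return the attempt number that ended cleanly."""
    for attempt in range(max_attempts):
        try:
            token = await fetch_token()  # fresh token every time: the old one may have expired
            await connect(token)
            return attempt + 1
        except ConnectionError:
            # Network change, laptop sleep, flaky Wi-Fi: back off, then start over at step 1.
            # (A real widget would also reset the counter after a long, healthy connection.)
            await asyncio.sleep(base_delay * 2**attempt)  # 0.5 s, 1 s, 2 s, ...
    raise ConnectionError(f"gave up after {max_attempts} attempts")


async def demo() -> None:
    tokens = iter(["t1", "t2", "t3"])
    seen: list[str] = []

    async def fetch_token() -> str:
        return next(tokens)

    async def connect(token: str) -> None:
        seen.append(token)
        if token != "t3":  # the first two connections drop
            raise ConnectionError("dropped")

    print(await run_with_reconnect(fetch_token, connect, base_delay=0), seen)


asyncio.run(demo())  # 3 ['t1', 't2', 't3']
```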

After I built this, I started noticing the same two-step pattern described in WebSocket authentication discussions everywhere. Turns out it's a well-established approach. I just had to arrive at it the hard way.


What I'd Do Differently

If I could go back and talk to past-me at the start of this project, I'd say two things.

First: stop trying to do everything inside the WebSocket connection. I kept pushing authentication, context setup, and session management into the handshake because that's what felt "right" coming from enterprise apps where one connection means one authentication flow. It introduced delay, complexity, and a chatbot that looked broken on load. The moment I unlearned that assumption and just used two calls, everything clicked.

Second: think harder about token duration. I set it to 5 minutes and moved on. But this choice matters more than I initially gave it credit for. Too long, and you've essentially recreated the API-key-in-the-query problem — a token sitting in the URL for minutes is an unnecessary exposure window. Too short, and users on slow networks can't establish the WebSocket connection before the token expires. There's a sweet spot and it depends on your use case. Test it. Don't just pick a number and forget about it like I almost did.
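
Testing a given lifetime doesn't require waiting it out: you can mint tokens whose exp is already in the past. PyJWT's decode() also takes a leeway argument (seconds of tolerance, mainly intended for clock skew between the server that issues the token and the one that validates it), which is the other knob worth knowing about. A sketch with hypothetical numbers; token_expiring_in and is_accepted are illustrative helpers.

```python
from datetime import datetime, timedelta, timezone

import jwt  # PyJWT

SECRET_KEY = "change-me-to-a-long-random-secret-value"  # hypothetical


def token_expiring_in(seconds: float) -> str:
    """Mint a test token whose exp is `seconds` from now (negative means already expired)."""
    exp = datetime.now(timezone.utc) + timedelta(seconds=seconds)
    return jwt.encode({"session_id": "s-123", "exp": exp}, SECRET_KEY, algorithm="HS256")


def is_accepted(token: str, leeway: float = 0) -> bool:
    try:
        jwt.decode(token, SECRET_KEY, algorithms=["HS256"], leeway=leeway)
        return True
    except jwt.ExpiredSignatureError:
        return False


late = token_expiring_in(-5)                # the client showed up 5 seconds too late
print(is_accepted(late))                    # False
print(is_accepted(late, leeway=10))         # True: 10 s of tolerance covers it
print(is_accepted(token_expiring_in(300)))  # True
```

Keep leeway small, since every second of it extends the exposure window just like a longer TTL does.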


Quick Recap

Here's the pattern in short:

  • The browser WebSocket API doesn't support custom headers. Don't fight it.
  • Don't put API keys in query params. They end up in logs, browser history, and proxies.
  • Split authentication into two steps. REST call first to validate and issue a short-lived JWT. WebSocket connection second, with that token.
  • Validate the token before accepting the WebSocket connection. Invalid tokens are rejected at the handshake, so they never hold an open connection or tie up server resources.
  • Use the REST endpoint for reconnection too. When the WebSocket drops, the widget gets a fresh token and reconnects seamlessly.

Remember, it's two calls because that's the right way to do it.


If you've hit this same wall or solved it differently, I'd love to hear about it in the comments.
