Your AI agent just got OAuth tokens for a user's Google Calendar. Now what?
If the answer is "store them in a variable and use them until they expire," you have a security problem. I have built production AI systems that handle user tokens for API calls, and the number of agents holding raw access tokens in memory with zero protection is genuinely alarming.
Let's fix that. Here are four battle-tested patterns for secure token management in AI agent systems.
## The Problem: Agents Are Terrible Token Custodians
Most agent frameworks treat OAuth tokens like any other string. Fetch a token, stuff it in the agent's context or a plain database column, and make API calls. This creates three real problems:
Token leakage through logs and traces. Agents produce verbose logs. Raw tokens end up in debug output, error messages, and observability platforms.
Refresh race conditions. Two agent threads hit a token expiry at the same time. Both try to refresh. One gets a new token, the other gets an invalid grant error. The user's session breaks.
Blast radius. If your agent's memory or database is compromised, every user token is exposed in plaintext. That is not a theoretical risk; it is the default state of most agent systems today.
## Pattern 1: Encrypted-at-Rest Vault
The simplest upgrade. Encrypt tokens before they hit storage, decrypt only when needed for an API call.
Use AES-256-GCM with envelope encryption. A data encryption key (DEK) encrypts the tokens, and the DEK is itself encrypted by a master key (KEK) stored in a KMS such as AWS KMS or GCP Cloud KMS. That way, even if your database is dumped, the tokens are ciphertext.
```python
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
import os

def encrypt_token(token: str, dek: bytes) -> bytes:
    nonce = os.urandom(12)  # GCM standard 96-bit nonce
    aesgcm = AESGCM(dek)
    ciphertext = aesgcm.encrypt(nonce, token.encode(), None)
    return nonce + ciphertext  # prepend nonce for decryption

def decrypt_token(encrypted: bytes, dek: bytes) -> str:
    nonce, ciphertext = encrypted[:12], encrypted[12:]
    aesgcm = AESGCM(dek)
    return aesgcm.decrypt(nonce, ciphertext, None).decode()
```
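The envelope step can be sketched as follows. This is illustrative only: the `wrap_dek`/`unwrap_dek` names are mine, and in production the KEK would never sit in process memory — wrapping and unwrapping would be KMS API calls instead of local AES operations.

```python
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
import os

# Illustrative stand-in: in production the KEK lives in a KMS/HSM
# and is never loaded into your process.
kek = AESGCM.generate_key(bit_length=256)

def wrap_dek(dek: bytes, kek: bytes) -> bytes:
    # Encrypt the DEK under the KEK; store only this wrapped form
    nonce = os.urandom(12)
    return nonce + AESGCM(kek).encrypt(nonce, dek, None)

def unwrap_dek(wrapped: bytes, kek: bytes) -> bytes:
    nonce, ct = wrapped[:12], wrapped[12:]
    return AESGCM(kek).decrypt(nonce, ct, None)

# Per-user DEK: generate, use for token encryption, then persist wrapped
dek = AESGCM.generate_key(bit_length=256)
wrapped_dek = wrap_dek(dek, kek)
assert unwrap_dek(wrapped_dek, kek) == dek
```

The payoff is key rotation: rotating the KEK means re-wrapping small DEKs, not re-encrypting every stored token.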
Good for: single-service agents where you control the full stack. Adds real protection with minimal architectural change.
Limitation: the agent still sees the raw token after decryption. If the agent process is compromised, tokens are exposed in memory.
## Pattern 2: Token Broker Service
This is the pattern I recommend for most production systems. The agent never touches the raw token. Instead, a broker service sits between the agent and the target API.
The agent sends a request like "fetch this user's calendar events" with a session reference. The broker looks up the user's encrypted token, decrypts it, makes the API call, and returns the response. The agent only ever sees the API response data, never the credential.
```typescript
// token-broker/src/proxy.ts
import express from "express";
import { decryptToken } from "./vault";
import { refreshIfExpired } from "./oauth";
import { verifyAgentSession } from "./sessions";

const app = express();
app.use(express.json()); // parse JSON request bodies

app.post("/proxy", async (req, res) => {
  const { sessionId, targetUrl, method, body } = req.body;

  // Verify the agent's session is valid
  const session = await verifyAgentSession(sessionId);
  if (!session) return res.status(401).json({ error: "Invalid session" });

  // Retrieve and decrypt the user's token -- the agent never sees this
  let token = await decryptToken(session.userId, session.provider);
  token = await refreshIfExpired(token, session.userId, session.provider);

  // Make the actual API call on the agent's behalf
  const response = await fetch(targetUrl, {
    method: method || "GET",
    headers: {
      "Authorization": `Bearer ${token.accessToken}`,
      "Content-Type": "application/json",
    },
    body: body ? JSON.stringify(body) : undefined,
  });

  const data = await response.json();
  res.json({ status: response.status, data });
});

app.listen(8080);
```
The broker handles refresh token rotation atomically, so you eliminate the race condition problem entirely. It also gives you a single chokepoint for audit logging: every token use goes through one service.
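The atomic-refresh idea can be sketched with a per-user lock: whichever caller wins the lock refreshes, and every other concurrent caller reuses the result instead of firing a competing refresh. The `do_refresh` callback here is a hypothetical stand-in for your provider's refresh call.

```python
import threading
import time

_locks: dict[str, threading.Lock] = {}
_registry_lock = threading.Lock()
_tokens: dict[str, dict] = {}  # user_id -> {"access_token": ..., "expires_at": ...}

def _lock_for(user_id: str) -> threading.Lock:
    # One lock per user, created lazily and safely
    with _registry_lock:
        return _locks.setdefault(user_id, threading.Lock())

def get_valid_token(user_id: str, do_refresh) -> str:
    """Return a fresh access token, refreshing at most once per expiry."""
    with _lock_for(user_id):
        tok = _tokens.get(user_id)
        if tok is None or tok["expires_at"] <= time.time():
            # Only the lock holder refreshes; late arrivals see the new token
            tok = do_refresh(user_id)
            _tokens[user_id] = tok
        return tok["access_token"]
```

In a multi-process broker you would replace the in-memory lock with a distributed one (a database row lock or a Redis lock), but the invariant is the same: exactly one refresh per expiry.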
Good for: multi-agent systems, any system where agents run in untrusted or semi-trusted environments.
## Pattern 3: Short-Lived Scoped Tokens (RFC 8693)
OAuth 2.0 Token Exchange (RFC 8693) lets you swap a broad token for a narrow, short-lived one. Your vault holds the user's refresh token. When an agent needs access, you exchange it for a token that is scoped to exactly what the agent needs and expires in minutes.
The agent gets a token that can read calendar events for the next 5 minutes. It cannot modify contacts, send emails, or do anything else. If that token leaks, the damage is minimal.
This maps naturally onto Azure AD's on-behalf-of flow, and you can get a similar effect with Google's domain-wide delegation by minting narrowly scoped service-account tokens. The key insight is that the agent's token should have the minimum scope and shortest lifetime that gets the job done.
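The exchange itself is a plain POST to the authorization server's token endpoint. A sketch of the request body per RFC 8693 — the scope value and the endpoint in the comment are placeholder assumptions for your provider:

```python
from urllib.parse import urlencode

# RFC 8693 token exchange parameters; scope value is illustrative
params = {
    "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
    "subject_token": "<broad token held by your vault>",
    "subject_token_type": "urn:ietf:params:oauth:token-type:access_token",
    "requested_token_type": "urn:ietf:params:oauth:token-type:access_token",
    "scope": "calendar.events.readonly",  # minimum scope for this task
}
body = urlencode(params)
# POST this form-encoded body to the provider's token endpoint,
# e.g. https://auth.example.com/token (placeholder URL)
```

The response carries the narrow token plus its own `expires_in`, which you pass to the agent and then forget.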
## Pattern 4: Hardware-Backed Keys (TPM/HSM)
For high-security contexts, like healthcare or financial services, you can bind token decryption to a hardware security module. The master key never leaves the HSM. Token decryption requires a call to the HSM, which provides tamper-evident audit logs and rate limiting at the hardware level.
This is the most expensive pattern and overkill for most agent systems. But if you are handling tokens that grant access to medical records or financial accounts, compliance requirements often mandate it.
## When to Use Which Pattern
Here is a quick decision matrix:
| Scenario | Pattern | Why |
|---|---|---|
| Single agent, you control infra | Encrypted-at-rest vault | Simple, effective, low overhead |
| Multi-agent system, agents in sandboxes | Token broker | Agents never see tokens, atomic refresh |
| Agents need minimal access | Short-lived scoped tokens | Least privilege, time-bounded |
| Regulated industry (health, finance) | Hardware-backed keys | Compliance, tamper evidence |
| Production system with real users | Broker + scoped tokens | Best of both, defense in depth |
For most teams shipping AI agents today, the token broker pattern gives you the best security-to-effort ratio. You get token isolation, centralized refresh logic, and audit logging in one service. Layer in short-lived scoped tokens when you need fine-grained access control.
The common thread across all four patterns is this: treat tokens as radioactive material. Minimize who touches them, minimize how long they are exposed, and log every access. Your agent's job is to call APIs on behalf of users. It does not need to be the custodian of their credentials.
## Getting Started
If you are building an agent system today, start with Pattern 2. Stand up a simple broker service, move your token storage behind it, and add refresh logic. You can implement it in an afternoon and it immediately eliminates the largest class of token security issues.
The code examples above are starting points. In production, add rate limiting on the broker, mutual TLS between agent and broker, and token usage anomaly detection.
I build production AI systems that handle auth securely. If you are working on agent infrastructure and want to compare notes, find me at astraedus.dev.