Your AI agent just got OAuth tokens for a user's Google Calendar. Now what?
If the answer is "store them in a variable and use them until they expire," you have a security problem. I have built production AI systems that handle user tokens for API calls, and the number of agents holding raw access tokens in memory with zero protection is genuinely alarming.
Let's fix that. Here are four battle-tested patterns for secure token management in AI agent systems.
## The Problem: Agents Are Terrible Token Custodians
Most agent frameworks treat OAuth tokens like any other string. Fetch a token, stuff it in the agent's context or a plain database column, and make API calls. This creates three real problems:
Token leakage through logs and traces. Agents produce verbose logs. Raw tokens end up in debug output, error messages, and observability platforms.
Refresh race conditions. Two agent threads hit a token expiry at the same time. Both try to refresh. One gets a new token, the other gets an invalid grant error. The user's session breaks.
Blast radius. If your agent's memory or database is compromised, every user token is exposed in plaintext. That is not a theoretical risk; it is the default state of most agent systems today.
## Pattern 1: Encrypted-at-Rest Vault
The simplest upgrade. Encrypt tokens before they hit storage, decrypt only when needed for an API call.
Use AES-256-GCM with envelope encryption. A data encryption key (DEK) encrypts the tokens, and the DEK is itself encrypted by a master key (KEK) stored in a KMS such as AWS KMS or GCP Cloud KMS. That way, even if your database is dumped, the tokens are ciphertext.
```python
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
import os

def encrypt_token(token: str, dek: bytes) -> bytes:
    nonce = os.urandom(12)  # GCM standard 96-bit nonce
    aesgcm = AESGCM(dek)
    ciphertext = aesgcm.encrypt(nonce, token.encode(), None)
    return nonce + ciphertext  # prepend nonce for decryption

def decrypt_token(encrypted: bytes, dek: bytes) -> str:
    nonce, ciphertext = encrypted[:12], encrypted[12:]
    aesgcm = AESGCM(dek)
    return aesgcm.decrypt(nonce, ciphertext, None).decode()
```
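The envelope step can be sketched as follows. This is illustrative only: the `wrap_dek`/`unwrap_dek` names are mine, and in production the KEK would never sit in process memory — wrapping and unwrapping would be KMS API calls instead of local AES operations.

```python
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
import os

# Illustrative stand-in: in production the KEK lives in a KMS/HSM
# and is never loaded into your process.
kek = AESGCM.generate_key(bit_length=256)

def wrap_dek(dek: bytes, kek: bytes) -> bytes:
    # Encrypt the DEK under the KEK; store only this wrapped form
    nonce = os.urandom(12)
    return nonce + AESGCM(kek).encrypt(nonce, dek, None)

def unwrap_dek(wrapped: bytes, kek: bytes) -> bytes:
    nonce, ct = wrapped[:12], wrapped[12:]
    return AESGCM(kek).decrypt(nonce, ct, None)

# Per-user DEK: generate, use for token encryption, then persist wrapped
dek = AESGCM.generate_key(bit_length=256)
wrapped_dek = wrap_dek(dek, kek)
assert unwrap_dek(wrapped_dek, kek) == dek
```

The payoff is key rotation: rotating the KEK means re-wrapping small DEKs, not re-encrypting every stored token.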
Good for: single-service agents where you control the full stack. Adds real protection with minimal architectural change.
Limitation: the agent still sees the raw token after decryption. If the agent process is compromised, tokens are exposed in memory.
## Pattern 2: Token Broker Service
This is the pattern I recommend for most production systems. The agent never touches the raw token. Instead, a broker service sits between the agent and the target API.
The agent sends a request like "fetch this user's calendar events" with a session reference. The broker looks up the user's encrypted token, decrypts it, makes the API call, and returns the response. The agent only ever sees the API response data, never the credential.
```typescript
// token-broker/src/proxy.ts
import express from "express";
import { decryptToken } from "./vault";
import { refreshIfExpired } from "./oauth";
import { verifyAgentSession } from "./sessions";

const app = express();
app.use(express.json()); // parse JSON request bodies

app.post("/proxy", async (req, res) => {
  const { sessionId, targetUrl, method, body } = req.body;

  // Verify the agent's session is valid
  const session = await verifyAgentSession(sessionId);
  if (!session) return res.status(401).json({ error: "Invalid session" });

  // Retrieve and decrypt the user's token -- the agent never sees this
  let token = await decryptToken(session.userId, session.provider);
  token = await refreshIfExpired(token, session.userId, session.provider);

  // Make the actual API call on the agent's behalf
  const response = await fetch(targetUrl, {
    method: method || "GET",
    headers: {
      "Authorization": `Bearer ${token.accessToken}`,
      "Content-Type": "application/json",
    },
    body: body ? JSON.stringify(body) : undefined,
  });

  const data = await response.json();
  res.json({ status: response.status, data });
});

app.listen(8080);
```
The broker handles refresh token rotation atomically, so you eliminate the race condition problem entirely. It also gives you a single chokepoint for audit logging: every token use goes through one service.
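The atomic-refresh idea can be sketched with a per-user lock: whichever caller wins the lock refreshes, and every other concurrent caller reuses the result instead of firing a competing refresh. The `do_refresh` callback here is a hypothetical stand-in for your provider's refresh call.

```python
import threading
import time

_locks: dict[str, threading.Lock] = {}
_registry_lock = threading.Lock()
_tokens: dict[str, dict] = {}  # user_id -> {"access_token": ..., "expires_at": ...}

def _lock_for(user_id: str) -> threading.Lock:
    # One lock per user, created lazily and safely
    with _registry_lock:
        return _locks.setdefault(user_id, threading.Lock())

def get_valid_token(user_id: str, do_refresh) -> str:
    """Return a fresh access token, refreshing at most once per expiry."""
    with _lock_for(user_id):
        tok = _tokens.get(user_id)
        if tok is None or tok["expires_at"] <= time.time():
            # Only the lock holder refreshes; late arrivals see the new token
            tok = do_refresh(user_id)
            _tokens[user_id] = tok
        return tok["access_token"]
```

In a multi-process broker you would replace the in-memory lock with a distributed one (a database row lock or a Redis lock), but the invariant is the same: exactly one refresh per expiry.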
Good for: multi-agent systems, any system where agents run in untrusted or semi-trusted environments.
## Pattern 3: Short-Lived Scoped Tokens (RFC 8693)
OAuth 2.0 Token Exchange (RFC 8693) lets you swap a broad token for a narrow, short-lived one. Your vault holds the user's refresh token. When an agent needs access, you exchange it for a token that is scoped to exactly what the agent needs and expires in minutes.
The agent gets a token that can read calendar events for the next 5 minutes. It cannot modify contacts, send emails, or do anything else. If that token leaks, the damage is minimal.
This maps naturally onto Azure AD's on-behalf-of flow, and you can get a similar effect with Google's domain-wide delegation by minting narrowly scoped service-account tokens. The key insight is that the agent's token should have the minimum scope and shortest lifetime that gets the job done.
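The exchange itself is a plain POST to the authorization server's token endpoint. A sketch of the request body per RFC 8693 — the scope value and the endpoint in the comment are placeholder assumptions for your provider:

```python
from urllib.parse import urlencode

# RFC 8693 token exchange parameters; scope value is illustrative
params = {
    "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
    "subject_token": "<broad token held by your vault>",
    "subject_token_type": "urn:ietf:params:oauth:token-type:access_token",
    "requested_token_type": "urn:ietf:params:oauth:token-type:access_token",
    "scope": "calendar.events.readonly",  # minimum scope for this task
}
body = urlencode(params)
# POST this form-encoded body to the provider's token endpoint,
# e.g. https://auth.example.com/token (placeholder URL)
```

The response carries the narrow token plus its own `expires_in`, which you pass to the agent and then forget.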
## Pattern 4: Hardware-Backed Keys (TPM/HSM)
For high-security contexts, like healthcare or financial services, you can bind token decryption to a hardware security module. The master key never leaves the HSM. Token decryption requires a call to the HSM, which provides tamper-evident audit logs and rate limiting at the hardware level.
This is the most expensive pattern and overkill for most agent systems. But if you are handling tokens that grant access to medical records or financial accounts, compliance requirements often mandate it.
## When to Use Which Pattern
Here is a quick decision matrix:
| Scenario | Pattern | Why |
|---|---|---|
| Single agent, you control infra | Encrypted-at-rest vault | Simple, effective, low overhead |
| Multi-agent system, agents in sandboxes | Token broker | Agents never see tokens, atomic refresh |
| Agents need minimal access | Short-lived scoped tokens | Least privilege, time-bounded |
| Regulated industry (health, finance) | Hardware-backed keys | Compliance, tamper evidence |
| Production system with real users | Broker + scoped tokens | Best of both, defense in depth |
For most teams shipping AI agents today, the token broker pattern gives you the best security-to-effort ratio. You get token isolation, centralized refresh logic, and audit logging in one service. Layer in short-lived scoped tokens when you need fine-grained access control.
The common thread across all four patterns is this: treat tokens as radioactive material. Minimize who touches them, minimize how long they are exposed, and log every access. Your agent's job is to call APIs on behalf of users. It does not need to be the custodian of their credentials.
## Getting Started
If you are building an agent system today, start with Pattern 2. Stand up a simple broker service, move your token storage behind it, and add refresh logic. You can implement it in an afternoon and it immediately eliminates the largest class of token security issues.
The code examples above are starting points. In production, add rate limiting on the broker, mutual TLS between agent and broker, and token usage anomaly detection.
I build production AI systems that handle auth securely. If you are working on agent infrastructure and want to compare notes, find me at astraedus.dev.