
Ramasankar Molleti


Anthropic Just Killed the API Key: A Deep Dive into Workload Identity Federation for Claude

TL;DR — Anthropic shipped Workload Identity Federation (WIF) for the Claude API. Your workloads now exchange a short-lived OIDC JWT from your IdP (EKS IRSA, GKE, AKS, GitHub Actions, Kubernetes, SPIFFE/SPIRE, Okta, Entra ID) for a short-lived sk-ant-oat01-... token via RFC 7523 jwt-bearer grant. Zero static secrets. But it's workload identity, not user delegation — and that distinction is where confused deputy bugs are about to start showing up.


Why this matters (and why I'm writing a sequel)

A few weeks back I wrote about draft-klrc-aiagent-auth — the IETF blueprint for agentic identity from engineers at AWS, Zscaler, Ping Identity, and Defakto Security. The thesis was straightforward: most teams securing AI agents with API keys are one breach away from disaster, and the fix is an 8-layer Agent Identity Management System (AIMS) built on SPIFFE for workload identity, WIMSE for proof tokens across proxies, OAuth Token Exchange for delegation, and Transaction Tokens for operation-scoped authorization.

That post was about the standard. This post is about the first major LLM provider to ship a production implementation of the bottom half of that stack.

If you're running Claude in a regulated environment — financial services, healthcare, gov — and you've been waiting for the day you can stop baking sk-ant-... keys into Kubernetes secrets, that day is here. But there's a subtle architectural trap, and it's easy to miss.

Let's walk through what shipped, the exchange flow, the SPIFFE integration, and the confused-deputy footgun.


What Anthropic actually built

The mental model is clean:

A service account has credentials minted for it on demand, instead of being a credential.

That one sentence captures the whole shift. An API key is a credential — possessing it is sufficient. A service account is a principal that gets credentials minted on demand from an attested workload identity. Possession of the principal isn't a thing. You have to be the workload.

Three resources in the Claude Console express the trust relationship:

1. Service Account (svac_...)

A non-human principal in your Anthropic org. No email, no password, no Console login. It's the identity a federated token acts as. It joins workspaces like a human member, and minted tokens inherit the target workspace's rate limits and usage attribution.

2. Federation Issuer (fdis_...)

Registers an OIDC provider with two key fields:

  • Issuer URL — must match the iss claim in your IdP's JWTs exactly.
  • JWKS source: discovery (default, hits /.well-known/openid-configuration), explicit_url, or inline for air-gapped clusters.

One issuer per environment. Your prod EKS, your staging EKS, and GitHub Actions are three separate issuers.

3. Federation Rule (fdrl_...)

The bridge between issuer and service account: "when a JWT from issuer X has claims matching Y, mint a token for service account Z."

Match conditions:

  • subject_prefix — exact or trailing-* match
  • exact audience
  • exact claim values (key/value map)
  • a CEL condition expression for complex logic

All matchers must pass. There is no implicit rule search — the client specifies the rule ID in the exchange request, and Anthropic verifies the JWT satisfies that rule. This is a deliberate design choice that prevents "rule confusion" attacks where a token accidentally matches a more permissive rule.
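To make the "all matchers must pass" behavior concrete, here is a minimal local sketch of the matching logic. It is not Anthropic's implementation: the FederationRule class and the function names are hypothetical, and CEL conditions are left out entirely. It can still be handy for unit-testing the rules you plan to register.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class FederationRule:
    """Hypothetical local model of a rule's match block (not the real schema)."""
    subject_prefix: Optional[str] = None
    audience: Optional[str] = None
    claims: Dict[str, str] = field(default_factory=dict)


def subject_matches(pattern: str, sub: str) -> bool:
    # A trailing "*" means prefix match; anything else must match exactly.
    if pattern.endswith("*"):
        return sub.startswith(pattern[:-1])
    return sub == pattern


def rule_matches(rule: FederationRule, jwt_claims: dict) -> bool:
    # Every configured matcher must pass; an unset matcher is skipped.
    if rule.subject_prefix is not None:
        if not subject_matches(rule.subject_prefix, jwt_claims.get("sub", "")):
            return False
    if rule.audience is not None:
        aud = jwt_claims.get("aud", [])
        aud_list = [aud] if isinstance(aud, str) else list(aud)
        if rule.audience not in aud_list:
            return False
    # Extra claims are compared by exact value.
    return all(jwt_claims.get(k) == v for k, v in rule.claims.items())
```

Note that leaving audience unset makes the rule accept a token minted for any relying party, so in practice you'd always set it.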


The exchange flow

┌──────────────┐  1. Get JWT     ┌───────────┐
│   Workload   │ ──────────────▶ │  Your IdP │
│  (in pod)    │ ◀────────────── │  (SPIRE,  │
└──────┬───────┘   JWT-SVID      │   EKS,    │
       │                          │   GHA…)   │
       │ 2. POST /v1/oauth/token  └───────────┘
       │    (jwt-bearer grant)
       ▼
┌──────────────────────────────────────┐
│  Anthropic token endpoint            │
│  - Verify signature against JWKS     │
│  - Check exp/nbf/iat                 │
│  - Match against federation rule     │
│  - Mint sk-ant-oat01-... (≤ rule TTL)│
└──────────────────────────────────────┘
       │
       │ 3. Bearer token on every API call
       ▼
   api.anthropic.com/v1/messages
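Step 2 in the diagram is an OAuth token request using the jwt-bearer grant. Below is a rough sketch of what the request body could look like, assuming the form encoding that RFC 7523 describes. The grant_type URN and the assertion parameter come from the RFC; the federation_rule_id field name and the build_exchange_body helper are assumptions for illustration, and the real endpoint may expect different fields or a JSON body. In practice the SDK does this for you.

```python
from urllib.parse import urlencode

# Grant type URN defined by RFC 7523 for JWT bearer assertions.
JWT_BEARER_GRANT = "urn:ietf:params:oauth:grant-type:jwt-bearer"


def build_exchange_body(identity_jwt: str, federation_rule_id: str) -> str:
    """Form-encode a jwt-bearer token request (illustrative only).

    `grant_type` and `assertion` are RFC 7523 parameter names.
    `federation_rule_id` is an ASSUMED field name, not a documented one.
    """
    return urlencode({
        "grant_type": JWT_BEARER_GRANT,
        "assertion": identity_jwt,
        "federation_rule_id": federation_rule_id,  # assumed field name
    })
```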

Concretely, the SDK construction looks like this:

from anthropic import Anthropic, WorkloadIdentityCredentials, IdentityTokenFile

client = Anthropic(
    credentials=WorkloadIdentityCredentials(
        identity_token_provider=IdentityTokenFile(
            "/var/run/secrets/anthropic.com/token"
        ),
        federation_rule_id="fdrl_...",
        organization_id="00000000-0000-0000-0000-000000000000",
        service_account_id="svac_...",
        workspace_id="wrkspc_...",
    ),
)

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello, Claude"}],
)

In production you'd skip the explicit constructor entirely and let the SDK resolve from environment variables — that's the recommended pattern. Ship the same container image everywhere, inject the env per environment:

ANTHROPIC_FEDERATION_RULE_ID=fdrl_...
ANTHROPIC_ORGANIZATION_ID=00000000-...
ANTHROPIC_SERVICE_ACCOUNT_ID=svac_...
ANTHROPIC_WORKSPACE_ID=wrkspc_...
ANTHROPIC_IDENTITY_TOKEN_FILE=/var/run/secrets/anthropic.com/token

Then in code: client = Anthropic(). Done. The SDK reads the file, exchanges for a token, refreshes before expiry, retries on rotation.
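Since everything now hinges on injected environment, a small startup check can fail fast when a variable is missing. This is a sketch, not SDK behavior: the variable names are the ones listed above, but treating all five as required is my assumption, and missing_federation_vars is a hypothetical helper.

```python
import os
from typing import List, Mapping

# Variable names as listed in the env block above.
REQUIRED_VARS = (
    "ANTHROPIC_FEDERATION_RULE_ID",
    "ANTHROPIC_ORGANIZATION_ID",
    "ANTHROPIC_SERVICE_ACCOUNT_ID",
    "ANTHROPIC_WORKSPACE_ID",
    "ANTHROPIC_IDENTITY_TOKEN_FILE",
)


def missing_federation_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the federation variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Call it before constructing the client and refuse to start if the returned list is non-empty.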


Token lifetime — the smart part

Most "short-lived token" systems get this wrong. Anthropic got it right.

The minted token's lifetime is the lesser of:

  • The rule's token_lifetime_seconds (60s to 24h, default 1h)
  • Twice the remaining IdP JWT lifetime, with a 60-second floor

That second bound is what matters. It prevents an Anthropic token from significantly outliving the upstream identity it was derived from. If your SPIRE JWT-SVID has a 5-minute TTL (SPIRE's default), the Anthropic token can live at most 10 minutes regardless of what the rule says.

Upstream attestation is the binding constraint — exactly the property you want.
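The arithmetic is small enough to write down. A sketch of the rule as I read it, with the 60-second floor applied to the upstream bound (that placement is my interpretation; effective_token_lifetime is a hypothetical name):

```python
def effective_token_lifetime(rule_lifetime_s: int, jwt_remaining_s: int) -> int:
    """Lesser of the rule TTL and twice the remaining JWT lifetime.

    The doubled upstream bound never drops below 60 seconds (assumed
    placement of the floor).
    """
    upstream_bound = max(2 * jwt_remaining_s, 60)
    return min(rule_lifetime_s, upstream_bound)
```

With a 1-hour rule and a fresh 5-minute JWT-SVID, effective_token_lifetime(3600, 300) gives 600 seconds, which is the 10-minute cap described above.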

The SDK runs a two-tier refresh modeled on botocore:

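| Tier      | Trigger       | Behavior                                                            |
|-----------|---------------|---------------------------------------------------------------------|
| Advisory  | expiry - 120s | Best-effort exchange. Falls back to cached token on failure.        |
| Mandatory | expiry - 30s  | Failed exchange raises an error. Cached token too close to expiry.  |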

And it re-reads ANTHROPIC_IDENTITY_TOKEN_FILE on every exchange, so rotated projected tokens (Kubernetes service-account tokens, SPIFFE JWT-SVIDs from spiffe-helper) get picked up transparently. No app restart. No human in the loop.
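If you want to reason about the two tiers (or mirror them in a non-SDK client), the decision logic can be sketched like this. The 120s and 30s windows come from the table above; the function names and the shape of the exchange callback are hypothetical, and this is not the SDK's actual code.

```python
from typing import Callable

ADVISORY_WINDOW_S = 120   # start trying to refresh this long before expiry
MANDATORY_WINDOW_S = 30   # past this point a failed refresh is fatal


def refresh_tier(now: float, expires_at: float) -> str:
    """Classify how urgently the cached token needs refreshing."""
    remaining = expires_at - now
    if remaining <= MANDATORY_WINDOW_S:
        return "mandatory"
    if remaining <= ADVISORY_WINDOW_S:
        return "advisory"
    return "none"


def next_token(cached: str, expires_at: float, now: float,
               exchange: Callable[[], str]) -> str:
    """Return the token to use, refreshing according to the tier."""
    tier = refresh_tier(now, expires_at)
    if tier == "none":
        return cached
    try:
        return exchange()
    except Exception:
        if tier == "advisory":
            return cached  # best effort: keep using the cached token
        raise  # mandatory: cached token is too close to expiry
```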


SPIFFE on Anthropic — the cleanest path

If you're running SPIRE, Anthropic has a first-class SPIFFE provider and the integration is genuinely well-designed. Here's the full setup.

SPIRE side

server.conf:

server {
    trust_domain         = "prod.example.com"
    jwt_issuer           = "https://oidc-discovery.prod.example.com"
    default_jwt_svid_ttl = "5m"
}

Two non-obvious things here:

  1. jwt_issuer MUST equal the OIDC Discovery Provider's public URL — that exact string is what you register with Anthropic. Mismatch = 400 invalid_grant. This is the #1 cause of failed setups.
  2. default_jwt_svid_ttl ≤ 1 hour. Anthropic's token-exchange endpoint rejects identity tokens with longer lifetimes. SPIRE's default is fine.

OIDC Discovery Provider config:

domains = ["oidc-discovery.prod.example.com"]

server_api {
    address = "unix:///run/spire/sockets/private/api.sock"
}

acme {
    email        = "..."
    tos_accepted = true
}

Workload registration entry:

spire-server entry create \
    -spiffeID spiffe://prod.example.com/ns/inference/sa/worker \
    -parentID spiffe://prod.example.com/spire/agent/k8s_psat/prod-cluster/NODE_UID \
    -selector k8s:ns:inference \
    -selector k8s:sa:worker

For cluster-wide registration, parent to a node alias instead of a single agent ID — otherwise you're pinned to one node.

spiffe-helper sidecar config:

agent_address = "/run/spire/sockets/agent.sock"
cert_dir      = "/var/run/secrets/anthropic.com"
daemon_mode   = true

jwt_svids = [{
    jwt_audience       = "https://api.anthropic.com"
    jwt_svid_file_name = "token"
}]

Anthropic side

Federation issuer:

{
  "name": "spire-prod",
  "issuer_url": "https://oidc-discovery.prod.example.com",
  "jwks_source": "discovery"
}

Federation rule:

{
  "name": "spire-inference-worker",
  "issuer_id": "fdis_...",
  "match": {
    "subject_prefix": "spiffe://prod.example.com/ns/inference/sa/worker",
    "audience": "https://api.anthropic.com"
  },
  "target": {
    "type": "service_account",
    "service_account_id": "svac_..."
  },
  "workspace_id": "wrkspc_...",
  "oauth_scope": "workspace:developer",
  "token_lifetime_seconds": 600
}

Kubernetes deployment — the volume detail nobody mentions

This is the one operational detail people miss:

volumes:
  - name: anthropic-token
    emptyDir:
      medium: Memory   # ← THIS LINE

Use a memory-backed emptyDir shared between spiffe-helper and the application container. The bearer JWT-SVID never touches the node's disk. Same pattern as Vault Agent token sinks. Same reason: bearer tokens on persistent storage are a postmortem waiting to happen.

Validation before wiring up the SDK

Always validate the JWT-SVID claims before you trust your federation rule:

spire-agent api fetch jwt \
    -audience https://api.anthropic.com \
    -socketPath /run/spire/sockets/agent.sock \
  | awk '/^[[:space:]]*eyJ/{print $1; exit}' \
  | jq -rR 'split(".")[1] | gsub("-";"+") | gsub("_";"/") | @base64d | fromjson'

Check:

  • iss matches the OIDC Discovery Provider URL you registered
  • sub is the workload's SPIFFE ID
  • aud contains https://api.anthropic.com

If any of those don't match what your federation rule expects, the exchange returns 400 invalid_grant with no useful diagnostic on the client side. Validate the claims first.
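If you'd rather run the same three checks from inside the application container, where jq may not be installed, a Python equivalent is short. This sketch decodes the payload without verifying the signature, so it is for debugging only; the helper names are hypothetical.

```python
import base64
import json
from typing import List


def decode_jwt_payload(token: str) -> dict:
    """Decode the payload segment WITHOUT verifying the signature (debug only)."""
    payload_b64 = token.split(".")[1]
    # base64url omits padding; restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def claim_problems(claims: dict, issuer: str, subject: str, audience: str) -> List[str]:
    """List which of iss/sub/aud differ from what the federation rule expects."""
    problems = []
    if claims.get("iss") != issuer:
        problems.append("iss mismatch")
    if claims.get("sub") != subject:
        problems.append("sub mismatch")
    aud = claims.get("aud", [])
    if audience not in ([aud] if isinstance(aud, str) else aud):
        problems.append("aud mismatch")
    return problems
```

Read the token from the file spiffe-helper writes, pass in the issuer URL, SPIFFE ID, and audience you registered, and an empty list means the claims line up.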

Three SPIFFE gotchas

1. Always set the audience matcher on the rule. Without it, the rule accepts JWT-SVIDs minted for any relying party. If the same workload also calls some other SaaS via SPIFFE, a token meant for that SaaS could exchange for an Anthropic token. Always pin audience.

2. Inline JWKS = you own rotation. SPIRE rotates signing keys frequently. If you registered the issuer with inline JWKS (air-gapped clusters), you must add new keys before workloads present them, and remove superseded keys after tokens signed with them expire. Stale keys in inline JWKS remain trusted indefinitely.

3. One issuer per trust domain. Each SPIRE trust domain has its own signing keys and OIDC Discovery Provider. Register each as a separate Anthropic federation issuer.


Mapping onto the IETF AIMS stack

This is where it gets interesting for anyone tracking the agentic identity standards work.

In draft-klrc-aiagent-auth, AIMS is 8 layers:

Identifiers → Credentials → Attestation → Provisioning → Authentication → Authorization → Observability → Policy

Anthropic WIF doesn't implement all eight. But it implements the bottom five correctly, and that's exactly the foundation the upper layers need.

| AIMS Layer     | What it requires                          | Anthropic WIF |
|----------------|-------------------------------------------|---------------|
| Identifiers    | Cryptographic, runtime-issued             | SPIFFE ID via sub claim, or IdP-native subject |
| Credentials    | Short-lived, attested, no secrets at rest | OIDC JWT exchanged for OAuth access token. Zero static secrets. |
| Attestation    | Identity bound to what the workload is    | Inherited from upstream IdP (SPIRE selectors, IRSA pod identity, GHA repo+workflow claims) |
| Provisioning   | Federated trust, declarative              | Console-configured issuer + rule. CEL for complex policy. |
| Authentication | Standards-based, verifiable               | RFC 7523 JWT-bearer. JWKS validated. iss/aud/exp/nbf/iat enforced. |
| Authorization  | Scoped, least-privilege                   | workspace:developer scope, workspace-bound, rate-limited |
| Observability  | Audit chain                               | Service account attribution per request |
| Policy         | Centralized enforcement                   | CEL match expressions, per-rule scoping |

What's notably not here: the upper-layer agentic authorization primitives — OAuth Token Exchange with act claims, Transaction Tokens, Rich Authorization Requests (RAR), CAEP for real-time revocation.

That's not a criticism. Those belong at your gateway, not at the LLM provider's auth endpoint.

Which brings me to the most important point in this whole post.


The trap: this is workload auth, NOT user delegation

Here's the single most consequential thing to understand about Anthropic WIF, and it's hiding in plain sight: the caller is treated as a workload. There are no user delegation semantics here.

This isn't OAuth's authorization code flow. There's no user identity riding through the exchange. The federated token represents the workload that called Anthropic — not the user that asked the agent to do something. And in any real agentic deployment, a user is almost always at the top of the call chain.

That gap is where confused deputy bugs live.

Let me make this concrete.

Anthropic WIF answers: "Is this workload allowed to call Claude on behalf of this Anthropic service account, with these rate limits, in this workspace?"

Anthropic WIF does NOT answer: "Is Alice allowed to ask Claude to summarize Bob's salary data?"

There is no act claim. No user identity propagated to Anthropic. From Anthropic's perspective, every request from your gateway looks like the same service account. The user is invisible to them — by design, because that's how a workload-to-API trust boundary should work.

This is the classic confused deputy setup:

  1. Your AgentGateway holds workload credentials (now: WIF tokens) that grant access to Claude.
  2. Users delegate tasks to agents. Agents call through the gateway.
  3. If the gateway doesn't enforce user authorization at its own boundary, an authenticated agent acting for a low-privilege user can ask the LLM to operate on data that user shouldn't see.
  4. The LLM has no way to know.

The upstream WIF token only proves "the gateway said this is a legit workload call." It says nothing about which user triggered the call, what they're allowed to do, or whether the prompt content respects their authorization scope.
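To show what "enforce user authorization at its own boundary" could mean in practice, here is a deliberately tiny sketch of a gateway-side check that runs before any call to Claude. Everything in it is hypothetical: the AgentRequest shape, the in-memory ACL, and the function name. A real gateway would derive the user from a validated token and consult a policy engine.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Set


@dataclass(frozen=True)
class AgentRequest:
    """Hypothetical request shape seen by the gateway."""
    user_id: str                   # end user the agent is acting for
    resource_ids: FrozenSet[str]   # documents the prompt will include


class Forbidden(Exception):
    """Raised when the user may not read a referenced resource."""


def authorize_for_user(req: AgentRequest, acl: Dict[str, Set[str]]) -> None:
    """Reject the call unless the user may read every referenced resource.

    `acl` maps user_id -> readable resource IDs (illustrative only).
    """
    allowed = acl.get(req.user_id, set())
    denied = req.resource_ids - allowed
    if denied:
        raise Forbidden(f"{req.user_id} may not read {sorted(denied)}")
```

The point is the ordering: this check happens at the gateway, on the user's identity, before the WIF-authenticated request ever leaves for Anthropic.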

The layered model that actually works

┌───────────────────────────────────────────────────────────┐
│  USER                                                     │
│  ↓ (OIDC auth code flow, MFA, IdP session)                │
│  AGENT-FACING APP                                         │
│  ↓ (OAuth Token Exchange — adds `act` claim)              │
│  AGENT  ←── Transaction Token (RAR-scoped: "summarize     │
│  ↓                              doc X, max 1 LLM call")   │
│  AGENTGATEWAY  ←── enforces user policy + scope intersect │
│  ↓ (Anthropic WIF: SPIFFE JWT-SVID → sk-ant-oat01-...)    │
│  CLAUDE API                                               │
└───────────────────────────────────────────────────────────┘

Read top-to-bottom: user identity rides through OAuth Token Exchange with act claims, Transaction Tokens scope the specific operation, the gateway enforces user-level authorization, and Anthropic WIF handles the workload-to-LLM hop. Each layer answers a different question. None are interchangeable.

If you skip the user layer, WIF is still a massive upgrade over API keys — you've eliminated stored secrets, gained short-lived tokens, gained per-workload attribution. But you have not solved agentic identity. You've solved infrastructure identity.


The migration trap that will bite you

Buried in the docs:

ANTHROPIC_API_KEY sits above the federation tiers, so a leftover key in the environment silently shadows federation.

Translation: you can configure WIF perfectly, deploy, smoke-test, ship — and still be using the old API key. Because credential precedence puts ANTHROPIC_API_KEY above the federation env vars, the federation code path simply never runs.
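A simplified model of that precedence, useful as a CI guard or a startup assertion. It only encodes the one ordering fact quoted above (API key before federation). The function name and the choice of ANTHROPIC_FEDERATION_RULE_ID as the federation signal are my assumptions, and the SDK's real resolution chain may have more tiers.

```python
from typing import Mapping


def selected_credential_source(env: Mapping[str, str]) -> str:
    """Model the precedence described above: API key first, then federation."""
    if env.get("ANTHROPIC_API_KEY"):
        return "api_key"  # a leftover key shadows federation
    if env.get("ANTHROPIC_FEDERATION_RULE_ID"):
        return "federation"
    return "none"
```

Pass os.environ in a real check. If it returns "api_key" in an environment you believe is migrated, federation is not what's authenticating your requests.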

The migration sequence that actually works:

1. Stand up federation in parallel. Leave ANTHROPIC_API_KEY in place.
2. Run `ant auth status` from inside the workload.
   At this stage: the API key wins. That's expected.
3. Unset ANTHROPIC_API_KEY EVERYWHERE:
     - CI secrets
     - Container env (Deployment manifests, Helm values)
     - Shell profiles
     - Any sidecar that injects it
4. Re-run `ant auth status`. Confirm the federation source is selected.
5. ONLY NOW: revoke the API key in the Console.

Step 3 is the high-risk step: any place that still injects the key silently keeps the old path alive, so audit every injection point. I'd add: instrument your gateway logs to alert on requests carrying sk-ant-api03-... prefixes after cutover. If that prefix shows up after step 5, you have a stowaway. Could be a CronJob, a CI workflow, a debug pod, a contractor's laptop.
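A sketch of that alert as a log scan. The log format is an assumption (one request per line with the credential prefix recorded somewhere in it), and find_static_key_lines is a hypothetical helper. Ideally your gateway logs only the prefix, never the full credential.

```python
import re
from typing import Iterable, List

# Static API keys carry the sk-ant-api03- prefix; federated tokens use sk-ant-oat01-.
STATIC_KEY_PATTERN = re.compile(r"sk-ant-api03-")


def find_static_key_lines(log_lines: Iterable[str]) -> List[int]:
    """Return 1-based line numbers whose text contains a static-key prefix."""
    return [
        i for i, line in enumerate(log_lines, start=1)
        if STATIC_KEY_PATTERN.search(line)
    ]
```

Any non-empty result after cutover is worth paging on.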


What this means for platform architecture

If you're running Claude in production today, three things change:

1. The threat model shifts from "key custody" to "issuer trust"

You're no longer worried about a static key leaking from a Vault transit engine, a CI log, a Slack message, or a developer laptop. You're worried about whether your IdP is correctly attesting workload identity.

The blast radius of compromise goes from "anyone with the key can be us" to "the attacker needs to compromise our IdP and satisfy the federation rule's match conditions during a short token window".

2. Audit and attribution become per-workload by default

Service account IDs flow into Anthropic's usage and rate limit attribution. Combined with your gateway logs, you can trace a single Claude inference back to: which workspace, which service account, which workload (via the federation rule + JWT sub), which user request (via correlation ID).

That's the audit chain regulators will eventually require for AI inference in regulated industries.

3. The gateway's job gets more important, not less

WIF closes the workload-to-LLM hop. The user-to-agent and agent-to-tool hops are still yours to enforce.

AgentGateway, with SPIFFE for workload identity and OAuth Token Exchange + RAR for user delegation, is where confused-deputy attacks get prevented. WIF is necessary but not sufficient.


Closing thought

Anthropic shipped Workload Identity Federation. RFC 7523 JWT-bearer grant. First-class support for AWS, GCP, Azure, GitHub Actions, Kubernetes, SPIFFE, Okta. Service account model. Short-lived tokens. Two-tier SDK refresh. Per-workload attribution.

For platform teams running Claude in regulated environments — especially on Kubernetes with SPIRE — this is the API key killer we've been waiting for. It maps cleanly onto the bottom five layers of the IETF AIMS stack.

But it is workload identity, not user delegation. The agent calling through your gateway still needs OAuth Token Exchange with act claims for user context, Transaction Tokens for operation scoping, and gateway-level policy enforcement to prevent confused-deputy attacks.

Static credentials die at the gateway. Dynamic attested tokens live at every hop.

The shift continues:

Stop asking "does it have the right key?"
Start asking "what IS this entity, do we trust it, what do we expect from it?"




If you found this useful, follow along — the next post in this series digs into AgentGateway implementation patterns: SPIFFE attestation, WIMSE proof tokens across proxies, and OAuth Token Exchange with act claim chaining for user delegation. The piece WIF doesn't solve.
