Recently, I was building a developer-facing API and ran into a problem I couldn’t find a clean answer to anywhere. I needed to support long-running, fully automated, user-delegated access with no browser and no human in the loop, and OAuth2 had no clear answer. I landed on implementing API keys alongside OAuth2, but that decision has real implications on the authentication architecture, and I wanted to share it to hopefully save others from taking this long journey.
OAuth2 is, by most measures, the best authorization framework we have at internet scale. It standardizes how applications handle authentication across client types, enables SSO across your platform, defines how public clients should behave, and lets your teams avoid implementing their own auth logic from scratch. Compared to what came before it, it is a major step forward.
It was designed to be a framework, not a strict protocol. That flexibility was intentional, so every major identity provider could adapt it to their systems. This looseness left many things unspecified. The IETF OAuth working group has since produced more than 30 RFCs and extensions (such as PKCE, DPoP, and PAR) to fill the gaps, and the list continues to grow. One of those gaps is non-interactive user-delegated access.
There are existing options, but none quite fit.
The client credentials flow is truly headless and works well for machine-to-machine scenarios. The identity it issues, though, belongs to the application, not the user. There is no user in the picture. If you need to know which person triggered an action, client credentials cannot tell you.
The Device Authorization Grant gets you closer. A user approves access once from a browser or secondary device, and from that point forward, the client can operate headlessly using refresh tokens. For fully unattended automation, it breaks down. Token rotation, expiry, and revocation mean a human eventually has to show up again. It is headless at runtime, but not headless forever. When a rotation event is missed due to a network failure or a crashed process, the entire refresh chain is invalidated, and a human has to re-authorize. For automation that needs to run unattended indefinitely, that is the specific failure mode that kills it.
The gap is the intersection: long-running, fully automated, user-delegated access with no human in the loop and no browser. That specific combination is what OAuth2 does not answer cleanly. Emerging token exchange patterns and agent-specific delegation models are moving in this direction, but the friction for true zero-interaction automation remains.
The gap is not theoretical. Nearly every major developer-facing platform maintains long-lived, user-scoped credential paths alongside their OAuth flows. Not as legacy holdovers, but as deliberate choices for a class of access that OAuth alone does not cleanly serve.
While RFC 8693 (OAuth 2.0 Token Exchange) defines mechanics that could support exchanging external JWTs or even custom subject_token types (such as API keys) for internal tokens, production identity providers in practice still require significant custom extensions, or outright separate paths, for reliable external JWT exchange and especially for API key exchange. Major implementations do not yet deliver this use case in a plug-and-play way without either fighting upstream limitations or reintroducing complexity downstream.
The challenge is not the API key itself. It is what happens to your authentication architecture once you introduce one.
The dual authentication problem
Supporting two authentication schemes means your API server has to handle both at the point of entry: validate a signed JWT from your IDP on one path, look up an API key on another. If your architecture stops there (one API server, no internal service calls, no plans to grow), this is not your problem. Close the tab, go ship something. But if your services talk to other services, and those services need to know who is calling, keep reading, because this is where it gets interesting.
The problem is what happens next. Your API server has verified the request and knows who is making the call. Now it needs to talk to an internal service, and that service needs to know who is calling. You cannot pass the credential forward because that service would need to implement the same dual validation, which would put you back in the same problem one layer deeper. So you extract the identity and pass it along in some form. A header, a forwarded value, whatever convention your stack has settled on.
Here is where the real cost shows up. You took a signed JWT with cryptographic guarantees and a verifiable chain of trust, and reduced it to a plain string that any service in your stack could have fabricated. The signature is gone. The guarantee is gone. The internal services receiving that request no longer operate on a verified credential; they operate on trust in your infrastructure. That is a meaningful step down in your security posture, and it compounds with every service hop in your call graph.
The natural next question is whether you can sidestep the dual authentication problem by issuing a single, long-lived credential. One credential type, one validation path, no normalization required. If the problem is having two schemes, eliminate one of them. It is a reasonable thought. The issue is what happens when you need to revoke access. An API key is just a database record. Delete it or mark it invalid, and it stops working. A self-contained JWT with a long expiry is still cryptographically valid regardless of what you want. The only mitigation is a blocklist, an external store of invalidated tokens that all services have to check on every request. That works, but you have reintroduced the database lookup you were trying to avoid with JWTs, plus the operational burden of keeping that list consistent across your stack. Most IDPs also cap access token lifetimes and do not support long-lived JWTs out of the box. What looked like a simplification turns out to be a harder problem than the one you started with.
Opaque tokens are another option worth addressing directly. Some IDPs issue them by default, and on the surface, they seem to sidestep the JWT revocation problem, since the token is just a reference whose validity remains under the authorization server's control. The trade-off is that all services that receive an opaque token must call the authorization server to validate it. That is a per-request network dependency on your IDP for every internal service hop, which adds latency, creates an availability coupling you do not want inside your service mesh, and scales poorly as your call graph grows. Opaque tokens are a reasonable choice at the edge. They are a poor fit for internal service-to-service communication.
Credential normalization
The solution is normalization. Whatever credential arrives at the edge, an external access token or an API key, the gateway performs lightweight structural validation of the incoming credential: confirming the token type, format, and expiry. If that passes, it checks its cache. On a hit, it returns the cached internal token immediately. On a miss, it invokes the appropriate exchange flow on an internal OAuth2 authorization server, which handles identity resolution, token issuance, and claim normalization. Your downstream services never see the original credential. They receive a consistent internal token every time, from an issuer they trust, regardless of how the caller authenticated.
This sits in the critical path of every request, so performance is a legitimate concern. It can be addressed through caching. The internal token issued by the authorization layer is cached against the lifecycle of the incoming credential. Layer one and layer two caches for token lookups mean real work happens only on cache misses. A cache miss adds the cost of a token exchange against a co-located auth server, which is negligible for most traffic patterns. After that, the cached internal token serves subsequent requests without any additional round-trips for the duration of its TTL. This compares favorably to the opaque token model, in which every internal service hop requires a round-trip back to the authorization server.
The internal token TTL is configurable and does not have to mirror the external credential’s remaining lifetime. Shorter windows tighten the revocation exposure at the cost of more frequent exchanges. Longer windows reduce the auth server load but increase the gap between a revocation event and enforcement. This is the same sliding scale you navigate when configuring access token lifetimes on any OAuth2 server. Neither end is wrong; it is an operational choice based on your threat model and traffic patterns.
Revocation handling depends on your implementation choices. Short TTLs on the cached internal token naturally limit exposure windows to the five to fifteen minute range for high-security use cases. For faster enforcement, connect revocation events from your identity service to Redis pub/sub, key expiration notifications, or a lightweight revocation signal channel to trigger active cache invalidation. This keeps the pattern flexible: simple TTL-only for low-friction deployments, or event-driven invalidation for tighter security SLAs.
Claim enrichment is the other major benefit of this architecture. Because the authorization layer controls the internal token’s claim structure, you can normalize claim data across IDPs, add entitlements, unify user identifiers, or inject any context your internal services need. None of that is possible when credentials pass through unchanged.
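A sketch of what enrichment can look like; the claim fields, issuer URL, and lookup functions are all illustrative, not a prescribed schema:

```go
package main

// InternalClaims is a hypothetical normalized claim shape for the
// internal token.
type InternalClaims struct {
	Subject      string   `json:"sub"`          // unified internal user identifier
	Issuer       string   `json:"iss"`          // always the internal auth server
	AuthMethod   string   `json:"auth_method"`  // how the caller originally authenticated
	Entitlements []string `json:"entitlements"` // injected from internal systems
}

// normalizeClaims maps a provider-specific subject onto the internal
// shape. The lookup functions stand in for calls to your identity
// service and entitlement store.
func normalizeClaims(
	externalSub, provider, authMethod string,
	lookupUser func(sub, provider string) string,
	lookupEntitlements func(userID string) []string,
) InternalClaims {
	userID := lookupUser(externalSub, provider)
	return InternalClaims{
		Subject:      userID,
		Issuer:       "https://auth.internal.example", // illustrative issuer
		AuthMethod:   authMethod,
		Entitlements: lookupEntitlements(userID),
	}
}
```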
The deeper value is insulation. The gateway establishes a hard boundary between two trust domains: the external world with its IDPs, credential types, and lifecycle variability, and the internal world operating on a stable token format that you own and control. Onboarding a second IDP or user pool gets absorbed at that boundary. Nothing downstream needs to be updated. Your internal services evolve against a contract you define, not against the shifting surface of your external identity providers. That separation of responsibility is what makes this pattern durable as systems grow.
Deployment
To intercept traffic before it reaches your services, you need a proxy layer. Service meshes provide exactly that extension point. This pattern is an Envoy capability at its core. Any service mesh that runs Envoy as its data plane supports the same ext_authz extension point natively. The implementation here uses Istio, but Consul works the same way. If you are not running a service mesh at all, a standalone Envoy or NGINX deployment with an external auth filter works the same way.
How it works
Here is how the full flow works in practice with Istio as the proxy layer.
Istio’s ext_authz filter intercepts inbound requests at the proxy layer and delegates the auth decision to an external service before forwarding to the destination. That service can approve, deny, or modify the request, including rewriting headers.
The full flow:
1. A caller sends a request with either a bearer token or an API key.
2. Istio intercepts the request and forwards it to the token gateway via ext_authz (gRPC).
3. The gateway performs basic validation of the incoming credential, checking expiry and structure, then looks it up in cache.
4. If a cached internal token exists, the gateway returns it immediately with an Allow decision and rewrites the Authorization header.
5. On a cache miss, the gateway forwards the credential to the authorization layer for exchange, which issues a new internal token.
6. The new internal token is cached against the lifecycle of the incoming credential, and the Authorization header is rewritten.
7. Istio forwards the modified request to the destination service.
8. The destination service validates the internal token against the authorization layer's JWKS endpoint.
If basic validation fails at step 3, the gateway returns a Deny response. The request never reaches the destination.
The two credential paths
The gateway handles two credential types:
- `Authorization: Bearer <token>`, where the token is a JWT issued by an external IDP (Okta, Auth0, Keycloak, or any OIDC-compliant provider)
- `Authorization: Token <api-key>`, where the value is an API key issued by the gateway itself, permanently tied to a user identity at creation time (`X-API-Key` is a common alternative; the header scheme is configurable)
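Distinguishing the two paths is a small amount of parsing at the front of the gateway; a sketch in Go, with scheme names mirroring the headers above:

```go
package main

import (
	"errors"
	"strings"
)

type credKind int

const (
	kindExternalJWT credKind = iota
	kindAPIKey
)

// classify splits the Authorization header into scheme and value
// and maps each scheme onto the matching exchange flow.
func classify(authorization string) (credKind, string, error) {
	scheme, value, ok := strings.Cut(authorization, " ")
	if !ok || value == "" {
		return 0, "", errors.New("malformed Authorization header")
	}
	switch scheme {
	case "Bearer":
		return kindExternalJWT, value, nil
	case "Token":
		return kindAPIKey, value, nil
	default:
		return 0, "", errors.New("unsupported scheme: " + scheme)
	}
}
```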
Both arrive with different shapes and different trust models. Both leave as the same thing: a consistent internal token issued by the internal OAuth2 authorization server, validated by your services against its JWKS endpoint. How each credential type gets resolved is handled by the auth server through two dedicated exchange flows. Those flows are the subject of part two.
API key issuance and the user-identity binding are handled by an internal identity service, which also provides the validation endpoint that the gateway calls at exchange time. That service is covered in part two.
What your services see
A bearer token. From an internal issuer they trust. With claims they can use for their own authorization decisions.
That is the whole contract. Whether the original caller authenticated via PKCE in a browser, via a CLI using stored credentials, via an API key in a script, or via an access token from a completely different IDP, every service in your stack sees the same thing every time.
One token format. One trust relationship. The credential complexity lives at the boundary, and nowhere else.
A reference implementation is available at github.com/mberwanger/token-gateway. It covers the ext_authz integration and the credential normalization layer described here. In that implementation, the gateway delegates token issuance to a purpose-built OAuth2 authorization server sitting behind it. That server is the subject of part two, where we will walk through building it with two custom grant types: one for external token exchange and one for API key exchange.
