<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Martin H Berwanger</title>
    <description>The latest articles on DEV Community by Martin H Berwanger (@mberwanger).</description>
    <link>https://dev.to/mberwanger</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3783256%2F08ee6cc7-a208-48af-8aa0-c2aa0c6ecd6b.jpg</url>
      <title>DEV Community: Martin H Berwanger</title>
      <link>https://dev.to/mberwanger</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/mberwanger"/>
    <language>en</language>
    <item>
      <title>Unified Authentication for OAuth2 and API Keys via Edge Token Normalization</title>
      <dc:creator>Martin H Berwanger</dc:creator>
      <pubDate>Mon, 16 Mar 2026 13:46:11 +0000</pubDate>
      <link>https://dev.to/mberwanger/unified-authentication-for-oauth2-and-api-keys-via-edge-token-normalization-43cl</link>
      <guid>https://dev.to/mberwanger/unified-authentication-for-oauth2-and-api-keys-via-edge-token-normalization-43cl</guid>
      <description>&lt;p&gt;Recently, I was building a developer-facing API and ran into a problem I couldn’t find a clean answer to anywhere. I needed to support long-running, fully automated, user-delegated access with no browser and no human in the loop, and OAuth2 had no clear answer. I landed on implementing API keys alongside OAuth2, but that decision has real implications on the authentication architecture, and I wanted to share it to hopefully save others from taking this long journey.&lt;/p&gt;

&lt;p&gt;OAuth2 is, by most measures, the best authorization framework we have at internet scale. It standardizes how applications handle authentication across client types, enables SSO across your platform, defines how public clients should behave, and lets your teams avoid implementing their own auth logic from scratch. Compared to what came before it, it is a major step forward.&lt;/p&gt;

&lt;p&gt;It was designed to be a framework, not a strict protocol. That flexibility was intentional, so every major identity provider could adapt it to their systems. This looseness left many things unspecified. The IETF OAuth working group has since produced more than 30 RFCs and extensions (PKCE, DPoP, and PAR among them) to fill the gaps, and the list continues to grow. One of those gaps is non-interactive user-delegated access.&lt;/p&gt;

&lt;p&gt;There are existing options, but none quite fit.&lt;/p&gt;

&lt;p&gt;The client credentials flow is truly headless and works well for machine-to-machine scenarios. The identity it issues, though, belongs to the application, not the user. There is no user in the picture. If you need to know which person triggered an action, client credentials cannot tell you.&lt;/p&gt;

&lt;p&gt;The Device Authorization Grant gets you closer. A user approves access once from a browser or secondary device, and from that point forward the client can operate headlessly using refresh tokens. It is headless at runtime, but not headless forever. Token rotation, expiry, and revocation all assume a human can eventually show up again: miss a rotation event because of a network failure or a crashed process, and the entire refresh chain is invalidated until someone re-authorizes. For automation that needs to run unattended indefinitely, that is the specific failure mode that kills it.&lt;/p&gt;

&lt;p&gt;The gap is the intersection: long-running, fully automated, user-delegated access with no human in the loop and no browser. That specific combination is what OAuth2 does not answer cleanly. Emerging token exchange patterns and agent-specific delegation models are moving in this direction, but the friction for true zero-interaction automation remains.&lt;/p&gt;

&lt;p&gt;The gap is not theoretical. Nearly every major developer-facing platform maintains long-lived, user-scoped credential paths alongside their OAuth flows. Not as legacy holdovers, but as deliberate choices for a class of access that OAuth alone does not cleanly serve.&lt;/p&gt;

&lt;p&gt;While RFC 8693 (OAuth 2.0 Token Exchange) defines mechanics that could support exchanging external JWTs or even custom subject_token types (such as API keys) for internal tokens, production identity providers in practice still require significant custom extensions, or outright separate paths, for reliable external JWT exchange and especially for API key exchange. Major implementations do not yet deliver this use case in a plug-and-play way without either fighting upstream limitations or reintroducing complexity downstream.&lt;/p&gt;

&lt;p&gt;The challenge is not the API key itself. It is what happens to your authentication architecture once you introduce one.&lt;/p&gt;

&lt;h2&gt;
  
  
  The dual authentication problem
&lt;/h2&gt;

&lt;p&gt;Two authentication schemes mean your API server has to handle both at the point of entry: validate a signed JWT from your IDP on one path, look up an API key on another. If your architecture stops there (one API server, no internal service calls, no plans to grow), this is not your problem. Close the tab, go ship something. But if your services talk to other services, and those services need to know who is calling, keep reading, because this is where it gets interesting.&lt;/p&gt;

&lt;p&gt;The problem is what happens next. Your API server has verified the request and knows who is making the call. Now it needs to talk to an internal service, and that service needs to know who is calling. You cannot pass the credential forward because that service would need to implement the same dual validation, which would put you back in the same problem one layer deeper. So you extract the identity and pass it along in some form. A header, a forwarded value, whatever convention your stack has settled on.&lt;/p&gt;

&lt;p&gt;Here is where the real cost shows up. You took a signed JWT with cryptographic guarantees and a verifiable chain of trust, and reduced it to a plain string that any service in your stack could have fabricated. The signature is gone. The guarantee is gone. The internal services receiving that request no longer operate on a verified credential; they operate on trust in your infrastructure. That is a meaningful step down in your security posture, and it compounds with every service hop in your call graph.&lt;/p&gt;

&lt;p&gt;The natural next question is whether you can sidestep the dual authentication problem by issuing a single, long-lived credential. One credential type, one validation path, no normalization required. If the problem is having two schemes, eliminate one of them. It is a reasonable thought. The issue is what happens when you need to revoke access. An API key is just a database record. Delete it or mark it invalid, and it stops working. A self-contained JWT with a long expiry is still cryptographically valid regardless of what you want. The only mitigation is a blocklist, an external store of invalidated tokens that all services have to check on every request. That works, but you have reintroduced the database lookup you were trying to avoid with JWTs, plus the operational burden of keeping that list consistent across your stack. Most IDPs also cap access token lifetimes and do not support long-lived JWTs out of the box. What looked like a simplification turns out to be a harder problem than the one you started with.&lt;/p&gt;

&lt;p&gt;Opaque tokens are another option worth addressing directly. Some IDPs issue them by default, and on the surface, they seem to sidestep the JWT revocation problem, since the token is just an opaque reference and the authoritative state stays under the authorization server’s control. The trade-off is that every service that receives an opaque token must call the authorization server to validate it. That is a per-request network dependency on your IDP for every internal service hop, which adds latency, creates an availability coupling you do not want inside your service mesh, and scales poorly as your call graph grows. Opaque tokens are a reasonable choice at the edge. They are a poor fit for internal service-to-service communication.&lt;/p&gt;

&lt;h2&gt;
  
  
  Credential normalization
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnb7vsx5slnabktb28c5y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnb7vsx5slnabktb28c5y.png" alt="Flow Diagram" width="800" height="556"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The solution is normalization. Whatever credential arrives at the edge, an external access token or an API key, the gateway performs lightweight structural validation of the incoming credential: confirming the token type, format, and expiry. If that passes, it checks its cache. On a hit, it returns the cached internal token immediately. On a miss, it invokes the appropriate exchange flow on an internal OAuth2 authorization server, which handles identity resolution, token issuance, and claim normalization. Your downstream services never see the original credential. They receive a consistent internal token every time, from an issuer they trust, regardless of how the caller authenticated.&lt;/p&gt;
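&lt;p&gt;To make that first step concrete, here is a minimal Python sketch of the structural check (function name and details are illustrative, not the reference implementation’s API): it confirms the three-part JWT shape and rejects expired credentials, leaving signature verification to the exchange flow.&lt;/p&gt;

```python
import base64
import json
import time


def structural_check(jwt_token):
    """Lightweight structural validation: shape and expiry only.

    No signature verification happens here; the internal authorization
    server performs full validation during the token exchange.
    (Illustrative sketch with hypothetical names.)
    """
    parts = jwt_token.split(".")
    if len(parts) != 3:
        return False
    try:
        # Restore the base64url padding that JWTs strip, then decode claims.
        payload_b64 = parts[1] + "=" * (-len(parts[1]) % 4)
        claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    except ValueError:
        return False
    # Reject already-expired credentials before doing any real work.
    return claims.get("exp", 0) > time.time()
```

&lt;p&gt;An API key would get a different structural check (prefix and length), but the principle is the same: cheap rejection at the edge, full validation during exchange.&lt;/p&gt;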

&lt;p&gt;This sits in the critical path of every request, so performance is a legitimate concern. It can be addressed through caching. The internal token issued by the authorization layer is cached against the lifecycle of the incoming credential. Layer one and layer two caches for token lookups mean real work happens only on cache misses. A cache miss adds the cost of a token exchange against a co-located auth server, which is negligible for most traffic patterns. After that, the cached internal token serves subsequent requests without any additional round-trips for the duration of its TTL. This compares favorably to the opaque token model, in which every internal service hop requires a round-trip back to the authorization server.&lt;/p&gt;
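&lt;p&gt;The caching rule can be expressed as a small helper (hypothetical, not part of the reference implementation): the cached internal token lives for the configured ceiling, capped by however long the external credential itself remains valid, so a cached mapping never outlives the credential it represents.&lt;/p&gt;

```python
def cached_token_ttl(now, external_exp, max_internal_ttl=300):
    """TTL for a cached internal token, in seconds.

    max_internal_ttl is the configured ceiling that bounds revocation
    exposure; the external credential's remaining lifetime bounds the
    cache entry so it never outlives the credential that produced it.
    (Hypothetical helper for illustration.)
    """
    remaining = max(0, external_exp - now)
    return min(max_internal_ttl, remaining)
```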

&lt;p&gt;The internal token TTL is configurable and does not have to mirror the external credential’s remaining lifetime. Shorter windows tighten the revocation exposure at the cost of more frequent exchanges. Longer windows reduce the auth server load but increase the gap between a revocation event and enforcement. This is the same sliding scale you navigate when configuring access token lifetimes on any OAuth2 server. Neither end is wrong; it is an operational choice based on your threat model and traffic patterns.&lt;/p&gt;

&lt;p&gt;Revocation handling depends on your implementation choices. Short TTLs on the cached internal token naturally limit exposure windows to the five to fifteen minute range for high-security use cases. For faster enforcement, connect revocation events from your identity service to Redis pub/sub, key expiration notifications, or a lightweight revocation signal channel to trigger active cache invalidation. This keeps the pattern flexible: simple TTL-only for low-friction deployments, or event-driven invalidation for tighter security SLAs.&lt;/p&gt;
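&lt;p&gt;A minimal sketch of what event-driven invalidation can look like, assuming an in-memory cache and a revocation signal that carries the affected subject (in production the &lt;code&gt;revoke&lt;/code&gt; call would be driven by, say, a Redis pub/sub subscriber; class and method names are invented for illustration):&lt;/p&gt;

```python
import threading


class TokenCache:
    """Cache of internal tokens keyed by the inbound credential.

    TTL expiry (handled elsewhere) bounds worst-case exposure; revoke()
    gives immediate enforcement when a revocation event arrives.
    (Illustrative sketch, not the reference implementation.)
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._by_credential = {}  # credential -> (subject, internal_token)
        self._by_subject = {}     # subject -> set of credentials

    def put(self, credential, subject, internal_token):
        with self._lock:
            self._by_credential[credential] = (subject, internal_token)
            self._by_subject.setdefault(subject, set()).add(credential)

    def get(self, credential):
        with self._lock:
            entry = self._by_credential.get(credential)
            return entry[1] if entry else None

    def revoke(self, subject):
        # Drop every cached token for the revoked subject at once.
        with self._lock:
            for credential in self._by_subject.pop(subject, set()):
                self._by_credential.pop(credential, None)
```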

&lt;p&gt;Claim enrichment is the other major benefit of this architecture. Because the authorization layer controls the internal token’s claim structure, you can normalize claim data across IDPs, add entitlements, unify user identifiers, or inject any context your internal services need. None of that is possible when credentials pass through unchanged.&lt;/p&gt;
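&lt;p&gt;A sketch of what that normalization step can look like. The output claim names (&lt;code&gt;internal_sub&lt;/code&gt;, &lt;code&gt;src_iss&lt;/code&gt;, &lt;code&gt;ent&lt;/code&gt;) are invented for illustration; the point is that the authorization layer owns the output schema, so per-IDP quirks stop at the boundary.&lt;/p&gt;

```python
def normalize_claims(idp_claims, issuer, entitlements):
    """Map heterogeneous IDP claims onto the internal token's claim shape.

    Field names are illustrative only; each deployment defines its own
    internal contract.
    """
    # Different IDPs expose the stable user identifier under different claims.
    subject = (idp_claims.get("sub")
               or idp_claims.get("oid")       # e.g. some IDPs use an object id
               or idp_claims.get("user_id"))
    return {
        "internal_sub": f"user:{subject}",
        "src_iss": issuer,            # provenance: which IDP authenticated
        "ent": sorted(entitlements),  # entitlements enriched by internal lookup
    }
```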

&lt;p&gt;The deeper value is insulation. The gateway establishes a hard boundary between two trust domains: the external world with its IDPs, credential types, and lifecycle variability, and the internal world operating on a stable token format that you own and control. Onboarding a second IDP or user pool gets absorbed at that boundary. Nothing downstream needs to be updated. Your internal services evolve against a contract you define, not against the shifting surface of your external identity providers. That separation of responsibility is what makes this pattern durable as systems grow.&lt;/p&gt;

&lt;h2&gt;
  
  
  Deployment
&lt;/h2&gt;

&lt;p&gt;To intercept traffic before it reaches your services, you need a proxy layer. Service meshes provide exactly that extension point. This pattern is an Envoy capability at its core. Any service mesh that runs Envoy as its data plane supports the same ext_authz extension point natively. The implementation here uses Istio, but Consul is another example. If you are not running a service mesh at all, a standalone Envoy or NGINX deployment with an external auth filter works the same way.&lt;/p&gt;
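&lt;p&gt;As a sketch of the Istio wiring (resource names, namespaces, and the service address are placeholders; consult the Istio external authorization docs for your version): register the gateway as an &lt;code&gt;envoyExtAuthzGrpc&lt;/code&gt; extension provider in the mesh config, then attach it with a &lt;code&gt;CUSTOM&lt;/code&gt; AuthorizationPolicy.&lt;/p&gt;

```yaml
# Mesh config excerpt: declare the token gateway as an ext_authz provider.
# (Service name and port are placeholders.)
meshConfig:
  extensionProviders:
  - name: token-gateway
    envoyExtAuthzGrpc:
      service: token-gateway.auth.svc.cluster.local
      port: 9000
---
# Route inbound requests through the provider before they reach workloads.
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: require-token-gateway
  namespace: istio-system   # root namespace applies it mesh-wide
spec:
  action: CUSTOM
  provider:
    name: token-gateway
  rules:
  - {}   # an empty rule matches every request
```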

&lt;h2&gt;
  
  
  How it works
&lt;/h2&gt;

&lt;p&gt;Here is how the full flow works in practice with Istio as the proxy layer.&lt;/p&gt;

&lt;p&gt;Istio’s ext_authz filter intercepts inbound requests at the proxy layer and delegates the auth decision to an external service before forwarding to the destination. That service can approve, deny, or modify the request, including rewriting headers.&lt;/p&gt;

&lt;p&gt;The full flow:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A caller sends a request with either a bearer token or an API key.&lt;/li&gt;
&lt;li&gt;Istio intercepts the request and forwards it to the token gateway via ext_authz (gRPC).&lt;/li&gt;
&lt;li&gt;The gateway performs basic validation of the incoming credential, checking expiry and structure, then looks it up in cache.&lt;/li&gt;
&lt;li&gt;If a cached internal token exists, the gateway returns it immediately with an Allow decision and rewrites the Authorization header.&lt;/li&gt;
&lt;li&gt;On a cache miss, the gateway forwards the credential to the authorization layer for exchange, which issues a new internal token.&lt;/li&gt;
&lt;li&gt;The new internal token is cached against the lifecycle of the incoming credential, and the Authorization header is rewritten.&lt;/li&gt;
&lt;li&gt;Istio forwards the modified request to the destination service.&lt;/li&gt;
&lt;li&gt;The destination service validates the internal token against the authorization layer’s JWKS endpoint.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If basic validation fails at step 3, the gateway returns a Deny response. The request never reaches the destination.&lt;/p&gt;
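&lt;p&gt;Stripped of the gRPC plumbing, the per-request decision reduces to something like the following sketch (the real ext_authz contract is a CheckRequest/CheckResponse pair; the function shape and names here are illustrative):&lt;/p&gt;

```python
def authorize(headers, cache, exchange):
    """Gateway decision for one inbound request (ext_authz semantics).

    Returns (allowed, headers_to_set). `cache` maps the raw credential to
    a previously issued internal token; `exchange` calls the internal
    authorization server on a miss and returns None on rejection.
    (Illustrative sketch only.)
    """
    credential = headers.get("authorization")
    if not credential:
        return False, {}                 # Deny: no credential at all
    internal = cache.get(credential)
    if internal is None:
        internal = exchange(credential)  # token exchange on cache miss
        if internal is None:
            return False, {}             # Deny: exchange rejected it
        cache[credential] = internal
    # Allow, rewriting the Authorization header for the destination service.
    return True, {"authorization": f"Bearer {internal}"}
```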

&lt;h2&gt;
  
  
  The two credential paths
&lt;/h2&gt;

&lt;p&gt;The gateway handles two credential types:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;Authorization: Bearer &amp;lt;token&amp;gt;&lt;/code&gt; where the token is a JWT issued by an external IDP (Okta, Auth0, Keycloak, or any OIDC-compliant provider)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Authorization: Token &amp;lt;api-key&amp;gt;&lt;/code&gt; where the value is an API key issued by the gateway itself, permanently tied to a user identity at creation time (&lt;code&gt;X-API-Key&lt;/code&gt; is a common alternative; the header scheme is configurable)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both arrive with different shapes and different trust models. Both leave as the same thing: a consistent internal token issued by the internal OAuth2 authorization server, validated by your services against its JWKS endpoint. How each credential type gets resolved is handled by the auth server through two dedicated exchange flows. Those flows are the subject of part two.&lt;/p&gt;
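&lt;p&gt;The dispatch between those two paths is a small piece of logic; a hypothetical version (function name and return shape are invented for illustration):&lt;/p&gt;

```python
def classify_credential(authorization_header):
    """Split the Authorization header into (path, value) for the gateway.

    'bearer' routes to the external-token exchange flow, 'api_key' to the
    API-key exchange flow; anything else is rejected. (Illustrative only;
    the article notes the header scheme is configurable.)
    """
    try:
        scheme, value = authorization_header.split(" ", 1)
    except ValueError:
        return None
    scheme = scheme.lower()
    if scheme == "bearer" and value.count(".") == 2:
        return ("bearer", value)   # JWT from an external IDP
    if scheme == "token":
        return ("api_key", value)  # gateway-issued API key
    return None
```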

&lt;p&gt;API key issuance and the user-identity binding are handled by an internal identity service, which also provides the validation endpoint that the gateway calls at exchange time. That service is covered in part two.&lt;/p&gt;

&lt;h2&gt;
  
  
  What your services see
&lt;/h2&gt;

&lt;p&gt;A bearer token. From an internal issuer they trust. With claims they can use for their own authorization decisions.&lt;/p&gt;

&lt;p&gt;That is the whole contract. Whether the original caller authenticated via PKCE in a browser, via a CLI using stored credentials, via an API key in a script, or via an access token from a completely different IDP, every service in your stack sees the same thing every time.&lt;/p&gt;

&lt;p&gt;One token format. One trust relationship. The credential complexity lives at the boundary, and nowhere else.&lt;/p&gt;




&lt;p&gt;A reference implementation is available at &lt;a href="https://github.com/mberwanger/token-gateway" rel="noopener noreferrer"&gt;github.com/mberwanger/token-gateway&lt;/a&gt;. It covers the ext_authz integration and the credential normalization layer described here. In that implementation, the gateway delegates token issuance to a purpose-built OAuth2 authorization server sitting behind it. That server is the subject of part two, where we will walk through building it with two custom grant types: one for external token exchange and one for API key exchange.&lt;/p&gt;

</description>
      <category>security</category>
      <category>devops</category>
      <category>authentication</category>
      <category>api</category>
    </item>
    <item>
      <title>Code Review Is Where the Value Is Now</title>
      <dc:creator>Martin H Berwanger</dc:creator>
      <pubDate>Thu, 05 Mar 2026 18:25:59 +0000</pubDate>
      <link>https://dev.to/mberwanger/the-next-frontier-isnt-writing-code-its-reviewing-it-2d1k</link>
      <guid>https://dev.to/mberwanger/the-next-frontier-isnt-writing-code-its-reviewing-it-2d1k</guid>
      <description>&lt;p&gt;With agentic coding tools now mainstream, there is an emerging conversation about which parts of how we build software still make sense. Rethinking the old ceremonies is healthy and necessary. But the velocity these tools introduce comes with a new pressure. Writing is no longer the constraint. Reviewing is. Teams with high AI adoption are merging pull requests at a significantly higher rate, but review time is climbing with it. The choice becomes: lower the bar to keep pace, or hold the line and become the bottleneck to your own delivery. Most organizations are quietly doing one or the other without fully admitting it. Neither is a real solution.&lt;/p&gt;

&lt;p&gt;Some in the industry have landed on the conclusion: if the old gate can't keep up, remove it. Automate the review. Let AI write the code, and AI review it. Shift everything upstream to specs, and let the machines handle the rest.&lt;/p&gt;

&lt;p&gt;I think we should be more measured in our approach. The question is not whether to use these tools. The question is what we give up when we remove human judgment and taste from the loop, and whether we are being honest with ourselves about that tradeoff.&lt;/p&gt;

&lt;h2&gt;
  
  
  Specs Don't Precede Understanding
&lt;/h2&gt;

&lt;p&gt;The spec-driven development movement is gaining momentum. The idea is clean: humans write specifications, agents generate code to match them, and deterministic tests verify the output. Code review becomes unnecessary because the spec was the real artifact all along.&lt;/p&gt;

&lt;p&gt;The problem is that specs don't precede understanding. They emerge from the process of building. Birgitta Böckeler at Thoughtworks put it well in her recent evaluation of SDD tools: the best way to stay in control of what you're building is small, iterative steps, and up-front spec design runs counter to that. The problem is not the spec. The problem is the assumption that a spec, no matter how detailed, can fully describe a problem space well enough to close the loop entirely.&lt;/p&gt;

&lt;p&gt;A spec can capture what you know. But the things you don't know you don't know only emerge through the process of building and the friction of real scrutiny. No document written before the work begins can anticipate them. And when they go undiscovered, the consequences are not theoretical. They affect real users, real systems, and real businesses.&lt;/p&gt;

&lt;p&gt;Every project has unknown unknowns. That is not new. What is new is the risk of removing the mechanism by which they get discovered. In traditional development, the process of building is itself a discovery process. You encounter the edge case, you realize the spec was incomplete, you resolve it. The iteration surfaces what the spec could not anticipate. In a fully automated pipeline, that loop is closed. The agent makes decisions silently. The automated reviewer checks what it was told to check. Nobody is deeply thinking through the problem space. The unknown unknowns do not surface until something breaks.&lt;/p&gt;

&lt;p&gt;And even setting aside what the spec couldn't anticipate, there is the question of whether the agent faithfully implemented what it was given. Agentic workflows go off the rails. They make decisions that deviate from the spec in subtle ways that are hard to catch after the fact. The 2025 DORA report found that AI adoption continues to have a negative relationship with software delivery stability and that, without robust feedback loops and review processes, increased change volume leads directly to increased instability. That is not sentiment. That is production data across nearly 5,000 technology professionals.&lt;/p&gt;

&lt;p&gt;The strongest counterargument is not that humans should be removed entirely, but that their role should shift to spec authorship and system design rather than diff review. That is a reasonable position. But it assumes the gap between a spec and its implementation is small enough to ignore. The data suggests otherwise.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Reasoning Isn't in the Diff
&lt;/h2&gt;

&lt;p&gt;AI systems make decisions constantly during implementation, and they rarely surface the rationale behind those decisions. Choosing one library over another, selecting an architectural pattern, picking a dependency. These aren't syntax errors. They won't be caught by a linter or a test suite. They represent judgment calls made silently, with no explanation attached.&lt;/p&gt;

&lt;p&gt;In traditional development, you could walk over to the engineer who wrote the code and ask why they made a decision. That conversation surfaces context that never makes it into the diff. With an agentic workflow, that conversation is gone. You cannot reconstruct the reasoning after the fact in any reliable way. A reviewer isn't just checking whether the code works. They're asking whether the decisions that produced it were sound.&lt;/p&gt;

&lt;h2&gt;
  
  
  Who Owns What Ships
&lt;/h2&gt;

&lt;p&gt;There is a question that the fully automated pipeline does not answer: when something goes wrong, who is responsible?&lt;/p&gt;

&lt;p&gt;In a spec-to-agent-to-production workflow, the spec writer defined the intent. The agent produced the implementation. The automated reviewer flagged nothing. And yet something broke in a way nobody anticipated. The spec writer may not have the depth to debug it. The agent has no accountability. The pipeline has no memory of why decisions were made.&lt;/p&gt;

&lt;p&gt;Code review is not just a quality gate. It is how ownership gets established. When an engineer reviews a change and approves it, they are accepting responsibility for understanding what ships. That accountability is not a bureaucratic formality. It is what drives the careful thinking that catches problems before they reach production. Remove the reviewer and you do not just remove a checkpoint. You remove the person whose name is on it.&lt;/p&gt;

&lt;p&gt;You cannot automate away accountability and expect quality to hold.&lt;/p&gt;

&lt;h2&gt;
  
  
  Invest in the Review, Not Around It
&lt;/h2&gt;

&lt;p&gt;None of this is an argument that human code review is reliable by default. Rubber stamp approvals are real. Reviewers share mental models with authors and miss the same things. The current state of code review is not something worth defending as-is.&lt;/p&gt;

&lt;p&gt;But comparing current human review to a fully automated pipeline is the wrong frame. Code review has historically been a secondary activity, something engineers fit around their primary work without dedicated tooling or support. The argument here is not to preserve that. It is to invest in something better: agentic tooling designed to give the engineer who owns a system the context they need to review with confidence. That is not the same activity as scanning a diff for eight seconds. And it is not something you can compare fairly to what we have today.&lt;/p&gt;

&lt;p&gt;The SDLC has always been a funnel. Features get scoped, code gets written, changes get reviewed, and software gets shipped. What has changed is where the value is concentrated. Writing code used to be the hard part. It is no longer. The hard part is now the review: understanding what was built, whether it was built right, and whether it fits the system it is being added to. That is where engineering judgment lives. That is where investment should go.&lt;/p&gt;

&lt;p&gt;What that looks like in practice is agentic tooling designed to augment the human reviewer, not replace them. The industry spent years investing in making code faster to write and underinvested in making review faster and sharper. That needs to change. Tools that surface how a change fits into the broader system. Tools that expose the reasoning behind implementation decisions so a reviewer can interrogate them. Tools that help the engineer who owns that service pressure test what shipped against what was intended. Not tooling that rubber stamps a diff. Tooling that maximizes truth seeking.&lt;/p&gt;

&lt;p&gt;The engineer with accountability for a system should be equipped to ask hard questions quickly. Does this change respect the constraints of the system it touches? Does it introduce dependencies that weren't considered? Does it hold up under conditions the spec didn't anticipate? Those questions require judgment. The tooling should make that judgment faster and sharper, not obsolete.&lt;/p&gt;

&lt;p&gt;The goal is not to keep up with the machines. The goal is to ship software that works, that holds up under real conditions, and that we understand well enough to own. Our customers depend on that. Staying in the loop is not a limitation. It is how we honor that responsibility.&lt;/p&gt;




&lt;p&gt;The future of software engineering is judgment.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Sources:&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;em&gt;Birgitta Böckeler, "Understanding Spec-Driven-Development: Kiro, spec-kit, and Tessl," martinfowler.com, October 2025: &lt;a href="https://martinfowler.com/articles/exploring-gen-ai/sdd-3-tools.html" rel="noopener noreferrer"&gt;https://martinfowler.com/articles/exploring-gen-ai/sdd-3-tools.html&lt;/a&gt;&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;2025 DORA State of AI-assisted Software Development Report: &lt;a href="https://cloud.google.com/blog/products/ai-machine-learning/announcing-the-2025-dora-report" rel="noopener noreferrer"&gt;https://cloud.google.com/blog/products/ai-machine-learning/announcing-the-2025-dora-report&lt;/a&gt;&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>softwaredevelopment</category>
      <category>devops</category>
      <category>programming</category>
    </item>
  </channel>
</rss>
