DEV Community

AI gateways: why and how

Nicolas Fränkel on June 04, 2026

Before working for 2 years on the Apache APISIX API gateway, I was mainly oblivious to API gateways. It's only by working with them that I understo...

FrancisTRᴅᴇᴠ (っ◔◡◔)っ • Jun 4

Interesting read! Well written! It took me a while since I got distracted by the cover image on this post lol (is that Hulk?)

No offense to the cover image, by the way! Just something I noticed :)

Again, well done :D

xulingfeng • Jun 5

The governance angle is the part most people overlook with AI gateways — they focus on provider switching and cost optimization, not on whether the system can explain its decisions. Data sovereignty (Mistral vs Claude) is a very real concern too.

Curious about your testing approach: how do you verify that the gateway actually enforces the policies described, beyond just routing traffic correctly?

Nicolas Fränkel • Jun 6

how do you verify that the gateway actually enforces the policies described

How do you verify that any of your infrastructure components does its job?

xulingfeng • Jun 6

Fair point, but one thing keeps bugging me.
Traditional middleware enforces config — you test it once, you trust it. AI gateways have policies that are classification problems (PII scanning, output guardrails), not config. Model output is stochastic, so classifications have false negatives.
How do you audit trust in a probabilistic decision?
Have you seen any gateway tackle this distinction?

Nicolas Fränkel • Jun 8

I'm afraid that we disagree on your assumptions.

Gateways are gateways: glorified proxies that enforce a single entry point. From there, which features you use is up to you. Observability, authorization, etc. are common features in both API and AI gateways.

The "probabilistic nature" comes from the back-end part, not from the gateway part.

xulingfeng • Jun 8

You're right that structurally, a gateway is a gateway — the proxy layer itself doesn't introduce probability. But the nature of the backend changes what the gateway needs to do. A traditional API gateway can assume deterministic responses and focus on routing, auth, rate limiting, and caching by exact key. An AI gateway, sitting in front of a probabilistic backend, also needs response validation, semantic caching, cost-per-token tracking, and content-based fallback. Not because the gateway became probabilistic — because the thing it's proxying is. Same architectural pattern, different operational surface area.

Mykola Kondratiuk • Jun 7

ran one of these in front of an agent fleet - rate limiting alone was worth it. policy enforcement was the bonus I did not expect to care about.

Nicolas Fränkel • Jun 8

Yes, gateways unlock so many benefits you don't expect.

Afterward, you can barely understand how you worked without them.

Mykola Kondratiuk • Jun 8

cost attribution per agent was the one i did not see coming. zero visibility into which worker was burning budget until the gateway showed up. now it is the first thing i check on a long run.

NOVAInetwork • Jun 9

The Mistral reasoning_effort enum mismatch is the part of this story I would dig into more. Gateway abstractions promise unified APIs but the actual provider schemas drift constantly, especially around newer fields like reasoning controls.

Your CLAUDE_CODE_DISABLE_THINKING=1 workaround gets you working but at the cost of the exact capability you wanted from a stronger model.

The deeper question is whose job it is to normalize. Three options I have seen attempted: the gateway translates per-provider (Bifrost would have to ship a Mistral-specific adapter for the enum), the client adapts to a lowest-common-denominator schema (which is what your env var workaround amounts to), or the provider conforms upstream (Mistral could accept "medium" and silently downcast). All three are bad in different ways and the right answer probably depends on whether the gateway is supposed to be transparent or opinionated.
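
The first option (the gateway translates per provider) could look roughly like the sketch below. Everything in it is hypothetical: the provider names, the sets of accepted `reasoning_effort` values, and the downcast rule are placeholders, not Bifrost's or Mistral's actual behavior. The only point is that the rewrite is explicit and leaves an audit trail instead of happening silently.

```python
from dataclasses import dataclass, field

# Hypothetical ordering of effort levels, weakest first.
EFFORT_ORDER = ["low", "medium", "high"]

# Hypothetical provider capabilities; real schemas differ and drift.
SUPPORTED_EFFORTS = {
    "provider_a": {"low", "medium", "high"},
    "provider_b": {"low", "high"},  # assumed to reject "medium"
}


@dataclass
class NormalizationResult:
    payload: dict
    audit: list[str] = field(default_factory=list)


def normalize_reasoning_effort(provider: str, payload: dict) -> NormalizationResult:
    """Return a copy of payload whose reasoning_effort the provider accepts.

    Unsupported values are downcast to the nearest lower supported level,
    or to the lowest supported level if nothing lower exists. Every
    rewrite is recorded in the audit list.
    """
    result = NormalizationResult(payload=dict(payload))
    requested = payload.get("reasoning_effort")
    if requested is None:
        return result

    supported = SUPPORTED_EFFORTS[provider]  # KeyError for unknown providers
    if requested in supported:
        return result
    if requested not in EFFORT_ORDER:
        raise ValueError(f"unknown reasoning_effort: {requested!r}")

    rank = EFFORT_ORDER.index(requested)
    lower = [level for level in EFFORT_ORDER[:rank] if level in supported]
    chosen = lower[-1] if lower else min(supported, key=EFFORT_ORDER.index)

    result.payload["reasoning_effort"] = chosen
    result.audit.append(
        f"reasoning_effort: {requested!r} -> {chosen!r} for {provider}"
    )
    return result
```

Whether a downcast like this is acceptable is exactly the transparent-versus-opinionated question: an opinionated gateway rewrites and logs, a transparent one would pass the value through and let the provider reject it.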

The Patriot Act framing at the top of your post is also doing more work than people might notice. Provider independence is usually pitched as cost or reliability. Data sovereignty is a different axis and I think it is going to be the actual driver for European enterprises in 2026, not the cost optimization that gets the marketing attention.

Nicolas Fränkel • Jun 13

Data sovereignty is a different axis and I think it is going to be the actual driver for European enterprises in 2026

Yes, the time has finally come. The fact that the US is acting more and more unfriendly is a big trigger.

NOVAInetwork • Jun 13

Agreed it's a separate axis. The technical consequence I keep hitting: sovereignty turns the gateway from a routing concern into a data-residency boundary, which forces the schema-normalization layer (the enum/param remapping we were just talking about) onto the customer's side of that boundary. Once it lives there, the gateway can't silently rewrite a request server-side; the normalization has to be auditable and local. That's a real design constraint, not a compliance checkbox, and it changes where you put the translation logic.

Mustafa ERBAY • Jun 4

I think AI gateways will eventually become a standard layer in enterprise AI architectures.
Not because organizations need access to more models, but because they need control.
Routing.
Governance.
Observability.
Budget enforcement.
Fallback strategies.
Provider independence.
The more AI becomes infrastructure, the more these concerns start looking less like AI problems and more like classic systems engineering problems.

Nicolas Fränkel • Jun 4

Pretty much.

Alex Shev • Jun 9

AI gateways make more sense once teams move past experiments. Routing is useful, but the bigger value is policy, observability, cost control, and having one place to understand what the app is asking models to do.

Jasmine Park • Jun 11

Good overview. The gateway benefit that gets underrated in these discussions is cost attribution: it is the one place every model call from every framework actually passes through, so it is the natural place to stamp team, feature, and request identifiers before the provider sees the call. We tried doing that attribution at the application layer first and every framework needed its own instrumentation; at the gateway it was one change. Routing and caching get the headlines but the accounting alone justified ours.
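
A rough sketch of what that single change might amount to, assuming the gateway sees token counts for every call: tag each call with team and feature, price it, and accumulate per tag. The model names and per-million-token prices here are made up for illustration, and `Call` and `Ledger` are hypothetical names, not part of any gateway's API.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical prices in USD per million tokens: (input, output).
PRICES = {
    "model-small": (0.50, 1.50),
    "model-large": (3.00, 15.00),
}


@dataclass(frozen=True)
class Call:
    team: str
    feature: str
    model: str
    input_tokens: int
    output_tokens: int


def call_cost(call: Call) -> float:
    """Cost of one call in USD under the assumed price table."""
    input_price, output_price = PRICES[call.model]
    total = call.input_tokens * input_price + call.output_tokens * output_price
    return total / 1_000_000


class Ledger:
    """Accumulates spend per (team, feature) as calls pass through."""

    def __init__(self) -> None:
        self._totals: dict[tuple[str, str], float] = defaultdict(float)

    def record(self, call: Call) -> None:
        self._totals[(call.team, call.feature)] += call_cost(call)

    def totals(self) -> dict[tuple[str, str], float]:
        return dict(self._totals)
```

Because the ledger keys on identifiers stamped at the gateway, it works the same way regardless of which framework issued the call.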

Nicolas Fränkel • Jun 13

Indeed. I'm trying to champion an AI Gateway for this exact reason at my current company.

Dirk Porsche • Jun 10

Ditch Claude Code and use OpenCode ... it's even better and you can skip all that setup that will probably not yield good results anyway. Gemini strongly advises against this, because Claude Code and the Anthropic models are deeply entwined ... for a purpose ... locking-you-in ... so liberate yourself.

tokenmixai • Jun 9

Solid walkthrough, especially the Bifrost fallback config section. The "self-hosted Go binary with built-in observability" reasoning is exactly why teams pick it over LiteLLM-as-library.

One framing worth adding: the hosted-gateway space is now three lanes, not two.

1. Self-hosted (Bifrost, LiteLLM-as-library): full control, $0 gateway cost, your engineering time
2. Hosted with markup (OpenRouter): zero ops, ~5.5% on top of provider rates
3. Hosted at pass-through rates: TokenMix sits here, with the same OpenAI-compatible endpoint pattern and ~300 models (Claude, OpenAI, Gemini, DeepSeek, Qwen, Kimi), but provider pricing without markup

Most "AI gateway" posts default to comparing lane 1 vs lane 2. Lane 3 is the missing option for teams that don't want to run infra but also don't want a routing tax on every token.

One thing worth adding to your governance section: cost attribution across providers is the actual reason most teams adopt a gateway, not the routing or fallback features. Single bill across all providers is the unsexy but real adoption driver.

(Disclosure: I work on the TokenMix research side.)