DEV Community

AI gateways: why and how

Nicolas Fränkel on June 04, 2026

Before working for 2 years on the Apache APISIX API gateway, I was mainly oblivious to API gateways. It's only by working with them that I understo...
 
FrancisTRᴅᴇᴠ (っ◔◡◔)っ

Interesting read! Well written! It took me a while since I got distracted by your Cover Image on this post lol (is that Hulk?)

No offense to the Cover Image by the way! Just something I noticed :)

Again, well done :D

xulingfeng

The governance angle is the part most people overlook with AI gateways — they focus on provider switching and cost optimization, not on whether the system can explain its decisions. Data sovereignty (Mistral vs Claude) is a very real concern too.

Curious about your testing approach: how do you verify that the gateway actually enforces the policies described, beyond just routing traffic correctly?

Nicolas Fränkel

how do you verify that the gateway actually enforces the policies described

How do you verify that any of your infrastructure components does its job?

xulingfeng

Fair point, but one thing keeps bugging me.
Traditional middleware enforces config — you test it once, you trust it. AI gateways have policies that are classification problems (PII scanning, output guardrails), not config. Model output is stochastic, so classifications have false negatives.
How do you audit trust in a probabilistic decision?
Have you seen any gateway tackle this distinction?
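
One way to make that question concrete is to treat the guardrail as a classifier and measure its miss rate on a labeled sample, instead of testing it once like config. A minimal sketch, assuming a hypothetical `naive_pii_filter` stands in for the gateway's PII check and that a small hand-labeled sample exists; neither comes from the article:

```python
import re
from typing import Callable, Iterable, Tuple

# Hypothetical stand-in for a gateway PII guardrail: it only flags email addresses.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")


def naive_pii_filter(text: str) -> bool:
    return bool(EMAIL_RE.search(text))


def false_negative_rate(
    flag: Callable[[str], bool],
    labeled: Iterable[Tuple[str, bool]],
) -> float:
    """Share of truly sensitive samples that the guardrail failed to flag."""
    positives = [text for text, is_pii in labeled if is_pii]
    if not positives:
        raise ValueError("need at least one positive sample")
    missed = sum(1 for text in positives if not flag(text))
    return missed / len(positives)


# Hand-labeled sample (made up for illustration): (text, contains_pii)
SAMPLES = [
    ("contact me at jane@example.com", True),
    ("my number is 555-0100", True),  # phone number: the email-only filter misses it
    ("the weather is nice today", False),
]

fnr = false_negative_rate(naive_pii_filter, SAMPLES)
print(f"false negative rate: {fnr:.2f}")
```

Tracking that number over time, per policy, is one possible audit trail for a probabilistic check; it does not replace testing the deterministic parts of the gateway.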

Nicolas Fränkel

I'm afraid that we disagree on your assumptions.

Gateways are gateways: glorified proxies that enforce a single entry point. Beyond that, which features you use is up to you. Observability, authorization, etc. are common features in both API and AI gateways.

The "probabilistic nature" comes from the back-end part, not from the gateway part.

xulingfeng

You're right that structurally, a gateway is a gateway — the proxy layer itself doesn't introduce probability. But the nature of the backend changes what the gateway needs to do. A traditional API gateway can assume deterministic responses and focus on routing, auth, rate limiting, and caching by exact key. An AI gateway, sitting in front of a probabilistic backend, also needs response validation, semantic caching, cost-per-token tracking, and content-based fallback. Not because the gateway became probabilistic — because the thing it's proxying is. Same architectural pattern, different operational surface area.
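
As a rough illustration of response validation combined with content-based fallback: the gateway tries backends in order and accepts the first answer that passes a content check, not just the first one that returns without error. This is a sketch under assumptions; `call_with_fallback`, the two fake backends, and the "non-empty" validity rule are all hypothetical and not taken from any specific gateway.

```python
from typing import Callable, Optional, Sequence

Backend = Callable[[str], str]


def call_with_fallback(
    prompt: str,
    backends: Sequence[Backend],
    is_valid: Callable[[str], bool],
) -> Optional[str]:
    """Return the first answer that passes validation, or None if every backend fails."""
    for backend in backends:
        try:
            answer = backend(prompt)
        except Exception:
            continue  # transport-level failure: try the next backend
        if is_valid(answer):
            return answer  # content-level check passed
    return None


def flaky_primary(prompt: str) -> str:
    return ""  # simulates an empty completion from the primary model


def secondary(prompt: str) -> str:
    return f"echo: {prompt}"


result = call_with_fallback(
    "ping",
    [flaky_primary, secondary],
    is_valid=lambda answer: bool(answer.strip()),
)
print(result)
```

A traditional gateway would have accepted the primary's empty response as a success; here the fallback is triggered by the content, which is the extra surface area described above.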

Mykola Kondratiuk

ran one of these in front of an agent fleet - rate limiting alone was worth it. policy enforcement was the bonus I did not expect to care about.

Nicolas Fränkel

Yes, gateways unlock so many benefits you don't expect.

Afterward, you can barely understand how you worked without them.

Mykola Kondratiuk

cost attribution per agent was the one i did not see coming. zero visibility into which worker was burning budget until the gateway showed up. now it is the first thing i check on a long run.
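
For readers wondering what that looks like in practice, here is a minimal sketch of per-agent cost attribution from usage records a gateway might log. The record shape, the agent names, and the prices are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable

# Hypothetical prices in USD per million tokens, not any provider's real rates.
PRICE_IN_PER_M = 3.0
PRICE_OUT_PER_M = 15.0


@dataclass
class Usage:
    agent: str
    input_tokens: int
    output_tokens: int


def cost_by_agent(records: Iterable[Usage]) -> Dict[str, float]:
    """Sum the USD cost of every logged call, grouped by the agent that made it."""
    totals: Dict[str, float] = defaultdict(float)
    for r in records:
        cost = (r.input_tokens * PRICE_IN_PER_M + r.output_tokens * PRICE_OUT_PER_M) / 1_000_000
        totals[r.agent] += cost
    return dict(totals)


# Made-up log entries for two workers.
LOG = [
    Usage("worker-a", 1_000_000, 200_000),
    Usage("worker-b", 100_000, 10_000),
]

totals = cost_by_agent(LOG)
top_spender = max(totals, key=totals.get)
print(totals, top_spender)
```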

Mustafa ERBAY

I think AI gateways will eventually become a standard layer in enterprise AI architectures.
Not because organizations need access to more models, but because they need control.
Routing.
Governance.
Observability.
Budget enforcement.
Fallback strategies.
Provider independence.
The more AI becomes infrastructure, the more these concerns start looking less like AI problems and more like classic systems engineering problems.

Nicolas Fränkel

Pretty much.

tokenmixai

Solid walkthrough — especially the Bifrost fallback config section.
The "self-hosted Go binary with built-in observability" reasoning is
exactly why teams pick it over LiteLLM-as-library.
One framing worth adding: the hosted-gateway space is now three lanes,
not two.

  1. Self-hosted (Bifrost, LiteLLM-as-library): full control, $0 gateway cost, your engineering time
  2. Hosted with markup (OpenRouter): zero ops, ~5.5% on top of provider rates
  3. Hosted at pass-through rates: TokenMix sits here. Same OpenAI-compatible endpoint pattern, ~300 models (Claude, OpenAI, Gemini, DeepSeek, Qwen, Kimi), but provider pricing without markup.

Most "AI gateway" posts default to comparing lane 1 vs lane 2. Lane 3 is the missing option for teams that don't want to run infra but also don't want a routing tax on every token.

One thing worth adding to your governance section: cost attribution across providers is the actual reason most teams adopt a gateway, not the routing or fallback features. A single bill across all providers is the unsexy but real adoption driver.

(Disclosure: I work on the TokenMix research side.)
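
To put the lane 2 markup in perspective, here is the arithmetic as a small sketch. It takes the ~5.5% figure quoted above at face value; the monthly spend is a hypothetical number, and self-hosting costs (engineering time, infrastructure) are deliberately left out.

```python
def gateway_fee(provider_spend_usd: float, markup_rate: float) -> float:
    """Fee paid to the gateway on top of what the providers themselves charge."""
    return provider_spend_usd * markup_rate


monthly_spend = 10_000.0  # hypothetical provider bill in USD
fee = gateway_fee(monthly_spend, 0.055)
print(f"monthly markup: ${fee:.2f}, yearly: ${fee * 12:.2f}")
```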
 
NOVAInetwork

The Mistral reasoning_effort enum mismatch is the part of this story I would dig into more. Gateway abstractions promise unified APIs but the actual provider schemas drift constantly, especially around newer fields like reasoning controls.

Your CLAUDE_CODE_DISABLE_THINKING=1 workaround gets you working but at the cost of the exact capability you wanted from a stronger model.

The deeper question is whose job it is to normalize. Three options I have seen attempted: the gateway translates per-provider (Bifrost would have to ship a Mistral-specific adapter for the enum), the client adapts to a lowest-common-denominator schema (which is what your env var workaround amounts to), or the provider conforms upstream (Mistral could accept "medium" and silently downcast). All three are bad in different ways and the right answer probably depends on whether the gateway is supposed to be transparent or opinionated.
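
A sketch of the first option, the gateway translating per provider, might look like the following. The provider names and their allowed values are placeholders invented for illustration; they are not taken from Mistral's, Anthropic's, or Bifrost's actual schemas.

```python
from typing import Dict, List

# Effort levels from weakest to strongest, as the client would send them.
ORDER: List[str] = ["low", "medium", "high"]

# Hypothetical per-provider vocabularies (NOT real provider schemas).
ALLOWED_EFFORT: Dict[str, List[str]] = {
    "provider_a": ["low", "medium", "high"],
    "provider_b": ["low", "high"],
}


def normalize_effort(provider: str, requested: str) -> str:
    """Pass the value through if the provider accepts it, else downcast to the next lower level."""
    allowed = ALLOWED_EFFORT[provider]
    if requested in allowed:
        return requested
    rank = ORDER.index(requested)
    # Walk down from the requested level to the strongest accepted weaker level.
    for level in reversed(ORDER[:rank]):
        if level in allowed:
            return level
    raise ValueError(f"{provider} accepts no level at or below {requested!r}")


print(normalize_effort("provider_a", "medium"))
print(normalize_effort("provider_b", "medium"))
```

The silent downcast is exactly the trade-off described above: the request succeeds, but the caller gets less reasoning than it asked for unless the gateway also reports the substitution.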

The Patriot Act framing at the top of your post is also doing more work than people might notice. Provider independence is usually pitched as cost or reliability. Data sovereignty is a different axis and I think it is going to be the actual driver for European enterprises in 2026, not the cost optimization that gets the marketing attention.

Alex Shev

AI gateways make more sense once teams move past experiments. Routing is useful, but the bigger value is policy, observability, cost control, and having one place to understand what the app is asking models to do.

Dirk Porsche

Ditch Claude Code and use OpenCode ... it's even better and you can skip all that setup that will probably not yield good results anyway. Gemini strongly advises against this, because Claude Code and the Anthropic models are deeply entwined ... for a purpose ... locking-you-in ... so liberate yourself.

Jasmine Park

Good overview. The gateway benefit that gets underrated in these discussions is cost attribution: it is the one place every model call from every framework actually passes through, so it is the natural place to stamp team, feature, and request identifiers before the provider sees the call. We tried doing that attribution at the application layer first and every framework needed its own instrumentation; at the gateway it was one change. Routing and caching get the headlines but the accounting alone justified ours.
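
For illustration, the "one change" at the gateway could be as small as a function that stamps every outgoing request. This is a sketch, not our actual setup: the header names `X-Team`, `X-Feature`, and `X-Request-Id` are hypothetical, and a real gateway would resolve team and feature from the caller's API key or route.

```python
import uuid
from typing import Dict, Optional


def stamp_headers(
    headers: Dict[str, str],
    team: str,
    feature: str,
    request_id: Optional[str] = None,
) -> Dict[str, str]:
    """Return a copy of the headers with attribution identifiers added."""
    stamped = dict(headers)  # copy so the caller's headers are not mutated
    stamped["X-Team"] = team
    stamped["X-Feature"] = feature
    # Generate a request ID only when the caller did not supply one.
    stamped["X-Request-Id"] = request_id or str(uuid.uuid4())
    return stamped


incoming = {"Content-Type": "application/json"}
outgoing = stamp_headers(incoming, team="search", feature="autocomplete", request_id="req-123")
print(outgoing)
```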