gauravdagde

Posted on • Originally published at preto.ai
LLM Gateway vs LLM Proxy vs LLM Router: What's the Difference?

Everyone calls their product a "gateway" now. LiteLLM markets itself as both a proxy and a gateway. Portkey is a gateway. Helicone's docs use proxy and gateway interchangeably. There's a well-cited Medium post by Bijit Ghosh that ranks on Google for this comparison; its high-level definitions are correct, but it stops before the implementation details that tell you what to actually choose and deploy.

Here's the precise version: three different layers, concrete Go code for each, and a decision framework based on team size.

TL;DR:

  • Proxy = transport layer. Pipes requests from your app to the provider
  • Router = decision layer. Chooses which model or provider handles the request
  • Gateway = policy layer. Auth, rate limits, budget enforcement, audit trails
  • They're not separate products — they're three layers of the same stack

The Proxy: Transport Layer

A proxy intercepts your HTTP request and forwards it to the provider. Your app changes one thing: the base_url.

// Before
client := openai.NewClient(apiKey)

// After — same SDK, same code, different URL
client := openai.NewClient(
    apiKey,
    openai.WithBaseURL("https://proxy.your-company.com/v1"),
)

A minimal Go proxy handler:

func (p *Proxy) ServeHTTP(w http.ResponseWriter, r *http.Request) {
    // Swap client key → upstream provider key
    r.Header.Set("Authorization", "Bearer "+p.providerKey)

    target, _ := url.Parse("https://api.openai.com") // constant URL; parse can't fail
    r.Host = target.Host // the reverse proxy rewrites the URL, not the Host header
    proxy := httputil.NewSingleHostReverseProxy(target)
    proxy.ServeHTTP(w, r)
}

That's the core. A proxy doesn't decide anything — it doesn't choose GPT-4o over GPT-4o-mini, doesn't enforce rate limits. It pipes traffic. Everything else is built on top of this.
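Here's a sketch of how that handler might be wired up and exercised end to end. The `Proxy` struct, `NewProxy`, and the field names are my assumptions (the article only shows the handler); the fake upstream just echoes back the Authorization header so you can see the key swap happen:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
	"net/url"
)

// Proxy holds the upstream target and the provider key that replaces the
// caller's key. Names are illustrative, not from any library.
type Proxy struct {
	target      *url.URL
	providerKey string
}

func NewProxy(upstream, providerKey string) (*Proxy, error) {
	u, err := url.Parse(upstream)
	if err != nil {
		return nil, err
	}
	return &Proxy{target: u, providerKey: providerKey}, nil
}

func (p *Proxy) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	// swap the caller's key for the provider key, then pipe the request through
	r.Header.Set("Authorization", "Bearer "+p.providerKey)
	r.Host = p.target.Host
	httputil.NewSingleHostReverseProxy(p.target).ServeHTTP(w, r)
}

func main() {
	// fake upstream that echoes the Authorization header it receives
	upstream := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, r.Header.Get("Authorization"))
	}))
	defer upstream.Close()

	p, _ := NewProxy(upstream.URL, "upstream-key")
	front := httptest.NewServer(p)
	defer front.Close()

	resp, _ := http.Get(front.URL + "/v1/chat/completions")
	body, _ := io.ReadAll(resp.Body)
	fmt.Println(string(body)) // upstream sees the swapped key, not the caller's
}
```

In production you'd point the listener at a real port instead of `httptest`, but the handler is identical.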


The Router: Decision Layer

A router decides which model and provider handle each request. It returns a routing decision; the proxy executes it. The router is pure business logic — no HTTP, no transport — which makes it testable independently and swappable without touching the proxy.
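The router snippets below lean on two small types the article never shows. A minimal sketch of what they might look like (field names are my assumptions):

```go
package main

import "fmt"

// Message is one turn of a chat conversation.
type Message struct {
	Role    string
	Content string
}

// ChatRequest is a trimmed-down chat payload.
type ChatRequest struct {
	Messages []Message
}

// RoutingDecision is what the router hands back for the proxy to execute.
type RoutingDecision struct {
	Model    string
	Provider string
}

func main() {
	d := RoutingDecision{Model: "gpt-4o-mini", Provider: "openai"}
	fmt.Printf("%s/%s\n", d.Provider, d.Model) // openai/gpt-4o-mini
}
```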

Cost-based routing (most valuable):

func (r *Router) Route(req *ChatRequest) RoutingDecision {
    complexity := r.estimateComplexity(req)

    switch {
    case complexity < 0.3:
        // Short, simple: classification, extraction, booleans
        return RoutingDecision{Model: "gpt-4o-mini", Provider: "openai"}
    case complexity < 0.7:
        // Medium: summarization, structured output
        return RoutingDecision{Model: "gpt-4o", Provider: "openai"}
    default:
        // Complex: multi-step reasoning, code generation
        return RoutingDecision{Model: "claude-opus-4-6", Provider: "anthropic"}
    }
}

Failover routing:

var providerChain = []RoutingDecision{
    {Model: "gpt-4o",            Provider: "openai"},
    {Model: "claude-sonnet-4-6", Provider: "anthropic"},
    {Model: "gemini-1.5-pro",    Provider: "google"},
}

func (r *Router) RouteWithFailover(req *ChatRequest) RoutingDecision {
    for _, candidate := range providerChain {
        if r.circuit.IsAvailable(candidate.Provider) {
            return candidate
        }
    }
    // every circuit is open: return the last candidate and let the call fail visibly
    return providerChain[len(providerChain)-1]
}
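`r.circuit.IsAvailable` implies a per-provider circuit breaker, which the article also leaves out. A minimal sketch (the field names, thresholds, and cooldown are all illustrative; a production breaker would add half-open probing with a request budget):

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// circuitBreaker trips a provider open after consecutive failures and
// lets traffic back in after a cooldown.
type circuitBreaker struct {
	mu       sync.Mutex
	failures map[string]int
	openedAt map[string]time.Time
	maxFails int
	cooldown time.Duration
}

func newCircuitBreaker() *circuitBreaker {
	return &circuitBreaker{
		failures: map[string]int{},
		openedAt: map[string]time.Time{},
		maxFails: 3,
		cooldown: 30 * time.Second,
	}
}

func (c *circuitBreaker) RecordFailure(provider string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.failures[provider]++
	if c.failures[provider] >= c.maxFails {
		c.openedAt[provider] = time.Now() // trip the circuit
	}
}

func (c *circuitBreaker) RecordSuccess(provider string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.failures[provider] = 0
	delete(c.openedAt, provider)
}

func (c *circuitBreaker) IsAvailable(provider string) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	opened, tripped := c.openedAt[provider]
	if !tripped {
		return true
	}
	if time.Since(opened) > c.cooldown {
		// cooldown elapsed: close the circuit and let a request probe again
		delete(c.openedAt, provider)
		c.failures[provider] = 0
		return true
	}
	return false
}

func main() {
	cb := newCircuitBreaker()
	fmt.Println(cb.IsAvailable("openai")) // true
	for i := 0; i < 3; i++ {
		cb.RecordFailure("openai")
	}
	fmt.Println(cb.IsAvailable("openai")) // false: circuit open
}
```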

Metadata-based routing (route by feature tag your app sets):

func (r *Router) RouteByTag(req *ChatRequest, headers http.Header) RoutingDecision {
    switch headers.Get("X-Feature") {
    case "support-bot":
        return RoutingDecision{Model: "gpt-4o-mini", Provider: "openai"}
    case "code-review":
        return RoutingDecision{Model: "claude-sonnet-4-6", Provider: "anthropic"}
    default:
        return r.Route(req)
    }
}

The Gateway: Policy Layer

A gateway adds policy enforcement above the router and proxy. The defining characteristic: the gateway has a concept of identity. It knows which team or user is sending each request and enforces rules based on that identity.

In Go, a gateway is a middleware chain wrapping the proxy:

func BuildGateway(proxy http.Handler) http.Handler {
    return chain(
        AuthMiddleware,      // validate key → resolve tenant identity
        RateLimitMiddleware, // per-tenant request + token rate limits
        BudgetMiddleware,    // per-team monthly spend enforcement
        AuditMiddleware,     // log every request with identity + decision
        proxy,
    )
}

func AuthMiddleware(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        key := strings.TrimPrefix(r.Header.Get("Authorization"), "Bearer ")
        tenant, err := db.LookupTenant(key)
        if err != nil {
            http.Error(w, "unauthorized", http.StatusUnauthorized)
            return
        }
        r = r.WithContext(context.WithValue(r.Context(), tenantKey, tenant))
        r.Header.Set("Authorization", "Bearer "+tenant.ProviderKey)
        next.ServeHTTP(w, r)
    })
}

func BudgetMiddleware(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        tenant := r.Context().Value(tenantKey).(*Tenant)
        if tenant.MonthlySpend >= tenant.BudgetLimit {
            // http.Error would force text/plain, so write the JSON body by hand
            w.Header().Set("Content-Type", "application/json")
            w.WriteHeader(http.StatusTooManyRequests)
            fmt.Fprint(w, `{"error":"budget_exceeded"}`)
            return
        }
        next.ServeHTTP(w, r)
    })
}

A proxy is stateless with respect to the caller. A gateway is not.


How Products Map to These Layers

  • LiteLLM: proxy ✓ (100+ providers); cost intelligence partial
  • Helicone: proxy partial; cost intelligence basic
  • Portkey: proxy ✓ (enterprise); cost intelligence basic
  • Langfuse: proxy — (async only); cost intelligence basic
  • Preto: proxy + router + gateway ✓; cost intelligence ✓ (recommendations)

One thing to know about Langfuse: it's an async observer, not a hop in the request path. That means zero added latency, but also no caching, routing, or real-time budget enforcement. It's a deliberate architectural choice (a different layer entirely), and a fine one if post-hoc observability is all you need.


What You Actually Need

One team, one model, under $2K/month → direct SDK calls. Add a proxy for logging once you have real traffic to observe.

Multiple models, cost visibility needed → proxy + router. One URL change gives you per-request cost attribution and the ability to route simple tasks to cheaper models. Teams typically see 20–40% cost reduction within the first week of enabling model routing.

Multiple teams, budget enforcement needed → gateway. The moment two teams share an API key and neither can see what the other spends, you have a governance problem. A bill spike hits. Nobody knows which team caused it. Nobody can be held accountable.

Compliance requirements (SOC 2, HIPAA, GDPR) → gateway with audit logging and PII controls. Auditors will ask who sent what to which model; a gateway's identity-aware audit trail is how you answer.


We're building Preto.ai — all three layers (proxy + router + gateway) plus cost intelligence in one URL change. Free up to 10K requests.
