Most backend engineers have used all three. Few can explain the difference without stuttering.
And honestly? That's not entirely their fault. The tools themselves blur the lines — Nginx can be a proxy, a balancer, and a gateway simultaneously. Kong does all three. So does Envoy. Cloudflare will happily do all of that plus serve your coffee.
But the confusion isn't just about tooling. It's about not having a mental model for why these three things exist as separate concepts. Once you have that model, the tools stop being confusing and start being obvious.
This post gives you that model — clearly, technically, and without the hand-waving.
The Gateway Layer: What It Actually Is
Before your requests touch a single line of application code, they pass through a layer of infrastructure whose entire job is to control, inspect, protect, and route traffic. That's the Gateway Layer.
It sits in front of your backend. It is not your backend. And it has three distinct personas, each solving a different class of problem:
| Component | Core Problem It Solves |
|---|---|
| Reverse Proxy | Single-server protection and optimization |
| Load Balancer | Distributing load across multiple servers |
| API Gateway | Managing API complexity in microservices |
Think of it as a spectrum, not three separate boxes. As your system grows, you add layers. Let's walk through each one.
1. Reverse Proxy — The First Line of Defense
What it is
A reverse proxy sits in front of your origin server and intercepts all incoming requests before they ever reach your application. From the client's perspective, they're talking to one server. In reality, they're talking to a proxy.
Client → Reverse Proxy → Origin Server
What it actually does
SSL/TLS Termination
This is probably the most important thing a reverse proxy does, and the most underappreciated.
When the proxy terminates TLS, it handles the entire encrypted session (presenting the certificate, key exchange, cipher negotiation) and then forwards requests to your backend over plain HTTP on a trusted private network. Your backend never has to manage certificates or touch cryptography.
The primary benefit today is operational, not computational. TLS 1.3 and hardware-accelerated AES-NI have significantly reduced the CPU cost of cryptographic work compared to earlier TLS versions. But the centralization benefit remains compelling at any scale: one certificate to manage, one place to renew, one place to enforce cipher policies and TLS version requirements. Rotate a cert in one place and every backend is covered. Enforce TLS 1.3-only in one config and every service inherits it.
CPU offloading is still relevant under very high connection volumes, but if someone asks you why you're doing SSL termination, "centralized certificate management and policy enforcement" is the honest first answer in 2025.
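To make "one place to enforce TLS version requirements" concrete, here is a minimal Python sketch of the policy a terminating proxy would own. The function name `build_tls_context` and its optional certificate paths are assumptions for illustration; a real proxy such as Nginx expresses the same policy in its own configuration format.

```python
import ssl
from typing import Optional


def build_tls_context(cert_file: Optional[str] = None,
                      key_file: Optional[str] = None) -> ssl.SSLContext:
    """Build a server-side TLS context that refuses anything below TLS 1.3."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    # Hypothetical paths: the certificate lives on the proxy, not on each backend.
    if cert_file and key_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx


ctx = build_tls_context()
```

Every listener the proxy wraps with this context inherits the same minimum version, which is the centralization benefit described above.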
Caching
Your backend doesn't need to re-compute the same response 10,000 times. The proxy can cache common responses and serve them directly, dramatically reducing the load on your origin server. This is especially valuable for read-heavy APIs serving mostly static or semi-static data.
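As a rough illustration of proxy-side caching, the sketch below keeps responses in memory for a fixed time to live. `TTLCache`, `fake_origin`, and the injected clock are hypothetical names used only for this example; production proxies also honor `Cache-Control` headers and bound memory use, which this sketch does not.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Tiny response cache keyed by request path, with a per-entry time to live."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}

    def get_or_fetch(self, path: str, fetch: Callable[[str], str]) -> str:
        now = self.clock()
        hit = self._store.get(path)
        if hit is not None and hit[0] > now:
            return hit[1]                      # fresh entry: origin is not contacted
        body = fetch(path)                     # miss or expired: go to the origin
        self._store[path] = (now + self.ttl, body)
        return body


origin_calls = []


def fake_origin(path: str) -> str:
    origin_calls.append(path)
    return f"body for {path}"


fake_now = [0.0]                               # controllable clock for the demo
cache = TTLCache(ttl_seconds=30, clock=lambda: fake_now[0])
cache.get_or_fetch("/products", fake_origin)   # miss: hits the origin
cache.get_or_fetch("/products", fake_origin)   # hit: served from the cache
fake_now[0] = 31.0
cache.get_or_fetch("/products", fake_origin)   # expired: hits the origin again
```

Three client requests result in two origin calls here; with a realistic hit rate the origin sees only a small fraction of the traffic.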
Compression
The proxy can gzip or brotli-compress response payloads before sending them to clients, reducing bandwidth consumption without touching your application code.
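A minimal sketch of that behavior, using gzip from the Python standard library: the proxy compresses only when the client advertises support. `compress_response` is a hypothetical helper, and it deliberately ignores `q=` weights in `Accept-Encoding` to stay short.

```python
import gzip
import json
from typing import Dict, Tuple


def compress_response(body: bytes, accept_encoding: str) -> Tuple[bytes, Dict[str, str]]:
    """Gzip the body only when the client advertised gzip support."""
    offered = {token.split(";")[0].strip().lower() for token in accept_encoding.split(",")}
    if "gzip" not in offered:
        return body, {}
    return gzip.compress(body), {"Content-Encoding": "gzip"}


# A repetitive JSON payload, typical of list endpoints.
payload = json.dumps([{"id": i, "status": "shipped"} for i in range(500)]).encode("utf-8")
compressed, headers = compress_response(payload, "gzip, deflate, br")
plain, no_headers = compress_response(payload, "identity")
```

The application returns `payload` unchanged in both cases; only the proxy decides what goes over the wire.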
IP Obfuscation and Traffic Filtering
The proxy hides your server's real IP address from the public internet. Clients only ever see the proxy's IP. Beyond that, it can filter malicious traffic — blocking known bad actors, rejecting suspicious patterns, and acting as a basic security checkpoint before anything touches your application.
When is a reverse proxy sufficient?
If you're running a single server, or a small application that doesn't need horizontal scaling yet, a reverse proxy is all you need. It handles SSL, adds basic security, reduces load through caching, and insulates your origin server from the internet.
Once you need multiple servers, you graduate to the next layer.
2. Load Balancer — Scaling Horizontally
What it is
Before diving in: a load balancer and a reverse proxy are not the same abstraction, even though they're often conflated.
A reverse proxy is a traffic pattern — it mediates and hides access to backend servers. A load balancer is a function — it distributes traffic across multiple instances. These overlap in HTTP-based systems (most L7 load balancers are implemented as reverse proxies), but they're not equivalent.
AWS NLB, for example, is a load balancer that operates at L4 — it doesn't behave like a reverse proxy at all. HAProxy and Nginx do both. The distinction matters once you're choosing infrastructure rather than just reading about it.
With that said: the core problem a load balancer solves is how do you distribute traffic intelligently across multiple backend instances?
Client → Load Balancer → [Server 1]
                       → [Server 2]
                       → [Server 3]
Traffic Distribution Algorithms
Round Robin
Requests are sent to each server in sequence — 1, 2, 3, 1, 2, 3. Dead simple, works well when all your servers have similar specs and roughly equal request complexity.
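In Python, the whole algorithm fits in a few lines. The server names are placeholders.

```python
from itertools import cycle

servers = ["app-1", "app-2", "app-3"]   # hypothetical backend pool
next_server = cycle(servers)            # endless 1, 2, 3, 1, 2, 3 ... iterator

# Assign six incoming requests in order.
picks = [next(next_server) for _ in range(6)]
```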
Least Connections
Instead of cycling mechanically, the balancer sends each new request to whichever server currently has the fewest active connections. Smarter than round robin when your requests have variable processing time — a slow query shouldn't pile up on an already-struggling server.
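A sketch of the selection step, assuming the balancer tracks active connection counts in a dictionary (the `active` mapping and server names are made up for the example):

```python
from typing import Dict


def pick_least_connections(active: Dict[str, int]) -> str:
    """Return the server that currently holds the fewest active connections."""
    return min(active, key=lambda server: active[server])


active = {"app-1": 12, "app-2": 3, "app-3": 7}
chosen = pick_least_connections(active)
active[chosen] += 1    # the new request is now counted against that server
```

A real balancer also decrements the count when a connection closes, which is what lets a server stuck on slow requests fall out of favor on its own.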
Weighted Distribution
Not all servers are equal. If you have a 16-core machine and an 8-core machine in the same pool, you don't want equal traffic. Weights let you say "this server can handle twice the load — give it twice the requests." Useful when running mixed-hardware clusters or during gradual capacity upgrades.
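One common way to implement weights is a smooth weighted round robin, which spreads the heavier server's extra requests out instead of sending them in a burst. The sketch below is one possible implementation, not the algorithm of any specific product; the names `big` and `small` stand in for the 16-core and 8-core machines.

```python
from typing import Dict, List


def smooth_weighted_rr(weights: Dict[str, int], requests: int) -> List[str]:
    """Pick a server for each request so that picks are proportional to the weights."""
    current = {server: 0 for server in weights}
    total = sum(weights.values())
    picks = []
    for _ in range(requests):
        for server, weight in weights.items():
            current[server] += weight              # every server earns its weight
        best = max(current, key=lambda s: current[s])
        current[best] -= total                     # the chosen server pays the total back
        picks.append(best)
    return picks


picks = smooth_weighted_rr({"big": 2, "small": 1}, requests=6)
```

Over six requests the larger machine receives four and the smaller one two, interleaved rather than back to back.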
IP Hashing
Uses the client's IP address to deterministically route them to the same backend server every time. This is useful for session affinity — cases where a user's session is stored locally on a server (WebSocket connections, in-memory caches, etc.).
Worth noting: modern systems generally prefer stateless architectures, where session state lives in an external store like Redis. That way, any server can handle any request. IP hashing is a workaround for systems that haven't made that transition yet.
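For illustration, IP hashing can be as small as the sketch below. `pick_by_ip` is a hypothetical helper, and the addresses come from the documentation-only 203.0.113.0/24 range.

```python
import hashlib
from typing import List


def pick_by_ip(client_ip: str, servers: List[str]) -> str:
    """Map a client IP to the same server every time, as long as the pool is unchanged."""
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


servers = ["app-1", "app-2", "app-3"]
first = pick_by_ip("203.0.113.7", servers)
second = pick_by_ip("203.0.113.7", servers)
```

The weakness is visible in the modulo: adding or removing a server changes `len(servers)` and remaps most clients, which is one more reason external session storage tends to age better.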
Health Checks and Failover
This is where load balancers earn their keep in production.
A load balancer continuously sends health check probes to each server in the pool. If a server fails to respond — whether from a crash, an OOM kill, a deployment mishap, or a runaway process — the balancer automatically removes it from rotation. Requests stop being sent to the dead instance. Users don't see the failure.
When the server recovers, health checks pass again, and it's automatically re-added to the pool.
This is the foundational mechanism behind high availability. No manual intervention required, no on-call engineer frantically removing servers from a config file at 2am.
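The remove-and-readmit cycle can be sketched as follows. `HealthCheckedPool`, the `probe` callback, and the threshold of three consecutive failures are assumptions for the example; real balancers usually probe over HTTP or TCP on a timer.

```python
from typing import Callable, Dict, List


class HealthCheckedPool:
    """Track consecutive probe failures and expose only the healthy servers."""

    def __init__(self, servers: List[str], probe: Callable[[str], bool],
                 fail_threshold: int = 3):
        self.probe = probe
        self.fail_threshold = fail_threshold
        self.failures: Dict[str, int] = {server: 0 for server in servers}

    def run_checks(self) -> None:
        for server in self.failures:
            if self.probe(server):
                self.failures[server] = 0      # one passing probe readmits the server
            else:
                self.failures[server] += 1

    def healthy(self) -> List[str]:
        return [s for s, count in self.failures.items() if count < self.fail_threshold]


# Simulated server state instead of real network probes.
status = {"app-1": True, "app-2": True, "app-3": True}
pool = HealthCheckedPool(list(status), probe=lambda server: status[server])

status["app-2"] = False                        # app-2 crashes
for _ in range(3):
    pool.run_checks()
after_failure = pool.healthy()

status["app-2"] = True                         # app-2 recovers
pool.run_checks()
after_recovery = pool.healthy()
```

Requiring several consecutive failures before eviction is a design choice: it avoids ejecting a server over a single dropped probe, at the cost of a slightly slower reaction to real outages.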
Layer 4 vs Layer 7: Choosing the Right Level
Load balancers can operate at two different network layers, and the choice has real performance implications.
Layer 4 (Transport Layer)
L4 balancers work at the TCP/UDP level. They route traffic based purely on IP addresses and ports and never inspect the application payload. That makes them extremely fast: low overhead, minimal CPU usage, and capable of handling millions of connections per second.
Use L4 when:
- You need raw throughput at the edge
- You're routing non-HTTP traffic (databases, game servers, VoIP, IoT protocols)
- You want SSL pass-through (the backend owns the cert end-to-end, required for some compliance scenarios)
- You're fronting a fleet of L7 balancers in a tiered architecture
Layer 7 (Application Layer)
L7 balancers understand HTTP. They can inspect headers, URL paths, query params, and even body content. This unlocks intelligent, content-aware routing:
- Route /api/v2/* to a new cluster, /api/v1/* to legacy services
- Send traffic from mobile clients to a specific backend tier
- Route based on custom headers, cookies, or authenticated user identity
The trade-off is overhead — inspecting packet content is more expensive than routing by IP. But for most web applications, L7 is the right default.
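Content-aware routing of this kind can be sketched as a simple rule function. The cluster names and the `User-Agent` check are illustrative assumptions, not a recommendation for how to detect mobile clients.

```python
from typing import Dict


def route(path: str, headers: Dict[str, str]) -> str:
    """Choose a backend pool from request content, which an L4 balancer cannot see."""
    if path.startswith("/api/v2/"):
        return "v2-cluster"
    if path.startswith("/api/v1/"):
        return "legacy-cluster"
    if "Mobile" in headers.get("User-Agent", ""):
        return "mobile-tier"
    return "default-pool"


v2_target = route("/api/v2/orders", {})
mobile_target = route("/home", {"User-Agent": "ExampleApp/1.0 Mobile"})
```

Every one of these decisions requires parsing the HTTP request first, which is exactly the overhead the trade-off refers to.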
Common production pattern: Put an L4 balancer at the very edge to absorb raw traffic volume, then distribute it across a fleet of L7 balancers that handle content-aware routing. You get the throughput of L4 with the intelligence of L7.
3. API Gateway — Taming Microservices
The Problem It Solves
Here's what happens when you split a monolith into microservices:
You go from one codebase to twelve. Each team owns their service. Fast, independent, scalable — the dream.
Then reality arrives.
Every service needs authentication. Every service needs rate limiting. Every service needs logging. Every service needs request validation. Every team implements these things slightly differently, with different libraries, different error formats, different token validation logic.
Six months later, your "User Service" has JWT validation on version 3.1 of a library, your "Order Service" is on 2.8, and your "Notification Service" has a subtly wrong implementation that passed code review because the reviewer was rushing a deadline.
This is logic drift. And it's one of the most expensive problems in distributed systems — not because it breaks things immediately, but because it breaks things inconsistently, and inconsistent failures are the hardest to debug.
An API Gateway is the solution. It's a single entry point that handles all infrastructure concerns before any request touches a microservice.
Client → API Gateway → User Service
                     → Order Service
                     → Payment Service
                     → Notification Service
What an API Gateway Does
Centralized Authentication and Authorization
The gateway validates OAuth2 tokens, JWTs, or API keys in one place. Invalid requests are rejected at the edge. Your microservices never see unauthenticated traffic. More importantly, they don't need to implement authentication — that's no longer their problem.
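To show what "validated in one place" involves, here is a standard-library sketch of HS256 JWT verification. It is for illustration only: the helper names are hypothetical, only the signature and the `exp` claim are checked, and a production gateway should rely on a vetted JWT library or its built-in auth plugin rather than hand-rolled code.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional


def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def _b64url_decode(segment: str) -> bytes:
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def sign_hs256(claims: dict, secret: bytes) -> str:
    """Issue a token; only here so the example is self-contained."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = _b64url_encode(json.dumps(claims).encode())
    signature = hmac.new(secret, f"{header}.{body}".encode(), hashlib.sha256).digest()
    return f"{header}.{body}.{_b64url_encode(signature)}"


def verify_hs256(token: str, secret: bytes, now: float) -> Optional[dict]:
    """Return the claims if the token is well formed, correctly signed, and unexpired."""
    try:
        header_b64, body_b64, sig_b64 = token.split(".")
        if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
            return None
        expected = hmac.new(secret, f"{header_b64}.{body_b64}".encode(),
                            hashlib.sha256).digest()
        if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
            return None
        claims = json.loads(_b64url_decode(body_b64))
        if claims.get("exp", 0) <= now:
            return None
        return claims
    except (ValueError, TypeError, AttributeError):
        return None                             # malformed tokens are simply rejected


token = sign_hs256({"sub": "user-42", "exp": 2000}, b"gateway-secret")
accepted = verify_hs256(token, b"gateway-secret", now=1000)
wrong_key = verify_hs256(token, b"other-secret", now=1000)
expired = verify_hs256(token, b"gateway-secret", now=3000)
```

Because this check runs once at the gateway, the three services in the logic-drift example above no longer need their own, slightly different copies of it.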
Rate Limiting
Define throttling rules once, enforce them everywhere. No per-service implementation, no inconsistent limits, no team accidentally shipping a service without rate limiting because it "wasn't in scope this sprint."
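A token bucket is one common way to implement such a rule. The sketch below is a single-process illustration with an injectable clock; `TokenBucket` and its parameters are assumptions for the example, and a real gateway typically keeps these counters in a shared store so that all gateway instances agree.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts up to `capacity`, refilled at `rate_per_sec` tokens per second."""

    def __init__(self, rate_per_sec: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill for the time elapsed since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


fake_now = [0.0]                                # controllable clock for the demo
bucket = TokenBucket(rate_per_sec=1, capacity=2, clock=lambda: fake_now[0])
results = [bucket.allow(), bucket.allow(), bucket.allow()]   # third call exceeds the burst
fake_now[0] = 1.0                               # one second later, one token is back
results.append(bucket.allow())
```

In a gateway there would be one bucket per API key or client, and a rejected request would receive an HTTP 429 response.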
Request and Response Transformation
The gateway can translate between formats. Your legacy internal services speak XML? Your mobile clients send JSON? The gateway handles the translation. Services don't need to know how their clients represent data — they work in their native format and the gateway handles the conversion.
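A deliberately small sketch of the JSON-to-XML direction, handling only flat JSON objects. The function name and the `user` root tag are assumptions; real transformations usually need schema mapping, nested structures, and error handling that this omits.

```python
import json
import xml.etree.ElementTree as ET


def json_to_xml(json_body: str, root_tag: str) -> str:
    """Convert a flat JSON object into an XML document with one element per key."""
    data = json.loads(json_body)
    root = ET.Element(root_tag)
    for key, value in data.items():
        child = ET.SubElement(root, key)
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")


xml_body = json_to_xml('{"id": 42, "name": "Ada"}', "user")
# xml_body == "<user><id>42</id><name>Ada</name></user>"
```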
API Versioning and Traffic Routing
Route /v1/users to your stable legacy service, /v2/users to the new one being rolled out. Migrate traffic incrementally. Kill the old version when you're confident. Do all of this in one config, not scattered across twelve service codebases.
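One possible reading of "migrate traffic incrementally" is a percentage-based split: explicit /v2 callers always reach the new service, while a configurable share of /v1 callers is moved over. The sketch assumes the two services are response-compatible for that share of traffic; `pick_users_backend`, the backend names, and the hashing scheme are all illustrative.

```python
import hashlib


def pick_users_backend(path: str, user_id: str, v2_percent: int) -> str:
    """Route by version prefix, shifting a stable percentage of v1 users to v2."""
    if path.startswith("/v2/users"):
        return "users-v2"
    if path.startswith("/v1/users"):
        # Hash the user id into a bucket 0..99 so each user gets a consistent answer.
        digest = hashlib.sha256(user_id.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % 100
        return "users-v2" if bucket < v2_percent else "users-v1"
    return "not-found"


before_rollout = pick_users_backend("/v1/users/7", "user-7", v2_percent=0)
after_rollout = pick_users_backend("/v1/users/7", "user-7", v2_percent=100)
```

Raising `v2_percent` from 0 to 100 over time is the incremental migration; setting it back to 0 is the rollback.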
Unified Observability
Every request flows through the gateway. That means one centralized source of metrics, logs, and traces. When an incident happens, you don't have to correlate logs from twelve different services to understand what's failing. The gateway tells you.
Request Aggregation and Backend for Frontend (BFF)
This one gets skipped in most explainers, but it's a core gateway capability.
Consider a mobile dashboard that needs to display user info, recent orders, recommendations, and notifications — all in one screen load. Without a gateway, the mobile client makes four separate API calls, waits for four responses, and assembles the data itself. Over a mobile network, that round-trip cost compounds.
With an API Gateway acting as a BFF (Backend for Frontend), the client makes a single call to /dashboard. The gateway fans out to the relevant services in parallel, aggregates the responses, and returns one unified payload:
Client → GET /dashboard → API Gateway → User Service
                                      → Order Service
                                      → Recommendation Service
                                      → Notification Service
                                ↓
                 Single combined response → Client
This keeps clients thin, reduces network chattiness, and lets backend services remain focused on their own domains instead of knowing what every client type needs.
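The fan-out and aggregation step might look like the following asyncio sketch. The four `fetch_*` coroutines are stand-ins that sleep instead of calling real services, and the response shapes are invented for the example.

```python
import asyncio
from typing import Any, Dict, List


async def fetch_user(user_id: str) -> Dict[str, Any]:
    await asyncio.sleep(0.01)                   # stands in for a network call
    return {"id": user_id, "name": "Ada"}


async def fetch_orders(user_id: str) -> List[str]:
    await asyncio.sleep(0.01)
    return ["order-1", "order-2"]


async def fetch_recommendations(user_id: str) -> List[str]:
    await asyncio.sleep(0.01)
    return ["item-9"]


async def fetch_notifications(user_id: str) -> List[str]:
    await asyncio.sleep(0.01)
    return []


async def dashboard(user_id: str) -> Dict[str, Any]:
    """Call all four services concurrently and merge the results into one payload."""
    user, orders, recommendations, notifications = await asyncio.gather(
        fetch_user(user_id),
        fetch_orders(user_id),
        fetch_recommendations(user_id),
        fetch_notifications(user_id),
    )
    return {
        "user": user,
        "orders": orders,
        "recommendations": recommendations,
        "notifications": notifications,
    }


result = asyncio.run(dashboard("user-42"))
```

Because the calls run concurrently, the total latency is roughly that of the slowest service rather than the sum of all four. A production version would also decide what to return when one of the calls fails or times out.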
What an API Gateway Costs You
Most articles about API Gateways read like product marketing. Let's be honest about the tradeoffs.
Single point of failure. If the gateway goes down, every service behind it becomes unreachable, even if the services themselves are healthy. A misconfigured routing rule, a bad deployment, or a gateway-level memory leak can each take your entire platform offline. This makes gateway reliability a first-class engineering concern, not an afterthought. High availability, blue-green deployments, and exhaustive config validation are not optional.
Added latency. Every request gets an extra network hop. Under normal conditions, this is negligible — a well-configured gateway adds low single-digit milliseconds. Under high load or with a misconfigured gateway doing expensive work (complex transformations, slow auth lookups), that hop starts to matter.
Team bottlenecks. In practice, the team that owns the gateway config becomes a bottleneck. New service? You need a gateway route. New auth policy? Gateway team. Rate limit change? Gateway team. This can be mitigated with good self-service tooling and declarative config, but it's a real organizational friction point.
Configuration complexity at scale. A Kong or Apigee installation managing hundreds of routes, policies, plugins, and environments can become a product unto itself. It needs versioning, testing, staging, and on-call ownership. Factor this into your operational cost estimates.
None of these are reasons to avoid API Gateways — they're reasons to operate them deliberately.
Why the Lines Blur (And Why That's Fine)
Here's the honest answer to "why does everyone get confused about this?"
Because Nginx does all three. Kong does all three. Envoy does all three. AWS API Gateway, Traefik, Caddy, HAProxy — they all blur the lines.
You can configure Nginx as a dumb reverse proxy in ten lines. You can configure it as a full L7 load balancer. You can add Lua plugins and make it behave like an API gateway. Same binary, radically different behavior.
The important mental shift: stop thinking about these as three separate products. Think of them as a spectrum of capabilities.
The question is never "which one do I use?" The question is: "which capability do I need for the problem I'm solving right now?"
- Need SSL offloading and basic security? Enable proxy capabilities.
- Need to distribute load across replicas? Enable balancing capabilities.
- Need centralized auth and rate limiting for microservices? Enable gateway capabilities.
Sometimes you need all three from the same tool. Sometimes you need dedicated tools at each layer. Let your architecture's requirements drive the decision, not vendor marketing.
Here's a quick reference for how common tools map to capabilities:
| Tool | Reverse Proxy | Load Balancer | API Gateway |
|---|---|---|---|
| Nginx | ✅ | ✅ | Limited |
| HAProxy | ✅ | ✅ | Limited |
| Envoy | ✅ | ✅ | ✅ |
| Kong | ✅ | ✅ | ✅ |
| Traefik | ✅ | ✅ | ✅ |
| Cloudflare | ✅ | ✅ | Partial |
| AWS ALB | Partial | ✅ | Partial |
| AWS NLB | ❌ | ✅ | ❌ |
| AWS API Gateway | ❌ | Partial | ✅ |
AWS NLB is worth noting specifically — it's a load balancer that operates at L4 and does not behave like a reverse proxy, which is why the function/pattern distinction matters in practice.
What a Modern Production Architecture Actually Looks Like
When systems reach scale, these components don't replace each other — they layer.
Here's one common pattern:
Internet
↓
[CDN] — Static assets, edge caching, initial SSL termination, DDoS absorption
↓
[API Gateway] — Authentication, rate limiting, routing, observability
↓
[Load Balancer] — Traffic distribution across service clusters
↓
[Internal Proxies] — Service-to-service communication
↓
[Microservices]
That said — this is one common architecture, not the canonical one. In practice, teams arrive at different layering depending on their infrastructure choices. Other valid production patterns include:
# Kubernetes-native
Internet → CDN/WAF → Cloud Load Balancer → Ingress Controller → Services
# API Gateway at the edge
Internet → API Gateway (Envoy/Kong) → Kubernetes Ingress → Pods
# Cloudflare-heavy
Cloudflare → Envoy Gateway → Service Mesh (Istio) → Pods
The principles are consistent across all of them — edge, entry point, distribution, internal communication. The specific tools filling those roles vary.
If You Work in Kubernetes
These concepts don't disappear in Kubernetes — they get mapped to Kubernetes-specific primitives:
| Gateway Layer Concept | Kubernetes Equivalent |
|---|---|
| Reverse Proxy | Ingress Controller (Nginx Ingress, Traefik) |
| Load Balancer | Service (type: LoadBalancer) / Cloud LB |
| API Gateway | Gateway API / Kong Ingress / Ambassador |
| Internal Proxy | Envoy Sidecar (in service mesh setups) |
| Service Mesh | Istio / Linkerd / Cilium |
When you configure an Nginx Ingress rule, you're configuring reverse proxy behavior. When you define a Kubernetes Service, you're setting up internal load balancing. The vocabulary changes; the underlying concepts don't.
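Returning to the layered architecture shown earlier (CDN, API Gateway, Load Balancer, Internal Proxies), each layer has a specific job: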
- CDN: Handles static content and absorbs high-volume traffic before it hits your origin. Think Cloudflare, Fastly, CloudFront.
- API Gateway: The main entry point for dynamic requests. Handles auth and routes to the right service.
- Load Balancer: Distributes traffic within a service cluster, manages health checks and failover.
- Internal Proxies / Service Mesh: Manages service-to-service communication at scale with mTLS, circuit breaking, and retries.
You don't need all of this on day one. A startup running a single service needs a reverse proxy, not a service mesh. But knowing this map means you understand where you're headed as you scale, and you can design with that future in mind instead of painting yourself into corners.
Security Considerations Across Every Layer
The Gateway Layer is also your primary security perimeter. Here's where each component contributes:
Reverse Proxy
- Hides internal server IPs and architecture from the public internet
- Enforces HTTPS via SSL termination; manages certificate rotation centrally
- Can integrate a Web Application Firewall (WAF) to block SQLi, XSS, and known attack patterns
Load Balancer
- Health checks remove compromised or unresponsive instances automatically
- Can enforce IP allowlisting or geo-blocking at the traffic layer
- L4 pass-through mode preserves end-to-end TLS for compliance requirements
API Gateway
- Validates JWT/OAuth tokens, rejects unauthenticated traffic at the edge
- Enforces rate limits, which slows down brute force and credential stuffing attacks
- Validates request payloads against OpenAPI schemas before forwarding
- Provides a complete audit trail: who called what, when, with what parameters
Internal Layer (Service Mesh)
- Mutual TLS (mTLS) between services: even on the internal network, every service must prove its identity
- Circuit breaking prevents a single failing service from cascading failures across the system
- Automatic retries with backoff, handled by the infrastructure — not your application code
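The circuit-breaking bullet above can be sketched as a small state machine. In a service mesh this logic lives in the sidecar proxy rather than in application code; the Python class below only illustrates the idea, and `CircuitBreaker`, `CircuitOpenError`, and the thresholds are hypothetical.

```python
import time
from typing import Any, Callable, Optional


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without contacting the backend."""


class CircuitBreaker:
    def __init__(self, max_failures: int, reset_after: float,
                 clock: Callable[[], float] = time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], Any]) -> Any:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; failing fast")
            self.opened_at = None               # half-open: let one trial call through
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()   # trip (or re-trip) the breaker
            raise
        self.failures = 0                       # a success closes the circuit
        return result


fake_now = [0.0]                                # controllable clock for the demo
attempts = []


def failing_backend() -> str:
    attempts.append(1)
    raise RuntimeError("backend down")


breaker = CircuitBreaker(max_failures=2, reset_after=30, clock=lambda: fake_now[0])
outcomes = []
for _ in range(3):
    try:
        breaker.call(failing_backend)
    except CircuitOpenError:
        outcomes.append("rejected")             # backend was not contacted
    except RuntimeError:
        outcomes.append("failed")

fake_now[0] = 31.0                              # cool-down elapsed
outcomes.append(breaker.call(lambda: "ok"))
```

The third call never reaches the failing backend, which is how the breaker keeps one unhealthy service from tying up callers across the system.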
The key principle: defense in depth. Each layer enforces its own set of controls. A request that somehow bypasses the gateway still hits load balancer rules. A request that somehow bypasses those still hits service-level auth. No single point of failure in your security posture.
Decision Framework: What Do You Actually Need?
Are you running a single server?
└─ YES → Reverse Proxy is sufficient.
(SSL termination, caching, basic security)
Are you scaling horizontally?
└─ YES → Add a Load Balancer.
(Round robin, health checks, failover)
Are you running multiple independent services?
└─ YES → Add an API Gateway.
(Auth, rate limiting, routing, observability)
Do you have 20+ services with complex inter-service communication?
└─ YES → Consider a Service Mesh.
(mTLS, circuit breaking, distributed tracing)
The cardinal rule: don't over-architect for problems you don't have yet. A two-person startup shipping an MVP does not need Kong, Istio, and a 4-layer CDN strategy. A fintech processing millions of daily transactions running 40 microservices does.
Design for where you are. Build with an eye on where you're going. Layer when you actually need the layer, not when a conference talk made it sound cool.
Key Takeaways
- A Reverse Proxy protects and optimizes a single backend server: SSL termination, caching, compression, IP hiding.
- A Load Balancer is a function that distributes traffic across multiple instances. In HTTP systems it's often implemented as a reverse proxy, but not always — AWS NLB is a load balancer that isn't. The two concepts solve different problems: the proxy hides and mediates access, the balancer distributes load.
- An API Gateway centralizes infrastructure concerns for microservices — auth, rate limiting, versioning, transformation, request aggregation (BFF) — and prevents logic drift across teams. It also introduces real costs: single point of failure, latency overhead, and team bottlenecks. Operate it deliberately.
- These are not mutually exclusive. Modern tools like Nginx, Kong, and Envoy implement all three capabilities. The distinction is conceptual, not product-based.
- In high-scale systems, these layer on top of each other — but there's no single universal arrangement. CDN → Gateway → Load Balancer → Proxies → Services is one common pattern; Kubernetes-native setups, Cloudflare-heavy stacks, and gateway-first architectures all look different.
- Security is not one layer's job. Every component in the gateway layer enforces its own controls.
The moment you stop asking "which tool is the right one?" and start asking "which capability solves my current engineering problem?" — the entire space becomes a lot less confusing.
If this was useful, consider sharing it with a backend engineer who's been staring at an Nginx config wondering why it has 400 lines. You might save them an afternoon.
Tags: #backend #architecture #webdev #devops #systemdesign