Microservices Patterns: A Practical Guide to Choosing the Right Architecture
Most teams don't fail at microservices because the pattern is wrong. They fail because they adopted all the patterns at once, before they understood which problems each one was actually solving.
I've watched teams ship a service mesh, an event bus, a saga orchestrator, and a CQRS read model in the same quarter, for a system that had only a handful of services and a small user base. The result is what's now called a distributed monolith: all the operational complexity of microservices with none of the independent deployability or scalability.
This guide walks through the patterns that matter, what each one is actually for, and when not to use it. Code examples are in Python, with a decision matrix at the end so you can match the pattern to your problem.
Microservices vs. Distributed Monolith
A real microservice has three properties:
- Independently deployable. You can ship Service A without coordinating with the teams behind Services B, C, and D.
- Owns its data. No other service reads or writes its database directly. The only way in is through its API.
- Loosely coupled. A change to one service's internal schema doesn't break others.
If you can't deploy a service without redeploying three others, you don't have microservices — you have a monolith with network latency. That's almost always worse than the monolith you started with. Keep this test in mind as we go through each pattern: does this pattern preserve those three properties, or quietly erode them?
1. API Gateway
How it works
An API gateway is a single entry point that sits in front of your services and handles cross-cutting concerns: routing, authentication, rate limiting, request/response transformation, and aggregation.
# Simplified gateway routing logic
from fastapi import FastAPI, HTTPException, Request, Response
import httpx

app = FastAPI()

SERVICE_ROUTES = {
    "/users": "http://user-service:8000",
    "/orders": "http://order-service:8000",
    "/payments": "http://payment-service:8000",
}


def verify_jwt(auth_header: str) -> bool:
    """Placeholder only: replace with real signature and expiry validation."""
    return auth_header.startswith("Bearer ")


@app.api_route("/{path:path}", methods=["GET", "POST", "PUT", "DELETE"])
async def gateway(path: str, request: Request):
    # 1. Authenticate once at the edge
    if not verify_jwt(request.headers.get("Authorization", "")):
        raise HTTPException(401, "Invalid token")

    # 2. Route to the right downstream service
    prefix = "/" + path.split("/")[0]
    upstream = SERVICE_ROUTES.get(prefix)
    if not upstream:
        raise HTTPException(404, "Unknown route")

    # 3. Proxy the request, keeping the query string
    async with httpx.AsyncClient() as client:
        upstream_response = await client.request(
            request.method,
            f"{upstream}/{path}",
            params=list(request.query_params.multi_items()),
            content=await request.body(),
            headers={k: v for k, v in request.headers.items() if k != "host"},
        )

    # 4. Relay status and body as-is; the upstream may not return JSON
    return Response(
        content=upstream_response.content,
        status_code=upstream_response.status_code,
        media_type=upstream_response.headers.get("content-type"),
    )
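The proxy above leaves out rate limiting, one of the cross-cutting concerns a gateway is meant to centralize. Here is a minimal sketch of a token bucket, assuming an in-memory, single-process gateway. `TokenBucket`, `allow_request`, and the default limits are illustrative names and values, not part of FastAPI or any gateway product.

```python
import time


class TokenBucket:
    """Allows bursts up to `capacity` and refills at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock             # injectable so tests can control time
        self.tokens = float(capacity)  # start with a full bucket
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# One bucket per client key (for example, an API key or the JWT subject)
buckets: dict[str, TokenBucket] = {}


def allow_request(client_id: str, rate: float = 100 / 60, capacity: int = 100) -> bool:
    bucket = buckets.get(client_id)
    if bucket is None:
        bucket = buckets[client_id] = TokenBucket(rate, capacity)
    return bucket.allow()
```

In the handler, a rejected request would typically map to an HTTP 429 response. With more than one gateway instance, the bucket state has to live in a shared store rather than in process memory.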
When to use it
Use a gateway as soon as you have more than two or three services that share clients. It centralizes auth, TLS termination, and rate limiting so each service doesn't reimplement them. The example above is illustrative — for production you'll pick between two deployment models.
Serverless vs. server-based gateway
There are two practical ways to run a gateway, and the choice has bigger operational implications than the choice of vendor.
Serverless gateway (AWS API Gateway, Azure API Management consumption tier, Google Cloud API Gateway, Cloudflare Workers). The cloud provider runs the gateway, you describe routes and policies declaratively, and you pay per request. There are no servers to patch, no capacity to size, and it scales from zero to bursty traffic without you tuning anything. The downsides: per-request pricing gets expensive at high sustained throughput, cold starts add latency on the first request after idle, you're locked into the provider's auth/transform model, and customizing behavior beyond what the console exposes usually means writing Lambda authorizers or Workers — which is its own operational surface. Reach for this when traffic is spiky or modest, when your team is small, or when you're already deep in one cloud and want the integration.
Server-based gateway (Kong, Envoy, Traefik, NGINX, HAProxy, or Spring Cloud Gateway, running on infrastructure you manage, such as EC2 instances). You run the gateway as long-lived processes, usually as pods in Kubernetes or instances behind a load balancer. You get full control over plugins, request transformation, and observability, predictable cost at sustained scale (you're paying for capacity, not per request), and you can run the same gateway in any environment, including on-prem. The cost is real: you own patching, capacity planning, HA topology, and config rollout. Reach for this when you have sustained high traffic, complex routing or transformation needs, multi-cloud or hybrid requirements, or an existing platform team that already operates similar infrastructure.
# Kong declarative config example (server-based gateway)
_format_version: "3.0"
services:
  - name: order-service
    url: http://order-service.default.svc.cluster.local:8000
    routes:
      - name: orders
        paths: ["/orders"]
    plugins:
      - name: jwt
      - name: rate-limiting
        config:
          minute: 100
A rough rule of thumb: under a few hundred requests per second with unpredictable traffic, the serverless model usually wins on total cost of ownership. Above that, with steady load, the server-based model wins on per-request cost and control. Whichever you pick, treat the gateway config as code — version it, review it, and deploy it through a pipeline.
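To make the rule of thumb concrete, here is a back-of-the-envelope sketch of the break-even point. The prices in the usage lines are placeholders, not quotes from any provider. Substitute your own per-request price and the monthly cost of running a gateway yourself, which should include the engineering time it takes to operate.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # 2,592,000 seconds in a 30-day month


def serverless_monthly_cost(avg_rps: float, price_per_million: float) -> float:
    """Monthly cost of a pay-per-request gateway at a sustained average rate."""
    requests_per_month = avg_rps * SECONDS_PER_MONTH
    return requests_per_month / 1_000_000 * price_per_million


def break_even_rps(server_monthly_cost: float, price_per_million: float) -> float:
    """Average requests per second above which the fixed-cost gateway is cheaper."""
    requests_per_month = server_monthly_cost / price_per_million * 1_000_000
    return requests_per_month / SECONDS_PER_MONTH


# Hypothetical inputs: $1.00 per million requests, $500/month for a self-run gateway
print(round(serverless_monthly_cost(100, 1.00), 2))  # 259.2
print(round(break_even_rps(500, 1.00), 1))           # 192.9
```

With these made-up numbers the crossover lands just under 200 requests per second of sustained load. Spiky traffic lowers the average rate and pushes the comparison further toward the serverless model.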
Tradeoffs
The gateway is a single point of failure and a deployment bottleneck regardless of which model you choose. Every team that needs a new route has to coordinate with whoever owns the gateway config. Keep the gateway dumb — routing, auth, rate limiting — and push business logic into the services themselves. The moment you find yourself writing if-statements about specific user IDs in the gateway, you've made it part of your application.
2. Service Discovery
How it works
In a microservices system, service instances come and go — autoscaling spins them up, deployments roll them, failures kill them. Hardcoding http://order-service-prod-3:8000 won't survive the first deploy. Service discovery solves this with a registry: services register themselves on startup, and clients ask the registry "where is the order service right now?"
import httpx
import random


class ServiceRegistry:
    """Client-side discovery against an HTTP registry.

    The endpoint path and the "instances" response field are illustrative;
    Consul, etcd, and Kubernetes DNS each expose their own lookup API.
    """

    def __init__(self, registry_url: str):
        self.registry_url = registry_url
        self._cache: dict[str, list[str]] = {}

    async def resolve(self, service_name: str) -> str:
        if service_name not in self._cache:
            async with httpx.AsyncClient() as client:
                resp = await client.get(f"{self.registry_url}/v1/services/{service_name}")
                resp.raise_for_status()  # Don't cache an error response
                self._cache[service_name] = resp.json()["instances"]
        # Simple client-side load balancing
        return random.choice(self._cache[service_name])

# In practice, refresh the cache periodically and remove failed instances
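The closing comment above can be sketched as a cache that expires after a TTL and lets callers evict an instance they just failed to reach. This is one possible shape, not a reference implementation. The `fetch` callable stands in for whatever registry lookup you use, and every name here is illustrative.

```python
import random
import time
from typing import Callable


class CachingResolver:
    """Caches instance lists per service and refreshes them after `ttl` seconds."""

    def __init__(self, fetch: Callable[[str], list[str]], ttl: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.fetch = fetch  # hypothetical lookup: service name -> instance URLs
        self.ttl = ttl
        self.clock = clock
        self._cache: dict[str, tuple[float, list[str]]] = {}

    def instances(self, service_name: str) -> list[str]:
        entry = self._cache.get(service_name)
        if entry is None or self.clock() - entry[0] > self.ttl or not entry[1]:
            # Missing, stale, or emptied by evictions: ask the registry again
            entry = (self.clock(), list(self.fetch(service_name)))
            self._cache[service_name] = entry
        return entry[1]

    def resolve(self, service_name: str) -> str:
        candidates = self.instances(service_name)
        if not candidates:
            raise LookupError(f"No instances registered for {service_name}")
        return random.choice(candidates)

    def mark_failed(self, service_name: str, instance: str) -> None:
        """Drop an unreachable instance until the next refresh."""
        entry = self._cache.get(service_name)
        if entry is not None and instance in entry[1]:
            entry[1].remove(instance)
```

A caller would invoke `mark_failed` after a connection error and then call `resolve` again to pick a different instance.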
When to use it
You need service discovery the moment you have more than one instance of any service, or as soon as you're deploying to a dynamic environment (Kubernetes, ECS, Nomad). If you're on Kubernetes you get this for free — DNS resolves order-service.default.svc.cluster.local to a healthy pod. Don't build your own unless you have a reason.
Tradeoffs
Client-side discovery (each service queries the registry) saves a network hop but spreads discovery logic across every client. Server-side discovery (a load balancer fronts each service) is simpler for clients but adds a hop and another component to operate. On Kubernetes, the platform handles this, so don't fight it.
3. Circuit Breaker
How it works
When a downstream service starts failing, the worst thing your service can do is keep hammering it with retries. You exhaust your own threads, queue up requests that will never succeed, and the failure cascades upstream until your whole system is down.
A circuit breaker watches the failure rate of calls to a downstream service. When failures cross a threshold, it "opens" — for a cooldown period, calls fail fast without ever hitting the downstream. After the cooldown, it lets a small number of test requests through; if they succeed, it closes again.
import time
from enum import Enum


class State(Enum):
    CLOSED = "closed"        # Normal operation
    OPEN = "open"            # Failing fast
    HALF_OPEN = "half_open"  # Testing recovery


class CircuitBreaker:
    """Simplified breaker: counts consecutive failures and is not thread-safe."""

    def __init__(self, failure_threshold: int = 5, recovery_timeout: float = 30.0):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.failure_count = 0
        self.last_failure_time = 0.0
        self.state = State.CLOSED

    def call(self, func, *args, **kwargs):
        if self.state == State.OPEN:
            # time.monotonic() is unaffected by wall-clock adjustments
            if time.monotonic() - self.last_failure_time > self.recovery_timeout:
                self.state = State.HALF_OPEN  # Cooldown over: allow a trial call
            else:
                raise RuntimeError("Circuit breaker is OPEN: failing fast")
        try:
            result = func(*args, **kwargs)
            self._on_success()
            return result
        except Exception:
            self._on_failure()
            raise

    def _on_success(self):
        self.failure_count = 0
        self.state = State.CLOSED

    def _on_failure(self):
        self.failure_count += 1
        self.last_failure_time = time.monotonic()
        # A failed trial call in HALF_OPEN reopens the circuit immediately
        if self.state == State.HALF_OPEN or self.failure_count >= self.failure_threshold:
            self.state = State.OPEN
When to use it
Use a circuit breaker around any synchronous call to a service you don't control fully — third-party APIs, services owned by another team, anything where a slow response from them shouldn't take you down. Pair it with timeouts (always set timeouts on outbound calls) and bulkheads (limit how many threads any one downstream can consume).
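The bulkhead mentioned above can be sketched with a semaphore that caps in-flight calls per downstream. This is a minimal thread-based version. `Bulkhead`, `BulkheadFullError`, and `call_payments` are illustrative names, and the limit of 10 is arbitrary.

```python
import threading
from contextlib import contextmanager


class BulkheadFullError(RuntimeError):
    """Raised when a downstream already has its maximum number of in-flight calls."""


class Bulkhead:
    def __init__(self, max_concurrent: int):
        self._slots = threading.BoundedSemaphore(max_concurrent)

    @contextmanager
    def slot(self):
        # Non-blocking acquire: reject at once rather than queue behind a slow downstream
        if not self._slots.acquire(blocking=False):
            raise BulkheadFullError("Too many in-flight calls to this downstream")
        try:
            yield
        finally:
            self._slots.release()


# One bulkhead per downstream, so a slow payment service cannot starve other calls
payments_bulkhead = Bulkhead(max_concurrent=10)


def call_payments(func, *args, **kwargs):
    with payments_bulkhead.slot():
        return func(*args, **kwargs)
```

Rejecting immediately is a design choice. A variant that waits briefly for a slot would pass a timeout to `acquire` instead of `blocking=False`.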
Tradeoffs
A circuit breaker shifts the failure from "slow timeout" to "fast error." That's almost always the right tradeoff, but your callers now need to handle the fast-fail case — usually with a fallback (cached data, a default response, a queued retry). Libraries like resilience4j, Polly, or Hystrix (now in maintenance mode — use resilience4j) implement this properly with metrics built in.
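One way to handle the fast-fail case is to serve the last known good result. The sketch below assumes stale data is acceptable for the call in question, which is usually true only for reads. `FallbackCache` is an illustrative name, and it works with any callable, including one wrapped in a circuit breaker.

```python
from typing import Any, Callable


class FallbackCache:
    """Serves the last successful result for a key when the live call fails."""

    def __init__(self):
        self._last_good: dict[str, Any] = {}

    def call(self, key: str, func: Callable[[], Any], default: Any = None) -> Any:
        try:
            result = func()
        except Exception:
            # Covers both real downstream errors and a breaker's fast-fail error
            return self._last_good.get(key, default)
        self._last_good[key] = result
        return result
```

A caller might write `fallbacks.call("recs:u_123", lambda: breaker.call(fetch_recommendations, "u_123"), default=[])`, where `fetch_recommendations` is a hypothetical downstream call. Writes need a different fallback, such as a queued retry.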
4. Saga — Distributed Transactions Without Distributed Locks
How it works
You can't run a single SQL transaction across two services' databases without a distributed commit protocol such as two-phase commit, which holds locks across services while it coordinates, and you don't want distributed locks. A saga replaces the ACID transaction with a sequence of local transactions, each of which has a defined compensating action that undoes it.
For example, booking a trip might be: reserve hotel → charge card → book flight. If the flight booking fails, the saga runs compensations in reverse: refund the card, release the hotel.
import logging
from dataclasses import dataclass
from typing import Callable

log = logging.getLogger(__name__)


@dataclass
class SagaStep:
    name: str
    action: Callable        # action(context) -> result
    compensation: Callable  # compensation(context, result) undoes the action


class Saga:
    def __init__(self, steps: list[SagaStep]):
        self.steps = steps

    def execute(self, context: dict):
        completed = []
        try:
            for step in self.steps:
                result = step.action(context)
                completed.append((step, result))
        except Exception:
            # Run compensations in reverse order
            for step, result in reversed(completed):
                try:
                    step.compensation(context, result)
                except Exception:
                    # Compensations should be idempotent and logged loudly
                    log.exception("Compensation %s failed", step.name)
            raise  # Re-raise the original failure with its traceback intact


# Usage (the six step functions are your own application code)
saga = Saga([
    SagaStep("reserve_hotel", reserve_hotel, release_hotel),
    SagaStep("charge_card", charge_card, refund_card),
    SagaStep("book_flight", book_flight, cancel_flight),
])
saga.execute({"user_id": "u_123", "trip_id": "t_456"})
Orchestration vs. choreography
Two flavors:
- Orchestration (above): a central coordinator drives each step. Easier to reason about and debug, but the orchestrator becomes a critical service.
- Choreography: each service listens for events and decides what to do next. No central point, but the flow is implicit — to understand it, you have to read every subscriber.
Start with orchestration. Move to choreography only when the orchestrator becomes a bottleneck or you have genuinely independent teams owning each step.
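For contrast, here is a rough sketch of the same trip booking as a choreography. An in-process dictionary of subscribers stands in for a real broker, and the event names and handler functions are invented for illustration.

```python
from collections import defaultdict
from typing import Callable

# Minimal in-process event bus standing in for a real broker
subscribers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)
event_log: list[str] = []


def subscribe(event_type: str, handler: Callable[[dict], None]) -> None:
    subscribers[event_type].append(handler)


def publish(event_type: str, payload: dict) -> None:
    event_log.append(event_type)
    for handler in list(subscribers[event_type]):
        handler(payload)


# Each "service" reacts to the previous step's event and emits its own
def hotel_service(trip: dict) -> None:
    publish("hotel.reserved", trip)


def payment_service(trip: dict) -> None:
    publish("card.charged", trip)


def flight_service(trip: dict) -> None:
    if trip.get("flight_available", True):
        publish("flight.booked", trip)
    else:
        publish("flight.failed", trip)


# Compensations are ordinary subscribers too
def refund_card(trip: dict) -> None:
    publish("card.refunded", trip)


def release_hotel(trip: dict) -> None:
    publish("hotel.released", trip)


subscribe("trip.requested", hotel_service)
subscribe("hotel.reserved", payment_service)
subscribe("card.charged", flight_service)
subscribe("flight.failed", refund_card)
subscribe("card.refunded", release_hotel)
```

No single function describes the whole flow. It exists only in the `subscribe` calls, which in a real system are spread across separate codebases. That is the debugging cost of choreography.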
Tradeoffs
Sagas give you eventual consistency, not atomicity. There will be moments when a hotel is reserved but the card hasn't been charged yet — your system must tolerate that. Every compensation must be idempotent because it might be retried. And if a compensation itself fails, you need a manual intervention path; don't pretend the system can always self-heal.
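The retry and manual-intervention points can be sketched as a helper that retries each compensation a bounded number of times and reports the ones that never succeeded. `run_compensations` is a hypothetical helper, and the retry count is arbitrary.

```python
from typing import Callable


def run_compensations(
    compensations: list[tuple[str, Callable[[], None]]],
    max_attempts: int = 3,
) -> list[str]:
    """Run each compensation with retries; return the names that still need a human."""
    needs_manual_review = []
    for name, compensate in compensations:
        for attempt in range(1, max_attempts + 1):
            try:
                compensate()  # must be idempotent: it may run more than once
                break
            except Exception:
                if attempt == max_attempts:
                    needs_manual_review.append(name)
    return needs_manual_review
```

A production version would back off between attempts and persist the returned list somewhere an operator will see it, such as a ticket queue or an alert.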
5. Asynchronous Messaging
How it works
Synchronous calls couple availability: if Service B is down, Service A's request fails. Async messaging breaks that coupling. Service A publishes an event to a broker (Kafka, RabbitMQ, NATS, SQS), and Service B consumes it whenever it's ready.
# Producer
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers=["kafka:9092"],
    value_serializer=lambda v: json.dumps(v).encode(),
    acks="all",               # Wait for all in-sync replicas: durability over latency
    enable_idempotence=True,  # Prevents duplicates from producer retries (needs a client version that supports it)
)


def on_order_placed(order):
    producer.send("orders.placed", {
        "event_id": order.id,
        "user_id": order.user_id,
        "amount": order.amount,
        "timestamp": order.created_at.isoformat(),
    })


# Consumer (in a different service)
import json
import logging
from kafka import KafkaConsumer

log = logging.getLogger(__name__)

consumer = KafkaConsumer(
    "orders.placed",
    bootstrap_servers=["kafka:9092"],
    group_id="inventory-service",
    enable_auto_commit=False,  # Commit only after successful processing
    value_deserializer=lambda v: json.loads(v.decode()),
)

for message in consumer:
    try:
        reserve_inventory(message.value)  # Your own handler; it must be idempotent
        consumer.commit()
    except Exception:
        # Don't commit. Stop consuming so the message is re-read from the last
        # committed offset on restart; carrying on would let the next commit skip it.
        log.exception("Failed to process order")
        raise
When to use it
Reach for async when the consumer doesn't need to respond immediately, when you have multiple consumers for the same event (one event, many subscribers), or when you want to decouple deployment lifecycles. Order placement → inventory update, audit log, fraud check, email notification — all naturally async.
Tradeoffs
Async is harder to debug. A request that used to be one HTTP call is now a chain of events across topics, with retries, dead-letter queues, and ordering caveats. You need distributed tracing (OpenTelemetry) to follow a request across services, and you need to design for at-least-once delivery — consumers must be idempotent because the same message will be redelivered occasionally.
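A minimal sketch of an idempotent consumer, assuming every event carries a unique `event_id` as in the producer example. The in-memory set is for illustration only. A real consumer would record processed IDs in durable storage, ideally in the same transaction as the side effect.

```python
from typing import Callable


class IdempotentHandler:
    """Wraps a handler so each event_id takes effect at most once."""

    def __init__(self, handler: Callable[[dict], None]):
        self.handler = handler
        self._seen: set[str] = set()  # illustration only: use a durable store in production

    def handle(self, event: dict) -> bool:
        """Process the event unless it was already handled. Returns True if processed."""
        event_id = event["event_id"]
        if event_id in self._seen:
            return False  # duplicate delivery: skip silently
        self.handler(event)
        self._seen.add(event_id)  # record only after the handler succeeds
        return True
```

Because the ID is recorded only after the handler returns, a handler that raises leaves the event eligible for redelivery.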
6. Service Mesh
How it works
A service mesh (Istio, Linkerd, Consul Connect) puts a lightweight proxy (a "sidecar") next to every service instance. All traffic in and out flows through the sidecar, which handles mTLS, retries, timeouts, load balancing, observability, and traffic shifting — without any code in your service.
A typical Istio config that routes 10% of traffic to a new version of a service:
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: order-service
spec:
  hosts:
    - order-service
  http:
    - route:
        - destination:
            host: order-service
            subset: v1
          weight: 90
        - destination:
            host: order-service
            subset: v2
          weight: 10
When to use it
A service mesh starts paying for itself somewhere around 20–30 services, when the cost of reimplementing mTLS, retries, and observability in every service exceeds the operational cost of running the mesh. Below that, a good HTTP client library and a few Kubernetes primitives will do.
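As an example of what doing it in the client looks like, here is a sketch of bounded retries with exponential backoff and jitter, the kind of logic a mesh sidecar would otherwise apply for you. The function name and defaults are illustrative. Retrying like this is only safe for idempotent requests.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.1,
    max_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on any exception with capped exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))  # jitter avoids synchronized retry storms
    raise ValueError("max_attempts must be at least 1")
```

Maintaining a helper like this, plus mTLS and telemetry, consistently across every service and language is the cost a mesh removes once the service count grows.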
Tradeoffs
A mesh is operationally heavy. You're adding a proxy to every pod (memory, CPU, latency), a control plane to manage, and a new failure mode to debug. The benefit is that all your services get consistent security, retry behavior, and telemetry for free. The cost is real complexity, and it's not the first thing you should reach for.
Decision Matrix
| Problem | Pattern | When to skip it |
|---|---|---|
| Clients shouldn't know about your internal services | API Gateway | You have only one or two services |
| Service instances are dynamic | Service Discovery | You're on Kubernetes (you already have it) |
| Downstream failures shouldn't take you down | Circuit Breaker | You only make in-process calls |
| Multi-service transaction without distributed locks | Saga | The operation fits in a single service's database |
| Decouple producers from consumers | Async messaging | You need a synchronous response and can tolerate the coupling |
| Multiple read patterns over the same data | CQRS | One model serves all read patterns well |
| Consistent security, retries, telemetry across many services | Service Mesh | You have fewer than ~20 services |
| Real-time data pipeline across services | Event streaming (Kafka) | Your throughput fits in a queue |
Common Mistakes Across All Patterns
1. Splitting by technical layer, not business capability. "Database service," "API service," "validation service" is not microservices — it's a layered monolith spread across machines. Split along business boundaries (orders, payments, inventory), where each service owns a coherent capability and its data.
2. Sharing a database between services. The moment two services read or write the same tables, you've coupled their schemas and deployment lifecycles. They're now one service in two processes. Each service owns its data; everything else goes through its API.
3. Synchronous chains. Service A calls B calls C calls D, all synchronously. The latencies add up, the failure probabilities compound, and any one slow service freezes the whole chain. Break the chain with async messaging where you can.
4. No distributed tracing. Once you have more than three services, you cannot debug production without trace IDs that propagate through every call and event. Add OpenTelemetry on day one — retrofitting it later is painful.
5. Inconsistent error contracts. Each service returns errors in a different shape. Standardize early: pick one error format (RFC 9457 Problem Details, which obsoletes RFC 7807, is a good default) and enforce it across every service.
6. Microservices for a team of three. The cost of microservices is mostly organizational: independent deploys, on-call rotations, service ownership. If you have one team, a well-structured monolith ships faster and breaks less. Adopt microservices when your team structure starts forcing the split, not before.
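Mistake 3 is easy to quantify. Here is a small sketch, assuming each hop fails independently and the calls run strictly in sequence. The availability and latency figures are made up for illustration.

```python
import math


def chain_availability(availabilities: list[float]) -> float:
    """Availability of a synchronous chain: every hop must succeed."""
    return math.prod(availabilities)


def chain_latency_ms(latencies_ms: list[float]) -> float:
    """End-to-end latency of sequential calls: the hops add up."""
    return sum(latencies_ms)


# Four services in a chain, each 99.9% available and taking 50 ms
print(round(chain_availability([0.999] * 4), 4))  # 0.996
print(chain_latency_ms([50, 50, 50, 50]))         # 200
```

Four services that each manage 99.9% availability deliver roughly 99.6% as a chain, which is about four times the downtime of any single one.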
Conclusion
Microservices are a response to organizational scale, not a default architecture. The patterns above exist because distributed systems introduce problems monoliths don't have — partial failure, eventual consistency, network latency — and each pattern solves one of those problems at the cost of complexity somewhere else.
Pick the smallest set of patterns that solves your actual problems. Add the next one only when the pain of not having it exceeds the cost of operating it. A two-service system with an API gateway and good observability beats a six-service system with a mesh, a saga orchestrator, and a CQRS pipeline that the team can't fully explain.
To recap:
- API Gateway when clients shouldn't see your internal topology.
- Service Discovery when instances move (and use what your platform gives you).
- Circuit Breaker around every synchronous call you don't fully control.
- Saga when a transaction has to span services — orchestration first, choreography later.
- Async messaging to decouple lifecycles and fan out events.
- Service Mesh once the cost of reimplementing security and observability per-service exceeds the cost of running the mesh.
Start with the decision matrix. Read the source material for whichever pattern you adopt — Sam Newman's Building Microservices, Chris Richardson's microservices.io, and the original Netflix and Amazon engineering blogs are far more honest about tradeoffs than most vendor documentation.
I wrote this after seeing a team try to implement a Service Mesh for a system with only 3 services. What’s the most 'over-engineered' setup you’ve encountered lately?