DEV Community

Shoumik Chakravarty

Microservices Patterns

Microservices Patterns: A Practical Guide to Choosing the Right Architecture

Most teams don't fail at microservices because the pattern is wrong. They fail because they adopted all the patterns at once, before they understood which problems each one was actually solving.

I've watched teams ship a service mesh, an event bus, a saga orchestrator, and a CQRS read model in the same quarter, all for a system with only a handful of services and a small user base. The result is what's now called a distributed monolith: all the operational complexity of microservices with none of the independent deployability or scalability.

This guide walks through the patterns that matter, what each one is actually for, and when not to use it. Code examples are in Python, with a decision matrix at the end so you can match the pattern to your problem.


Microservices vs. Distributed Monolith

A real microservice has three properties:

  • Independently deployable. You can ship Service A without coordinating with the teams behind Services B, C, and D.
  • Owns its data. No other service reads or writes its database directly. The only way in is through its API.
  • Loosely coupled. A change to one service's internal schema doesn't break others.

If you can't deploy a service without redeploying three others, you don't have microservices — you have a monolith with network latency. That's almost always worse than the monolith you started with. Keep this test in mind as we go through each pattern: does this pattern preserve those three properties, or quietly erode them?


1. API Gateway

How it works

An API gateway is a single entry point that sits in front of your services and handles cross-cutting concerns: routing, authentication, rate limiting, request/response transformation, and aggregation.

# Simplified gateway routing logic
from fastapi import FastAPI, Request, Response, HTTPException
import httpx

app = FastAPI()

SERVICE_ROUTES = {
    "/users":   "http://user-service:8000",
    "/orders":  "http://order-service:8000",
    "/payments": "http://payment-service:8000",
}

def verify_jwt(auth_header: str) -> bool:
    # Placeholder: validate the token's signature and expiry with a JWT library
    raise NotImplementedError

@app.api_route("/{path:path}", methods=["GET", "POST", "PUT", "DELETE"])
async def gateway(path: str, request: Request):
    # 1. Authenticate once at the edge
    if not verify_jwt(request.headers.get("Authorization", "")):
        raise HTTPException(401, "Invalid token")

    # 2. Route to the right downstream service
    prefix = "/" + path.split("/")[0]
    upstream = SERVICE_ROUTES.get(prefix)
    if not upstream:
        raise HTTPException(404, "Unknown route")

    # 3. Proxy the request, keeping the query string
    async with httpx.AsyncClient() as client:
        upstream_response = await client.request(
            request.method,
            f"{upstream}/{path}",
            params=request.query_params.multi_items(),
            content=await request.body(),
            headers={k: v for k, v in request.headers.items() if k != "host"},
        )
    # 4. Relay the upstream status and body as-is; not every response is JSON
    return Response(
        content=upstream_response.content,
        status_code=upstream_response.status_code,
        media_type=upstream_response.headers.get("content-type"),
    )
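The gateway example leaves the body of verify_jwt to the reader. As a rough sketch of what that check involves, here is an HS256-only version built on the standard library. The SECRET value, the "Bearer" header format, and the sign_jwt helper are assumptions made for this illustration; a production gateway would more likely use a maintained JWT library and also validate issuer and audience.

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"change-me"  # Hypothetical shared secret; load from a secret store in practice


def _b64url_decode(segment: str) -> bytes:
    # JWT segments are base64url without padding; restore it before decoding
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


def sign_jwt(claims: dict, secret: bytes = SECRET) -> str:
    """Helper for this example only: build an HS256 token."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    signature = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url_encode(signature)}"


def verify_jwt(auth_header: str, secret: bytes = SECRET) -> bool:
    """Return True only for a well-formed, correctly signed, unexpired HS256 token."""
    scheme, _, token = auth_header.partition(" ")
    if scheme.lower() != "bearer" or token.count(".") != 2:
        return False
    header_b64, payload_b64, signature_b64 = token.split(".")
    try:
        header = json.loads(_b64url_decode(header_b64))
        claims = json.loads(_b64url_decode(payload_b64))
        signature = _b64url_decode(signature_b64)
    except ValueError:
        return False
    if not isinstance(header, dict) or not isinstance(claims, dict):
        return False
    # Pin the algorithm; never let the token choose it
    if header.get("alg") != "HS256":
        return False
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        return False
    # Reject tokens that have no expiry or are past it
    exp = claims.get("exp")
    return isinstance(exp, (int, float)) and exp > time.time()
```

Doing this once at the edge is the point of the pattern: downstream services can trust that a request reaching them already carried a valid token.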

When to use it

Use a gateway as soon as you have more than two or three services that share clients. It centralizes auth, TLS termination, and rate limiting so each service doesn't reimplement them. The example above is illustrative — for production you'll pick between two deployment models.

Serverless vs. server-based gateway

There are two practical ways to run a gateway, and the choice has bigger operational implications than the choice of vendor.

Serverless gateway (AWS API Gateway, Azure API Management consumption tier, Google Cloud API Gateway, Cloudflare Workers). The cloud provider runs the gateway, you describe routes and policies declaratively, and you pay per request. There are no servers to patch, no capacity to size, and it scales from zero to bursty traffic without you tuning anything. The downsides: per-request pricing gets expensive at high sustained throughput, cold starts add latency on the first request after idle, you're locked into the provider's auth/transform model, and customizing behavior beyond what the console exposes usually means writing Lambda authorizers or Workers — which is its own operational surface. Reach for this when traffic is spiky or modest, when your team is small, or when you're already deep in one cloud and want the integration.

Server-based gateway (Kong, Envoy, Traefik, NGINX, HAProxy, Spring Cloud Gateway). You run the gateway as long-lived processes, usually as pods in Kubernetes or on EC2 instances behind a load balancer. You get full control over plugins, request transformation, and observability, predictable cost at sustained scale (you're paying for capacity, not per request), and you can run the same gateway in any environment, including on-prem. The cost is real: you own patching, capacity planning, HA topology, and config rollout. Reach for this when you have sustained high traffic, complex routing or transformation needs, multi-cloud or hybrid requirements, or an existing platform team that already operates similar infrastructure.

# Kong declarative config example — server-based gateway
_format_version: "3.0"
services:
  - name: order-service
    url: http://order-service.default.svc.cluster.local:8000
    routes:
      - name: orders
        paths: ["/orders"]
    plugins:
      - name: jwt
      - name: rate-limiting
        config:
          minute: 100

A rough rule of thumb: under a few hundred requests per second with unpredictable traffic, the serverless model usually wins on total cost of ownership. Above that, with steady load, the server-based model wins on per-request cost and control. Whichever you pick, treat the gateway config as code — version it, review it, and deploy it through a pipeline.
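To make the rule of thumb concrete, the break-even point can be estimated with a few lines of arithmetic. The sketch below only compares billed cost; the price per million requests and the fixed monthly cost are made-up inputs, not vendor quotes, and engineering time (a large part of total cost of ownership) is left out.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # 2,592,000 seconds in a 30-day month


def serverless_monthly_cost(avg_rps: float, price_per_million: float) -> float:
    """Monthly bill under pure per-request pricing."""
    requests = avg_rps * SECONDS_PER_MONTH
    return requests / 1_000_000 * price_per_million


def break_even_rps(fixed_monthly_cost: float, price_per_million: float) -> float:
    """Average request rate at which per-request pricing equals the fixed cost."""
    break_even_requests = fixed_monthly_cost / price_per_million * 1_000_000
    return break_even_requests / SECONDS_PER_MONTH


# Hypothetical inputs for illustration only
PRICE_PER_MILLION = 3.50       # Per-request gateway price, per million requests
FIXED_MONTHLY_COST = 1_500.0   # Instances plus load balancer for a self-run gateway
```

With these assumed inputs, break_even_rps(FIXED_MONTHLY_COST, PRICE_PER_MILLION) comes out at roughly 165 requests per second averaged over the month. Plug in your own prices before drawing conclusions; the shape of the calculation matters more than these particular numbers.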

Tradeoffs

The gateway is a single point of failure and a deployment bottleneck regardless of which model you choose. Every team that needs a new route has to coordinate with whoever owns the gateway config. Keep the gateway dumb — routing, auth, rate limiting — and push business logic into the services themselves. The moment you find yourself writing if-statements about specific user IDs in the gateway, you've made it part of your application.


2. Service Discovery

How it works

In a microservices system, service instances come and go — autoscaling spins them up, deployments roll them, failures kill them. Hardcoding http://order-service-prod-3:8000 won't survive the first deploy. Service discovery solves this with a registry: services register themselves on startup, and clients ask the registry "where is the order service right now?"

import httpx
import random

class ServiceRegistry:
    """Client-side discovery against a registry such as Consul or etcd.

    The /v1/services/{name} endpoint and its response shape are illustrative;
    adapt them to your registry's actual API.
    """

    def __init__(self, registry_url: str):
        self.registry_url = registry_url
        self._cache: dict[str, list[str]] = {}

    async def resolve(self, service_name: str) -> str:
        if service_name not in self._cache:
            async with httpx.AsyncClient() as client:
                resp = await client.get(f"{self.registry_url}/v1/services/{service_name}")
                resp.raise_for_status()  # Don't cache an error response
                self._cache[service_name] = resp.json()["instances"]
        # Simple client-side load balancing
        return random.choice(self._cache[service_name])

# In practice, refresh the cache periodically and remove failed instances
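The closing comment in the example mentions refreshing the cache and removing failed instances. One way that could look is sketched below. The fetch callable is a hypothetical stand-in for the registry query, and the injectable clock exists only so the expiry logic can be exercised without waiting.

```python
import random
import time
from typing import Callable


class CachedResolver:
    """Client-side discovery cache with a TTL and eviction of failed instances."""

    def __init__(self, fetch: Callable[[str], list[str]], ttl_seconds: float = 10.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch    # Hypothetical callable that queries the registry
        self._ttl = ttl_seconds
        self._clock = clock
        # service name -> (expiry timestamp, known instances)
        self._cache: dict[str, tuple[float, list[str]]] = {}

    def resolve(self, service_name: str) -> str:
        entry = self._cache.get(service_name)
        # Refetch when there is no entry, it has expired, or every instance was evicted
        if entry is None or self._clock() >= entry[0] or not entry[1]:
            instances = list(self._fetch(service_name))
            if not instances:
                raise LookupError(f"No instances registered for {service_name}")
            entry = (self._clock() + self._ttl, instances)
            self._cache[service_name] = entry
        return random.choice(entry[1])

    def report_failure(self, service_name: str, instance: str) -> None:
        """Drop an instance the caller could not reach."""
        entry = self._cache.get(service_name)
        if entry and instance in entry[1]:
            entry[1].remove(instance)
```

A caller would invoke report_failure after a connection error so the next resolve avoids the dead instance. The TTL bounds how long a stale address can linger when no failure is ever reported.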

When to use it

You need service discovery the moment you have more than one instance of any service, or as soon as you're deploying to a dynamic environment (Kubernetes, ECS, Nomad). If you're on Kubernetes you get this for free: DNS resolves order-service.default.svc.cluster.local to the Service's stable virtual IP, and the platform forwards connections to pods that pass their readiness checks. Don't build your own unless you have a reason.

Tradeoffs

Client-side discovery (each service queries the registry) saves a network hop but spreads discovery logic everywhere. Server-side discovery (a load balancer fronts each service) is simpler for clients but adds a hop and another component to operate. On Kubernetes, the platform handles this, so don't fight it.


3. Circuit Breaker

How it works

When a downstream service starts failing, the worst thing your service can do is keep hammering it with retries. You exhaust your own threads, queue up requests that will never succeed, and the failure cascades upstream until your whole system is down.

A circuit breaker watches the outcome of calls to a downstream service. When failures cross a threshold (a failure rate in most libraries, a count of consecutive failures in the simple version below), it "opens": for a cooldown period, calls fail fast without ever hitting the downstream. After the cooldown, it becomes "half-open" and lets a trial request through; if that succeeds, the breaker closes again, and if it fails, the breaker reopens.

import time
from enum import Enum

class State(Enum):
    CLOSED = "closed"      # Normal operation
    OPEN = "open"          # Failing fast
    HALF_OPEN = "half_open"  # Testing recovery

class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, recovery_timeout: int = 30):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.failure_count = 0
        self.last_failure_time = 0.0
        self.state = State.CLOSED

    def call(self, func, *args, **kwargs):
        if self.state == State.OPEN:
            # time.monotonic() is unaffected by wall-clock adjustments
            if time.monotonic() - self.last_failure_time > self.recovery_timeout:
                self.state = State.HALF_OPEN
            else:
                raise RuntimeError("Circuit breaker is OPEN: failing fast")

        try:
            result = func(*args, **kwargs)
            self._on_success()
            return result
        except Exception:
            self._on_failure()
            raise

    def _on_success(self):
        self.failure_count = 0
        self.state = State.CLOSED

    def _on_failure(self):
        self.failure_count += 1
        self.last_failure_time = time.monotonic()
        # A failed trial call in HALF_OPEN reopens the breaker immediately
        if self.state == State.HALF_OPEN or self.failure_count >= self.failure_threshold:
            self.state = State.OPEN
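The class above trips on a count of consecutive failures. Tripping on a failure rate over a recent time window is a common alternative, and a sketch of that bookkeeping follows. The class name, the default window, and the min_calls guard are assumptions for illustration; the clock parameter is injectable purely to make the window testable.

```python
import time
from collections import deque
from typing import Callable


class FailureRateWindow:
    """Tracks call outcomes over a sliding time window."""

    def __init__(self, window_seconds: float = 30.0, min_calls: int = 10,
                 clock: Callable[[], float] = time.monotonic):
        self.window_seconds = window_seconds
        self.min_calls = min_calls  # Avoid tripping on one or two early failures
        self._clock = clock
        self._outcomes: deque = deque()  # Entries are (timestamp, succeeded)

    def record(self, succeeded: bool) -> None:
        self._outcomes.append((self._clock(), succeeded))
        self._evict()

    def failure_rate(self) -> float:
        self._evict()
        if len(self._outcomes) < self.min_calls:
            return 0.0  # Not enough data to judge
        failures = sum(1 for _, ok in self._outcomes if not ok)
        return failures / len(self._outcomes)

    def should_open(self, threshold: float = 0.5) -> bool:
        return self.failure_rate() >= threshold

    def _evict(self) -> None:
        # Drop outcomes that have fallen out of the window
        cutoff = self._clock() - self.window_seconds
        while self._outcomes and self._outcomes[0][0] < cutoff:
            self._outcomes.popleft()
```

A breaker would call record after each attempt and consult should_open where the simple version compares failure_count to failure_threshold. The rate-based check tolerates occasional errors under high traffic, where a consecutive-failure count may never trip even though a large share of calls is failing.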

When to use it

Use a circuit breaker around any synchronous call to a service you don't control fully — third-party APIs, services owned by another team, anything where a slow response from them shouldn't take you down. Pair it with timeouts (always set timeouts on outbound calls) and bulkheads (limit how many threads any one downstream can consume).
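Timeouts and bulkheads can be combined in a small wrapper. The sketch below assumes asyncio-based services, so the bulkhead caps concurrent coroutines instead of threads; the Bulkhead name and the choice to reject when full (as opposed to queueing) are illustrative decisions, not the only reasonable ones.

```python
import asyncio


class Bulkhead:
    """Caps concurrent calls to one downstream and bounds each call's duration."""

    def __init__(self, max_concurrent: int, timeout_seconds: float):
        self._semaphore = asyncio.Semaphore(max_concurrent)
        self._timeout = timeout_seconds

    async def call(self, func, *args, **kwargs):
        # Reject immediately when the compartment is full instead of queueing
        if self._semaphore.locked():
            raise RuntimeError("Bulkhead full: rejecting call")
        async with self._semaphore:
            # Raises asyncio.TimeoutError if the downstream is too slow
            return await asyncio.wait_for(func(*args, **kwargs), self._timeout)
```

Giving each downstream its own Bulkhead means a slow payment provider can exhaust only its own slots, and calls to other dependencies keep flowing.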

Tradeoffs

A circuit breaker shifts the failure from "slow timeout" to "fast error." That's almost always the right tradeoff, but your callers now need to handle the fast-fail case — usually with a fallback (cached data, a default response, a queued retry). Libraries like resilience4j, Polly, or Hystrix (now in maintenance mode — use resilience4j) implement this properly with metrics built in.


4. Saga — Distributed Transactions Without Distributed Locks

How it works

You can't run a single SQL transaction across two services' databases without a distributed commit protocol such as two-phase commit, which holds locks across services while it runs, and you don't want distributed locks. A saga replaces the ACID transaction with a sequence of local transactions, each of which has a defined compensating action that undoes it.

For example, booking a trip might be: reserve hotel → charge card → book flight. If the flight booking fails, the saga runs compensations in reverse: refund the card, release the hotel.

import logging
from dataclasses import dataclass
from typing import Callable

log = logging.getLogger(__name__)

@dataclass
class SagaStep:
    name: str
    action: Callable
    compensation: Callable

class Saga:
    def __init__(self, steps: list[SagaStep]):
        self.steps = steps

    def execute(self, context: dict):
        completed = []
        try:
            for step in self.steps:
                result = step.action(context)
                completed.append((step, result))
        except Exception as e:
            # Run compensations in reverse order
            for step, result in reversed(completed):
                try:
                    step.compensation(context, result)
                except Exception as comp_err:
                    # Compensations should be idempotent and logged loudly
                    log.error(f"Compensation {step.name} failed: {comp_err}")
            raise e

# Usage
saga = Saga([
    SagaStep("reserve_hotel", reserve_hotel, release_hotel),
    SagaStep("charge_card",   charge_card,   refund_card),
    SagaStep("book_flight",   book_flight,   cancel_flight),
])
saga.execute({"user_id": "u_123", "trip_id": "t_456"})

Orchestration vs. choreography

Two flavors:

  • Orchestration (above): a central coordinator drives each step. Easier to reason about and debug, but the orchestrator becomes a critical service.
  • Choreography: each service listens for events and decides what to do next. No central point, but the flow is implicit — to understand it, you have to read every subscriber.

Start with orchestration. Move to choreography only when the orchestrator becomes a bottleneck or you have genuinely independent teams owning each step.
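For contrast with the orchestrated version, here is a sketch of the same trip booking as choreography. The in-memory EventBus stands in for a real broker, and the event names (trip.requested, hotel.reserved, and so on) are invented for this example. Each handler represents a separate service that knows only which events it reacts to.

```python
from collections import defaultdict
from typing import Callable


class EventBus:
    """In-memory stand-in for a broker such as Kafka or RabbitMQ."""

    def __init__(self):
        self._subscribers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)
        self.log: list[str] = []  # Event names in publish order, for inspection

    def subscribe(self, event_name: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[event_name].append(handler)

    def publish(self, event_name: str, payload: dict) -> None:
        self.log.append(event_name)
        for handler in self._subscribers[event_name]:
            handler(payload)


def wire_trip_booking(bus: EventBus, flight_available: bool) -> None:
    # Forward path: each service reacts to the previous service's event
    def hotel_on_trip_requested(trip: dict) -> None:
        bus.publish("hotel.reserved", trip)

    def payment_on_hotel_reserved(trip: dict) -> None:
        bus.publish("card.charged", trip)

    def flight_on_card_charged(trip: dict) -> None:
        bus.publish("flight.booked" if flight_available else "flight.failed", trip)

    # Compensation path: also just reactions to events
    def payment_on_flight_failed(trip: dict) -> None:
        bus.publish("card.refunded", trip)

    def hotel_on_card_refunded(trip: dict) -> None:
        bus.publish("hotel.released", trip)

    bus.subscribe("trip.requested", hotel_on_trip_requested)
    bus.subscribe("hotel.reserved", payment_on_hotel_reserved)
    bus.subscribe("card.charged", flight_on_card_charged)
    bus.subscribe("flight.failed", payment_on_flight_failed)
    bus.subscribe("card.refunded", hotel_on_card_refunded)
```

Notice that no single function describes the whole flow. To learn what happens after flight.failed you have to find every subscriber, which is the debugging cost described above.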

Tradeoffs

Sagas give you eventual consistency, not atomicity. There will be moments when a hotel is reserved but the card hasn't been charged yet — your system must tolerate that. Every compensation must be idempotent because it might be retried. And if a compensation itself fails, you need a manual intervention path; don't pretend the system can always self-heal.
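An idempotent compensation usually comes down to recording that the action already happened under a stable key. The sketch below shows the idea for a refund; RefundService, the key format, and the in-memory dict (a stand-in for a durable table with a unique constraint) are all assumptions for illustration.

```python
class RefundService:
    """Hypothetical payment-side handler for the refund compensation."""

    def __init__(self):
        self._refunds: dict[str, float] = {}  # idempotency key -> refunded amount
        self.money_moved = 0.0

    def refund_card(self, idempotency_key: str, amount: float) -> str:
        # A retry with the same key reports the earlier outcome without refunding again
        if idempotency_key in self._refunds:
            return "already_refunded"
        self._refunds[idempotency_key] = amount
        self.money_moved += amount
        return "refunded"


service = RefundService()
key = "t_456:charge_card"  # For example, saga ID plus step name
first = service.refund_card(key, 120.0)
second = service.refund_card(key, 120.0)  # Simulated retry after a timeout
```

In a real service the lookup and the write need to be atomic, typically by inserting the key inside the same local transaction that performs the refund, so that two concurrent retries cannot both pass the check.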


5. Asynchronous Messaging

How it works

Synchronous calls couple availability: if Service B is down, Service A's request fails. Async messaging breaks that coupling. Service A publishes an event to a broker (Kafka, RabbitMQ, NATS, SQS), and Service B consumes it whenever it's ready.

# Producer
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers=["kafka:9092"],
    value_serializer=lambda v: json.dumps(v).encode(),
    acks="all",        # Wait for all replicas — durability over latency
    enable_idempotence=True,  # Prevents duplicate messages from retries
)

def on_order_placed(order):
    producer.send("orders.placed", {
        "event_id": order.id,
        "user_id": order.user_id,
        "amount": order.amount,
        "timestamp": order.created_at.isoformat(),
    })

# Consumer (in a different service)
import json
import logging
from kafka import KafkaConsumer

log = logging.getLogger(__name__)

consumer = KafkaConsumer(
    "orders.placed",
    bootstrap_servers=["kafka:9092"],
    group_id="inventory-service",
    enable_auto_commit=False,   # Commit only after successful processing
    value_deserializer=lambda v: json.loads(v.decode()),
)

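for message in consumer:
    try:
        reserve_inventory(message.value)
        consumer.commit()
    except Exception:
        # Skipping the commit is not enough on its own: the loop would move on,
        # and the next successful commit would cover this offset too.
        # Stop here so a restart resumes from the last committed offset
        # and the message is redelivered.
        log.exception("Failed to process order")
        raise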

When to use it

Reach for async when the consumer doesn't need to respond immediately, when you have multiple consumers for the same event (one event, many subscribers), or when you want to decouple deployment lifecycles. Order placement → inventory update, audit log, fraud check, email notification — all naturally async.

Tradeoffs

Async is harder to debug. A request that used to be one HTTP call is now a chain of events across topics, with retries, dead-letter queues, and ordering caveats. You need distributed tracing (OpenTelemetry) to follow a request across services, and you need to design for at-least-once delivery — consumers must be idempotent because the same message will be redelivered occasionally.
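A consumer designed for at-least-once delivery typically de-duplicates on an event ID and gives up on messages that keep failing. The wrapper below is a minimal sketch of both ideas. The in-memory set and list are stand-ins for a durable store and a dead-letter topic, and the class name and return values are invented for this example.

```python
from typing import Callable


class IdempotentConsumer:
    """Wraps a handler with event_id de-duplication and a dead-letter fallback."""

    def __init__(self, handler: Callable[[dict], None], max_attempts: int = 3):
        self._handler = handler
        self._max_attempts = max_attempts
        self._processed: set[str] = set()     # Stand-in for a durable store
        self._attempts: dict[str, int] = {}
        self.dead_letters: list[dict] = []    # Stand-in for a dead-letter topic

    def handle(self, event: dict) -> str:
        event_id = event["event_id"]
        if event_id in self._processed:
            return "duplicate"  # Redelivery: acknowledge and do nothing
        try:
            self._handler(event)
        except Exception:
            attempts = self._attempts.get(event_id, 0) + 1
            self._attempts[event_id] = attempts
            if attempts >= self._max_attempts:
                # Park the message so it stops blocking the ones behind it
                self.dead_letters.append(event)
                self._processed.add(event_id)
                return "dead_lettered"
            return "retry"
        self._processed.add(event_id)
        return "processed"
```

The caller would commit the offset for every outcome except "retry". Dead-lettered events still need someone to look at them, so alert on that queue.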


6. Service Mesh

How it works

A service mesh (Istio, Linkerd, Consul Connect) puts a lightweight proxy (a "sidecar") next to every service instance. All traffic in and out flows through the sidecar, which handles mTLS, retries, timeouts, load balancing, observability, and traffic shifting — without any code in your service.

A typical Istio config that routes 10% of traffic to a new version of a service (the v1 and v2 subsets are assumed to be defined in a matching DestinationRule):

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: order-service
spec:
  hosts:
    - order-service
  http:
    - route:
        - destination:
            host: order-service
            subset: v1
          weight: 90
        - destination:
            host: order-service
            subset: v2
          weight: 10
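The effect of those weights can be pictured as a weighted random choice per request. The simulation below is only an illustration of that idea in Python; it is not how the sidecar proxy implements traffic splitting, and the function names are made up for this sketch.

```python
import random
from collections import Counter

ROUTES = [("v1", 90), ("v2", 10)]  # Mirrors the subset weights in the VirtualService


def pick_subset(rng: random.Random, routes=ROUTES) -> str:
    """Choose a destination subset with probability proportional to its weight."""
    subsets = [name for name, _ in routes]
    weights = [weight for _, weight in routes]
    return rng.choices(subsets, weights=weights, k=1)[0]


def simulate(requests: int, seed: int = 0) -> Counter:
    """Count how many simulated requests land on each subset."""
    rng = random.Random(seed)
    return Counter(pick_subset(rng) for _ in range(requests))
```

Over a large number of requests the share reaching v2 settles near 10%, but any individual user may hit either version, which is why canary releases need both versions to be compatible with the same data and clients.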

When to use it

A service mesh starts paying for itself somewhere around 20–30 services, when the cost of reimplementing mTLS, retries, and observability in every service exceeds the operational cost of running the mesh. Below that, a good HTTP client library and a few Kubernetes primitives will do.

Tradeoffs

A mesh is operationally heavy. You're adding a proxy to every pod (memory, CPU, latency), a control plane to manage, and a new failure mode to debug. The benefit is that all your services get consistent security, retry behavior, and telemetry for free. The cost is real complexity, and it's not the first thing you should reach for.


Decision Matrix

Problem | Pattern | When to skip it
Clients shouldn't know about your internal services | API Gateway | You have only one or two services
Service instances are dynamic | Service Discovery | You're on Kubernetes (you already have it)
Downstream failures shouldn't take you down | Circuit Breaker | You only make in-process calls
Multi-service transaction without distributed locks | Saga | The operation fits in a single service's database
Decouple producers from consumers | Async messaging | You need a synchronous response and can tolerate the coupling
Multiple read patterns over the same data | CQRS | One model serves all read patterns well
Consistent security, retries, telemetry across many services | Service Mesh | You have fewer than ~20 services
Real-time data pipeline across services | Event streaming (Kafka) | Your throughput fits in a queue

Common Mistakes Across All Patterns

1. Splitting by technical layer, not business capability. "Database service," "API service," "validation service" is not microservices — it's a layered monolith spread across machines. Split along business boundaries (orders, payments, inventory), where each service owns a coherent capability and its data.

2. Sharing a database between services. The moment two services read or write the same tables, you've coupled their schemas and deployment lifecycles. They're now one service in two processes. Each service owns its data; everything else goes through its API.

3. Synchronous chains. Service A calls B calls C calls D, all synchronously. The latency multiplies, the failure modes compound, and any one slow service freezes the whole chain. Break the chain with async messaging where you can.

4. No distributed tracing. Once you have more than three services, you cannot debug production without trace IDs that propagate through every call and event. Add OpenTelemetry on day one — retrofitting it later is painful.

5. Inconsistent error contracts. Each service returns errors in a different shape. Standardize early — pick one error format (RFC 7807 Problem Details is a good default) and enforce it across every service.

6. Microservices for a team of three. The cost of microservices is mostly organizational: independent deploys, on-call rotations, service ownership. If you have one team, a well-structured monolith ships faster and breaks less. Adopt microservices when your team structure starts forcing the split, not before.
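For the error-contract point (mistake 5), a shared helper is often enough to keep every service on the same shape. The sketch below builds an RFC 7807 problem object (the RFC has since been updated by RFC 9457, which keeps the same core members). The helper name, the example type URI, and the trace_id extension are assumptions for illustration.

```python
import json
from typing import Optional

PROBLEM_CONTENT_TYPE = "application/problem+json"


def problem_details(status: int, title: str, detail: Optional[str] = None,
                    type_uri: str = "about:blank", instance: Optional[str] = None,
                    **extensions) -> dict:
    """Build a problem object; optional members are omitted when not provided."""
    # Extensions go first so they can never overwrite the standard members
    problem = {**extensions, "type": type_uri, "title": title, "status": status}
    if detail is not None:
        problem["detail"] = detail
    if instance is not None:
        problem["instance"] = instance
    return problem


body = json.dumps(problem_details(
    status=409,
    title="Insufficient inventory",
    detail="Only 2 units of SKU-1 are available.",
    type_uri="https://example.com/problems/insufficient-inventory",  # Hypothetical URI
    instance="/orders/o_789",
    trace_id="abc123",  # Extension member linking the error to a trace
))
```

Serving this body with the PROBLEM_CONTENT_TYPE header from every service lets clients parse errors once, and carrying a trace ID as an extension ties the error contract to the tracing advice in mistake 4.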


Conclusion

Microservices are a response to organizational scale, not a default architecture. The patterns above exist because distributed systems introduce problems monoliths don't have — partial failure, eventual consistency, network latency — and each pattern solves one of those problems at the cost of complexity somewhere else.

Pick the smallest set of patterns that solves your actual problems. Add the next one only when the pain of not having it exceeds the cost of operating it. A two-service system with an API gateway and good observability beats a six-service system with a mesh, a saga orchestrator, and a CQRS pipeline that the team can't fully explain.

To recap:

  • API Gateway when clients shouldn't see your internal topology.
  • Service Discovery when instances move (and use what your platform gives you).
  • Circuit Breaker around every synchronous call you don't fully control.
  • Saga when a transaction has to span services — orchestration first, choreography later.
  • Async messaging to decouple lifecycles and fan out events.
  • Service Mesh once the cost of reimplementing security and observability per-service exceeds the cost of running the mesh.

Start with the decision matrix. Read the source material for whichever pattern you adopt — Sam Newman's Building Microservices, Chris Richardson's microservices.io, and the original Netflix and Amazon engineering blogs are far more honest about tradeoffs than most vendor documentation.


I wrote this after seeing a team try to implement a Service Mesh for a system with only 3 services. What’s the most 'over-engineered' setup you’ve encountered lately?

Top comments (1)

Shoumik Chakravarty

Which of these patterns do you think is the biggest 'trap' for teams new to microservices? I’d love to hear your 'distributed monolith' horror stories in the comments!