Covers: Client-Side vs Server-Side Discovery, Service Registries, Service Mesh (Istio/Envoy), Kubernetes DNS
The Problem That Didn't Exist in the Monolith
In a monolith, calling another module is simple: OrderService.create(data). It's a function call. The compiler resolves the address. It always works (assuming the code compiles).
In microservices, "calling another service" raises a question first: where is it, right now, on the network?
This sounds trivial until you consider what's actually happening in a production environment:
- Services run on dynamically allocated IPs (containers get new IPs every restart)
- Services scale up and down constantly (auto-scaling adds/removes instances every few minutes)
- Services deploy multiple times per day (new versions get new instances)
- A single logical service might have 50 running instances across multiple servers
Hardcoding IP addresses is impossible. Even a config file with IPs would be stale within minutes. This is the problem service discovery solves.
The Two Models of Service Discovery
Client-Side Discovery
The calling service queries a service registry directly, gets a list of healthy instances, and load-balances between them itself.
Order Service wants to call Payment Service:
1. Order Service → Service Registry: "Where is Payment Service?"
2. Service Registry → returns: [10.0.1.5:8080, 10.0.1.6:8080, 10.0.1.7:8080]
3. Order Service → picks one (round-robin/random) → 10.0.1.6:8080
4. Order Service → calls Payment Service directly at 10.0.1.6:8080
┌──────────────┐ 1. "Where's Payment Service?" ┌──────────────┐
│ Order Service │ ───────────────────────────────────► │ Registry │
│ │ ◄─────────────────────────────────── │ (Eureka) │
│ │ 2. [list of healthy instances] └──────────────┘
│ │
│ │ 3. Direct call (load-balanced ┌──────────────┐
│ │ ───── client-side) ──────────────────►│Payment Service│
└──────────────┘ │ (instance 2) │
└──────────────┘
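The four steps above can be sketched in a few lines of Java. This is a minimal illustration, not how Eureka or Ribbon are implemented: the class name RoundRobinClient is hypothetical, and the registry is assumed to be a plain in-memory map standing in for a real registry lookup.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a registry lookup plus a client-side picker.
final class RoundRobinClient {
    // Assumption: service name -> currently healthy "host:port" instances.
    private final Map<String, List<String>> registry;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinClient(Map<String, List<String>> registry) {
        this.registry = registry;
    }

    // Steps 1-3: look up the instance list, then rotate through it.
    String pick(String serviceName) {
        List<String> instances = registry.getOrDefault(serviceName, List.of());
        if (instances.isEmpty()) {
            throw new IllegalStateException("no healthy instances for " + serviceName);
        }
        // floorMod keeps the index valid even if the counter overflows.
        int index = Math.floorMod(next.getAndIncrement(), instances.size());
        return instances.get(index);
    }
}
```

The caller would then open a connection to whatever address pick returns (step 4). A real client would also cache the instance list and refresh it periodically rather than querying the registry on every call.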
Real example: Netflix Eureka
Every service registers itself with Eureka on startup:
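import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.netflix.eureka.EnableEurekaClient;
@EnableEurekaClient
@SpringBootApplication
public class PaymentServiceApplication {
    // On startup, this service registers with Eureka:
    // "I'm payment-service, I'm at 10.0.1.6:8080, I'm healthy"
    public static void main(String[] args) {
        SpringApplication.run(PaymentServiceApplication.class, args);
    }
}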
Other services query Eureka and use Ribbon (Netflix's client-side load balancer, since superseded in Spring Cloud by Spring Cloud LoadBalancer) to pick an instance and call it directly.
Advantages:
- No extra network hop (client calls the service directly)
- Client has full control over load-balancing strategy
Disadvantages:
- Every service needs discovery client logic — coupling every service to the registry's API and SDK
- Multi-language environments need a discovery library for each language
Server-Side Discovery
The calling service makes a request to a load balancer, which queries the registry and routes the request. The caller never sees individual instance addresses.
Order Service wants to call Payment Service:
1. Order Service → calls "payment-service.internal" (a fixed name)
2. Load Balancer → queries registry for healthy Payment instances
3. Load Balancer → routes to one instance
4. Response flows back through the Load Balancer to Order Service
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Order Service │ ──── "payment- │ Load Balancer │ ── queries ──────►│ Registry │
│ │ service" ─────►│ (AWS ALB) │ ◄── instance list─│ (AWS ECS) │
└──────────────┘ └───────┬──────┘ └──────────────┘
│
▼
┌──────────────┐
│Payment Service│
│ (instance 2)│
└──────────────┘
Real example: AWS ALB + ECS
ECS (container orchestration) automatically registers/deregisters container instances with the ALB's target group as they start/stop. The Order Service simply calls a fixed DNS name — payment-service.internal — and AWS handles everything else.
Advantages:
- Calling services need zero discovery logic — just call a fixed name
- Language-agnostic — works the same for Java, Python, Go, anything
- Centralized load-balancing logic, easier to update
Disadvantages:
- Extra network hop (through the load balancer)
- The load balancer itself must be highly available
Service Registry: The Source of Truth
Whichever model you use, there's a registry maintaining the live list of service instances. Popular implementations:
Consul (HashiCorp):
- Service registration via agent on each host
- Built-in health checking
- DNS and HTTP interfaces for querying
- Multi-datacenter support
etcd:
- Distributed key-value store (also used as Kubernetes' backing store)
- Services write their address to a key; watchers detect changes
- Strongly consistent (uses Raft consensus)
ZooKeeper:
- One of the oldest solutions (used by Kafka, Hadoop for coordination)
- Strong consistency guarantees
- More operationally complex than Consul/etcd
The registration lifecycle:
1. Service instance starts up
2. Registers itself: "I'm payment-service-7, at 10.0.1.6:8080, healthy"
3. Periodically sends heartbeats: "still alive"
4. Registry monitors heartbeats
5. If heartbeats stop (instance crashed) → registry marks instance unhealthy
6. After grace period → instance removed from registry entirely
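To make the lifecycle concrete, here is a minimal in-memory sketch of heartbeat tracking. It is not how Consul, etcd, or ZooKeeper work internally; the class HeartbeatRegistry and its two thresholds are hypothetical, and the current time is passed in explicitly so the behaviour is easy to follow.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry that tracks only the last heartbeat per instance.
final class HeartbeatRegistry {
    private final long unhealthyAfterMs; // silence before an instance counts as unhealthy
    private final long removeAfterMs;    // silence (grace period) before it is dropped
    private final Map<String, Long> lastHeartbeat = new ConcurrentHashMap<>();

    HeartbeatRegistry(long unhealthyAfterMs, long removeAfterMs) {
        this.unhealthyAfterMs = unhealthyAfterMs;
        this.removeAfterMs = removeAfterMs;
    }

    // Steps 2 and 3: registering and heartbeating both refresh the timestamp.
    void heartbeat(String address, long nowMs) {
        lastHeartbeat.put(address, nowMs);
    }

    // Step 5: only instances heard from recently are handed to callers.
    List<String> healthy(long nowMs) {
        List<String> result = new ArrayList<>();
        lastHeartbeat.forEach((address, seen) -> {
            if (nowMs - seen <= unhealthyAfterMs) {
                result.add(address);
            }
        });
        Collections.sort(result); // stable order for callers
        return result;
    }

    // Step 6: instances silent for longer than the grace period are removed.
    void evictExpired(long nowMs) {
        lastHeartbeat.values().removeIf(seen -> nowMs - seen > removeAfterMs);
    }

    int size() {
        return lastHeartbeat.size();
    }
}
```

Separating "unhealthy" from "removed" mirrors steps 5 and 6: an instance that misses a few heartbeats stops receiving traffic quickly, but its entry survives long enough to recover from a brief network blip.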
Deregistration on graceful shutdown:
1. Instance receives SIGTERM (shutdown signal)
2. Instance explicitly deregisters from registry FIRST
3. Instance finishes in-flight requests (connection draining)
4. Instance exits
→ Other services stop routing new requests to it immediately,
rather than waiting for heartbeat timeout (which could take 30+ seconds)
This distinction matters a lot in interviews: graceful shutdown deregisters the instance immediately, while a crash is only detected once heartbeats time out, and that gap determines how quickly your system "heals" after instance churn.
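The graceful-shutdown ordering can be sketched as follows. This is an illustration under assumptions: RegistryClient and GracefulShutdown are hypothetical names, and the drain step is modelled as a Runnable that blocks until in-flight requests have finished.

```java
// Hypothetical collaborator: whatever client talks to the service registry.
interface RegistryClient {
    void deregister(String instanceId);
}

final class GracefulShutdown {
    private final RegistryClient registry;
    private final String instanceId;
    private final Runnable drainInFlight; // assumed to block until in-flight requests finish

    GracefulShutdown(RegistryClient registry, String instanceId, Runnable drainInFlight) {
        this.registry = registry;
        this.instanceId = instanceId;
        this.drainInFlight = drainInFlight;
    }

    // The order is the whole point: stop new traffic first, then finish old traffic.
    void run() {
        registry.deregister(instanceId); // step 2
        drainInFlight.run();             // step 3
    }

    // JVM shutdown hooks run when the process receives SIGTERM (step 1).
    void install() {
        Runtime.getRuntime().addShutdownHook(new Thread(this::run));
    }
}
```

Calling install() once at startup is enough; the JVM exits (step 4) after the hook returns. If the two calls in run() were swapped, the instance would keep receiving new requests while it was trying to drain.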
Kubernetes: Service Discovery Built In
If you're running Kubernetes, you largely don't think about service discovery — it's built into the platform via DNS.
# Define a Service: a stable name for a set of pods
apiVersion: v1
kind: Service
metadata:
  name: payment-service
spec:
  selector:
    app: payment   # Matches pods with label app=payment
  ports:
    - port: 8080
Any pod in the same namespace can now call:
http://payment-service:8080
(from another namespace, use the full name payment-service.<namespace>.svc.cluster.local)
Kubernetes DNS (CoreDNS) resolves "payment-service"
→ to the Service's virtual IP (ClusterIP)
→ kube-proxy load-balances to one of the matching pod IPs
How it works under the hood:
- Kubernetes maintains a list of "Endpoints" — the actual pod IPs matching the Service's selector
- As pods are created/destroyed (scaling, deployments, crashes), the Endpoints list updates automatically
- kube-proxy on each node maintains iptables/IPVS rules that load-balance traffic to current Endpoints
- DNS resolution + load balancing happens transparently: application code just calls http://payment-service:8080
This is server-side discovery, fully managed by the platform. It's a major reason Kubernetes became the dominant orchestration platform — service discovery, one of the hardest microservices problems, is solved by default.
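The selector-to-Endpoints relationship can be illustrated with a small sketch. It is a deliberate simplification and not Kubernetes code: Pod and EndpointsResolver are hypothetical names, and a pod is reduced to an IP plus a label map.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical, heavily simplified pod model: just an IP and its labels.
final class Pod {
    final String ip;
    final Map<String, String> labels;

    Pod(String ip, Map<String, String> labels) {
        this.ip = ip;
        this.labels = labels;
    }
}

final class EndpointsResolver {
    private EndpointsResolver() {}

    // A pod backs the Service when its labels include every selector entry.
    static List<String> endpoints(Map<String, String> selector, List<Pod> pods) {
        return pods.stream()
                .filter(pod -> pod.labels.entrySet().containsAll(selector.entrySet()))
                .map(pod -> pod.ip)
                .collect(Collectors.toList());
    }
}
```

Re-running endpoints whenever the pod list changes is the essence of "the Endpoints list updates automatically": the Service name stays fixed while the set of IPs behind it is recomputed.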
Service Mesh: Discovery Is Just the Beginning
Once you have many services, you face a recurring set of cross-cutting problems for every service-to-service call:
- How do I discover the target service? (discovery)
- Is this connection encrypted? (mTLS)
- What if the call fails — retry? How many times?
- What if the target is overloaded — circuit break?
- How do I trace this request across services?
- How do I roll out a new version to 5% of traffic first (canary)?
Implementing all of this inside every service's application code means every team reimplements (or imports a library for) the same logic, in every language they use.
A service mesh moves all of this into infrastructure — typically a sidecar proxy deployed alongside every service instance.
┌─────────────────────────┐ ┌─────────────────────────┐
│ Order Service Pod │ │ Payment Service Pod │
│ ┌───────────┐ ┌───────┐│ │┌───────┐ ┌───────────┐ │
│ │ Order │ │ Envoy ││ ││ Envoy │ │ Payment │ │
│ │ Container │◄┤Sidecar├┼─────┼┤Sidecar│◄─┤ Container │ │
│ └───────────┘ └───────┘│ │└───────┘ └───────────┘ │
└─────────────────────────┘ └─────────────────────────┘
Application code never talks to network directly —
Envoy sidecar intercepts ALL traffic in and out
Every request from Order Service to Payment Service actually goes:
Order Container → Order's Envoy sidecar → Payment's Envoy sidecar → Payment Container
The application code is unaware — it just makes a normal HTTP call to localhost or a service name. The sidecar handles everything else.
What Istio/Envoy Handles Transparently
mTLS (mutual TLS):
Every connection between services is automatically encrypted and authenticated — without any application code changes. Each service gets a cryptographic identity.
Retries with backoff:
# Istio VirtualService config (fragment); no app code changes needed
retries:
  attempts: 3
  perTryTimeout: 2s
  retryOn: 5xx,connect-failure
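To show what the sidecar is doing on the application's behalf, here is a rough Java sketch of the retry rule. It is not Envoy's implementation: MeshRetrySketch is a hypothetical name, the call is modelled as a function returning an HTTP status code, and the per-try timeout and the delay between tries are left out. It assumes attempts counts retries after the initial call, which is how Istio documents the field.

```java
import java.util.function.IntSupplier;

// Hypothetical model of "retry on 5xx, up to N retries".
final class MeshRetrySketch {
    private final int retries; // extra tries allowed after the first call

    MeshRetrySketch(int retries) {
        this.retries = retries;
    }

    // Invokes the call until it returns a non-5xx status or retries run out.
    // Returns the last status observed.
    int execute(IntSupplier call) {
        int status = call.getAsInt();
        for (int used = 0; used < retries && status >= 500; used++) {
            status = call.getAsInt();
        }
        return status;
    }
}
```

With three retries, a target that keeps failing is called four times in total before the error is returned to the caller, which is worth remembering when estimating the load that retries add to an already struggling service.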
Circuit breaking:
trafficPolicy:
  connectionPool:
    http:
      maxRequestsPerConnection: 10
  outlierDetection:
    consecutive5xxErrors: 5
    interval: 30s
    baseEjectionTime: 30s
# After 5 consecutive 5xx errors, eject this instance for 30s
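The ejection rule in that comment can be sketched as a small state machine. This is a simplified model, not Envoy's algorithm: OutlierDetector is a hypothetical class, it ejects as soon as the threshold is reached rather than on a periodic scan, and it always ejects for the same fixed duration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-instance tracker for "eject after N consecutive 5xx".
final class OutlierDetector {
    private final int threshold;   // plays the role of consecutive5xxErrors
    private final long ejectionMs; // plays the role of baseEjectionTime
    private final Map<String, Integer> streak = new HashMap<>();
    private final Map<String, Long> ejectedUntil = new HashMap<>();

    OutlierDetector(int threshold, long ejectionMs) {
        this.threshold = threshold;
        this.ejectionMs = ejectionMs;
    }

    // Record the outcome of one request to the given instance.
    void record(String instance, int status, long nowMs) {
        if (status < 500) {
            streak.remove(instance); // any success resets the streak
            return;
        }
        int count = streak.merge(instance, 1, Integer::sum);
        if (count >= threshold) {
            ejectedUntil.put(instance, nowMs + ejectionMs);
            streak.remove(instance);
        }
    }

    // Ejected instances are skipped by the load balancer until the time passes.
    boolean isAvailable(String instance, long nowMs) {
        Long until = ejectedUntil.get(instance);
        return until == null || nowMs >= until;
    }
}
```

The key property is "consecutive": an instance that fails occasionally but mostly succeeds never reaches the threshold, so only instances that are persistently broken get removed from rotation.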
Traffic splitting (canary deployments):
http:
  - route:
      - destination:
          host: payment-service
          subset: v1
        weight: 90
      - destination:
          host: payment-service
          subset: v2   # new version
        weight: 10     # 10% of traffic to test the new version
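Weighted routing of this kind boils down to mapping a random number onto cumulative weight ranges. The sketch below is an illustration only: WeightedRouter is a hypothetical class, weights are assumed to sum to 100, and the random roll is passed in so the mapping is deterministic.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical weighted picker: subset name -> share of traffic out of 100.
final class WeightedRouter {
    private final Map<String, Integer> weights;

    WeightedRouter(Map<String, Integer> weights) {
        // Copy into a LinkedHashMap so iteration order is predictable.
        this.weights = new LinkedHashMap<>(weights);
    }

    // roll is expected in [0, 100), e.g. ThreadLocalRandom.current().nextInt(100).
    String route(int roll) {
        if (roll < 0) {
            throw new IllegalArgumentException("roll out of range: " + roll);
        }
        int cumulative = 0;
        for (Map.Entry<String, Integer> entry : weights.entrySet()) {
            cumulative += entry.getValue();
            if (roll < cumulative) {
                return entry.getKey();
            }
        }
        throw new IllegalArgumentException("roll out of range: " + roll);
    }
}
```

With weights of 90 and 10, rolls 0 through 89 land on v1 and rolls 90 through 99 land on v2. Shifting the canary from 10% to 50% is then purely a configuration change to the weights, with no redeploy of either version.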
Distributed tracing:
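Every sidecar automatically generates spans and reports them to Jaeger/Zipkin. The one thing the application must still do is forward the incoming trace headers (for example x-request-id and the B3 or W3C traceparent headers) on its outbound calls; otherwise the mesh cannot stitch the spans into a single trace.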
Service Mesh vs API Gateway: The Confusion Cleared Up
These get confused constantly. Here's the clean distinction:
External traffic
│
▼
┌──────────────┐
│ API Gateway │ ← North-South traffic
│ (Kong, ALB) │ (outside world → your cluster)
└──────┬───────┘
│
┌─────────────┼─────────────┐
▼ ▼ ▼
[Service A]──►[Service B]──►[Service C]
↑─────────────↑─────────────↑
Service Mesh (Istio) ← East-West traffic
(service ←→ service, (inside your cluster)
all sidecar-mediated)
API Gateway: Handles North-South traffic — requests entering your system from outside (browsers, mobile apps, partner integrations). Concerns: public authentication, rate limiting per API key, request transformation for external contracts.
Service Mesh: Handles East-West traffic — requests between your internal services. Concerns: mTLS, internal retries/circuit breaking, service-to-service authorization, internal observability.
They're complementary, not competing. A request might pass through the API Gateway once (entering the system) and then through the service mesh multiple times (as it's processed by several internal services).
The Cost of a Service Mesh
Service meshes solve real problems, but they're not free:
Resource overhead: Every pod now runs an extra sidecar container — additional CPU/memory per service instance. At thousands of pods, this is a meaningful infrastructure cost.
Latency overhead: Every call now passes through two sidecars (sender's and receiver's) instead of going directly. This typically adds 1-3ms per hop, which is usually negligible but compounds across deep call chains: a request that passes through five sequential service calls picks up roughly 5-15ms.
Operational complexity: Istio itself is a complex distributed system. Debugging "why is this request slow" now involves understanding sidecar configuration, not just application code.
The honest guidance: Service meshes make sense when you have dozens to hundreds of services and the cross-cutting concerns (mTLS, retries, observability) are genuinely painful to implement per-service. For 5-10 services, the operational cost of running Istio often exceeds the benefit — application-level libraries (like Resilience4j for circuit breaking, covered in Topic 18) may be simpler.
Interview Scenario: "Client-Side vs Server-Side Discovery — Which Would You Choose?"
"It depends on the team's technology diversity and operational maturity. If the organization is running Kubernetes, server-side discovery via Kubernetes Services is essentially free — DNS-based, language-agnostic, and requires zero application code. I'd default to that.
Client-side discovery (like Eureka) made more sense in the pre-Kubernetes era, or in environments without a unified orchestration platform, because it avoids the extra network hop through a load balancer. But it requires every service — in every language — to integrate a discovery client, which becomes a maintenance burden in polyglot environments.
For the broader cross-cutting concerns — retries, circuit breaking, mTLS — I'd evaluate whether a service mesh is justified by the number of services. Below ~10-15 services, I'd handle these concerns with application-level libraries. Beyond that, the consistency and language-agnostic benefits of a service mesh like Istio typically outweigh its operational and latency overhead."
Key Takeaways
- Service discovery solves the problem of finding service instances in a dynamic environment where IPs change constantly.
- Client-side discovery (Eureka): caller queries registry, load-balances itself. No extra hop, but requires discovery libraries per language.
- Server-side discovery (AWS ALB, Kubernetes Services): caller hits a fixed name, infrastructure routes. Language-agnostic, adds one hop.
- Kubernetes provides server-side discovery via DNS automatically — a major reason for its dominance.
- Service mesh (Istio/Envoy) moves cross-cutting concerns — mTLS, retries, circuit breaking, tracing, canary routing — into sidecar proxies, out of application code.
- API Gateway handles North-South traffic (external → internal). Service Mesh handles East-West traffic (internal → internal). They're complementary.
- Service meshes add real overhead (resources, latency, complexity) — justify their use by service count and operational pain, not by trend-following.
What's Next
Topic 18 closes Day 6 with Fault Tolerance Patterns — Circuit Breakers, Retries with Exponential Backoff and Jitter, Bulkheads, and Timeouts. The patterns that determine whether a single failing service takes down your entire platform, or fails gracefully and recovers on its own.
Tags: system-design microservices service-mesh kubernetes backend distributed-systems interview-prep