Gokul G.K.

Posted on Dec 28, 2025

Let's talk about service discovery, service registry, and service mesh.

#microservices #serviceregistry #servicemesh #servicediscovery

How Do Our Microservices Find Each Other and Communicate With Each Other?

Imagine a system with over 50 microservices, each running on a different machine.
Instances scale up and down, IPs change, services crash and restart.

Yet somehow, everything still works.
So the natural questions arise:

How do microservices find each other?
How do they know where an upstream service is running?
How do they communicate securely and reliably at scale?

To answer these questions, we need to understand three key concepts:

Service Registry
Service Discovery
Service Mesh

The Core Problem

In a distributed microservices system:

Services don’t have fixed IP addresses
Instances are created and destroyed dynamically
Hardcoding endpoints is impractical
Manual configuration does not scale

So we need a dynamic, automated way for services to:

Know what services exist
Find where they are currently running
Communicate securely and reliably

Service Registry

A service registry is a database that stores the network locations (such as IP address and port) of all registered service instances. It is a key part of service discovery, so the registry must be kept up to date and highly available.
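
To make the idea concrete, here is a minimal in-memory sketch of what a registry stores. The `ServiceRegistry` class and its method names are made up for illustration and are not the API of Eureka, Consul, or any other real registry.

```python
class ServiceRegistry:
    """Toy in-memory registry: service name -> set of (host, port) instances."""

    def __init__(self):
        self._instances = {}

    def register(self, service, host, port):
        self._instances.setdefault(service, set()).add((host, port))

    def deregister(self, service, host, port):
        self._instances.get(service, set()).discard((host, port))

    def lookup(self, service):
        # Sorted so callers see a stable order; unknown services yield [].
        return sorted(self._instances.get(service, set()))


registry = ServiceRegistry()
registry.register("payment-service", "10.0.1.12", 8080)
registry.register("payment-service", "10.0.1.15", 8080)
registry.deregister("payment-service", "10.0.1.15", 8080)
print(registry.lookup("payment-service"))
```

A real registry adds replication and health tracking on top of this, which is where the "highly available" and "up to date" requirements come from.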

Typical service registries (pre-Kubernetes):

Eureka
Consul
Zookeeper

Service Registration

Service instances need to be registered and deregistered with the service registry in all types of service discovery.

Self Registration

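In self-registration, the service instance itself handles registration and deregistration with the service registry. It registers its address when it starts and deregisters itself when it shuts down. An instance that crashes cannot deregister itself, which is why registries also rely on heartbeats to detect dead instances.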
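
A rough sketch of self-registration, assuming a plain dictionary (`REGISTRY`) as a stand-in for a real registry client. The helper name `self_registered` is hypothetical.

```python
from contextlib import contextmanager

# Hypothetical stand-in for a real registry: service name -> set of addresses.
REGISTRY = {}


@contextmanager
def self_registered(service, address):
    """Register on startup and deregister on shutdown, even if the body raises."""
    REGISTRY.setdefault(service, set()).add(address)
    try:
        yield
    finally:
        REGISTRY[service].discard(address)


def run_instance():
    with self_registered("payment-service", "10.0.1.12:8080"):
        # While the instance is serving, it is visible in the registry.
        return sorted(REGISTRY["payment-service"])


seen_while_running = run_instance()
print(seen_while_running, REGISTRY["payment-service"])
```

The `finally` block covers a normal shutdown or an exception, but not a process that is killed outright.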

Third-party Registration

The third-party registration pattern relieves service instances of the job of registering themselves. Instead, registration and deregistration are handled by a service registrar, a separate component that monitors changes in the set of running instances and updates the registry accordingly.
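
One way to picture a registrar is as a reconciliation step: compare what is actually running with what the registry says, then fix the difference. The function below is a sketch under that assumption. How the registrar learns the running set (for example from the deployment platform) is left out, and `reconcile` is a hypothetical name.

```python
def reconcile(registry, service, running):
    """Make the registry entry for `service` match the set of running addresses.

    Returns (added, removed) so the caller can log what changed.
    """
    current = registry.get(service, set())
    added = running - current
    removed = current - running
    registry[service] = set(running)
    return added, removed


registry = {"payment-service": {"10.0.1.12:8080", "10.0.1.15:8080"}}
# What the registrar currently observes as running.
observed = {"10.0.1.12:8080", "10.0.1.18:8080"}
added, removed = reconcile(registry, "payment-service", observed)
print(added, removed)
```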

Service Discovery

Service discovery is the mechanism by which a service finds the network location of another service at runtime.

The primary goal of service discovery is to simplify communication between microservices by letting them locate and connect to each other automatically through a dedicated system. Instead of hardcoding a network address, a service asks at runtime where the target service is.

How does it work?

Service discovery involves three key functions. First, it lets an instance register and announce, "I’m here!" Second, it provides a way to locate the service once it is registered. Finally, it keeps track of any changes in the service's location.

In more detailed steps (a generalized version):

Service Instance Starts - A payment service instance starts on a VM and is assigned an IP address dynamically.
Service Registers Itself - The instance registers with the service registry, sending its service name, IP address, port, and health information, for example payment-service → 10.0.1.12:8080.
Registry Stores Metadata - The registry maintains a list of healthy service instances. Periodic heartbeats keep each instance's health status current.
Client Requests a Service - A client service (the order service) needs to call the payment service, so it queries the registry.
Service Discovery Happens - The registry responds with the addresses of the healthy instances: [10.0.1.12, 10.0.1.15, 10.0.1.18]
Load Balancing - The client picks one instance and makes the network call. This is client-side discovery.
Health Updates - Instances that fail their health checks are removed, so the registry stays up to date.
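
The heartbeat and health-update steps above can be sketched with a time-to-live (TTL): an instance counts as healthy only if it has sent a heartbeat recently. The 30-second TTL and the `HeartbeatRegistry` class are assumptions for illustration. Time is passed in explicitly so the example is deterministic.

```python
class HeartbeatRegistry:
    """Registry that only returns instances whose last heartbeat is recent."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._last_seen = {}  # (service, address) -> time of last heartbeat

    def heartbeat(self, service, address, now):
        # Registration and renewal are the same operation in this sketch.
        self._last_seen[(service, address)] = now

    def healthy_instances(self, service, now):
        # Remove entries whose last heartbeat is older than the TTL.
        expired = [k for k, t in self._last_seen.items() if now - t > self.ttl]
        for key in expired:
            del self._last_seen[key]
        return sorted(a for (s, a) in self._last_seen if s == service)


registry = HeartbeatRegistry(ttl_seconds=30)
registry.heartbeat("payment-service", "10.0.1.12:8080", now=0)
registry.heartbeat("payment-service", "10.0.1.15:8080", now=0)
registry.heartbeat("payment-service", "10.0.1.12:8080", now=25)  # renewed
alive = registry.healthy_instances("payment-service", now=40)
print(alive)
```

Here the second instance stops sending heartbeats, so it drops out of the results once its TTL has passed.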

There are two main service discovery patterns: client-side discovery and server-side discovery.

Client-Side Discovery Pattern

In the client-side discovery pattern, the client (consumer) service, in this case the order service, is responsible for looking up the network locations of the available instances in the registry and load-balancing its requests across them.

The advantage is that it avoids the extra network hop that a dedicated load balancer would add. The disadvantage is that every service consumer must implement the discovery and load-balancing logic itself.
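
Here is a small sketch of the load-balancing logic the consumer ends up owning. Round-robin is just one possible strategy, and `fake_registry_lookup` is a hypothetical stand-in for a real registry query.

```python
import itertools


class RoundRobinClient:
    """Client-side discovery: look up instances, then pick one per call."""

    def __init__(self, lookup):
        self._lookup = lookup              # callable: service name -> addresses
        self._counter = itertools.count()  # shared call counter for rotation

    def choose(self, service):
        instances = self._lookup(service)
        if not instances:
            raise LookupError(f"no instances registered for {service}")
        return instances[next(self._counter) % len(instances)]


def fake_registry_lookup(service):
    # Stand-in for a real registry call; addresses taken from the steps above.
    return ["10.0.1.12", "10.0.1.15", "10.0.1.18"]


client = RoundRobinClient(fake_registry_lookup)
choices = [client.choose("payment-service") for _ in range(4)]
print(choices)
```

Every consumer, in every language used across the system, needs an equivalent of this class, which is the cost described above.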

Server-side Discovery Pattern

In the server-side discovery pattern, there is an intermediary router or load balancer between the client and the service. The client sends its request to this load balancer, which queries the service registry and forwards the request to an available instance.

In this approach, a dedicated component called the Load Balancer handles the task of load balancing. This is a major advantage, as it simplifies the responsibilities of the Service Consumer, which no longer has to manage the lookup process. Consequently, there's no need to implement discovery logic separately for each programming language or framework used by the Service Consumer.
It is essential to note that we must either set up and manage the Load Balancer ourselves or ensure it is already available in the deployment environment.
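
For contrast with the client-side pattern, here is a sketch in which the lookup lives inside the load balancer. The `LoadBalancer` class, the random selection strategy, and the `fake_send` transport are all assumptions for illustration.

```python
import random


class LoadBalancer:
    """Server-side discovery: the balancer does the lookup, not the client."""

    def __init__(self, lookup, send, rng=None):
        self._lookup = lookup  # callable: service name -> list of addresses
        self._send = send      # callable: (address, request) -> response
        self._rng = rng or random.Random()

    def forward(self, service, request):
        instances = self._lookup(service)
        if not instances:
            raise LookupError(f"no instances registered for {service}")
        target = self._rng.choice(instances)
        return self._send(target, request)


ADDRESSES = ["10.0.1.12", "10.0.1.15", "10.0.1.18"]


def fake_send(address, request):
    # Stand-in for a real network call.
    return f"{address} handled {request}"


balancer = LoadBalancer(lambda service: ADDRESSES, fake_send)
# The client only knows the balancer and the service name.
response = balancer.forward("payment-service", "charge")
print(response)
```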

In Kubernetes-based systems, you usually do not need a separate service registry or discovery tool, because Kubernetes provides built-in service discovery through Services and cluster DNS.

But Discovery Alone Is Not Enough

Even with discovery solved, new challenges appear:

How do services communicate securely?
How do we handle retries and failures?
How do we prevent cascading outages?
How do we apply consistent policies across services written in different languages?

This is where a service mesh comes in.

Service Mesh

A service mesh is a dedicated infrastructure layer for managing communication between the individual services that make up an application in a microservice-based architecture. It adds security, reliability, observability, and traffic management without requiring changes to the services themselves.
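
As one example of the reliability side, a mesh proxy can stop calling an upstream that keeps failing so that failures do not cascade. The circuit breaker below is a simplified sketch of that idea, not how any particular mesh implements it. The threshold is arbitrary, and the reset or half-open phase of a real breaker is left out.

```python
class CircuitBreaker:
    """Fail fast after `max_failures` consecutive connection errors."""

    def __init__(self, max_failures):
        self.max_failures = max_failures
        self.failures = 0

    def call(self, func):
        if self.failures >= self.max_failures:
            # Circuit is open: do not touch the struggling upstream at all.
            raise RuntimeError("circuit open: call skipped")
        try:
            result = func()
        except ConnectionError:
            self.failures += 1
            raise
        self.failures = 0  # any success closes the circuit again
        return result


attempts = []


def failing_upstream():
    attempts.append(1)
    raise ConnectionError("upstream down")


breaker = CircuitBreaker(max_failures=2)
outcomes = []
for _ in range(4):
    try:
        breaker.call(failing_upstream)
    except ConnectionError:
        outcomes.append("failed")
    except RuntimeError:
        outcomes.append("skipped")
print(outcomes, len(attempts))
```

After two failures the breaker opens, and the remaining calls never reach the upstream.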

A service mesh complements an API gateway rather than replacing it, because the two serve distinct purposes. A service mesh manages communication between the services within a system, while an API gateway separates the internal workings of that system from the API presented to clients, which can include other systems within the organization or external clients.

The distinction between an API gateway and a service mesh is often described in terms of north-south (for the API gateway) versus east-west (for the service mesh) communication. However, this characterization may not fully capture the nuances of their respective functions.

How does a service mesh work?

Architecture: Data Plane vs. Control Plane
A service mesh is split into two distinct parts:

1. The Data Plane (The Muscle)
This consists of the intelligent sidecar proxies deployed alongside the service instances throughout your cluster, potentially thousands of them.

The Data Plane handles the actual traffic: it performs load balancing, terminates TLS (encryption), enforces rate limits, and collects metrics such as latency and error rates. These proxies are highly optimized so that they add only minimal latency (milliseconds).

2. The Control Plane (The Brain)
This is the centralized management component. The proxies do not decide policy on their own; they need to be told what to do.

The Control Plane manages and configures the proxies: it turns the routing, security, and traffic policies you define into proxy configuration and pushes that configuration to all the Data Plane proxies in near real time. It often also acts as a Certificate Authority (CA), issuing and rotating the security certificates that the proxies use.
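
The push relationship between the two planes can be sketched as follows. The class names are illustrative only, and real control planes typically use a streaming API rather than direct method calls.

```python
class SidecarProxy:
    """Data plane: holds whatever configuration it was last given."""

    def __init__(self, name):
        self.name = name
        self.routes = {}

    def apply_config(self, routes):
        # Copy the data so the proxy keeps its own snapshot.
        self.routes = {svc: list(addrs) for svc, addrs in routes.items()}


class ControlPlane:
    """Control plane: owns the desired config and pushes it to every proxy."""

    def __init__(self):
        self._proxies = []
        self._routes = {}

    def attach(self, proxy):
        self._proxies.append(proxy)
        proxy.apply_config(self._routes)  # new proxies get the current config

    def update_routes(self, service, addresses):
        self._routes[service] = list(addresses)
        for proxy in self._proxies:
            proxy.apply_config(self._routes)


control_plane = ControlPlane()
proxy_a, proxy_b = SidecarProxy("frontend"), SidecarProxy("inventory")
control_plane.attach(proxy_a)
control_plane.attach(proxy_b)
control_plane.update_routes("inventory", ["10.0.2.7:9090", "10.0.2.8:9090"])
print(proxy_a.routes, proxy_b.routes)
```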

Step-by-Step: How a Request Works

Imagine Service A (Frontend) wants to call Service B (Inventory).

Outbound Intercept: Service A sends a request to Service B. Network rules on Service A's host intercept the packet and redirect it to Service A's local sidecar proxy.

Discovery & Load Balancing: The sidecar asks, "Where is Service B?" It examines its configuration (from the Control Plane) and selects the best instance of Service B (e.g., the one with the lowest latency).

Encryption (mTLS): The sidecar sends the request over a mutual TLS (mTLS) connection, in which both sidecars present certificates, so the traffic is encrypted and each side can verify the other's identity.

Transmission: The encrypted request travels over the network to the machine where Service B is running.

Inbound Intercept: The request is caught by Service B's sidecar proxy.

Decryption & Check: Service B's sidecar decrypts the packet and verifies that Service A is authorized to communicate with Service B (Access Control).

Delivery: If allowed, the sidecar forwards the clean request to the actual Service B application.
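
The flow above can be sketched end to end with two toy sidecars. This is a simplified model under stated assumptions: encryption is omitted entirely, the caller's identity is just a string on the request rather than something proven by a certificate, and the `network` dictionary stands in for the real network. All names and addresses are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Request:
    source: str       # identity of the calling service
    destination: str  # logical name of the target service
    body: str


class Sidecar:
    """Toy proxy that sits in front of one application instance."""

    def __init__(self, routes=None, allowed_callers=(), app=None):
        self.routes = routes or {}  # service name -> list of addresses
        self.allowed_callers = set(allowed_callers)
        self.app = app              # the local application handler

    def outbound(self, request, network):
        # Outbound intercept plus discovery; this sketch takes the first instance.
        target = self.routes[request.destination][0]
        # Transmission: `network` maps an address to the sidecar listening there.
        return network[target].inbound(request)

    def inbound(self, request):
        # Inbound intercept and access control check.
        if request.source not in self.allowed_callers:
            return "403 denied"
        # Delivery to the actual application.
        return self.app(request.body)


inventory_sidecar = Sidecar(allowed_callers={"frontend"},
                            app=lambda body: f"inventory handled: {body}")
network = {"10.0.2.7:9090": inventory_sidecar}
frontend_sidecar = Sidecar(routes={"inventory": ["10.0.2.7:9090"]})

ok = frontend_sidecar.outbound(
    Request("frontend", "inventory", "check sku-1"), network)
denied = frontend_sidecar.outbound(
    Request("billing", "inventory", "check sku-1"), network)
print(ok, "|", denied)
```

Neither application contains any discovery or access-control code; both concerns live in the sidecars.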