gRPC is a high-performance RPC framework widely used in microservices architectures. Because gRPC is built on HTTP/2, multiple RPCs are multiplexed over a single long-lived TCP connection. This breaks traditional load balancing: a connection-level (L4) load balancer distributes connections rather than requests, so every RPC on that one connection lands on the same backend instance.
gRPC addresses this problem by supporting client-side load balancing natively. Instead of relying on an external proxy, the client itself discovers backend instances and decides where each RPC should be sent.
In this article, we’ll explore gRPC client-side load balancing in practice. We’ll build two simple Go services: an API Gateway that exposes a REST endpoint, and an Order Service that implements a GetOrder gRPC RPC. The Order Service will return both the order ID and the instance (container/pod) that handled the request, making load-balancing behavior easy to observe.
We’ll start by running the system using Docker Compose, observe how requests are distributed across multiple Order Service containers, and see what happens when instances go down or come back up. We’ll then configure retries to handle transient failures. Finally, we’ll deploy the same setup to a local Kubernetes cluster using KIND to understand how gRPC load balancing behaves in a Kubernetes environment.
Table of Contents
- Creating the Services
- gRPC Client-Side Load Balancing Basics
- Running with Docker Compose
- Client-Side Retries
- Running on Kubernetes with KIND
- Alternatives to Client-Side Load Balancing
- Conclusion
Creating the Services
Directory Structure
We’ll keep the code for both services minimal and focused purely on gRPC. Below is the directory structure we’ll be building:
.
├── api-gateway/
│   ├── cmd/
│   ├── proto/
│   └── internal/
│       ├── grpc/
│       └── http/
├── order-service/
│   ├── cmd/
│   ├── proto/
│   └── internal/
│       ├── grpc/
│       └── http/
├── k8s/
└── docker-compose.yaml
We use a REST endpoint in the API Gateway purely as a convenience layer. It allows us to trigger gRPC requests using simple curl commands and observe load-balancing behavior without requiring a dedicated gRPC client tool. This setup also mirrors real-world microservice architectures, where API gateways often expose HTTP/REST APIs while communicating with backend services over gRPC.
API Gateway
Let's start with the API Gateway and bootstrap a Go project:
mkdir api-gateway
cd api-gateway
go mod init github.com/sagarmaheshwary/go-grpc-load-balancing/api-gateway
Create the proto definition in proto/order/order.proto:
syntax = "proto3";

package order;

option go_package = "github.com/sagarmaheshwary/go-grpc-load-balancing/api-gateway/proto/order";

service OrderService {
  rpc GetOrder(GetOrderRequest) returns (GetOrderResponse);
}

message GetOrderRequest {
  string id = 1;
}

message GetOrderResponse {
  string id = 1;
  string served_by = 2;
}
To generate Go code from proto files, you’ll need to install:
- protoc compiler
- Go plugins for protobuf and gRPC:
go install google.golang.org/protobuf/cmd/protoc-gen-go@latest
go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@latest
Now we can generate the code with the following command:
protoc --go_out=. --go_opt=paths=source_relative --go-grpc_out=. --go-grpc_opt=paths=source_relative ./proto/order/order.proto
Next, let’s create a minimal gRPC client that uses a plaintext (non-TLS) connection to the server in internal/grpc/client.go:
package grpc

import (
    "context"
    "log"

    proto "github.com/sagarmaheshwary/go-grpc-load-balancing/api-gateway/proto/order"
    "google.golang.org/grpc"
    "google.golang.org/grpc/credentials/insecure"
)

func NewClient(ctx context.Context, url string) (*OrderClient, *grpc.ClientConn, error) {
    conn, err := grpc.NewClient(url,
        grpc.WithTransportCredentials(insecure.NewCredentials()),
    )
    if err != nil {
        log.Println("Order gRPC client failed to connect", err)
        return nil, nil, err
    }

    log.Println("Order gRPC client connected on " + url)

    client := NewOrderClient(proto.NewOrderServiceClient(conn))

    return client, conn, nil
}
and the handler for the GetOrder RPC in internal/grpc/handler.go:
package grpc

import (
    "context"

    proto "github.com/sagarmaheshwary/go-grpc-load-balancing/api-gateway/proto/order"
)

type OrderClient struct {
    client proto.OrderServiceClient
}

func NewOrderClient(c proto.OrderServiceClient) *OrderClient {
    return &OrderClient{
        client: c,
    }
}

func (o *OrderClient) GetOrder(ctx context.Context, in *proto.GetOrderRequest) (*proto.GetOrderResponse, error) {
    res, err := o.client.GetOrder(ctx, in)
    if err != nil {
        return nil, err
    }

    return res, nil
}
Now let’s create a simple HTTP server using gin for the REST API in internal/http/server.go:
package http

import (
    "net/http"

    "github.com/gin-gonic/gin"
    "github.com/sagarmaheshwary/go-grpc-load-balancing/api-gateway/internal/grpc"
)

type HTTPServer struct {
    URL    string
    Server *http.Server
}

func NewServer(url string, grpcClient *grpc.OrderClient) *HTTPServer {
    gin.SetMode(gin.ReleaseMode)
    r := gin.New()

    handler := &HTTPHandler{
        GRPCClient: grpcClient,
    }

    r.GET("/livez", handler.Livez)
    r.GET("/readyz", handler.Readyz)
    r.GET("/orders/:id", handler.GetOrder)

    return &HTTPServer{
        URL: url,
        Server: &http.Server{
            Addr:    url,
            Handler: r,
        },
    }
}

func (h *HTTPServer) Serve() error {
    return h.Server.ListenAndServe()
}
Create the HTTP handlers in internal/http/handler.go:
package http

import (
    "net/http"

    "github.com/gin-gonic/gin"
    "github.com/sagarmaheshwary/go-grpc-load-balancing/api-gateway/internal/grpc"
    proto "github.com/sagarmaheshwary/go-grpc-load-balancing/api-gateway/proto/order"
)

type HTTPHandler struct {
    GRPCClient *grpc.OrderClient
}

func (h *HTTPHandler) Livez(c *gin.Context) {
    c.JSON(http.StatusOK, map[string]string{"status": "ok"})
}

func (h *HTTPHandler) Readyz(c *gin.Context) {
    c.JSON(http.StatusOK, map[string]string{"status": "ready"})
}

func (h *HTTPHandler) GetOrder(c *gin.Context) {
    orderID := c.Param("id")
    if orderID == "" {
        c.JSON(http.StatusBadRequest, map[string]string{"error": "missing order id"})
        return
    }

    req := &proto.GetOrderRequest{
        Id: orderID,
    }

    res, err := h.GRPCClient.GetOrder(c.Request.Context(), req)
    if err != nil {
        // For demo simplicity, map all gRPC errors to 500.
        // In real systems, you'd inspect status.Code(err).
        c.JSON(http.StatusInternalServerError, map[string]string{"error": err.Error()})
        return
    }

    c.JSON(http.StatusOK, res)
}
Apart from the /orders endpoint, we also expose /livez and /readyz endpoints for Kubernetes health probes. We’ll discuss their role in more detail later, when running the services on KIND.
Now let’s bootstrap the gRPC client and HTTP server in cmd/server/main.go, which serves as the API Gateway’s entrypoint:
package main

import (
    "context"
    "errors"
    "log"
    "net/http"
    "os"
    "os/signal"
    "time"

    "github.com/sagarmaheshwary/go-grpc-load-balancing/api-gateway/internal/grpc"
    httpserver "github.com/sagarmaheshwary/go-grpc-load-balancing/api-gateway/internal/http"
)

var (
    httpServerURL       = "0.0.0.0:4000"
    grpcOrderServiceURL = "order-service:5000"
)

func main() {
    ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt)
    defer stop()

    grpcClient, conn, err := grpc.NewClient(ctx, grpcOrderServiceURL)
    if err != nil {
        // An API Gateway should not crash if the downstream service is down.
        // However, for simplicity, we log the error and exit.
        log.Fatal(err)
    }
    defer conn.Close()

    httpServer := httpserver.NewServer(httpServerURL, grpcClient)

    go func() {
        log.Println("Starting HTTP server on", httpServerURL)
        if err := httpServer.Serve(); err != nil && !errors.Is(err, http.ErrServerClosed) {
            stop()
        }
    }()

    log.Println("API Gateway is running...")

    <-ctx.Done()

    log.Println("shutting down API Gateway...")

    shutdownCtx, cancelShutdown := context.WithTimeout(context.Background(), 3*time.Second)
    defer cancelShutdown()

    if err := httpServer.Server.Shutdown(shutdownCtx); err != nil {
        log.Fatalf("failed to shutdown HTTP server: %v", err)
    }

    log.Println("API Gateway stopped")
}
The main function also handles graceful shutdown by listening for the os.Interrupt (SIGINT) signal and properly shutting down the HTTP server and gRPC connection before exiting.
Order Service
The Order Service is similar to the API Gateway, except it exposes a gRPC server along with a lightweight HTTP server for health checks.
Let’s create the Order Service:
cd ..
mkdir order-service
cd order-service
go mod init github.com/sagarmaheshwary/go-grpc-load-balancing/order-service
For order.proto, we reuse the proto file from the API Gateway, update the go_package path, and regenerate the code.
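For reference, the only line that changes in order-service/proto/order/order.proto is the go_package option (matching the import paths used in the Order Service code below); the protoc command from earlier is then rerun as-is:

option go_package = "github.com/sagarmaheshwary/go-grpc-load-balancing/order-service/proto/order";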
Create the GetOrder handler in internal/grpc/handler.go. This handler simply returns the same order ID along with a ServedBy value. The ServedBy field helps us visually confirm which instance handled the request during load-balancing experiments. It uses the INSTANCE_ID environment variable if present, or falls back to the hostname:
package grpc

import (
    "context"
    "log"
    "os"

    proto "github.com/sagarmaheshwary/go-grpc-load-balancing/order-service/proto/order"
)

type OrderServer struct {
    proto.OrderServiceServer
}

func (o *OrderServer) GetOrder(ctx context.Context, in *proto.GetOrderRequest) (*proto.GetOrderResponse, error) {
    instanceID := os.Getenv("INSTANCE_ID")
    if instanceID == "" {
        instanceID, _ = os.Hostname()
    }

    log.Println("Order request received for ID:", in.Id, "served by instance:", instanceID)

    return &proto.GetOrderResponse{
        Id:       in.Id,
        ServedBy: instanceID,
    }, nil
}
Next, create the gRPC server in internal/grpc/server.go and register the OrderServer:
package grpc

import (
    "log"
    "net"

    proto "github.com/sagarmaheshwary/go-grpc-load-balancing/order-service/proto/order"
    "google.golang.org/grpc"
)

type GRPCServer struct {
    Server *grpc.Server
    URL    string
}

func NewServer(url string) *GRPCServer {
    srv := grpc.NewServer()
    proto.RegisterOrderServiceServer(srv, &OrderServer{})

    return &GRPCServer{
        Server: srv,
        URL:    url,
    }
}

func (s *GRPCServer) Serve() error {
    lis, err := net.Listen("tcp", s.URL)
    if err != nil {
        log.Printf("failed to listen on %s: %v", s.URL, err)
        return err
    }

    return s.Server.Serve(lis)
}
We also need an HTTP server to expose health-check endpoints.
internal/http/handler.go:
package http

import (
    "net/http"

    "github.com/gin-gonic/gin"
)

func Livez(c *gin.Context) {
    c.JSON(http.StatusOK, map[string]string{"status": "ok"})
}

func Readyz(c *gin.Context) {
    c.JSON(http.StatusOK, map[string]string{"status": "ready"})
}
internal/http/server.go:
package http

import (
    "net/http"

    "github.com/gin-gonic/gin"
)

type HTTPServer struct {
    URL    string
    Server *http.Server
}

func NewServer(url string) *HTTPServer {
    gin.SetMode(gin.ReleaseMode)
    r := gin.New()

    r.GET("/livez", Livez)
    r.GET("/readyz", Readyz)

    return &HTTPServer{
        URL: url,
        Server: &http.Server{
            Addr:    url,
            Handler: r,
        },
    }
}

func (h *HTTPServer) Serve() error {
    return h.Server.ListenAndServe()
}
Finally, create the main function in cmd/server/main.go:
package main

import (
    "context"
    "errors"
    "log"
    "net/http"
    "os"
    "os/signal"
    "time"

    grpcserver "github.com/sagarmaheshwary/go-grpc-load-balancing/order-service/internal/grpc"
    httpserver "github.com/sagarmaheshwary/go-grpc-load-balancing/order-service/internal/http"
    "google.golang.org/grpc"
)

var (
    httpServerURL = "0.0.0.0:4000"
    grpcServerURL = "0.0.0.0:5000"
)

func main() {
    ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt)
    defer stop()

    httpServer := httpserver.NewServer(httpServerURL)

    go func() {
        log.Println("Starting HTTP server on", httpServerURL)
        if err := httpServer.Serve(); err != nil && !errors.Is(err, http.ErrServerClosed) {
            stop()
        }
    }()

    grpcServer := grpcserver.NewServer(grpcServerURL)

    go func() {
        log.Println("Starting gRPC server on", grpcServerURL)
        if err := grpcServer.Serve(); err != nil && !errors.Is(err, grpc.ErrServerStopped) {
            stop()
        }
    }()

    log.Println("Order Service is running...")

    <-ctx.Done()

    log.Println("shutting down Order Service...")

    grpcServer.Server.GracefulStop()

    shutdownCtx, cancelShutdown := context.WithTimeout(context.Background(), 3*time.Second)
    defer cancelShutdown()

    if err := httpServer.Server.Shutdown(shutdownCtx); err != nil {
        log.Fatalf("failed to shutdown HTTP server: %v", err)
    }

    log.Println("Order Service stopped")
}
gRPC Client-Side Load Balancing Basics
In client-side load balancing, the gRPC client resolves multiple backend addresses, maintains a sub-connection to each backend, and uses a load balancing policy to decide which backend handles each RPC. The server is completely unaware of the load balancing; all logic lives inside the gRPC client.
This model is particularly well-suited to Kubernetes and modern microservices: it avoids an extra proxy hop and works naturally with the long-lived HTTP/2 connections gRPC relies on. For many gRPC workloads, it is the preferred approach when you want direct client-to-backend communication and simple, transparent load balancing.
Service Discovery via DNS
The simplest and most common form of client-side load balancing uses DNS-based service discovery. When we configure the gRPC client with a DNS target like dns:///order-service:5000, the DNS resolver returns multiple A records, with each record representing a backend instance (pod or container). The gRPC client then creates a SubConn for each of these addresses, and the configured load balancing policy distributes RPCs across them.
In Docker Compose, this works because multiple containers can share the same hostname, and Docker DNS will return all container IPs. In Kubernetes, we achieve the same behavior using Headless Services (clusterIP: None), where each pod gets its own DNS A record that the gRPC client can discover and use for load balancing.
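As a side note, grpc.NewClient uses the dns resolver by default when the target has no scheme, so the plain order-service:5000 target we used in our client is equivalent to the explicit form below:

// Explicit DNS resolver scheme; equivalent to "order-service:5000"
// because grpc.NewClient defaults to the dns resolver.
conn, err := grpc.NewClient(
    "dns:///order-service:5000",
    grpc.WithTransportCredentials(insecure.NewCredentials()),
)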
Load Balancing Policies in gRPC
gRPC supports multiple load balancing policies. The most important ones are:
Pick First (Default)
- Client picks one backend and sticks to it
- No real load balancing
- Bad fit for scalable systems
Round Robin
- RPCs are distributed across all healthy backends
- Most commonly used policy
- Ideal for stateless services
Weighted load balancing allows sending different proportions of traffic to different backends (for example, 90% to v1 and 10% to v2).
This is not supported with plain DNS-based client-side load balancing. In gRPC, weighted policies are typically implemented using xDS and Envoy and are commonly used for canary deployments and gradual rollouts.
We’ll use the Round Robin policy in this article. To enable it, update internal/grpc/client.go in the API Gateway:
conn, err := grpc.NewClient(url,
    grpc.WithTransportCredentials(insecure.NewCredentials()),
    grpc.WithDefaultServiceConfig(`{
        "loadBalancingPolicy":"round_robin"
    }`),
)
See the service_config.proto definition in the grpc-proto repository for all the supported service config values.
Running with Docker Compose
Dockerfile and Docker Compose
We start by creating a Dockerfile for each service:
FROM golang:1.25 AS builder

WORKDIR /app

COPY go.mod go.sum ./
RUN go mod download

COPY . .
RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 \
    go build -o /app/main ./cmd/server/main.go

FROM alpine:3.22 AS production

WORKDIR /app

COPY --from=builder /app/main .

EXPOSE 4000 5000

CMD ["./main"]
Next, we create a docker-compose.yaml in the root directory:
services:
  api-gateway:
    build:
      context: ./api-gateway
      target: production
    ports:
      - 4000:4000

  order-service-1:
    build:
      context: ./order-service
      target: production
    hostname: order-service
    environment:
      - INSTANCE_ID=order-service-1

  order-service-2:
    build:
      context: ./order-service
      target: production
    hostname: order-service
    environment:
      - INSTANCE_ID=order-service-2

  order-service-3:
    build:
      context: ./order-service
      target: production
    hostname: order-service
    environment:
      - INSTANCE_ID=order-service-3
Here, we are creating three order-service containers and assigning them the same hostname order-service. Docker DNS will return all container IPs when queried, enabling client-side load balancing.
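You can verify the DNS behavior yourself. Assuming busybox's nslookup is available in the Alpine-based image (it usually is), querying Docker DNS from inside the gateway container should list all three container IPs:

docker compose exec api-gateway nslookup order-service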
Testing Load Balancing
We can now start the services:
docker compose up
Making a request to the API Gateway:
curl localhost:4000/orders/1
Should return a response like:
{ "id": "1", "served_by": "order-service-1" }
Repeated requests will be distributed across all containers:
{"id":"1","served_by":"order-service-2"}
{"id":"1","served_by":"order-service-3"}
{"id":"1","served_by":"order-service-1"}
Each container also logs the request it serves:
order-service-1-1 | Order request received for ID: 1 served by instance: order-service-1
order-service-2-1 | Order request received for ID: 1 served by instance: order-service-2
order-service-3-1 | Order request received for ID: 1 served by instance: order-service-3
Inspecting gRPC Internal Logs
For a deeper look at client-side load balancing, we can enable verbose gRPC logs on the api-gateway container, where the gRPC client runs:

environment:
  - GRPC_GO_LOG_SEVERITY_LEVEL=info
  - GRPC_GO_LOG_VERBOSITY_LEVEL=99
On the first request, the gRPC client resolves DNS and establishes one SubConn per backend:
Subchannel picks a new address "172.19.0.5:5000"
Subchannel picks a new address "172.19.0.2:5000"
Subchannel picks a new address "172.19.0.4:5000"
This confirms that the client discovered all three containers and created independent connections to each.
Stopping and Restarting Backends
Let’s stop one backend container:
docker compose stop order-service-1
The gRPC client detects the closed TCP connection and logs:
Closing: connection error: desc = "error reading from server: EOF"
The affected SubChannel transitions through:
READY → IDLE → CONNECTING → TRANSIENT_FAILURE → SHUTDOWN
Eventually, it is removed from the load balancer, and traffic is routed only to healthy instances.
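If you'd rather observe these transitions from code than from verbose logs, grpc-go's connectivity API exposes the channel state. Below is a minimal sketch; watchState is our own helper (not part of the services above), and it reports the aggregate ClientConn state rather than individual SubConns:

import (
    "context"
    "log"

    "google.golang.org/grpc"
    "google.golang.org/grpc/connectivity"
)

// watchState logs every connectivity state change on the ClientConn
// until the connection shuts down or ctx is cancelled.
func watchState(ctx context.Context, conn *grpc.ClientConn) {
    for {
        state := conn.GetState()
        log.Println("connection state:", state)
        if state == connectivity.Shutdown {
            return
        }
        // Blocks until the state changes; returns false when ctx is done.
        if !conn.WaitForStateChange(ctx, state) {
            return
        }
    }
}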
Restarting the container:
docker compose start order-service-1
This does not immediately change client behavior. gRPC maintains long-lived HTTP/2 connections and will not re-resolve DNS while all existing connections are healthy.
Improving Responsiveness with Keepalive
To make the client more responsive to backend changes, we can configure keepalives:
conn, err := grpc.NewClient(
    url,
    grpc.WithDefaultServiceConfig(`{"loadBalancingPolicy":"round_robin"}`),
    grpc.WithTransportCredentials(insecure.NewCredentials()),
    grpc.WithKeepaliveParams(keepalive.ClientParameters{
        Time:                30 * time.Second,
        Timeout:             5 * time.Second,
        PermitWithoutStream: false,
    }),
)
Keepalive does not refresh DNS by itself. It periodically pings backends to detect unresponsive connections. When a backend fails, the resulting state change triggers internal reconnection logic, which may include DNS re-resolution, allowing new backends to be discovered earlier.
DNS-based client-side load balancing is reactive by design: it prioritizes connection stability over aggressive backend discovery. In most Kubernetes scenarios, events like pod crashes, restarts, and rolling updates naturally trigger DNS re-resolution.
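A complementary server-side technique, not used in this demo, is to cap connection lifetime so that clients periodically reconnect, and each reconnect gives the resolver a chance to pick up new backends. A sketch with illustrative durations:

package grpc

import (
    "time"

    "google.golang.org/grpc"
    "google.golang.org/grpc/keepalive"
)

// NewServerWithMaxAge sketches a server that closes client connections
// after roughly two minutes, prompting clients to reconnect and
// re-resolve DNS. The durations are illustrative, not tuned values.
func NewServerWithMaxAge() *grpc.Server {
    return grpc.NewServer(
        grpc.KeepaliveParams(keepalive.ServerParameters{
            MaxConnectionAge:      2 * time.Minute,
            MaxConnectionAgeGrace: 20 * time.Second,
        }),
    )
}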
Client-Side Retries
Why Client-Side Retries Matter
In a distributed system, failures are expected rather than exceptional. Even when client-side load balancing is in place, individual backends can fail due to pod crashes, container restarts during rolling deployments, transient network issues, or momentary server overload. Without retries, a single transient failure immediately propagates as an error to the caller, despite the presence of other healthy backends.
Client-side retries allow the gRPC client to handle these failures gracefully. When a request fails, the client can mark the backend as unhealthy, select another backend from the load balancer, and retry the RPC transparently. This improves overall availability without pushing retry logic into application code.
When to Retry
Retries at the load-balancing layer are intentionally conservative: in the retry policy we’ll configure below, only the UNAVAILABLE gRPC status code is retried. UNAVAILABLE indicates that the server could not be reached, the connection was dropped, or the backend was temporarily unable to serve requests. This usually means the request never reached application logic, making it safe to retry on another backend.
Other status codes are not retried at the LB level because retries could introduce correctness issues:
- DEADLINE_EXCEEDED: the server may still be processing the request
- ABORTED: often signals concurrency conflicts
- INVALID_ARGUMENT or FAILED_PRECONDITION: indicate client or business-logic errors where retries are ineffective
For application-level errors, a custom client interceptor is recommended, applying retries only to explicitly idempotent RPCs, as sketched below.
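Here is a minimal sketch of such an interceptor. The method allow-list, the retried code (ABORTED, as an example of an application-level failure), and the backoff are illustrative assumptions, not part of our demo:

package grpc

import (
    "context"
    "time"

    "google.golang.org/grpc"
    "google.golang.org/grpc/codes"
    "google.golang.org/grpc/status"
)

// retryInterceptor retries ABORTED errors for explicitly idempotent methods.
// idempotent maps full method names (e.g. "/order.OrderService/GetOrder")
// to whether they are safe to retry.
func retryInterceptor(idempotent map[string]bool, maxAttempts int) grpc.UnaryClientInterceptor {
    return func(ctx context.Context, method string, req, reply interface{},
        cc *grpc.ClientConn, invoker grpc.UnaryInvoker, opts ...grpc.CallOption) error {
        var err error
        for attempt := 1; attempt <= maxAttempts; attempt++ {
            err = invoker(ctx, method, req, reply, cc, opts...)
            if err == nil || !idempotent[method] || status.Code(err) != codes.Aborted {
                return err
            }
            // Simple linear backoff; production code would add jitter.
            time.Sleep(time.Duration(attempt) * 100 * time.Millisecond)
        }
        return err
    }
}

It would be installed on the client with grpc.WithUnaryInterceptor(retryInterceptor(...)).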
Configuring Client-Side Retries in Go
In gRPC Go, retries are configured using the service config. Here, we extend our client configuration to enable retries for the OrderService:
conn, err := grpc.NewClient(url,
    grpc.WithDefaultServiceConfig(`{
        "loadBalancingPolicy":"round_robin",
        "methodConfig":[
            {
                "name":[{"service":"order.OrderService"}],
                "retryPolicy":{
                    "maxAttempts":3,
                    "initialBackoff":"0.1s",
                    "maxBackoff":"1s",
                    "backoffMultiplier":2,
                    "retryableStatusCodes":["UNAVAILABLE"]
                }
            }
        ]
    }`),
    grpc.WithTransportCredentials(insecure.NewCredentials()),
    grpc.WithKeepaliveParams(keepalive.ClientParameters{
        Time:                30 * time.Second,
        Timeout:             5 * time.Second,
        PermitWithoutStream: false,
    }),
)
This configuration allows up to three attempts per RPC (one original request + two retries) using exponential backoff: 100ms → 200ms → capped at 1s. Retries are restricted to UNAVAILABLE, ensuring safe and meaningful retry behavior.
Demonstrating Retries in Action
If one Order Service container becomes unhealthy, the gRPC client retries the request on another container selected by the round-robin policy without exposing failure to the caller.
We can simulate failure in the GetOrder RPC using the GRPC_FORCE_UNAVAILABLE environment variable:
// This version needs the "google.golang.org/grpc/codes" and
// "google.golang.org/grpc/status" imports in internal/grpc/handler.go.
func (o *OrderServer) GetOrder(ctx context.Context, in *proto.GetOrderRequest) (*proto.GetOrderResponse, error) {
    if os.Getenv("GRPC_FORCE_UNAVAILABLE") == "true" {
        log.Println("gRPC simulated failure, another instance will handle the request!")
        return nil, status.Error(codes.Unavailable, "simulated failure")
    }

    instanceID := os.Getenv("INSTANCE_ID")
    if instanceID == "" {
        instanceID, _ = os.Hostname()
    }

    log.Println("Order request received for ID:", in.Id, "served by instance:", instanceID)

    return &proto.GetOrderResponse{
        Id:       in.Id,
        ServedBy: instanceID,
    }, nil
}
Enable this mode for one container in docker-compose.yaml:
order-service-1:
  build:
    context: ./order-service
    target: production
  hostname: order-service
  environment:
    - INSTANCE_ID=order-service-1
    - GRPC_FORCE_UNAVAILABLE=true
Observing Retry Behavior
With all containers running, multiple requests demonstrate automatic retries. Only healthy backends respond:
{"id":"1","served_by":"order-service-2"}
{"id":"1","served_by":"order-service-3"}
{"id":"1","served_by":"order-service-2"}
Logs from the containers confirm the process:
order-service-1-1 | 2025/12/16 17:46:39 gRPC simulated failure, another instance will handle the request!
order-service-2-1 | 2025/12/16 17:46:39 Order request received for ID: 1 served by instance: order-service-2
order-service-3-1 | 2025/12/16 17:46:42 Order request received for ID: 1 served by instance: order-service-3
order-service-1-1 | 2025/12/16 17:46:45 gRPC simulated failure, another instance will handle the request!
order-service-2-1 | 2025/12/16 17:46:45 Order request received for ID: 1 served by instance: order-service-2
This illustrates how client-side retries and round-robin load balancing work together: failures are isolated, healthy backends continue serving traffic, and the system remains resilient without additional application-level retry logic.
Running on Kubernetes with KIND
Now let’s run the same setup on Kubernetes and see how client-side gRPC load balancing works using a Headless Service to return all Pod IPs to the client.
We’ll use KIND (Kubernetes IN Docker) to run a local Kubernetes cluster. Make sure you have KIND and kubectl installed.
We’ll keep the manifests minimal and focus only on what’s required for gRPC-based client-side load balancing.
API Gateway manifests
For the API Gateway, we’ll create k8s/api-gateway.yaml with a Deployment and a standard ClusterIP Service:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-gateway
  labels:
    app: api-gateway
spec:
  replicas: 1
  selector:
    matchLabels:
      app: api-gateway
  template:
    metadata:
      labels:
        app: api-gateway
    spec:
      containers:
        - name: api-gateway
          image: api-gateway:latest
          imagePullPolicy: IfNotPresent
          livenessProbe:
            httpGet:
              path: /livez
              port: 4000
          readinessProbe:
            httpGet:
              path: /readyz
              port: 4000
          env: # Optional env variables for gRPC logging
            - name: GRPC_GO_LOG_SEVERITY_LEVEL
              value: "info"
            - name: GRPC_GO_LOG_VERBOSITY_LEVEL
              value: "99"
---
apiVersion: v1
kind: Service
metadata:
  name: api-gateway
  labels:
    app: api-gateway
spec:
  type: ClusterIP
  selector:
    app: api-gateway
  ports:
    - name: http
      protocol: TCP
      port: 4000
      targetPort: 4000
This Service is only used to expose the API Gateway internally and for port-forwarding during local testing.
Order Service manifests (Headless Service)
Next, we’ll create k8s/order-service.yaml for the Order Service:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-service
  labels:
    app: order-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: order-service
  template:
    metadata:
      labels:
        app: order-service
    spec:
      containers:
        - name: order-service
          image: order-service:latest
          imagePullPolicy: IfNotPresent
          livenessProbe:
            httpGet:
              path: /livez
              port: 4000
          readinessProbe:
            httpGet:
              path: /readyz
              port: 4000
---
apiVersion: v1
kind: Service
metadata:
  name: order-service
  labels:
    app: order-service
spec:
  clusterIP: None
  selector:
    app: order-service
  ports:
    - name: grpc
      protocol: TCP
      port: 5000
      targetPort: 5000
We’re running three replicas of the Order Service.
The key detail here is clusterIP: None, which makes this a Headless Service.
A Headless Service does not allocate a virtual ClusterIP. Instead, Kubernetes publishes one DNS record per Pod IP. When the gRPC client resolves order-service, it receives a list of all backing Pod IPs, which is exactly what we want for client-side load balancing.
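You can see the per-Pod records yourself by running a throwaway lookup Pod inside the cluster (the Pod name and busybox image tag here are arbitrary); the output should list one address per Ready Pod:

kubectl run -it --rm dns-test --image=busybox:1.36 --restart=Never -- nslookup order-service.default.svc.cluster.local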
Why probes matter for DNS-based LB
Although liveness and readiness probes are important for all services, they are especially important here:
- When a Pod is created and becomes Ready, it is added to DNS
- When a Pod becomes NotReady or is terminated, it is removed from DNS
- The gRPC client will only see Ready Pods when it re-resolves DNS
In this demo, /livez and /readyz simply return 200 OK.
In a real system, /readyz would check service dependencies like Redis, Kafka, RabbitMQ, or databases and return 503 if the service is not ready to handle traffic.
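As a minimal sketch of what that could look like, assuming a hypothetical *sql.DB dependency (not part of this demo):

package http

import (
    "database/sql"
    "net/http"

    "github.com/gin-gonic/gin"
)

// ReadyzWithDeps returns 503 until the database is reachable, so the Pod
// is only added to DNS once it can actually serve traffic.
func ReadyzWithDeps(db *sql.DB) gin.HandlerFunc {
    return func(c *gin.Context) {
        if err := db.PingContext(c.Request.Context()); err != nil {
            c.JSON(http.StatusServiceUnavailable, map[string]string{"status": "not ready"})
            return
        }
        c.JSON(http.StatusOK, map[string]string{"status": "ready"})
    }
}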
Creating the KIND cluster
Let’s create a local Kubernetes cluster using KIND:
kind create cluster
kubectl config use-context kind-kind
One advantage of KIND is that it can load Docker images directly, so we don’t need a container registry. This works because our Deployments use imagePullPolicy: IfNotPresent.
Build and load the images:
docker build -t api-gateway:latest ./api-gateway
docker build -t order-service:latest ./order-service
kind load docker-image api-gateway:latest
kind load docker-image order-service:latest
Apply all manifests:
kubectl apply -f k8s/
Watch the Pods come up:
kubectl get pods -w
Once everything is ready, port-forward the API Gateway so we can access it from our host machine:
kubectl port-forward svc/api-gateway 4000:4000
Verifying load balancing
Now let’s send a few requests to the API Gateway:
curl localhost:4000/orders/1
We should see responses coming from different Pods:
{"id":"1","served_by":"order-service-644849c549-prxms"}
{"id":"1","served_by":"order-service-644849c549-lbbp7"}
{"id":"1","served_by":"order-service-644849c549-cp8cp"}
In Kubernetes, os.Hostname() resolves to the Pod name, which makes it easy to see which Pod handled each request.
We can also verify this via logs:
kubectl logs -l app=order-service -f
2025/12/17 08:43:07 Order request received for ID: 1 served by instance: order-service-644849c549-prxms
2025/12/17 08:43:08 Order request received for ID: 1 served by instance: order-service-644849c549-lbbp7
2025/12/17 08:43:08 Order request received for ID: 1 served by instance: order-service-644849c549-cp8cp
Testing retries with a faulty Pod
To test retries on the UNAVAILABLE status, we’ll deploy a deliberately faulty Order Service instance.
Create k8s/order-service-faulty.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-service-faulty
  labels:
    app: order-service-faulty
spec:
  replicas: 1
  selector:
    matchLabels:
      app: order-service
      type: faulty
  template:
    metadata:
      labels:
        app: order-service
        type: faulty
    spec:
      containers:
        - name: order-service
          image: order-service:latest
          imagePullPolicy: IfNotPresent
          livenessProbe:
            httpGet:
              path: /livez
              port: 4000
          readinessProbe:
            httpGet:
              path: /readyz
              port: 4000
          env:
            - name: INSTANCE_ID
              value: order-service-faulty
            - name: GRPC_FORCE_UNAVAILABLE
              value: "true"
Both Deployments share the label app: order-service, so the Headless Service will now return four Pod IPs: three healthy Pods and one faulty one.
Apply the manifest:
kubectl apply -f k8s/order-service-faulty.yaml
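To confirm that the Headless Service now resolves to four addresses (three healthy Pods plus the faulty one), list its endpoints:

kubectl get endpoints order-service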
Once the Pod is ready, scale down the original Order Service to trigger a DNS update on the client:
kubectl scale --replicas 2 deployment order-service
Make a few requests and check the logs again:
2025/12/17 08:46:51 gRPC simulated failure, another instance will handle the request!
2025/12/17 08:46:51 Order request received for ID: 1 served by instance: order-service-644849c549-prxms
2025/12/17 08:46:52 Order request received for ID: 1 served by instance: order-service-644849c549-lbbp7
2025/12/17 08:47:02 gRPC simulated failure, another instance will handle the request!
2025/12/17 08:47:02 Order request received for ID: 1 served by instance: order-service-644849c549-prxms
The faulty Pod consistently returns UNAVAILABLE, and the gRPC client transparently retries the request on a healthy Pod using the round-robin policy.
With this, we’ve validated gRPC client-side load balancing using DNS on both Docker Compose (for quick local experimentation) and Kubernetes with KIND, showing that the same principles apply in production-like environments.
Alternatives to Client-Side Load Balancing
Client-side load balancing using DNS works well because it is simple, transparent, and has very few moving parts. For many internal microservice setups, especially stateless services running on Kubernetes, this approach is often sufficient and production-ready.
However, DNS-based client-side load balancing has clear limits. The client reacts to failures and topology changes, but it does not actively manage traffic. We don’t get instant rebalancing, fine-grained routing rules, or full visibility into per-request behavior beyond what the client exposes.
A common alternative is proxy-based load balancing, for example with Envoy, typically configured through xDS APIs. The mental model here is different: instead of each client making load-balancing decisions, traffic flows through a dedicated proxy with a global view of the backend topology. This enables more advanced capabilities such as dynamic configuration updates, traffic shifting, consistent retries and timeouts, and richer observability.
For many teams, DNS-based client-side load balancing is a pragmatic first step. More advanced solutions can be introduced later, once the system’s complexity and traffic patterns actually justify them.
Conclusion
Thanks for reading! In this article, we explored how gRPC client-side load balancing works using DNS-based discovery, first with Docker Compose and then on Kubernetes using a Headless Service and KIND. We looked at how gRPC manages long-lived connections, how failures trigger re-resolution, and how retries on UNAVAILABLE help route traffic away from unhealthy backends.
While DNS-based client-side load balancing doesn’t aggressively discover new backends, it provides a practical, production-proven approach for many internal microservice workloads and is a great way to understand the fundamentals before moving to more advanced approaches like xDS or service meshes.
If you found this useful, I’d love to hear your thoughts in the comments, and feel free to check out the full working example on GitHub: https://github.com/SagarMaheshwary/go-grpc-load-balancing