Understanding gRPC Client-Side Load Balancing with DNS

Sagar Maheshwary

gRPC is a high-performance RPC framework widely used in microservices architectures. Because gRPC is built on HTTP/2, multiple RPCs are multiplexed over a single long-lived TCP connection. This defeats traditional connection-level (L4) load balancing: every RPC on a connection goes to the backend that connection was opened to, so placing gRPC services behind a simple load balancer often results in all traffic being sent to a single backend instance.

gRPC addresses this problem by supporting client-side load balancing natively. Instead of relying on an external proxy, the client itself discovers backend instances and decides where each RPC should be sent.

In this article, we’ll explore gRPC client-side load balancing in practice. We’ll build two simple Go services: an API Gateway that exposes a REST endpoint, and an Order Service that implements a GetOrder gRPC RPC. The Order Service will return both the order ID and the instance (container/pod) that handled the request, making load-balancing behavior easy to observe.

We’ll start by running the system using Docker Compose, observe how requests are distributed across multiple Order Service containers, and see what happens when instances go down or come back up. We’ll then configure retries to handle transient failures. Finally, we’ll deploy the same setup to a local Kubernetes cluster using KIND to understand how gRPC load balancing behaves in a Kubernetes environment.

Table of Contents

  • Creating the Services
  • gRPC Client-Side Load Balancing Basics
  • Running with Docker Compose
  • Client-Side Retries
  • Running on Kubernetes with KIND
  • Alternatives to Client-Side Load Balancing
  • Conclusion

Creating the Services

Directory Structure

We’ll keep the code for both services minimal and focused purely on gRPC. Below is the directory structure we’ll be building:

.
├── api-gateway/
│   ├── cmd/
│   ├── proto/
│   └── internal/
│     ├── grpc/
│     └── http/
├── order-service/
│   ├── cmd/
│   ├── proto/
│   └── internal/
│     ├── grpc/
│     └── http/
├── k8s/
└── docker-compose.yaml

We use a REST endpoint in the API Gateway purely as a convenience layer. It allows us to trigger gRPC requests using simple curl commands and observe load-balancing behavior without requiring a dedicated gRPC client tool. This setup also mirrors real-world microservice architectures, where API gateways often expose HTTP/REST APIs while communicating with backend services over gRPC.

API Gateway

Let's start with the API Gateway and bootstrap a Go project:

mkdir api-gateway
cd api-gateway
go mod init github.com/sagarmaheshwary/go-grpc-load-balancing/api-gateway

Create the proto definition in proto/order/order.proto:

syntax = "proto3";

package order;

option go_package = "github.com/sagarmaheshwary/go-grpc-load-balancing/api-gateway/proto/order";

service OrderService {
  rpc GetOrder(GetOrderRequest) returns (GetOrderResponse);
}

message GetOrderRequest {
  string id = 1;
}

message GetOrderResponse {
  string id = 1;
  string served_by = 2;
}

To generate Go code from proto files, you’ll need to install:

  1. protoc compiler
  2. Go plugins for protobuf and gRPC:
go install google.golang.org/protobuf/cmd/protoc-gen-go@latest
go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@latest

Now we can generate the code with the following command:

protoc --go_out=. --go_opt=paths=source_relative --go-grpc_out=. --go-grpc_opt=paths=source_relative ./proto/order/order.proto

Next, let’s create a minimal gRPC client that establishes a non-TLS connection to the server in internal/grpc/client.go:

package grpc

import (
    "context"
    "log"

    proto "github.com/sagarmaheshwary/go-grpc-load-balancing/api-gateway/proto/order"
    "google.golang.org/grpc"
    "google.golang.org/grpc/credentials/insecure"
)

func NewClient(ctx context.Context, url string) (*OrderClient, *grpc.ClientConn, error) {
    conn, err := grpc.NewClient(url,
        grpc.WithTransportCredentials(insecure.NewCredentials()),
    )

    if err != nil {
        log.Println("Order gRPC client failed to connect", err)
        return nil, nil, err
    }

    log.Println("Order gRPC client connected on " + url)
    client := NewOrderClient(proto.NewOrderServiceClient(conn))

    return client, conn, nil
}

and the handler for the GetOrder RPC in internal/grpc/handler.go:

package grpc

import (
    "context"

    proto "github.com/sagarmaheshwary/go-grpc-load-balancing/api-gateway/proto/order"
)

type OrderClient struct {
    client proto.OrderServiceClient
}

func NewOrderClient(c proto.OrderServiceClient) *OrderClient {
    return &OrderClient{
        client: c,
    }
}

func (o *OrderClient) GetOrder(ctx context.Context, in *proto.GetOrderRequest) (*proto.GetOrderResponse, error) {
    res, err := o.client.GetOrder(ctx, in)
    if err != nil {
        return nil, err
    }

    return res, nil
}

Now let’s create a simple HTTP server using gin for the REST API in internal/http/server.go:

package http

import (
    "net/http"

    "github.com/gin-gonic/gin"
    "github.com/sagarmaheshwary/go-grpc-load-balancing/api-gateway/internal/grpc"
)

type HTTPServer struct {
    URL    string
    Server *http.Server
}

func NewServer(url string, grpcClient *grpc.OrderClient) *HTTPServer {
    gin.SetMode(gin.ReleaseMode)
    r := gin.New()

    handler := &HTTPHandler{
        GRPCClient: grpcClient,
    }

    r.GET("/livez", handler.Livez)
    r.GET("/readyz", handler.Readyz)
    r.GET("/orders/:id", handler.GetOrder)

    return &HTTPServer{
        URL: url,
        Server: &http.Server{
            Addr:    url,
            Handler: r,
        },
    }
}

func (h *HTTPServer) Serve() error {
    return h.Server.ListenAndServe()
}

Create the HTTP handlers in internal/http/handler.go:

package http

import (
    "net/http"

    "github.com/gin-gonic/gin"
    "github.com/sagarmaheshwary/go-grpc-load-balancing/api-gateway/internal/grpc"
    proto "github.com/sagarmaheshwary/go-grpc-load-balancing/api-gateway/proto/order"
)

type HTTPHandler struct {
    GRPCClient *grpc.OrderClient
}

func (h *HTTPHandler) Livez(c *gin.Context) {
    c.JSON(http.StatusOK, map[string]string{"status": "ok"})
}

func (h *HTTPHandler) Readyz(c *gin.Context) {
    c.JSON(http.StatusOK, map[string]string{"status": "ready"})
}

func (h *HTTPHandler) GetOrder(c *gin.Context) {
    orderID := c.Param("id")
    if orderID == "" {
        c.JSON(http.StatusBadRequest, map[string]string{"error": "missing order id"})
        return
    }

    req := &proto.GetOrderRequest{
        Id: orderID,
    }
    res, err := h.GRPCClient.GetOrder(c.Request.Context(), req)
    if err != nil {
        // For demo simplicity, map all gRPC errors to 500.
        // In real systems, you'd inspect status.Code(err).
        c.JSON(http.StatusInternalServerError, map[string]string{"error": err.Error()})
        return
    }

    c.JSON(http.StatusOK, res)
}

Apart from the /orders endpoint, we also expose /livez and /readyz endpoints for Kubernetes health probes. We’ll discuss their role in more detail later, when running the services on KIND.

Now let’s bootstrap the gRPC client and HTTP server in cmd/server/main.go, which serves as the API Gateway’s entrypoint:

package main

import (
    "context"
    "errors"
    "log"
    "net/http"
    "os"
    "os/signal"
    "time"

    "github.com/sagarmaheshwary/go-grpc-load-balancing/api-gateway/internal/grpc"
    httpserver "github.com/sagarmaheshwary/go-grpc-load-balancing/api-gateway/internal/http"
)

var (
    httpServerURL       = "0.0.0.0:4000"
    grpcOrderServiceURL = "order-service:5000"
)

func main() {
    ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt)
    defer stop()

    grpcClient, conn, err := grpc.NewClient(ctx, grpcOrderServiceURL)
    if err != nil {
        //An API Gateway should not crash if the downstream service is down.
        //However, for simplicity, we are logging the error and exiting.
        log.Fatal(err)
    }
    defer conn.Close()

    httpServer := httpserver.NewServer(httpServerURL, grpcClient)
    go func() {
        log.Println("Starting HTTP server on", httpServerURL)
        err := httpServer.Serve()
        if err != nil && !errors.Is(err, http.ErrServerClosed) {
            stop()
        }
    }()

    log.Println("API Gateway is running...")

    <-ctx.Done()
    log.Println("shutting down API Gateway...")

    shutdownCtx, cancelShutdown := context.WithTimeout(context.Background(), 3*time.Second)
    if err := httpServer.Server.Shutdown(shutdownCtx); err != nil {
        log.Fatalf("failed to shutdown HTTP server: %v", err)
    }
    cancelShutdown()

    log.Println("API Gateway stopped")
}

The main function also handles graceful shutdown by listening for the os.Interrupt (SIGINT) signal and properly shutting down the HTTP server and gRPC connection before exiting.

Order Service

The Order Service is similar to the API Gateway, except it exposes a gRPC server along with a lightweight HTTP server for health checks.

Let’s create the Order Service:

cd ..
mkdir order-service
cd order-service
go mod init github.com/sagarmaheshwary/go-grpc-load-balancing/order-service

For order.proto, we reuse the proto file from the API Gateway, update the go_package path, and regenerate the code.
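Concretely, only the go_package option changes (to match the module path from go mod init above), and we regenerate with the same protoc command as before:

option go_package = "github.com/sagarmaheshwary/go-grpc-load-balancing/order-service/proto/order";

protoc --go_out=. --go_opt=paths=source_relative --go-grpc_out=. --go-grpc_opt=paths=source_relative ./proto/order/order.proto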

Create the GetOrder handler in internal/grpc/handler.go. This handler simply returns the same order ID along with a ServedBy value. The ServedBy field helps us visually confirm which instance handled the request during load-balancing experiments. It uses the INSTANCE_ID environment variable if present, or falls back to the hostname:

package grpc

import (
    "context"
    "log"
    "os"

    proto "github.com/sagarmaheshwary/go-grpc-load-balancing/order-service/proto/order"
    "google.golang.org/grpc/codes"
    "google.golang.org/grpc/status"
)

type OrderServer struct {
    proto.OrderServiceServer
}

func (o *OrderServer) GetOrder(ctx context.Context, in *proto.GetOrderRequest) (*proto.GetOrderResponse, error) {
    instanceID := os.Getenv("INSTANCE_ID")
    if instanceID == "" {
        instanceID, _ = os.Hostname()
    }

    log.Println("Order request received for ID:", in.Id, "served by instance:", instanceID)

    return &proto.GetOrderResponse{
        Id:       in.Id,
        ServedBy: instanceID,
    }, nil
}

Next, create the gRPC server in internal/grpc/server.go and register the OrderServer:

package grpc

import (
    "log"
    "net"

    proto "github.com/sagarmaheshwary/go-grpc-load-balancing/order-service/proto/order"
    "google.golang.org/grpc"
)

type GRPCServer struct {
    Server *grpc.Server
    URL    string
}

func NewServer(url string) *GRPCServer {
    srv := grpc.NewServer()

    proto.RegisterOrderServiceServer(srv, &OrderServer{})

    return &GRPCServer{
        Server: srv,
        URL:    url,
    }
}

func (s *GRPCServer) Serve() error {
    lis, err := net.Listen("tcp", s.URL)
    if err != nil {
        log.Printf("failed to listen on %s: %v", s.URL, err)
        return err
    }
    return s.Server.Serve(lis)
}

We also need an HTTP server to expose health-check endpoints.

internal/http/handler.go:

package http

import (
    "net/http"

    "github.com/gin-gonic/gin"
)

func Livez(c *gin.Context) {
    c.JSON(http.StatusOK, map[string]string{"status": "ok"})
}

func Readyz(c *gin.Context) {
    c.JSON(http.StatusOK, map[string]string{"status": "ready"})
}

internal/http/server.go:

package http

import (
    "net/http"

    "github.com/gin-gonic/gin"
)

type HTTPServer struct {
    URL    string
    Server *http.Server
}

func NewServer(url string) *HTTPServer {
    gin.SetMode(gin.ReleaseMode)
    r := gin.New()

    r.GET("/livez", Livez)
    r.GET("/readyz", Readyz)

    return &HTTPServer{
        URL: url,
        Server: &http.Server{
            Addr:    url,
            Handler: r,
        },
    }
}

func (h *HTTPServer) Serve() error {
    return h.Server.ListenAndServe()
}

Finally, create the main function in cmd/server/main.go:

package main

import (
    "context"
    "errors"
    "log"
    "net/http"
    "os"
    "os/signal"
    "time"

    grpcserver "github.com/sagarmaheshwary/go-grpc-load-balancing/order-service/internal/grpc"
    httpserver "github.com/sagarmaheshwary/go-grpc-load-balancing/order-service/internal/http"
    "google.golang.org/grpc"
)

var (
    httpServerURL = "0.0.0.0:4000"
    grpcServerURL = "0.0.0.0:5000"
)

func main() {
    ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt)
    defer stop()

    httpServer := httpserver.NewServer(httpServerURL)
    go func() {
        log.Println("Starting HTTP server on", httpServerURL)
        err := httpServer.Serve()
        if err != nil && !errors.Is(err, http.ErrServerClosed) {
            stop()
        }
    }()

    grpcServer := grpcserver.NewServer(grpcServerURL)
    go func() {
        log.Println("Starting gRPC server on", grpcServerURL)
        if err := grpcServer.Serve(); err != nil && !errors.Is(err, grpc.ErrServerStopped) {
            stop()
        }
    }()

    log.Println("Order Service is running...")

    <-ctx.Done()
    log.Println("shutting down Order Service...")

    grpcServer.Server.GracefulStop()

    shutdownCtx, cancelShutdown := context.WithTimeout(context.Background(), 3*time.Second)
    if err := httpServer.Server.Shutdown(shutdownCtx); err != nil {
        log.Fatalf("failed to shutdown HTTP server: %v", err)
    }
    cancelShutdown()

    log.Println("Order Service stopped")
}

gRPC Client-Side Load Balancing Basics

In client-side load balancing, the gRPC client resolves multiple backend addresses, maintains a sub-connection to each backend, and uses a load balancing policy to decide which backend handles each RPC. The server is completely unaware of the load balancing; all logic lives inside the gRPC client.

This model is particularly well-suited to Kubernetes and modern microservices: it avoids an extra proxy hop and works naturally with the long-lived HTTP/2 connections gRPC relies on. For many gRPC workloads, it is the preferred approach when you want direct client-to-backend communication and simple, transparent load balancing.

Service Discovery via DNS

The simplest and most common form of client-side load balancing uses DNS-based service discovery. When we configure the gRPC client with a DNS target like dns:///order-service:5000, the DNS resolver returns multiple A records, with each record representing a backend instance (pod or container). The gRPC client then creates a SubConn for each of these addresses, and the configured load balancing policy distributes RPCs across them.

In Docker Compose, this works because multiple containers can share the same hostname, and Docker DNS will return all container IPs. In Kubernetes, we achieve the same behavior using Headless Services (clusterIP: None), where each pod gets its own DNS A record that the gRPC client can discover and use for load balancing.
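For reference, with grpc-go's NewClient the dns resolver is already the default, so the explicit dns:/// scheme below behaves the same as the plain order-service:5000 target used in our client. A minimal sketch, using the same insecure credentials:

// Explicit dns scheme; equivalent to "order-service:5000" with grpc.NewClient,
// which uses the dns resolver by default.
conn, err := grpc.NewClient("dns:///order-service:5000",
    grpc.WithTransportCredentials(insecure.NewCredentials()),
)
if err != nil {
    log.Fatal(err)
}
defer conn.Close()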

Load Balancing Policies in gRPC

gRPC supports multiple load balancing policies. The most important ones are:

Pick First (Default)

  • Client picks one backend and sticks to it
  • No real load balancing
  • Bad fit for scalable systems

Round Robin

  • RPCs are distributed across all healthy backends
  • Most commonly used policy
  • Ideal for stateless services

Weighted load balancing allows sending different proportions of traffic to different backends (for example, 90% to v1 and 10% to v2).

This is not supported with plain DNS-based client-side load balancing. In gRPC, weighted policies are typically implemented using xDS and Envoy and are commonly used for canary deployments and gradual rollouts.

We'll use the Round Robin policy in this article.

To enable it, update internal/grpc/client.go in the API Gateway:

conn, err := grpc.NewClient(url,
    grpc.WithTransportCredentials(insecure.NewCredentials()),
    grpc.WithDefaultServiceConfig(`{
        "loadBalancingPolicy":"round_robin"
    }`),
)

See the service config proto definition (grpc/service_config/service_config.proto in the grpc-proto repository) for all supported values.

Running with Docker Compose

Dockerfile and Docker Compose

We start by creating a Dockerfile for each service:

FROM golang:1.25 AS builder

WORKDIR /app

COPY go.mod go.sum ./
RUN go mod download

COPY . .

RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 \
    go build -o /app/main ./cmd/server/main.go

FROM alpine:3.22 AS production

WORKDIR /app
COPY --from=builder /app/main .

EXPOSE 4000 5000

CMD ["./main"]

Next, we create a docker-compose.yaml in the root directory:

services:
  api-gateway:
    build:
      context: ./api-gateway
      target: production
    ports:
      - 4000:4000

  order-service-1:
    build:
      context: ./order-service
      target: production
    hostname: order-service
    environment:
      - INSTANCE_ID=order-service-1

  order-service-2:
    build:
      context: ./order-service
      target: production
    hostname: order-service
    environment:
      - INSTANCE_ID=order-service-2

  order-service-3:
    build:
      context: ./order-service
      target: production
    hostname: order-service
    environment:
      - INSTANCE_ID=order-service-3

Here, we are creating three order-service containers and assigning them the same hostname order-service. Docker DNS will return all container IPs when queried, enabling client-side load balancing.

Testing Load Balancing

We can now start the services:

docker compose up
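Before sending requests, you can check what Docker's embedded DNS returns for the shared hostname by querying it from inside the api-gateway container (nslookup is provided by busybox in the Alpine base image):

docker compose exec api-gateway nslookup order-service

It should return one address per running order-service container, which is exactly what the gRPC client will resolve.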

Making a request to the API Gateway:

curl localhost:4000/orders/1

Should return a response like:

{ "id": "1", "served_by": "order-service-1" }

Repeated requests will be distributed across all containers:

{"id":"1","served_by":"order-service-2"}
{"id":"1","served_by":"order-service-3"}
{"id":"1","served_by":"order-service-1"}

Each container also logs the request it serves:

order-service-1-1  | Order request received for ID: 1 served by instance: order-service-1
order-service-2-1  | Order request received for ID: 1 served by instance: order-service-2
order-service-3-1  | Order request received for ID: 1 served by instance: order-service-3

Inspecting gRPC Internal Logs

For a deeper look at client-side load balancing, we can enable verbose gRPC logs:

environment:
  - GRPC_GO_LOG_SEVERITY_LEVEL=info
  - GRPC_GO_LOG_VERBOSITY_LEVEL=99

On the first request, the gRPC client resolves DNS and establishes one SubConn per backend:

Subchannel picks a new address "172.19.0.5:5000"
Subchannel picks a new address "172.19.0.2:5000"
Subchannel picks a new address "172.19.0.4:5000"

This confirms that the client discovered all three containers and created independent connections to each.

Stopping and Restarting Backends

Let’s stop one backend container:

docker compose stop order-service-1

The gRPC client detects the closed TCP connection and logs:

Closing: connection error: desc = "error reading from server: EOF"

The affected SubChannel transitions through:

READY → IDLE → CONNECTING → TRANSIENT_FAILURE → SHUTDOWN

Eventually, it is removed from the load balancer, and traffic is routed only to healthy instances.

Restarting the container:

docker compose start order-service-1

This does not immediately change client behavior. gRPC maintains long-lived HTTP/2 connections and will not re-resolve DNS while all existing connections remain healthy.

Improving Responsiveness with Keepalive

To make the client more responsive to backend changes, we can configure keepalives:

conn, err := grpc.NewClient(
    url,
    grpc.WithDefaultServiceConfig(`{"loadBalancingPolicy":"round_robin"}`),
    grpc.WithTransportCredentials(insecure.NewCredentials()),
    grpc.WithKeepaliveParams(keepalive.ClientParameters{
        Time:                30 * time.Second,
        Timeout:             5 * time.Second,
        PermitWithoutStream: false,
    }),
)

Keepalive (imported from google.golang.org/grpc/keepalive) does not refresh DNS by itself: it periodically pings backends to detect unresponsive connections. When a backend fails, the resulting state change triggers the client's reconnection logic, which may include DNS re-resolution, allowing new backends to be discovered earlier.

DNS-based client-side load balancing is reactive by design: it prioritizes connection stability over aggressive backend discovery. In most Kubernetes scenarios, events such as pod crashes, restarts, and rolling updates naturally trigger re-resolution.
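A complementary, commonly used server-side option (not part of this article's setup) is to cap connection age so clients periodically reconnect and re-resolve DNS. A minimal sketch for the Order Service's NewServer, using the google.golang.org/grpc/keepalive and time packages; the durations are example values:

srv := grpc.NewServer(
    grpc.KeepaliveParams(keepalive.ServerParameters{
        // Gracefully close connections after roughly two minutes so clients
        // reconnect and pick up fresh DNS records.
        MaxConnectionAge:      2 * time.Minute,
        MaxConnectionAgeGrace: 20 * time.Second,
    }),
)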

Client-Side Retries

Why Client-Side Retries Matter

In a distributed system, failures are expected rather than exceptional. Even when client-side load balancing is in place, individual backends can fail due to pod crashes, container restarts during rolling deployments, transient network issues, or momentary server overload. Without retries, a single transient failure immediately propagates as an error to the caller, despite the presence of other healthy backends.

Client-side retries allow the gRPC client to handle these failures gracefully. When a request fails, the client can mark the backend as unhealthy, select another backend from the load balancer, and retry the RPC transparently. This improves overall availability without pushing retry logic into application code.

When to Retry

Retries at this layer should be intentionally conservative. In the policy we configure below, only the UNAVAILABLE gRPC status code is retried. UNAVAILABLE indicates that the server could not be reached, the connection was dropped, or the backend was temporarily unable to serve requests. This usually means the request never reached application logic, making it safe to retry on another backend.

Other status codes are not retried at the LB level because retries could introduce correctness issues:

  • DEADLINE_EXCEEDED: server may still be processing the request
  • ABORTED: often signals concurrency conflicts
  • INVALID_ARGUMENT or FAILED_PRECONDITION: indicate client or business-logic errors where retries are ineffective

For application-level errors, a custom interceptor is recommended, applying retries only to explicitly idempotent RPCs.
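As an illustration only (not part of this article's final code), such an interceptor could look like the sketch below. The method allowlist, the choice to retry ABORTED, and the attempt count are assumptions you would tune per RPC:

package grpc

import (
    "context"

    "google.golang.org/grpc"
    "google.golang.org/grpc/codes"
    "google.golang.org/grpc/status"
)

// RetryIdempotent retries an allowlist of idempotent methods when the
// server returns ABORTED, up to maxAttempts total attempts.
func RetryIdempotent(maxAttempts int) grpc.UnaryClientInterceptor {
    idempotent := map[string]bool{
        "/order.OrderService/GetOrder": true,
    }
    return func(ctx context.Context, method string, req, reply any, cc *grpc.ClientConn, invoker grpc.UnaryInvoker, opts ...grpc.CallOption) error {
        var err error
        for attempt := 0; attempt < maxAttempts; attempt++ {
            err = invoker(ctx, method, req, reply, cc, opts...)
            // Only retry when the RPC is declared idempotent and the
            // failure is an application-level ABORTED.
            if err == nil || !idempotent[method] || status.Code(err) != codes.Aborted {
                return err
            }
        }
        return err
    }
}

It would be registered on the client with grpc.WithUnaryInterceptor(RetryIdempotent(3)).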

Configuring Client-Side Retries in Go

In gRPC Go, retries are configured using the service config. Here, we extend our client configuration to enable retries for the OrderService:

conn, err := grpc.NewClient(url,
    grpc.WithDefaultServiceConfig(`{
        "loadBalancingPolicy":"round_robin",
        "methodConfig":[
            {
                "name":[{"service":"order.OrderService"}],
                "retryPolicy":{
                    "maxAttempts":3,
                    "initialBackoff":"0.1s",
                    "maxBackoff":"1s",
                    "backoffMultiplier":2,
                    "retryableStatusCodes":["UNAVAILABLE"]
                }
            }
        ]
    }`),
    grpc.WithTransportCredentials(insecure.NewCredentials()),
    grpc.WithKeepaliveParams(keepalive.ClientParameters{
        Time:                30 * time.Second,
        Timeout:             5 * time.Second,
        PermitWithoutStream: false,
    }),
)

This configuration allows up to three attempts per RPC (the original request plus two retries), with exponential backoff between attempts: roughly 100ms before the first retry and 200ms before the second, with the delay capped at 1s. Retries are restricted to UNAVAILABLE, keeping the behavior safe and predictable.

Demonstrating Retries in Action

If one Order Service container becomes unhealthy, the gRPC client retries the request on another container selected by the round-robin policy without exposing failure to the caller.

We can simulate failure in the GetOrder RPC using a GRPC_FORCE_UNAVAILABLE environment variable. This requires adding the google.golang.org/grpc/codes and google.golang.org/grpc/status packages to the imports in internal/grpc/handler.go:

func (o *OrderServer) GetOrder(ctx context.Context, in *proto.GetOrderRequest) (*proto.GetOrderResponse, error) {
    if os.Getenv("GRPC_FORCE_UNAVAILABLE") == "true" {
        log.Println("gRPC simulated failure, another instance will handle the request!")
        return nil, status.Error(codes.Unavailable, "simulated failure")
    }

    instanceID := os.Getenv("INSTANCE_ID")
    if instanceID == "" {
        instanceID, _ = os.Hostname()
    }

    log.Println("Order request received for ID:", in.Id, "served by instance:", instanceID)

    return &proto.GetOrderResponse{
        Id:       in.Id,
        ServedBy: instanceID,
    }, nil
}

Enable this mode for one container in docker-compose.yaml:

  order-service-1:
    build:
      context: ./order-service
      target: production
    hostname: order-service
    environment:
      - INSTANCE_ID=order-service-1
      - GRPC_FORCE_UNAVAILABLE=true

Observing Retry Behavior

With all containers running, multiple requests demonstrate automatic retries. Only healthy backends respond:

{"id":"1","served_by":"order-service-2"}
{"id":"1","served_by":"order-service-3"}
{"id":"1","served_by":"order-service-2"}

Logs from the containers confirm the process:

order-service-1-1  | 2025/12/16 17:46:39 gRPC simulated failure, another instance will handle the request!
order-service-2-1  | 2025/12/16 17:46:39 Order request received for ID: 1 served by instance: order-service-2
order-service-3-1  | 2025/12/16 17:46:42 Order request received for ID: 1 served by instance: order-service-3
order-service-1-1  | 2025/12/16 17:46:45 gRPC simulated failure, another instance will handle the request!
order-service-2-1  | 2025/12/16 17:46:45 Order request received for ID: 1 served by instance: order-service-2

This illustrates how client-side retries and round-robin load balancing work together: failures are isolated, healthy backends continue serving traffic, and the system remains resilient without additional application-level retry logic.

Running on Kubernetes with KIND

Now let’s run the same setup on Kubernetes and see how client-side gRPC load balancing works using a Headless Service to return all Pod IPs to the client.

We’ll use KIND (Kubernetes IN Docker) to run a local Kubernetes cluster. Make sure you have KIND and kubectl installed.

We’ll keep the manifests minimal and focus only on what’s required for gRPC-based client-side load balancing.

API Gateway manifests

For the API Gateway, we’ll create k8s/api-gateway.yaml with a Deployment and a standard ClusterIP Service:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-gateway
  labels:
    app: api-gateway
spec:
  replicas: 1
  selector:
    matchLabels:
      app: api-gateway
  template:
    metadata:
      labels:
        app: api-gateway
    spec:
      containers:
        - name: api-gateway
          image: api-gateway:latest
          imagePullPolicy: IfNotPresent
          livenessProbe:
            httpGet:
              path: /livez
              port: 4000
          readinessProbe:
            httpGet:
              path: /readyz
              port: 4000
          env: # Optional env variables for gRPC logging
            - name: GRPC_GO_LOG_SEVERITY_LEVEL
              value: "info"
            - name: GRPC_GO_LOG_VERBOSITY_LEVEL
              value: "99"

---
apiVersion: v1
kind: Service
metadata:
  name: api-gateway
  labels:
    app: api-gateway
spec:
  type: ClusterIP
  selector:
    app: api-gateway
  ports:
    - name: http
      protocol: TCP
      port: 4000
      targetPort: 4000

This Service is only used to expose the API Gateway internally and for port-forwarding during local testing.

Order Service manifests (Headless Service)

Next, we’ll create k8s/order-service.yaml for the Order Service:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-service
  labels:
    app: order-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: order-service
  template:
    metadata:
      labels:
        app: order-service
    spec:
      containers:
        - name: order-service
          image: order-service:latest
          imagePullPolicy: IfNotPresent
          livenessProbe:
            httpGet:
              path: /livez
              port: 4000
          readinessProbe:
            httpGet:
              path: /readyz
              port: 4000

---
apiVersion: v1
kind: Service
metadata:
  name: order-service
  labels:
    app: order-service
spec:
  clusterIP: None
  selector:
    app: order-service
  ports:
    - name: grpc
      protocol: TCP
      port: 5000
      targetPort: 5000

We’re running three replicas of the Order Service.
The key detail here is clusterIP: None, which makes this a Headless Service.

A Headless Service does not allocate a virtual ClusterIP. Instead, Kubernetes publishes one DNS record per Pod IP. When the gRPC client resolves order-service, it receives a list of all backing Pod IPs, which is exactly what we want for client-side load balancing.

Why probes matter for DNS-based LB

Although liveness and readiness probes are important for all services, they are especially important here:

  • When a Pod is created and becomes Ready, it is added to DNS
  • When a Pod becomes NotReady or is terminated, it is removed from DNS
  • The gRPC client will only see Ready Pods when it re-resolves DNS

In this demo, /livez and /readyz simply return 200 OK.
In a real system, /readyz would check service dependencies like Redis, Kafka, RabbitMQ, or databases and return 503 if the service is not ready to handle traffic.
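As a sketch of what that could look like for the Order Service, assuming a *sql.DB dependency were wired into the HTTP server (it is not in this demo), the readiness handler might become:

package http

import (
    "context"
    "database/sql"
    "net/http"
    "time"

    "github.com/gin-gonic/gin"
)

// ReadyzWithDB reports 503 until the database answers a ping, so the Pod
// is pulled out of the Headless Service's DNS records while the
// dependency is unreachable.
func ReadyzWithDB(db *sql.DB) gin.HandlerFunc {
    return func(c *gin.Context) {
        ctx, cancel := context.WithTimeout(c.Request.Context(), 2*time.Second)
        defer cancel()

        if err := db.PingContext(ctx); err != nil {
            c.JSON(http.StatusServiceUnavailable, map[string]string{"status": "not ready"})
            return
        }
        c.JSON(http.StatusOK, map[string]string{"status": "ready"})
    }
}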

Creating the KIND cluster

Let’s create a local Kubernetes cluster using KIND:

kind create cluster
kubectl config use-context kind-kind

One advantage of KIND is that it can load Docker images directly, so we don’t need a container registry. This works because our Deployments use imagePullPolicy: IfNotPresent.

Build and load the images:

docker build -t api-gateway:latest ./api-gateway
docker build -t order-service:latest ./order-service

kind load docker-image api-gateway:latest
kind load docker-image order-service:latest

Apply all manifests:

kubectl apply -f k8s/

Watch the Pods come up:

kubectl get pods -w
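Once the order-service Pods are Ready, you can confirm that the Headless Service resolves to one address per Pod by running nslookup from a throwaway Pod (the Pod name and busybox image here are arbitrary):

kubectl run dns-test --rm -it --restart=Never --image=busybox:1.36 -- nslookup order-service

Instead of a single ClusterIP, the lookup should list the individual Pod IPs.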

Once everything is ready, port-forward the API Gateway so we can access it from our host machine:

kubectl port-forward svc/api-gateway 4000:4000

Verifying load balancing

Now let’s send a few requests to the API Gateway:

curl localhost:4000/orders/1

We should see responses coming from different Pods:

{"id":"1","served_by":"order-service-644849c549-prxms"}
{"id":"1","served_by":"order-service-644849c549-lbbp7"}
{"id":"1","served_by":"order-service-644849c549-cp8cp"}

In Kubernetes, os.Hostname() resolves to the Pod name, which makes it easy to see which Pod handled each request.

We can also verify this via logs:

kubectl logs -l app=order-service -f
2025/12/17 08:43:07 Order request received for ID: 1 served by instance: order-service-644849c549-prxms
2025/12/17 08:43:08 Order request received for ID: 1 served by instance: order-service-644849c549-lbbp7
2025/12/17 08:43:08 Order request received for ID: 1 served by instance: order-service-644849c549-cp8cp

Testing retries with a faulty Pod

To test retries on the UNAVAILABLE status, we’ll deploy a deliberately faulty Order Service instance.

Create k8s/order-service-faulty.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-service-faulty
  labels:
    app: order-service-faulty
spec:
  replicas: 1
  selector:
    matchLabels:
      app: order-service
      type: faulty
  template:
    metadata:
      labels:
        app: order-service
        type: faulty
    spec:
      containers:
        - name: order-service
          image: order-service:latest
          imagePullPolicy: IfNotPresent
          livenessProbe:
            httpGet:
              path: /livez
              port: 4000
          readinessProbe:
            httpGet:
              path: /readyz
              port: 4000
          env:
            - name: INSTANCE_ID
              value: order-service-faulty
            - name: GRPC_FORCE_UNAVAILABLE
              value: "true"

Both Deployments share the label app: order-service, so the Headless Service will now return four Pod IPs: three healthy Pods and one faulty one.

Apply the manifest:

kubectl apply -f k8s/order-service-faulty.yaml
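After the new Pod becomes Ready, you can confirm that the Headless Service now selects it as well:

kubectl get endpoints order-service

The ENDPOINTS column should include all four Pod addresses on port 5000: the three healthy Pods plus the faulty one.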

Once the Pod is ready, scale down the original Order Service to trigger a DNS update on the client:

kubectl scale --replicas 2 deployment order-service

Make a few requests and check the logs again:

2025/12/17 08:46:51 gRPC simulated failure, another instance will handle the request!
2025/12/17 08:46:51 Order request received for ID: 1 served by instance: order-service-644849c549-prxms
2025/12/17 08:46:52 Order request received for ID: 1 served by instance: order-service-644849c549-lbbp7
2025/12/17 08:47:02 gRPC simulated failure, another instance will handle the request!
2025/12/17 08:47:02 Order request received for ID: 1 served by instance: order-service-644849c549-prxms

The faulty Pod consistently returns UNAVAILABLE, and the gRPC client transparently retries the request on a healthy Pod using the round-robin policy.

With this, we’ve validated gRPC client-side load balancing using DNS on both Docker Compose (for quick local experimentation) and Kubernetes with KIND, showing that the same principles apply in production-like environments.

Alternatives to Client-Side Load Balancing

Client-side load balancing using DNS works well because it is simple, transparent, and has very few moving parts. For many internal microservice setups, especially stateless services running on Kubernetes, this approach is often sufficient and production-ready.

However, DNS-based client-side load balancing has clear limits. The client reacts to failures and topology changes, but it does not actively manage traffic. We don’t get instant rebalancing, fine-grained routing rules, or full visibility into per-request behavior beyond what the client exposes.

A common alternative is proxy-based load balancing, for example with Envoy, typically configured through xDS APIs. The mental model here is different: instead of each client making load-balancing decisions, traffic flows through a dedicated proxy with a global view of the backend topology. This enables more advanced capabilities such as dynamic configuration updates, traffic shifting, consistent retries and timeouts, and richer observability.

For many teams, DNS-based client-side load balancing is a pragmatic first step. More advanced solutions can be introduced later, once the system’s complexity and traffic patterns actually justify them.

Conclusion

Thanks for reading! In this article, we explored how gRPC client-side load balancing works using DNS-based discovery, first with Docker Compose and then on Kubernetes using a Headless Service and KIND. We looked at how gRPC manages long-lived connections, how failures trigger re-resolution, and how retries on UNAVAILABLE help route traffic away from unhealthy backends.

While DNS-based client-side load balancing doesn’t aggressively discover new backends, it provides a practical, production-proven approach for many internal microservice workloads and is a great way to understand the fundamentals before moving to more advanced approaches like xDS or service meshes.

If you found this useful, I’d love to hear your thoughts in the comments, and feel free to check out the full working example on GitHub: https://github.com/SagarMaheshwary/go-grpc-load-balancing
