Sagar Maheshwary

Designing Production-Ready Microservices in Go — Part 3

In Part Two, we integrated Postgres using GORM along with migrations and seeders, discussed the service layer pattern, built a UserService example and wired it into the SayHello RPC, and wrapped up with an integration test using Testcontainers.

In this part, we'll add Redis integration for caching, health checks for service monitoring, and observability with Prometheus metrics and OpenTelemetry tracing to make our microservice production-ready. We'll also walk through a sample Grafana dashboard for metrics visualization and spin up a second service to demonstrate end-to-end distributed tracing in action.

Redis Integration

Caching is an important part of scaling any service. It reduces database load and speeds up response times by storing frequently accessed data in a fast in-memory store. We’ll use Redis, the most popular in-memory key–value store, for this purpose.

Redis supports far more than simple caching — features like Pub/Sub messaging, geospatial queries, and time-series data are built in — but for this project, we’ll keep things focused on basic caching functions that can easily be extended later.

Setup

We start by updating our config.go with a new Redis struct:

type Config struct {
    GRPCServer *GRPCServer `validate:"required"`
    Database   *Database   `validate:"required"`
    Redis      *Redis      `validate:"required"`
}

type Redis struct {
    Addr         string        `validate:"required"`
    Password     string        `validate:"required"`
    DB           int           `validate:"gte=0"`
    DialTimeout  time.Duration `validate:"gte=0"`
    ReadTimeout  time.Duration `validate:"gte=0"`
    WriteTimeout time.Duration `validate:"gte=0"`
    PoolSize     int           `validate:"gte=0"`
    MinIdleConns int           `validate:"gte=0"`
}

Then we load it inside NewConfigWithOptions():

    cfg := &Config{
        Redis: &Redis{
            Addr:         getEnv("REDIS_ADDR", "localhost:6379"),
            Password:     getEnv("REDIS_PASSWORD", "default"),
            DB:           getEnvInt("REDIS_DB", 0),
            DialTimeout:  getEnvDuration("REDIS_DIAL_TIMEOUT", time.Second*5),
            ReadTimeout:  getEnvDuration("REDIS_READ_TIMEOUT", time.Second*3),
            WriteTimeout: getEnvDuration("REDIS_WRITE_TIMEOUT", time.Second*3),
            PoolSize:     getEnvInt("REDIS_POOL_SIZE", 20),
            MinIdleConns: getEnvInt("REDIS_MIN_IDLE_CONNECTIONS", 5),
        },
        //...
    }

Next, we’ll create internal/cache/cache.go where our Redis implementation will live:

package cache

import (
    "context"
    "fmt"
    "time"

    "github.com/redis/go-redis/v9"

    "github.com/sagarmaheshwary/go-microservice-boilerplate/internal/config"
    "github.com/sagarmaheshwary/go-microservice-boilerplate/internal/logger"
)

type CacheService interface {
    Set(ctx context.Context, key string, value interface{}, expiration time.Duration) error
    Get(ctx context.Context, key string) (string, error)
    Delete(ctx context.Context, key string) error
    Ping(ctx context.Context) error
    Close() error
}

type RedisCache struct {
    client *redis.Client
}

type Opts struct {
    Config *config.Redis
    Logger logger.Logger
}

func NewRedisCache(ctx context.Context, opts *Opts) (CacheService, error) {
    cfg := opts.Config
    rdb := redis.NewClient(&redis.Options{
        Addr:         cfg.Addr,
        Password:     cfg.Password,
        DB:           cfg.DB,
        DialTimeout:  cfg.DialTimeout,
        ReadTimeout:  cfg.ReadTimeout,
        WriteTimeout: cfg.WriteTimeout,
        PoolSize:     cfg.PoolSize,
        MinIdleConns: cfg.MinIdleConns,
    })

    r := &RedisCache{client: rdb}

    if err := r.Ping(ctx); err != nil {
        return nil, fmt.Errorf("failed to connect to redis: %v", err)
    }

    opts.Logger.Info("Redis connected")

    return r, nil
}

func (r *RedisCache) Set(ctx context.Context, key string, value interface{}, expiration time.Duration) error {
    return r.client.Set(ctx, key, value, expiration).Err()
}

func (r *RedisCache) Get(ctx context.Context, key string) (string, error) {
    return r.client.Get(ctx, key).Result()
}

func (r *RedisCache) Delete(ctx context.Context, key string) error {
    return r.client.Del(ctx, key).Err()
}

func (r *RedisCache) Ping(ctx context.Context) error {
    return r.client.Ping(ctx).Err()
}

func (r *RedisCache) Close() error {
    if r == nil || r.client == nil {
        return fmt.Errorf("cannot close: redis is not initialized")
    }
    return r.client.Close()
}

Here, we’ve followed the same modular pattern used across other components. We define a CacheService interface that can be easily mocked in unit tests, and a concrete RedisCache struct implementing basic caching functions along with Ping for health checks and Close for graceful shutdown. Finally, we provide a NewRedisCache() constructor to instantiate the cache service cleanly.
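
Because callers depend on the CacheService interface rather than the concrete Redis client, unit tests can swap in a trivial in-memory fake instead of spinning up Redis. Here's a minimal sketch — the fakeCache type is hypothetical and not part of the repo:

package service_test

import (
    "context"
    "errors"
    "fmt"
    "time"
)

// fakeCache is an in-memory stand-in that satisfies cache.CacheService.
type fakeCache struct {
    store map[string]string
}

func newFakeCache() *fakeCache {
    return &fakeCache{store: map[string]string{}}
}

func (f *fakeCache) Set(ctx context.Context, key string, value interface{}, expiration time.Duration) error {
    switch v := value.(type) {
    case []byte:
        f.store[key] = string(v) // handles the []byte payloads the service layer passes in
    case string:
        f.store[key] = v
    default:
        f.store[key] = fmt.Sprint(v)
    }
    return nil
}

func (f *fakeCache) Get(ctx context.Context, key string) (string, error) {
    v, ok := f.store[key]
    if !ok {
        return "", errors.New("cache miss")
    }
    return v, nil
}

func (f *fakeCache) Delete(ctx context.Context, key string) error {
    delete(f.store, key)
    return nil
}

func (f *fakeCache) Ping(ctx context.Context) error { return nil }

func (f *fakeCache) Close() error { return nil }

A unit test can then pass newFakeCache() wherever a CacheService is expected and exercise hit/miss paths without any network dependency.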

Implementation

Now we initialize our cache service in main.go:

package main

import (
    "context"
    "os"
    "os/signal"

    "github.com/sagarmaheshwary/go-microservice-boilerplate/internal/cache"
    "github.com/sagarmaheshwary/go-microservice-boilerplate/internal/config"
    "github.com/sagarmaheshwary/go-microservice-boilerplate/internal/logger"
)

func main() {
    ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt)
    defer stop()

    log := logger.NewZerologLogger("info", os.Stderr)

    cfg, err := config.NewConfig(log)
    if err != nil {
        log.Fatal(err.Error())
    }

    redisCache, err := cache.NewRedisCache(ctx, &cache.Opts{
        Config: cfg.Redis,
        Logger: log,
    })
    if err != nil {
        log.Fatal(err.Error())
    }

    //...Database, gRPC server etc

    <-ctx.Done()

    // Gracefully close the Redis client
    if err := redisCache.Close(); err != nil {
        log.Error("failed to close cache client", logger.Field{Key: "error", Value: err.Error()})
    }
}

This setup ensures that Redis starts alongside our other dependencies and closes gracefully when the service shuts down.

To integrate caching into our application logic, we’ll update the UserService.FindByID method. It will now first look for the user data in Redis before querying the database. If the record isn’t found in cache, it’s fetched from the database and then written back to Redis for subsequent requests.

Since we’ve added another dependency, we’ll extend the UserService constructor to accept the cache service:

package service

import (
    "gorm.io/gorm"
    "github.com/sagarmaheshwary/go-microservice-boilerplate/internal/cache"
    "github.com/sagarmaheshwary/go-microservice-boilerplate/internal/database"
)

type userService struct {
    database *gorm.DB
    cache    cache.CacheService
}

type UserServiceOpts struct {
    Database database.DatabaseService
    Cache    cache.CacheService
}

func NewUserService(opts *UserServiceOpts) UserService {
    return &userService{
        database: opts.Database.DB(),
        cache:    opts.Cache,
    }
}

Next, we update our gRPC server to pass the cache instance from main.go:

type Opts struct {
    Config   *config.GRPCServer
    Logger   logger.Logger
    Database database.DatabaseService
    Cache    cache.CacheService
}

func NewServer(opts *Opts) *GRPCServer {
    srv := grpc.NewServer(grpc.UnaryInterceptor(interceptor.LoggerInterceptor(opts.Logger)))

    helloworld.RegisterGreeterServer(srv, handler.NewGreeterServer(
        service.NewUserService(&service.UserServiceOpts{
            Database: opts.Database,
            Cache:    opts.Cache,
        }),
    ))

    return &GRPCServer{
        Server: srv,
        Config: opts.Config,
        Logger: opts.Logger,
    }
}

Finally, here’s how caching looks inside FindByID:

func (s *userService) FindByID(ctx context.Context, id uint) (*model.User, error) {
    cacheKey := fmt.Sprintf("user:%d", id)

    // Try cache
    if cached, err := s.cache.Get(ctx, cacheKey); err == nil && cached != "" {
        var u model.User

        if err := json.Unmarshal([]byte(cached), &u); err != nil {
            return nil, err
        }

        return &u, nil
    }

    u := &model.User{}
    if err := s.database.First(u, id).Error; err != nil {
        return nil, err
    }
    data, _ := json.Marshal(u)

    if err := s.cache.Set(ctx, cacheKey, data, time.Minute); err != nil { // cache for 1 minute
        return nil, err
    }

    return u, nil
}

Now, when you call the SayHello RPC, the user data will come from Redis on the second request (after being cached during the first). The cache expires after one minute, after which a new request will repopulate it. You can also log cache hits/misses to observe behavior.
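
For example, assuming the userService also held a logger field (not shown above), the lookup could log each outcome — a rough sketch:

cached, err := s.cache.Get(ctx, cacheKey)
if err == nil && cached != "" {
    // Cache hit: unmarshal and return, as in the snippet above.
    s.logger.Info("cache hit", logger.Field{Key: "key", Value: cacheKey})
    //...
}

// Reaching this point means the key was missing or expired.
s.logger.Info("cache miss", logger.Field{Key: "key", Value: cacheKey})
//...query the database and repopulate the cache as before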

Integration Test

Since we added caching to UserService, we’ll update our integration tests accordingly. To ensure realistic conditions, we’ll spin up a real Redis instance using Testcontainers. Let’s start by creating a helper function SetupRedis in testutils:

package testutils

import (
    "context"
    "fmt"
    "io"
    "log"
    "testing"
    "time"

    "github.com/stretchr/testify/require"
    "github.com/testcontainers/testcontainers-go"
    "github.com/testcontainers/testcontainers-go/modules/redis"

    "github.com/sagarmaheshwary/go-microservice-boilerplate/internal/cache"
    "github.com/sagarmaheshwary/go-microservice-boilerplate/internal/config"
    "github.com/sagarmaheshwary/go-microservice-boilerplate/internal/logger"
)

func SetupRedis(t *testing.T) cache.CacheService {
    ctx := context.Background()

    redisContainer, err := redis.Run(ctx,
        "redis:7.2",
        redis.WithSnapshotting(10, 1),
        redis.WithLogLevel(redis.LogLevelVerbose),
    )
    require.NoError(t, err)

    t.Cleanup(func() {
        if err := testcontainers.TerminateContainer(redisContainer); err != nil {
            log.Printf("failed to terminate container: %s", err)
        }
    })

    host, err := redisContainer.Host(ctx)
    require.NoError(t, err)
    port, err := redisContainer.MappedPort(ctx, "6379")
    require.NoError(t, err)

    redisCache, err := cache.NewRedisCache(ctx, &cache.Opts{
        Config: &config.Redis{
            Addr:         fmt.Sprintf("%s:%s", host, port.Port()),
            Password:     "",
            DB:           0,
            DialTimeout:  time.Second * 5,
            ReadTimeout:  time.Second * 3,
            WriteTimeout: time.Second * 3,
            PoolSize:     20,
            MinIdleConns: 5,
        },
        Logger: logger.NewZerologLogger("info", io.Discard),
    })
    require.NoError(t, err)

    return redisCache
}

And then update the FindByID test to include Redis:

package service_test

import (
    "context"
    "testing"
    "github.com/stretchr/testify/assert"
    "github.com/stretchr/testify/require"
    "github.com/sagarmaheshwary/go-microservice-boilerplate/internal/service"
    "github.com/sagarmaheshwary/go-microservice-boilerplate/internal/database/model"
    "github.com/sagarmaheshwary/go-microservice-boilerplate/internal/tests/testutils"
)

func TestUserService_FindByID(t *testing.T) {
    db := testutils.SetupPostgres(t)
    redis := testutils.SetupRedis(t)

    // Seed test data
    u := &model.User{Name: "Alice", Email: "alice@example.com"}
    require.NoError(t, db.DB().Create(u).Error)

    userService := service.NewUserService(&service.UserServiceOpts{
        Database: db,
        Cache:    redis,
    })

    got, err := userService.FindByID(context.Background(), u.ID)
    require.NoError(t, err)

    assert.Equal(t, "Alice", got.Name)
    assert.Equal(t, "alice@example.com", got.Email)
}

For this test, we only need to confirm that FindByID behaves correctly — whether the data came from the database or cache isn’t important here. The goal is to ensure the service works seamlessly with both persistence and caching layers under test conditions.
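
If you ever do want to assert the cache path explicitly, one option (not included in the repo) is to delete the row after the first call and read again — since the first FindByID already populated Redis, a second successful read can only have come from the cache:

    // Remove the row so only the cache can satisfy the next lookup.
    require.NoError(t, db.DB().Delete(&model.User{}, u.ID).Error)

    cachedUser, err := userService.FindByID(context.Background(), u.ID)
    require.NoError(t, err)
    assert.Equal(t, "Alice", cachedUser.Name)

Appending these lines to the end of TestUserService_FindByID keeps the original assertions intact while pinning down the caching behavior.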

Healthchecks

Service monitoring is critical in production environments. Although gRPC includes a built-in health check protocol, it only exposes a single RPC endpoint. This makes it less flexible if you want to differentiate between basic service liveness and full readiness to serve traffic.
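
For reference, the built-in protocol only takes a couple of lines to register via google.golang.org/grpc/health — a sketch, not something we wire into the boilerplate:

import (
    "google.golang.org/grpc/health"
    healthpb "google.golang.org/grpc/health/grpc_health_v1"
)

// Inside NewServer, after creating srv:
healthServer := health.NewServer()
healthpb.RegisterHealthServer(srv, healthServer)

// Toggle this as dependencies become healthy or unhealthy.
healthServer.SetServingStatus("", healthpb.HealthCheckResponse_SERVING)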

In this project, we’ll implement two simple REST endpoints — /livez and /readyz — for clarity and flexibility.

We’ll start by defining a HealthService in internal/service/health.go. This service will be used by our /readyz endpoint to verify that all dependencies (like the database and cache) are healthy before declaring the service ready.

package service

import (
    "context"

    "github.com/sagarmaheshwary/go-microservice-boilerplate/internal/cache"
    "github.com/sagarmaheshwary/go-microservice-boilerplate/internal/database"
    "gorm.io/gorm"
)

type HealthService interface {
    Check(ctx context.Context) HealthStatus
}

type HealthStatus struct {
    Status  string            `json:"status"`
    Details map[string]string `json:"details,omitempty"`
}

type healthService struct {
    database *gorm.DB
    cache    cache.CacheService
}

type HealthServiceOpts struct {
    Database database.DatabaseService
    Cache    cache.CacheService
}

func NewHealthService(opts *HealthServiceOpts) HealthService {
    return &healthService{
        database: opts.Database.DB(),
        cache:    opts.Cache,
    }
}

func (h *healthService) Check(ctx context.Context) HealthStatus {
    status := HealthStatus{Status: "ready", Details: map[string]string{}}

    if err := h.database.Exec("SELECT 1").Error; err != nil {
        status.Status = "unready"
        status.Details["database"] = err.Error()
    } else {
        status.Details["database"] = "ok"
    }

    if err := h.cache.Ping(ctx); err != nil {
        status.Status = "unready"
        status.Details["cache"] = err.Error()
    } else {
        status.Details["cache"] = "ok"
    }

    return status
}

The Check method inside HealthService validates the health of connected dependencies and returns their status, which we can later expose through our API.

Next, we’ll implement the REST handlers for /livez and /readyz inside internal/transports/http/server/handler/health.go:

package handler

import (
    "encoding/json"
    "net/http"

    "github.com/sagarmaheshwary/go-microservice-boilerplate/internal/logger"
    "github.com/sagarmaheshwary/go-microservice-boilerplate/internal/service"
)

type Opts struct {
    HealthService service.HealthService
    Logger        logger.Logger
}

type HealthHandler struct {
    healthService service.HealthService
    logger        logger.Logger
}

func NewHealthHandler(opts *Opts) *HealthHandler {
    return &HealthHandler{healthService: opts.HealthService, logger: opts.Logger}
}

func (h *HealthHandler) Livez(w http.ResponseWriter, r *http.Request) {
    w.Header().Set("Content-Type", "application/json")
    w.WriteHeader(http.StatusOK)

    writeJSON(w, map[string]string{"status": "ok"}, h.logger)
}

func (h *HealthHandler) Readyz(w http.ResponseWriter, r *http.Request) {
    status := h.healthService.Check(r.Context())
    w.Header().Set("Content-Type", "application/json")

    if status.Status == "ready" {
        w.WriteHeader(http.StatusOK)
    } else {
        w.WriteHeader(http.StatusServiceUnavailable)
    }

    writeJSON(w, status, h.logger)
}

// Common JSON helper (can be moved to its own file if you add more APIs)
func writeJSON(w http.ResponseWriter, data any, log logger.Logger) {
    w.Header().Set("Content-Type", "application/json")
    if err := json.NewEncoder(w).Encode(data); err != nil {
        log.Error("failed to write JSON: " + err.Error())
    }
}

The /livez handler simply returns "ok" with an HTTP 200 — it’s only responsible for confirming that the service process is running.

The /readyz handler, on the other hand, uses the HealthService to confirm that all required dependencies (like the database and Redis) are operational.

Now, let’s wire up our HTTP server in internal/transports/http/server/server.go:

package server

import (
    "net"
    "net/http"

    "github.com/sagarmaheshwary/go-microservice-boilerplate/internal/cache"
    "github.com/sagarmaheshwary/go-microservice-boilerplate/internal/config"
    "github.com/sagarmaheshwary/go-microservice-boilerplate/internal/database"
    "github.com/sagarmaheshwary/go-microservice-boilerplate/internal/logger"
    "github.com/sagarmaheshwary/go-microservice-boilerplate/internal/service"
    "github.com/sagarmaheshwary/go-microservice-boilerplate/internal/transports/http/server/handler"
)

type Opts struct {
    Config   *config.HTTPServer
    Logger   logger.Logger
    Database database.DatabaseService
    Cache    cache.CacheService
}

type HTTPServer struct {
    Config *config.HTTPServer
    Server *http.Server
    Logger logger.Logger
}

func NewServer(opts *Opts) *HTTPServer {
    mux := http.NewServeMux()

    healthService := service.NewHealthService(&service.HealthServiceOpts{
        Database: opts.Database,
        Cache:    opts.Cache,
    })
    healthHandler := handler.NewHealthHandler(&handler.Opts{
        HealthService: healthService,
        Logger:        opts.Logger,
    })

    mux.HandleFunc("/livez", healthHandler.Livez)
    mux.HandleFunc("/readyz", healthHandler.Readyz)

    return &HTTPServer{
        Config: opts.Config,
        Server: &http.Server{
            Addr:    opts.Config.URL,
            Handler: mux,
        },
        Logger: opts.Logger,
    }
}

func (h *HTTPServer) ServeListener(listener net.Listener) error {
    h.Logger.Info("HTTP server started", logger.Field{Key: "address", Value: listener.Addr().String()})
    if err := h.Server.Serve(listener); err != nil && err != http.ErrServerClosed {
        h.Logger.Error("HTTP server failed", logger.Field{Key: "error", Value: err.Error()})
        return err
    }
    return nil
}

func (h *HTTPServer) Serve() error {
    listener, err := net.Listen("tcp", h.Config.URL)
    if err != nil {
        h.Logger.Error("Failed to create HTTP listener",
            logger.Field{Key: "address", Value: h.Config.URL},
            logger.Field{Key: "error", Value: err.Error()},
        )
        return err
    }

    return h.ServeListener(listener)
}

This follows the same structure as our GRPCServer. We define both ServeListener (for custom listeners) and Serve (for normal TCP-based startup from configuration).
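
ServeListener is particularly handy in tests: bind to port 0, let the OS pick a free port, and point your requests at the resulting address. A quick sketch, assuming it runs inside a test that has already built httpServer with NewServer:

// Bind to a random free port chosen by the OS.
lis, err := net.Listen("tcp", "127.0.0.1:0")
require.NoError(t, err)

go func() {
    _ = httpServer.ServeListener(lis)
}()

// lis.Addr() holds the actual address, e.g. 127.0.0.1:53124.
resp, err := http.Get("http://" + lis.Addr().String() + "/livez")
require.NoError(t, err)
defer resp.Body.Close()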

Next, update your configuration to fetch the HTTP server’s host and port from environment variables:

type Config struct {
    GRPCServer *GRPCServer `validate:"required"`
    HTTPServer *HTTPServer `validate:"required"`
    Database   *Database   `validate:"required"`
    Redis      *Redis      `validate:"required"`
}

// Define the struct
type HTTPServer struct {
    URL string `validate:"required,hostname_port"`
}

// Load from env
func NewConfigWithOptions(opts LoaderOptions) (*Config, error) {
    //...

    cfg := &Config{
        HTTPServer: &HTTPServer{
            URL: getEnv("HTTP_SERVER_URL", ":4000"),
        },
        //...
    }

    //...
}

Finally, let’s initialize and start the HTTP server in main.go:

package main

func main() {
    ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt)
    defer stop()

    log := logger.NewZerologLogger("info", os.Stderr)

    cfg, err := config.NewConfig(log)
    if err != nil {
        log.Fatal(err.Error())
    }

    //...

    httpServer := httpserver.NewServer(&httpserver.Opts{
        Config:   cfg.HTTPServer,
        Logger:   log,
        Database: db,
        Cache:    redisCache,
    })
    go func() {
        if err := httpServer.Serve(); err != nil && !errors.Is(err, http.ErrServerClosed) {
            stop()
        }
    }()

    <-ctx.Done()

    // ...shutdown grpc, db, redis etc

    // Shut down the health server last so it can continue responding to liveness checks
    // (e.g., /livez) while marking the service as not ready (/readyz) during shutdown.
    // Note: ctx is already cancelled at this point, so use a fresh timeout context.
    shutdownCtx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
    defer cancel()
    if err := httpServer.Server.Shutdown(shutdownCtx); err != nil {
        log.Error("failed to close http server", logger.Field{Key: "error", Value: err.Error()})
    }
}

Once everything is wired up, we can test the endpoints locally with curl:

Livez:

curl localhost:4000/livez

Expected response:

{ "status": "ok" }

Readyz:

curl localhost:4000/readyz

Expected response:

{ "status": "ready", "details": { "cache": "ok", "database": "ok" } }

With these two endpoints, you now have both basic and dependency-aware healthchecks that can be used by Kubernetes probes, load balancers, or monitoring tools.

Observability (Metrics and Tracing)

In Part 1, we added structured logging to capture what’s happening inside our services. Logs are great for understanding what happened in a specific instance — but in a distributed system, we also need to understand how things behave and how requests flow across multiple services.

That’s where metrics and tracing come in.

  • Metrics capture quantitative data about our services — like request counts, error rates, and latency. They’re lightweight, easy to visualize, and ideal for alerting and performance dashboards.
  • Tracing gives us a request-level view of our system — showing how a request moves between microservices and how much time is spent in each hop.

Together, logging, metrics, and tracing give us full observability into our system’s behavior. In this part, we’ll implement metrics and tracing, then spin up Prometheus, Grafana, and Jaeger to see them in action.

Prometheus Metrics

Prometheus is a popular open-source monitoring system that scrapes metrics from your services, stores them as time-series data, and makes them queryable through PromQL. We'll use the official prometheus/client_golang library to instrument our service and expose metrics for scraping.

We’ll begin by creating a MetricsService inside internal/observability/metrics/metrics.go. This service will handle Prometheus setup, expose default metrics, and provide a simple way to register custom collectors.

package metrics

import (
    "net/http"

    "github.com/prometheus/client_golang/prometheus"
    promcollectors "github.com/prometheus/client_golang/prometheus/collectors"
    "github.com/prometheus/client_golang/prometheus/promhttp"
    "github.com/sagarmaheshwary/go-microservice-boilerplate/internal/config"
)

type MetricsService interface {
    Register(collectors ...MetricCollector)
    RegisterDefault()
    Handler() http.Handler
}

type metricsService struct {
    Config   *config.Metrics
    Registry *prometheus.Registry
}

func NewMetricsService(cfg *config.Metrics, collectors ...MetricCollector) MetricsService {
    registry := prometheus.NewRegistry()

    m := &metricsService{
        Registry: registry,
        Config:   cfg,
    }

    if cfg.EnableDefaultMetrics {
        m.RegisterDefault()
    }
    m.Register(collectors...)

    return m
}

func (m *metricsService) Register(collectors ...MetricCollector) {
    for _, c := range collectors {
        c.Register(m.Registry)
    }
}

func (m *metricsService) RegisterDefault() {
    m.Registry.MustRegister(
        promcollectors.NewGoCollector(),
        promcollectors.NewProcessCollector(promcollectors.ProcessCollectorOpts{}),
    )
}

func (m *metricsService) Handler() http.Handler {
    return promhttp.HandlerFor(
        m.Registry,
        promhttp.HandlerOpts{},
    )
}

To keep metrics modular, we define a small MetricCollector interface. Each component — like gRPC, HTTP, or Redis — can implement this interface to expose its own metrics while keeping code organized.

type MetricCollector interface {
    Register(r *prometheus.Registry)
}

Next, we add gRPC-specific metrics inside a new metrics/grpc.go file. This includes a counter for request counts and a histogram for request latency — useful for tracking performance trends across gRPC calls.

package metrics

import "github.com/prometheus/client_golang/prometheus"

var (
    GRPCRequestCounter = prometheus.NewCounterVec(
        prometheus.CounterOpts{
            Name: "grpc_requests_total",
            Help: "Total number of gRPC requests received, labeled by method and status.",
        },
        []string{"method", "status"},
    )

    GRPCRequestLatency = prometheus.NewHistogramVec(
        prometheus.HistogramOpts{
            Name:    "grpc_request_duration_seconds",
            Help:    "Histogram of gRPC request latencies (seconds).",
            Buckets: prometheus.DefBuckets,
        },
        []string{"method"},
    )
)

type GRPCMetrics struct{}

func (GRPCMetrics) Register(r *prometheus.Registry) {
    r.MustRegister(GRPCRequestCounter, GRPCRequestLatency)
}

We then create a gRPC interceptor that automatically records these metrics for every incoming request. This way, we don’t need to manually instrument every handler — metrics are collected transparently as part of the request lifecycle.

package interceptor

import (
    "context"
    "time"

    "github.com/sagarmaheshwary/go-microservice-boilerplate/internal/observability/metrics"
    "google.golang.org/grpc"
    "google.golang.org/grpc/status"
)

func MetricsInterceptor() grpc.UnaryServerInterceptor {
    return func(
        ctx context.Context,
        req interface{},
        info *grpc.UnaryServerInfo,
        handler grpc.UnaryHandler,
    ) (interface{}, error) {
        start := time.Now()
        res, err := handler(ctx, req)

        method := info.FullMethod
        statusCode := status.Code(err).String()
        metrics.GRPCRequestCounter.WithLabelValues(method, statusCode).Inc()
        metrics.GRPCRequestLatency.WithLabelValues(method).Observe(time.Since(start).Seconds())

        return res, err
    }
}

We then attach the interceptor to the gRPC server via grpc.ChainUnaryInterceptor:

    srv := grpc.NewServer(
        grpc.ChainUnaryInterceptor(
            interceptor.LoggerInterceptor(opts.Logger),
            interceptor.MetricsInterceptor(),
        ),
    )

Once metrics are collected, we need a way for Prometheus to access them. We expose a /metrics endpoint in our HTTP server, which Prometheus will scrape periodically.

package server

type Opts struct {
    Config   *config.HTTPServer
    Logger   logger.Logger
    Database database.DatabaseService
    Cache    cache.CacheService
    Metrics  metrics.MetricsService
}

func NewServer(opts *Opts) *HTTPServer {
    mux := http.NewServeMux()

    healthService := service.NewHealthService(&service.HealthServiceOpts{
        Database: opts.Database,
        Cache:    opts.Cache,
    })
    healthHandler := handler.NewHealthHandler(&handler.Opts{
        HealthService: healthService,
        Logger:        opts.Logger,
    })

    mux.HandleFunc("/livez", healthHandler.Livez)
    mux.HandleFunc("/readyz", healthHandler.Readyz)
    mux.Handle("/metrics", opts.Metrics.Handler())

    return &HTTPServer{
        Config: opts.Config,
        Server: &http.Server{
            Addr:    opts.Config.URL,
            Handler: mux,
        },
        Logger: opts.Logger,
    }
}

Now let's register the metrics service with the HTTP server in main.go:

package main

import (...)

func main() {
    //...

    metricsService := metrics.NewMetricsService(cfg.Metrics, metrics.GRPCMetrics{})

    httpServer := httpserver.NewServer(&httpserver.Opts{
        Config:   cfg.HTTPServer,
        Logger:   log,
        Database: db,
        Cache:    redisCache,
        Metrics:  metricsService,
    })
    go func() {
        if err := httpServer.Serve(); err != nil && !errors.Is(err, http.ErrServerClosed) {
            stop()
        }
    }()
}

Finally, we update our configuration to include a METRICS_ENABLE_DEFAULT_METRICS option. This flag lets us control whether we also expose Go’s built-in runtime metrics like garbage collection and goroutine counts — useful in production for monitoring resource usage.

type Metrics struct {
    EnableDefaultMetrics bool
}

func NewConfigWithOptions(opts LoaderOptions) (*Config, error) {
    cfg := &Config{
        Metrics: &Metrics{
            EnableDefaultMetrics: getEnvBool("METRICS_ENABLE_DEFAULT_METRICS", false),
        },
        //...
    }
    //...
}

// New helper for parsing boolean values
func getEnvBool(key string, defaultVal bool) bool {
    if val, err := strconv.ParseBool(os.Getenv(key)); err == nil {
        return val
    }

    return defaultVal
}

With this in place, our services now export rich, structured metrics that can be scraped by Prometheus and visualized in Grafana. These metrics will serve as the backbone for our dashboards and alerts once we deploy to production.
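
The MetricCollector interface also makes it cheap to add your own metrics later. As an illustration, here's a hypothetical CacheMetrics collector (not part of the repo) that counts cache hits and misses; the service layer could increment it and main.go would register it alongside GRPCMetrics:

package metrics

import "github.com/prometheus/client_golang/prometheus"

var CacheLookupCounter = prometheus.NewCounterVec(
    prometheus.CounterOpts{
        Name: "cache_lookups_total",
        Help: "Total number of cache lookups, labeled by result (hit or miss).",
    },
    []string{"result"},
)

type CacheMetrics struct{}

func (CacheMetrics) Register(r *prometheus.Registry) {
    r.MustRegister(CacheLookupCounter)
}

Registering it is one extra argument — metrics.NewMetricsService(cfg.Metrics, metrics.GRPCMetrics{}, metrics.CacheMetrics{}) — and the caching code would call metrics.CacheLookupCounter.WithLabelValues("hit").Inc() (or "miss") on the appropriate branch.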

OpenTelemetry Tracing

Tracing is the third pillar of observability, alongside metrics and logs. While metrics tell you how your system behaves overall, and logs describe what happened at a specific moment, tracing shows you how a request flows across services. It’s especially valuable in microservice architectures where a single client request might touch multiple services, databases, or queues before completing.

We’ll be using OpenTelemetry — a popular open-source standard for collecting distributed traces — and connect it with Jaeger for visualization. You can also integrate it later with other backends like Zipkin, Tempo, or AWS X-Ray if needed.

Let’s start by defining our TracerService in internal/observability/tracing/tracing.go:

package tracing

import (
    "context"
    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/propagation"
    "go.opentelemetry.io/otel/sdk/resource"
    "go.opentelemetry.io/otel/exporters/otlp/otlptrace"
    "go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp"
    "github.com/sagarmaheshwary/go-microservice-boilerplate/internal/config"
    "github.com/sagarmaheshwary/go-microservice-boilerplate/internal/logger"
    sdktrace "go.opentelemetry.io/otel/sdk/trace"
    semconv "go.opentelemetry.io/otel/semconv/v1.24.0"
)

type Opts struct {
    Config *config.Tracing
    Logger logger.Logger
}

type TracerService struct {
    Config *config.Tracing
    Logger logger.Logger
    tp     *sdktrace.TracerProvider
}

func NewTracerService(ctx context.Context, opts *Opts) (*TracerService, error) {
    cfg := opts.Config

    exporter, err := otlptrace.New(
        ctx,
        otlptracehttp.NewClient(
            otlptracehttp.WithEndpoint(cfg.CollectorURL),
            otlptracehttp.WithInsecure(),
        ),
    )
    if err != nil {
        return nil, err
    }

    res, err := resource.New(ctx,
        resource.WithAttributes(semconv.ServiceName(cfg.ServiceName)),
    )
    if err != nil {
        return nil, err
    }

    tp := sdktrace.NewTracerProvider(
        sdktrace.WithBatcher(exporter),
        sdktrace.WithResource(res),
    )

    otel.SetTracerProvider(tp)
    otel.SetTextMapPropagator(propagation.TraceContext{})

    opts.Logger.Info("Tracing initialized (exporter: otlptracehttp)",
        logger.Field{Key: "serviceName", Value: cfg.ServiceName},
    )

    return &TracerService{
        Config: cfg,
        Logger: opts.Logger,
        tp:     tp,
    }, nil
}

func (t *TracerService) Shutdown(ctx context.Context) error {
    return t.tp.Shutdown(ctx)
}

Next, we’ll update our configuration in config.go to include tracing settings such as the service name and the collector endpoint:

type Tracing struct {
    ServiceName  string `validate:"required"`
    CollectorURL string `validate:"required,hostname_port"`
}

func NewConfigWithOptions(opts LoaderOptions) (*Config, error) {
    cfg := &Config{
        Tracing: &Tracing{
            ServiceName:  getEnv("TRACING_SERVICE_NAME", "go-microservice-boilerplate"),
            CollectorURL: getEnv("TRACING_COLLECTOR_URL", "localhost:4318"),
        },
        //...
    }
    //...
}

Now that the config is ready, we can initialize the tracer in main.go so that every part of our application has access to it:

func main() {
    ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt)
    defer stop()

    //...

    tracerService, err := tracing.NewTracerService(ctx, &tracing.Opts{
        Config: cfg.Tracing,
        Logger: log,
    })
    if err != nil {
        log.Fatal(err.Error())
    }

    //...

    <-ctx.Done()

    // Gracefully shut down the tracer provider so buffered spans are flushed.
    // ctx is already cancelled here, so give the exporter a fresh timeout context.
    flushCtx, flushCancel := context.WithTimeout(context.Background(), 10*time.Second)
    defer flushCancel()
    if err := tracerService.Shutdown(flushCtx); err != nil {
        log.Error("failed to close tracing client", logger.Field{Key: "error", Value: err.Error()})
    }
}

To propagate trace context across service boundaries, we add a gRPC StatsHandler. This ensures that when a request comes into our service, it either starts a new trace or continues an existing one passed from another service:

srv := grpc.NewServer(
    grpc.ChainUnaryInterceptor(
        interceptor.LoggerInterceptor(opts.Logger),
        interceptor.MetricsInterceptor(),
    ),
    grpc.StatsHandler(otelgrpc.NewServerHandler(
        otelgrpc.WithTracerProvider(otel.GetTracerProvider()),
        otelgrpc.WithPropagators(otel.GetTextMapPropagator()),
    )),
)

From now on, every gRPC call will automatically create a trace that’s sent to Jaeger.

Jaeger UI: http://localhost:16686/search

Jaeger Traces

Each trace is composed of multiple spans — where a trace represents the entire journey of a request (for example, a user calling your API), and spans represent individual operations within that request (like “fetch user from DB” or “send email”).

To demonstrate this, let’s add spans to our SayHello handler and the UserService. Each span will capture the work done by that specific function, and together they’ll form a full picture of a single request’s flow through the system.

func (g *GreeterServer) SayHello(ctx context.Context, in *helloworld.SayHelloRequest) (*helloworld.SayHelloResponse, error) {
    tr := otel.Tracer("hello_world.Greeter")
    ctx, span := tr.Start(ctx, "helloworld.SayHello")
    defer span.End()

    //...
}
func (s *userService) FindByID(ctx context.Context, id uint) (*model.User, error) {
    tr := otel.Tracer("UserService")
    ctx, span := tr.Start(ctx, "FindByID")
    span.SetAttributes(attribute.String("UserId", strconv.Itoa(int(id))))
    defer span.End()

    //...
}

When you look at the trace in Jaeger, you’ll see a clear chain of execution — the main request trace leading into the SayHello span, which then calls into the UserService span.

Jaeger Trace Detail

The SetAttributes method lets you attach useful context (like user IDs, order IDs, or error states) that can make debugging much easier when viewing a trace.

Context propagation here is critical — notice that we always pass ctx forward. This context carries the traceID created by our gRPC StatsHandler, ensuring that every downstream function and service is linked to the same request chain.
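
Spans can also carry error states. For instance, if the database lookup inside FindByID fails, you can record the error on the active span and mark it as failed so it stands out in Jaeger — a sketch using the go.opentelemetry.io/otel/codes package:

import otelcodes "go.opentelemetry.io/otel/codes"

if err := s.database.First(u, id).Error; err != nil {
    span.RecordError(err)                               // attach the error as a span event
    span.SetStatus(otelcodes.Error, "db lookup failed") // flag the span as failed
    return nil, err
}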

End to End Observability in Action

The repo comes with a ready-to-use observability setup using Docker Compose, available in the examples branch. If you want to follow along, clone the repo and switch to that branch.

We’ll first bring up our core application stack — which includes the service itself, PostgreSQL, and Redis:

docker compose up

Once the application is running, we can start the observability stack. This will spin up Grafana, Prometheus, and Jaeger — all preconfigured to work with our service.

docker compose -f docker-compose.observability.yml up

Jaeger runs via the jaeger/all-in-one image using its default in-memory storage to keep things lightweight and simple. Prometheus uses a basic scrape configuration that pulls metrics from our service’s /metrics endpoint every eight seconds:

global:
  scrape_interval: 8s

scrape_configs:
  - job_name: go-microservice-boilerplate
    static_configs:
      - targets:
          - "go-microservice-boilerplate:4000"

Grafana comes with a sample dashboard already wired up to Prometheus, showcasing the gRPC metrics we defined earlier. Open a browser and navigate to http://localhost:3000 (default credentials are admin / admin).

Sample Grafana Dashboard for Prometheus Metrics

The dashboard has four panels visualizing data from the grpc_requests_total and grpc_request_duration_seconds metrics:

  • Average gRPC Request Duration — time series
  • gRPC Request Latency (95th Percentile) — time series
  • gRPC Requests per Method — time series
  • Total gRPC Requests — counter/gauge

This sample dashboard offers a quick glimpse into request throughput and latency trends — perfect for demonstration and local testing. In a real production environment, you’d typically design a more comprehensive dashboard tailored to your service’s specific KPIs, error rates, and resource utilization patterns.

Now that we’ve seen metrics in action, let’s turn to tracing. We already explored what a trace looks like for a single service, but the real power of tracing emerges when requests flow through multiple microservices. Distributed traces help you visualize the entire request path, pinpoint slow hops, and quickly identify where failures occur.

To demonstrate distributed tracing, let’s spin up a second instance of our service that will act as another microservice in the request chain. Clone the same repository inside your project directory, switch to the examples branch, and create an environment file:

git clone https://github.com/SagarMaheshwary/go-microservice-boilerplate.git go-microservice-boilerplate2
cd go-microservice-boilerplate2
git checkout examples
cp .env.example .env

Now let’s add this new service to our main docker-compose.yml file:

services:
  app2:
    build:
      context: ./go-microservice-boilerplate2
      target: development
    container_name: go-microservice-boilerplate2
    ports:
      - 4001:4000
      - 5001:5000
    volumes:
      - ./go-microservice-boilerplate2:/app
    networks:
      - microservices_net
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy

Next, open the SayHello() handler of our main service and replace it with the following code.
This is just for demonstration — in a real-world setup, you would create a dedicated, long-lived client inside transports/grpc/client and reuse it across requests instead of dialing a new connection every time (a sketch of such a client follows the snippet):

func (g *GreeterServer) SayHello(ctx context.Context, in *helloworld.SayHelloRequest) (*helloworld.SayHelloResponse, error) {
    tr := otel.Tracer("hello_world.Greeter")
    ctx, span := tr.Start(ctx, "helloworld.SayHello")
    defer span.End()

    conn, err := grpc.NewClient(
        "go-microservice-boilerplate2:5000",
        grpc.WithTransportCredentials(insecure.NewCredentials()),
        grpc.WithStatsHandler(otelgrpc.NewClientHandler()),
    )
    if err != nil {
        return nil, status.Error(codes.Internal, err.Error())
    }
    defer conn.Close()

    greeter := helloworld.NewGreeterClient(conn)
    res, err := greeter.SayHello(ctx, &helloworld.SayHelloRequest{UserId: in.UserId})
    if err != nil {
        return nil, status.Error(codes.Internal, err.Error())
    }

    return res, nil
}
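
For completeness, here's a hedged sketch of what that dedicated client might look like — the package name greeterclient and the generated-proto import path are assumptions, so adjust them to your repo. The connection is dialed once at startup and reused for every request:

package greeterclient

import (
    "go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc"
    "google.golang.org/grpc"
    "google.golang.org/grpc/credentials/insecure"

    // Import path assumed; point this at your generated helloworld package.
    "github.com/sagarmaheshwary/go-microservice-boilerplate/proto/hello_world/helloworld"
)

// New dials the downstream greeter service once; the returned client and
// connection are meant to live for the lifetime of the process.
func New(addr string) (helloworld.GreeterClient, *grpc.ClientConn, error) {
    conn, err := grpc.NewClient(
        addr,
        grpc.WithTransportCredentials(insecure.NewCredentials()),
        grpc.WithStatsHandler(otelgrpc.NewClientHandler()), // propagates trace context
    )
    if err != nil {
        return nil, nil, err
    }
    return helloworld.NewGreeterClient(conn), conn, nil
}

main.go (or the gRPC transport setup) would call greeterclient.New once, inject the returned client into the handler, and close the connection during shutdown.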

Note: In the internal/tracing/tracing.go file, make sure to include otel.SetTextMapPropagator(propagation.TraceContext{}). This line is missing from the part-3 source, and without it, traces between multiple services won’t connect properly. You can use either the master or examples branch, as both already include this fix.

Now, let’s trigger a request from the first service, which will internally call the second service. We’ll use grpcurl for this:

grpcurl -d '{"user_id": 1}' -proto ./proto/hello_world/hello_world.proto -plaintext localhost:5000 hello_world.Greeter/SayHello

Open Jaeger in your browser, and you should now see a single trace spanning across two services — one initiating the request, and the other handling it downstream:

Jaeger Distributed Traces

Clicking into the trace reveals the chain of spans showing each step of the process:

Jaeger Trace Detail Distributed

This connected trace gives a clear view of how the request travels through the system — from the first service’s SayHello() call, to the gRPC client invocation, and finally to the second service’s SayHello() and its database lookup via UserService.

And that wraps up our Observability section.
We’ve now covered all three pillars — logging, metrics, and tracing — giving you complete visibility into your services. With this setup, you can detect issues early, measure system health, and trace requests across microservices, building a solid foundation for a production-grade monitoring stack.

Wrapping Up the Series

In this final part, we extended our microservice with Redis integration, health checks, and full observability using Prometheus metrics and OpenTelemetry tracing. We visualized key performance metrics in Grafana and explored distributed tracing with Jaeger — connecting multiple services to see how requests flow end-to-end.

With this, our Designing Production-Ready Microservices in Go series comes to a close. You now have a solid foundation that combines clean project structure, containerization, configuration management, database integration, service-to-service communication, and observability — everything you need to kickstart a production-grade Go microservice project.

The repository is meant to be a reusable boilerplate for spinning up new Go microservices — giving you a clean, production-ready starting point for any project.

The master branch contains a clean, production-ready setup, while the examples branch includes all the working examples and demos featured throughout the series.

You can check out the complete code for Part Three here.

Thanks for reading! I hope you found this series helpful. If you have any questions, feedback, or suggestions, feel free to drop a comment — I’d love to hear your thoughts.
