Rost

Posted on • Originally published at glukhov.org

Structured Logging in Go with slog for Observability and Alerting

Logs are a debugging interface you can still use when the system is on fire.
The problem is that plain text logs age poorly: as soon as you need filtering,
aggregation, and alerting, you start parsing sentences.

Structured logging is the antidote.
It turns every log line into a small event with stable fields, so tools can search and aggregate reliably.
For how logs connect to metrics, dashboards, and alerting in the wider stack, see the Observability: Monitoring, Metrics, Prometheus & Grafana Guide.

What structured logging is and why it scales

Structured logging is logging where a record is not just a string, but a
message plus typed key-value attributes. The idea is boring in the best way:
once logs are machine-readable, an incident stops being a grep contest.

A quick comparison:

Plain text (human-first, tool-hostile)

failed to charge card user=42 amount=19.99 ms=842 err=timeout

Structured (tool-first, still readable)

{"msg":"failed to charge card","user_id":42,"amount":19.99,"duration_ms":842,"error":"timeout"}

In production, it helps to think of logs as an event stream emitted by the
process, while routing and storage live outside the application. That mental
model pushes you toward writing one event per line and keeping events easy to
ship and re-process.

Slog in Go as a shared logging front end

Go has had the classic log package since forever, but modern services need
levels and fields. The log/slog package (Go 1.21 and later) brings structured
logging into the standard library and formalises a common shape for log
records: time, level, message, and attributes.
For a compact language and command refresher alongside this guide, see the Go Cheatsheet.

The key parts of the model are:

Record

A record is what happened. In slog terms, it contains time, level, message,
and a set of attributes. You create records via methods like Info and Error,
or via Log when you want to supply the level explicitly.

Attributes

Attributes are the key-value pairs that make logs queryable. If you log the
same concept under three different keys (user, userId, uid), you get three
different datasets. Consistent keys are where the real value hides.

Handler

A handler is how records become bytes. The built-in TextHandler writes
key=value output, while JSONHandler writes line-delimited JSON. Handlers are
also where redaction, key renaming, and output routing tend to happen.

One underrated feature is that slog can sit in front of existing code. When
you set a default slog logger, top-level slog functions use it, and the classic
log package can be redirected to it too. That makes incremental migration
possible.

Groups

Groups solve the "every subsystem uses id" problem. You can group a set of
attributes for a request (request.method, request.path) or namespace an entire
subsystem with WithGroup so keys do not collide.

A production-shaped slog setup

The following setup hits the usual goals.
The examples use a small logx package; for where packages like that usually live in a real module, see Go Project Structure: Practices & Patterns.

  • one JSON event per line
  • logs written to stdout for collection
  • stable service metadata attached once
  • context-aware logging for request and trace IDs
  • central redaction for sensitive keys

package logx

import (
    "log/slog"
    "os"
)

var level slog.LevelVar // defaults to INFO

func New() *slog.Logger {
    opts := &slog.HandlerOptions{
        Level:     &level, // can be changed at runtime
        AddSource: true,   // include file and line when available
        ReplaceAttr: func(groups []string, a slog.Attr) slog.Attr {
            // Centralised redaction: consistent and hard to bypass by accident.
            switch a.Key {
            case "password", "token", "authorization", "api_key":
                return slog.String(a.Key, "[redacted]")
            }
            return a
        },
    }

    h := slog.NewJSONHandler(os.Stdout, opts)

    return slog.New(h).With(
        "service", os.Getenv("SERVICE_NAME"),
        "env", os.Getenv("ENV"),
        "version", os.Getenv("VERSION"),
    )
}

func SetLevel(l slog.Level) { level.Set(l) }

A tiny detail with large consequences: the built-in JSON handler uses standard
keys (time, level, msg, source). When your log backend expects a different
schema, ReplaceAttr is the pressure-release valve that lets you normalise keys
without rewriting call sites.

Schema matters more than the logger

Most "structured logging" failures are schema failures.

Essential fields that keep paying rent

Every log backend will store a timestamp, level, and message. In practice, a
useful application schema often adds a small set of stable fields:

  • service, env, version
  • component (or subsystem)
  • event (a stable name for the thing that happened)
  • request_id (when a request exists)
  • trace_id and span_id (when tracing exists)
  • error (string) and error_kind (stable bucket)

Notice the pattern: these fields answer operational questions, not developer
curiosity.

Semantic conventions are a cheap consistency hack

If you already use OpenTelemetry, its semantic conventions provide a standard
vocabulary for attributes across telemetry signals. Even if you do not export
logs via OpenTelemetry, borrowing attribute names reduces the "what did we call
this field in service B" tax.

High cardinality and why logs get expensive

High cardinality means "too many unique values". It is fine inside a JSON
payload, but it becomes painful when a backend treats some fields as indexed
labels or stream keys. User IDs, IP addresses, random request tokens, and full
URLs tend to explode combinations.

The practical outcome is simple: keep labels and index keys boring (service,
environment, region), and keep high-cardinality fields inside the structured
payload for filtering at query time.

Correlation with request IDs and traces

Correlation is the point where logs stop being just text and start behaving
like telemetry.

Request ID as the lowest-friction correlation key

A request ID is the simplest bridge between an incoming request and everything
that happens because of it. It tends to work even without distributed tracing,
and it is still useful when traces are sampled.

A common pattern is to attach a per-request logger to the context:

package logx

import (
    "context"
    "log/slog"
)

type ctxKey struct{}

func WithLogger(ctx context.Context, l *slog.Logger) context.Context {
    return context.WithValue(ctx, ctxKey{}, l)
}

func FromContext(ctx context.Context) *slog.Logger {
    if l, ok := ctx.Value(ctxKey{}).(*slog.Logger); ok && l != nil {
        return l
    }
    return slog.Default()
}

Trace correlation with W3C Trace Context and OpenTelemetry

W3C Trace Context defines a standard way to propagate trace identity (for HTTP,
via traceparent and tracestate). OpenTelemetry builds on that so trace IDs and
span IDs can be extracted from context.

This middleware example logs both request_id and trace identifiers when
available:

package middleware

import (
    "crypto/rand"
    "encoding/hex"
    "log/slog"
    "net/http"

    "go.opentelemetry.io/otel/trace"

    "example.com/project/logx"
)

func requestID() string {
    var b [16]byte
    _, _ = rand.Read(b[:])
    return hex.EncodeToString(b[:])
}

func WithRequestLogger(base *slog.Logger) func(http.Handler) http.Handler {
    return func(next http.Handler) http.Handler {
        return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            rid := r.Header.Get("X-Request-Id")
            if rid == "" {
                rid = requestID()
            }

            l := base.With(
                "request_id", rid,
                "method", r.Method,
                "path", r.URL.Path,
            )

            if sc := trace.SpanContextFromContext(r.Context()); sc.IsValid() {
                l = l.With(
                    "trace_id", sc.TraceID().String(),
                    "span_id", sc.SpanID().String(),
                )
            }

            ctx := logx.WithLogger(r.Context(), l)
            next.ServeHTTP(w, r.WithContext(ctx))
        })
    }
}

Once correlation fields exist, the log line becomes an index into other data.
The difference in a live incident is not subtle.

Turning structured logs into monitoring and alerting signals

Logs are great at answering "what happened". Alerting is usually about "how
often and how bad".

A practical approach is to treat certain log events as counters:

  • event=payment_failed
  • event=db_timeout
  • event=cache_miss

Many platforms can derive log-based metrics by counting matching records over a
window. Structured logs make that count resilient, because it is based on a
field value rather than a brittle text match.
When you are ready to visualise and explore those signals, Install and Use Grafana on Ubuntu: Complete Guide walks through a full Grafana setup you can point at common log and metrics backends.

This is also where log levels start to matter. Debug logs are often valuable,
but they are also where cost and noise hide. Using a dynamic level (LevelVar)
lets the system stay quiet by default, while still allowing targeted detail
when needed.

Closing thoughts

Structured logging in Go is no longer a library debate. The interesting part is
whether your log records are consistent, correlatable, and affordable to store.

When your logs carry stable fields like event, request_id, and trace_id, they
stop being "strings someone wrote" and start being a dataset you can operate.

Notes

The Go team introduced log/slog in Go 1.21 and emphasised that structured logs
use key-value pairs so they can be parsed, filtered, searched, and analysed
reliably, and also noted the motivation of providing a common framework shared
across the ecosystem.

The log/slog package documentation defines the record model (time, level,
message, key-value pairs) and the built-in handlers (TextHandler for key=value
and JSONHandler for line-delimited JSON), and documents SetDefault integration
with the classic log package.

For distributed correlation, the W3C Trace Context specification standardises
traceparent and tracestate propagation, and OpenTelemetry specifies that its
SpanContext conforms to W3C Trace Context and exposes TraceId and SpanId,
making log-trace correlation straightforward when a span is present.

For log storage cost and performance, Grafana Loki documentation strongly
recommends bounded, static labels and warns about high cardinality labels
creating too many streams and a huge index, which is directly relevant when
deciding what becomes a label vs what stays as an unindexed JSON field.
