Structured Logging in Python With Structlog: Correlating Logs, Traces, and Errors in Production

Most Python applications log like this:

logger.info(f"Processing order {order_id} for user {user_id}")
logger.error(f"Payment failed for order {order_id}: {str(e)}")

It works fine in development. In production, it falls apart.

When you're debugging a latency spike at 2am, grepping through gigabytes of unstructured log strings is painful. When you want to correlate a log line with the trace that generated it, you can't — the trace ID isn't in the log. When you want to count how many payment failures happened for a specific product in the last hour, you can't — the data is buried in a string.

Structured logging fixes all of this. Instead of log strings, you emit log events — dictionaries of key-value pairs that your log aggregator can index, query, and alert on.

In this article, we'll build a production-grade structured logging setup using Structlog, connect it to OpenTelemetry traces, and establish the patterns that make logs actually useful when something goes wrong.


Why Structlog Over Python's Built-in Logging

Python's standard logging module can emit JSON, but it requires significant configuration and produces output that's awkward to work with. Structlog takes a different approach — it makes structured, context-rich logging the default rather than something you have to force out of the standard library.
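For a sense of the ceremony involved, here is roughly the minimum the standard library needs just to emit JSON, before any per-request context handling at all (the JsonFormatter class is an illustrative sketch, not a library API):

import json
import logging


class JsonFormatter(logging.Formatter):
    """A bare-bones JSON formatter — already more ceremony than Structlog needs."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "timestamp": self.formatTime(record),
        })


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logging.basicConfig(handlers=[handler], level=logging.INFO)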

The core difference is how context is handled. With standard logging, you manually include context in every log call:

# Standard logging — context repeated everywhere
logger.info(f"Order created: order_id={order_id}, user_id={user_id}, product_id={product_id}")
logger.info(f"Payment processed: order_id={order_id}, user_id={user_id}, amount={amount}")
logger.info(f"Email sent: order_id={order_id}, user_id={user_id}, template=confirmation")

With Structlog, you bind context once and it automatically appears on every subsequent log event in that request:

# Structlog — bind once, appears everywhere
log = structlog.get_logger().bind(order_id=order_id, user_id=user_id)
log.info("order_created")
log.info("payment_processed", amount=amount)
log.info("email_sent", template="confirmation")

This isn't just cleaner — it means you can never accidentally forget to include the order ID on a critical log line.


Installing Dependencies

pip install structlog
pip install fastapi  # Used by the example app below (includes Starlette)
pip install opentelemetry-sdk
pip install opentelemetry-api

Project Structure

structured-logging/
├── logging_config.py   # Structlog setup
├── middleware.py       # Request context injection
├── main.py             # FastAPI application
└── services.py         # Business logic with logging

Step 1: Configuring Structlog

The configuration is where most of the important decisions happen. We want different behavior in development (readable, colored output) and production (JSON, machine-parseable).

logging_config.py

import logging
import sys
import structlog
from typing import Any


def add_trace_context(
    logger: Any, method: str, event_dict: dict
) -> dict:
    """
    Inject the current OpenTelemetry trace and span ID into every log event.
    This is the bridge between logs and traces.
    """
    from opentelemetry import trace

    current_span = trace.get_current_span()
    span_context = current_span.get_span_context()

    if span_context.is_valid:
        # Format as hex strings matching the W3C traceparent format
        event_dict["trace_id"] = format(span_context.trace_id, "032x")
        event_dict["span_id"] = format(span_context.span_id, "016x")
        event_dict["trace_sampled"] = span_context.trace_flags.sampled

    return event_dict


def configure_logging(environment: str = "production") -> None:
    """
    Configure Structlog for the given environment.
    Call this once at application startup.
    """

    # Shared processors that run in both development and production
    shared_processors = [
        structlog.contextvars.merge_contextvars,       # Merge request-scoped context
        structlog.processors.add_log_level,            # Add level field
        structlog.processors.TimeStamper(fmt="iso"),   # ISO 8601 timestamps
        add_trace_context,                             # Inject trace/span IDs
        structlog.processors.StackInfoRenderer(),      # Include stack info if present
    ]
    # Exception formatting is left to the renderers: ConsoleRenderer
    # pretty-prints exc_info in development, and dict_tracebacks turns it
    # into structured data in production. Running format_exc_info here
    # would consume exc_info before either of them could handle it.

    if environment == "development":
        # Human-readable output for local development
        structlog.configure(
            processors=shared_processors + [
                structlog.dev.ConsoleRenderer(colors=True),
            ],
            wrapper_class=structlog.make_filtering_bound_logger(logging.DEBUG),
            context_class=dict,
            logger_factory=structlog.PrintLoggerFactory(),
            cache_logger_on_first_use=True,
        )
    else:
        # JSON output for production — parseable by log aggregators
        structlog.configure(
            processors=shared_processors + [
                structlog.processors.dict_tracebacks,      # Structured tracebacks
                structlog.processors.JSONRenderer(),        # JSON output
            ],
            wrapper_class=structlog.make_filtering_bound_logger(logging.INFO),
            context_class=dict,
            logger_factory=structlog.PrintLoggerFactory(),
            cache_logger_on_first_use=True,
        )

    # Send standard library logging to stdout as well, so output from
    # third-party libraries (SQLAlchemy, httpx, etc.) isn't lost. Note that
    # this prints their messages as-is; to render stdlib records through the
    # Structlog processor chain, use structlog.stdlib.ProcessorFormatter.
    logging.basicConfig(
        format="%(message)s",
        stream=sys.stdout,
        level=logging.INFO,
    )

The add_trace_context processor is the most important piece here. Every event logged through Structlog automatically gets the current trace ID and span ID attached. This means you can take any log line from production and immediately jump to the full trace in your observability backend, or take a trace and find every log line that was emitted during that request.
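One caveat: the trace and span IDs only appear if an OpenTelemetry tracer provider has been configured and a span is active — otherwise get_current_span() returns a span whose context is invalid, and the processor adds nothing. A minimal sketch of that setup, with ConsoleSpanExporter standing in for whatever exporter you actually ship to:

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter


def configure_tracing() -> None:
    # Register a global tracer provider so spans are recorded and carry
    # valid trace/span IDs that add_trace_context can pick up.
    provider = TracerProvider()
    provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
    trace.set_tracer_provider(provider)

Call it once at application startup, next to configure_logging().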


Step 2: Request-Scoped Context With Middleware

One of Structlog's most powerful features is context variables — thread-local (or async-local) storage that holds log context for the duration of a request. Set it once in middleware, and it appears on every log event that request generates.

middleware.py

import uuid
import time
import structlog
from starlette.middleware.base import BaseHTTPMiddleware
from starlette.requests import Request
from starlette.responses import Response

logger = structlog.get_logger()


class LoggingMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request: Request, call_next) -> Response:
        # Clear any context from previous requests
        # This is critical — without this, context leaks between requests
        structlog.contextvars.clear_contextvars()

        # Generate a unique request ID for correlating all logs in this request
        request_id = str(uuid.uuid4())

        # Bind request-scoped context that will appear on all log events
        structlog.contextvars.bind_contextvars(
            request_id=request_id,
            method=request.method,
            path=request.url.path,
            client_ip=request.client.host if request.client else None,
        )

        start_time = time.perf_counter()

        # Log the incoming request
        logger.info("request_started")

        try:
            response = await call_next(request)
            duration_ms = (time.perf_counter() - start_time) * 1000

            # Log the completed request with response details
            logger.info(
                "request_completed",
                status_code=response.status_code,
                duration_ms=round(duration_ms, 2),
            )

            # Add request ID to response headers for client-side correlation
            response.headers["X-Request-ID"] = request_id
            return response

        except Exception:
            duration_ms = (time.perf_counter() - start_time) * 1000
            logger.error(
                "request_failed",
                duration_ms=round(duration_ms, 2),
                exc_info=True,
            )
            raise

The clear_contextvars() call at the start of every request is easy to forget and critical to include. Without it, context from one request can bleed into the next request handled by the same worker — you'd see logs from request B carrying the request ID of request A.
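If you'd rather not manage the clear/bind pair by hand, Structlog (since 21.4) also ships structlog.contextvars.bound_contextvars, a context manager that binds keys on entry and restores the previous context on exit — a condensed sketch of the same idea:

import uuid
import structlog

logger = structlog.get_logger()


async def dispatch(request, call_next):
    # Context bound here is automatically unbound when the block exits,
    # so nothing can leak into the next request.
    with structlog.contextvars.bound_contextvars(
        request_id=str(uuid.uuid4()),
        method=request.method,
        path=request.url.path,
    ):
        logger.info("request_started")
        return await call_next(request)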


Step 3: Business Logic Logging

With the infrastructure in place, logging in your business logic is clean and context-rich by default.

services.py

import structlog
from opentelemetry import trace

logger = structlog.get_logger()
tracer = trace.get_tracer("order-service")


class PaymentDeclinedError(Exception):
    """Raised when the payment provider declines a charge."""

    def __init__(self, reason: str, code: str):
        super().__init__(reason)
        self.reason = reason
        self.code = code


class ExternalServiceError(Exception):
    """Raised when an upstream dependency is unreachable."""


class OrderService:
    async def create_order(self, user_id: str, product_id: str, quantity: int) -> dict:
        # Bind order-specific context for all logs in this operation
        log = logger.bind(
            user_id=user_id,
            product_id=product_id,
            quantity=quantity,
        )

        log.info("order_creation_started")

        with tracer.start_as_current_span("validate_order") as span:
            if quantity <= 0:
                # Structured error — quantity is a queryable field
                log.warning(
                    "order_validation_failed",
                    reason="invalid_quantity",
                    quantity=quantity,
                )
                span.set_attribute("validation.error", "invalid_quantity")
                raise ValueError("Quantity must be positive")

            log.debug("order_validation_passed")

        with tracer.start_as_current_span("process_payment") as span:
            try:
                payment_result = await self._process_payment(user_id, product_id, quantity)
                log.info(
                    "payment_processed",
                    payment_id=payment_result["payment_id"],
                    amount=payment_result["amount"],
                    currency=payment_result["currency"],
                )
                span.set_attribute("payment.id", payment_result["payment_id"])
                span.set_attribute("payment.amount", payment_result["amount"])

            except PaymentDeclinedError as e:
                log.warning(
                    "payment_declined",
                    decline_reason=e.reason,
                    decline_code=e.code,
                )
                span.record_exception(e)
                raise

            except ExternalServiceError as e:
                log.error(
                    "payment_service_unavailable",
                    service="payment-gateway",
                    exc_info=True,
                )
                span.record_exception(e)
                raise

        order_id = f"order_{user_id}_{product_id}"
        log.info("order_created", order_id=order_id)

        return {"order_id": order_id, "status": "created"}

    async def _process_payment(self, user_id: str, product_id: str, quantity: int) -> dict:
        # Simulate payment processing
        return {
            "payment_id": f"pay_{user_id}",
            "amount": quantity * 29.99,
            "currency": "USD",
        }

Notice that different exceptions get different log levels with different context:

  • PaymentDeclinedError is a warning — it's expected behavior, not a system failure
  • ExternalServiceError is an error with exc_info=True — the payment gateway is down, which needs immediate attention

This distinction matters in production. If everything is logged as error, alerts become noise and get ignored. Reserving error for genuine system failures means every error alert deserves immediate attention.


Step 4: Wiring It Into FastAPI

main.py

import os
from contextlib import asynccontextmanager
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import structlog

from logging_config import configure_logging
from middleware import LoggingMiddleware
from services import OrderService

logger = structlog.get_logger()


@asynccontextmanager
async def lifespan(app: FastAPI):
    # Configure logging at startup
    environment = os.environ.get("ENVIRONMENT", "production")
    configure_logging(environment=environment)
    logger.info("application_started", environment=environment)
    yield
    logger.info("application_stopping")


app = FastAPI(lifespan=lifespan)
app.add_middleware(LoggingMiddleware)

order_service = OrderService()


class OrderRequest(BaseModel):
    user_id: str
    product_id: str
    quantity: int


@app.post("/orders")
async def create_order(order: OrderRequest):
    try:
        result = await order_service.create_order(
            user_id=order.user_id,
            product_id=order.product_id,
            quantity=order.quantity,
        )
        return result
    except ValueError as e:
        raise HTTPException(status_code=400, detail=str(e))
    except Exception:
        raise HTTPException(status_code=500, detail="Internal server error")
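To see it all working, a quick smoke test — a sketch that assumes httpx is installed (TestClient requires it) and that the four files above are on the import path:

import os

from fastapi.testclient import TestClient

from main import app

os.environ["ENVIRONMENT"] = "development"  # readable console output for the test

# Entering the client as a context manager runs the lifespan handler,
# which is where configure_logging() gets called.
with TestClient(app) as client:
    response = client.post(
        "/orders",
        json={"user_id": "u123", "product_id": "p456", "quantity": 2},
    )
    print(response.status_code, response.json())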

What Production Logs Look Like

A successful order request produces a sequence of JSON log events, each carrying the full context:

{"event": "request_started", "level": "INFO", "timestamp": "2026-05-01T10:23:41Z", "request_id": "a3f1c2d4", "method": "POST", "path": "/orders", "client_ip": "192.168.1.1", "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736", "span_id": "00f067aa0ba902b7"}

{"event": "order_creation_started", "level": "INFO", "timestamp": "2026-05-01T10:23:41Z", "request_id": "a3f1c2d4", "user_id": "u123", "product_id": "p456", "quantity": 2, "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736", "span_id": "00f067aa0ba902b7"}

{"event": "payment_processed", "level": "INFO", "timestamp": "2026-05-01T10:23:41Z", "request_id": "a3f1c2d4", "user_id": "u123", "product_id": "p456", "quantity": 2, "payment_id": "pay_u123", "amount": 59.98, "currency": "USD", "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736", "span_id": "ab12cd34ef567890"}

{"event": "order_created", "level": "INFO", "timestamp": "2026-05-01T10:23:41Z", "request_id": "a3f1c2d4", "user_id": "u123", "order_id": "order_u123_p456", "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736", "span_id": "ab12cd34ef567890"}

{"event": "request_completed", "level": "INFO", "timestamp": "2026-05-01T10:23:41Z", "request_id": "a3f1c2d4", "status_code": 200, "duration_ms": 342.5, "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736", "span_id": "00f067aa0ba902b7"}

Every log event carries the same trace_id. In your log aggregator, you can:

  • Filter by trace_id to see every log event from a single request
  • Filter by user_id to see every order a specific user has placed
  • Filter by event=payment_declined and group by decline_reason to understand why payments are failing
  • Alert when event=payment_service_unavailable appears more than 3 times in 5 minutes

None of this is possible with unstructured log strings.
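Even without an aggregator, the queryability is easy to demonstrate. A few lines of stdlib Python can pull every event for one trace out of a JSON-lines log file (the file name here is just an example):

import json


def events_for_trace(path: str, trace_id: str) -> list[dict]:
    """Return every log event in a JSON-lines file for a given trace."""
    events = []
    with open(path) as f:
        for line in f:
            if line.strip():
                event = json.loads(line)
                if event.get("trace_id") == trace_id:
                    events.append(event)
    return events


# e.g. events_for_trace("app.log", "4bf92f3577b34da6a3ce929d0e0e4736")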


Correlating Logs and Traces in Practice

The trace_id field is the bridge between your logs and your traces. Here's how to use it:

From a log to a trace: A user reports their order failed. You find their user_id in your log aggregator, filter by event=payment_declined, get the trace_id from the log event, and paste it into your tracing backend. You now see the full execution timeline — every span, every operation, every timing.

From a trace to logs: You notice a trace with unusually high latency in your tracing backend. You copy the trace_id and filter your log aggregator. You see that a payment_processed log event was emitted 800ms into the request — but the request_completed event came 2.3 seconds later. Something happened after payment processing that doesn't appear in the trace. The logs fill the gap.

This bidirectional navigation — from logs to traces and back — is what makes a production observability stack genuinely useful rather than just theoretically complete.


Common Mistakes

Logging sensitive data. It's easy to accidentally log payment details, passwords, or personally identifiable information as structured fields. Before adding a field to a log event, ask whether it belongs in your log aggregator. User IDs are fine. Card numbers are not.
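One way to enforce this mechanically is a processor that scrubs known-sensitive keys before any renderer sees them — a sketch with a placeholder key list you'd adapt to your own domain (add it to shared_processors ahead of the renderer):

from typing import Any

# Placeholder list — extend with whatever your domain considers sensitive.
SENSITIVE_KEYS = {"password", "card_number", "cvv", "token"}


def redact_sensitive(logger: Any, method: str, event_dict: dict) -> dict:
    """Replace the values of known-sensitive fields before rendering."""
    for key in SENSITIVE_KEYS & event_dict.keys():
        event_dict[key] = "[REDACTED]"
    return event_dict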

Forgetting to clear context between requests. If you're using bind_contextvars without clear_contextvars at the start of each request, context leaks between requests on the same worker. Always clear first.

Using string formatting instead of keyword arguments. log.info(f"Payment of {amount} processed") defeats the purpose of structured logging. Use log.info("payment_processed", amount=amount) — keep the event name short and stable, put the data in fields.

Logging at the wrong level. Reserve error for conditions that require immediate human attention. Use warning for expected failure cases (payment declined, validation failed). Use info for significant business events. Use debug for everything else. If everything is error, nothing is.


Summary

Structured logging with Structlog gives you three things that unstructured logging cannot:

  • Queryable log data — filter, aggregate, and alert on specific fields rather than parsing strings
  • Automatic context propagation — bind context once per request and it appears on every log event
  • Log-trace correlation — every log event carries the trace ID, making bidirectional navigation between logs and traces possible

The setup is straightforward: configure Structlog once, add the trace context processor, use middleware to bind request-scoped context, and log events with keyword arguments instead of formatted strings. From there, your log aggregator does the heavy lifting.

Unstructured logs tell you something happened. Structured logs tell you what happened, to whom, when, why, and where in the trace you can find the full picture.


Find me on GitHub or LinkedIn.
