우병수

Posted on Jul 1 • Originally published at techdigestor.com

Monitoring Kubernetes Clusters with OpenTelemetry Collector: The Agent + Gateway Pattern Explained

#ai #machinelearning #cloud #devops

TL;DR: The failure mode nobody talks about until it's happening in production: every pod opens a direct gRPC connection to your Tempo or Loki ingest endpoint, and the backend starts dropping spans not because it's overloaded on CPU, but because it hits its concurrent connection limit.

📖 Reading time: ~23 min

What's in this article

The Problem: Per-Node Chaos Without a Collection Strategy
Architecture Overview: Agent DaemonSet + Gateway Deployment
Deploying the Agent: DaemonSet Config That Actually Works
Deploying the Gateway: Where Batching and Sampling Live
Four Non-Obvious Behaviors That Will Burn You
Validating the Pipeline and What to Monitor About the Collector Itself
When to Use This Pattern and When to Skip It

The Problem: Per-Node Chaos Without a Collection Strategy

The failure mode nobody talks about until it's happening in production: every pod opens a direct gRPC connection to your Tempo or Loki ingest endpoint, and the backend starts dropping spans not because it's overloaded on CPU, but because it hits its concurrent connection limit. gRPC connections aren't free — each one holds state, negotiates keepalives, and occupies a file descriptor. At five nodes with a handful of pods each, this is invisible. At twenty nodes with autoscaling workloads, you're opening hundreds of persistent connections to a single endpoint that was sized for a fraction of that. The ingest service doesn't degrade gracefully; it starts returning RESOURCE_EXHAUSTED and your traces silently disappear.

The fan-out math is straightforward and brutal. If each node runs a DaemonSet collector that connects directly to your backends, and each of those collectors opens separate connections for metrics, traces, and logs, a 20-node cluster with three signal types is already at 60 persistent upstream connections minimum. That's before you account for any application-level SDK exporters that also decided to phone home directly. How much that hurts depends on how your Tempo and Loki distributors are sized and rate-limited, but an ingest tier sized for a handful of clients will start rejecting pushes under that pressure. The spans don't queue anywhere durable; they're dropped at the exporter with a transient error that most SDKs log once and discard.
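To make that arithmetic concrete, here is a small Python sketch comparing upstream connection counts with and without a gateway tier. It assumes one long-lived connection per signal per node collector, and one per backend per gateway replica. Real gRPC clients may open more connections, and the replica and backend counts below are illustrative.

```python
def direct_upstream_connections(nodes: int, signals: int, sdk_exporters: int = 0) -> int:
    """Node collectors talk straight to the backends: one connection per
    signal per node, plus any app SDKs that bypass the collector entirely."""
    return nodes * signals + sdk_exporters


def gateway_upstream_connections(gateway_replicas: int, backends: int) -> int:
    """With a gateway tier, only the gateway replicas hold backend connections."""
    return gateway_replicas * backends


if __name__ == "__main__":
    for nodes in (5, 20, 50):
        direct = direct_upstream_connections(nodes, signals=3)
        via_gateway = gateway_upstream_connections(gateway_replicas=2, backends=3)
        print(f"{nodes:>3} nodes: direct={direct:>3}  via gateway={via_gateway}")
```

The direct count grows with the node count. The gateway count grows only with replicas and backends, and that property is what the rest of the pattern relies on.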

A DaemonSet-only deployment looks clean on paper: one collector per node, scrape local pods, forward upstream. The problem is "forward upstream" implies the DaemonSet agent is doing two jobs simultaneously — staying lightweight enough to not steal resources from the workloads it shares a node with, and being stateful enough to buffer, batch, retry, and route signals reliably. Those are contradictory requirements. A collector trying to hold a retry queue for failed exports while also scraping Prometheus endpoints on tight intervals will either starve its scrape loop under memory pressure or drop its queue when the node evicts it. The two roles genuinely need to be separate processes with separate resource profiles.

The architecture that actually works separates these concerns by design. The agent — running as a DaemonSet — is intentionally dumb and lean: receive signals from local pods via OTLP, do minimal processing (add node/pod labels, nothing expensive), and forward to an internal gateway endpoint. The gateway — running as a Deployment with persistent storage or at least a proper memory queue — handles everything stateful: batching, retry with backoff, TLS to the external backend, fan-in from all agents into a manageable number of upstream connections. The gateway sees maybe three or four connections going out, regardless of how many nodes are in the cluster. That's the connection count your Tempo ingest was actually sized for.

Trying to make one collector configuration do both roles causes resource contention in a specific and annoying way. The batch processor needs memory headroom proportional to the volume it's buffering. The prometheus receiver needs CPU for scrape cycles. On a busy node, these compete. During a scrape spike, the batch processor's send queue fills up and pushes backpressure into the receiver. The exporter to the backend then times out and triggers a retry loop. Now your lightweight DaemonSet pod is sitting at 400MB RAM and getting OOMKilled. The pod restarts clean, and you've lost whatever was in the queue. Split the roles and you can size each component appropriately: cap agents at a few hundred MiB (the config below uses a 256Mi limit), and give the gateway whatever the actual buffering workload demands.

Architecture Overview: Agent DaemonSet + Gateway Deployment

The split that most people miss when they first read the OpenTelemetry Collector docs is that you're not choosing between an agent and a gateway — you're running both, for different jobs. The agent is a resource-constrained process living on every node. The gateway is a proper service that absorbs all that data and does the expensive work. Conflating them into a single deployment is the fastest way to either blow your node memory budget or lose telemetry during a backend outage.

The agent runs as a DaemonSet: one pod per node, no exceptions. Its job list is narrow on purpose. It scrapes kubeletstats and hostmetrics for node-level data, tails container logs via the filelog receiver, and listens on port 4317 for OTLP pushes from app containers on the same node. One catch: an app pod has its own network namespace, so it can't reach a DaemonSet pod over localhost. The usual approach is to expose 4317 as a hostPort on the agent and inject the node's IP into application pods through the downward API (status.hostIP). Alternatively, put the agents behind a Service with internalTrafficPolicy: Local. Either way, the traffic stays on the node and needs no extra service discovery. The memory ceiling for the agent should be hard-capped, something like 200–300Mi depending on log volume, because it has no persistent queue. If the gateway is unreachable, the agent drops data once its small in-memory retry queue fills. That's a feature, not a bug. You do not want a DaemonSet silently accumulating gigabytes of retry buffer on every node in a partition event.

The gateway runs as a Deployment with two or more replicas behind a ClusterIP service. It receives OTLP over gRPC from every agent in the cluster and applies tail-based sampling decisions across the full trace, which requires seeing all spans for a given trace ID in one place (more on that in the sampling section). It also batches aggressively before forwarding and owns all the retry queues and remote backend credentials. The gateway is where your Prometheus remote_write URL, your Tempo endpoint, and your Loki push URL live. Supply them to the gateway pods from Kubernetes Secrets, not from a ConfigMap that every node-level ServiceAccount can read. The data flow in plain terms: your app pushes OTLP to the agent on its own node (port 4317). The agent forwards over gRPC to the gateway's ClusterIP Service (typically something like otelcol-gateway.observability.svc.cluster.local:4317). The gateway then fans out to backends: Prometheus gets metrics via remote_write, Tempo gets traces via OTLP HTTP or gRPC, and Loki gets logs via its push API.

RBAC is where this split pays off in a concrete security way. The agent ServiceAccount needs real cluster permissions because it's doing discovery work:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: otel-agent
rules:
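  - apiGroups: [""]
    # nodes/stats: kubeletstats reads the kubelet's /stats/summary, which the
    # kubelet authorizes as the nodes/stats subresource
    resources: ["nodes", "nodes/stats", "nodes/metrics", "pods", "endpoints", "services"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["extensions", "apps"]
    resources: ["replicasets"]
    verbs: ["get", "list", "watch"]
  # Only needed if you also scrape the API server's own /metrics. Kubelet paths
  # such as /metrics/cadvisor are authorized via nodes/metrics above instead.
  - nonResourceURLs: ["/metrics"]
    verbs: ["get"]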
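The kubelet does not authorize its own HTTP API through nonResourceURLs. Instead, it maps the request path to a subresource of the node object. That means kubeletstats, which reads /stats/summary, needs get on nodes/stats. The Python sketch below encodes the path prefixes listed in the Kubernetes kubelet authorization docs. It is a simplification: here every other path falls back to nodes/proxy, while newer Kubernetes releases add a few finer-grained subresources. It is only meant to show which rule each receiver call needs.

```python
# Path prefixes and the node subresource the kubelet checks for each
# (simplified from the Kubernetes kubelet authentication/authorization docs).
KUBELET_SUBRESOURCES = {
    "/stats": "nodes/stats",
    "/metrics": "nodes/metrics",
    "/logs": "nodes/log",
    "/spec": "nodes/spec",
}


def kubelet_subresource(path: str) -> str:
    """Return the RBAC subresource a request to this kubelet path must be allowed on."""
    for prefix, subresource in KUBELET_SUBRESOURCES.items():
        if path == prefix or path.startswith(prefix + "/"):
            return subresource
    # Simplification: treat everything else as nodes/proxy.
    return "nodes/proxy"


if __name__ == "__main__":
    # kubeletstats reads /stats/summary; a cAdvisor scrape reads /metrics/cadvisor.
    for path in ("/stats/summary", "/metrics/cadvisor"):
        print(path, "->", kubelet_subresource(path))
```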

The gateway ServiceAccount needs none of that. It receives data over a network socket and pushes it to external endpoints. If someone misconfigures the gateway's collector config and opens an unintended receiver, the blast radius is zero Kubernetes API access. Keeping these two ServiceAccounts completely separate means a compromised or misconfigured gateway pod cannot enumerate your cluster topology, and a misconfigured agent pod cannot reach your remote backend credentials. That's the actual value of the split — not just resource isolation, but a meaningful reduction in what any single misconfiguration can touch.

Deploying the Agent: DaemonSet Config That Actually Works

The part most tutorials skip: the agent ConfigMap is where most production failures originate, not the gateway. A misconfigured memory_limiter processor doesn't gracefully shed load — it causes the collector pod to OOMKill and restart in a loop, taking your node-level metrics dark for 30–90 seconds per restart cycle. The processor must be listed first in every pipeline's processor chain, before batch, or it has no chance to act before memory is already blown. This is documented in the OpenTelemetry Collector docs but buried far enough that it's easy to miss on first read.

Here's a minimal but complete ConfigMap that actually runs. Real field names, real units — no placeholders:

apiVersion: v1
kind: ConfigMap
metadata:
  name: otelcol-agent-config
  namespace: observability
data:
  config.yaml: |
    extensions:
      health_check:
        endpoint: 0.0.0.0:13133

    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318

      kubeletstats:
        collection_interval: 30s
        auth_type: serviceAccount
        endpoint: "https://${env:K8S_NODE_NAME}:10250"
        insecure_skip_verify: true
        metric_groups:
          - node
          - pod
          - container

      hostmetrics:
        collection_interval: 30s
        scrapers:
          cpu: {}
          disk: {}
          filesystem:
            exclude_mount_points:
              mount_points: [/dev, /proc, /sys, /run/k3s/containerd]
              match_type: strict
          memory: {}
          network: {}
          load: {}

      filelog:
        include:
          - /var/log/pods/*/*/*.log
        include_file_path: true
        include_file_name: false
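        operators:
          - type: router
            id: get-format
            routes:
              - output: parser-docker
                expr: 'body matches "^\\{"'
              - output: parser-crio
                expr: 'body matches "^[^ Z]+ "'
              - output: parser-containerd
                expr: 'body matches "^[^ Z]+Z"'
          - type: json_parser
            id: parser-docker
            output: extract-metadata
          - type: regex_parser
            id: parser-crio
            regex: '^(?P<time>[^ Z]+) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) ?(?P<log>.*)$'
            output: extract-metadata
          # containerd timestamps end in Z, so those lines never match the CRI-O route
          - type: regex_parser
            id: parser-containerd
            regex: '^(?P<time>[^ Z]+Z) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) ?(?P<log>.*)$'
            output: extract-metadata
          - type: move
            id: extract-metadata
            from: attributes["log"]
            to: body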

    processors:
      memory_limiter:
        # must be first in pipeline — limits before batch can accumulate
        check_interval: 1s
        limit_mib: 220          # hard limit, kept below the 256Mi pod limit
        spike_limit_mib: 60     # soft limit = 220 - 60 = 160 MiB; data is refused above it

      batch:
        send_batch_size: 1024
        timeout: 5s
        send_batch_max_size: 2048

      resourcedetection:
        detectors: [env, k8snode]
        timeout: 5s

      k8sattributes:
        auth_type: serviceAccount
        passthrough: false
        filter:
          node_from_env_var: K8S_NODE_NAME
        extract:
          metadata:
            - k8s.pod.name
            - k8s.pod.uid
            - k8s.deployment.name
            - k8s.namespace.name
            - k8s.node.name
            - k8s.container.name

    exporters:
      otlp/gateway:
        endpoint: otelcol-gateway.observability.svc.cluster.local:4317
        tls:
          insecure: true   # mTLS is a gateway-layer concern; agent→gateway is cluster-internal

    service:
      extensions: [health_check]
      pipelines:
        metrics:
          receivers: [kubeletstats, hostmetrics, otlp]
          processors: [memory_limiter, resourcedetection, k8sattributes, batch]
          exporters: [otlp/gateway]
        logs:
          receivers: [filelog, otlp]
          processors: [memory_limiter, k8sattributes, batch]
          exporters: [otlp/gateway]
        traces:
          receivers: [otlp]
          processors: [memory_limiter, k8sattributes, batch]
          exporters: [otlp/gateway]

On resource limits: 200m CPU and 256Mi memory is the right starting ceiling for the agent pod. That isn't based on any single benchmark; it reflects where the pain points are. CPU for the agent is almost never the bottleneck, so the 200m ceiling is defensive: it keeps a misbehaving scraper from starving other node workloads. Memory is where you actually hit problems. The memory_limiter is configured with a 220 MiB hard limit and a 60 MiB spike allowance. It starts refusing data at a 160 MiB soft limit (220 minus 60) and forces garbage collection above 220 MiB, which still leaves 36 MiB below the 256Mi container limit. That gap is intentional. The limiter only samples memory once per check_interval, and the headroom gives it time to shed load before the kernel's OOM killer gets involved. Flip those numbers (set limit_mib above your container limit) and you get the worst outcome: the kernel kills the process before the limiter ever fires.

resources:
  requests:
    cpu: 50m
    memory: 128Mi
  limits:
    cpu: 200m
    memory: 256Mi
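The relationship between those numbers is easy to break when you resize the pod later, so it can help to encode it as a check. Below is a minimal Python sketch. It assumes the memory_limiter semantics described in its documentation: the soft limit is limit_mib minus spike_limit_mib, data is refused above the soft limit, and garbage collection is forced above the hard limit.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MemoryLimiterSettings:
    limit_mib: int        # hard limit: above this the limiter also forces a GC
    spike_limit_mib: int  # headroom reserved for spikes between checks

    @property
    def soft_limit_mib(self) -> int:
        # Above the soft limit the limiter starts refusing incoming data.
        return self.limit_mib - self.spike_limit_mib

    def problems(self, container_limit_mib: int) -> List[str]:
        """Return configuration mistakes relative to the pod's memory limit."""
        issues = []
        if self.spike_limit_mib >= self.limit_mib:
            issues.append("spike_limit_mib must be smaller than limit_mib")
        if self.limit_mib >= container_limit_mib:
            issues.append("limit_mib is not below the container limit; the OOM killer acts first")
        return issues


if __name__ == "__main__":
    agent = MemoryLimiterSettings(limit_mib=220, spike_limit_mib=60)
    print("soft limit:", agent.soft_limit_mib, "MiB")
    print("headroom below 256Mi:", 256 - agent.limit_mib, "MiB")
    print("problems:", agent.problems(container_limit_mib=256))
```

Running the check in CI whenever the DaemonSet's resources block changes catches the "limit_mib above the container limit" mistake before it reaches a node.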

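The filelog receiver needs two hostPath mounts that don't show up in the basic install guides; only the first is essential, as the note after the manifest explains. Without them, the receiver starts cleanly but collects nothing, and the only evidence is silence in your log pipeline. On many nodes the log files are also readable only by root. If logs still don't appear with the mounts in place, check the agent's securityContext next. The mounts: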

volumeMounts:
  - name: varlogpods
    mountPath: /var/log/pods
    readOnly: true
  - name: varlibdockercontainers
    mountPath: /var/lib/docker/containers
    readOnly: true
volumes:
  - name: varlogpods
    hostPath:
      path: /var/log/pods
  - name: varlibdockercontainers
    hostPath:
      path: /var/lib/docker/containers

On containerd-only nodes (k3s, most current kubeadm clusters), /var/lib/docker/containers won't exist but the mount won't fail either — it'll just be empty. The actual log files are under /var/log/pods regardless of runtime, so that mount is the critical one. The docker path is worth keeping for mixed clusters or if you're running Docker-in-Docker workloads.
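How does a log line reach the right parser? The filelog router sends each line to the first route whose expression matches. The Python sketch below mirrors that logic under three stated assumptions. First, Python's re module stands in for the collector's expression engine (the patterns are simple enough to behave the same). Second, the timestamps are made-up sample data. Third, a line that matches no route is treated as dropped, which is what we expect when the router has no default output. The sketch also shows why containerd needs its own route. containerd timestamps end in Z, and the CRI-O expression requires a run of characters that are neither spaces nor Z followed by a space, so it never matches them.

```python
import json
import re
from typing import Dict, Optional

# Route expressions in the same order as the filelog router: first match wins.
ROUTES = [
    ("docker", re.compile(r"^\{")),
    ("crio", re.compile(r"^[^ Z]+ ")),
    ("containerd", re.compile(r"^[^ Z]+Z")),
]

# CRI-style line: <time> <stream> <logtag> <log>
CRIO_LINE = re.compile(r"^(?P<time>[^ Z]+) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) ?(?P<log>.*)$")
CONTAINERD_LINE = re.compile(r"^(?P<time>[^ Z]+Z) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) ?(?P<log>.*)$")


def route(line: str) -> Optional[str]:
    for name, pattern in ROUTES:
        if pattern.search(line):
            return name
    return None  # no route matched: treated as dropped


def parse(line: str) -> Optional[Dict[str, str]]:
    fmt = route(line)
    if fmt == "docker":
        return json.loads(line)  # Docker json-file: {"log": ..., "stream": ..., "time": ...}
    pattern = {"crio": CRIO_LINE, "containerd": CONTAINERD_LINE}.get(fmt)
    if pattern is None:
        return None
    match = pattern.match(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    samples = [
        '{"log":"hello\\n","stream":"stdout","time":"2024-07-01T10:00:00.000000001Z"}',
        "2024-07-01T10:00:00.000000001+00:00 stdout F hello from cri-o",
        "2024-07-01T10:00:00.000000001Z stderr F hello from containerd",
    ]
    for sample in samples:
        print(route(sample), parse(sample))
```

Drop the containerd entry from ROUTES and the third sample returns None. That mirrors a containerd node whose log pipeline stays silent even though the mounts are correct.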

The health_check extension at 0.0.0.0:13133 gives Kubernetes a dedicated endpoint for its probes. It reports whether the collector has started its pipelines and is running. Point both probes at / on port 13133, not at the OTLP port, which only proves that a socket is open. Be clear about what the probe doesn't catch, though. A collector that is running but stalled behind a slow or unreachable exporter will usually still report healthy here. The extension's check_collector_pipeline option, which tried to cover that case, is deprecated. Treat the probe as a liveness signal for the process, and catch stalled pipelines with the exporter queue and failure metrics covered in the validation section.

livenessProbe:
  httpGet:
    path: /
    port: 13133
  initialDelaySeconds: 15
  periodSeconds: 20
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /
    port: 13133
  initialDelaySeconds: 5
  periodSeconds: 10

Deploying the Gateway: Where Batching and Sampling Live

The single most important architectural decision in this whole pattern is where you put tail sampling — and the answer is never on the agent. An agent DaemonSet pod sees spans from one node. A trace for a single user request might touch pods on three different nodes, meaning each agent sees a fragment. If you apply a tail sampling policy at the agent, you're making a keep/drop decision on an incomplete picture. The policy fires when it thinks the trace is complete, but it isn't — the spans from the other nodes are just missing. You don't get an error. You get silently incomplete traces in Tempo, and you spend an hour wondering why your database call spans vanished.
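The same fragment problem reappears one tier up if the gateway runs several replicas and spans are spread across them by connection rather than by trace ID. The Python sketch below models both cases. In the first, agents on three nodes each hold one gRPC connection pinned to a gateway replica, which is roughly what a long-lived connection through a ClusterIP Service gives you. In the second, every span is routed by a hash of its trace ID. The node names, replica names, and plain modulo hash are illustrative assumptions. The contrib loadbalancing exporter, for instance, uses a consistent-hash ring rather than a modulo.

```python
import hashlib
from collections import defaultdict
from typing import Callable, Dict, List, Set, Tuple

REPLICAS = ["gateway-0", "gateway-1", "gateway-2"]

# Hypothetical: each node's agent has one long-lived connection, pinned to one replica.
PINNED_CONNECTION = {"node-1": "gateway-0", "node-2": "gateway-1", "node-3": "gateway-2"}


def by_connection(trace_id: str, node: str) -> str:
    return PINNED_CONNECTION[node]


def by_trace_id(trace_id: str, node: str) -> str:
    # Stable hash of the trace ID: every span of a trace picks the same replica.
    digest = hashlib.sha256(trace_id.encode()).digest()
    return REPLICAS[int.from_bytes(digest[:8], "big") % len(REPLICAS)]


def replicas_per_trace(spans: List[Tuple[str, str]],
                       pick: Callable[[str, str], str]) -> Dict[str, Set[str]]:
    """Map each trace ID to the set of gateway replicas that saw any of its spans."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for trace_id, node in spans:
        seen[trace_id].add(pick(trace_id, node))
    return dict(seen)


# (trace_id, node the span was emitted on): trace-a is one request crossing three nodes.
SPANS = [("trace-a", "node-1"), ("trace-a", "node-2"), ("trace-a", "node-3"),
         ("trace-b", "node-2"), ("trace-b", "node-2")]

if __name__ == "__main__":
    print("by connection:", replicas_per_trace(SPANS, by_connection))
    print("by trace ID:  ", replicas_per_trace(SPANS, by_trace_id))
```

When spans are routed by connection, no single replica ever holds all of trace-a, so a latency or error policy evaluated there only sees a partial trace.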

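The gateway is the right place because it receives OTLP from all agents and can reconstruct full traces before evaluating any policy, as long as every span of a trace lands on the same gateway replica. With two or more replicas behind a plain ClusterIP Service, that isn't guaranteed. Each agent's long-lived gRPC connection stays pinned to whichever replica it first reached, so spans of one trace arriving from different nodes can be split across replicas. The usual fix is the contrib loadbalancing exporter, running on the agents or on a small routing tier. Configure it with routing_key: traceID and a resolver (for example dns, pointed at a headless Service for the gateway pods) so that spans are routed by trace ID instead of by connection. The tail_sampling processor holds spans in memory for a configurable decision wait time, then applies your policies against the assembled trace. Below is a ConfigMap that wires this up end to end. It includes a latency-based policy at 500ms, aggressive batching before the remote write, and the file storage extension. That extension keeps you from losing the exporters' queued data when the gateway pod restarts, though it doesn't persist the traces tail_sampling is still holding in memory: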

apiVersion: v1
kind: ConfigMap
metadata:
  name: otelcol-gateway-config
  namespace: observability
data:
  config.yaml: |
    extensions:
      # file_storage keeps the persistent queue on disk across pod restarts.
      # Mount a PVC at this path — emptyDir will not survive a restart.
      file_storage:
        directory: /var/otelcol/queue
        timeout: 10s
        compaction:
          on_start: true
          directory: /var/otelcol/queue/compaction
          max_transaction_size: 65536

    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318

    processors:
      # Put tail_sampling before batch. tail_sampling regroups spans by trace ID
      # itself; batching afterwards means only the kept traces get batched.
      tail_sampling:
        decision_wait: 30s          # hold spans this long before deciding
        num_traces: 50000           # max traces in memory — tune to your heap
        expected_new_traces_per_sec: 200
        policies:
          - name: keep-slow-traces
            type: latency
            latency:
              threshold_ms: 500     # keep any trace with duration >= 500ms
          - name: keep-errors
            type: status_code
            status_code:
              status_codes: [ERROR]
          - name: probabilistic-baseline
            # Keep 10% of everything else so you have a baseline for fast paths
            type: probabilistic
            probabilistic:
              sampling_percentage: 10

      batch:
        send_batch_size: 8192
        timeout: 10s               # flush even if batch isn't full after 10s
        send_batch_max_size: 16384

      memory_limiter:
        check_interval: 1s
        limit_mib: 1500
        spike_limit_mib: 400

    exporters:
      # Traces → Tempo
      otlp/tempo:
        endpoint: tempo.observability.svc.cluster.local:4317
        tls:
          insecure: true
        sending_queue:
          enabled: true
          num_consumers: 4
          queue_size: 5000
          storage: file_storage    # references the extension above
        retry_on_failure:
          enabled: true
          initial_interval: 5s
          max_interval: 30s
          max_elapsed_time: 300s

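      # Metrics → Prometheus remote_write (Mimir or Grafana Cloud)
      prometheusremotewrite:
        endpoint: ${MIMIR_REMOTE_WRITE_URL}   # injected from Secret
        tls:
          insecure_skip_verify: false
        headers:
          Authorization: "Basic ${MIMIR_AUTH_HEADER}"   # injected from Secret
        # This exporter uses its own in-memory remote_write_queue rather than the
        # generic sending_queue, so the file_storage queue doesn't apply to it.
        remote_write_queue:
          enabled: true
          queue_size: 5000
          num_consumers: 5
        retry_on_failure:
          enabled: true
          max_elapsed_time: 300s

      # Logs → Loki (assumes Loki 3.x native OTLP ingestion; the service name is illustrative)
      otlphttp/loki:
        endpoint: http://loki.observability.svc.cluster.local:3100/otlp
        sending_queue:
          enabled: true
          queue_size: 5000
          storage: file_storage

    service:
      extensions: [file_storage]
      pipelines:
        traces:
          receivers: [otlp]
          processors: [memory_limiter, tail_sampling, batch]
          exporters: [otlp/tempo]
        metrics:
          receivers: [otlp]
          processors: [memory_limiter, batch]
          exporters: [prometheusremotewrite]
        # The agents ship logs here too; without a logs pipeline the gateway rejects them.
        logs:
          receivers: [otlp]
          processors: [memory_limiter, batch]
          exporters: [otlphttp/loki]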
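The three policies in this config act as independent votes. As we read the tail_sampling semantics, a trace is kept if any top-level policy decides to sample it; composite, and, and inverted policies add nuance that we ignore here. The Python sketch below mirrors that decision for a fully assembled trace. The hash-based probabilistic check is an assumption that stands in for the processor's own hashing. What matters is that it depends only on the trace ID, so every re-evaluation of the same trace reaches the same answer.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class AssembledTrace:
    trace_id: str
    duration_ms: float   # earliest span start to latest span end
    has_error: bool      # any span with status code ERROR


def in_probabilistic_sample(trace_id: str, percentage: float) -> bool:
    # Deterministic per trace ID: hash into 10,000 buckets, keep the lowest ones.
    bucket = int.from_bytes(hashlib.sha256(trace_id.encode()).digest()[:8], "big") % 10_000
    return bucket < percentage * 100


def keep_trace(trace: AssembledTrace, threshold_ms: float = 500,
               sampling_percentage: float = 10) -> bool:
    """Mirror of the keep-slow-traces, keep-errors, and probabilistic-baseline policies."""
    return (
        trace.duration_ms >= threshold_ms                                 # keep-slow-traces
        or trace.has_error                                                # keep-errors
        or in_probabilistic_sample(trace.trace_id, sampling_percentage)   # probabilistic-baseline
    )


if __name__ == "__main__":
    fast_ok = [AssembledTrace(f"trace-{i}", 20.0, False) for i in range(20_000)]
    kept = sum(keep_trace(t) for t in fast_ok)
    print(f"fast, successful traces kept: {kept / len(fast_ok):.1%}")  # roughly 10%
```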

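The file_storage extension is the piece most tutorials skip. Without it, the sending_queue lives entirely in memory, and any gateway restart, including a normal rolling update, drops whatever was queued. With it, the queue serializes to disk and resumes after the pod comes back. The critical detail: you must mount a PVC at /var/otelcol/queue, not an emptyDir. An emptyDir survives a container restart but is deleted along with the pod, so a rolling update or eviction throws the queue away. That is exactly the failure mode you're trying to prevent. For a single replica, a ReadWriteOnce PVC on any standard StorageClass handles this; you don't need anything exotic. With two or more replicas, give each replica its own volume. The queue is a bbolt database file that only one process can hold open at a time, and a ReadWriteOnce claim can't be attached on several nodes anyway. A StatefulSet with volumeClaimTemplates is the usual fit.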

The credentials problem trips up most first deploys. The ${MIMIR_REMOTE_WRITE_URL} and ${MIMIR_AUTH_HEADER} placeholders above are not Helm template syntax — the OpenTelemetry Collector binary expands environment variable references in its config file at startup. That means you can inject secrets via envFrom without touching the ConfigMap at all:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: otelcol-gateway
  namespace: observability
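spec:
  # One shared ReadWriteOnce claim only supports a single replica; run more
  # replicas as a StatefulSet with volumeClaimTemplates instead.
  replicas: 1
  selector:
    matchLabels:
      app: otelcol-gateway
  template:
    metadata:
      labels:
        app: otelcol-gateway
    spec:
      securityContext:
        fsGroup: 10001   # lets the image's non-root user (UID 10001) write to the PVC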
      containers:
        - name: otelcol
          image: otel/opentelemetry-collector-contrib:0.102.0
          args: ["--config=/conf/config.yaml"]
          envFrom:
            # All keys in this Secret become env vars automatically.
            # Add MIMIR_REMOTE_WRITE_URL and MIMIR_AUTH_HEADER here.
            - secretRef:
                name: otelcol-gateway-secrets
          volumeMounts:
            - name: config
              mountPath: /conf
            - name: queue-storage
              mountPath: /var/otelcol/queue   # must match file_storage.directory
      volumes:
        - name: config
          configMap:
            name: otelcol-gateway-config
        - name: queue-storage
          persistentVolumeClaim:
            claimName: otelcol-gateway-queue
---
apiVersion: v1
kind: Secret
metadata:
  name: otelcol-gateway-secrets
  namespace: observability
type: Opaque
stringData:
  MIMIR_REMOTE_WRITE_URL: "https://mimir.example.com/api/v1/push"
  MIMIR_AUTH_HEADER: "dXNlcjpwYXNzd29yZA=="   # base64(user:password)
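The MIMIR_AUTH_HEADER value is just the base64 encoding of username:password, and the config adds the Basic prefix. Below is a short Python sketch for producing it, with user and password as placeholders. For Grafana Cloud, these would be your stack's instance ID and an access token. Base64 is an encoding, not encryption, so treat the result exactly like the password itself.

```python
import base64


def basic_auth_token(username: str, password: str) -> str:
    """base64("username:password"), the token that follows "Basic " in the header."""
    raw = f"{username}:{password}".encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


if __name__ == "__main__":
    token = basic_auth_token("user", "password")
    print(token)               # dXNlcjpwYXNzd29yZA==
    print("Basic " + token)    # the header value the exporter ends up sending
```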

A few sharp edges are worth flagging before you call this production-ready. First, run batch after tail_sampling, not before. tail_sampling regroups spans by trace ID on its own, so the ordering is less about correctness and more about efficiency: you avoid spending memory and CPU batching spans that are about to be dropped, and the batches you ship contain only kept traces. Second, the num_traces: 50000 setting directly determines gateway memory pressure. At a 30s decision wait, a spike in traffic can fill that buffer fast. Watch the otelcol_processor_tail_sampling_sampling_traces_on_memory metric and consider scaling on it rather than on CPU alone (a HorizontalPodAutoscaler needs a custom-metrics adapter for this). Keep in mind that extra replicas only help if spans are routed to them by trace ID. Third, if you're sending to Grafana Cloud rather than self-hosted Mimir, the remote write URL format is slightly different: it ends in /api/prom/push rather than /api/v1/push. The auth is HTTP Basic against your Grafana Cloud stack credentials, not a bearer token.
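Here is a back-of-the-envelope check for num_traces. At steady state, the processor holds roughly new-traces-per-second times decision_wait traces. The Python sketch below applies that arithmetic to the 200 traces/s, 30s, and 50000 values from the gateway config. The steady-state model is a simplification; late spans and bursty arrivals push the real number higher. If the buffer does fill, the processor has to evict traces before their decision is made.

```python
def traces_in_memory(new_traces_per_sec: float, decision_wait_s: float) -> float:
    """Steady-state traces held while waiting for a decision."""
    return new_traces_per_sec * decision_wait_s


def rate_that_fills_buffer(num_traces: int, decision_wait_s: float) -> float:
    """New-trace rate at which num_traces is used up within one decision window."""
    return num_traces / decision_wait_s


if __name__ == "__main__":
    print(traces_in_memory(200, 30))                     # 6000: well under num_traces
    print(round(rate_that_fills_buffer(50_000, 30), 1))  # 1666.7 traces/s
    print(traces_in_memory(200, 15))                     # 3000 with a 15s decision_wait
```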

Four Non-Obvious Behaviors That Will Burn You

The one that stings most operators first: the memory_limiter processor must be the first entry in every pipeline's processor list, not second or third. The instinct is to put batch first because batching "should happen before limiting," but that logic is backwards under load. When batch runs first, it accumulates spans and metrics into memory until the batch is full — then hands a large, already-allocated blob to memory_limiter, which can only reject it after the damage is done. The correct order is processors: [memory_limiter, batch] in every pipeline, every time, no exceptions. The collector docs mention this, but it's buried, and the default example configs don't always model it correctly.

# Correct processor order — memory_limiter gates before batch accumulates
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]   # NOT [batch, memory_limiter]
      exporters: [otlp/gateway]
    metrics:
      receivers: [kubeletstats, prometheus]
      processors: [memory_limiter, batch]
      exporters: [otlp/gateway]

The kubeletstats receiver's TLS error is a classic 30-minute timesink. When you see x509: certificate signed by unknown authority in the collector pod logs, the instinct is to check RBAC: service account permissions, ClusterRole bindings, whether the kubelet endpoint is accessible at all. Those are all fine. The error is purely TLS. On many clusters, the kubelet serves a self-signed certificate that no cluster CA vouches for, so the collector has nothing to verify it against. There are two ways to fix it. The first is to set insecure_skip_verify: true on the receiver itself, which is acceptable inside a private cluster network but not ideal. The second is to turn on kubelet serving-certificate bootstrapping: set serverTLSBootstrap: true in the kubelet configuration, then approve the resulting CSRs. Each kubelet then presents a certificate signed by the cluster CA, which the serviceAccount auth type already trusts. The permissions angle is a red herring that the error message does nothing to dispel.

receivers:
  kubeletstats:
    collection_interval: 20s
    auth_type: serviceAccount
    endpoint: "https://${env:K8S_NODE_NAME}:10250"
    insecure_skip_verify: true   # drop this once kubelet serving certs are signed by the cluster CA
    metric_groups: [node, pod, container]

Persistent queues will quietly fill a disk if you're not watching them. It's tempting to blame tail sampling, but the tail_sampling processor keeps its pending traces in memory, bounded by num_traces, and doesn't write them to disk. What grows on disk is every exporter sending_queue configured with storage: file_storage. During a backend outage, that queue keeps accepting batches until it holds queue_size requests, and the file_storage extension has no byte limit of its own. So bound the queue with queue_size, and give the directory its own appropriately sized PVC rather than the node's root disk. Alert on that volume's usage (kubelet_volume_stats_used_bytes, for example), not just on collector-level metrics. When the disk fills, the collector doesn't degrade gracefully. It fails writes, and you get silent data loss without obvious errors at the application level.

extensions:
  file_storage:
    directory: /var/otelcol/queue   # its own PVC, sized for every full queue

exporters:
  otlp/tempo:
    sending_queue:
      enabled: true
      queue_size: 5000     # the real bound on disk use: maximum queued requests
      storage: file_storage

processors:
  tail_sampling:
    decision_wait: 15s   # shorter window = fewer traces held in memory during spikes
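Because the byte bound on a persistent queue is indirect, it's worth doing the arithmetic once when sizing the volume. Worst-case disk use is roughly queue_size times the average serialized request size, summed over every exporter sharing the directory. Below is a Python sketch of that calculation. The 512 KiB average request size and the 80% headroom are illustrative assumptions, so measure your own batch sizes. The headroom matters because compaction writes a fresh copy of the database.

```python
KIB = 1024
GIB = 1024 ** 3


def queue_disk_bytes(queue_size: int, avg_request_bytes: int) -> int:
    """Worst-case bytes for one full persistent sending queue."""
    return queue_size * avg_request_bytes


def fits_volume(queues, volume_bytes: int, headroom: float = 0.8) -> bool:
    """True if all full queues together stay under `headroom` of the volume.

    `queues` is a list of (queue_size, avg_request_bytes) pairs, one per exporter.
    """
    total = sum(queue_disk_bytes(size, avg) for size, avg in queues)
    return total <= volume_bytes * headroom


if __name__ == "__main__":
    tempo = (5000, 512 * KIB)
    print(queue_disk_bytes(*tempo) / GIB)   # about 2.44 GiB when full
    print(fits_volume([tempo], 4 * GIB))    # True
    print(fits_volume([tempo], 2 * GIB))    # False
```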

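gRPC keepalive settings between agent and gateway are ignored until they suddenly matter. Long-lived connections that sit idle can be dropped silently somewhere in the path. The culprit might be a cloud load balancer or NAT gateway (Azure Load Balancer's default idle timeout is 4 minutes, for example), a firewall, or an expired connection-tracking entry, and the timeouts vary by provider and kernel settings. Without a keepalive block on the agent's OTLP exporter, the connection goes idle during a quiet period and gets dropped along the way. The next data push then fails with a reconnect storm that produces errors resembling a gateway crash. You'll see connection refused or stream reset errors in the agent logs, and restarting gateway pods won't help. The root cause won't be obvious until you correlate the timing with quiet periods. Set keepalive explicitly on every agent exporter pointing at the gateway, and relax the gateway receiver's enforcement policy to match. By default, a gRPC server only tolerates pings every 5 minutes and answers more frequent ones with a too_many_pings GOAWAY, which causes exactly the reconnect churn you were trying to avoid:

exporters:
  otlp/gateway:
    endpoint: "otelcol-gateway.observability.svc.cluster.local:4317"
    tls:
      insecure: true   # cluster-internal hop; no ingress is involved here
    keepalive:
      time: 10s          # send keepalive ping after 10s of inactivity
      timeout: 5s        # wait 5s for the ping ack before declaring the connection dead
      permit_without_stream: true   # keep the connection alive even with no active RPCs
# Gateway side: accept pings that often, or the server sends GOAWAY (too_many_pings)
receivers:
  otlp:
    protocols:
      grpc:
        keepalive:
          enforcement_policy:
            min_time: 5s
            permit_without_stream: true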
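The agent's keepalive settings have to agree with the gateway receiver's enforcement policy and with whatever idle timeout sits in the path. The small Python checker below makes that explicit. It assumes grpc-go's server defaults, which apply when the receiver sets no enforcement_policy: a 5-minute minimum ping interval, and no pings on connections without active streams. The 240-second idle timeout in the example is an illustrative value, not a Kubernetes constant.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClientKeepalive:
    time_s: float                 # exporter keepalive.time
    timeout_s: float              # exporter keepalive.timeout
    permit_without_stream: bool   # exporter keepalive.permit_without_stream


@dataclass
class ServerEnforcement:
    min_time_s: float = 300.0             # grpc-go default: 5 minutes
    permit_without_stream: bool = False   # grpc-go default


def keepalive_problems(client: ClientKeepalive, server: ServerEnforcement,
                       path_idle_timeout_s: float) -> List[str]:
    problems = []
    if client.time_s < server.min_time_s:
        problems.append("client pings faster than server min_time: expect GOAWAY too_many_pings")
    if client.permit_without_stream and not server.permit_without_stream:
        problems.append("server treats pings on connections with no active streams as violations")
    if client.time_s >= path_idle_timeout_s:
        problems.append("path idle timeout can expire before the first keepalive ping")
    return problems


if __name__ == "__main__":
    agent = ClientKeepalive(time_s=10, timeout_s=5, permit_without_stream=True)
    print(keepalive_problems(agent, ServerEnforcement(), path_idle_timeout_s=240))
    print(keepalive_problems(agent, ServerEnforcement(5, True), path_idle_timeout_s=240))  # []
```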

Validating the Pipeline and What to Monitor About the Collector Itself

The most common mistake when deploying the agent + gateway pattern is trusting the pipeline because pods are running and no errors appear in logs. That's not validation; that's hope. The OpenTelemetry Collector exposes its own Prometheus metrics on port 8888 (/metrics) by default. Before you declare the pipeline production-ready, put its receiver and exporter counters on a dashboard. For traces, those are otelcol_receiver_accepted_spans, otelcol_receiver_refused_spans, otelcol_exporter_sent_spans, and otelcol_exporter_send_failed_spans. Metrics and logs have matching *_metric_points and *_log_records counters, and newer collector versions may add a _total suffix to all of them. Compare like with like. If accepted spans are climbing on an agent but sent spans are flat, your pipeline has a disconnect, likely a processor misconfiguration or a backend that's silently rejecting data. On a tail-sampling gateway, by contrast, sending fewer spans than it accepts is the point. Together, these counters give you the full picture: data arriving, data leaving, and data failing at both ends.

# Collector images are minimal (no shell, no wget), so port-forward instead of exec
kubectl port-forward -n observability ds/otel-agent 8888:8888 &
sleep 2
curl -s http://localhost:8888/metrics | grep otelcol_receiver_accepted_spans

# Expected output looks like:
# otelcol_receiver_accepted_spans{receiver="otlp",transport="grpc"} 4821
# If this is zero after your app has been running, check receiver config first
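Eyeballing grep output works for one pod, but a scripted check is easier if you parse the exposition format. Below is a minimal Python sketch. It assumes the simple name{labels} value lines the collector emits (no closing brace inside label values), and it strips an optional _total suffix so it works whether or not your collector version adds one. It sums across label sets, so run it against one agent at a time. The sample text is made up.

```python
import re
from collections import defaultdict
from typing import Dict

SAMPLE_LINE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{[^}]*\})?\s+(?P<value>\S+)"
)


def counter_totals(exposition: str) -> Dict[str, float]:
    """Sum every sample per metric name, ignoring labels and # HELP/# TYPE lines."""
    totals: Dict[str, float] = defaultdict(float)
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_LINE.match(line)
        if match:
            name = match.group("name")
            if name.endswith("_total"):
                name = name[: -len("_total")]
            totals[name] += float(match.group("value"))
    return dict(totals)


def unexported_spans(totals: Dict[str, float]) -> float:
    """Spans accepted by receivers that exporters have neither sent nor failed yet."""
    return (totals.get("otelcol_receiver_accepted_spans", 0.0)
            - totals.get("otelcol_exporter_sent_spans", 0.0)
            - totals.get("otelcol_exporter_send_failed_spans", 0.0))


EXAMPLE = """
# TYPE otelcol_receiver_accepted_spans counter
otelcol_receiver_accepted_spans{receiver="otlp",transport="grpc"} 4821
otelcol_exporter_sent_spans_total{exporter="otlp/gateway"} 4800
otelcol_exporter_send_failed_spans{exporter="otlp/gateway"} 0
"""

if __name__ == "__main__":
    print(unexported_spans(counter_totals(EXAMPLE)))  # 21.0
```

A small, stable gap is normal, since some spans are always in flight or queued. A gap that keeps growing on an agent is the disconnect to chase. Don't apply this check to a tail-sampling gateway, where a gap is intentional.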

Before you push any agent config to production nodes, the collector's validate subcommand is the correct first gate. It parses the full pipeline graph, checks that every referenced component is defined and built into the binary, and catches config errors that would otherwise only surface when the pod starts. Run it with the same distribution you deploy. This config uses contrib components, so that means otelcol-contrib, not the core otelcol build. Pair that with a synthetic trace from otel-cli, a standalone binary that sends a real OTLP span, and you can confirm end-to-end flow from a single terminal session without touching your application at all.

# Config validation: runs in CI, costs nothing. Use the distribution you deploy;
# the core otelcol binary doesn't include kubeletstats, filelog, or k8sattributes.
otelcol-contrib validate --config=agent-config.yaml

# Synthetic trace to a locally-running collector (replace endpoint as needed; for an
# in-cluster agent: kubectl port-forward -n observability ds/otel-agent 4317:4317)
otel-cli exec \
  --endpoint http://localhost:4317 \
  --name "smoke-test" \
  --service "validation-check" \
  -- echo "pipeline alive"

# Then check the counter incremented (assumes port 8888 is port-forwarded too)
curl -s http://localhost:8888/metrics | grep otelcol_receiver_accepted_spans

On alerting thresholds: the metric that most operators misconfigure alerts on is otelcol_exporter_queue_size divided by otelcol_exporter_queue_capacity — queue utilization. When that ratio climbs above 70%, the instinct is to blame noisy instrumentation or a cardinality explosion on the app side. Usually it's the opposite: your backend (Tempo, Jaeger, a remote OTLP endpoint) is slow to acknowledge, so the exporter's send goroutines are backing up. The queue fills because of push-side latency, not because your app suddenly tripled its span rate. Set a separate alert on otelcol_receiver_accepted_spans rate for the noisy-app scenario — those are genuinely independent failure modes and conflating them wastes investigation time.

# Prometheus alerting rule that distinguishes queue pressure from volume spikes
groups:
  - name: otelcol-health
    rules:
      - alert: CollectorExporterQueueHigh
        expr: |
          otelcol_exporter_queue_size / otelcol_exporter_queue_capacity > 0.70
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Exporter queue above 70% — check backend latency, not app cardinality"

      - alert: CollectorExporterDropping
        expr: |
          rate(otelcol_exporter_send_failed_metric_points[5m]) > 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Collector is actively dropping data — backend unreachable or rejecting"
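The reasoning behind splitting those two alerts fits in a few lines of code. In the Python triage sketch below, the 0.70 threshold comes from the rule above. Defining a volume spike as 2x over baseline is an arbitrary illustrative choice, so compare the current accepted-spans rate against whatever baseline you trust.

```python
def triage(queue_size: float, queue_capacity: float,
           accepted_rate: float, baseline_rate: float,
           utilization_threshold: float = 0.70, spike_factor: float = 2.0) -> str:
    """Separate 'backend is slow' from 'apps are sending more' using two signals."""
    utilization = queue_size / queue_capacity
    volume_spike = accepted_rate >= spike_factor * baseline_rate
    if utilization <= utilization_threshold:
        return "volume spike, queue keeping up" if volume_spike else "healthy"
    if volume_spike:
        return "queue filling because input volume jumped"
    return "queue filling at normal input volume: check backend latency"


if __name__ == "__main__":
    print(triage(4000, 5000, accepted_rate=120, baseline_rate=100))
    print(triage(4000, 5000, accepted_rate=300, baseline_rate=100))
```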

The operator mindset that makes this pipeline trustworthy — local validation before rollout, explicit metric confirmation, distinguishing which side of the pipeline is misbehaving — maps directly to how reliable local-first tooling gets built in general. If you're wiring AI-assisted tooling into your observability stack or CI pipelines, the same discipline applies: validate locally, instrument the tool itself, and don't trust that something is working just because it didn't crash. The AI Coding Tools in 2026: Cloud Copilots vs Local Models guide covers how that operator mindset translates to choosing and running dev tools — worth reading alongside this if you're building automation that touches your infra.

When to Use This Pattern and When to Skip It

The most common mistake with OpenTelemetry in Kubernetes is adding the gateway tier reflexively — because the diagram looks like the "right" architecture. The gateway is a real operational burden: it's another Deployment to resource-tune, another config to version, another thing to be down when you're debugging at midnight. Add it only when something specific forces your hand.

Add the gateway tier when you cross roughly eight nodes; treat that as a rule of thumb rather than a hard limit. Below that, the per-node agent DaemonSet pods can push directly to your backend without meaningful fan-out problems. Past that threshold, a few concrete pressures tend to show up together. Tail sampling requires seeing all spans for a given trace in one place, which a DaemonSet pod physically cannot do, since different services may run on different nodes. This one applies at any cluster size the moment you want tail sampling. Your backend (Grafana Cloud, Honeycomb, a managed Tempo instance) may enforce rate limits or require a per-tenant API key. You don't want that credential baked into every DaemonSet pod config and replicated across 20 nodes. And when you eventually swap from Jaeger to Tempo, or add a second backend for compliance, changing one gateway config file beats rolling a DaemonSet update across the entire fleet.

Skip the gateway entirely when your cluster is three to five nodes and your observability backend lives inside the cluster — say, Victoria Metrics and Grafana running as in-cluster Deployments. Head-based sampling is good enough for a small cluster where you're not drowning in span volume, and an internal backend means there's no auth boundary or rate-limit problem to solve centrally. In that topology, a gateway Deployment just sits between the agent and the backend doing nothing useful while consuming memory you'd rather give to workloads. The agent pods can push OTLP directly to an in-cluster Tempo or Loki endpoint, and the whole config stays in one DaemonSet manifest.

The hybrid case is one the official docs gloss over, but it shows up constantly in real clusters. Collect node-level metrics on the agent (CPU, memory, and disk via the hostmetrics receiver, plus kubeletstats) and write them straight to your metrics backend, but route traces exclusively through the gateway. The reasoning is resource efficiency. Node metrics are high-cardinality and high-volume. Pushing them through the gateway means it does expensive buffering and compression on a stream that gains nothing from centralization, because there's no "tail sampling for metrics" concept. Traces, on the other hand, need the gateway's tail_sampling processor to see spans from every node before sampling decisions get made. A practical split looks like this in the agent config:

exporters:
  # Metrics go direct to Victoria Metrics — no gateway hop
  prometheusremotewrite:
    endpoint: "http://victoria-metrics.monitoring.svc:8428/api/v1/write"

  # Traces go to the gateway for tail sampling
  otlp/gateway:
    endpoint: "otelcol-gateway.observability.svc:4317"
    tls:
      insecure: true

service:
  pipelines:
    metrics:
      receivers: [hostmetrics, kubeletstats]
      processors: [memory_limiter, batch]
      exporters: [prometheusremotewrite]
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp/gateway]

This keeps the gateway focused on trace sampling, where it earns its keep, and lets the metrics pipeline take the short path. The tradeoff is that you're maintaining two separate pipeline configs and two separate backends. Don't reach for this split unless the gateway is actually becoming a bottleneck on metric throughput. That typically shows up as the gateway pod hitting its memory limit on clusters with dense node-metrics scrape intervals under 15 seconds.
