Kubernetes orchestrates containers at scale, but this introduces monitoring challenges that don't exist with traditional deployments. Pods are ephemeral—they start, run, and terminate constantly. When a pod crashes and restarts, its logs disappear unless you capture them elsewhere. IP addresses change with each pod restart, making traditional host-based monitoring ineffective.
Microservices on Kubernetes compound these challenges. A single user request might traverse five services across fifteen pods distributed across multiple nodes. When something breaks, you need to trace that request through constantly changing infrastructure while correlating metrics from pods that might not exist anymore.
Kubernetes Architecture
Kubernetes clusters consist of control plane components and worker nodes. The control plane manages cluster state through the API server, scheduler, and controller manager. Worker nodes run your applications in pods, with each node running kubelet to communicate with the control plane.
Pods are the smallest deployable units in Kubernetes. Each pod contains one or more containers sharing network and storage. Pods are ephemeral—Kubernetes creates and destroys them based on load, health checks, and deployment updates. This means pod names and IPs change constantly.
Services provide stable network endpoints for groups of pods. A service abstracts pod IPs behind a single DNS name and load balances traffic across healthy pods. This decouples applications from pod lifecycle—when pods restart, the service continues routing traffic to new instances.
Pod-Level Observability
Monitoring pods requires tracking both infrastructure metrics and application performance. Infrastructure metrics show resource usage and pod health. Application metrics reveal what the code actually does.
Kubernetes exposes pod metrics through the Metrics API. These include CPU usage, memory consumption, network traffic, and disk I/O. The metrics server collects this data from kubelet on each node and makes it available for queries and horizontal pod autoscaling.
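With the metrics server installed, you can query this data directly from the command line:

kubectl top pods -n production
kubectl top nodes

The first command shows current CPU and memory usage per pod in the production namespace; the second shows usage per node. Interpreting those numbers requires the resource requests and limits declared in the pod spec: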
apiVersion: v1
kind: Pod
metadata:
  name: order-service
  labels:
    app: order-service
    version: v1.2.0
spec:
  containers:
    - name: order-service
      image: order-service:v1.2.0
      resources:
        requests:
          memory: "256Mi"
          cpu: "250m"
        limits:
          memory: "512Mi"
          cpu: "500m"
Resource requests tell Kubernetes the minimum resources a pod needs. Limits cap maximum usage. When a pod exceeds its memory limit, Kubernetes kills it with an OOMKilled status. Monitoring these events reveals whether your resource limits match actual usage.
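To confirm that a restart was caused by the memory limit, check the container's last termination state (the jsonpath below assumes a single-container pod):

kubectl get pod order-service -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'

If the container was killed for exceeding its memory limit, this prints OOMKilled; for a pod that has never restarted, the output is empty.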
OpenTelemetry on Kubernetes
OpenTelemetry provides automatic instrumentation for applications running in Kubernetes. The OpenTelemetry Operator injects instrumentation into pods without code changes.
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: default-instrumentation
spec:
  exporter:
    endpoint: http://otel-collector:4317
  propagators:
    - tracecontext
    - baggage
  sampler:
    type: parentbased_traceidratio
    argument: "0.1"
  java:
    image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-java:latest
  nodejs:
    image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-nodejs:latest
  python:
    image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-python:latest
Annotate your deployments to enable automatic instrumentation:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-service
spec:
  selector:
    matchLabels:
      app: payment-service
  template:
    metadata:
      labels:
        app: payment-service
      annotations:
        instrumentation.opentelemetry.io/inject-java: "true"
    spec:
      containers:
        - name: payment-service
          image: payment-service:v2.0.0
The operator injects an init container that adds OpenTelemetry libraries to your application. When the pod starts, instrumentation activates automatically, capturing HTTP requests, database queries, and external API calls.
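You can verify that the injection happened by listing the init containers the operator added to a running pod (the pod name below is a placeholder):

kubectl get pod payment-service-7d8f9c-hx2k9 -o jsonpath='{.spec.initContainers[*].name}'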
Service Mesh Observability
Service meshes like Istio and Linkerd add a sidecar proxy to each pod. These proxies handle all network traffic, providing observability without instrumenting application code.
The sidecar captures request metrics (rate, error rate, latency), generates distributed traces for every request, and collects access logs with full request context. This gives you network-level observability across all services.
apiVersion: v1
kind: Service
metadata:
  name: order-service
  labels:
    app: order-service
spec:
  ports:
    - port: 8080
      name: http
  selector:
    app: order-service
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-service
spec:
  selector:
    matchLabels:
      app: order-service
  template:
    metadata:
      labels:
        app: order-service
        version: v1
    spec:
      containers:
        - name: order-service
          image: order-service:v1.2.0
          ports:
            - containerPort: 8080
Istio automatically injects sidecars when you enable injection on a namespace:
kubectl label namespace production istio-injection=enabled
Sidecars export metrics in Prometheus format. Query these metrics to understand traffic patterns between services, identify slow dependencies, and detect error rate spikes.
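For example, a query along these lines (assuming the standard Istio telemetry labels) breaks down the request rate to a service by response code:

sum(rate(istio_requests_total{reporter="destination", destination_service_name="order-service"}[5m])) by (response_code)

A growing share of 5xx response codes for one destination service is often the first visible symptom of a failing dependency.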
Debugging Ephemeral Pods
When a pod crashes, its logs often disappear before you can examine them. Kubernetes provides mechanisms to access logs from terminated containers.
The kubectl logs command retrieves logs from the previous container instance:
kubectl logs payment-service-7d8f9c-hx2k9 --previous
This works until the pod restarts again. For persistent log storage, deploy a log aggregator that ships logs from all pods to centralized storage.
OpenTelemetry Collector handles this. Deploy it as a DaemonSet to run one instance per node:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: otel-collector
spec:
  selector:
    matchLabels:
      app: otel-collector
  template:
    metadata:
      labels:
        app: otel-collector
    spec:
      containers:
        - name: otel-collector
          image: otel/opentelemetry-collector-contrib:latest
          volumeMounts:
            - name: varlog
              mountPath: /var/log
              readOnly: true
      volumes:
        - name: varlog
          hostPath:
            path: /var/log
The collector reads logs from all pods on the node and exports them to your observability backend. When pods crash, logs remain accessible.
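A minimal log pipeline for that DaemonSet might look like the following sketch, assuming the contrib distribution of the collector (which bundles the filelog receiver) and an OTLP-capable backend; the exporter endpoint is a placeholder:

receivers:
  filelog:
    include:
      - /var/log/pods/*/*/*.log
processors:
  batch: {}
exporters:
  otlp:
    endpoint: otel-gateway:4317   # placeholder backend address
    tls:
      insecure: true
service:
  pipelines:
    logs:
      receivers: [filelog]
      processors: [batch]
      exporters: [otlp]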
Container Resource Monitoring
Kubernetes monitors container resource usage through cAdvisor, which runs as part of kubelet. cAdvisor collects CPU, memory, network, and disk metrics for each container.
Monitor memory usage patterns to detect leaks. A container whose memory usage climbs steadily will eventually hit its limit and get killed. Track the container_memory_working_set_bytes metric to see actual memory consumption.
CPU throttling occurs when a container exceeds its CPU limit. Kubernetes throttles the container, making it run slower. The container_cpu_cfs_throttled_seconds_total metric shows cumulative throttled time. Rising throttling indicates your CPU limits are too low.
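The throttling ratio is easier to interpret than the raw counter. Assuming cAdvisor metrics are scraped by Prometheus, a query like this shows the fraction of CPU scheduling periods in which a container was throttled:

rate(container_cpu_cfs_throttled_periods_total{container="order-service"}[5m])
  / rate(container_cpu_cfs_periods_total{container="order-service"}[5m])

Sustained values above a few percent on a latency-sensitive service are usually a sign that the CPU limit should be raised.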
Health Checks
Kubernetes uses health checks to determine when to restart containers and when to route traffic to pods. Liveness probes check if a container is alive. If a liveness probe fails repeatedly, Kubernetes restarts the container. Readiness probes check if a container can accept traffic. Kubernetes removes pods with failing readiness probes from service endpoints.
apiVersion: v1
kind: Pod
metadata:
  name: payment-service
spec:
  containers:
    - name: payment-service
      image: payment-service:v2.0.0
      livenessProbe:
        httpGet:
          path: /health/live
          port: 8080
        initialDelaySeconds: 30
        periodSeconds: 10
      readinessProbe:
        httpGet:
          path: /health/ready
          port: 8080
        initialDelaySeconds: 10
        periodSeconds: 5
The liveness probe checks /health/live every 10 seconds. The readiness probe checks /health/ready every 5 seconds. These endpoints should return HTTP 200 when healthy and 503 when unhealthy.
Implement these endpoints to check actual application health, not just that the process is running. Verify database connections work, required services are reachable, and caches are populated.
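A minimal sketch of such a readiness endpoint, assuming a Spring Boot service with a configured DataSource (the class name and response messages are illustrative):

import java.sql.Connection;
import java.sql.SQLException;
import javax.sql.DataSource;

import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class ReadinessController {

    private final DataSource dataSource;

    public ReadinessController(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    // Readiness probe target: return 200 only when the database
    // connection actually works, 503 otherwise.
    @GetMapping("/health/ready")
    public ResponseEntity<String> ready() {
        try (Connection connection = dataSource.getConnection()) {
            return connection.isValid(2)          // 2-second validation timeout
                    ? ResponseEntity.ok("READY")
                    : ResponseEntity.status(503).body("DATABASE CHECK FAILED");
        } catch (SQLException e) {
            return ResponseEntity.status(503).body("DATABASE UNAVAILABLE");
        }
    }
}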
Network Policy Observability
Network policies control traffic between pods. When policies block traffic, applications fail with connection timeouts or refused connections. Without visibility into network policy enforcement, these errors look like application bugs.
Service meshes provide network policy observability. Istio generates metrics showing which connections network policies allowed or denied. The istio_tcp_connections_opened_total and istio_tcp_connections_closed_total metrics track connection counts with labels indicating source and destination services.
Query these metrics to understand traffic patterns and identify blocked connections:
rate(istio_tcp_connections_closed_total{response_flags="DC"}[5m])
The DC response flag stands for downstream connection termination. A spike in connections closed with this flag is one signal that traffic is being cut off, for example by a network policy denial, rather than closed cleanly by the application.
Node-Level Monitoring
Nodes provide the infrastructure where pods run. Node failures take down all pods on that node. Monitor node health to detect issues before they cause widespread failures.
Key node metrics include CPU usage, memory usage, disk space, and network bandwidth. The node_memory_MemAvailable_bytes metric shows available memory. When this drops too low, Kubernetes starts evicting pods.
The node_disk_io_time_seconds_total metric tracks disk I/O time. High I/O times indicate disk saturation, which slows all containers on the node.
Monitor node conditions through the Kubernetes API. The Ready condition indicates whether the node can accept new pods. The DiskPressure condition signals low disk space. The MemoryPressure condition signals low available memory.
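You can read the conditions directly from the API (the node name is a placeholder):

kubectl describe node worker-node-1 | grep -A 8 Conditions

Each condition is reported with a status, a reason, and the time it last changed, which helps distinguish a transient pressure spike from a node that has been degraded for hours.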
Distributed Tracing
Distributed tracing shows request paths through microservices. In Kubernetes, traces must handle the dynamic nature of pods—services scale up and down, pods restart frequently, and IPs change constantly.
OpenTelemetry propagates trace context through service calls. When Service A calls Service B, OpenTelemetry injects trace context into HTTP headers or message metadata. Service B extracts this context and continues the trace.
Kubernetes labels and annotations help correlate traces with infrastructure. Add pod name, namespace, and node name to trace attributes:
Span span = tracer.spanBuilder("process-payment")
    .setAttribute("k8s.pod.name", System.getenv("HOSTNAME"))
    .setAttribute("k8s.namespace", System.getenv("K8S_NAMESPACE"))
    .setAttribute("k8s.node.name", System.getenv("K8S_NODE_NAME"))
    .startSpan();
This links traces to specific pod instances. When investigating slow requests, you can identify which pod processed them and check that pod's resource usage at that time.
Horizontal Pod Autoscaling
Horizontal Pod Autoscaler (HPA) scales deployments based on metrics. When CPU usage exceeds a threshold, HPA increases replica count. When usage drops, HPA decreases replicas.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: payment-service-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: payment-service
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
Monitor HPA decisions to understand scaling behavior. The kube_horizontalpodautoscaler_status_current_replicas metric shows the current replica count. Compare this against kube_horizontalpodautoscaler_status_desired_replicas to see if HPA can achieve its target.
If the desired replica count exceeds the current count for extended periods, you've hit cluster capacity limits or pod scheduling constraints.
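If kube-state-metrics is installed, the gap is a single query away:

kube_horizontalpodautoscaler_status_desired_replicas{horizontalpodautoscaler="payment-service-hpa"}
  - kube_horizontalpodautoscaler_status_current_replicas{horizontalpodautoscaler="payment-service-hpa"}

A result that stays above zero for more than a few minutes is a reasonable alerting condition for exhausted capacity.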
Monitoring Best Practices
Use labels consistently across all resources. Add labels for service name, version, environment, and team. This enables filtering and grouping in dashboards and queries.
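For example, a consistent label set might look like this (the environment and team keys are conventions you define, not Kubernetes built-ins):

metadata:
  labels:
    app.kubernetes.io/name: order-service
    app.kubernetes.io/version: v1.2.0
    environment: production
    team: checkout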
Set up alerts for critical pod states: CrashLoopBackOff indicates repeated startup failures, ImagePullBackOff signals registry access problems, and OOMKilled shows memory limits are too low. These states require immediate investigation.
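Assuming kube-state-metrics is running, a single expression covers the waiting-state alerts:

sum by (namespace, pod) (
  kube_pod_container_status_waiting_reason{reason=~"CrashLoopBackOff|ImagePullBackOff"}
) > 0

OOMKilled shows up as a termination reason rather than a waiting reason, so alert on it separately via kube_pod_container_status_last_terminated_reason{reason="OOMKilled"}.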
Monitor the gap between resource requests and actual usage. Requesting more resources than needed wastes cluster capacity. Requesting too little causes throttling and OOM kills. Track the container_memory_working_set_bytes / container_spec_memory_limit_bytes ratio. Values consistently near 1.0 indicate tight limits.
Implement distributed tracing for all service-to-service communication. This reveals request paths, identifies slow dependencies, and helps diagnose cascading failures.
Uptrace for Kubernetes
Uptrace integrates with Kubernetes through OpenTelemetry and can itself be deployed on Kubernetes. Deploy the OpenTelemetry Operator and Collector to your cluster. Configure your applications to export telemetry to the collector. The collector forwards data to Uptrace.
Uptrace correlates metrics, logs, and traces across your entire cluster. When a pod crashes, view its final logs alongside traces of the requests it was processing. When latency spikes, see which pods were handling requests and their resource usage at that time.
For Spring Boot microservices on Kubernetes, combine the patterns from Spring Boot monitoring with Kubernetes-native instrumentation. For event-driven systems, check Kafka microservices monitoring.
Getting Started
Start with the basics: deploy the Metrics Server to enable resource metrics. This gives you CPU and memory usage for pods and nodes.
Add OpenTelemetry Operator to enable automatic instrumentation. Annotate your deployments to inject instrumentation without code changes.
Deploy OpenTelemetry Collector as a DaemonSet to collect logs and metrics from all nodes. Configure it to export to your observability backend.
Implement proper health checks on all services. Use liveness probes to detect crashed containers and readiness probes to manage traffic routing.
Set up alerts for pod states that indicate problems—CrashLoopBackOff, ImagePullBackOff, OOMKilled. These require immediate action.
Ready to monitor Kubernetes microservices? Start with Uptrace for unified observability across your cluster.