Yuriy Bezsonov

Modern Java Observability in 2026 - Spring Boot 4 on Amazon EKS

If you're running Spring Boot applications on Kubernetes, you've probably hit the same wall I did: containers restart, logs disappear, and when something goes wrong or slows down, you're left guessing which microservice caused the issue.

Two recent developments prompted me to revisit my observability setup. First, the OpenTelemetry Java Agent v2.0 (January 2024) shifted from "instrument everything automatically" to "explicit over implicit" - requiring adjustments to maintain visibility into business logic. Second, Spring Boot 4.0 (November 2025) introduced the new spring-boot-starter-opentelemetry, making it easier than ever to export metrics, traces, and logs via OpenTelemetry Protocol (OTLP).

In this post, I'll walk you through setting up observability for Spring Boot applications on Amazon EKS - starting with the basics (logs and metrics), diving into distributed tracing, and finishing with Application Signals. Hopefully this saves you some time.

Overview of the Solution

The Three Pillars

Quick refresher on the three pillars of observability:

  • Logs: Discrete events with timestamps and context
  • Metrics: Numerical measurements aggregated over time
  • Traces: Request flows across service boundaries

Each answers different questions. Logs tell you what happened. Metrics tell you how much and how often. Traces tell you where time was spent across services.

AWS Services for Observability

Here's the AWS stack I'm using:

Service                         Purpose
CloudWatch Logs                 Centralized log aggregation and search
CloudWatch Container Insights   Infrastructure metrics for EKS
AWS X-Ray                       Distributed tracing and service maps
Application Signals             Application Performance Monitoring (APM) dashboards and Service Level Objectives (SLOs)

AWS CloudWatch and X-Ray observability stack

Architecture Overview

┌─────────────────────────────────────────────────────────────────────┐
│                           EKS Cluster                               │
│  ┌─────────────────┐     ┌──────────────────────────────────────┐   │
│  │  Java App Pod   │     │  CloudWatch Observability Add-on     │   │
│  │  + ADOT Agent   │────▶│  - Fluent Bit (logs)                 │   │
│  │                 │     │  - CloudWatch Agent (metrics/traces) │   │
│  └─────────────────┘     └──────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────────┘
                                      │
                    ┌─────────────────┼─────────────────┐
                    ▼                 ▼                 ▼
             CloudWatch Logs      X-Ray         Application Signals

The AWS Distro for OpenTelemetry (ADOT) agent runs inside your application container, collecting telemetry and sending it to the CloudWatch Observability add-on. From there, data flows to the appropriate AWS services.

Why I Chose ADOT Over Community OpenTelemetry

Feature                  Community Agent          ADOT Agent
X-Ray trace ID format    Requires extra config    Supported out of the box
AWS resource detection   Basic                    Enhanced
Application Signals      Not supported            Supported

I tried the community agent first, but the X-Ray integration was complicated. ADOT just works - it's AWS-supported and adds the features I actually need.

AWS Distro for OpenTelemetry ADOT architecture diagram

The Foundation: Logs and Metrics

Setting up logs and metrics on EKS is the easy part. The CloudWatch Observability add-on handles everything:

  • Fluent Bit as a DaemonSet for log collection
  • CloudWatch agent for metrics and trace forwarding

Quick Setup

One command:

aws eks create-addon \
  --cluster-name your-cluster \
  --addon-name amazon-cloudwatch-observability \
  --no-cli-pager

That's it. Fluent Bit automatically collects container logs from /var/log/containers/ and sends them to CloudWatch Logs. The CloudWatch agent collects infrastructure metrics and exposes them in Container Insights.

What You Get

Logs appear in CloudWatch under /aws/containerinsights/{cluster-name}/application. Container Insights gives you dashboards showing CPU, memory, network, and disk metrics per pod, node, and namespace.

For Java applications, this foundation is necessary but not sufficient. You can see that your pods are running and consuming resources, but you can't see inside the Java Virtual Machine (JVM) or trace requests across services. That's where things get interesting.

Deep Dive: Distributed Tracing

This is where I spent most of my time. Distributed tracing tracks requests as they flow through microservices - essential for debugging latency issues and understanding dependencies.

The OpenTelemetry v2 Changes

In January 2024, OpenTelemetry Java Agent v2.0 introduced significant changes:

Change              Before (v1)   After (v2)
Controller spans    Enabled       Disabled by default
View spans          Enabled       Disabled by default
Micrometer bridge   Enabled       Disabled by default
OTLP protocol       gRPC          HTTP/protobuf

The philosophy shifted from "capture everything" to "opt-in for extras." This reduces noise and overhead, but means you need to explicitly enable features you relied on before. If you upgraded and noticed spans disappeared, this is why.

What Spring Boot 4 Brings

Spring Boot 4.0 introduced spring-boot-starter-opentelemetry with several benefits:

  • Native OTLP export for metrics, traces, and logs without additional dependencies
  • Seamless bridge between Micrometer observations and OpenTelemetry spans
  • Auto-configuration for OpenTelemetry SDK components
  • Support for W3C trace context propagation out of the box

This means @Observed annotations now create both Micrometer metrics and OpenTelemetry spans automatically - no manual bridging required.

The Hybrid Solution

Here's what I landed on - combining ADOT auto-instrumentation with manual @Observed annotations:

What ADOT auto-instruments (no code changes needed):

  • HTTP clients and servers (Spring MVC, JAX-RS)
  • Database clients (JDBC, Hibernate)
  • Messaging systems (Kafka, SQS)
  • AWS SDK calls (EventBridge, S3, DynamoDB)

What I had to add manually:

  • Internal @Service methods (business logic)
  • Custom spans for specific operations (see the sketch just below)
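
For the custom spans, here's a minimal sketch using the OpenTelemetry API directly (the class name, tracer name, and enrich operation below are made up for illustration). Because the ADOT agent is attached, GlobalOpenTelemetry is backed by the agent's SDK, so these spans flow through the same pipeline as the auto-instrumented ones:

// A minimal sketch, not from the original post: manually creating a span for an
// operation that auto-instrumentation doesn't cover.
import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.StatusCode;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Scope;

public class UnicornEnrichmentService {

    // "unicorn-store" is an assumed instrumentation scope name
    private final Tracer tracer = GlobalOpenTelemetry.getTracer("unicorn-store");

    public void enrich(String unicornId) {
        Span span = tracer.spanBuilder("unicorn.enrich").startSpan();
        try (Scope scope = span.makeCurrent()) {
            // business logic that auto-instrumentation doesn't see goes here
            span.setAttribute("unicorn.id", unicornId);
        } catch (RuntimeException e) {
            span.recordException(e);
            span.setStatus(StatusCode.ERROR);
            throw e;
        } finally {
            span.end();
        }
    }
}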

Understanding Spring's Observability Stack

To make @Observed annotations work, you need to understand three components that work together:

Micrometer Observation API - A unified API that creates both metrics and traces from a single instrumentation point. When you annotate a method with @Observed, the Observation API records:

  • A timer metric (how long the method took)
  • A trace span (for distributed tracing)

This means one annotation gives you both a Prometheus metric like unicorn.create.duration and a trace span visible in X-Ray - no need to instrument twice.
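
For example, the timer behind an observation named unicorn.create can be read back from the MeterRegistry. A minimal sketch (the class name is illustrative, and it assumes the default meter observation handler is active):

import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;

import java.util.concurrent.TimeUnit;

public class UnicornMetricsProbe {

    private final MeterRegistry meterRegistry;

    public UnicornMetricsProbe(MeterRegistry meterRegistry) {
        this.meterRegistry = meterRegistry;
    }

    // @Observed(name = "unicorn.create") registers a Timer under that observation name;
    // this reads back how often the method ran and its slowest recorded duration.
    public String summary() {
        Timer timer = meterRegistry.get("unicorn.create").timer();
        return timer.count() + " calls, max " + timer.max(TimeUnit.MILLISECONDS) + " ms";
    }
}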

Micrometer Tracing - The bridge that connects observations to tracing backends. It takes the spans created by the Observation API and exports them via OpenTelemetry (or other formats like Zipkin). Spring Boot 4's spring-boot-starter-opentelemetry auto-configures this bridge, so spans flow to your OTLP endpoint without manual wiring.

AspectJ weaving - The mechanism that intercepts @Observed method calls at runtime. When your code calls an @Observed method, AspectJ wraps the call to:

  1. Start a timer and create a span before the method executes
  2. Record the duration and close the span after the method completes
  3. Capture exceptions if the method fails

Without AspectJ, the @Observed annotation is just metadata sitting on your method - nothing actually captures timing or creates spans.

Enabling @Observed Annotations

Add these dependencies to your pom.xml:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-opentelemetry</artifactId>
</dependency>
<dependency>
    <groupId>org.aspectj</groupId>
    <artifactId>aspectjweaver</artifactId>
</dependency>

Then annotate your business logic methods:

@Observed(name = "unicorn.create")
@Transactional
public Unicorn createUnicorn(Unicorn unicorn) {
    // Business logic - now traced!
}

This gives you both a Prometheus metric (unicorn.create.duration) and an X-Ray span from a single annotation.
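
One caveat: earlier Spring Boot versions required registering the ObservedAspect bean yourself before @Observed had any effect. If annotated methods aren't producing metrics or spans in your setup, here's a minimal sketch of that registration (assuming it isn't already auto-configured for you):

import io.micrometer.observation.ObservationRegistry;
import io.micrometer.observation.aop.ObservedAspect;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ObservabilityConfig {

    // The aspect wraps @Observed method calls: it starts an Observation before the
    // method runs and stops it (recording duration and exceptions) afterwards.
    @Bean
    ObservedAspect observedAspect(ObservationRegistry observationRegistry) {
        return new ObservedAspect(observationRegistry);
    }
}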

Dockerfile Configuration

Here's the Dockerfile configuration I use:

FROM public.ecr.aws/docker/library/amazoncorretto:21-al2023

# Download ADOT agent
ADD https://github.com/aws-observability/aws-otel-java-instrumentation/releases/latest/download/aws-opentelemetry-agent.jar /opt/aws-opentelemetry-agent.jar

# Attach agent to JVM
ENV JAVA_TOOL_OPTIONS="-javaagent:/opt/aws-opentelemetry-agent.jar"

# OTLP configuration
ENV OTEL_EXPORTER_OTLP_PROTOCOL="http/protobuf"
ENV OTEL_METRICS_EXPORTER="none"
ENV OTEL_LOGS_EXPORTER="none"

# Traces configuration
ENV OTEL_TRACES_SAMPLER="always_on"
ENV OTEL_PROPAGATORS="tracecontext,baggage,xray"
ENV OTEL_EXPORTER_OTLP_TRACES_ENDPOINT="http://cloudwatch-agent.amazon-cloudwatch:4316/v1/traces"

# Application Signals
ENV OTEL_AWS_APPLICATION_SIGNALS_ENABLED="true"
ENV OTEL_AWS_APPLICATION_SIGNALS_EXPORTER_ENDPOINT="http://cloudwatch-agent.amazon-cloudwatch:4316/v1/metrics"

COPY app.jar /app.jar
ENTRYPOINT ["java", "-jar", "/app.jar"]

Why these specific settings:

  • OTEL_METRICS_EXPORTER=none: I use Prometheus for metrics, not OTLP
  • OTEL_LOGS_EXPORTER=none: CloudWatch Logs handles logs via Fluent Bit
  • OTEL_PROPAGATORS: Includes xray for X-Ray trace ID format
  • Endpoints point to the CloudWatch agent service in the amazon-cloudwatch namespace

What Gets Traced

With this setup, X-Ray shows the complete request flow:

Component                   Auto-traced   Manual (@Observed)
HTTP Server (Spring MVC)    Yes           -
JDBC (PostgreSQL)           Yes           -
AWS SDK (EventBridge, S3)   Yes           -
Business logic methods      -             Yes

Each trace shows end-to-end latency, time spent in each component, database queries, and external API calls. Finally, full visibility from HTTP request to database and back.

AWS X-Ray distributed tracing service map for microservices

Java Spring Boot X-Ray trace waterfall showing request latency

The Capstone: Application Signals

This is where it all comes together. Application Signals is AWS's APM solution that builds on top of traces. It automatically discovers services, collects telemetry, and displays key metrics without additional instrumentation.

Why I Like Application Signals

Traditional APM tools require vendor-specific agents and create lock-in. Application Signals uses standard OpenTelemetry instrumentation - the same ADOT agent configuration that enables tracing also enables Application Signals.

Just set OTEL_AWS_APPLICATION_SIGNALS_ENABLED=true and you're done. No additional code changes.

Service Health Dashboards

Application Signals automatically creates dashboards showing:

  • Request volume, latency (p50, p90, p99), and error rates
  • Service operations breakdown (which endpoints are slow?)
  • Service dependencies (what does this service call?)

These dashboards appear automatically once traffic flows through your instrumented application. No manual configuration needed - this was a pleasant surprise.

CloudWatch Application Signals service health dashboard

Service Level Objectives

SLOs monitor service reliability against defined targets. For example: "POST /unicorns will achieve latency under 1000ms for 99.9% of requests over a 14-day period."

I've set up SLOs for my critical endpoints and connected them to CloudWatch alarms. When an SLO is at risk (its error budget is being consumed faster than expected), I get notified early.

Runtime Metrics

Application Signals also collects JVM runtime metrics:

  • Heap and non-heap memory usage
  • Garbage collection frequency and pause times
  • Thread count and states
  • CPU usage

These metrics have helped me catch memory leaks and GC pressure issues that weren't visible from infrastructure metrics alone.

Java JVM runtime metrics in CloudWatch showing heap memory and garbage collection

Trade-offs and Considerations

Cost

CloudWatch pricing applies to logs, metrics, traces, and Application Signals. For high-traffic applications, costs can add up. Here's what I do:

  • Log retention policies (I don't keep DEBUG logs forever)
  • Trace sampling for high-volume services (for example, switching from always_on to a ratio-based sampler via OTEL_TRACES_SAMPLER=parentbased_traceidratio and OTEL_TRACES_SAMPLER_ARG=0.1)
  • Standard metric resolution (high-resolution only where needed)

Performance

The ADOT agent adds approximately 10-50ms to application startup time. Runtime overhead is minimal in my testing.

AspectJ weaving for @Observed annotations adds 1-2 seconds to startup. Worth it for the visibility it provides.

Complexity

The hybrid instrumentation approach (ADOT auto-instrumentation + manual @Observed) requires understanding what gets traced automatically versus what needs explicit annotation.

Conclusion

Here's the observability journey I've landed on for Java on Amazon EKS:

  1. Logs and metrics via the CloudWatch Observability add-on provide the foundation
  2. Distributed tracing with ADOT and @Observed annotations gives visibility into request flows
  3. Application Signals delivers APM-level insights without vendor lock-in

The key insight: these layers build on each other. Application Signals needs traces. Traces need proper instrumentation. Proper instrumentation requires understanding the OpenTelemetry v2 changes.

If you want to try this yourself with complete code examples, check out the Java on AWS Immersion Day workshop. It's what I used to build my initial setup.
