If you're running Spring Boot applications on Kubernetes, you've probably hit the same wall I did: containers restart, logs disappear, and when something goes wrong or slows down, you're left guessing which microservice caused the issue.
Two recent developments prompted me to revisit my observability setup. First, the OpenTelemetry Java Agent v2.0 (January 2024) shifted from "instrument everything automatically" to "explicit over implicit" - requiring adjustments to maintain visibility into business logic. Second, Spring Boot 4.0 (November 2025) introduced the new spring-boot-starter-opentelemetry, making it easier than ever to export metrics, traces, and logs via OpenTelemetry Protocol (OTLP).
In this post, I'll walk you through setting up observability for Spring Boot applications on Amazon EKS - starting with the basics (logs and metrics), diving into distributed tracing, and finishing with Application Signals. Hopefully this saves you some time.
Overview of the Solution
The Three Pillars
Quick refresher on the three pillars of observability:
- Logs: Discrete events with timestamps and context
- Metrics: Numerical measurements aggregated over time
- Traces: Request flows across service boundaries
Each answers different questions. Logs tell you what happened. Metrics tell you how much and how often. Traces tell you where time was spent across services.
AWS Services for Observability
Here's the AWS stack I'm using:
| Service | Purpose |
|---|---|
| CloudWatch Logs | Centralized log aggregation and search |
| CloudWatch Container Insights | Infrastructure metrics for EKS |
| AWS X-Ray | Distributed tracing and service maps |
| Application Signals | Application Performance Monitoring (APM) dashboards and Service Level Objectives (SLOs) |
Architecture Overview
┌─────────────────────────────────────────────────────────────────────┐
│ EKS Cluster │
│ ┌─────────────────┐ ┌──────────────────────────────────────┐ │
│ │ Java App Pod │ │ CloudWatch Observability Add-on │ │
│ │ + ADOT Agent │────▶│ - Fluent Bit (logs) │ │
│ │ │ │ - CloudWatch Agent (metrics/traces) │ │
│ └─────────────────┘ └──────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
│
┌─────────────────┼─────────────────┐
▼ ▼ ▼
CloudWatch Logs X-Ray Application Signals
The AWS Distro for OpenTelemetry (ADOT) agent runs inside your application container, collecting telemetry and sending it to the CloudWatch Observability add-on. From there, data flows to the appropriate AWS services.
Why I Chose ADOT Over Community OpenTelemetry
| Feature | Community Agent | ADOT Agent |
|---|---|---|
| X-Ray trace ID format | ✗ | ✓ |
| AWS resource detection | Basic | Enhanced |
| Application Signals | ✗ | ✓ |
I tried the community agent first, but the X-Ray integration was complicated. ADOT just works - it's AWS-supported and adds the features I actually need.
The Foundation: Logs and Metrics
Setting up logs and metrics on EKS is the easy part. The CloudWatch Observability add-on handles everything:
- Fluent Bit as a DaemonSet for log collection
- CloudWatch agent for metrics and trace forwarding
Quick Setup
One command:
aws eks create-addon \
--cluster-name your-cluster \
--addon-name amazon-cloudwatch-observability \
--no-cli-pager
That's it. Fluent Bit automatically collects container logs from /var/log/containers/ and sends them to CloudWatch Logs. The CloudWatch agent collects infrastructure metrics and exposes them in Container Insights.
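If you want to confirm what the add-on deployed, a quick check (assuming default settings) is to look at the `amazon-cloudwatch` namespace:

```bash
# You should see the operator, CloudWatch agent, and Fluent Bit pods
kubectl get pods -n amazon-cloudwatch
```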
What You Get
Logs appear in CloudWatch under /aws/containerinsights/{cluster-name}/application. Container Insights gives you dashboards showing CPU, memory, network, and disk metrics per pod, node, and namespace.
For Java applications, this foundation is necessary but not sufficient. You can see that your pods are running and consuming resources, but you can't see inside the Java Virtual Machine (JVM) or trace requests across services. That's where things get interesting.
Deep Dive: Distributed Tracing
This is where I spent most of my time. Distributed tracing tracks requests as they flow through microservices - essential for debugging latency issues and understanding dependencies.
The OpenTelemetry v2 Changes
In January 2024, OpenTelemetry Java Agent v2.0 introduced significant changes:
| Change | Before (v1) | After (v2) |
|---|---|---|
| Controller spans | Enabled | Disabled by default |
| View spans | Enabled | Disabled by default |
| Micrometer bridge | Enabled | Disabled by default |
| OTLP protocol | gRPC | HTTP/protobuf |
The philosophy shifted from "capture everything" to "opt-in for extras." This reduces noise and overhead, but means you need to explicitly enable features you relied on before. If you upgraded and noticed spans disappeared, this is why.
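If you want the old behavior back after upgrading, the migration notes document opt-in switches for each of these. Here's a sketch of what that looks like when the agent is attached via `JAVA_TOOL_OPTIONS` - the property names below are my reading of the v2 migration notes, so verify them against the release notes for your agent version:

```dockerfile
# Sketch: opt back into v1-style telemetry on the v2 agent (verify property names for your version)
ENV JAVA_TOOL_OPTIONS="-javaagent:/opt/aws-opentelemetry-agent.jar \
  -Dotel.instrumentation.common.experimental.controller-telemetry.enabled=true \
  -Dotel.instrumentation.common.experimental.view-telemetry.enabled=true \
  -Dotel.instrumentation.micrometer.enabled=true"
```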
What Spring Boot 4 Brings
Spring Boot 4.0 introduced spring-boot-starter-opentelemetry with several benefits:
- Native OTLP export for metrics, traces, and logs without additional dependencies
- Seamless bridge between Micrometer observations and OpenTelemetry spans
- Auto-configuration for OpenTelemetry SDK components
- Support for W3C trace context propagation out of the box
This means @Observed annotations now create both Micrometer metrics and OpenTelemetry spans automatically - no manual bridging required.
The Hybrid Solution
Here's what I landed on - combining ADOT auto-instrumentation with manual @Observed annotations:
What ADOT auto-instruments (no code changes needed):
- HTTP clients and servers (Spring MVC, JAX-RS)
- Database clients (JDBC, Hibernate)
- Messaging systems (Kafka, SQS)
- AWS SDK calls (EventBridge, S3, DynamoDB)
What I had to add manually:
- Internal `@Service` methods (business logic)
- Custom spans for specific operations (see the sketch below)
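For the custom spans, I go straight to the OpenTelemetry API - the ADOT agent registers a global OpenTelemetry instance, so no extra wiring is needed. A minimal sketch (the tracer name, span name, and enrich() helper are placeholders, not code from the workshop):

```java
import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.StatusCode;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Scope;

private static final Tracer tracer = GlobalOpenTelemetry.getTracer("unicorn-store");

public void enrichUnicorn(Unicorn unicorn) {
    Span span = tracer.spanBuilder("enrich-unicorn").startSpan();
    try (Scope scope = span.makeCurrent()) {
        enrich(unicorn);                 // this work shows up as its own segment in X-Ray
    } catch (Exception e) {
        span.recordException(e);         // attach the exception to the span
        span.setStatus(StatusCode.ERROR);
        throw e;
    } finally {
        span.end();                      // always close the span
    }
}
```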
Understanding Spring's Observability Stack
To make @Observed annotations work, you need to understand three components that work together:
Micrometer Observation API - A unified API that creates both metrics and traces from a single instrumentation point. When you annotate a method with @Observed, the Observation API records:
- A timer metric (how long the method took)
- A trace span (for distributed tracing)
This means one annotation gives you both a Prometheus metric like unicorn.create.duration and a trace span visible in X-Ray - no need to instrument twice.
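The Observation API can also be used programmatically when an annotation doesn't fit (for example, timing a lambda or a block inside a larger method). A sketch, assuming an injected ObservationRegistry and a hypothetical unicornRepository:

```java
import io.micrometer.observation.Observation;
import io.micrometer.observation.ObservationRegistry;

// registry is auto-configured by Spring Boot and can be injected as a bean
Unicorn saved = Observation.createNotStarted("unicorn.create", registry)
        .lowCardinalityKeyValue("unicorn.type", "flying")  // becomes a metric tag and span attribute
        .observe(() -> unicornRepository.save(unicorn));   // timed, traced, and exceptions recorded
```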
Micrometer Tracing - The bridge that connects observations to tracing backends. It takes the spans created by the Observation API and exports them via OpenTelemetry (or other formats like Zipkin). Spring Boot 4's spring-boot-starter-opentelemetry auto-configures this bridge, so spans flow to your OTLP endpoint without manual wiring.
AspectJ weaving - The mechanism that intercepts @Observed method calls at runtime. When your code calls an @Observed method, AspectJ wraps the call to:
- Start a timer and create a span before the method executes
- Record the duration and close the span after the method completes
- Capture exceptions if the method fails
Without AspectJ, the @Observed annotation is just metadata sitting on your method - nothing actually captures timing or creates spans.
Enabling @Observed Annotations
Add these dependencies to your pom.xml:
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-opentelemetry</artifactId>
</dependency>
<dependency>
<groupId>org.aspectj</groupId>
<artifactId>aspectjweaver</artifactId>
</dependency>
Then annotate your business logic methods:
@Observed(name = "unicorn.create")
@Transactional
public Unicorn createUnicorn(Unicorn unicorn) {
// Business logic - now traced!
}
This gives you both a Prometheus metric (unicorn.create.duration) and an X-Ray span from a single annotation.
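One caveat: the annotation is processed by Micrometer's ObservedAspect. Depending on your Spring Boot version it may not be registered automatically - if @Observed appears to do nothing, here's a minimal sketch of wiring the aspect yourself (ObservabilityConfig is just my name for the class):

```java
import io.micrometer.observation.ObservationRegistry;
import io.micrometer.observation.aop.ObservedAspect;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ObservabilityConfig {

    // Turns @Observed annotations into actual observations (timer + span)
    @Bean
    ObservedAspect observedAspect(ObservationRegistry observationRegistry) {
        return new ObservedAspect(observationRegistry);
    }
}
```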
Dockerfile Configuration
Here's the Dockerfile configuration I use:
FROM public.ecr.aws/docker/library/amazoncorretto:21-al2023
# Download ADOT agent
ADD https://github.com/aws-observability/aws-otel-java-instrumentation/releases/latest/download/aws-opentelemetry-agent.jar /opt/aws-opentelemetry-agent.jar
# Attach agent to JVM
ENV JAVA_TOOL_OPTIONS="-javaagent:/opt/aws-opentelemetry-agent.jar"
# OTLP configuration
ENV OTEL_EXPORTER_OTLP_PROTOCOL="http/protobuf"
ENV OTEL_METRICS_EXPORTER="none"
ENV OTEL_LOGS_EXPORTER="none"
# Traces configuration
ENV OTEL_TRACES_SAMPLER="always_on"
ENV OTEL_PROPAGATORS="tracecontext,baggage,xray"
ENV OTEL_EXPORTER_OTLP_TRACES_ENDPOINT="http://cloudwatch-agent.amazon-cloudwatch:4316/v1/traces"
# Application Signals
ENV OTEL_AWS_APPLICATION_SIGNALS_ENABLED="true"
ENV OTEL_AWS_APPLICATION_SIGNALS_EXPORTER_ENDPOINT="http://cloudwatch-agent.amazon-cloudwatch:4316/v1/metrics"
COPY app.jar /app.jar
ENTRYPOINT ["java", "-jar", "/app.jar"]
Why these specific settings:
- `OTEL_METRICS_EXPORTER=none`: I use Prometheus for metrics, not OTLP
- `OTEL_LOGS_EXPORTER=none`: CloudWatch Logs handles logs via Fluent Bit
- `OTEL_PROPAGATORS`: includes `xray` for the X-Ray trace ID format
- Endpoints point to the CloudWatch agent service in the `amazon-cloudwatch` namespace
What Gets Traced
With this setup, X-Ray shows the complete request flow:
| Component | Auto-traced | Manual (@Observed) |
|---|---|---|
| HTTP Server (Spring MVC) | ✓ | |
| JDBC (PostgreSQL) | ✓ | |
| AWS SDK (EventBridge, S3) | ✓ | |
| Business logic methods | | ✓ |
Each trace shows end-to-end latency, time spent in each component, database queries, and external API calls. Finally, full visibility from HTTP request to database and back.
The Capstone: Application Signals
This is where it all comes together. Application Signals is AWS's APM solution that builds on top of traces. It automatically discovers services, collects telemetry, and displays key metrics without additional instrumentation.
Why I Like Application Signals
Traditional APM tools require vendor-specific agents and create lock-in. Application Signals uses standard OpenTelemetry instrumentation - the same ADOT agent configuration that enables tracing also enables Application Signals.
Just set OTEL_AWS_APPLICATION_SIGNALS_ENABLED=true and you're done. No additional code changes.
Service Health Dashboards
Application Signals automatically creates dashboards showing:
- Request volume, latency (p50, p90, p99), and error rates
- Service operations breakdown (which endpoints are slow?)
- Service dependencies (what does this service call?)
These dashboards appear automatically once traffic flows through your instrumented application. No manual configuration needed - this was a pleasant surprise.
Service Level Objectives
SLOs monitor service reliability against defined targets. For example: "POST /unicorns will achieve latency under 1000ms for 99.9% of requests over a 14-day period."
I've set up SLOs for my critical endpoints and connected them to CloudWatch alarms. When an SLO is at risk, I get notified early.
Runtime Metrics
Application Signals also collects JVM runtime metrics:
- Heap and non-heap memory usage
- Garbage collection frequency and pause times
- Thread count and states
- CPU usage
These metrics have helped me catch memory leaks and GC pressure issues that weren't visible from infrastructure metrics alone.
Trade-offs and Considerations
Cost
CloudWatch pricing applies to logs, metrics, traces, and Application Signals. For high-traffic applications, costs can add up. Here's what I do:
- Log retention policies (I don't keep DEBUG logs forever)
- Trace sampling for high-volume services (see the sketch below)
- Standard metric resolution (high-resolution only where needed)
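For the sampling point above, the standard OpenTelemetry environment variables work with the ADOT agent. For example, swapping the always_on sampler from the Dockerfile for a parent-based ratio sampler (the 10% ratio is just an illustration - pick what fits your traffic):

```dockerfile
# Sample 10% of new traces; child spans follow their parent's sampling decision
ENV OTEL_TRACES_SAMPLER="parentbased_traceidratio"
ENV OTEL_TRACES_SAMPLER_ARG="0.1"
```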
Performance
The ADOT agent adds approximately 10-50ms to application startup time. Runtime overhead is minimal in my testing.
AspectJ weaving for @Observed annotations adds 1-2 seconds to startup. Worth it for the visibility it provides.
Complexity
The hybrid instrumentation approach (ADOT auto-instrumentation + manual @Observed) requires understanding what gets traced automatically versus what needs explicit annotation.
Conclusion
Here's the observability journey I've landed on for Java on Amazon EKS:
- Logs and metrics via the CloudWatch Observability add-on provide the foundation
- Distributed tracing with ADOT and `@Observed` annotations gives visibility into request flows
- Application Signals delivers APM-level insights without vendor lock-in
The key insight: these layers build on each other. Application Signals needs traces. Traces need proper instrumentation. Proper instrumentation requires understanding the OpenTelemetry v2 changes.
If you want to try this yourself with complete code examples, check out the Java on AWS Immersion Day workshop. It's what I used to build my initial setup.