Your Agent Loop Just Cost $1,000: Instrumenting Spring AI with OpenTelemetry GenAI Conventions

#java #ai #llm #systemdesign


In 2026, deploying multi-agent systems without strict observability is a fast track to explaining a five-figure cloud bill to your CTO. If you aren't tracing token consumption down to the individual agent step using standardized telemetry, you are flying blind in production.

Why Most Developers Get This Wrong

Relying on custom JSON log-parsing hacks: Developers waste weeks writing custom log parsers to extract token counts, which inevitably break when model providers update their payload schemas.
Treating agent loops as generic HTTP dependencies: A generic HTTP client span shows that a call happened and how long it took. It does not show which agent step made the call, how many tokens it consumed, or that it is the fortieth pass through the same recursive tool-calling chain.
Ignoring OTel standards: Building proprietary metric schemas instead of adopting the standardized OpenTelemetry GenAI Semantic Conventions creates massive vendor lock-in.

The Right Way

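Standardize your observability stack on the instrumentation Spring AI already ships. Its ChatClient, ChatModel, and vector store components report through the Micrometer Observation API, and the chat observations use gen_ai.* keys aligned with OpenTelemetry's GenAI semantic conventions. Bridge those observations to OpenTelemetry (for example with micrometer-tracing-bridge-otel plus an OTLP exporter) and token-level metrics are captured automatically.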

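Use standardized attributes: Record each model call under standard OTel attributes such as gen_ai.request.model, gen_ai.usage.input_tokens, and gen_ai.usage.output_tokens. (Older drafts of the conventions named the last two gen_ai.usage.prompt_tokens and gen_ai.usage.completion_tokens.)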
Correlate spans with trace context: Keep vector database queries, tool executions, and LLM calls under one parent span, and propagate W3C traceparent headers across process boundaries (remote tools, downstream services) so the entire agent lifecycle appears as a single trace.
Enforce runtime budget limits: Accumulate token usage per agent run as each observation completes, and abort the agent loop itself once its cumulative token count or cost exceeds a predefined threshold. Ending a trace stops the telemetry, not the spending.
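The budget rule can be sketched without any framework. This is a minimal sketch, not Spring AI API: TokenBudgetGuard, BudgetExceededException, and the 10,000-token limit are hypothetical, and it assumes each model call's usage reaches you as key/value pairs named gen_ai.usage.input_tokens and gen_ai.usage.output_tokens. In a Spring AI application you might feed record(...) from a custom Micrometer ObservationHandler's onStop callback or directly from each chat response; calling it inside the agent loop ensures the exception actually prevents the next iteration. That wiring is not shown.

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical per-run token budget guard (illustrative, not a Spring AI class). */
class TokenBudgetGuard {

    // Attribute keys from the OpenTelemetry GenAI semantic conventions
    static final String INPUT_TOKENS = "gen_ai.usage.input_tokens";
    static final String OUTPUT_TOKENS = "gen_ai.usage.output_tokens";

    /** Thrown to stop the agent loop itself, not just its trace. */
    static class BudgetExceededException extends RuntimeException {
        BudgetExceededException(String message) {
            super(message);
        }
    }

    private final long maxTokens;
    private final AtomicLong used = new AtomicLong();

    TokenBudgetGuard(long maxTokens) {
        this.maxTokens = maxTokens;
    }

    /** Adds one model call's usage; throws once the run's running total exceeds the budget. */
    long record(Map<String, String> keyValues) {
        long input = Long.parseLong(keyValues.getOrDefault(INPUT_TOKENS, "0"));
        long output = Long.parseLong(keyValues.getOrDefault(OUTPUT_TOKENS, "0"));
        long total = used.addAndGet(input + output);
        if (total > maxTokens) {
            throw new BudgetExceededException(
                "Agent run used " + total + " tokens, budget is " + maxTokens);
        }
        return total;
    }

    long used() {
        return used.get();
    }

    public static void main(String[] args) {
        TokenBudgetGuard guard = new TokenBudgetGuard(10_000);
        guard.record(Map.of(INPUT_TOKENS, "4000", OUTPUT_TOKENS, "1000")); // 5,000 used
        guard.record(Map.of(INPUT_TOKENS, "3000", OUTPUT_TOKENS, "500"));  // 8,500 used
        try {
            guard.record(Map.of(INPUT_TOKENS, "2000", OUTPUT_TOKENS, "800")); // 11,300 > 10,000
        } catch (BudgetExceededException e) {
            System.out.println("Aborting agent loop: " + e.getMessage());
        }
    }
}
```

The AtomicLong keeps the count correct if tool calls or sub-agents report usage concurrently; scope one guard instance to one agent run.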

Show Me The Code

Build the ChatClient from Spring Boot's auto-configured ChatClient.Builder, which is already wired to the application's ObservationRegistry, and layer advisors on top:

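```java
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.SimpleLoggerAdvisor;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ObservabilityConfig {

    @Bean
    public ChatClient chatClient(ChatClient.Builder builder) {
        // The auto-configured builder already carries the ObservationRegistry,
        // so ChatClient and ChatModel calls emit gen_ai.* observations
        // (including token usage) without any extra advisor.
        return builder
            .defaultAdvisors(
                // Logs requests and responses at DEBUG level
                new SimpleLoggerAdvisor()
                // Register your own token-tracking advisor here if you need
                // per-step accounting beyond the built-in metrics.
            )
            .build();
    }
}
```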
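The configuration above gives you token counts, not dollars. Converting them is simple arithmetic, and it shows how quickly a loop that re-sends a large context adds up. This sketch assumes hypothetical prices of $3 per million input tokens and $15 per million output tokens (placeholders, not any provider's actual rates); AgentCostEstimator and Call are illustrative names.

```java
import java.util.List;

/** Rough cost model for an agent loop; the prices are hypothetical placeholders. */
class AgentCostEstimator {

    // Hypothetical rates in USD per million tokens; substitute your provider's real pricing.
    static final double INPUT_USD_PER_MILLION = 3.00;
    static final double OUTPUT_USD_PER_MILLION = 15.00;

    /** One model call's usage, as reported via gen_ai.usage.input_tokens / output_tokens. */
    record Call(long inputTokens, long outputTokens) {}

    static double costUsd(long inputTokens, long outputTokens) {
        return inputTokens / 1_000_000.0 * INPUT_USD_PER_MILLION
             + outputTokens / 1_000_000.0 * OUTPUT_USD_PER_MILLION;
    }

    static double totalCostUsd(List<Call> calls) {
        return calls.stream()
            .mapToDouble(c -> costUsd(c.inputTokens(), c.outputTokens()))
            .sum();
    }

    public static void main(String[] args) {
        // A stuck loop that re-sends a 100k-token context and gets 2k tokens back per step
        double perStep = costUsd(100_000, 2_000); // 0.30 + 0.03 = about $0.33
        long stepsToOneThousand = (long) Math.ceil(1_000 / perStep);
        System.out.printf("Per step: $%.2f, steps to $1,000: %d%n", perStep, stepsToOneThousand);
    }
}
```

At these placeholder rates a single step costs about $0.33, so roughly 3,000 iterations of a stuck loop reach $1,000. Real loops often get more expensive per step, because each iteration appends tool results to the context.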

Key Takeaways

Stop reinventing the wheel: Spring AI's Micrometer instrumentation already emits OpenTelemetry's standard gen_ai.* attributes; export them instead of designing your own schema.
Alert on token burn rates: Set up real-time alerting on the rate of gen_ai.client.token.usage metrics to catch rogue agent loops before they drain your budget.
Trace the entire chain: Ensure your vector stores (e.g., PgVector) and custom tools are instrumented so you can pinpoint whether high latency stems from the model or your retrieval step.
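Burn-rate alerts normally live in your metrics backend as a rate over gen_ai.client.token.usage, but the underlying logic is just a sliding-window sum. This is a minimal in-process sketch: TokenBurnRateMonitor, the one-minute window, and the 50,000-token threshold are hypothetical, and timestamps are passed in explicitly to keep the behavior deterministic.

```java
import java.time.Duration;
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sliding-window burn-rate check (illustrative, not a Micrometer API). */
class TokenBurnRateMonitor {

    private record Sample(long atMillis, long tokens) {}

    private final long windowMillis;
    private final long maxTokensPerWindow;
    private final Deque<Sample> samples = new ArrayDeque<>();
    private long windowTotal;

    TokenBurnRateMonitor(Duration window, long maxTokensPerWindow) {
        this.windowMillis = window.toMillis();
        this.maxTokensPerWindow = maxTokensPerWindow;
    }

    /** Records usage at the given time; returns true if the window's total exceeds the limit. */
    synchronized boolean record(long nowMillis, long tokens) {
        samples.addLast(new Sample(nowMillis, tokens));
        windowTotal += tokens;
        // Evict samples that have fallen out of the sliding window
        while (!samples.isEmpty() && samples.peekFirst().atMillis() <= nowMillis - windowMillis) {
            windowTotal -= samples.removeFirst().tokens();
        }
        return windowTotal > maxTokensPerWindow;
    }

    public static void main(String[] args) {
        TokenBurnRateMonitor monitor = new TokenBurnRateMonitor(Duration.ofMinutes(1), 50_000);
        System.out.println(monitor.record(0, 20_000));      // false: 20k in window
        System.out.println(monitor.record(10_000, 20_000)); // false: 40k in window
        System.out.println(monitor.record(20_000, 20_000)); // true: 60k > 50k
        System.out.println(monitor.record(75_000, 5_000));  // false: first two samples expired, 25k left
    }
}
```

Passing the clock in as a parameter keeps the class easy to test; in production you would pass System.currentTimeMillis() or, more likely, rely on your backend's own rate function.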