Markus · Originally published at myfear.substack.com

From Black Box to Blueprint: Tracing Every LLM Decision with Quarkus

Tracing LLM Calls

In a world of agentic AI, knowing what your model did isn't enough. You need to know why. This is the tutorial that shows you how.

Modern LLMs don’t just spit out answers. They call tools, self-correct, and loop through logic like mini-autonomous systems. But what happens under the hood of those multi-step decisions? How can a developer trust or even debug an LLM-powered feature if the model's chain of thought is invisible?

I have been thinking about tracing calls and representing them as diagrams for a while. Back at Devoxx UK, my amazing colleague Bruno implemented a trace with Camel, and in this hands-on guide, I'll build a transparent, inspectable agentic AI system based on:

  • Quarkus as the reactive Java runtime

  • Langchain4j for structured AI service composition

  • Ollama for local LLM inference

  • CDI interceptors for non-invasive observability

  • Mermaid.js to render decision flowcharts

By the end, you'll have a full-stack system that traces every step of an LLM interaction, from the prompt to tool invocations to guardrail corrections, and renders it as a shareable graph.

Bootstrapping the LLM-Ready Quarkus App

Quarkus provides a command-line interface (CLI) that simplifies project creation and extension management. The project will be created with all necessary dependencies from the start. Execute the following command in a terminal to generate a new Maven-based Quarkus project:

quarkus create app com.example:llm-observability \
    --extension='rest-jackson,quarkus-langchain4j-ollama' \
    --no-code
cd llm-observability

And if you want a head start and would rather grab the complete project, make sure to check out the GitHub repository.

Configure application.properties

The quarkus-langchain4j-ollama extension enables a declarative configuration approach through the application.properties file located in src/main/resources. This file is the central point for telling the Quarkus application how to connect to and interact with the Ollama service.

Add the following properties to src/main/resources/application.properties:

quarkus.langchain4j.ollama.chat-model.model-id=llama3.1:latest
quarkus.langchain4j.ollama.timeout=60s
quarkus.langchain4j.ollama.log-requests=true
quarkus.langchain4j.ollama.log-responses=true
quarkus.langchain4j.ollama.devservices.enabled=false


This setup gives you a declarative AI integration with no manual HTTP code or JSON parsing required.
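One assumption worth calling out: because Dev Services are disabled above, the extension connects to an Ollama server you run yourself (listening on localhost:11434 by default), so the model has to be pulled locally before the application starts:

ollama pull llama3.1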

Creating the Conversational AI Core

With the project configured, the next step is to build the simplest form of interaction: a basic chatbot. This involves creating a declarative Langchain4j AI Service and a JAX-RS resource to expose it via an HTTP endpoint. This service will serve as the foundational component upon which the more complex features of tools, guardrails, and tracing will be layered.

AI Service Interface

The core of the quarkus-langchain4j integration is a powerful declarative programming model. Instead of writing a traditional service class with imperative logic, developers define a Java interface and annotate it. Quarkus and Langchain4j then work together to generate the implementation at build time.

Create a new Java interface named ChatbotAiService.java in the org.acme package with the following content:

package org.acme;

import dev.langchain4j.service.UserMessage;
import io.quarkiverse.langchain4j.RegisterAiService;
import io.quarkiverse.langchain4j.guardrails.InputGuardrails;
import io.quarkiverse.langchain4j.guardrails.OutputGuardrails;

@RegisterAiService(tools = CalculatorTools.class)
public interface ChatbotAiService {

    @InputGuardrails(BannedWordGuard.class)
    @OutputGuardrails(ConcisenessGuard.class)
    String chat(@UserMessage String userMessage);
}

Note: I have already included an input guardrail, an output guardrail, and a tools class here. We will implement them later on.

REST Endpoint

To make the AI service accessible from the outside world, a standard JAX-RS REST resource is needed. This resource will expose an HTTP endpoint that clients can call.

Create a new class named ChatResource.java in the org.acme package:

package org.acme;

import org.acme.tracing.LLMCallTracking;

import jakarta.inject.Inject;
import jakarta.ws.rs.Consumes;
import jakarta.ws.rs.POST;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.Produces;
import jakarta.ws.rs.core.MediaType;

@Path("/chat")
public class ChatResource {

    @Inject
    ChatbotAiService chatbot;

    @POST
    @Consumes(MediaType.TEXT_PLAIN)
    @Produces(MediaType.TEXT_PLAIN)
    @LLMCallTracking
    public String chat(String message) {
        return chatbot.chat(message);
    }
}

Note: You can already see the @LLMCallTracking interceptor binding that we will cover later on in the tutorial.

Adding Tool Calling Support

This section elevates the simple chatbot into a more capable "agent" by granting it the ability to use external tools. Tools are functions that the LLM can invoke to perform tasks it cannot do on its own, such as performing precise calculations or accessing real-time data from external APIs. This is the first step toward building a system whose behavior is not fully determined by a single prompt but emerges from a reasoning process.

A new CDI bean will be created to house the tool methods.

Create a new class named CalculatorTools.java in the org.acme package:

package org.acme;

import org.acme.tracing.LLMCallTracking;

import dev.langchain4j.agent.tool.Tool;
import io.quarkus.logging.Log;
import jakarta.enterprise.context.ApplicationScoped;

@ApplicationScoped
public class CalculatorTools {

    @Tool("Calculates the sum of two numbers, 'a' and 'b'.")
    @LLMCallTracking
    public double add(double a, double b) {
        Log.infof("Tool executed: add(%.2f, %.2f)%n", a, b);
        return a + b;
    }

    @Tool("Calculates the difference between two numbers, 'a' and 'b'.")
    @LLMCallTracking
    public double subtract(double a, double b) {
        Log.infof("Tool executed: subtract(%.2f, %.2f)%n", a, b);
        return a - b;
    }
}

Key elements of this class are:

  • @ApplicationScoped: This makes the class a CDI bean, allowing it to be managed by the Quarkus container and discovered by the Langchain4j framework.

  • @Tool: This Langchain4j annotation marks a method as an available tool for the LLM.

  • Tool Description : The string provided to the @Tool annotation (e.g., "Calculates the sum of two numbers, 'a' and 'b'.") is critically important. This natural language description is what Langchain4j sends to the LLM as part of the tool's specification. The LLM uses this description to understand what the tool does and decide whether to call it. A clear and precise description significantly increases the likelihood that the LLM will use the tool correctly.

  • @LLMCallTracking already applies the interceptor binding we will develop later in this tutorial.

Enforcing Guardrails

Now that the agent can act by using tools, its behavior must be constrained. This section introduces guardrails, a mechanism to enforce rules on both the LLM's input and its output. This tutorial will implement a particularly powerful feature: the ability for a guardrail to detect a faulty response and automatically "reprompt" the LLM to correct itself, demonstrating a dynamic, self-correction loop.

Input Guardrail

An input guardrail will be created to demonstrate how to validate and reject user prompts before they are sent to the LLM.

Create a new class BannedWordGuard.java in the org.acme package:

package org.acme;

import org.acme.tracing.LLMCallTracking;

import dev.langchain4j.data.message.UserMessage;
import io.quarkiverse.langchain4j.guardrails.InputGuardrail;
import io.quarkiverse.langchain4j.guardrails.InputGuardrailResult;
import jakarta.enterprise.context.ApplicationScoped;

@ApplicationScoped
@LLMCallTracking
public class BannedWordGuard implements InputGuardrail {

    @Override
    public InputGuardrailResult validate(UserMessage userMessage) {
        String text = userMessage.singleText();
        if (text.toLowerCase().contains("politics")) {
            return fatal("This topic is not allowed.");
        }
        return success();
    }
}

This guardrail checks if the user's message contains the word "politics". If it does, it returns a fatal result, which immediately stops the processing chain and prevents the LLM from being called. The message "This topic is not allowed." would be propagated back as an error.
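Once the application is running (see the test run at the end of this tutorial), a prompt containing the banned word lets you watch the guardrail fire. The request never reaches the model; the exact error payload you get back depends on how Quarkus maps the guardrail failure:

curl -X POST -H "Content-Type: text/plain" \
-d "What do you think about politics?" \
http://localhost:8080/chat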

Output Guardrail with Reprompting

This is a key part of the tutorial, demonstrating the dynamic self-correction capability. An output guardrail will be created to check if the LLM's response is too verbose. If it is, instead of simply failing, it will trigger a reprompt: the framework sends the reprompt instruction back to the model and asks it for a corrected, shorter answer.

Create a new class ConcisenessGuard.java in the org.acme package:

package org.acme;

import dev.langchain4j.data.message.AiMessage;
import io.quarkiverse.langchain4j.guardrails.OutputGuardrail;
import io.quarkiverse.langchain4j.guardrails.OutputGuardrailResult;
import jakarta.enterprise.context.ApplicationScoped;

@ApplicationScoped
public class ConcisenessGuard implements OutputGuardrail {

    private static final int MAX_LENGTH = 1500;

    @Override
    public OutputGuardrailResult validate(AiMessage aiMessage) {
        String text = aiMessage.text();
        // Allow empty content (e.g., when AI is making tool calls)
        if (text == null || text.isBlank()) {
            return success();
        }

        if (text.length() > MAX_LENGTH) {
            return reprompt("Response is too long.", "Please be more concise.");
        }
        return success();
    }
}

Full-Stack Observability via CDI Interceptors

This section is the heart of the tutorial, where the core observability mechanism is built. A Quarkus CDI interceptor will be implemented to capture high-level information about the chat interactions and store it for later visualization. Among the alternatives, an interceptor is the most suitable choice for this application because it adds observability without touching the business logic.

Define the Trace Data Model

A well-designed data model is essential for capturing the rich, structured context of the conversation. The following Java records will serve as the data transfer objects for the tracing system.

Create a new file TraceData.java in the org.acme.tracing package:

package org.acme.tracing;

import java.time.Duration;
import java.time.LocalDateTime;
import java.util.List;
import java.util.Map;

public class TraceData {

    public record ConversationTrace(
            String conversationId,
            LocalDateTime startTime,
            List<LLMInteraction> interactions,
            List<ToolCall> toolCalls,
            List<GuardrailViolation> violations,
            Map<String, Object> metadata) {
    }

    public record LLMInteraction(
            String prompt,
            String response,
            String model,
            Integer inputTokenCount,
            Integer outputTokenCount,
            Duration duration) {
    }

    public record ToolCall(
            String toolName,
            String params,
            String result,
            Duration duration) {
    }

    public record GuardrailViolation(
            String guardrail,
            String violation,
            String reprompt) {
    }
}

Define a Tracing Annotation

To associate the interceptor with the methods it should target, a custom annotation, known as an interceptor binding, is required.

Create a new annotation LLMCallTracking.java in the org.acme.tracing package:

package org.acme.tracing;

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

import jakarta.interceptor.InterceptorBinding;

@InterceptorBinding
@Retention(RetentionPolicy.RUNTIME)
@Target({ ElementType.TYPE, ElementType.METHOD })
public @interface LLMCallTracking {
}

Implementing the LLMCallTracker and LLMCallInterceptor

The tracing system will consist of two main components: a tracker service to store the traces and the interceptor itself to capture the events.

First, create the LLMCallTracker service. This @ApplicationScoped bean will hold the conversation traces in memory.

Create LLMCallTracker.java in the org.acme.tracing package:

package org.acme.tracing;

import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

import org.acme.tracing.TraceData.ConversationTrace;
import org.acme.tracing.TraceData.GuardrailViolation;
import org.acme.tracing.TraceData.LLMInteraction;
import org.acme.tracing.TraceData.ToolCall;

import jakarta.enterprise.context.ApplicationScoped;

@ApplicationScoped
public class LLMCallTracker {
    private final Map<String, ConversationTrace> activeTraces = new ConcurrentHashMap<>();

    public void startTrace(String conversationId, String initialPrompt) {
        activeTraces.computeIfAbsent(conversationId, id -> {
            ConversationTrace trace = new ConversationTrace(
                    id,
                    LocalDateTime.now(),
                    Collections.synchronizedList(new ArrayList<>()),
                    Collections.synchronizedList(new ArrayList<>()),
                    Collections.synchronizedList(new ArrayList<>()),
                    new ConcurrentHashMap<>());
            // Remember the prompt that started the conversation
            trace.metadata().put("initialPrompt", initialPrompt);
            return trace;
        });
    }

    public void recordLLMInteraction(String conversationId, LLMInteraction interaction) {
        Optional.ofNullable(activeTraces.get(conversationId))
                .ifPresent(trace -> trace.interactions().add(interaction));
    }

    public void recordToolCall(String conversationId, ToolCall toolCall) {
        Optional.ofNullable(activeTraces.get(conversationId))
                .ifPresent(trace -> trace.toolCalls().add(toolCall));
    }

    public void recordGuardrailViolation(String conversationId, GuardrailViolation violation) {
        Optional.ofNullable(activeTraces.get(conversationId))
                .ifPresent(trace -> trace.violations().add(violation));
    }

    public Optional<ConversationTrace> getTrace(String conversationId) {
        return Optional.ofNullable(activeTraces.get(conversationId));
    }
}

Next, create the LLMCallInterceptor in the org.acme.tracing package:

package org.acme.tracing;

import java.time.Duration;
import java.time.Instant;
import java.util.UUID;

import org.acme.tracing.TraceData.GuardrailViolation;
import org.acme.tracing.TraceData.LLMInteraction;
import org.acme.tracing.TraceData.ToolCall;

import com.fasterxml.jackson.databind.ObjectMapper;

import dev.langchain4j.agent.tool.Tool;
import io.quarkiverse.langchain4j.guardrails.GuardrailResult;
import io.quarkiverse.langchain4j.guardrails.OutputGuardrailResult;
import io.quarkus.logging.Log;
import jakarta.annotation.Priority;
import jakarta.inject.Inject;
import jakarta.interceptor.AroundInvoke;
import jakarta.interceptor.Interceptor;
import jakarta.interceptor.InvocationContext;
import jakarta.ws.rs.POST;

@LLMCallTracking
@Interceptor
@Priority(Interceptor.Priority.APPLICATION + 1)
public class LLMCallInterceptor {

    @Inject
    LLMCallTracker tracker;

    @Inject
    RequestCorrelation correlation;

    @Inject
    ObjectMapper mapper; // For serializing tool parameters

    @AroundInvoke
    public Object track(InvocationContext context) throws Exception {
        // Check if this is the entry point (the JAX-RS method)
        if (context.getMethod().isAnnotationPresent(POST.class)) {
            String conversationId = UUID.randomUUID().toString();
            Log.info("CONVERSATION ID: " + conversationId);
            correlation.setConversationId(conversationId);
            tracker.startTrace(conversationId, (String) context.getParameters()[0]);
        }

        String conversationId = correlation.getConversationId();
        if (conversationId == null) {
            // Not part of a tracked conversation, proceed without tracking
            return context.proceed();
        }

        Instant start = Instant.now();
        Object result = null;
        try {
            result = context.proceed();
            return result;
        } finally {
            Instant end = Instant.now();
            Duration duration = Duration.between(start, end);

            // Differentiate based on the type of method intercepted
            if (context.getMethod().isAnnotationPresent(Tool.class)) {
                handleToolCall(context, conversationId, result, duration);
            } else if (result instanceof GuardrailResult) {
                handleGuardrail(context, conversationId, (GuardrailResult) result);
            } else if (context.getMethod().isAnnotationPresent(POST.class)) {
                handleLLMInteraction(context, conversationId, (String) result, duration);
            }
        }
    }

    private void handleLLMInteraction(InvocationContext context, String conversationId, String response,
            Duration duration) {
        LLMInteraction interaction = new LLMInteraction(
                (String) context.getParameters()[0],
                response,
                "ollama:llama3",
                null, null,
                duration);
        tracker.recordLLMInteraction(conversationId, interaction);
    }

    private void handleToolCall(InvocationContext context, String conversationId, Object result, Duration duration) {
        String paramsJson;
        try {
            paramsJson = mapper.writeValueAsString(context.getParameters());
        } catch (Exception e) {
            paramsJson = "Error serializing params: " + e.getMessage();
        }

        ToolCall toolCall = new ToolCall(
                context.getMethod().getName(),
                paramsJson,
                String.valueOf(result),
                duration);
        tracker.recordToolCall(conversationId, toolCall);
    }

    private void handleGuardrail(InvocationContext context, String conversationId, GuardrailResult result) {
        if (!result.isSuccess()) {
            String reprompt = null;
            if (result instanceof OutputGuardrailResult) {
                reprompt = "Reprompt triggered"; // Simple fallback
            }

            GuardrailViolation violation = new GuardrailViolation(
                    context.getTarget().getClass().getSimpleName(),
                    "Guardrail violation detected", // Simple fallback message
                    reprompt);
            tracker.recordGuardrailViolation(conversationId, violation);
        }
    }
}


This gives you AOP-style tracing without polluting business logic.

Correlating Events with Context

With the interceptor firing in multiple places, the next challenge is to distinguish between these different events and correlate them to the same conversation. This requires enhancing the interceptor to be context-aware and establishing a mechanism to pass the conversationId.

A CDI Request Scoped bean is an excellent way to hold the conversationId for the duration of a single HTTP request.

Create RequestCorrelation.java in the org.acme.tracing package:

package org.acme.tracing;

import jakarta.enterprise.context.RequestScoped;

@RequestScoped
public class RequestCorrelation {
    private String conversationId;

    public String getConversationId() {
        return conversationId;
    }

    public void setConversationId(String conversationId) {
        this.conversationId = conversationId;
    }
}

This multi-target interception strategy, combined with a request-scoped correlation ID, is a powerful and general-purpose pattern. It allows the system to stitch together disparate invocations that occur across loosely coupled CDI beans during a single request, forming a single, coherent narrative of the entire workflow. This technique is not limited to LLM tracing; it can be applied to achieve deep observability in any complex CDI-based application, for instance, to trace a request through multiple internal services, database calls, and external API integrations.

Visualizing with Mermaid.js

With all the trace data being meticulously collected and correlated, this final implementation section focuses on the "last mile": transforming the raw ConversationTrace object into a human-readable Mermaid.js flowchart and exposing it via a REST API. This will provide the intuitive visualization that makes the complex internal processes understandable at a glance.

Mermaid Generator

A dedicated service will be created to encapsulate the logic for converting a ConversationTrace into a Mermaid graph definition. This separation of concerns makes the system cleaner and more maintainable.

Create a new class MermaidGraphGenerator.java in the org.acme.tracing package. I will spare you the details here; grab the full example from my GitHub repository (an illustrative sketch follows below).

@ApplicationScoped
public class MermaidGraphGenerator {
    public String generate(ConversationTrace trace) {
        // Turn trace into Mermaid.js flowchart string
    }
}

This generator iterates through the collected events in the ConversationTrace and constructs a valid Mermaid.js graph definition string, following the flowchart syntax. It uses unique identifiers for each node to build the graph structure.
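To make that concrete, here is a minimal illustrative sketch of such a generator. It is not the implementation from the repository, just one way to walk the trace and emit uniquely identified flowchart nodes; the node shapes and id prefixes (llm, tool, guard) are my own choices:

package org.acme.tracing;

import org.acme.tracing.TraceData.ConversationTrace;

import jakarta.enterprise.context.ApplicationScoped;

@ApplicationScoped
public class MermaidGraphGenerator {

    public String generate(ConversationTrace trace) {
        StringBuilder g = new StringBuilder("flowchart TD\n");
        g.append("    start([\"Conversation ").append(trace.conversationId()).append("\"])\n");
        String previous = "start";
        int counter = 0;

        // One rectangle per LLM interaction, chained in request order
        for (var interaction : trace.interactions()) {
            String id = "llm" + counter++;
            g.append("    ").append(id).append("[\"LLM: ").append(escape(interaction.prompt())).append("\"]\n");
            g.append("    ").append(previous).append(" --> ").append(id).append('\n');
            previous = id;
        }
        // Subroutine-shaped nodes for tool calls
        for (var tool : trace.toolCalls()) {
            String id = "tool" + counter++;
            g.append("    ").append(id).append("[[\"Tool: ").append(tool.toolName()).append("\"]]\n");
            g.append("    ").append(previous).append(" --> ").append(id).append('\n');
        }
        // Hexagons for guardrail violations
        for (var violation : trace.violations()) {
            String id = "guard" + counter++;
            g.append("    ").append(id).append("{{\"Guardrail: ").append(violation.guardrail()).append("\"}}\n");
            g.append("    ").append(previous).append(" --> ").append(id).append('\n');
        }
        return g.toString();
    }

    // Mermaid labels break on double quotes and newlines, so neutralize them
    private String escape(String s) {
        return s == null ? "" : s.replace('"', '\'').replace('\n', ' ');
    }
}

Pasting the resulting string into any Mermaid-aware renderer, such as the Mermaid Live Editor, shows the flow.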

Expose via REST

A new JAX-RS resource is needed to expose the tracing data. It will provide two endpoints: one for the raw JSON trace and another for the generated Mermaid diagram.

Create a new class LLMTraceResource.java in the org.acme.tracing package:

package org.acme.tracing;

import jakarta.inject.Inject;
import jakarta.ws.rs.GET;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.PathParam;
import jakarta.ws.rs.Produces;
import jakarta.ws.rs.core.MediaType;
import jakarta.ws.rs.core.Response;

@Path("/llm-traces")
public class LLMTraceResource {

    @Inject
    LLMCallTracker tracker;

    @Inject
    MermaidGraphGenerator mermaidGenerator;

    @GET
    @Path("/{conversationId}/trace")
    @Produces(MediaType.APPLICATION_JSON)
    public Response getTrace(@PathParam("conversationId") String id) {
        return tracker.getTrace(id)
                .map(trace -> Response.ok(trace).build())
                .orElse(Response.status(Response.Status.NOT_FOUND).build());
    }

    @GET
    @Path("/{conversationId}/mermaid")
    @Produces(MediaType.TEXT_PLAIN)
    public Response getMermaidDiagram(@PathParam("conversationId") String id) {
        return tracker.getTrace(id)
                .map(mermaidGenerator::generate)
                .map(mermaid -> Response.ok(mermaid).build())
                .orElse(Response.status(Response.Status.NOT_FOUND).build());
    }
}

I have also included a simple HTML page in src/main/resources/META-INF/resources to make it easier to display the generated Mermaid diagram.

Time to put all of this to work and see the result.

quarkus dev

After a few seconds, Quarkus is ready and accepting requests.

curl -X POST -H "Content-Type: text/plain" \
-d "What is 100 minus 18? Answer in a concise sentence." \
http://localhost:8080/chat

Grab the conversation ID from the log. Something like:

CONVERSATION ID: dd0b0cec-6889-42c7-9f04-f9d24d31674d

And point your browser to http://localhost:8080 where you can paste the conversation ID and see the resulting flow chart.

Mermaid Diagram
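You can also fetch the raw data behind the diagram directly from the two REST endpoints, substituting your own conversation ID:

# Raw trace as JSON
curl http://localhost:8080/llm-traces/dd0b0cec-6889-42c7-9f04-f9d24d31674d/trace

# Mermaid flowchart definition as plain text
curl http://localhost:8080/llm-traces/dd0b0cec-6889-42c7-9f04-f9d24d31674d/mermaid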

Please be aware that this is a very simplistic approach with plenty of room for improvement, but I wanted to give it a try and see how far I could push it. Feel free to play around with it and see if you can trace other events, or even an asynchronous call (the LLM invoking more than one tool at once).

Bonus: More Feature Ideas

  • Add some test coverage ;)

  • Use a persistent store (e.g., MongoDB) for LLMCallTracker

  • Use OpenTelemetry for distributed traces

  • Offload trace recording to async workers

  • Scrub or mask sensitive prompts before storing

Final Words

If you're building AI agents in Java, this pattern can be your observability blueprint. Not just logs, not just metrics, this is the story of every decision your AI makes, rendered in real time.

Now go ahead and make that black box transparent.
