Pallavi Mudkhede

Posted on Jul 1

How I Built Sherlog: an AI Log Analyzer with RAG, Spring AI, Groq & pgvector

#java #springboot #tutorial #ai

Every developer knows the feeling: production throws an error, and you're staring at a
wall of stack-trace text trying to find the one line that matters. So I built Sherlog —
an AI "log detective" that reads an application log, figures out the root cause, and hands
you a step-by-step fix as clean JSON. The twist: it doesn't just ask an LLM blindly. It uses
RAG (Retrieval-Augmented Generation) to ground every answer in a knowledge base of past
incidents.

In this post I'll walk through how it works and the real lessons I learned building it.

🔗 Repo: github.com/pallavimudkhede21/Sherlog

The stack

Java 21 + Spring Boot 4.1 — the backend
Spring AI 2.0 — the LLM framework (ChatClient, structured output, RAG advisors)
Groq (llama-3.1-8b-instant) — fast, OpenAI-compatible LLM inference
Local ONNX embeddings (all-MiniLM-L6-v2) — text → vectors, in-process, free
PostgreSQL + pgvector — the vector database (in Docker)

The idea: from a chatbot to a RAG system

A naive version just sends the log to an LLM and prints the answer. It works, but the advice
is generic — the model only knows its training data and the single log you pasted.

RAG changes that. Before asking the LLM, we retrieve relevant knowledge you own — past
incidents and their proven fixes — and augment the prompt with them. Now the model answers
grounded in your reality.

Here's the whole pipeline:

log → embed (local MiniLM) → search pgvector (top-3 incidents)
    → inject into prompt → Groq (JSON mode) → typed response

Part 1 — Structured output with Spring AI

The first win is getting typed JSON back from the LLM instead of parsing free text. Spring
AI's ChatClient does this with .entity():

return chat.prompt()
    .system(SYSTEM_PROMPT)
    .user(u -> u.text("Analyze these logs:\n\n{logs}").param("logs", request.getLogs()))
    .options(OpenAiChatOptions.builder()
        .responseFormat(OpenAiChatModel.ResponseFormat.builder()
            .type(OpenAiChatModel.ResponseFormat.Type.JSON_OBJECT).build()))
    .call()
    .entity(LogAnalysisResponse.class);   // ← schema + parsing, automatic

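.entity(LogAnalysisResponse.class) generates a JSON schema from the POJO, tells the model to
honor it, and maps the reply straight onto the object. Groq's JSON mode
(response_format: json_object) forces syntactically valid JSON, so the mapping no longer fails
on prose. JSON mode does not guarantee that the fields match the schema, so the schema
instructions in the prompt still matter.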
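Because JSON mode only guarantees valid syntax, it can help to let the target type reject replies that parse but are incomplete. Below is a minimal sketch of what such a type could look like. The name LogAnalysisResponseSketch and the fields rootCause and fixSteps are assumptions for illustration; the real LogAnalysisResponse in the repo may be shaped differently.

```java
import java.util.List;

// Hypothetical response shape: a root cause plus ordered fix steps.
// Field names are illustrative, not taken from the Sherlog repo.
record LogAnalysisResponseSketch(String rootCause, List<String> fixSteps) {

    // Compact constructor: runs on every construction, so a reply that is
    // valid JSON but misses a required field is rejected early.
    LogAnalysisResponseSketch {
        if (rootCause == null || rootCause.isBlank()) {
            throw new IllegalArgumentException("rootCause is required");
        }
        if (fixSteps == null || fixSteps.isEmpty()) {
            throw new IllegalArgumentException("at least one fix step is required");
        }
        fixSteps = List.copyOf(fixSteps); // immutable defensive copy
    }

    public static void main(String[] args) {
        var response = new LogAnalysisResponseSketch(
                "Connection pool exhausted",
                List.of("Check for unclosed connections", "Review pool size"));
        System.out.println(response.rootCause());
        response.fixSteps().forEach(step -> System.out.println(" - " + step));
    }
}
```

If a record like this were the .entity() target, a missing field would surface as an exception during mapping rather than as a null deep in the calling code.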

Part 2 — Embeddings + pgvector

To find "similar past incidents," we don't keyword-match — we compare meaning. An embedding
model turns text into a vector (a list of numbers); similar meaning → nearby vectors.
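To make "nearby vectors" concrete, here is a small sketch of cosine similarity, the distance measure behind the cosine index mentioned later. The three-dimensional vectors are made-up toy values standing in for real 384-dimensional embeddings, and the class name is hypothetical.

```java
class CosineSimilarityDemo {

    // Cosine similarity: dot(a, b) / (|a| * |b|). Close to 1 means the
    // vectors point the same way, close to 0 means they are unrelated.
    public static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same dimension");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            throw new IllegalArgumentException("Cosine is undefined for a zero vector");
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Toy vectors, invented for illustration only.
        float[] poolTimeout = {0.9f, 0.1f, 0.0f};
        float[] connectionRefused = {0.8f, 0.2f, 0.1f};
        float[] nullPointer = {0.0f, 0.1f, 0.9f};

        // The two connection-related vectors score much higher than the unrelated one.
        System.out.println("timeout vs refused: " + cosine(poolTimeout, connectionRefused));
        System.out.println("timeout vs NPE:     " + cosine(poolTimeout, nullPointer));
    }
}
```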

I run the embedding model locally with Spring AI's Transformers starter (ONNX
all-MiniLM-L6-v2, 384 dimensions) — no embedding API, no cost. The vectors live in
pgvector, a Postgres extension:

docker run -d --name pgvector-db \
  -e POSTGRES_PASSWORD=postgres -e POSTGRES_DB=lograg \
  -p 5432:5432 pgvector/pgvector:pg17

With schema initialization enabled (spring.ai.vectorstore.pgvector.initialize-schema=true),
Spring AI creates the vector_store table (with an HNSW cosine index) on startup, and a
loader seeds it from a small incidents.json:

vectorStore.add(documents);   // embeds each text and stores the vector

Part 3 — Wiring RAG in one line

This is the magic. Spring AI has a purpose-built QuestionAnswerAdvisor that does the
retrieve-and-augment step automatically:

.advisors(QuestionAnswerAdvisor.builder(vectorStore)
    .searchRequest(SearchRequest.builder().topK(3).build())
    .build())

Add that to the ChatClient call and every request now retrieves the 3 most similar past
incidents and injects them into the prompt before Groq answers. That's the whole "R" and "A"
of RAG in one line.
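For intuition about what the advisor saves you from writing, here is a plain-Java sketch of the same two steps over an in-memory list. It is a conceptual stand-in, not Spring AI's implementation: the Incident record, the prompt wording, and the toy embeddings are all assumptions, and a real setup would let pgvector do the similarity search.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class RetrieveAndAugmentSketch {

    // Hypothetical in-memory stand-in for one row of the vector store.
    record Incident(String text, float[] embedding) {}

    public static double cosine(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // "R": rank stored incidents by similarity to the query and keep the top k.
    public static List<Incident> retrieve(float[] query, List<Incident> store, int topK) {
        return store.stream()
                .sorted(Comparator.comparingDouble(
                        (Incident i) -> cosine(query, i.embedding())).reversed())
                .limit(topK)
                .toList();
    }

    // "A": put the retrieved incidents in front of the logs in the prompt.
    public static String augment(String logs, List<Incident> context) {
        String joined = context.stream()
                .map(i -> "- " + i.text())
                .collect(Collectors.joining("\n"));
        return "Relevant past incidents:\n" + joined
                + "\n\nAnalyze these logs:\n\n" + logs;
    }

    public static void main(String[] args) {
        // Toy two-dimensional embeddings, invented for illustration.
        List<Incident> store = List.of(
                new Incident("Pool exhausted; fixed by closing leaked connections", new float[]{1f, 0f}),
                new Incident("NPE in request handler; added null check", new float[]{0f, 1f}),
                new Incident("Connection timeout; long-running transaction", new float[]{0.9f, 0.1f}));

        float[] queryEmbedding = {1f, 0f}; // pretend this came from embedding the log
        List<Incident> top = retrieve(queryEmbedding, store, 2);
        System.out.println(augment("ERROR Connection is not available", top));
    }
}
```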

The payoff: does RAG actually help?

I added a toggle (?rag=true|false) to measure it. Same connection-timeout log, both ways:

RAG OFF: "Increase the maximum pool size or adjust the connection timeout." (vague)

RAG ON: "Increase spring.datasource.hikari.maximum-pool-size, ensure connections are
closed, and investigate long-running transactions." (specific — pulled from the knowledge base)

Same model, same log. The only difference is retrieval, and in this comparison the grounded
answer is noticeably more specific and actionable.

The hard-won lessons

A write-up like this makes the build look smooth. It wasn't. The real lessons:

Check library ↔ framework versions before you adopt. Spring Boot 4 uses Jackson 3 (tools.jackson), not Jackson 2, and needs Spring AI 2.0. Mixing versions cost me hours.
Read the error literally. A 404 Unknown request URL was just a missing /v1 in the base URL — Spring AI 2.0 changed the convention from the 1.x docs.
Structured output isn't magic. A small model returns prose unless you force JSON mode.
Trust the jar, not the blog. When an import failed, javap on the actual jar showed the class had moved to a nested type between milestone and GA. Reading bytecode beats guessing.
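On the "trust the jar" lesson: javap is what I used, but the same check can be done from Java itself with reflection. The sketch below is an alternative technique, not what the project does. It lists the nested types a class declares, and the default class name java.util.Map is only a placeholder so the example runs without extra dependencies; pass the class you are hunting for as an argument, with its jar on the classpath.

```java
import java.util.Arrays;
import java.util.List;

class NestedTypeLister {

    // Returns the binary names of the member types declared directly by the
    // given class, e.g. "java.util.Map$Entry" for java.util.Map.
    public static List<String> nestedTypeNames(Class<?> outer) {
        return Arrays.stream(outer.getDeclaredClasses())
                .map(Class::getName)
                .sorted()
                .toList();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Placeholder default; replace with the fully qualified class to inspect.
        String className = args.length > 0 ? args[0] : "java.util.Map";
        nestedTypeNames(Class.forName(className)).forEach(System.out::println);
    }
}
```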

Try it

The full source is on GitHub with a README that walks through setup:

🔗 github.com/pallavimudkhede21/Sherlog

If you're learning Spring AI, RAG, or pgvector, clone it and poke around — the
?rag=true|false toggle is a fun way to see what retrieval actually buys you.

Built with Spring Boot 4, Spring AI 2.0, Groq, and pgvector. Questions welcome in the comments!
