Every developer knows the feeling: production throws an error, and you're staring at a
wall of stack-trace text trying to find the one line that matters. So I built Sherlog —
an AI "log detective" that reads an application log, figures out the root cause, and hands
you a step-by-step fix as clean JSON. The twist: it doesn't just ask an LLM blindly. It uses
RAG (Retrieval-Augmented Generation) to ground every answer in a knowledge base of past
incidents.
In this post I'll walk through how it works and the real lessons I learned building it.
🔗 Repo: github.com/pallavimudkhede21/Sherlog
The stack
- Java 21 + Spring Boot 4.1 — the backend
- Spring AI 2.0 — the LLM framework (ChatClient, structured output, RAG advisors)
- Groq (llama-3.1-8b-instant) — fast, OpenAI-compatible LLM inference
- Local ONNX embeddings (all-MiniLM-L6-v2) — text → vectors, in-process, free
- PostgreSQL + pgvector — the vector database (in Docker)
The idea: from a chatbot to a RAG system
A naive version just sends the log to an LLM and prints the answer. It works, but the advice
is generic — the model only knows its training data and the single log you pasted.
RAG changes that. Before asking the LLM, we retrieve relevant knowledge you own — past
incidents and their proven fixes — and augment the prompt with them. Now the model answers
grounded in your reality.
Here's the whole pipeline:
log → embed (local MiniLM) → search pgvector (top-3 incidents)
→ inject into prompt → Groq (JSON mode) → typed response
Part 1 — Structured output with Spring AI
The first win is getting typed JSON back from the LLM instead of parsing free text. Spring
AI's ChatClient does this with .entity():
return chat.prompt()
    .system(SYSTEM_PROMPT)
    .user(u -> u.text("Analyze these logs:\n\n{logs}").param("logs", request.getLogs()))
    .options(OpenAiChatOptions.builder()
        .responseFormat(OpenAiChatModel.ResponseFormat.builder()
            .type(OpenAiChatModel.ResponseFormat.Type.JSON_OBJECT).build()))
    .call()
    .entity(LogAnalysisResponse.class); // ← schema + parsing, automatic
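.entity(LogAnalysisResponse.class) generates a JSON schema from the POJO, tells the model to
honor it, and maps the reply straight onto the object. Groq's JSON mode
(response_format: json_object) forces syntactically valid JSON, so the mapping doesn't fail
because the model wrapped its answer in prose. JSON mode alone doesn't guarantee that every
field in the schema is present, so the schema instructions still matter.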
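To make the "typed response" idea concrete, here is a minimal sketch of what a target type could look like. The post doesn't show the real fields of LogAnalysisResponse, so the record name LogAnalysisSketch, its three fields, and the isComplete() helper are all assumptions for illustration. The helper is a cheap guard you might run after mapping, because valid JSON can still have missing fields.

```java
import java.util.List;

// Hypothetical shape: the real LogAnalysisResponse fields are not shown in the post.
record LogAnalysisSketch(String rootCause, String severity, List<String> fixSteps) {

    /** Returns true only if every field the caller relies on was actually filled in. */
    boolean isComplete() {
        return rootCause != null && !rootCause.isBlank()
                && severity != null && !severity.isBlank()
                && fixSteps != null && !fixSteps.isEmpty();
    }
}
```

A caller could check isComplete() on the mapped object and fall back to a retry or an error response when it returns false, instead of handing a half-empty answer to the user.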
Part 2 — Embeddings + pgvector
To find "similar past incidents," we don't keyword-match — we compare meaning. An embedding
model turns text into a vector (a list of numbers); similar meaning → nearby vectors.
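As a rough illustration of "similar meaning → nearby vectors", here is a small sketch of cosine similarity in plain Java. The 3-dimensional vectors are made-up stand-ins for the 384-dimensional MiniLM embeddings, and the class and variable names are hypothetical. In the real pipeline the comparison happens inside pgvector, not in application code.

```java
// Illustrative only: tiny hand-written vectors stand in for real embeddings.
class CosineSimilarityDemo {

    /** Cosine similarity: dot(a, b) / (|a| * |b|), a value in [-1, 1]. */
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same dimension");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            throw new IllegalArgumentException("cosine similarity is undefined for a zero vector");
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        double[] timeoutLog   = {0.9, 0.1, 0.0}; // made-up embedding of the incoming log
        double[] poolIncident = {0.8, 0.2, 0.1}; // made-up embedding of a related incident
        double[] diskIncident = {0.0, 0.1, 0.9}; // made-up embedding of an unrelated incident

        // The related incident scores much closer to 1 than the unrelated one.
        System.out.println("pool: " + cosine(timeoutLog, poolIncident));
        System.out.println("disk: " + cosine(timeoutLog, diskIncident));
    }
}
```

pgvector's cosine operator reports a distance (1 minus this similarity), so "most similar" there means "smallest distance".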
I run the embedding model locally with Spring AI's Transformers starter (ONNX
all-MiniLM-L6-v2, 384 dimensions) — no embedding API, no cost. The vectors live in
pgvector, a Postgres extension:
docker run -d --name pgvector-db \
  -e POSTGRES_PASSWORD=postgres -e POSTGRES_DB=lograg \
  -p 5432:5432 pgvector/pgvector:pg17
Spring AI auto-creates the vector_store table (with an HNSW cosine index) on startup, and a
loader seeds it from a small incidents.json:
vectorStore.add(documents); // embeds each text and stores the vector
Part 3 — Wiring RAG in one line
This is the magic. Spring AI has a purpose-built QuestionAnswerAdvisor that does the
retrieve-and-augment step automatically:
.advisors(QuestionAnswerAdvisor.builder(vectorStore)
    .searchRequest(SearchRequest.builder().topK(3).build())
    .build())
Add that to the ChatClient call and every request now retrieves the 3 most similar past
incidents and injects them into the prompt before Groq answers. That's the whole "R" and "A"
of RAG in one line.
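If you're curious what "retrieve and augment" boils down to, here is a conceptual sketch in plain Java. It is not how QuestionAnswerAdvisor is implemented. The ScoredIncident type, the method names, and the prompt wording are all hypothetical, and the similarity scores are assumed to come from the vector search.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Conceptual sketch only; names and prompt wording are assumptions, not Spring AI's.
class RetrieveAndAugmentSketch {

    // Hypothetical type: a stored incident plus its similarity to the incoming log.
    record ScoredIncident(String text, double similarity) {}

    /** "Retrieve": keep the k incidents with the highest similarity, best first. */
    static List<ScoredIncident> topK(List<ScoredIncident> candidates, int k) {
        return candidates.stream()
                .sorted(Comparator.comparingDouble(ScoredIncident::similarity).reversed())
                .limit(k)
                .collect(Collectors.toList());
    }

    /** "Augment": put the retrieved incidents in front of the user's text. */
    static String augment(String userText, List<ScoredIncident> retrieved) {
        if (retrieved.isEmpty()) {
            return userText; // nothing retrieved: fall back to the plain prompt
        }
        String context = retrieved.stream()
                .map(ScoredIncident::text)
                .collect(Collectors.joining("\n- ", "- ", ""));
        return "Context from past incidents:\n" + context
                + "\n\nUsing the context above where relevant, answer:\n" + userText;
    }
}
```

The advisor saves you from writing and maintaining this glue yourself, but keeping the mental model in mind helps when debugging why a particular incident did or didn't show up in an answer.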
The payoff: does RAG actually help?
I added a toggle (?rag=true|false) to measure it. Same connection-timeout log, both ways:
RAG OFF: "Increase the maximum pool size or adjust the connection timeout." (vague)
RAG ON: "Increase spring.datasource.hikari.maximum-pool-size, ensure connections are
closed, and investigate long-running transactions." (specific — pulled from the knowledge base)
Same model, same log. The only difference is retrieval, and in this comparison the grounded
answer is noticeably more actionable: it names the exact property to change.
The hard-won lessons
The tutorial makes it look smooth. It wasn't. The real lessons:
- Check library ↔ framework versions before you adopt. Spring Boot 4 uses Jackson 3 (tools.jackson), not Jackson 2, and needs Spring AI 2.0. Mixing versions cost me hours.
- Read the error literally. A 404 Unknown request URL was just a missing /v1 in the base URL — Spring AI 2.0 changed the convention from the 1.x docs.
- Structured output isn't magic. A small model returns prose unless you force JSON mode.
- Trust the jar, not the blog. When an import failed, javap on the actual jar showed the class had moved to a nested type between milestone and GA. Reading bytecode beats guessing.
Try it
The full source is on GitHub with a README that walks through setup:
🔗 github.com/pallavimudkhede21/Sherlog
If you're learning Spring AI, RAG, or pgvector, clone it and poke around — the
?rag=true|false toggle is a fun way to see what retrieval actually buys you.
Built with Spring Boot 4, Spring AI 2.0, Groq, and pgvector. Questions welcome in the comments!