Pallavi Mudkhede

How I Built Sherlog: an AI Log Analyzer with RAG, Spring AI, Groq & pgvector

Every developer knows the feeling: production throws an error, and you're staring at a
wall of stack-trace text trying to find the one line that matters. So I built Sherlog,
an AI "log detective" that reads an application log, figures out the root cause, and hands
you a step-by-step fix as clean JSON. The twist: it doesn't just ask an LLM blindly. It uses
RAG (Retrieval-Augmented Generation) to ground every answer in a knowledge base of past
incidents.

In this post I'll walk through how it works and the real lessons I learned building it.

🔗 Repo: github.com/pallavimudkhede21/Sherlog

The stack

  • Java 21 + Spring Boot 4.1 — the backend
  • Spring AI 2.0 — the LLM framework (ChatClient, structured output, RAG advisors)
  • Groq (llama-3.1-8b-instant) — fast, OpenAI-compatible LLM inference
  • Local ONNX embeddings (all-MiniLM-L6-v2) — text → vectors, in-process, free
  • PostgreSQL + pgvector — the vector database (in Docker)

The idea: from a chatbot to a RAG system

A naive version just sends the log to an LLM and prints the answer. It works, but the advice
is generic — the model only knows its training data and the single log you pasted.

RAG changes that. Before asking the LLM, we retrieve relevant knowledge you own — past
incidents and their proven fixes — and augment the prompt with them. Now the model answers
grounded in your reality.

Here's the whole pipeline:

log → embed (local MiniLM) → search pgvector (top-3 incidents)
    → inject into prompt → Groq (JSON mode) → typed response

Part 1 — Structured output with Spring AI

The first win is getting typed JSON back from the LLM instead of parsing free text. Spring
AI's ChatClient does this with .entity():

return chat.prompt()
    .system(SYSTEM_PROMPT)
    .user(u -> u.text("Analyze these logs:\n\n{logs}").param("logs", request.getLogs()))
    .options(OpenAiChatOptions.builder()
        .responseFormat(OpenAiChatModel.ResponseFormat.builder()
            .type(OpenAiChatModel.ResponseFormat.Type.JSON_OBJECT).build()))
    .call()
    .entity(LogAnalysisResponse.class);   // ← schema + parsing, automatic

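.entity(LogAnalysisResponse.class) generates a JSON schema from the POJO, tells the model to
honor it, and maps the reply straight onto the object. Groq's JSON mode
(response_format: json_object) forces syntactically valid JSON, so the mapping no longer
fails because the model answered in prose. JSON mode does not enforce the schema itself, so
the field names and types still depend on the model following the schema instructions.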
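
For readers wondering what the target type might look like, here is a minimal sketch. The field names (rootCause, severity, fixSteps) are assumptions for illustration, not Sherlog's actual class; the only thing taken from the post is that the response carries a root cause and a step-by-step fix.

```java
import java.util.List;

// Hypothetical response type: field names are illustrative assumptions,
// not the real LogAnalysisResponse from the Sherlog repo.
record LogAnalysisResponse(String rootCause, String severity, List<String> fixSteps) {

    // Compact constructor: reject a reply that parsed as JSON but has no root cause,
    // and normalize a missing step list to an empty, immutable one.
    LogAnalysisResponse {
        if (rootCause == null || rootCause.isBlank()) {
            throw new IllegalArgumentException("rootCause is required");
        }
        fixSteps = fixSteps == null ? List.of() : List.copyOf(fixSteps);
    }

    public static void main(String[] args) {
        LogAnalysisResponse response = new LogAnalysisResponse(
                "Connection pool exhausted",
                "HIGH",
                List.of("Increase the pool size", "Close connections after use"));
        System.out.println(response.rootCause());
        response.fixSteps().forEach(step -> System.out.println(" - " + step));
    }
}
```

Validating in the constructor is one way to catch replies that are valid JSON but still incomplete, which JSON mode alone does not rule out.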

Part 2 — Embeddings + pgvector

To find "similar past incidents," we don't keyword-match — we compare meaning. An embedding
model turns text into a vector (a list of numbers); similar meaning → nearby vectors.
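
To make "nearby vectors" concrete, here is a small sketch of cosine similarity, the measure behind the cosine index mentioned below. The three-dimensional vectors are made-up stand-ins for real 384-dimensional embeddings, and in the actual app pgvector does this computation, not hand-written Java.

```java
final class CosineSimilarity {

    // cos(a, b) = (a . b) / (|a| * |b|): 1.0 means same direction, 0.0 means orthogonal.
    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            throw new IllegalArgumentException("zero vector has no direction");
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Toy vectors (assumed values), standing in for embedded log messages.
        float[] timeout = {0.9f, 0.1f, 0.0f};
        float[] poolExhausted = {0.8f, 0.2f, 0.1f};
        float[] nullPointer = {0.0f, 0.1f, 0.9f};

        // The two connection-related vectors score much higher than the unrelated pair.
        System.out.println("timeout vs poolExhausted: " + cosine(timeout, poolExhausted));
        System.out.println("timeout vs nullPointer:   " + cosine(timeout, nullPointer));
    }
}
```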

I run the embedding model locally with Spring AI's Transformers starter (ONNX
all-MiniLM-L6-v2, 384 dimensions) — no embedding API, no cost. The vectors live in
pgvector, a Postgres extension:

docker run -d --name pgvector-db \
  -e POSTGRES_PASSWORD=postgres -e POSTGRES_DB=lograg \
  -p 5432:5432 pgvector/pgvector:pg17

Spring AI auto-creates the vector_store table (with an HNSW cosine index) on startup, and a
loader seeds it from a small incidents.json:

vectorStore.add(documents);   // embeds each text and stores the vector

Part 3 — Wiring RAG in one line

This is the magic. Spring AI has a purpose-built QuestionAnswerAdvisor that does the
retrieve-and-augment step automatically:

.advisors(QuestionAnswerAdvisor.builder(vectorStore)
    .searchRequest(SearchRequest.builder().topK(3).build())
    .build())

Add that to the ChatClient call and every request now retrieves the 3 most similar past
incidents and injects them into the prompt before Groq answers. That's the whole "R" and "A"
of RAG in one line.
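
If the advisor feels like a black box, the retrieve-and-augment step can be pictured roughly as follows. This is a plain-Java illustration, not Spring AI's implementation: the ScoredIncident type, the augment method, and the prompt wording are all hypothetical, and the similarity scores are assumed to come from the vector search.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

final class RetrieveAndAugment {

    // A stored incident plus its similarity to the incoming log (higher means closer).
    record ScoredIncident(String text, double similarity) {}

    // Keep the topK most similar incidents and place them in front of the user's logs.
    static String augment(String logs, List<ScoredIncident> candidates, int topK) {
        if (candidates.isEmpty() || topK <= 0) {
            return "Analyze these logs:\n\n" + logs; // nothing retrieved: plain prompt
        }
        String context = candidates.stream()
                .sorted(Comparator.comparingDouble(ScoredIncident::similarity).reversed())
                .limit(topK)
                .map(ScoredIncident::text)
                .collect(Collectors.joining("\n- ", "- ", ""));
        return "Relevant past incidents:\n" + context
                + "\n\nAnalyze these logs:\n\n" + logs;
    }

    public static void main(String[] args) {
        // Made-up incidents and scores, for illustration only.
        List<ScoredIncident> hits = List.of(
                new ScoredIncident("Disk full on log volume", 0.21),
                new ScoredIncident("Hikari pool exhausted under load", 0.93),
                new ScoredIncident("Slow query held connections open", 0.78),
                new ScoredIncident("Expired TLS certificate", 0.10));
        System.out.println(augment("ERROR Connection is not available, request timed out", hits, 3));
    }
}
```

The point of the sketch is only the shape of the step: rank, cut to the top K, and prepend the result to the prompt before the model sees it.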

The payoff: does RAG actually help?

I added a toggle (?rag=true|false) to measure it. Same connection-timeout log, both ways:

RAG OFF: "Increase the maximum pool size or adjust the connection timeout." (vague)

RAG ON: "Increase spring.datasource.hikari.maximum-pool-size, ensure connections are
closed, and investigate long-running transactions."
(specific — pulled from the knowledge base)

Same model, same log. The only difference is retrieval, and in this comparison the grounded
answer is clearly more actionable: it names the exact Hikari property to change.

The hard-won lessons

The tutorial makes it look smooth. It wasn't. The real lessons:

  1. Check library ↔ framework versions before you adopt. Spring Boot 4 uses Jackson 3 (tools.jackson), not Jackson 2, and needs Spring AI 2.0. Mixing versions cost me hours.
  2. Read the error literally. A 404 Unknown request URL was just a missing /v1 in the base URL — Spring AI 2.0 changed the convention from the 1.x docs.
  3. Structured output isn't magic. A small model returns prose unless you force JSON mode.
  4. Trust the jar, not the blog. When an import failed, javap on the actual jar showed the class had moved to a nested type between milestone and GA. Reading bytecode beats guessing.
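
For lesson 3, a cheap guard in front of the mapping step can make the failure obvious when a model ignores JSON mode or it is not enabled. The sketch below is an assumption on my part rather than something from the Sherlog repo, and the JsonGuard name is hypothetical; it is a shape check, not a JSON parser.

```java
final class JsonGuard {

    // Cheap shape check, not a parser: true only if the reply is wrapped in { ... }.
    static boolean looksLikeJsonObject(String reply) {
        if (reply == null) {
            return false;
        }
        String trimmed = reply.strip();
        return trimmed.length() >= 2 && trimmed.startsWith("{") && trimmed.endsWith("}");
    }

    public static void main(String[] args) {
        String prose = "Sure! The root cause is a connection timeout.";
        String json = "{\"rootCause\": \"connection timeout\"}";
        System.out.println(looksLikeJsonObject(prose));
        System.out.println(looksLikeJsonObject(json));
    }
}
```

A check like this will not catch malformed JSON, but it lets you log the raw reply and fail early with a clear message instead of a deserialization stack trace.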

Try it

The full source is on GitHub with a README that walks through setup:

🔗 github.com/pallavimudkhede21/Sherlog

If you're learning Spring AI, RAG, or pgvector, clone it and poke around — the
?rag=true|false toggle is a fun way to see what retrieval actually buys you.

Built with Spring Boot 4, Spring AI 2.0, Groq, and pgvector. Questions welcome in the comments!
