Ever asked an LLM a question about your own data and received an incorrect or generic answer?
That’s because Large Language Models (LLMs) don’t know your private data.
In this article, we’ll build a complete Retrieval-Augmented Generation (RAG) pipeline using:
- Java
- PostgreSQL (with vector support)
- Ollama (local LLM + embeddings)
👉 No OpenAI / No paid APIs
👉 Fully local
👉 Practical and production-relevant
🧠 What is RAG?
Retrieval-Augmented Generation (RAG) is an architecture that improves LLM responses by:
1. Retrieving relevant data from a knowledge source
2. Passing that data to the LLM
3. Generating an answer grounded in that context
In simple terms:
Instead of guessing, the model first looks up relevant information and then answers.
🔍 Why Do We Need RAG?
LLMs are powerful, but they have limitations:
- ❌ They don’t know your private/company data
- ❌ Their knowledge is static
- ❌ They can hallucinate
RAG solves this by combining:
- Your data (database)
- Smart retrieval (vector search)
- LLM reasoning (generation)
📊 RAG Flow (This Project)
We will implement this pipeline:
```
Text → Embedding → Store in DB

Query → Embedding
        ↓
Vector Search (Top K)
        ↓
Pass to LLM
        ↓
Final Answer
```
⚙️ Prerequisites
1. Install PostgreSQL
Make sure PostgreSQL is installed and running. You also need the pgvector extension, which provides the VECTOR column type and the similarity operators used below.
2. Install Ollama (Local LLM)
```bash
sudo apt-get install zstd
curl -fsSL https://ollama.com/install.sh | sh
```
3. Pull Required Models
```bash
# LLM (for answer generation)
ollama pull llama3

# Embedding model
ollama pull nomic-embed-text
```
4. Verify Installation
```bash
ollama run llama3
```
If it responds, you’re ready.
🟢 Phase 1: Indexing (Store Data)
In this phase, we:
- Convert text → vector (embedding)
- Store it in PostgreSQL
Why Embeddings?
Embeddings convert text into numbers so we can measure similarity.
Example:
"OAuth authentication"
→ [0.12, -0.98, 0.45, ...]
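To see why number vectors are useful, here is a minimal illustrative sketch (the `Similarity` class is not part of the project code) of cosine similarity, the kind of comparison vector search performs under the hood:

```java
// Illustration only: cosine similarity between two embedding vectors.
// Values near 1 mean the texts are semantically similar; near 0 means unrelated.
class Similarity {
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```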
Database Table
First enable the pgvector extension, then create the table. The dimension 768 matches the output size of `nomic-embed-text`:

```sql
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE text_embeddings (
    id SERIAL PRIMARY KEY,
    content TEXT,
    embedding VECTOR(768)
);
```
Key Class: EmbeddingService.java
- Calls Ollama's embeddings endpoint
- Converts text → vector

Snippet:

```java
ClassicHttpResponse response = (ClassicHttpResponse) Request.post("http://localhost:11434/api/embeddings")
        .bodyString(body.toString(), ContentType.APPLICATION_JSON)
        .execute()
        .returnResponse();
```
This returns a numerical vector representation of the input text, which we store in the database.
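The response body looks like `{"embedding": [0.12, -0.98, ...]}`. A real project would parse it with a JSON library such as Jackson or org.json; as a dependency-free sketch (the `EmbeddingParser` class name is hypothetical, not from the repository), the array can be pulled out like this:

```java
// Hypothetical helper: extract the "embedding" array from Ollama's JSON
// response without a JSON library (a real project should use one).
class EmbeddingParser {
    static double[] parseEmbedding(String json) {
        int key = json.indexOf("\"embedding\"");
        int start = json.indexOf('[', key);
        int end = json.indexOf(']', start);
        String[] parts = json.substring(start + 1, end).split(",");
        double[] vec = new double[parts.length];
        for (int i = 0; i < parts.length; i++) {
            vec[i] = Double.parseDouble(parts[i].trim());
        }
        return vec;
    }
}
```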
Key Class: StorageService.java
Stores text + embedding into PostgreSQL:

```java
PreparedStatement ps = conn.prepareStatement(
    "INSERT INTO text_embeddings (content, embedding) VALUES (?, ?::vector)"
);
ps.setString(1, text);
ps.setString(2, vector);
ps.executeUpdate();
```
Each piece of text is stored along with its vector representation.
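Note that `ps.setString(2, vector)` expects the embedding as a pgvector literal such as `[0.12,-0.98,0.45]`. A small helper (the `VectorFormat` name is assumed, not from the repository) can produce that string from a `double[]`:

```java
// Hypothetical helper: format a double[] as a pgvector literal string
// so it can be bound as text and cast with ?::vector in SQL.
class VectorFormat {
    static String toVectorLiteral(double[] v) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < v.length; i++) {
            if (i > 0) sb.append(',');
            sb.append(v[i]);
        }
        return sb.append(']').toString();
    }
}
```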
🔵 Phase 2: Query (RAG Flow)
Step 1: User Query
"What is OAuth?"
Step 2: Convert Query → Embedding
Same process as storing text.
Step 3: Retrieve Relevant Data

```sql
SELECT content
FROM text_embeddings
ORDER BY embedding <-> ?::vector
LIMIT 3;
```

👉 This finds the most similar text chunks. pgvector's `<->` operator computes Euclidean (L2) distance between vectors, so rows are ordered from nearest to farthest and `LIMIT 3` keeps the top 3.
Key Class: Retriever.java
This is the R (Retrieval) in RAG.
```java
PreparedStatement ps = conn.prepareStatement(
    """
    SELECT content
    FROM text_embeddings
    ORDER BY embedding <-> ?::vector
    LIMIT ?
    """
);
ps.setString(1, vector);
ps.setInt(2, topK);
ResultSet rs = ps.executeQuery();
```
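The retrieved rows (read from the `ResultSet` with `rs.getString("content")`) are then combined into a single context block for the prompt. A minimal sketch, with an assumed `ContextBuilder` class that numbers the chunks for readability:

```java
import java.util.List;

// Sketch: turn the retrieved top-K chunks into one context string.
// Numbering the chunks is a stylistic choice, not required by RAG.
class ContextBuilder {
    static String joinContext(List<String> chunks) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < chunks.size(); i++) {
            sb.append(i + 1).append(". ").append(chunks.get(i)).append('\n');
        }
        return sb.toString();
    }
}
```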
Step 4: Generate Answer Using LLM
We pass the retrieved data to the LLM:

```
Context:
OAuth 2.0 is an authorization framework...

Question:
What is OAuth?
```
👉 The LLM generates a clean answer.
Key Class: LLMService.java
This is the G (Generation) in RAG.
Passing Context to the LLM
```java
String prompt = """
    Answer briefly in 2-3 sentences.

    Context:
    %s

    Question:
    %s
    """.formatted(context, query);
```
We inject retrieved data into the prompt so the LLM generates grounded answers.
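The prompt is then sent to Ollama's `/api/generate` endpoint. Here is a sketch of that call using the JDK's built-in HTTP client (assumptions: Ollama on localhost:11434, model `llama3`, non-streaming mode; the repository's LLMService may differ in details):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Sketch of the generation call; class and method names are assumptions.
class OllamaGenerate {

    // Build the JSON body for /api/generate, escaping the prompt minimally.
    static String buildBody(String model, String prompt) {
        String escaped = prompt.replace("\\", "\\\\")
                               .replace("\"", "\\\"")
                               .replace("\n", "\\n");
        return "{\"model\":\"" + model + "\",\"prompt\":\"" + escaped
                + "\",\"stream\":false}";
    }

    static String generate(String prompt) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:11434/api/generate"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(buildBody("llama3", prompt)))
                .build();
        // With "stream": false, the answer is in the "response" field of the JSON body.
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }
}
```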
🧪 Sample Output
```
--- Retrieved Context ---
OAuth 2.0 is an authorization framework.
JWT is used for secure authentication.

--- Final Answer ---
OAuth is an authorization framework used to grant access to resources...
```
🧠 What’s Really Happening?
This is the most important part to understand:
| Component | Role |
| --- | --- |
| Database | Stores knowledge |
| Retriever | Finds relevant information |
| LLM | Generates the answer |
👉 The LLM does NOT retrieve data
👉 The database does NOT generate answers
💻 Full Code
The project includes:
- EmbeddingService.java
- StorageService.java
- Retriever.java
- LLMService.java
- RAGApp.java
- pom.xml
👉 GitHub Repository
https://github.com/knowledgebase21st/Software-Engineering/tree/dev/AI/RAG
🚀 Why This Approach is Powerful
- Works with your own data
- Reduces hallucination
- Fully offline (with Ollama)
- Production-ready pattern
✅ Conclusion
We built a complete RAG pipeline using Java, PostgreSQL, and Ollama.
This approach combines:
- Your data
- Smart retrieval
- LLM reasoning
Result:
Accurate, context-aware answers using your own knowledge base.