Ever asked an LLM a question about your own data and received an incorrect or generic answer?
That’s because Large Language Models (LLMs) don’t know your private data.
In this article, we’ll build a complete Retrieval-Augmented Generation (RAG) pipeline using:
- Java
- PostgreSQL (with vector support)
- Ollama (local LLM + embeddings)
👉 No OpenAI / No paid APIs
👉 Fully local
👉 Practical and production-relevant
🧠 What is RAG?
Retrieval-Augmented Generation (RAG) is an architecture that improves LLM responses by:
1. Retrieving relevant data from a knowledge source
2. Passing that data to the LLM
3. Generating an answer grounded in that context
In simple terms:
Instead of guessing, the model first looks up relevant information and then answers.
🔍 Why Do We Need RAG?
LLMs are powerful, but they have limitations:
- ❌ They don’t know your private/company data
- ❌ Their knowledge is static
- ❌ They can hallucinate
RAG solves this by combining:
- Your data (database)
- Smart retrieval (vector search)
- LLM reasoning (generation)
📊 RAG Flow (This Project)
We will implement this pipeline:
Indexing:
Text → Embedding → Store in DB

Query:
Query → Embedding
        ↓
Vector Search (Top K)
        ↓
Pass to LLM
        ↓
Final Answer
⚙️ Prerequisites
1. Install PostgreSQL
Make sure PostgreSQL is installed and running.
2. Install Ollama (Local LLM)
sudo apt-get install zstd
curl -fsSL https://ollama.com/install.sh | sh
3. Pull Required Models
# LLM (for answer generation)
ollama pull llama3
# Embedding model
ollama pull nomic-embed-text
4. Verify Installation
ollama run llama3
If it responds, you’re ready.
🟢 Phase 1: Indexing (Store Data)
In this phase, we:
- Convert text → vector (embedding)
- Store it in PostgreSQL
Why Embeddings?
Embeddings convert text into numbers so we can measure similarity.
Example:
"OAuth authentication"
→ [0.12, -0.98, 0.45, ...]
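Closeness between two such vectors is what makes retrieval work. A minimal Java sketch of cosine similarity, using toy 3-dimensional vectors rather than real 768-dimensional embeddings:

```java
public class CosineSimilarity {
    // Cosine similarity: dot(a, b) / (|a| * |b|).
    // Values near 1.0 mean the texts point in nearly the same "semantic direction".
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Toy vectors, not produced by a real embedding model
        double[] oauth = {0.12, -0.98, 0.45};
        double[] auth  = {0.10, -0.95, 0.40};
        double[] pizza = {-0.80, 0.20, 0.10};
        System.out.printf("oauth vs auth:  %.3f%n", cosine(oauth, auth));
        System.out.printf("oauth vs pizza: %.3f%n", cosine(oauth, pizza));
    }
}
```

Related texts score close to 1.0; unrelated texts score much lower.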
Database Table
-- The pgvector extension must be enabled once per database
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE text_embeddings (
    id SERIAL PRIMARY KEY,
    content TEXT,
    embedding VECTOR(768)  -- nomic-embed-text produces 768-dimensional vectors
);
Key Class: EmbeddingService.java
Calls Ollama's embeddings API and converts text → vector.
Snippet:
ClassicHttpResponse response = (ClassicHttpResponse) Request
        .post("http://localhost:11434/api/embeddings")
        .bodyString(body.toString(), ContentType.APPLICATION_JSON)
        .execute()
        .returnResponse();
This returns a numerical vector representation of the input text, which we store in the database.
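For reference, the body sent to that endpoint is a small JSON object naming the embedding model and the input text. A sketch of building it with plain string formatting (the method name is an illustration, not from the repo; real code should use a JSON library such as Jackson to handle escaping):

```java
public class EmbeddingRequest {
    // Builds the JSON body for Ollama's /api/embeddings endpoint.
    // Caution: this does no JSON escaping of the prompt; fine for a sketch,
    // but use a JSON library for arbitrary input text.
    static String body(String model, String prompt) {
        return String.format("{\"model\": \"%s\", \"prompt\": \"%s\"}", model, prompt);
    }

    public static void main(String[] args) {
        System.out.println(body("nomic-embed-text", "OAuth authentication"));
        // → {"model": "nomic-embed-text", "prompt": "OAuth authentication"}
    }
}
```

Ollama replies with a JSON object containing an "embedding" array of numbers.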
Key Class: StorageService.java
Stores text + embedding into PostgreSQL
PreparedStatement ps = conn.prepareStatement(
        "INSERT INTO text_embeddings (content, embedding) VALUES (?, ?::vector)"
);
ps.setString(1, text);
ps.setString(2, vector);
ps.executeUpdate();
Each piece of text is stored along with its vector representation.
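The string bound with ps.setString(2, vector) must be in pgvector's text format: square brackets around comma-separated numbers, which the ?::vector cast then parses. A small helper that produces it (hypothetical, not from the repo):

```java
import java.util.StringJoiner;

public class VectorFormat {
    // Converts a float array into pgvector's literal form, e.g. "[0.12,-0.98,0.45]",
    // suitable for binding against a "?::vector" placeholder.
    static String toPgVector(float[] v) {
        StringJoiner sj = new StringJoiner(",", "[", "]");
        for (float x : v) sj.add(Float.toString(x));
        return sj.toString();
    }

    public static void main(String[] args) {
        System.out.println(toPgVector(new float[]{0.12f, -0.98f, 0.45f}));
        // → [0.12,-0.98,0.45]
    }
}
```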
🔵 Phase 2: Query (RAG Flow)
Step 1: User Query
"What is OAuth?"
Step 2: Convert Query → Embedding
Same process as storing text.
Step 3: Retrieve Relevant Data
SELECT content
FROM text_embeddings
ORDER BY embedding <-> ?::vector
LIMIT 3;
👉 This finds the most similar text chunks. The <-> operator comes from pgvector and orders rows by Euclidean (L2) distance; pgvector also offers <=> for cosine distance.
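To make that ordering concrete, here is the same top-K logic done in plain Java over a handful of in-memory vectors (illustration only; in the real pipeline PostgreSQL does this work, and the toy 2-dimensional vectors are made up):

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class TopKSearch {
    static double euclidean(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Returns the k stored texts whose vectors are closest to the query vector,
    // mirroring: ORDER BY embedding <-> ?::vector LIMIT k
    static List<String> topK(Map<String, double[]> store, double[] query, int k) {
        return store.entrySet().stream()
                .sorted(Comparator.comparingDouble(e -> euclidean(e.getValue(), query)))
                .limit(k)
                .map(Map.Entry::getKey)
                .toList();
    }

    public static void main(String[] args) {
        Map<String, double[]> store = Map.of(
                "OAuth 2.0 is an authorization framework", new double[]{0.9, 0.1},
                "JWT is used for secure authentication", new double[]{0.7, 0.3},
                "Pizza dough needs time to rise", new double[]{-0.8, 0.5});
        double[] query = {0.85, 0.15};  // pretend embedding of "What is OAuth?"
        System.out.println(topK(store, query, 2));
    }
}
```

The OAuth and JWT chunks come back first; the unrelated chunk is filtered out by the LIMIT.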
Key Class: Retriever.java
This is the R (Retrieval) in RAG.
PreparedStatement ps = conn.prepareStatement(
        """
        SELECT content
        FROM text_embeddings
        ORDER BY embedding <-> ?::vector
        LIMIT ?
        """
);
ps.setString(1, vector);
ps.setInt(2, topK);
ResultSet rs = ps.executeQuery();
Step 4: Generate Answer Using LLM
We pass retrieved data to the LLM:
Context:
OAuth 2.0 is an authorization framework...
Question:
What is OAuth?
👉 The LLM generates a clean answer.
Key Class: LLMService.java
This is the G (Generation) in RAG.
Passing Context to the LLM
String prompt = """
        Answer briefly in 2-3 sentences.
        Context:
        %s
        Question:
        %s
        """.formatted(context, query);
We inject retrieved data into the prompt so the LLM generates grounded answers.
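Putting retrieval and generation together: the rows returned by the Retriever can be joined into a single context string before formatting the prompt. A self-contained sketch (the class and method names here are assumptions, not the repo's actual code):

```java
import java.util.List;

public class PromptBuilder {
    // Joins retrieved chunks into one context block and injects it,
    // together with the user's question, into the prompt template.
    static String build(List<String> chunks, String query) {
        String context = String.join("\n", chunks);
        return """
                Answer briefly in 2-3 sentences.
                Context:
                %s
                Question:
                %s
                """.formatted(context, query);
    }

    public static void main(String[] args) {
        String prompt = build(
                List.of("OAuth 2.0 is an authorization framework.",
                        "JWT is used for secure authentication."),
                "What is OAuth?");
        System.out.println(prompt);
    }
}
```

The resulting string is what gets sent to llama3 via Ollama's generate endpoint.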
🧪 Sample Output
--- Retrieved Context ---
OAuth 2.0 is an authorization framework.
JWT is used for secure authentication.
--- Final Answer ---
OAuth is an authorization framework used to grant access to resources...
🧠 What’s Really Happening?
This is the most important part to understand:
| Component | Role                       |
|-----------|----------------------------|
| Database  | Stores knowledge           |
| Retriever | Finds relevant information |
| LLM       | Generates the answer       |
👉 The LLM does NOT retrieve data
👉 The database does NOT generate answers
💻 Full Code
The project includes:
- EmbeddingService.java
- StorageService.java
- Retriever.java
- LLMService.java
- RAGApp.java
- pom.xml
👉 GitHub Repository
https://github.com/knowledgebase21st/Software-Engineering/tree/dev/AI/RAG
🚀 Why This Approach is Powerful
- Works with your own data
- Reduces hallucination
- Fully offline (with Ollama)
- Production-ready pattern
✅ Conclusion
We built a complete RAG pipeline using Java, PostgreSQL, and Ollama.
This approach combines:
- Your data
- Smart retrieval
- LLM reasoning
Result:
Accurate, context-aware answers using your own knowledge base.
💬 Top comments

> It would have been better if you could use Spring Data JPA for DB operations.

Good point: Spring Data JPA would make the data layer cleaner. I kept it simple here to focus on the RAG pipeline and pgvector integration, but for production I'd definitely consider JPA or a hybrid approach for better maintainability.