This blog post explores how to bridge the gap between traditional relational data and unstructured documents to build a powerful, hypothetical recruitment tool.
Introduction
The Candidate RAG System leverages a sophisticated technology stack to turn a folder of resumes into an interactive, searchable knowledge base. By combining Docling’s hierarchical document processing with PostgreSQL’s relational and vector capabilities (via pgvector), the system allows recruiters to query candidate data with natural language while maintaining strict data integrity.
The Architecture: A Hybrid Approach
Bob’s implementation uses a “Relational + Vector” hybrid architecture. While standard candidate details (name, DOB, file paths) reside in traditional SQL tables, the actual content of the resumes is transformed into high-dimensional vectors for semantic search.
Key Components:
- Document Processing: Docling is used to extract and chunk resume content from PDF and DOCX formats.
- Relational & Vector Storage: A single PostgreSQL instance manages both metadata and embeddings using the pgvector extension.
- Local AI Engine: Ollama runs the models locally, using granite-embedding:30m for vectors and granite4:latest for the chat interface.
Technologies Used for this project
- **[Docling](https://github.com/docling-project/docling)**: Document parsing and chunking
- **[PostgreSQL](https://www.postgresql.org/)**: Database
- **[pgvector](https://github.com/pgvector/pgvector)**: Vector similarity search
- **[Ollama](https://ollama.ai/)**: Local LLM and embeddings
- **[Streamlit](https://streamlit.io/)**: Web interface
- **[Podman](https://podman.io/)**: Container management
The Data Workflow
The system operates through two primary pipelines: Ingestion and Querying.
1. Document Ingestion Flow
- Extraction: Docling converts resumes into hierarchical chunks to preserve document structure.
- Vectorization: Ollama generates 384-dimensional embeddings for each text chunk.
- Storage: Chunks are saved in the resume_chunks table, linked to the candidates table via a foreign key.
2. RAG Query Flow
When a user asks, “Which candidate has experience with Python?”:
- The query is converted into an embedding.
- PostgreSQL performs a vector similarity search (embedding <=> query_vector) to find the most relevant resume sections; a sketch of this query is shown just after this list.
- The retrieved text is sent to the Ollama LLM as context to generate a natural language answer.
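The actual similarity query lives in the project's database_service.py, which is not reproduced in this post. As an illustration only, here is a minimal sketch of what such a search could look like, assuming psycopg2 as the driver and the candidates/resume_chunks schema introduced in the next section; the function name and SQL are hypothetical, not copied from the repository.
# similarity_search_sketch.py -- hypothetical helper, not the repository's database_service.py
import psycopg2
from src.config import config

def search_similar_chunks(query_embedding, top_k=5, candidate_id=None):
    """Return the top_k resume chunks closest to the query embedding (cosine distance)."""
    # pgvector accepts a vector literal such as '[0.1,0.2,...]' cast with ::vector
    vector_literal = "[" + ",".join(str(x) for x in query_embedding) + "]"
    sql = """
        SELECT c.firstname, c.lastname, r.chunk_text,
               1 - (r.embedding <=> %s::vector) AS similarity
        FROM resume_chunks r
        JOIN candidates c ON c.id = r.candidate_id
        WHERE (%s::int IS NULL OR r.candidate_id = %s)
        ORDER BY r.embedding <=> %s::vector
        LIMIT %s
    """
    conn = psycopg2.connect(config.database_url)
    try:
        with conn.cursor() as cur:
            cur.execute(sql, (vector_literal, candidate_id, candidate_id, vector_literal, top_k))
            rows = cur.fetchall()
    finally:
        conn.close()
    return [
        {"firstname": fn, "lastname": ln, "chunk_text": text, "similarity": sim}
        for fn, ln, text, sim in rows
    ]
The <=> operator is pgvector's cosine-distance operator (matching the vector_cosine_ops index created later), so 1 minus the distance is reported as a relevance score.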
Database Schema Design
Bob utilizes a clean relational schema to ensure that every vector chunk is always tied back to a specific person.
CREATE TABLE candidates (
id SERIAL PRIMARY KEY,
firstname VARCHAR(100),
lastname VARCHAR(100),
date_of_birth DATE,
resume_path VARCHAR(255),
created_at TIMESTAMP
);
CREATE TABLE resume_chunks (
id SERIAL PRIMARY KEY,
candidate_id INTEGER REFERENCES candidates(id),
chunk_text TEXT,
chunk_index INTEGER,
embedding vector(384), -- granite-embedding:30m dimension
metadata JSONB,
created_at TIMESTAMP
);
| **Table** | **Key Columns** | **Types** | **Purpose** |
| ----------------- | -------------------------------------------- | ------------------------------ | ----------------------------------------- |
| **Candidates** | `id`, `firstname`, `lastname`, `resume_path` | Serial, Varchar | Stores primary candidate records. |
| **Resume Chunks** | `candidate_id`, `chunk_text`, `embedding` | Integer, Text, **Vector(384)** | Stores the AI-searchable resume segments. |
Project Structure
The project is structured as follows:
docling-rag-postgresql/
├── app.py # Streamlit GUI application
├── docker-compose.yml # PostgreSQL setup
├── init.sql # Database initialization
├── requirements.txt # Python dependencies
├── .env.example # Environment variables template
├── README.md # This file
├── Docs/ # Documentation
│ ├── Architecture.md # System architecture with Mermaid diagrams
│ └── QUICKSTART.md # Quick start guide
├── scripts/ # Utility scripts
│ ├── setup.sh # Setup script
│ └── run.sh # Run script
├── resumes/ # Resume files
│ ├── resume1.docx
│ ├── resume2.docx
│ ├── resume3.docx
│ └── resume4.pdf
└── src/ # Source code
├── __init__.py
├── config.py # Configuration management
├── database_service.py # PostgreSQL operations
├── document_processor.py # Docling document processing
├── embedding_service.py # Ollama embeddings
├── rag_service.py # RAG pipeline
└── data_loader.py # Resume loading
Database and LLM
- Configuration of the PostgreSQL Database and the local Ollama ⚙️
# Database
DB_HOST=localhost
DB_PORT=5432
DB_NAME=candidates_rag
DB_USER=postgres
DB_PASSWORD=postgres
# Ollama
OLLAMA_HOST=http://localhost:11434
EMBEDDING_MODEL=granite-embedding:30m
LLM_MODEL=granite4:latest
# Application
CHUNK_SIZE=500
CHUNK_OVERLAP=50
TOP_K_RESULTS=5
- The corresponding code, src/config.py, is shown below:
"""Configuration management for the RAG application."""
import os
from dotenv import load_dotenv
load_dotenv()
class Config:
"""Application configuration."""
# Database Configuration
DB_HOST = os.getenv("DB_HOST", "localhost")
DB_PORT = int(os.getenv("DB_PORT", "5432"))
DB_NAME = os.getenv("DB_NAME", "candidates_rag")
DB_USER = os.getenv("DB_USER", "postgres")
DB_PASSWORD = os.getenv("DB_PASSWORD", "postgres")
# Ollama Configuration
OLLAMA_HOST = os.getenv("OLLAMA_HOST", "http://localhost:11434")
EMBEDDING_MODEL = os.getenv("EMBEDDING_MODEL", "granite-embedding:30m")
LLM_MODEL = os.getenv("LLM_MODEL", "granite4:latest")
# Application Configuration
CHUNK_SIZE = int(os.getenv("CHUNK_SIZE", "500"))
CHUNK_OVERLAP = int(os.getenv("CHUNK_OVERLAP", "50"))
TOP_K_RESULTS = int(os.getenv("TOP_K_RESULTS", "5"))
@property
def database_url(self):
"""Get database connection URL."""
return f"postgresql://{self.DB_USER}:{self.DB_PASSWORD}@{self.DB_HOST}:{self.DB_PORT}/{self.DB_NAME}"
config = Config()
# Made with Bob
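A quick, hypothetical usage of the Config class above: the database_url property resolves to a standard connection URL that psycopg2 can consume directly (with the defaults from .env.example it reads postgresql://postgres:postgres@localhost:5432/candidates_rag).
# Hypothetical usage of the Config class shown above
import psycopg2
from src.config import config

print(config.database_url)                       # postgresql://postgres:postgres@localhost:5432/candidates_rag

conn = psycopg2.connect(config.database_url)     # psycopg2 accepts a libpq-style URL
print(conn.get_dsn_parameters()["dbname"])       # -> candidates_rag
conn.close()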
- Sample Data injection for the database 💉
-- Enable pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;
-- Create candidates table
CREATE TABLE IF NOT EXISTS candidates (
id SERIAL PRIMARY KEY,
firstname VARCHAR(100) NOT NULL,
lastname VARCHAR(100) NOT NULL,
date_of_birth DATE NOT NULL,
resume_path VARCHAR(255),
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Create resume_chunks table for storing document chunks with embeddings
CREATE TABLE IF NOT EXISTS resume_chunks (
id SERIAL PRIMARY KEY,
candidate_id INTEGER REFERENCES candidates(id) ON DELETE CASCADE,
chunk_text TEXT NOT NULL,
chunk_index INTEGER NOT NULL,
embedding vector(384), -- granite-embedding:30m produces 384-dimensional vectors
metadata JSONB,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Create index for vector similarity search
CREATE INDEX IF NOT EXISTS resume_chunks_embedding_idx
ON resume_chunks USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
-- Create index for candidate lookups
CREATE INDEX IF NOT EXISTS resume_chunks_candidate_id_idx
ON resume_chunks(candidate_id);
-- Insert sample candidates (will be populated by the application)
INSERT INTO candidates (firstname, lastname, date_of_birth, resume_path) VALUES
('John', 'Smith', '1990-05-15', 'resumes/resume1.docx'),
('Sarah', 'Johnson', '1988-08-22', 'resumes/resume2.docx'),
('Michael', 'Williams', '1992-03-10', 'resumes/resume3.docx'),
('Emily', 'Brown', '1995-11-30', 'resumes/resume4.pdf')
ON CONFLICT DO NOTHING;
-- Made with Bob
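One detail in init.sql worth calling out is the ivfflat index: it speeds up similarity search by clustering vectors into 100 lists and scanning only a subset of them, which makes the search approximate. If recall ever matters more than latency, pgvector lets you probe more lists per query; the snippet below is an illustrative, hypothetical example of setting that from Python, not code from the repository.
# Hypothetical tuning example: increase ivfflat recall for the current session
import psycopg2
from src.config import config

conn = psycopg2.connect(config.database_url)
with conn.cursor() as cur:
    # pgvector defaults to 1 probe; probing more lists improves recall at the cost of latency.
    cur.execute("SET ivfflat.probes = 10;")
conn.close()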
Information Ingestion
The information ingestion process, detailed below, consists of several stages. This pipeline is the core of the application and the foundation on which all of the RAG and relational operations are built.
- data_loader: This orchestration script manages the high-level workflow of importing resumes from the local file system. It coordinates the document processor, embedding service, and database service to ensure each file is parsed, vectorized, and stored correctly.
# data_loader.py
"""Data loader for processing and loading candidate resumes."""
import logging
from pathlib import Path
from typing import List, Dict, Any
from src.database_service import DatabaseService
from src.document_processor import DocumentProcessor
from src.embedding_service import EmbeddingService
logger = logging.getLogger(__name__)
class DataLoader:
"""Load and process candidate resumes into the database."""
def __init__(
self,
db_service: DatabaseService,
doc_processor: DocumentProcessor,
embedding_service: EmbeddingService
):
"""Initialize data loader.
Args:
db_service: Database service instance
doc_processor: Document processor instance
embedding_service: Embedding service instance
"""
self.db_service = db_service
self.doc_processor = doc_processor
self.embedding_service = embedding_service
logger.info("DataLoader initialized")
def load_candidate_resume(self, candidate_id: int) -> bool:
"""Load and process a candidate's resume.
Args:
candidate_id: Candidate ID
Returns:
True if successful, False otherwise
"""
try:
# Get candidate info
candidate = self.db_service.get_candidate_by_id(candidate_id)
if not candidate:
logger.error(f"Candidate {candidate_id} not found")
return False
resume_path = candidate['resume_path']
if not resume_path:
logger.error(f"No resume path for candidate {candidate_id}")
return False
# Check if file exists
if not Path(resume_path).exists():
logger.error(f"Resume file not found: {resume_path}")
return False
logger.info(f"Processing resume for {candidate['firstname']} {candidate['lastname']}")
# Process document
chunks = self.doc_processor.process_document(resume_path)
if not chunks:
logger.warning(f"No chunks extracted from {resume_path}")
return False
# Generate embeddings
chunk_texts = [chunk['text'] for chunk in chunks]
embeddings = self.embedding_service.generate_embeddings(chunk_texts)
# Insert into database
self.db_service.insert_resume_chunks(
candidate_id=candidate_id,
chunks=chunks,
embeddings=embeddings
)
logger.info(f"Successfully loaded resume for candidate {candidate_id}")
return True
except Exception as e:
logger.error(f"Error loading resume for candidate {candidate_id}: {e}")
return False
def load_all_resumes(self) -> Dict[int, bool]:
"""Load all candidate resumes.
Returns:
Dictionary mapping candidate IDs to success status
"""
results = {}
candidates = self.db_service.get_all_candidates()
logger.info(f"Loading resumes for {len(candidates)} candidates")
for candidate in candidates:
candidate_id = candidate['id']
success = self.load_candidate_resume(candidate_id)
results[candidate_id] = success
successful = sum(1 for v in results.values() if v)
logger.info(f"Successfully loaded {successful}/{len(candidates)} resumes")
return results
def reload_candidate_resume(self, candidate_id: int) -> bool:
"""Reload a candidate's resume (delete old chunks and reload).
Args:
candidate_id: Candidate ID
Returns:
True if successful, False otherwise
"""
try:
# Delete existing chunks
self.db_service.delete_candidate_chunks(candidate_id)
# Load resume
return self.load_candidate_resume(candidate_id)
except Exception as e:
logger.error(f"Error reloading resume for candidate {candidate_id}: {e}")
return False
# Made with Bob
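To show how the pieces fit together, here is a hypothetical driver script that wires the three services and loads every resume. In the project itself this orchestration is driven by app.py and scripts/run.sh; the DatabaseService constructor is assumed here to read its connection settings from config.
# load_resumes_sketch.py -- hypothetical wiring, not part of the repository
import logging

from src.config import config
from src.database_service import DatabaseService
from src.document_processor import DocumentProcessor
from src.embedding_service import EmbeddingService
from src.data_loader import DataLoader

logging.basicConfig(level=logging.INFO)

db_service = DatabaseService()                   # assumed to pick up settings from config
doc_processor = DocumentProcessor(
    chunk_size=config.CHUNK_SIZE,                # 500 by default
    chunk_overlap=config.CHUNK_OVERLAP,          # 50 by default
)
embedding_service = EmbeddingService()           # granite-embedding:30m via Ollama

loader = DataLoader(db_service, doc_processor, embedding_service)
results = loader.load_all_resumes()              # {candidate_id: True/False}
print(f"Loaded {sum(results.values())}/{len(results)} resumes")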
- document_processor: This code utilizes the Docling library to parse and convert PDF and DOCX resumes into structured text formats, and implements hierarchical chunking to break down long documents into smaller, manageable pieces while preserving the document’s logical structure.
# document_processor.py
"""Document processing with Docling."""
import logging
from pathlib import Path
from typing import List, Dict, Any
from docling.document_converter import DocumentConverter
from docling_core.transforms.chunker import HierarchicalChunker
logger = logging.getLogger(__name__)
class DocumentProcessor:
"""Process documents using Docling."""
def __init__(self, chunk_size: int = 500, chunk_overlap: int = 50):
"""Initialize document processor.
Args:
chunk_size: Maximum size of text chunks
chunk_overlap: Overlap between chunks
"""
self.converter = DocumentConverter()
self.chunker = HierarchicalChunker(
max_tokens=chunk_size,
overlap_tokens=chunk_overlap
)
logger.info("DocumentProcessor initialized")
def process_document(self, file_path: str) -> List[Dict[str, Any]]:
"""Process a document and return chunks with metadata.
Args:
file_path: Path to the document file
Returns:
List of dictionaries containing chunk text and metadata
"""
try:
logger.info(f"Processing document: {file_path}")
# Convert document
result = self.converter.convert(file_path)
doc = result.document
# Chunk the document
chunks = list(self.chunker.chunk(doc))
# Extract text and metadata from chunks
processed_chunks = []
for idx, chunk in enumerate(chunks):
chunk_data = {
"text": chunk.text,
"index": idx,
"metadata": {
"source": file_path,
"chunk_index": idx,
"total_chunks": len(chunks)
}
}
processed_chunks.append(chunk_data)
logger.info(f"Processed {len(processed_chunks)} chunks from {file_path}")
return processed_chunks
except Exception as e:
logger.error(f"Error processing document {file_path}: {e}")
raise
def process_multiple_documents(self, file_paths: List[str]) -> Dict[str, List[Dict[str, Any]]]:
"""Process multiple documents.
Args:
file_paths: List of document file paths
Returns:
Dictionary mapping file paths to their processed chunks
"""
results = {}
for file_path in file_paths:
try:
results[file_path] = self.process_document(file_path)
except Exception as e:
logger.error(f"Failed to process {file_path}: {e}")
results[file_path] = []
return results
# Made with Bob
- embedding_service: This service wraps the local Ollama client to generate the 384-dimensional vectors produced by granite-embedding:30m. It is used at ingestion time to vectorize each resume chunk and at query time to embed the user's question, so that pgvector can match the two. (The companion database_service, not reproduced in this post, handles the PostgreSQL storage and the vector similarity queries themselves.)
# embedding_service.py
"""Embedding service using Ollama."""
import logging
from typing import List, Optional
import ollama
from src.config import config
logger = logging.getLogger(__name__)
class EmbeddingService:
"""Generate embeddings using Ollama."""
def __init__(self, model: Optional[str] = None, host: Optional[str] = None):
"""Initialize embedding service.
Args:
model: Ollama model name for embeddings
host: Ollama host URL
"""
self.model = model or config.EMBEDDING_MODEL
self.host = host or config.OLLAMA_HOST
self.client = ollama.Client(host=self.host)
logger.info(f"EmbeddingService initialized with model: {self.model}")
def generate_embedding(self, text: str) -> List[float]:
"""Generate embedding for a single text.
Args:
text: Text to embed
Returns:
Embedding vector as list of floats
"""
try:
response = self.client.embeddings(
model=self.model,
prompt=text
)
return response['embedding']
except Exception as e:
logger.error(f"Error generating embedding: {e}")
raise
def generate_embeddings(self, texts: List[str]) -> List[List[float]]:
"""Generate embeddings for multiple texts.
Args:
texts: List of texts to embed
Returns:
List of embedding vectors
"""
embeddings = []
for text in texts:
try:
embedding = self.generate_embedding(text)
embeddings.append(embedding)
except Exception as e:
logger.error(f"Failed to generate embedding for text: {e}")
# Return zero vector on error
embeddings.append([0.0] * 384) # granite-embedding:30m dimension
return embeddings
def test_connection(self) -> bool:
"""Test connection to Ollama service.
Returns:
True if connection successful, False otherwise
"""
try:
# Try to generate a test embedding
self.generate_embedding("test")
logger.info("Successfully connected to Ollama")
return True
except Exception as e:
logger.error(f"Failed to connect to Ollama: {e}")
return False
# Made with Bob
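As a quick, hypothetical sanity check, the service can be exercised directly to confirm that Ollama is reachable and that the embedding dimensionality matches the vector(384) column declared in init.sql:
# Hypothetical sanity check for the EmbeddingService above
from src.embedding_service import EmbeddingService

service = EmbeddingService()                     # defaults to granite-embedding:30m
if service.test_connection():
    vec = service.generate_embedding("Python developer with PostgreSQL experience")
    assert len(vec) == 384, f"Unexpected embedding size: {len(vec)}"
    print(f"Embedding dimension: {len(vec)}")    # must match vector(384) in the schema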
RAG Service
Last but not least, this is the core “brain” of the application that orchestrates the Retrieval-Augmented Generation pipeline by combining retrieved context with user queries. It sends the final enriched prompt to the granite4:latest model via Ollama to generate a natural language response based specifically on the candidate data.
# rag_service.py
"""RAG service for retrieval and generation."""
import logging
from typing import List, Dict, Any, Optional
import ollama
from src.config import config
from src.database_service import DatabaseService
from src.embedding_service import EmbeddingService
logger = logging.getLogger(__name__)
class RAGService:
"""Retrieval-Augmented Generation service."""
def __init__(
self,
db_service: DatabaseService,
embedding_service: EmbeddingService,
llm_model: Optional[str] = None,
host: Optional[str] = None
):
"""Initialize RAG service.
Args:
db_service: Database service instance
embedding_service: Embedding service instance
llm_model: Ollama LLM model name
host: Ollama host URL
"""
self.db_service = db_service
self.embedding_service = embedding_service
self.llm_model = llm_model or config.LLM_MODEL
self.host = host or config.OLLAMA_HOST
self.client = ollama.Client(host=self.host)
logger.info(f"RAGService initialized with LLM: {self.llm_model}")
def retrieve_context(
self,
query: str,
top_k: int = None,
candidate_id: Optional[int] = None
) -> List[Dict[str, Any]]:
"""Retrieve relevant context for a query.
Args:
query: User query
top_k: Number of results to retrieve
candidate_id: Optional candidate ID to filter results
Returns:
List of relevant chunks with metadata
"""
if top_k is None:
top_k = config.TOP_K_RESULTS
try:
# Generate query embedding
query_embedding = self.embedding_service.generate_embedding(query)
# Search for similar chunks
results = self.db_service.search_similar_chunks(
query_embedding=query_embedding,
top_k=top_k,
candidate_id=candidate_id
)
logger.info(f"Retrieved {len(results)} chunks for query")
return results
except Exception as e:
logger.error(f"Error retrieving context: {e}")
raise
def generate_response(
self,
query: str,
context_chunks: List[Dict[str, Any]],
system_prompt: Optional[str] = None
) -> str:
"""Generate response using LLM with retrieved context.
Args:
query: User query
context_chunks: Retrieved context chunks
system_prompt: Optional system prompt
Returns:
Generated response
"""
try:
# Build context from chunks
context_parts = []
for i, chunk in enumerate(context_chunks, 1):
candidate_name = f"{chunk['firstname']} {chunk['lastname']}"
similarity = chunk.get('similarity', 0)
context_parts.append(
f"[Context {i} - {candidate_name} (Relevance: {similarity:.2f})]\n{chunk['chunk_text']}"
)
context = "\n\n".join(context_parts)
# Default system prompt
if system_prompt is None:
system_prompt = """You are a helpful AI assistant that answers questions about candidates based on their resumes.
Use only the provided context to answer questions. If the context doesn't contain enough information, say so.
Be specific and cite which candidate the information comes from."""
# Build user prompt
user_prompt = f"""Context from candidate resumes:
{context}
Question: {query}
Please provide a detailed answer based on the context above."""
# Generate response
response = self.client.chat(
model=self.llm_model,
messages=[
{"role": "system", "content": system_prompt},
{"role": "user", "content": user_prompt}
]
)
return response['message']['content']
except Exception as e:
logger.error(f"Error generating response: {e}")
raise
def chat(
self,
query: str,
candidate_id: Optional[int] = None,
top_k: int = None
) -> Dict[str, Any]:
"""Complete RAG pipeline: retrieve and generate.
Args:
query: User query
candidate_id: Optional candidate ID to filter results
top_k: Number of context chunks to retrieve
Returns:
Dictionary with response and metadata
"""
try:
# Retrieve context
context_chunks = self.retrieve_context(
query=query,
top_k=top_k,
candidate_id=candidate_id
)
if not context_chunks:
return {
"response": "I couldn't find any relevant information in the candidate resumes to answer your question.",
"context_chunks": [],
"query": query
}
# Generate response
response = self.generate_response(
query=query,
context_chunks=context_chunks
)
return {
"response": response,
"context_chunks": context_chunks,
"query": query
}
except Exception as e:
logger.error(f"Error in chat: {e}")
raise
def stream_chat(
self,
query: str,
candidate_id: Optional[int] = None,
top_k: int = None
):
"""Stream RAG response.
Args:
query: User query
candidate_id: Optional candidate ID to filter results
top_k: Number of context chunks to retrieve
Yields:
Response chunks
"""
try:
# Retrieve context
context_chunks = self.retrieve_context(
query=query,
top_k=top_k,
candidate_id=candidate_id
)
if not context_chunks:
yield "I couldn't find any relevant information in the candidate resumes to answer your question."
return
# Build context
context_parts = []
for i, chunk in enumerate(context_chunks, 1):
candidate_name = f"{chunk['firstname']} {chunk['lastname']}"
similarity = chunk.get('similarity', 0)
context_parts.append(
f"[Context {i} - {candidate_name} (Relevance: {similarity:.2f})]\n{chunk['chunk_text']}"
)
context = "\n\n".join(context_parts)
system_prompt = """You are a helpful AI assistant that answers questions about candidates based on their resumes.
Use only the provided context to answer questions. If the context doesn't contain enough information, say so.
Be specific and cite which candidate the information comes from."""
user_prompt = f"""Context from candidate resumes:
{context}
Question: {query}
Please provide a detailed answer based on the context above."""
# Stream response
stream = self.client.chat(
model=self.llm_model,
messages=[
{"role": "system", "content": system_prompt},
{"role": "user", "content": user_prompt}
],
stream=True
)
for chunk in stream:
if 'message' in chunk and 'content' in chunk['message']:
yield chunk['message']['content']
except Exception as e:
logger.error(f"Error in stream_chat: {e}")
yield f"Error: {str(e)}"
# Made with Bob
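Putting it all together, a hypothetical end-to-end query against the loaded data could look like the sketch below; in the project itself, the Streamlit app in app.py drives these calls through its chat interface.
# rag_query_sketch.py -- hypothetical usage; the real UI lives in app.py (Streamlit)
from src.database_service import DatabaseService
from src.embedding_service import EmbeddingService
from src.rag_service import RAGService

db_service = DatabaseService()                   # assumed to pick up settings from config
embedding_service = EmbeddingService()
rag = RAGService(db_service, embedding_service)  # answers with granite4:latest via Ollama

result = rag.chat("Which candidate has experience with Python?")
print(result["response"])
for chunk in result["context_chunks"]:
    print(f"- {chunk['firstname']} {chunk['lastname']} "
          f"(relevance {chunk.get('similarity', 0):.2f})")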
Conclusion
The Candidate RAG System demonstrates a powerful shift in recruitment technology by transforming static, unstructured resumes into a dynamic and queryable knowledge base. At the heart of this transformation is Docling, which provides robust information extraction capabilities to accurately parse and chunk content from both PDF and DOCX formats. By integrating this sophisticated document processing with PostgreSQL and pgvector for hybrid relational and semantic storage, the architecture allows organizations to move beyond keyword matching to a deeper, contextual understanding of their talent pool.
This specific use-case proves that combining Docling’s hierarchical chunking with vector similarity search enables recruiters to ask complex questions and receive evidence-based answers, effectively bridging the gap between massive document repositories and actionable hiring insights.
>>> Thanks for reading <<<
Links
- Code repository for this blog post: https://github.com/aairom/docling-rag-postgresql
- PostgreSQL official image on Docker hub: https://hub.docker.com/_/postgres
- PostgreSQL in Docker: https://www.datacamp.com/tutorial/postgresql-docker
- Docling Project: https://github.com/docling-project
- IBM Bob: bob.ibm.com