DEV Community

Alain Airom (Ayrom)


Mastering Modern Hiring Demonstration: Using Docling and PostgreSQL by Bob to Build a Local Candidate RAG Database

This blog post explores how to bridge the gap between traditional relational data and unstructured documents to create a powerful, hypothetical recruitment tool.

Introduction

The Candidate RAG System leverages a sophisticated technology stack to turn a folder of resumes into an interactive, searchable knowledge base. By combining Docling’s hierarchical document processing with PostgreSQL’s relational and vector capabilities (via pgvector), the system allows recruiters to query candidate data with natural language while maintaining strict data integrity.


The Architecture: A Hybrid Approach

Bob’s implementation uses a “Relational + Vector” hybrid architecture. While standard candidate details (name, DOB, file paths) reside in traditional SQL tables, the actual content of the resumes is transformed into high-dimensional vectors for semantic search.

Key Components:

  • Document Processing: Docling is used to extract and chunk resume content from PDF and DOCX formats.
  • Relational & Vector Storage: A single PostgreSQL instance manages both metadata and embeddings using the pgvector extension.
  • Local AI Engine: Ollama runs the models locally, using granite-embedding:30m for vectors and granite4:latest for the chat interface.


Technologies Used

  • Docling (https://github.com/docling-project/docling): Document parsing and chunking
  • PostgreSQL (https://www.postgresql.org/): Database
  • pgvector (https://github.com/pgvector/pgvector): Vector similarity search
  • Ollama (https://ollama.ai/): Local LLM and embeddings
  • Streamlit (https://streamlit.io/): Web interface
  • Podman (https://podman.io/): Container management

The Data Workflow

The system operates through two primary pipelines: Ingestion and Querying.

1. Document Ingestion Flow

  1. Extraction: Docling converts resumes into hierarchical chunks to preserve document structure.
  2. Vectorization: Ollama generates 384-dimensional embeddings for each text chunk.
  3. Storage: Chunks are saved in the resume_chunks table, linked to the candidates table via a foreign key.
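Docling's HierarchicalChunker is structure-aware, so the following is not its actual algorithm, but the CHUNK_SIZE / CHUNK_OVERLAP settings used later behave roughly like a sliding window. A minimal sketch of that windowing idea:

```python
def sliding_chunks(text: str, size: int = 500, overlap: int = 50) -> list:
    """Naive fixed-window chunker illustrating CHUNK_SIZE / CHUNK_OVERLAP.
    Docling's hierarchical chunker is structure-aware; this only shows
    how neighboring chunks share `overlap` characters of context."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

chunks = sliding_chunks("x" * 1200, size=500, overlap=50)
print([len(c) for c in chunks])  # [500, 500, 300]
```

The overlap means a sentence cut at a chunk boundary still appears whole in at least one chunk, which helps retrieval later.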

2. RAG Query Flow

When a user asks, “Which candidate has experience with Python?”:

  • The query is converted into an embedding.
  • PostgreSQL performs a vector similarity search (embedding <=> query_vector) to find the most relevant resume sections.
  • The retrieved text is sent to the Ollama LLM as context to generate a natural language answer.
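The <=> used in the similarity search is pgvector's cosine-distance operator. What the database computes per row is, in plain Python:

```python
import math

def cosine_distance(a, b):
    """Cosine distance, the metric behind pgvector's <=> operator:
    1 - (a . b) / (|a| * |b|). 0 means identical direction, 1 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

print(cosine_distance([1.0, 0.0], [2.0, 0.0]))  # 0.0 (parallel)
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # 1.0 (orthogonal)
```

Ordering rows by this distance (ascending) is what surfaces the resume chunks most semantically similar to the query.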

Database Schema Design

Bob utilizes a clean relational schema to ensure that every vector chunk is always tied back to a specific person.

CREATE TABLE candidates (
    id SERIAL PRIMARY KEY,
    firstname VARCHAR(100),
    lastname VARCHAR(100),
    date_of_birth DATE,
    resume_path VARCHAR(255),
    created_at TIMESTAMP
);

CREATE TABLE resume_chunks (
    id SERIAL PRIMARY KEY,
    candidate_id INTEGER REFERENCES candidates(id),
    chunk_text TEXT,
    chunk_index INTEGER,
    embedding vector(384),  -- granite-embedding:30m dimension
    metadata JSONB,
    created_at TIMESTAMP
);


| **Table**         | **Key Columns**                              | **Types**                      | **Purpose**                               |
| ----------------- | -------------------------------------------- | ------------------------------ | ----------------------------------------- |
| **candidates**    | `id`, `firstname`, `lastname`, `resume_path` | Serial, Varchar                | Stores primary candidate records.         |
| **resume_chunks** | `candidate_id`, `chunk_text`, `embedding`    | Integer, Text, **Vector(384)** | Stores the AI-searchable resume segments. |

Project Structure

The project is laid out as follows:

docling-rag-postgresql/
├── app.py                      # Streamlit GUI application
├── docker-compose.yml          # PostgreSQL setup
├── init.sql                    # Database initialization
├── requirements.txt            # Python dependencies
├── .env.example               # Environment variables template
├── README.md                  # This file
├── Docs/                      # Documentation
│   ├── Architecture.md        # System architecture with Mermaid diagrams
│   └── QUICKSTART.md          # Quick start guide
├── scripts/                   # Utility scripts
│   ├── setup.sh              # Setup script
│   └── run.sh                # Run script
├── resumes/                   # Resume files
│   ├── resume1.docx
│   ├── resume2.docx
│   ├── resume3.docx
│   └── resume4.pdf
└── src/                       # Source code
    ├── __init__.py
    ├── config.py              # Configuration management
    ├── database_service.py    # PostgreSQL operations
    ├── document_processor.py  # Docling document processing
    ├── embedding_service.py   # Ollama embeddings
    ├── rag_service.py         # RAG pipeline
    └── data_loader.py         # Resume loading

Database and LLM

  • Configuration of the PostgreSQL database and the local Ollama instance (.env) ⚙️
# Database
DB_HOST=localhost
DB_PORT=5432
DB_NAME=candidates_rag
DB_USER=postgres
DB_PASSWORD=postgres

# Ollama
OLLAMA_HOST=http://localhost:11434
EMBEDDING_MODEL=granite-embedding:30m
LLM_MODEL=granite4:latest

# Application
CHUNK_SIZE=500
CHUNK_OVERLAP=50
TOP_K_RESULTS=5
  • The configuration code that consumes these variables is shown below:
"""Configuration management for the RAG application."""
import os
from dotenv import load_dotenv

load_dotenv()


class Config:
    """Application configuration."""

    # Database Configuration
    DB_HOST = os.getenv("DB_HOST", "localhost")
    DB_PORT = int(os.getenv("DB_PORT", "5432"))
    DB_NAME = os.getenv("DB_NAME", "candidates_rag")
    DB_USER = os.getenv("DB_USER", "postgres")
    DB_PASSWORD = os.getenv("DB_PASSWORD", "postgres")

    # Ollama Configuration
    OLLAMA_HOST = os.getenv("OLLAMA_HOST", "http://localhost:11434")
    EMBEDDING_MODEL = os.getenv("EMBEDDING_MODEL", "granite-embedding:30m")
    LLM_MODEL = os.getenv("LLM_MODEL", "granite4:latest")

    # Application Configuration
    CHUNK_SIZE = int(os.getenv("CHUNK_SIZE", "500"))
    CHUNK_OVERLAP = int(os.getenv("CHUNK_OVERLAP", "50"))
    TOP_K_RESULTS = int(os.getenv("TOP_K_RESULTS", "5"))

    @property
    def database_url(self):
        """Get database connection URL."""
        return f"postgresql://{self.DB_USER}:{self.DB_PASSWORD}@{self.DB_HOST}:{self.DB_PORT}/{self.DB_NAME}"


config = Config()

# Made with Bob
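Every Config value above follows the same lookup pattern: the environment variable wins, the hard-coded default applies otherwise. A small hypothetical helper, env_int, shows that pattern in isolation:

```python
import os

def env_int(name: str, default: int) -> int:
    """The lookup pattern Config uses for every numeric setting:
    the environment variable wins; otherwise the default applies.
    (env_int is illustrative; Config inlines this logic.)"""
    return int(os.getenv(name, str(default)))

os.environ.pop("DB_PORT", None)
print(env_int("DB_PORT", 5432))  # 5432 (variable unset, default applies)

os.environ["DB_PORT"] = "5433"
print(env_int("DB_PORT", 5432))  # 5433 (environment wins)
```

This is also why load_dotenv() runs before the class body: values from .env land in the environment first, so os.getenv sees them.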
  • Sample data injection for the database 💉
-- Enable pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;

-- Create candidates table
CREATE TABLE IF NOT EXISTS candidates (
    id SERIAL PRIMARY KEY,
    firstname VARCHAR(100) NOT NULL,
    lastname VARCHAR(100) NOT NULL,
    date_of_birth DATE NOT NULL,
    resume_path VARCHAR(255),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Create resume_chunks table for storing document chunks with embeddings
CREATE TABLE IF NOT EXISTS resume_chunks (
    id SERIAL PRIMARY KEY,
    candidate_id INTEGER REFERENCES candidates(id) ON DELETE CASCADE,
    chunk_text TEXT NOT NULL,
    chunk_index INTEGER NOT NULL,
    embedding vector(384),  -- granite-embedding:30m produces 384-dimensional vectors
    metadata JSONB,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Create index for vector similarity search
CREATE INDEX IF NOT EXISTS resume_chunks_embedding_idx 
ON resume_chunks USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);

-- Create index for candidate lookups
CREATE INDEX IF NOT EXISTS resume_chunks_candidate_id_idx 
ON resume_chunks(candidate_id);

-- Insert sample candidates (will be populated by the application)
INSERT INTO candidates (firstname, lastname, date_of_birth, resume_path) VALUES
    ('John', 'Smith', '1990-05-15', 'resumes/resume1.docx'),
    ('Sarah', 'Johnson', '1988-08-22', 'resumes/resume2.docx'),
    ('Michael', 'Williams', '1992-03-10', 'resumes/resume3.docx'),
    ('Emily', 'Brown', '1995-11-30', 'resumes/resume4.pdf')
ON CONFLICT DO NOTHING;

-- Made with Bob

Information Ingestion

The ingestion process, detailed below, consists of several critical stages. It is the core of the application: every RAG and relational operation depends on the data it parses, vectorizes, and stores.

  • data_loader: This orchestration script manages the high-level workflow of importing resumes from the local file system. It coordinates the document processor, embedding service, and database service to ensure each file is parsed, vectorized, and saved correctly.


# data_loader.py
"""Data loader for processing and loading candidate resumes."""
import logging
from pathlib import Path
from typing import List, Dict, Any

from src.database_service import DatabaseService
from src.document_processor import DocumentProcessor
from src.embedding_service import EmbeddingService

logger = logging.getLogger(__name__)


class DataLoader:
    """Load and process candidate resumes into the database."""

    def __init__(
        self,
        db_service: DatabaseService,
        doc_processor: DocumentProcessor,
        embedding_service: EmbeddingService
    ):
        """Initialize data loader.

        Args:
            db_service: Database service instance
            doc_processor: Document processor instance
            embedding_service: Embedding service instance
        """
        self.db_service = db_service
        self.doc_processor = doc_processor
        self.embedding_service = embedding_service
        logger.info("DataLoader initialized")

    def load_candidate_resume(self, candidate_id: int) -> bool:
        """Load and process a candidate's resume.

        Args:
            candidate_id: Candidate ID

        Returns:
            True if successful, False otherwise
        """
        try:
            # Get candidate info
            candidate = self.db_service.get_candidate_by_id(candidate_id)
            if not candidate:
                logger.error(f"Candidate {candidate_id} not found")
                return False

            resume_path = candidate['resume_path']
            if not resume_path:
                logger.error(f"No resume path for candidate {candidate_id}")
                return False

            # Check if file exists
            if not Path(resume_path).exists():
                logger.error(f"Resume file not found: {resume_path}")
                return False

            logger.info(f"Processing resume for {candidate['firstname']} {candidate['lastname']}")

            # Process document
            chunks = self.doc_processor.process_document(resume_path)

            if not chunks:
                logger.warning(f"No chunks extracted from {resume_path}")
                return False

            # Generate embeddings
            chunk_texts = [chunk['text'] for chunk in chunks]
            embeddings = self.embedding_service.generate_embeddings(chunk_texts)

            # Insert into database
            self.db_service.insert_resume_chunks(
                candidate_id=candidate_id,
                chunks=chunks,
                embeddings=embeddings
            )

            logger.info(f"Successfully loaded resume for candidate {candidate_id}")
            return True

        except Exception as e:
            logger.error(f"Error loading resume for candidate {candidate_id}: {e}")
            return False

    def load_all_resumes(self) -> Dict[int, bool]:
        """Load all candidate resumes.

        Returns:
            Dictionary mapping candidate IDs to success status
        """
        results = {}
        candidates = self.db_service.get_all_candidates()

        logger.info(f"Loading resumes for {len(candidates)} candidates")

        for candidate in candidates:
            candidate_id = candidate['id']
            success = self.load_candidate_resume(candidate_id)
            results[candidate_id] = success

        successful = sum(1 for v in results.values() if v)
        logger.info(f"Successfully loaded {successful}/{len(candidates)} resumes")

        return results

    def reload_candidate_resume(self, candidate_id: int) -> bool:
        """Reload a candidate's resume (delete old chunks and reload).

        Args:
            candidate_id: Candidate ID

        Returns:
            True if successful, False otherwise
        """
        try:
            # Delete existing chunks
            self.db_service.delete_candidate_chunks(candidate_id)

            # Load resume
            return self.load_candidate_resume(candidate_id)

        except Exception as e:
            logger.error(f"Error reloading resume for candidate {candidate_id}: {e}")
            return False

# Made with Bob
  • document_processor: This module uses the Docling library to parse PDF and DOCX resumes into structured text, then applies hierarchical chunking to split long documents into smaller, manageable pieces while preserving their logical structure.


# document_processor.py
"""Document processing with Docling."""
import logging
from pathlib import Path
from typing import List, Dict, Any

from docling.document_converter import DocumentConverter
from docling_core.transforms.chunker import HierarchicalChunker

logger = logging.getLogger(__name__)


class DocumentProcessor:
    """Process documents using Docling."""

    def __init__(self, chunk_size: int = 500, chunk_overlap: int = 50):
        """Initialize document processor.

        Args:
            chunk_size: Maximum size of text chunks
            chunk_overlap: Overlap between chunks
        """
        self.converter = DocumentConverter()
        self.chunker = HierarchicalChunker(
            max_tokens=chunk_size,
            overlap_tokens=chunk_overlap
        )
        logger.info("DocumentProcessor initialized")

    def process_document(self, file_path: str) -> List[Dict[str, Any]]:
        """Process a document and return chunks with metadata.

        Args:
            file_path: Path to the document file

        Returns:
            List of dictionaries containing chunk text and metadata
        """
        try:
            logger.info(f"Processing document: {file_path}")

            # Convert document
            result = self.converter.convert(file_path)
            doc = result.document

            # Chunk the document
            chunks = list(self.chunker.chunk(doc))

            # Extract text and metadata from chunks
            processed_chunks = []
            for idx, chunk in enumerate(chunks):
                chunk_data = {
                    "text": chunk.text,
                    "index": idx,
                    "metadata": {
                        "source": file_path,
                        "chunk_index": idx,
                        "total_chunks": len(chunks)
                    }
                }
                processed_chunks.append(chunk_data)

            logger.info(f"Processed {len(processed_chunks)} chunks from {file_path}")
            return processed_chunks

        except Exception as e:
            logger.error(f"Error processing document {file_path}: {e}")
            raise

    def process_multiple_documents(self, file_paths: List[str]) -> Dict[str, List[Dict[str, Any]]]:
        """Process multiple documents.

        Args:
            file_paths: List of document file paths

        Returns:
            Dictionary mapping file paths to their processed chunks
        """
        results = {}
        for file_path in file_paths:
            try:
                results[file_path] = self.process_document(file_path)
            except Exception as e:
                logger.error(f"Failed to process {file_path}: {e}")
                results[file_path] = []

        return results

# Made with Bob
  • embedding_service: This module wraps Ollama's embeddings API, converting resume chunks and user queries into 384-dimensional vectors with the granite-embedding:30m model. If embedding an individual chunk fails, it substitutes a zero vector so that batch ingestion can continue.
# embedding_service.py
"""Embedding service using Ollama."""
import logging
from typing import List, Optional
import ollama

from src.config import config

logger = logging.getLogger(__name__)


class EmbeddingService:
    """Generate embeddings using Ollama."""

    def __init__(self, model: Optional[str] = None, host: Optional[str] = None):
        """Initialize embedding service.

        Args:
            model: Ollama model name for embeddings
            host: Ollama host URL
        """
        self.model = model or config.EMBEDDING_MODEL
        self.host = host or config.OLLAMA_HOST
        self.client = ollama.Client(host=self.host)
        logger.info(f"EmbeddingService initialized with model: {self.model}")

    def generate_embedding(self, text: str) -> List[float]:
        """Generate embedding for a single text.

        Args:
            text: Text to embed

        Returns:
            Embedding vector as list of floats
        """
        try:
            response = self.client.embeddings(
                model=self.model,
                prompt=text
            )
            return response['embedding']
        except Exception as e:
            logger.error(f"Error generating embedding: {e}")
            raise

    def generate_embeddings(self, texts: List[str]) -> List[List[float]]:
        """Generate embeddings for multiple texts.

        Args:
            texts: List of texts to embed

        Returns:
            List of embedding vectors
        """
        embeddings = []
        for text in texts:
            try:
                embedding = self.generate_embedding(text)
                embeddings.append(embedding)
            except Exception as e:
                logger.error(f"Failed to generate embedding for text: {e}")
                # Return zero vector on error
                embeddings.append([0.0] * 384)  # granite-embedding:30m dimension

        return embeddings

    def test_connection(self) -> bool:
        """Test connection to Ollama service.

        Returns:
            True if connection successful, False otherwise
        """
        try:
            # Try to generate a test embedding
            self.generate_embedding("test")
            logger.info("Successfully connected to Ollama")
            return True
        except Exception as e:
            logger.error(f"Failed to connect to Ollama: {e}")
            return False

# Made with Bob
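database_service.py is referenced throughout but not listed in the post. The following sketches the SQL its search_similar_chunks method presumably issues; the table and column names come from the schema above, everything else is an assumption:

```python
def build_similarity_query(candidate_id: int = None) -> str:
    """Hypothetical SQL behind DatabaseService.search_similar_chunks
    (that file is not shown in the post; this is a sketch, not its source).
    Joins chunks back to their candidate and orders by cosine distance."""
    sql = (
        "SELECT c.firstname, c.lastname, rc.chunk_text, "
        "1 - (rc.embedding <=> %(query_vec)s::vector) AS similarity "
        "FROM resume_chunks rc "
        "JOIN candidates c ON c.id = rc.candidate_id "
    )
    if candidate_id is not None:
        # Optional filter, matching the candidate_id parameter in the services
        sql += "WHERE rc.candidate_id = %(candidate_id)s "
    sql += "ORDER BY rc.embedding <=> %(query_vec)s::vector LIMIT %(top_k)s"
    return sql

print(build_similarity_query(candidate_id=3))
```

Executed with a driver such as psycopg2, %(query_vec)s would carry the 384-float query embedding serialized for pgvector, and %(top_k)s the TOP_K_RESULTS setting.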

RAG Service

Last but not least, this is the core “brain” of the application that orchestrates the Retrieval-Augmented Generation pipeline by combining retrieved context with user queries. It sends the final enriched prompt to the granite4:latest model via Ollama to generate a natural language response based specifically on the candidate data.

# rag_service.py
"""RAG service for retrieval and generation."""
import logging
from typing import List, Dict, Any, Optional
import ollama

from src.config import config
from src.database_service import DatabaseService
from src.embedding_service import EmbeddingService

logger = logging.getLogger(__name__)


class RAGService:
    """Retrieval-Augmented Generation service."""

    def __init__(
        self,
        db_service: DatabaseService,
        embedding_service: EmbeddingService,
        llm_model: Optional[str] = None,
        host: Optional[str] = None
    ):
        """Initialize RAG service.

        Args:
            db_service: Database service instance
            embedding_service: Embedding service instance
            llm_model: Ollama LLM model name
            host: Ollama host URL
        """
        self.db_service = db_service
        self.embedding_service = embedding_service
        self.llm_model = llm_model or config.LLM_MODEL
        self.host = host or config.OLLAMA_HOST
        self.client = ollama.Client(host=self.host)
        logger.info(f"RAGService initialized with LLM: {self.llm_model}")

    def retrieve_context(
        self,
        query: str,
        top_k: Optional[int] = None,
        candidate_id: Optional[int] = None
    ) -> List[Dict[str, Any]]:
        """Retrieve relevant context for a query.

        Args:
            query: User query
            top_k: Number of results to retrieve
            candidate_id: Optional candidate ID to filter results

        Returns:
            List of relevant chunks with metadata
        """
        if top_k is None:
            top_k = config.TOP_K_RESULTS

        try:
            # Generate query embedding
            query_embedding = self.embedding_service.generate_embedding(query)

            # Search for similar chunks
            results = self.db_service.search_similar_chunks(
                query_embedding=query_embedding,
                top_k=top_k,
                candidate_id=candidate_id
            )

            logger.info(f"Retrieved {len(results)} chunks for query")
            return results

        except Exception as e:
            logger.error(f"Error retrieving context: {e}")
            raise

    def generate_response(
        self,
        query: str,
        context_chunks: List[Dict[str, Any]],
        system_prompt: Optional[str] = None
    ) -> str:
        """Generate response using LLM with retrieved context.

        Args:
            query: User query
            context_chunks: Retrieved context chunks
            system_prompt: Optional system prompt

        Returns:
            Generated response
        """
        try:
            # Build context from chunks
            context_parts = []
            for i, chunk in enumerate(context_chunks, 1):
                candidate_name = f"{chunk['firstname']} {chunk['lastname']}"
                similarity = chunk.get('similarity', 0)
                context_parts.append(
                    f"[Context {i} - {candidate_name} (Relevance: {similarity:.2f})]\n{chunk['chunk_text']}"
                )

            context = "\n\n".join(context_parts)

            # Default system prompt
            if system_prompt is None:
                system_prompt = """You are a helpful AI assistant that answers questions about candidates based on their resumes.
Use only the provided context to answer questions. If the context doesn't contain enough information, say so.
Be specific and cite which candidate the information comes from."""

            # Build user prompt
            user_prompt = f"""Context from candidate resumes:

{context}

Question: {query}

Please provide a detailed answer based on the context above."""

            # Generate response
            response = self.client.chat(
                model=self.llm_model,
                messages=[
                    {"role": "system", "content": system_prompt},
                    {"role": "user", "content": user_prompt}
                ]
            )

            return response['message']['content']

        except Exception as e:
            logger.error(f"Error generating response: {e}")
            raise

    def chat(
        self,
        query: str,
        candidate_id: Optional[int] = None,
        top_k: Optional[int] = None
    ) -> Dict[str, Any]:
        """Complete RAG pipeline: retrieve and generate.

        Args:
            query: User query
            candidate_id: Optional candidate ID to filter results
            top_k: Number of context chunks to retrieve

        Returns:
            Dictionary with response and metadata
        """
        try:
            # Retrieve context
            context_chunks = self.retrieve_context(
                query=query,
                top_k=top_k,
                candidate_id=candidate_id
            )

            if not context_chunks:
                return {
                    "response": "I couldn't find any relevant information in the candidate resumes to answer your question.",
                    "context_chunks": [],
                    "query": query
                }

            # Generate response
            response = self.generate_response(
                query=query,
                context_chunks=context_chunks
            )

            return {
                "response": response,
                "context_chunks": context_chunks,
                "query": query
            }

        except Exception as e:
            logger.error(f"Error in chat: {e}")
            raise

    def stream_chat(
        self,
        query: str,
        candidate_id: Optional[int] = None,
        top_k: Optional[int] = None
    ):
        """Stream RAG response.

        Args:
            query: User query
            candidate_id: Optional candidate ID to filter results
            top_k: Number of context chunks to retrieve

        Yields:
            Response chunks
        """
        try:
            # Retrieve context
            context_chunks = self.retrieve_context(
                query=query,
                top_k=top_k,
                candidate_id=candidate_id
            )

            if not context_chunks:
                yield "I couldn't find any relevant information in the candidate resumes to answer your question."
                return

            # Build context
            context_parts = []
            for i, chunk in enumerate(context_chunks, 1):
                candidate_name = f"{chunk['firstname']} {chunk['lastname']}"
                similarity = chunk.get('similarity', 0)
                context_parts.append(
                    f"[Context {i} - {candidate_name} (Relevance: {similarity:.2f})]\n{chunk['chunk_text']}"
                )

            context = "\n\n".join(context_parts)

            system_prompt = """You are a helpful AI assistant that answers questions about candidates based on their resumes.
Use only the provided context to answer questions. If the context doesn't contain enough information, say so.
Be specific and cite which candidate the information comes from."""

            user_prompt = f"""Context from candidate resumes:

{context}

Question: {query}

Please provide a detailed answer based on the context above."""

            # Stream response
            stream = self.client.chat(
                model=self.llm_model,
                messages=[
                    {"role": "system", "content": system_prompt},
                    {"role": "user", "content": user_prompt}
                ],
                stream=True
            )

            for chunk in stream:
                if 'message' in chunk and 'content' in chunk['message']:
                    yield chunk['message']['content']

        except Exception as e:
            logger.error(f"Error in stream_chat: {e}")
            yield f"Error: {str(e)}"

# Made with Bob
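Extracted from the class above, the context-formatting step that both generate_response and stream_chat share can be exercised on its own (the sample row uses a name from the init.sql seed data):

```python
def build_context(chunks: list) -> str:
    """Replicates the context formatting used in RAGService.generate_response:
    one labeled block per retrieved chunk, separated by blank lines."""
    parts = []
    for i, chunk in enumerate(chunks, 1):
        name = f"{chunk['firstname']} {chunk['lastname']}"
        parts.append(
            f"[Context {i} - {name} (Relevance: {chunk.get('similarity', 0):.2f})]\n"
            f"{chunk['chunk_text']}"
        )
    return "\n\n".join(parts)

sample = [{"firstname": "John", "lastname": "Smith",
           "similarity": 0.87, "chunk_text": "5 years of Python experience."}]
print(build_context(sample))
# [Context 1 - John Smith (Relevance: 0.87)]
# 5 years of Python experience.
```

Labeling each block with the candidate's name is what lets the system prompt demand answers that "cite which candidate the information comes from".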

Conclusion

The Candidate RAG System demonstrates a powerful shift in recruitment technology by transforming static, unstructured resumes into a dynamic and queryable knowledge base. At the heart of this transformation is Docling, which provides robust information extraction capabilities to accurately parse and chunk content from both PDF and DOCX formats. By integrating this sophisticated document processing with PostgreSQL and pgvector for hybrid relational and semantic storage, the architecture allows organizations to move beyond keyword matching to a deeper, contextual understanding of their talent pool.

This specific use-case proves that combining Docling’s hierarchical chunking with vector similarity search enables recruiters to ask complex questions and receive evidence-based answers, effectively bridging the gap between massive document repositories and actionable hiring insights.

>>> Thanks for reading <<<
