Emmanuel Mensah

Medical Chatbots with RAG

My colleague built a RAG chatbot that answered HR questions using company information. Inspired by this, I created a similar one for healthcare. This guide shows you how to build a medical chatbot from scratch using RAG, Flask, and modern AI tools.

What you'll build:
• A chatbot that answers health questions using reliable medical data
• RAG architecture for contextual responses
• Professional chat interface with medical disclaimers

⚠️ Important: This project is for educational purposes only and should never replace professional medical advice.

Prerequisites

Before we start, make sure you have:

  • Python 3 installed
  • Basic knowledge of Python and Flask
  • API keys for Google Gemini and Pinecone (vector database); see the sample .env below
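
The API keys are read from a .env file in the project root; the variable names below match the load_dotenv() and os.getenv() calls used later in this guide (replace the placeholder values with your own keys):

PINECONE_API_KEY=your-pinecone-api-key
GOOGLE_API_KEY=your-google-api-key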

Project Setup

Your project structure should look like this:

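A minimal layout, inferred from the files referenced throughout this guide (the __init__.py and requirements.txt entries are assumptions):

medical-chatbot/
├── data/                  # Medical PDF documents
├── research/              # Experiment notebooks
├── src/
│   ├── __init__.py
│   ├── helper.py          # PDF loading, chunking, embeddings
│   └── prompt.py          # System prompt for the chatbot
├── templates/
│   └── index.html         # Chat interface
├── app.py                 # Flask application and API endpoints
├── store_index.py         # One-time script that populates Pinecone
├── .env                   # API keys
└── requirements.txt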

Now install the required packages:
pip install flask flask-cors python-dotenv pinecone-client langchain langchain-pinecone langchain-community langchain-google-genai langchain-huggingface sentence-transformers pypdf

Building the RAG Pipeline

RAG lets our chatbot find relevant medical information from documents and generate accurate responses.

Let's create our document processing utilities in src/helper.py:

import os
import glob
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings


def extract_and_split_pdf(file_path, chunk_size=1000, chunk_overlap=200):
    """Extract and chunk PDF content."""

    # Ensure we're in the right directory
    if os.getcwd().endswith('research'):
        os.chdir('..')
        print("Changed working directory to:", os.getcwd())

    # Load the PDF file
    loader = PyPDFLoader(file_path)
    documents = loader.load()
    print(f"Loaded {len(documents)} document(s) from {file_path}")

    # Split the documents into chunks
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=chunk_overlap
    )
    text_chunks = text_splitter.split_documents(documents)
    print(f"Split documents into {len(text_chunks)} chunks")

    return text_chunks


def extract_and_split_pdfs_from_directory(directory_path, chunk_size=1000, chunk_overlap=200):
    """Extract and chunk PDF content from all PDFs in a directory."""

    # Ensure we're in the right directory
    if os.getcwd().endswith('research'):
        os.chdir('..')  # Move up to the parent directory
        print("Changed working directory to:", os.getcwd())

    # Find all PDF files in the directory
    pdf_pattern = os.path.join(directory_path, "*.pdf")
    pdf_files = glob.glob(pdf_pattern)

    if not pdf_files:
        raise ValueError(f"No PDF files found in directory: {directory_path}")

    print(
        f"Found {len(pdf_files)} PDF file(s): {[os.path.basename(f) for f in pdf_files]}")

    all_text_chunks = []
    total_documents = 0

    # Process each PDF file
    for pdf_file in pdf_files:
        print(f"\nProcessing: {os.path.basename(pdf_file)}")

        # Load the PDF file
        loader = PyPDFLoader(pdf_file)
        documents = loader.load()
        total_documents += len(documents)
        print(
            f"Loaded {len(documents)} document(s) from {os.path.basename(pdf_file)}")

        # Split the documents into chunks
        text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=chunk_size,
            chunk_overlap=chunk_overlap
        )
        text_chunks = text_splitter.split_documents(documents)
        all_text_chunks.extend(text_chunks)
        print(
            f"Created {len(text_chunks)} chunks from {os.path.basename(pdf_file)}")

    print(
        f"\nTotal: {len(pdf_files)} PDF files, {total_documents} documents, {len(all_text_chunks)} chunks")
    return all_text_chunks


# Download and return the embeddings model
def download_embeddings():
    model_name = "all-MiniLM-L6-v2"
    embeddings = HuggingFaceEmbeddings(model_name=model_name)
    return embeddings
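
As a quick check, you can call the single-file helper from a Python shell (the file name here is a placeholder; point it at any PDF you've placed in data/):

from src.helper import extract_and_split_pdf

# Hypothetical file name; use a real PDF from your data/ folder
chunks = extract_and_split_pdf("data/medical_encyclopedia.pdf")
print(chunks[0].page_content[:200])  # Preview the first chunk
print(chunks[0].metadata)            # Source file and page number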

Key components:

PDF Processing: Our system loads medical PDFs from the data/ directory and automatically splits them into manageable 1000-character chunks with 200-character overlap. This ensures we capture complete medical concepts while maintaining context between related information.

Chunking Benefits: Large medical documents can overwhelm AI models and lead to generic responses. By breaking content into smaller, focused pieces, our chatbot can pinpoint exact information relevant to user questions, resulting in more accurate and specific medical guidance.

Embedding Model: The all-MiniLM-L6-v2 model transforms each text chunk into a 384-dimensional numerical vector that captures the semantic meaning of medical content. When users ask questions, the system converts their query into the same vector format and finds the most similar medical information using mathematical similarity calculations.
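
To see this in action, here's a small, hypothetical snippet (not part of the project files) that embeds a medical chunk and a user question with the same model and compares them with cosine similarity:

import numpy as np
from src.helper import download_embeddings

embeddings = download_embeddings()  # all-MiniLM-L6-v2

chunk = "Common symptoms of diabetes include increased thirst and frequent urination."
question = "What are diabetes symptoms?"

chunk_vec = np.array(embeddings.embed_query(chunk))
question_vec = np.array(embeddings.embed_query(question))

print(len(chunk_vec))  # 384 dimensions

# Cosine similarity: values closer to 1.0 mean closer in meaning
similarity = np.dot(chunk_vec, question_vec) / (
    np.linalg.norm(chunk_vec) * np.linalg.norm(question_vec)
)
print(f"Cosine similarity: {similarity:.3f}")

Pinecone runs this same kind of comparison at scale once the chunks are stored, which is what we set up next.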

Storing Our Medical Knowledge in Pinecone

We'll transform our processed medical documents into a searchable vector database using Pinecone. Here's how we set it up in store_index.py:

from src.helper import download_embeddings, extract_and_split_pdfs_from_directory
from pinecone import Pinecone, ServerlessSpec
from langchain_pinecone import PineconeVectorStore
from dotenv import load_dotenv
import os

# Load environment variables from .env file
load_dotenv()
PINECONE_API_KEY = os.getenv("PINECONE_API_KEY")

# Extract and split all PDFs from the data directory
extracted_data = extract_and_split_pdfs_from_directory(
    directory_path='data', chunk_size=1000, chunk_overlap=200)
text_chunks = extracted_data
embeddings = download_embeddings()


# set up pinecone
pc = Pinecone(api_key=PINECONE_API_KEY)
index_name = "health-chatbot-final"
if not pc.has_index(index_name):
    pc.create_index(
        name=index_name,
        dimension=384,
        metric="cosine",
        spec=ServerlessSpec(
            cloud="aws",
            region="us-east-1",
        )
    )

# embed each chunk and store in pinecone
vector_store = PineconeVectorStore.from_documents(
    index_name=index_name,
    embedding=embeddings,
    documents=text_chunks
)

How it works

Index Setup: Creates a Pinecone index with 384 dimensions (matching our embedding model) and uses cosine similarity to measure how closely related different medical concepts are.

Vector Storage: Each text chunk gets converted into a numerical vector that captures its medical meaning, then stored in Pinecone's cloud database for instant access.

Fast retrieval: When users ask questions like "What are diabetes symptoms?", Pinecone compares the question's vector against all stored medical content and returns the most relevant chunks in milliseconds.

⚠️ Important: Run this script once to set up your vector database. Your medical knowledge base is then ready for the chatbot!
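
If you want to sanity-check the index, a short, hypothetical query script like this will confirm that retrieval works (run it after store_index.py has finished):

from dotenv import load_dotenv
from langchain_pinecone import PineconeVectorStore
from src.helper import download_embeddings

load_dotenv()  # PINECONE_API_KEY is picked up from the environment

# Connect to the index we just populated
vector_store = PineconeVectorStore.from_existing_index(
    index_name="health-chatbot-final",
    embedding=download_embeddings()
)

# Fetch the 3 chunks closest in meaning to the question
for doc in vector_store.similarity_search("What are diabetes symptoms?", k=3):
    print(doc.metadata.get("source"), "->", doc.page_content[:100])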

Creating the Bot's personality

Now we need to teach our AI how to behave like a responsible medical assistant. To set the "personality" and "guidelines" for our chatbot, create src/prompt.py:

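The exact prompt isn't reproduced here, but a minimal sketch that covers the guidelines described below could look like this (the {context} placeholder is required, because the retrieval chain fills it with the document chunks it finds):

system_prompt = (
    "You are a helpful medical information assistant. "
    "Answer the user's question using only the medical context provided below. "
    "Explain things in simple language that anyone can understand, and format "
    "longer answers with short headings and bullet points. "
    "If the context does not contain the answer, say so instead of guessing. "
    "Always remind the user to consult a qualified healthcare professional "
    "for diagnosis or treatment.\n\n"
    "Context:\n{context}"
)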

How this guides our AI

When someone asks "What causes high blood pressure?", our prompt ensures the AI will:

  1. Answer clearly based on medical documents
  2. Use simple language anyone can understand
  3. Format responses with headings and bullet points
  4. Include disclaimers about consulting healthcare professionals

Proper prompt engineering is crucial. It creates a helpful medical assistant rather than an AI that oversteps its boundaries.

Building the Main Flask Application

The app.py combines our document processing, vector database, and AI prompts into a working chatbot API. Create app.py in your project root:

from pinecone import Pinecone
import os
from dotenv import load_dotenv
from langchain_pinecone import PineconeVectorStore
from flask import Flask, render_template, request, jsonify
from flask_cors import CORS
from langchain_google_genai import GoogleGenerativeAI
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate
from src.prompt import system_prompt
from src.helper import download_embeddings

app = Flask(__name__)
CORS(app)

# Load environment variables
load_dotenv()

print("Starting HealthBot...")

# Initialize embeddings
embeddings = download_embeddings()
print("Embeddings model loaded.")

# Use the same index name as in store_index.py
index_name = "health-chatbot-final"

print("Connecting to Pinecone...")

# Get API keys
PINECONE_API_KEY = os.environ.get('PINECONE_API_KEY')
GOOGLE_API_KEY = os.environ.get('GOOGLE_API_KEY')

if not PINECONE_API_KEY or not GOOGLE_API_KEY:
    raise ValueError("Missing required API keys in environment variables!")

# check if index exists
pc = Pinecone(api_key=PINECONE_API_KEY)

if not pc.has_index(index_name):
    print(f"Error: Pinecone index '{index_name}' not found!")
    print("Please run 'python store_index.py' first to create and populate the index.")
    exit(1)

# Load existing index from pinecone
vector_store = PineconeVectorStore.from_existing_index(
    index_name=index_name,
    embedding=embeddings
)
print("Connected to Pinecone.")

llm = GoogleGenerativeAI(
    model="gemini-2.5-flash",
    temperature=0.4,
    max_output_tokens=1024
)

prompt = ChatPromptTemplate.from_messages([
    ("system", system_prompt),
    ("user", "{input}")
])

retriever = vector_store.as_retriever(
    search_kwargs={"k": 3})  # Limit to 3 most relevant docs
question_answering_chain = create_stuff_documents_chain(llm, prompt)
rag_chain = create_retrieval_chain(retriever, question_answering_chain)


@app.route('/')
def home():
    """Render chat interface"""
    return render_template('index.html')


@app.route('/chat', methods=['POST'])
def chat():
    """Handle chat requests"""
    try:
        data = request.get_json()
        user_message = data.get('message', '')

        if not user_message:
            return jsonify({'error': 'No message provided'}), 400

        # Get response from RAG chain
        response = rag_chain.invoke({"input": user_message})
        bot_response = response["answer"]

        return jsonify({
            'response': bot_response,
            'status': 'success'
        })

    except Exception as e:
        print(f"Error in chat endpoint: {str(e)}")
        return jsonify({
            'error': 'An error occurred while processing your request',
            'status': 'error'
        }), 500


if __name__ == '__main__':
    print("Starting Flask server on http://localhost:5000")
    app.run(debug=True, host='0.0.0.0', port=5000)


What's happening here?

  1. Receives user's medical question
  2. Uses RAG chain to find relevant medical documents
  3. Generates contextual response using Gemini AI
  4. Adds medical disclaimer automatically
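
Once the server is running, you can exercise the endpoint from a short test script (a hypothetical example; it assumes the requests package is installed, and the request and response fields match the /chat route above):

import requests

# Ask the running Flask app a question
resp = requests.post(
    "http://localhost:5000/chat",
    json={"message": "What are the symptoms of malaria?"}
)

data = resp.json()
if data.get("status") == "success":
    print(data["response"])
else:
    print("Error:", data.get("error"))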

How the chatbot works

Our chatbot answers medical questions using the RAG pipeline we built. It finds relevant information from medical documents and provides helpful responses with proper disclaimers through a clean, user-friendly interface.

(Screenshot: the chatbot's interface)

What we've accomplished

In this tutorial, we built a complete medical chatbot from scratch:
✅ Created a RAG pipeline for accurate medical information retrieval
✅ Set up Pinecone vector database for fast document search
✅ Built a Flask backend with proper API endpoints
✅ Designed a professional chat interface
✅ Added essential medical disclaimers for user safety

Future Improvements

While our chatbot works great, here are some enhancements we could add:

  1. Conversational flow - Make it dialog-based rather than just Q&A
  2. Chat memory - Store conversation history for reference
  3. Multi-language support - Add local languages for broader accessibility

Wrapping Up

Building this medical chatbot was an exciting journey into RAG and AI applications. I successfully combined document processing, vector databases, and language models to create a system that intelligently answers health questions using reliable medical sources.

Modern tools like LangChain, Pinecone, and Flask make creating powerful applications surprisingly straightforward. What once took months can now be built in days with the right approach.

This project shows how AI can make medical information more accessible while maintaining safety through proper disclaimers and source-based responses. The RAG architecture ensures our chatbot doesn't hallucinate but grounds responses in actual medical literature.

This is just the beginning. The foundation we've built can extend to symptom checkers, medication guides, and more sophisticated healthcare tools.

Check out the full code on my repo.

If you found this tutorial helpful, follow me for more AI content! Have questions? Drop them in the comments below.
