<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Monika Sonnad Math</title>
    <description>The latest articles on DEV Community by Monika Sonnad Math (@monikasonnadmath).</description>
    <link>https://dev.to/monikasonnadmath</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3966298%2F26d5c20b-7a1b-4d3f-88fc-746328f6c712.jpeg</url>
      <title>DEV Community: Monika Sonnad Math</title>
      <link>https://dev.to/monikasonnadmath</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/monikasonnadmath"/>
    <language>en</language>
    <item>
      <title>I Work in Healthcare Tech. Here's Why I Built a RAG Tool for Clinical Documents.</title>
      <dc:creator>Monika Sonnad Math</dc:creator>
      <pubDate>Wed, 03 Jun 2026 10:24:19 +0000</pubDate>
      <link>https://dev.to/monikasonnadmath/i-work-in-healthcare-tech-heres-why-i-built-a-rag-tool-for-clinical-documents-3630</link>
      <guid>https://dev.to/monikasonnadmath/i-work-in-healthcare-tech-heres-why-i-built-a-rag-tool-for-clinical-documents-3630</guid>
      <description>&lt;p&gt;I didn't set out to build a RAG application. I set out to solve an annoying problem I kept watching happen.&lt;br&gt;
I work as a senior software developer in healthcare technology in Belfast. A big part of that job is understanding what actually slows them down, and figuring out where software can help. Not what's technically impressive what's genuinely useful.&lt;br&gt;
One thing I kept noticing: a lot of time gets spent navigating documents. Not reading them carefully and thoughtfully — just navigating. Ctrl+F for a patient name. Scrolling to find a medication dosage. Hunting for a follow-up instruction buried in paragraph four of page seven of a discharge summary.&lt;br&gt;
It sounds small. Multiply it by every clinical document, every working day, and it adds up fast.&lt;br&gt;
When I started thinking about RAG as a solution, the first thing I did was slow down and think about what "accuracy" means in a clinical setting — because it means something different here than it does in most software contexts.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;The Problem With Most AI Demonstrations in Healthcare&lt;/strong&gt;&lt;br&gt;
Most AI demos in healthcare go like this: upload a document, ask a question, get a fluent confident answer. It looks impressive. The problem is that fluent and confident doesn't mean accurate. Language models are optimised to produce coherent text. Left to their own devices they'll fill gaps, make inferences, and sometimes just invent things — in a way that reads exactly like a real answer.&lt;br&gt;
In most contexts that's an acceptable tradeoff. In a clinical context it isn't. A patient's discharge medications, their follow-up appointments, their documented allergies — these things need to come from the actual document, not from a model's best guess based on what usually appears in documents like this.&lt;br&gt;
So before I wrote a single line of code I made two decisions:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Temperature 0. No creativity. Deterministic responses only. The model either finds the answer in the document or it says it doesn't know.&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Explicit system prompt. The model is told directly: answer only from the provided context. If the answer isn't there, say so clearly. Do not guess.&lt;br&gt;
These aren't complicated decisions. But they're the right ones for this use case, and I see a lot of healthcare AI demos where nobody made them.&lt;/p&gt;



&lt;p&gt;&lt;strong&gt;How RAG Actually Works (Without the Hype)&lt;/strong&gt;&lt;br&gt;
RAG stands for Retrieval-Augmented Generation. The idea is straightforward:&lt;br&gt;
Instead of asking a language model a question and hoping it knows the answer from training, you first retrieve relevant sections from your actual documents, then ask the model to answer based only on what you retrieved.&lt;br&gt;
The pipeline looks like this:&lt;br&gt;
Document (PDF)&lt;br&gt;
  ↓&lt;br&gt;
Extract text&lt;br&gt;
  ↓&lt;br&gt;
Split into chunks&lt;br&gt;
  ↓&lt;br&gt;
Embed each chunk as a vector&lt;br&gt;
  ↓&lt;br&gt;
Store in a vector database&lt;/p&gt;

&lt;p&gt;↓ (at query time)&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Question&lt;br&gt;
      ↓&lt;br&gt;
Embed the question as a vector&lt;br&gt;
      ↓&lt;br&gt;
Find the most similar chunks (semantic search)&lt;br&gt;
      ↓&lt;br&gt;
Send those chunks + question to the LLM&lt;br&gt;
      ↓&lt;br&gt;
Get an answer grounded in the document&lt;br&gt;
The key word is grounded. The LLM doesn't know what's in your documents from training — it only knows what you retrieved and passed to it. If the answer isn't in the retrieved chunks, a well-instructed model will tell you that.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Building It: The Decisions That Mattered&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;Chunking strategy&lt;/strong&gt;&lt;br&gt;
How you split documents into chunks matters more than most tutorials acknowledge. Too small and you lose context — a medication dosage split from its drug name is useless. Too large and retrieval gets imprecise.&lt;br&gt;
I landed on 500-character chunks with 50-character overlap. The overlap is important — it means the boundary between two chunks always has context from both sides, so you don't lose meaning at the split point.&lt;br&gt;
splitter = RecursiveCharacterTextSplitter(&lt;br&gt;
    chunk_size=500,&lt;br&gt;
    chunk_overlap=50,&lt;br&gt;
    separators=["\n\n", "\n", ". ", " ", ""],&lt;br&gt;
)&lt;br&gt;
The separator hierarchy matters too. We try to split at paragraph boundaries first, then sentence boundaries, then spaces. Splitting mid-sentence is a last resort.&lt;br&gt;
Embedding model&lt;br&gt;
I used OpenAI's text-embedding-3-small. It's fast, cheap, and good enough for document retrieval. For production clinical systems handling complex medical terminology you'd want to evaluate domain-specific embeddings — but for a general-purpose tool this works well.&lt;br&gt;
The system prompt&lt;br&gt;
This is where clinical RAG lives or dies:&lt;br&gt;
system_prompt = (&lt;br&gt;
    "You are a helpful assistant that answers questions about clinical documents. "&lt;br&gt;
    "Answer only from the provided context. If the answer is not in the context, "&lt;br&gt;
    "say so clearly — do not guess or make up information. "&lt;br&gt;
    "In a clinical setting, accuracy matters more than completeness."&lt;br&gt;
)&lt;br&gt;
That last sentence — accuracy matters more than completeness — is doing a lot of work. It gives the model permission to say "I don't know" rather than producing a plausible-sounding answer. In my testing, including it made a real difference to the rate of hallucinations on edge cases.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;What I Learned Building This&lt;/strong&gt;&lt;br&gt;
The messy document problem is real. Most RAG tutorials use clean, well-structured PDFs. Clinical documents are not clean or well-structured. Scanned discharge summaries with inconsistent formatting, referral letters with abbreviations that differ between hospitals, guidelines with tables that don't chunk cleanly — all of these degrade retrieval quality. I've started thinking about pre-processing pipelines to handle this better and may add that to the project.&lt;br&gt;
Confidence matters. I added a simple confidence indicator based on how many relevant chunks were retrieved. It's a rough heuristic — more retrieved chunks suggests a more answerable question — but it gives users a signal about how much to trust the response. In a clinical context, knowing when to verify against the source document is as important as the answer itself.&lt;br&gt;
The UI needs to be dead simple. Clinical staff are not developers. If the interface requires any technical knowledge to operate it won't get used. The Streamlit app has three interactions: upload a file, type a question, read the answer. That's it.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;The Code&lt;/strong&gt;&lt;br&gt;
The project is open source on GitHub: github.com/Monika-Sonnadmath/clinical-rag&lt;br&gt;
It has a Python API for developers who want to embed it in their own pipelines, and a Streamlit web app for anyone who just wants to use it directly.&lt;br&gt;
pip install -r requirements.txt&lt;br&gt;
export OPENAI_API_KEY=sk-...&lt;br&gt;
streamlit run app.py&lt;br&gt;
Upload a PDF, ask a question, get an answer. That's the whole thing.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;What's Next&lt;/strong&gt;&lt;br&gt;
A few things I want to add:&lt;br&gt;
• Local LLM support via Ollama — so it works without an API key and keeps documents entirely on-premises. For clinical use cases, data leaving the building is a concern.&lt;br&gt;
• Multi-document querying — ask a question across a folder of documents at once&lt;br&gt;
• Better pre-processing — handling scanned documents with variable quality, which is very much a solved problem in OCR but needs connecting to the retrieval pipeline&lt;br&gt;
If you work in healthcare tech and have thoughts on what would make this more useful, I'd genuinely like to hear from you.&lt;/p&gt;

</description>
      <category>python</category>
      <category>ai</category>
      <category>programming</category>
      <category>rag</category>
    </item>
  </channel>
</rss>
