Vectors vs. Keywords: Why "Close Enough" is Dangerous in MedTech RAG

Hey everyone! 👋

So, we need to talk about Vector Databases. If you’ve been on Dev.to (or anywhere on the internet) for the last year, you know that RAG (Retrieval-Augmented Generation) is basically the hottest thing in town. We dump our data into Pinecone, Weaviate, or pgvector, generate some embeddings, and let the LLM work its magic.

I love it. It feels like sci-fi. But recently, while working on a Clinical Decision Support prototype, I hit a massive wall.

Here is the hard truth: Pure vector search is a precision trap. And in medicine, that trap doesn't just return a bad search result—it can be dangerous.

The "Hypo" vs. "Hyper" Nightmare

Let’s look at the classic example that keeps me up at night.

Imagine a clinician queries your RAG system:

"Protocol for acute hyperglycemia management"

(Hyperglycemia = High Blood Sugar).

If you rely 100% on semantic search (cosine similarity), the model looks for concepts that appear in similar contexts.

  1. Hyperglycemia appears in texts about blood sugar, insulin, and emergencies.
  2. Hypoglycemia (Low Blood Sugar) appears in texts about... blood sugar, insulin, and emergencies.

In the latent space of many embedding models (even the fancy ones like OpenAI's text-embedding-3), these two words are neighbors. They are semantically "close" because they are siblings in the "blood sugar problems" family.

The result? Your vector search might confidently retrieve a document about treating Hypoglycemia (giving glucose) instead of Hyperglycemia (giving insulin).

If the LLM doesn't catch that nuance and summarizes the wrong retrieved chunk... well, you see the problem. "Close enough" isn't good enough here.
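Don't just take my word for it; you can measure this yourself. Here's a minimal sanity check, assuming the openai Python client and an OPENAI_API_KEY in your environment (any embedding model shows the same pattern):

# How "close" are the two siblings in embedding space?
# Assumes: pip install openai numpy, and OPENAI_API_KEY is set.
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(text):
    resp = client.embeddings.create(
        model="text-embedding-3-small", input=text
    )
    return np.array(resp.data[0].embedding)

hyper = embed("protocol for acute hyperglycemia management")
hypo = embed("protocol for acute hypoglycemia management")

# OpenAI embeddings are unit-length, so the dot product
# is the cosine similarity
print(np.dot(hyper, hypo))

If that similarity comes back high (it usually does), you've reproduced the trap: two queries with opposite clinical meanings, sitting right next to each other in latent space.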

Why Vectors Fail at Precision

Vectors are all about vibes and context. They are amazing for realizing that "renal failure" and "kidney issues" are the same thing.

But they struggle with:

  1. Antonyms (Hot/Cold, High/Low).
  2. Proper Nouns (Drug names like Celexa vs Celebrex).
  3. Negations ("Patient does not have fever").

I recently wrote a bit more on my other blog about the foundational mistakes I made when starting with these architectures, but the gist is: don't trust the embedding blindly.

The Fix: Hybrid Search

The solution isn't to ditch vectors—we still need them for that semantic understanding. The solution is to bring back the "Old Guard" of search: Keywords (BM25).

We need a Hybrid Search architecture.

  1. Dense Retrieval (Vectors): Finds documents that match the intent and context.
  2. Sparse Retrieval (Keywords/BM25): Finds documents that contain the exact words.

If a user types "Hyperglycemia," BM25 gives no credit for that term to a document that only says "Hypoglycemia" (the tokens literally don't match), so the wrong sibling sinks in the ranking.
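To see the lexical side in isolation, here's a tiny sketch using the rank_bm25 package (the two-document corpus and the regex tokenizer are toy assumptions, not production choices):

# Toy BM25 demo with the rank_bm25 package (pip install rank-bm25).
import re
from rank_bm25 import BM25Okapi

docs = [
    "Acute hyperglycemia protocol: administer insulin per sliding scale.",
    "Acute hypoglycemia protocol: administer oral glucose or IV dextrose.",
]

def tokenize(text):
    # Lowercase word tokenizer; real pipelines use proper analyzers
    return re.findall(r"\w+", text.lower())

bm25 = BM25Okapi([tokenize(d) for d in docs])
scores = bm25.get_scores(tokenize("acute hyperglycemia management"))

# The hyperglycemia doc scores on the exact term; the hypoglycemia
# doc only gets credit for "acute", which both documents share.
print(scores)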

How to Implement It (Conceptually)

You don't need to build this from scratch. Most modern vector DBs support this now, but here is the logic if you were stitching it together in Python:

# A sloppy, late-night sketch. vector_db, keyword_db, and
# generate_embedding stand in for whatever clients you actually use.

def rrf_score(rank, k=60):
    # Reciprocal Rank Fusion score: 1 / (k + rank).
    # k=60 is the constant from the original RRF paper.
    return 1.0 / (k + rank)

def hybrid_search(query, alpha=0.5):
    """
    alpha: weight between keyword (0.0) and vector (1.0) search
    """

    # 1. Get Vector Results (The "Vibe" Search)
    vector_results = vector_db.search(
        query_vector=generate_embedding(query),
        limit=50,
    )

    # 2. Get Keyword Results (The "Exact" Search)
    # BM25 scores documents on exact token matches
    keyword_results = keyword_db.search(
        query_text=query,
        limit=50,
    )

    # 3. Weighted Reciprocal Rank Fusion (RRF):
    # merge the two ranked lists by document id (ranks start at 1)
    scores = {}
    for rank, doc in enumerate(vector_results, start=1):
        scores[doc.id] = scores.get(doc.id, 0.0) + alpha * rrf_score(rank)
    for rank, doc in enumerate(keyword_results, start=1):
        scores[doc.id] = scores.get(doc.id, 0.0) + (1 - alpha) * rrf_score(rank)

    # Highest fused score first; return the top 10 document ids
    return sorted(scores, key=scores.get, reverse=True)[:10]
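A note on that alpha: 0.5 weighs both signals equally, which is a sane default, but for safety-critical retrieval you'll probably want to tune it toward the keyword side so exact clinical terms dominate. Weaviate, for instance, exposes essentially this same knob (also called alpha) in its native hybrid search.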

The Secret Sauce: Re-ranking

Even with Hybrid search, you might still get some noise. The final step to make this production-ready for a medical context is a Cross-Encoder Reranker.

After you get your Top 50 results from the Hybrid search, you pass them through a model specifically designed to score "How relevant is Document A to Query B?" (like bge-reranker or Cohere's rerank endpoint).

This step is computationally expensive, so you only do it on the small subset of retrieved docs, but it acts as the final sanity check. It reads the text like a human would and says, "Wait, the user asked for Hyperglycemia, this doc talks about Hypoglycemia. Zero points."
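Conceptually, that last pass looks something like this. It's a sketch using sentence-transformers' CrossEncoder with the open bge-reranker model (swap in Cohere's rerank endpoint if you prefer a hosted option):

# Rerank the hybrid candidates with a cross-encoder.
# Assumes: pip install sentence-transformers
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("BAAI/bge-reranker-base")

def rerank(query, docs, top_k=5):
    # The cross-encoder reads the query and each document *together*,
    # so it can notice that "hyper" != "hypo" in context.
    pairs = [(query, doc) for doc in docs]
    scores = reranker.predict(pairs)
    ranked = sorted(zip(docs, scores), key=lambda pair: pair[1], reverse=True)
    return [doc for doc, _score in ranked[:top_k]]

# e.g. final_docs = rerank(query, hybrid_candidates)
# where hybrid_candidates are the ~50 docs from the hybrid search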

Conclusion

If you are building a chatbot for a pizza shop, pure vectors are fine. If someone gets a pepperoni recommendation instead of sausage, no one dies.

But for Clinical Decision Support (or legal, or finance), we have to respect the limitations of embeddings.

  1. Use Vectors for recall (finding everything relevant).
  2. Use Keywords for precision (filtering out the antonyms).
  3. Use Rerankers for the final verdict.

Happy coding, and may your cosine similarities always be accurate! 🩺💻

(P.S. If you've run into similar issues with RAG hallucinations, let me know in the comments. I'm curious how you solved it!)
