DEV Community

VLSiddarth


Why RAG Pipelines Silently Hallucinate — And The Decay Score That Catches It Before The LLM Does

Your RAG pipeline has a blind spot. It is not your embeddings. It is not your retrieval algorithm. It is time.

Vector databases return results ranked by semantic similarity. A document from 18 months ago and a document from last week score identically if the wording is similar, because publication date plays no part in the similarity score. The LLM receives both with equal confidence. When the older document contains superseded information (a deprecated API method, an outdated compliance rule, a retracted clinical guideline), the model hallucinates with full conviction. No exception is raised. No warning appears.

I ran a live query against our production endpoint this week: "multi-agentic ai orchestration" at difficulty 4.

Here is what came back:

arxiv:2505.02861v2   decay: 0.214  label: fresh   age: 381 days
github:harmonist     decay: 0.015  label: fresh   age: 4 days
arxiv:2601.14652v4   decay: 0.072  label: fresh   age: 118 days
github:win4r/tasks   decay: 0.317  label: aging   age: 99 days ⚠️
arxiv:2601.10560v1   decay: 0.075  label: fresh   age: 123 days
github:builderz      decay: 0.306  label: aging   age: 95 days ⚠️


Two sources were flagged as aging: not stale enough to block, but stale enough to warn about before synthesis. The same response reported a knowledge velocity of STABLE and a recommended refresh interval of quarterly.

The math behind the score is straightforward:

decay = 1 - 0.5^(age_days / half_life_days)

Half-life varies by source type. A GitHub repository has a 180-day half-life — code goes stale as dependencies update. An arXiv paper has a 1,095-day half-life — research ages more slowly. A Stack Overflow answer sits at 365 days.
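As a quick check, the formula and half-lives above can be written as a few lines of Python. This is a minimal sketch, not the service's actual code: the function name `decay_score` and the dictionary keys are my own choices, and only the three half-life values come from the text above.

```python
# Half-lives in days, as quoted above. The key names are illustrative.
HALF_LIFE_DAYS = {
    "github": 180,
    "arxiv": 1095,
    "stackoverflow": 365,
}


def decay_score(age_days: float, half_life_days: float) -> float:
    """Return 1 - 0.5 ** (age / half_life).

    0.0 means brand new, 0.5 means exactly one half-life old,
    and the value approaches 1.0 as the document keeps aging.
    """
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    if half_life_days <= 0:
        raise ValueError("half_life_days must be positive")
    return 1.0 - 0.5 ** (age_days / half_life_days)


# Ages taken from the sample query output shown earlier.
for source, age in [("arxiv", 381), ("github", 4), ("github", 99)]:
    print(source, age, round(decay_score(age, HALF_LIFE_DAYS[source]), 3))
# arxiv 381 0.214
# github 4 0.015
# github 99 0.317
```

The printed values match the decay column in the query output, which suggests the sample scores follow this formula with the stated half-lives.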

This is the layer I built: a post-retrieval decay gate that stamps every retrieved document with a deterministic freshness score before it enters the LLM context window. It sits between your vector database and your generation step. It requires zero changes to your existing pipeline.
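To make the idea concrete, here is a rough local sketch of what a post-retrieval stamping step could look like. Everything in it is an assumption for illustration: `RetrievedDoc`, `stamp_decay`, the 365-day fallback half-life, and the 0.3 cut-off between "fresh" and "aging" are hypothetical (the cut-off is merely consistent with the sample output, not a documented value). Only the formula and the three half-lives come from this post.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional

HALF_LIFE_DAYS = {"github": 180, "arxiv": 1095, "stackoverflow": 365}
DEFAULT_HALF_LIFE_DAYS = 365  # hypothetical fallback for unknown source types
AGING_THRESHOLD = 0.3         # hypothetical cut-off, not a documented value


@dataclass
class RetrievedDoc:
    doc_id: str
    source_type: str              # e.g. "github" or "arxiv"
    published: datetime           # timezone-aware publication time
    decay: Optional[float] = None
    label: Optional[str] = None


def stamp_decay(docs: Iterable[RetrievedDoc],
                now: Optional[datetime] = None) -> List[RetrievedDoc]:
    """Attach a decay score and label to each document after retrieval."""
    now = now or datetime.now(timezone.utc)
    stamped = []
    for doc in docs:
        # Clamp future-dated documents to an age of zero.
        age_days = max((now - doc.published).total_seconds() / 86400.0, 0.0)
        half_life = HALF_LIFE_DAYS.get(doc.source_type, DEFAULT_HALF_LIFE_DAYS)
        doc.decay = 1.0 - 0.5 ** (age_days / half_life)
        doc.label = "aging" if doc.decay >= AGING_THRESHOLD else "fresh"
        stamped.append(doc)
    return stamped


# Usage: stamp two made-up documents against a fixed reference time.
now = datetime(2026, 1, 1, tzinfo=timezone.utc)
docs = [
    RetrievedDoc("repo-a", "github", now - timedelta(days=99)),
    RetrievedDoc("paper-b", "arxiv", now - timedelta(days=381)),
]
for doc in stamp_decay(docs, now=now):
    print(doc.doc_id, round(doc.decay, 3), doc.label)
```

Passing `now` explicitly keeps the score deterministic for a given set of documents, which makes the step easy to unit test and to replay when debugging a bad answer.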

Try it yourself — free tier, 500 calls/month, no credit card:

Step 1 — get your key

curl -X POST "https://api.knowledgeuniverse.tech/v1/signup?email=you@example.com"

Step 2 — run a query

curl -X POST https://api.knowledgeuniverse.tech/v1/discover \
  -H "X-API-Key: YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"topic":"your domain here","difficulty":4,"formats":["pdf","github"]}'

The max_decay_detected field in the response reports the highest decay score in your retrieved set, that is, the score of the stalest document. Gate on that. In fast-moving domains, block anything above 0.4.
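A gate on that field might look something like the sketch below. The function name `passes_decay_gate` and the sample payload are hypothetical; the only parts taken from this post are the `max_decay_detected` field name and the 0.4 threshold. The rest of the response schema is not shown here, so the sketch does not rely on it.

```python
import json


def passes_decay_gate(response: dict, max_allowed: float = 0.4) -> bool:
    """Return True if the retrieved set is fresh enough to send to the LLM.

    Assumes the parsed response carries a numeric `max_decay_detected` field.
    """
    if "max_decay_detected" not in response:
        raise KeyError("response has no max_decay_detected field")
    return float(response["max_decay_detected"]) <= max_allowed


# Made-up payloads containing only the one field named in the post.
aging_set = json.loads('{"max_decay_detected": 0.317}')
stale_set = json.loads('{"max_decay_detected": 0.52}')

print(passes_decay_gate(aging_set))  # True: 0.317 is within the 0.4 limit
print(passes_decay_gate(stale_set))  # False: 0.52 exceeds the 0.4 limit
```

Raising on a missing field, rather than defaulting to "fresh", is a deliberate choice in this sketch: a silent default would reintroduce the same blind spot the gate is meant to close.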

How are you handling temporal staleness in your RAG pipelines today?
