Yaseen

Posted on • Originally published at Medium

Why Your AI Cites Real Sources That Never Said That (And the 3-Layer Fix)

100+ hallucinated citations passed peer review at NeurIPS 2025.

Expert reviewers. The world's most competitive AI conference. Three or more sign-offs per paper.

Still missed.

Because they weren't fake sources. The papers were real. The authors were real. The claims they were being used to support? Never appeared in them.

That's citation misattribution — and it's the hardest hallucination type to catch in production RAG pipelines.


What Is Citation Misattribution?

Most devs know about ghost citations — the model invents a paper, generates a plausible DOI, and a quick search returns nothing. Caught. Done.

Citation misattribution is different.

The model cites a real source but attributes a claim or finding to it that the source never actually made. The paper exists. The DOI resolves. The author is real. What the AI says the paper proves? Not in there.

GPTZero coined a term for it: vibe citing. Like vibe coding — generating code that feels correct without being correct — vibe citing produces references with the right shape of accuracy, wrong substance.

The source looks real. The claim sounds right. That's the whole problem.

Here's what makes it dangerous in production: a surface-level verification check passes. The source exists. The only way to catch the error is to read the cited passage and verify it supports the specific claim being made. At scale, that step gets skipped.
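To make the failure concrete, here's a minimal sketch of why an existence-only check is blind to misattribution. A hypothetical lookup table stands in for DOI resolution; the claim and numbers are invented for illustration:

```python
# Hypothetical registry standing in for a live DOI resolver.
KNOWN_SOURCES = {"10.48550/arXiv.1706.03762": "Attention Is All You Need"}

def source_exists(doi: str) -> bool:
    # A production check would resolve the DOI over HTTP;
    # a lookup table stands in here.
    return doi in KNOWN_SOURCES

# The citation is real, so an existence-only check passes —
# even though the cited paper never makes this claim.
claim = "Transformers cut training cost by 90% on all NLP tasks"
citation = "10.48550/arXiv.1706.03762"
assert source_exists(citation)
```

The check returns true, the pipeline moves on, and the misattributed claim ships with a real citation attached.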


Why It Happens at the Model Level

The model isn't being careless. It's pattern-matching on what a well-cited output should look like — not what the source actually contains.

GPTZero found consistent patterns in the NeurIPS hallucinations:

  • Real author names expanded into guessed first names
  • Coauthors dropped or added
  • Paper titles paraphrased in ways that changed their scope
  • An arXiv ID linking to a completely different article
  • Placeholder IDs like arXiv:2305.XXXX in reference lists

These aren't random errors. They're structurally coherent errors. The model has learned the schema of a citation. It fills the schema. Whether the content at the referenced location supports the claim is a separate question — one it doesn't always get right.


Where the Exposure Lives in Production

Legal: Mata v. Avianca (2023) — an attorney submitted a ChatGPT-generated brief with six fabricated case citations. Sanctioned $5,000. That was ghost citations. Citation misattribution is the same liability surface, harder to catch.

Healthcare: Clinical AI misattributing a contraindication finding to a real study doesn't just create a compliance issue — it's a patient safety incident.

Enterprise: Research reports, competitive analyses, due diligence documents. Small claim-level distortions, compounding across every AI-generated output that cites a source.

The real problem is that it doesn't feel like a lie. It feels like a slightly imprecise interpretation of a real source. That's exactly when people stop checking.


The Diagnostic Question

Before the fix — one question worth asking about your current stack:

When your AI makes a specific claim and cites a source, is there any step in your pipeline that verifies the cited passage actually supports that claim?

Not whether the source exists. Whether the claim and the passage are aligned.

Most RAG pipelines don't answer that question. Here's why.

Standard RAG retrieves at document level

# Typical document-level retrieval
def retrieve(query: str, k: int = 5) -> list[Document]:
    embeddings = embed(query)
    results = vector_store.similarity_search(embeddings, k=k)
    return results  # Returns full documents — not specific passages

This confirms the source is topically relevant. It doesn't verify that the specific passage inside that document supports the specific claim being generated.

Context drift compounds it. A nuanced finding gets compressed in summarisation. The summary feeds generation. By the time a citation appears in the output, the model is working from a representation that no longer preserves the original claim's limits.


The 3-Layer Fix

Layer 1 — Passage-Level Retrieval

Move from document-level to paragraph/section-level chunking. Retrieve the specific passages most likely to support or refute the claim — not the full document.

from langchain.text_splitter import RecursiveCharacterTextSplitter

# Chunk at passage level — not document level
splitter = RecursiveCharacterTextSplitter(
    chunk_size=512,        # ~paragraph size
    chunk_overlap=64,      # preserve context across chunks
    separators=["\n\n", "\n", ". "]
)

passages = splitter.split_documents(documents)

# Store with metadata — source, page, section
for passage in passages:
    passage.metadata.update({
        "source_id": passage.metadata["source"],
        "chunk_index": passage.metadata.get("chunk_index", 0)
    })

vector_store.add_documents(passages)

Now your retrieval returns a specific passage, not a full document. The model's generation window is narrowed to the evidence most likely to be relevant — reducing the opportunity for cross-section blending.


Layer 2 — Citation-to-Claim Alignment Check

After generation, before output — score whether the cited passage actually supports the generated claim.

import json

from anthropic import Anthropic

client = Anthropic()

def check_citation_alignment(
    claim: str,
    cited_passage: str,
    threshold: float = 0.75
) -> dict:
    """
    Verify that the cited passage supports the generated claim.
    Returns alignment score + flag if below threshold.
    """

    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=256,
        messages=[{
            "role": "user",
            "content": f"""Does this passage support the claim below?

Claim: {claim}

Passage: {cited_passage}

Respond ONLY with JSON:
{{
  "supported": true/false,
  "confidence": 0.0-1.0,
  "reason": "one sentence explanation"
}}"""
        }]
    )

    result = json.loads(response.content[0].text)
    result["flagged"] = (not result["supported"]) or result["confidence"] < threshold
    return result


# In your generation pipeline
alignment = check_citation_alignment(
    claim="GPT-4 achieves 92% accuracy on medical diagnosis tasks",
    cited_passage=retrieved_passage.page_content
)

if alignment["flagged"]:
    # Route to human review — don't let it ship
    queue_for_review(claim, cited_passage, alignment)

This check runs inside the generation loop — before output, not after. By the time something ships, the cost of catching it has already multiplied.


Layer 3 — Quote Grounding

Require outputs to anchor claims to a specific quoted excerpt from the source — not just a document URL or title.

GROUNDED_PROMPT = """
Answer the question using the provided sources.

For every factual claim you make, you MUST include:
1. The specific sentence or passage from the source that supports it
2. The source ID it comes from

Format each grounded claim as:
[CLAIM] Your claim here.
[EVIDENCE] "Exact quoted passage from source" — Source ID: {source_id}

If no passage directly supports a claim, do not make the claim.
"""

def generate_grounded_response(query: str, passages: list[Document]) -> str:
    context = "\n\n".join([
        f"[Source {i}: {p.metadata['source_id']}]\n{p.page_content}"
        for i, p in enumerate(passages)
    ])

    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1000,
        system=GROUNDED_PROMPT,
        messages=[{
            "role": "user",
            "content": f"Sources:\n{context}\n\nQuestion: {query}"
        }]
    )

    return response.content[0].text

When a claim is tied to a specific quoted passage, the verification surface becomes auditable in seconds. A reviewer sees the claim, sees the evidence, assesses the alignment. Without this, a citation is a pointer to a document. With it, it's a pointer to evidence.
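The full pipeline in the next section calls an extract_claims_and_citations helper that isn't shown. One possible implementation, assuming the model follows the exact [CLAIM]/[EVIDENCE] format from GROUNDED_PROMPT, is a single regex pass:

```python
import re

# Matches the [CLAIM]/[EVIDENCE] format produced by GROUNDED_PROMPT.
PATTERN = re.compile(
    r'\[CLAIM\]\s*(?P<claim>.+?)\s*'
    r'\[EVIDENCE\]\s*"(?P<quote>.+?)"\s*[-—]+\s*Source ID:\s*(?P<source>\S+)',
    re.DOTALL,
)

def extract_claims_and_citations(text: str) -> list[tuple[str, str, str]]:
    """Return (claim, source_id, quoted_passage) triples from a grounded response."""
    return [
        (m.group("claim"), m.group("source"), m.group("quote"))
        for m in PATTERN.finditer(text)
    ]
```

This is a sketch, not a robust parser: if the model deviates from the format (unescaped quotes, missing source IDs), the triple is silently dropped, so in production you'd also want to flag responses where zero claims parse.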


Putting It Together — Full Pipeline

def citation_safe_rag(query: str) -> dict:

    # Layer 1: Passage-level retrieval
    # Max marginal relevance — diverse passages, not near-duplicates
    passages = vector_store.max_marginal_relevance_search(query, k=5)

    # Layer 2: Generate with grounding prompt
    raw_response = generate_grounded_response(query, passages)

    # Layer 3: Parse claims + run alignment checks
    claims = extract_claims_and_citations(raw_response)
    results = []

    for claim, source_id, quoted_passage in claims:
        alignment = check_citation_alignment(claim, quoted_passage)

        results.append({
            "claim": claim,
            "source": source_id,
            "evidence": quoted_passage,
            "alignment_score": alignment["confidence"],
            "flagged": alignment["flagged"],
            "reason": alignment["reason"]
        })

    # Route flagged claims for human review
    flagged = [r for r in results if r["flagged"]]
    if flagged:
        human_review_queue.push(flagged)

    return {
        "response": raw_response,
        "claims": results,
        "requires_review": len(flagged) > 0
    }

The Metric You're Probably Not Tracking

Most teams track RAG performance on retrieval accuracy — are we getting the right documents?

The metric that actually matters here is citation precision score: the rate at which cited passages actually support the claims they're attached to.

If you don't have that metric in your eval suite, you don't have visibility into this failure mode.

def evaluate_citation_precision(test_cases: list[dict]) -> float:
    """
    test_cases: list of {claim, cited_passage, ground_truth_supported}
    Returns precision score across the dataset.
    """
    correct = 0

    for case in test_cases:
        alignment = check_citation_alignment(
            case["claim"],
            case["cited_passage"]
        )
        predicted = alignment["supported"]
        if predicted == case["ground_truth_supported"]:
            correct += 1

    return correct / len(test_cases)

Add this to your CI pipeline. Run it on every RAG configuration change.
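As a sketch of what that CI gate can look like, here is a self-contained version where a deterministic substring check stands in for the LLM-based alignment call (so the test runs offline) and the golden cases are invented for illustration:

```python
# Deterministic stand-in for check_citation_alignment, for offline CI runs.
def stub_alignment(claim: str, passage: str) -> dict:
    supported = claim.lower() in passage.lower()
    return {"supported": supported, "confidence": 1.0 if supported else 0.0}

def citation_precision(cases: list[dict]) -> float:
    correct = sum(
        stub_alignment(c["claim"], c["cited_passage"])["supported"]
        == c["ground_truth_supported"]
        for c in cases
    )
    return correct / len(cases)

# Invented golden cases — in practice, hand-labeled claim/passage pairs.
GOLDEN = [
    {"claim": "latency fell 40%",
     "cited_passage": "We observed that latency fell 40% after caching.",
     "ground_truth_supported": True},
    {"claim": "accuracy reached 99%",
     "cited_passage": "Accuracy plateaued at 87% on the held-out set.",
     "ground_truth_supported": False},
]

assert citation_precision(GOLDEN) >= 0.9  # fail the build below threshold
```

Swap the stub for the real check_citation_alignment in a scheduled job if per-commit LLM calls are too slow or costly.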


TL;DR

| Layer | What it does | Where it runs |
| --- | --- | --- |
| Passage-level retrieval | Narrows context to specific evidence | Retrieval stage |
| Citation-to-claim alignment | Scores whether passage supports claim | Post-generation, pre-output |
| Quote grounding | Forces claims to reference exact passages | Generation prompt |

RAG solves the knowledge freshness problem. It doesn't solve the attribution accuracy problem. You need both.


Discussion

Have you run into citation misattribution in your RAG pipelines? How are you handling citation verification at scale?

Drop a comment — curious what approaches teams are using in production.


Part of the AI Hallucination Series by Ai Ranking / YSquare Technology.

Follow Mohamed Yaseen for more articles.
