Introduction
Current RAG systems rely heavily on LLM-driven dynamic planning.
This gives flexibility, but it also brings instability:
- same query → different routes
- small context drift → different cluster summaries
- debugging becomes guessing
- reproducibility is poor
- audit trails show “what happened”, but not “why this route”
GraphRAG improves structural traversal, but still depends on non-deterministic reasoning inside the agent layer.
Over the past 48 hours, I built a minimal deterministic RAG PoC to explore a simple question:
What if we remove LLM planning from the critical path and force RAG to run on a fixed, deterministic execution graph?
This article shares the idea and the PoC, designed as a drop-in, not a replacement.
Why Determinism?
Determinism is not about “reducing creativity”.
It’s about making the retrieval pipeline:
- reproducible
- debuggable
- testable
- auditable
- stable under load
LLMs can still generate summaries and semantics, but the route must be fixed.
If an LLM wants to “improvise”, it must do so inside deterministic boundaries, not by changing the execution graph.
Design Principles
The PoC follows three simple constraints:
- Deterministic clustering
  - k-means with a fixed k
  - fixed seed
  - fixed similarity threshold
  - deterministic grouping of entities
- Deterministic summarization calls
  - temperature = 0
  - top_p = 1
  - max_tokens fixed
- Execution = static graph
  - no planner
  - no dynamic steps
  - just a frozen DAG (sketched in code below):

```
entity_extraction
→ deterministic_cluster
→ cluster_summary
→ final_answer
```

Same input → same route → same audit trace.
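To make “Execution = static graph” concrete, here is a minimal sketch of what a frozen DAG runner can look like. This is not the repo’s actual implementation; the step bodies are illustrative stubs, and only the shape matters: the route is a tuple fixed at import time, with no planner and no branching.

```python
# Sketch: a frozen DAG as a hard-coded sequence of named steps.
# Step bodies are stubs; in the real PoC they would call the
# clustering and summarization functions shown below.

def entity_extraction(state):
    state["entities"] = sorted(set(state["query"].lower().split()))
    return state

def deterministic_cluster_step(state):
    state["clusters"] = {0: state["entities"]}  # stub: single cluster
    return state

def cluster_summary_step(state):
    state["summaries"] = {k: " ".join(v) for k, v in state["clusters"].items()}
    return state

def final_answer_step(state):
    state["answer"] = "; ".join(state["summaries"].values())
    return state

FROZEN_DAG = (
    ("entity_extraction", entity_extraction),
    ("deterministic_cluster", deterministic_cluster_step),
    ("cluster_summary", cluster_summary_step),
    ("final_answer", final_answer_step),
)

def run_pipeline(query):
    state, trace = {"query": query}, []
    for name, step in FROZEN_DAG:
        state = step(state)   # every node maps state -> state
        trace.append(name)    # the audit trail *is* the route
    return state["answer"], trace
```

Because the route cannot change at runtime, “why this route” has a trivial answer: it is the only route there is.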
Minimal PoC (Full File)
Repo:
https://github.com/yuer-dsl/deterministic-rag-poc
File:
examples/deterministic_rag_poc.py
```python
"""Minimal deterministic RAG PoC.

Same input => same execution path.
"""

import numpy as np
from sklearn.cluster import KMeans
from openai import OpenAI

client = OpenAI()


def deterministic_cluster(vectors, k=3, seed=42):
    """Fixed k and fixed seed: identical vectors always get identical labels."""
    km = KMeans(n_clusters=k, random_state=seed)
    km.fit(vectors)
    return km.labels_


def summarize_cluster(texts):
    """Summarize one cluster with pinned decoding parameters."""
    prompt = f"Summarize:\n{texts}"
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
        top_p=1.0,
        max_tokens=256,
    )
    return resp.choices[0].message.content


def deterministic_rag(docs, vectors):
    """Frozen route: cluster, group docs by label, summarize each cluster."""
    labels = deterministic_cluster(vectors)
    result = {}
    for i, lbl in enumerate(labels):
        result.setdefault(lbl, []).append(docs[i])
    summaries = {
        lbl: summarize_cluster("\n".join(items))
        for lbl, items in result.items()
    }
    return summaries
```
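To try the pipeline end to end you need document vectors. The repo presumably wires in its own embedding step; purely as an illustration, scikit-learn’s HashingVectorizer makes a convenient stand-in because it is stateless, so the same documents always map to the same vectors:

```python
# Sketch: drive the PoC with fully deterministic local vectors.
# HashingVectorizer is a stand-in for whatever embedder you actually use.
from sklearn.feature_extraction.text import HashingVectorizer

docs = [
    "Postgres stores rows in heap pages.",
    "Indexes in Postgres are B-trees by default.",
    "Kafka partitions are append-only logs.",
    "Kafka consumers track offsets per partition.",
    "TLS handshakes negotiate a shared session key.",
    "Certificates bind a public key to an identity.",
]

vectorizer = HashingVectorizer(n_features=512, norm="l2")
vectors = vectorizer.transform(docs).toarray()

summaries = deterministic_rag(docs, vectors)
for label, summary in sorted(summaries.items()):
    print(label, summary)
```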
How it Works (in practice)
Run twice with the same corpus:
```bash
python examples/deterministic_rag_poc.py
python examples/deterministic_rag_poc.py
```
You get:
- identical cluster assignments
- identical summaries
- identical logs
- identical final answers
This validates the core claim:
RAG does not need a planner to be powerful.
It only needs deterministic structure + LLM semantics.
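You can also check the claim mechanically instead of eyeballing logs. Here is a minimal sketch, assuming `docs` and `vectors` are loaded as in the usage example above: fingerprint two runs and compare hashes. One caveat worth stating plainly: temperature=0 makes API outputs very stable, but OpenAI does not contractually guarantee bit-identical completions, so the routing and clustering are strictly deterministic while the summary text is deterministic in practice.

```python
# Sketch: fingerprint a full run so two executions can be compared byte-for-byte.
import hashlib
import json

def run_fingerprint(docs, vectors):
    summaries = deterministic_rag(docs, vectors)
    # Canonicalize: sort labels so dict ordering cannot affect the hash.
    canonical = json.dumps(
        {str(lbl): text for lbl, text in sorted(summaries.items())},
        ensure_ascii=False,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

print(run_fingerprint(docs, vectors) == run_fingerprint(docs, vectors))
```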
Why this matters for the whole ecosystem
This PoC is not attacking GraphRAG.
It is proposing a complementary execution mode that:
- stabilizes LLM-based RAG
- works inside enterprise pipelines
- satisfies compliance teams
- passes reproducibility audits
- enables multi-run evaluation
- supports “deterministic agents”
- makes RAG behave like a real subsystem rather than a creative collaborator
It solves one fundamental pain point:
People want correctness, not surprises.
A Drop-in Module, Not a New Paradigm (yet)
I am not announcing a new standard.
I am simply sharing a component that any agent framework can adopt:
- LangChain
- LlamaIndex
- GraphRAG
- PIKE-RAG
- AgentBuilder
- custom enterprise pipelines
The goal is to educate the ecosystem:
Deterministic execution is not optional; it’s the missing piece in current LLM architecture.
Closing Thoughts
This PoC is intentionally minimal.
It exists to make one point clear:
We can control the execution path today.
We don’t need another 200-page planning spec to do it.
If you want:
- reproducibility
- safety
- determinism
- testability
Then a deterministic RAG layer is the simplest, fastest, and most universal component you can add.
More experiments coming soon.
Stay tuned.