Introduction
Current RAG systems rely heavily on LLM-driven dynamic planning.
This gives flexibility, but it also brings instability:
- same query → different routes
- small context drift → different cluster summaries
- debugging becomes guessing
- reproducibility is poor
- audit trails show “what happened”, but not “why this route”
GraphRAG improves structural traversal, but still depends on non-deterministic reasoning inside the agent layer.
Over the past 48 hours, I built a minimal deterministic RAG PoC to explore a simple question:
What if we remove LLM planning from the critical path and force RAG to run on a fixed, deterministic execution graph?
This article shares the idea and the PoC, designed as a drop-in, not a replacement.
Why Determinism?
Determinism is not about “reducing creativity”.
It’s about making the retrieval pipeline:
- reproducible
- debuggable
- testable
- auditable
- stable under load
LLMs can still generate summaries and semantics, but the route must be fixed.
If an LLM wants to “improvise”, it must do so inside deterministic boundaries, not by changing the execution graph.
Design Principles
The PoC follows three simple constraints:
- Deterministic clustering
  - k-means with a fixed k
  - fixed seed
  - fixed similarity threshold
  - deterministic grouping of entities
- Deterministic summarization calls
  - temperature = 0
  - top_p = 1
  - max_tokens fixed
- Execution = static graph
  - no planner
  - no dynamic steps
  - just a frozen DAG (sketched in code below):

```
entity_extraction
→ deterministic_cluster
→ cluster_summary
→ final_answer
```

Same input → same route → same audit trace.
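To make “Execution = static graph” concrete, here is a minimal sketch of what a frozen DAG runner can look like. This is not the repo’s actual implementation; the step bodies are illustrative stubs, and only the shape matters: the route is a tuple fixed at import time, with no planner and no branching.

```python
# Sketch: a frozen DAG as a hard-coded sequence of named steps.
# Step bodies are stubs; in the real PoC they would call the
# clustering and summarization functions shown below.

def entity_extraction(state):
    state["entities"] = sorted(set(state["query"].lower().split()))
    return state

def deterministic_cluster_step(state):
    state["clusters"] = {0: state["entities"]}  # stub: single cluster
    return state

def cluster_summary_step(state):
    state["summaries"] = {k: " ".join(v) for k, v in state["clusters"].items()}
    return state

def final_answer_step(state):
    state["answer"] = "; ".join(state["summaries"].values())
    return state

FROZEN_DAG = (
    ("entity_extraction", entity_extraction),
    ("deterministic_cluster", deterministic_cluster_step),
    ("cluster_summary", cluster_summary_step),
    ("final_answer", final_answer_step),
)

def run_pipeline(query):
    state, trace = {"query": query}, []
    for name, step in FROZEN_DAG:
        state = step(state)   # every node maps state -> state
        trace.append(name)    # the audit trail *is* the route
    return state["answer"], trace
```

Because the route cannot change at runtime, “why this route” has a trivial answer: it is the only route there is.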
Minimal PoC (Full File)
Repo:
https://github.com/yuer-dsl/deterministic-rag-poc
File:
examples/deterministic_rag_poc.py
```python
"""Minimal deterministic RAG PoC.

Same input => same execution path.
"""

import numpy as np
from sklearn.cluster import KMeans
from openai import OpenAI

client = OpenAI()


def deterministic_cluster(vectors, k=3, seed=42):
    """Fixed k and fixed seed: identical vectors always get identical labels."""
    km = KMeans(n_clusters=k, random_state=seed)
    km.fit(vectors)
    return km.labels_


def summarize_cluster(texts):
    """Summarize one cluster with pinned decoding parameters."""
    prompt = f"Summarize:\n{texts}"
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
        top_p=1.0,
        max_tokens=256,
    )
    return resp.choices[0].message.content


def deterministic_rag(docs, vectors):
    """Frozen route: cluster, group docs by label, summarize each cluster."""
    labels = deterministic_cluster(vectors)
    result = {}
    for i, lbl in enumerate(labels):
        result.setdefault(lbl, []).append(docs[i])
    summaries = {
        lbl: summarize_cluster("\n".join(items))
        for lbl, items in result.items()
    }
    return summaries
```
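To try the pipeline end to end you need document vectors. The repo presumably wires in its own embedding step; purely as an illustration, scikit-learn’s HashingVectorizer makes a convenient stand-in because it is stateless, so the same documents always map to the same vectors:

```python
# Sketch: drive the PoC with fully deterministic local vectors.
# HashingVectorizer is a stand-in for whatever embedder you actually use.
from sklearn.feature_extraction.text import HashingVectorizer

docs = [
    "Postgres stores rows in heap pages.",
    "Indexes in Postgres are B-trees by default.",
    "Kafka partitions are append-only logs.",
    "Kafka consumers track offsets per partition.",
    "TLS handshakes negotiate a shared session key.",
    "Certificates bind a public key to an identity.",
]

vectorizer = HashingVectorizer(n_features=512, norm="l2")
vectors = vectorizer.transform(docs).toarray()

summaries = deterministic_rag(docs, vectors)
for label, summary in sorted(summaries.items()):
    print(label, summary)
```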
How it Works (in practice)
Run twice with the same corpus:
```bash
python examples/deterministic_rag_poc.py
python examples/deterministic_rag_poc.py
```
You get:
- identical cluster assignments
- identical summaries
- identical logs
- identical final answers
This validates the core claim:
RAG does not need a planner to be powerful.
It only needs deterministic structure + LLM semantics.
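You can also check the claim mechanically instead of eyeballing logs. Here is a minimal sketch, assuming `docs` and `vectors` are loaded as in the usage example above: fingerprint two runs and compare hashes. One caveat worth stating plainly: temperature=0 makes API outputs very stable, but OpenAI does not contractually guarantee bit-identical completions, so the routing and clustering are strictly deterministic while the summary text is deterministic in practice.

```python
# Sketch: fingerprint a full run so two executions can be compared byte-for-byte.
import hashlib
import json

def run_fingerprint(docs, vectors):
    summaries = deterministic_rag(docs, vectors)
    # Canonicalize: sort labels so dict ordering cannot affect the hash.
    canonical = json.dumps(
        {str(lbl): text for lbl, text in sorted(summaries.items())},
        ensure_ascii=False,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

print(run_fingerprint(docs, vectors) == run_fingerprint(docs, vectors))
```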
Why this matters for the whole ecosystem
This PoC is not attacking GraphRAG.
It is proposing a complementary execution mode that:
- stabilizes LLM-based RAG
- works inside enterprise pipelines
- satisfies compliance teams
- passes reproducibility audits
- enables multi-run evaluation
- supports “deterministic agents”
- makes RAG behave like a real subsystem rather than a creative collaborator
It solves one fundamental pain point:
People want correctness, not surprises.
A Drop-in Module, Not a New Paradigm (yet)
I am not announcing a new standard.
I am simply sharing a component that any agent framework can adopt:
- LangChain
- LlamaIndex
- GraphRAG
- PIKE-RAG
- AgentBuilder
- custom enterprise pipelines
The goal is to educate the ecosystem:
Deterministic execution is not optional; it’s the missing piece in current LLM architecture.
Closing Thoughts
This PoC is intentionally minimal.
It exists to make one point clear:
We can control the execution path today.
We don’t need another 200-page planning spec to do it.
If you want:
- reproducibility
- safety
- determinism
- testability
Then a deterministic RAG layer is the simplest, fastest, and most universal component you can add.
More experiments coming soon.
Stay tuned.