In the rush to build RAG (Retrieval-Augmented Generation) applications, one critical component is often missing: Trust. Most agents simply retrieve chunks of text and synthesize an answer, regardless of whether the sources disagree or are of low quality.
Here's how I built a Search Confidence Scoring Agent using AIsa and LangChain. The agent doesn't just answer questions: it also assigns a deterministic "Confidence Score" (0-100) based on source quality and cross-source consensus.
The Problem: "Hallucinated Confidence"
Standard LLMs are notoriously overconfident. They will state a falsehood with the same conviction as a fact. To fix this, we need an engineering approach to truth, not just a probabilistic one.
The Solution: A Multi-Stream Architecture
I designed an architecture that queries 5 distinct information streams in parallel using AIsa's Unified API, plus Tavily as an external validator; a minimal wiring sketch follows the stack list below.
The Stack:
- AIsa Scholar: For academic, peer-reviewed claims.
- AIsa Web & Smart Search: For high-quality verified web data.
- AIsa Explain: For meta-reasoning and context.
- Tavily: For additional external grounding.
- LangChain: For orchestration.
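To make that concrete, here is a minimal wiring sketch. The AIsa client below is a hypothetical thin HTTP wrapper (the endpoint names come from this post, but the URL, payload shape, and env-var names are my assumptions); the Tavily client is the real tavily-python package.
# sketch: clients for the individual streams (the AIsa wrapper is hypothetical)
import os
import requests
from tavily import TavilyClient

class AIsaStream:
    """Hypothetical thin wrapper around one AIsa endpoint; check the AIsa docs for the real schema."""
    def __init__(self, endpoint: str, api_key: str):
        self.endpoint = endpoint  # e.g. "scholar", "web", "smart", "explain" (assumed names)
        self.api_key = api_key

    def search(self, query: str = None, search_id: str = None) -> dict:
        payload = {"query": query, "search_id": search_id}  # assumed request shape
        resp = requests.post(
            f"https://api.aisa.example/{self.endpoint}/search",  # placeholder URL
            json=payload,
            headers={"Authorization": f"Bearer {self.api_key}"},
            timeout=30,
        )
        resp.raise_for_status()
        return resp.json()

aisa_key = os.environ["AISA_API_KEY"]  # assumed env-var name
scholar = AIsaStream("scholar", aisa_key)
web = AIsaStream("web", aisa_key)
smart = AIsaStream("smart", aisa_key)
explain = AIsaStream("explain", aisa_key)
tavily = TavilyClient(api_key=os.environ["TAVILY_API_KEY"])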
Key Implementation Details
1. Multi-Phase Orchestration with AIsa
One of AIsa's unique features is the ability to generate "Explanations" derived from a specific search session. I built a Two-Phase retrieval system around this:
Phase 1: Wide Discovery
As a first step, I fire parallel queries to the Scholar, Web, and Smart endpoints to gather raw data.
Phase 2: Deep Reasoning
I automatically capture the search_id from the AIsa response and pipe it into the specialized Explain Endpoint. This gives me a native, grounded explanation from the model itself, which I treat as a high-authority "Reviewer" in our trust calculations.
# excerpt from agent.py
# Phase 1: Parallel Search
futures = {
    executor.submit(self.scholar.search, query): "AIsa Scholar",
    executor.submit(self.web.search, query): "AIsa Web",
    # ...
}

# Phase 2: Chained Explanation
if ais_search_id:
    explanation = self.explain.search(search_id=ais_search_id)
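For context, this is one way the scaffolding around that excerpt can be filled in: fan out over the parallel streams, collect results as they complete, then chain the Explain call. Where search_id lives in the response is an assumption about the AIsa payload, and the client objects are the hypothetical wrappers sketched earlier.
# sketch: fan-out/fan-in plus the chained Explain call (payload shape is assumed)
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_two_phase(query, scholar, web, smart, tavily, explain):
    results = {}
    with ThreadPoolExecutor(max_workers=4) as executor:
        # Phase 1: fire all parallel streams
        futures = {
            executor.submit(scholar.search, query): "AIsa Scholar",
            executor.submit(web.search, query): "AIsa Web",
            executor.submit(smart.search, query): "AIsa Smart Search",
            executor.submit(tavily.search, query): "Tavily",
        }
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception as exc:
                results[name] = {"error": str(exc)}  # keep going if one stream fails

    # Phase 2: chain the Explain endpoint off any AIsa search_id that came back
    ais_search_id = next(
        (r.get("search_id") for name, r in results.items()
         if name.startswith("AIsa") and isinstance(r, dict) and r.get("search_id")),
        None,
    )
    if ais_search_id:
        results["AIsa Explain"] = explain.search(search_id=ais_search_id)
    return results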
2. Determining Truth via "Agreement Scoring"
Instead of letting the LLM guess a score, I implemented a deterministic rubric: the agent breaks retrieved content into Atomic Claims and checks each one for cross-source consensus.
# excerpt from models.py
from typing import List, Literal
from pydantic import BaseModel

class Claim(BaseModel):
    text: str
    agreement_level: Literal["High", "Medium", "Low", "Conflict"]
    supporting_source_indices: List[int]
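The post doesn't show how claims get extracted, so here is one plausible approach using LangChain's structured-output interface: ask the model to emit the Claim schema directly. The ClaimList wrapper, the prompt wording, the import path for Claim, and the choice of ChatOpenAI are all my assumptions.
# sketch: turning retrieved text into Claim objects via structured output
from typing import List
from pydantic import BaseModel
from langchain_openai import ChatOpenAI  # stand-in for whichever chat model the agent runs on
from models import Claim  # the schema shown above (import path assumed)

class ClaimList(BaseModel):
    claims: List[Claim]

def extract_claims(numbered_sources: str) -> List[Claim]:
    llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)  # model name is an assumption
    extractor = llm.with_structured_output(ClaimList)
    prompt = (
        "Break the numbered sources below into atomic factual claims. For each claim, "
        "list the supporting source indices and label the agreement as High, Medium, "
        "Low, or Conflict.\n\n" + numbered_sources
    )
    return extractor.invoke(prompt).claims
With the claims extracted, the deterministic part of the rubric is a small scoring function: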
# excerpt from agent.py
def calculate_confidence(sources, claims):
    score = 0  # assumed starting point; the excerpt omits the original initialization

    # 1. Source Quality Score (academic sources weighted higher, capped at +30)
    academic_count = sum(1 for s in sources if s.get("type") == "academic")  # assumed source schema
    score += min(academic_count * 15, 30)

    # 2. Agreement Score (reward consensus, penalize conflicts)
    for claim in claims:
        if claim.agreement_level == "Conflict":
            score -= 20
        elif claim.agreement_level == "High":
            score += 10

    # Clamp to the 0-100 range
    return min(max(score, 0), 100)
This ensures that if our Academic sources (AIsa Scholar) conflict with our Web sources (Tavily/AIsa Web), the score explicitly drops.
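As a quick worked example of the rubric (using the assumed baseline of 0 and the assumed source-typing from the excerpt above; the original file may differ):
# sketch: one agreeing claim and one conflict
sources = [
    {"type": "academic", "name": "AIsa Scholar"},
    {"type": "academic", "name": "AIsa Scholar"},
    {"type": "web", "name": "Tavily"},
]
claims = [
    Claim(text="X was first described in 2017.",
          agreement_level="High", supporting_source_indices=[0, 1, 2]),
    Claim(text="X outperforms Y on benchmark Z.",
          agreement_level="Conflict", supporting_source_indices=[0]),
]
# 0 + 30 (two academic sources, capped) + 10 (High) - 20 (Conflict) = 20
print(calculate_confidence(sources, claims))  # -> 20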
Visualizing Trust
I built a Transparency Toggle in the UI. Users don't just see the answer; they can expand a "Developer View" to see the raw JSON responses from every API call. This makes the system "glass-box" rather than black-box.
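The post doesn't name the UI framework; assuming a Streamlit front end (my assumption), the Developer View can be as small as an expander over the raw payloads collected during retrieval:
# sketch: a "Developer View" toggle (Streamlit is assumed; `results` is the dict of raw API responses)
import streamlit as st

if st.toggle("Developer View"):
    for source_name, raw_response in results.items():
        with st.expander(f"Raw response: {source_name}"):
            st.json(raw_response)  # show the unmodified JSON for this API call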
Why AIsa?
This architecture would usually require managing four or five different vendor subscriptions (one for scholarly search, one for web search, one for LLM inference, and so on). AIsa aggregates these behind a single API key, drastically simplifying the environment setup and billing logic for agentic workflows.

