In the rush to build RAG (Retrieval-Augmented Generation) applications, one critical component is often missing: Trust. Most agents simply retrieve chunks of text and synthesize an answer, regardless of whether the sources disagree or are of low quality.
Here's how I built a Search Confidence Scoring Agent using AIsa and LangChain. The agent doesn't just answer questions: it also assigns a deterministic "Confidence Score" (0-100) based on source quality and cross-source consensus.
The Problem: "Hallucinated Confidence"
Standard LLMs are notoriously overconfident. They will state a falsehood with the same conviction as a fact. To fix this, we need an engineering approach to truth, not just a probabilistic one.
The Solution: A Multi-Stream Architecture
I designed an architecture that queries 5 distinct information streams in parallel using AIsa's Unified API, plus Tavily as an external validator; a minimal wiring sketch follows the stack list below.
The Stack:
- AIsa Scholar: For academic, peer-reviewed claims.
- AIsa Web & Smart Search: For high-quality verified web data.
- AIsa Explain: For meta-reasoning and context.
- Tavily: For additional external grounding.
- LangChain: For orchestration.
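To make that concrete, here is a minimal wiring sketch. The AIsa client below is a hypothetical thin HTTP wrapper (the endpoint names come from this post, but the URL, payload shape, and env-var names are my assumptions); the Tavily client is the real tavily-python package.
# sketch: clients for the individual streams (the AIsa wrapper is hypothetical)
import os
import requests
from tavily import TavilyClient

class AIsaStream:
    """Hypothetical thin wrapper around one AIsa endpoint; check the AIsa docs for the real schema."""
    def __init__(self, endpoint: str, api_key: str):
        self.endpoint = endpoint  # e.g. "scholar", "web", "smart", "explain" (assumed names)
        self.api_key = api_key

    def search(self, query: str = None, search_id: str = None) -> dict:
        payload = {"query": query, "search_id": search_id}  # assumed request shape
        resp = requests.post(
            f"https://api.aisa.example/{self.endpoint}/search",  # placeholder URL
            json=payload,
            headers={"Authorization": f"Bearer {self.api_key}"},
            timeout=30,
        )
        resp.raise_for_status()
        return resp.json()

aisa_key = os.environ["AISA_API_KEY"]  # assumed env-var name
scholar = AIsaStream("scholar", aisa_key)
web = AIsaStream("web", aisa_key)
smart = AIsaStream("smart", aisa_key)
explain = AIsaStream("explain", aisa_key)
tavily = TavilyClient(api_key=os.environ["TAVILY_API_KEY"])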
Key Implementation Details
1. Multi-Phase Orchestration with AIsa
One of AIsa's unique features is the ability to generate "Explanations" derived from a specific search session. I built a Two-Phase retrieval system around this:
Phase 1: Wide Discovery
As a first step, I fire parallel queries to the Scholar, Web, and Smart endpoints to gather raw data.
Phase 2: Deep Reasoning
I automatically capture the search_id from the AIsa response and pipe it into the specialized Explain Endpoint. This gives me a native, grounded explanation from the model itself, which I treat as a high-authority "Reviewer" in our trust calculations.
# excerpt from agent.py
# Phase 1: Parallel Search
futures = {
    executor.submit(self.scholar.search, query): "AIsa Scholar",
    executor.submit(self.web.search, query): "AIsa Web",
    # ...
}

# Phase 2: Chained Explanation
if ais_search_id:
    explanation = self.explain.search(search_id=ais_search_id)
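For context, this is one way the scaffolding around that excerpt can be filled in: fan out over the parallel streams, collect results as they complete, then chain the Explain call. Where search_id lives in the response is an assumption about the AIsa payload, and the client objects are the hypothetical wrappers sketched earlier.
# sketch: fan-out/fan-in plus the chained Explain call (payload shape is assumed)
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_two_phase(query, scholar, web, smart, tavily, explain):
    results = {}
    with ThreadPoolExecutor(max_workers=4) as executor:
        # Phase 1: fire all parallel streams
        futures = {
            executor.submit(scholar.search, query): "AIsa Scholar",
            executor.submit(web.search, query): "AIsa Web",
            executor.submit(smart.search, query): "AIsa Smart Search",
            executor.submit(tavily.search, query): "Tavily",
        }
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception as exc:
                results[name] = {"error": str(exc)}  # keep going if one stream fails

    # Phase 2: chain the Explain endpoint off any AIsa search_id that came back
    ais_search_id = next(
        (r.get("search_id") for name, r in results.items()
         if name.startswith("AIsa") and isinstance(r, dict) and r.get("search_id")),
        None,
    )
    if ais_search_id:
        results["AIsa Explain"] = explain.search(search_id=ais_search_id)
    return results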
2. Determining Truth via "Agreement Scoring"
Instead of letting the LLM guess a score, I implemented a deterministic rubric: the agent breaks retrieved content into Atomic Claims and checks each one for cross-source consensus.
# excerpt from models.py
from typing import List, Literal
from pydantic import BaseModel

class Claim(BaseModel):
    text: str
    agreement_level: Literal["High", "Medium", "Low", "Conflict"]
    supporting_source_indices: List[int]
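The post doesn't show how claims get extracted, so here is one plausible approach using LangChain's structured-output interface: ask the model to emit the Claim schema directly. The ClaimList wrapper, the prompt wording, the import path for Claim, and the choice of ChatOpenAI are all my assumptions.
# sketch: turning retrieved text into Claim objects via structured output
from typing import List
from pydantic import BaseModel
from langchain_openai import ChatOpenAI  # stand-in for whichever chat model the agent runs on
from models import Claim  # the schema shown above (import path assumed)

class ClaimList(BaseModel):
    claims: List[Claim]

def extract_claims(numbered_sources: str) -> List[Claim]:
    llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)  # model name is an assumption
    extractor = llm.with_structured_output(ClaimList)
    prompt = (
        "Break the numbered sources below into atomic factual claims. For each claim, "
        "list the supporting source indices and label the agreement as High, Medium, "
        "Low, or Conflict.\n\n" + numbered_sources
    )
    return extractor.invoke(prompt).claims
With the claims extracted, the deterministic part of the rubric is a small scoring function: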
# excerpt from agent.py
def calculate_confidence(sources, claims):
    score = 0  # assumed starting point; the excerpt omits the original initialization

    # 1. Source Quality Score (academic sources weighted higher, capped at +30)
    academic_count = sum(1 for s in sources if s.get("type") == "academic")  # assumed source schema
    score += min(academic_count * 15, 30)

    # 2. Agreement Score (reward consensus, penalize conflicts)
    for claim in claims:
        if claim.agreement_level == "Conflict":
            score -= 20
        elif claim.agreement_level == "High":
            score += 10

    # Clamp to the 0-100 range
    return min(max(score, 0), 100)
This ensures that if our Academic sources (AIsa Scholar) conflict with our Web sources (Tavily/AIsa Web), the score explicitly drops.
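As a quick worked example of the rubric (using the assumed baseline of 0 and the assumed source-typing from the excerpt above; the original file may differ):
# sketch: one agreeing claim and one conflict
sources = [
    {"type": "academic", "name": "AIsa Scholar"},
    {"type": "academic", "name": "AIsa Scholar"},
    {"type": "web", "name": "Tavily"},
]
claims = [
    Claim(text="X was first described in 2017.",
          agreement_level="High", supporting_source_indices=[0, 1, 2]),
    Claim(text="X outperforms Y on benchmark Z.",
          agreement_level="Conflict", supporting_source_indices=[0]),
]
# 0 + 30 (two academic sources, capped) + 10 (High) - 20 (Conflict) = 20
print(calculate_confidence(sources, claims))  # -> 20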
Visualizing Trust
I built a Transparency Toggle in the UI. Users don't just see the answer; they can expand a "Developer View" to see the raw JSON responses from every API call. This makes the system "glass-box" rather than black-box.
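The post doesn't name the UI framework; assuming a Streamlit front end (my assumption), the Developer View can be as small as an expander over the raw payloads collected during retrieval:
# sketch: a "Developer View" toggle (Streamlit is assumed; `results` is the dict of raw API responses)
import streamlit as st

if st.toggle("Developer View"):
    for source_name, raw_response in results.items():
        with st.expander(f"Raw response: {source_name}"):
            st.json(raw_response)  # show the unmodified JSON for this API call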
Why AIsa?
This architecture would usually require managing four or five different vendor subscriptions (one for scholarly search, one for web search, one for LLM inference, and so on). AIsa aggregates these behind a single API key, drastically simplifying the environment setup and billing logic for agentic workflows.

