Most agent pipelines treat web search as a single-shot tool call: send a query, get back some URLs, fetch one or two of them, stuff the text into the context window, move on. That works fine for lookup tasks. "What is the capital of France?" does not need a research loop.
But real research tasks do. "What are the current funding trends in open-source AI infrastructure?" or "How does Company X's pricing compare to its three main competitors?" requires following threads, noticing gaps, and issuing follow-up queries. A single fetch-and-summarize pass almost always misses the part of the answer that was buried in a secondary source, a forum thread, or a page the first result happened to link to.
That is the gap agentic search is supposed to fill. Here is what the actual loop looks like, and why the naive version falls short.
What a naive search agent does
A typical ReAct-style agent calls a search tool, gets back a list of results, picks the top one or two, fetches the content, and hands it to the LLM. The LLM either answers from that or gives up.
The failure mode is quiet. The agent returns an answer, often a confident one, but it is based on whatever happened to rank highest in that one query. If the first result is a marketing page, a paywalled article, or a three-year-old blog post, the answer reflects that without the agent noticing.
Three concrete problems:
- Single-query coverage: one phrasing of a question surfaces a different slice of the web than a slightly different phrasing does. No single query covers the whole topic.
- No gap detection: the agent does not evaluate whether the retrieved content actually answers the question. It feeds whatever it got to the LLM and lets the LLM figure it out.
- No follow-up: if the first batch of results is insufficient, the agent has no mechanism to try again with a refined query or drill into a promising link.
The research loop that fixes this
Agentic search replaces the single-shot pattern with a loop that has three phases: query generation, result evaluation, and follow-up decision.
Phase 1: generate multiple queries. Given a research goal, the LLM generates three to five distinct queries that approach the topic from different angles. Not just synonyms, but genuinely different framings: a factual lookup, a comparison query, a "what are people saying about" query, a date-scoped query.
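The query-generation step can be sketched as a single prompt plus some light parsing. This is a hypothetical helper, not part of any real API: `call_llm` stands in for whatever LLM client you use.

```python
def generate_queries(goal: str, call_llm, n: int = 4) -> list[str]:
    """Ask the LLM for n distinct queries framing the goal from different angles."""
    prompt = (
        f"Research goal: {goal}\n"
        f"Write {n} distinct web search queries that approach this goal "
        "from different angles (factual lookup, comparison, community "
        "discussion, date-scoped). One query per line, no numbering."
    )
    raw = call_llm(prompt)
    # Keep non-empty lines, strip bullet characters, dedupe in order.
    seen, queries = set(), []
    for line in raw.splitlines():
        q = line.strip().lstrip("-• ").strip()
        if q and q.lower() not in seen:
            seen.add(q.lower())
            queries.append(q)
    return queries[:n]
```

The dedupe matters: LLMs routinely return two queries that differ only in punctuation, and issuing both wastes an iteration.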
Phase 2: fetch, extract, and score. For each query, fetch the top results. Extract clean text (not raw HTML with nav bars and cookie banners, but the actual prose). Score each chunk against the original research goal: is this relevant? does it introduce new information? does it contradict something already retrieved?
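A minimal sketch of the scoring step, using lexical overlap as a cheap stand-in for the LLM relevance and novelty judgments described above (in production you would replace both heuristics with an LLM call):

```python
def score_chunk(chunk: str, goal: str, known_text: str) -> dict:
    """Crude lexical proxy for 'is this relevant?' and 'is this new?'."""
    goal_terms = set(goal.lower().split())
    chunk_terms = set(chunk.lower().split())
    # Relevance: fraction of goal terms that appear in the chunk.
    relevance = len(goal_terms & chunk_terms) / max(len(goal_terms), 1)
    # Novelty: fraction of chunk terms not already seen in collected text.
    novel_terms = chunk_terms - set(known_text.lower().split())
    novelty = len(novel_terms) / max(len(chunk_terms), 1)
    return {"relevance": relevance, "novelty": novelty}
```

Even this crude version lets you rank chunks and discard ones that score near zero on both axes before they reach the context window.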
Phase 3: decide to continue or stop. If the retrieved content covers the goal, synthesize and return. If there are still gaps, generate new queries targeting those gaps specifically, and loop. Most well-scoped research tasks converge in two to four iterations.
Here is a minimal version of that loop in Python using Anakin's Agentic Search API, which handles the fetch-and-extract step and returns results with sources attached:
```python
import os

import requests

ANAKIN_API_KEY = os.environ["ANAKIN_API_KEY"]
ANAKIN_ENDPOINT = "https://api.anakin.ai/v1/agentic-search"


def agentic_search(goal: str, max_iterations: int = 3) -> dict:
    collected_sources = []
    queries_tried = []
    current_query = goal

    for iteration in range(max_iterations):
        print(f"Iteration {iteration + 1}: querying '{current_query}'")
        response = requests.post(
            ANAKIN_ENDPOINT,
            headers={
                "Authorization": f"Bearer {ANAKIN_API_KEY}",
                "Content-Type": "application/json",
            },
            json={
                "query": current_query,
                "include_sources": True,
            },
            timeout=30,
        )
        response.raise_for_status()
        data = response.json()

        answer = data.get("answer", "")
        sources = data.get("sources", [])
        collected_sources.extend(sources)
        queries_tried.append(current_query)

        # Ask the LLM whether the answer covers the goal
        # or whether a follow-up query is needed.
        # In a real pipeline this is an LLM call.
        # Here we fake it with a length heuristic for illustration.
        if len(answer.split()) > 150:
            print(f"Goal covered after {iteration + 1} iteration(s).")
            return {"answer": answer, "sources": collected_sources}

        # Generate a follow-up query targeting the gap.
        # In production: call your LLM with the goal, the answer so far,
        # and ask it to produce a more specific query.
        current_query = f"{goal} detailed analysis {iteration + 1}"

    return {
        "answer": "Max iterations reached. Partial results below.",
        "sources": collected_sources,
    }


if __name__ == "__main__":
    result = agentic_search(
        goal="What are the main approaches to reducing LLM inference costs in 2024?"
    )
    print(result["answer"])
    for src in result["sources"]:
        print(" -", src.get("url"), src.get("title"))
```
The key thing the loop adds is the gap-detection step. Even with the fake heuristic above, the structure forces you to ask: did I actually get what I needed? That question is absent from the single-shot pattern.
What clean source attribution changes
The other thing a proper agentic search loop enables is traceable answers. When you fetch raw pages yourself and concatenate the text into a prompt, you lose the mapping between claims and sources. The LLM synthesizes across everything and you cannot tell which sentence came from where.
When each result comes back with its source URL attached to the specific text chunk it came from, you can build a citation index. For RAG pipelines this matters a lot: the final answer can include inline citations, and a downstream verification step can re-fetch the source to confirm a claim has not changed since it was indexed.
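Building that citation index is mechanical once each chunk carries its URL. A minimal sketch, assuming each source is a dict with a `url` key as in the loop above:

```python
def build_citation_index(sources: list[dict]) -> dict[int, str]:
    """Number each unique source URL so the answer can cite [1], [2], ..."""
    index: dict[int, str] = {}
    seen: set[str] = set()
    n = 0
    for src in sources:
        url = src.get("url")
        if url and url not in seen:
            seen.add(url)
            n += 1
            index[n] = url
    return index
```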
For agent memory, this is also useful. If the agent stores what it has already fetched (by URL or by a hash of the content), it avoids re-fetching the same page on the next iteration and can detect when two sources contradict each other.
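The URL-or-hash memory can be a small class. This is a sketch of the idea, not a production deduplicator: hashing raw text means trivially different copies of the same page (timestamps, ads) will not match, so real pipelines often hash the extracted prose instead.

```python
import hashlib


class FetchMemory:
    """Remembers fetched URLs and content hashes across loop iterations."""

    def __init__(self):
        self.urls: set[str] = set()
        self.hashes: set[str] = set()

    def is_new(self, url: str, text: str) -> bool:
        """Record the page and return True only if neither the URL
        nor the exact content has been seen before."""
        h = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if url in self.urls or h in self.hashes:
            return False
        self.urls.add(url)
        self.hashes.add(h)
        return True
```

The content hash is what catches the mirror case: two different URLs serving identical text count as one source.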
Where to take this next
The loop I showed above is stateless across iterations. A more robust version would maintain a shared context object that accumulates:
- All queries tried (to avoid rephrasing the same one)
- All URLs fetched (to skip duplicates)
- A running summary of what is known vs. what is still unknown
- A confidence score that drives the stop condition
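That accumulator is naturally a small dataclass threaded through the loop. A sketch of one possible shape (field names are my own, not from any library):

```python
from dataclasses import dataclass, field


@dataclass
class ResearchContext:
    goal: str
    queries_tried: list[str] = field(default_factory=list)
    urls_fetched: set[str] = field(default_factory=set)
    known: str = ""       # running summary of established findings
    unknown: str = ""     # open questions driving the next round of queries
    confidence: float = 0.0

    def should_stop(self, threshold: float = 0.8) -> bool:
        return self.confidence >= threshold
```

Each iteration reads `unknown` to generate queries, updates `known` and `confidence` after scoring, and the loop exits when `should_stop()` fires.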
The stop condition is the hardest part to get right. Too aggressive and the agent stops after one good-looking result. Too lenient and it loops until it hits the token or cost limit. In practice, a small LLM call that scores coverage against a checklist derived from the original goal works better than a word-count heuristic.
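The checklist-based stop condition can be sketched as one small LLM call that returns per-item verdicts. Again `call_llm` is a placeholder for your own client, and the YES/NO-per-line protocol is an assumption, not a standard:

```python
def coverage_score(checklist: list[str], answer: str, call_llm) -> float:
    """Ask the LLM which checklist items the answer covers;
    return the covered fraction to drive the stop condition."""
    prompt = (
        "Answer so far:\n" + answer + "\n\n"
        "For each item below, reply YES or NO on its own line, "
        "depending on whether the answer covers it:\n"
        + "\n".join(f"- {item}" for item in checklist)
    )
    verdicts = [v.strip().upper() for v in call_llm(prompt).splitlines() if v.strip()]
    hits = sum(1 for v in verdicts[: len(checklist)] if v.startswith("YES"))
    return hits / max(len(checklist), 1)
```

Stopping when the score crosses a threshold (say 0.8) ties termination to the goal itself rather than to how long the answer happens to be.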
The agents that produce genuinely useful research outputs are not the ones with the best base model. They are the ones with the tightest loop: query, evaluate, follow up, stop when done.