How Agentic Search Actually Works: The Research Loop Link-Fetching Agents Miss
Most agent tutorials show you the same pattern: take a user query, call a search API, grab the top result, stuff the text into your prompt. Done. Ship it.
That works fine for trivia. It falls apart when the question requires synthesis across multiple sources, when the first result is a listicle with no substance, or when the answer depends on information that only shows up three clicks deep into a documentation site.
The difference between a link-fetching agent and a genuinely useful research agent is the loop. Let me show you what that loop actually looks like.
What a Naive Search Agent Does
A basic agent that "uses web search" usually does something like this:
- Receive question
- Run one search query
- Take the first URL from results
- Fetch that URL
- Return whatever text comes back
The problem is that web search results are ranked for clicks, not for answer quality. The top result might be a vendor comparison page with affiliate links, or a forum thread where nobody answered the question, or a press release from three years ago.
Even if you grab five results instead of one, you're still making a single pass. You're not evaluating whether what you got actually answers the question.
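To make the contrast concrete, here's the naive pattern as a sketch. The search and fetch_text callables are stand-ins for whatever search API and scraper you already use:

from typing import Callable

def naive_answer(
    question: str,
    search: Callable[[str], list[dict]],  # your search client: query -> results
    fetch_text: Callable[[str], str],     # your scraper: url -> page text
) -> str:
    # One query, verbatim user input, no reformulation.
    results = search(question)
    # Trust the ranking blindly: take the first URL.
    top_url = results[0]["url"]
    # Return whatever text comes back; nothing checks whether it answers the question.
    return fetch_text(top_url)[:2000]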
The Research Loop
Agentic search works differently. The core idea is that the LLM drives the process iteratively, deciding at each step whether it has enough information or needs to dig further. The loop looks more like this:
1. Receive question
2. Generate a targeted search query (the LLM should write this, not just pass the user input verbatim)
3. Get search results as structured data (titles, URLs, snippets)
4. Decide which results are worth fetching based on snippets
5. Fetch selected pages, get clean text
6. Evaluate: does this actually answer the question? What's still missing?
7. If incomplete, generate a follow-up query targeting the gap
8. Repeat until confident or until a step budget is exhausted
9. Synthesize a final answer with sources
Steps 6 and 7 are where most agent implementations stop short. Without them, you have a retrieval tool, not a research agent.
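If you drive the loop client-side, the skeleton looks roughly like the sketch below. The LLM calls (generate_query, evaluate) and the search/fetch clients are passed in as placeholders, since the point is the control flow, not any particular provider. In practice a hosted endpoint can run most of this for you, which is what the next example uses.

from typing import Callable

def research_loop(
    question: str,
    generate_query: Callable[[str, str], str],  # LLM call: (question, notes) -> next query
    search: Callable[[str], list[dict]],        # search client: query -> results with url/title/snippet
    fetch: Callable[[str], str],                # scraper: url -> clean page text
    evaluate: Callable[[str, str], dict],       # LLM call: (question, notes) -> {"complete": bool, "gap": str}
    max_steps: int = 4,
) -> str:
    notes = ""
    for _ in range(max_steps):  # step 8: repeat within a bounded budget
        # Steps 2-3: the LLM writes a targeted query; search returns structured results.
        results = search(generate_query(question, notes))
        # Step 4: a real agent would let the LLM pick based on snippets;
        # taking the top few is a simplification.
        for r in results[:3]:
            # Step 5: accumulate clean page text as research notes.
            notes += f"\n\n[{r['url']}]\n{fetch(r['url'])[:3000]}"
        # Step 6: does what we have actually answer the question?
        verdict = evaluate(question, notes)
        if verdict.get("complete"):
            break
        # Step 7: record the gap so the next query targets it.
        notes += f"\n\nSTILL MISSING: {verdict.get('gap', '')}"
    # Step 9: a final LLM call would synthesize an answer from these notes.
    return notes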
Here's a minimal Python implementation of this loop using Anakin's Agentic Search endpoint, which handles the iterative querying and source-grounded answering in one call, and their Scrape API for when you need to go deeper on a specific page:
import httpx

ANAKIN_API_KEY = "your-api-key"

def agentic_search(question: str) -> dict:
    """
    Call Anakin's Agentic Search API. Returns an answer grounded in sources
    with citations, running the research loop server-side.
    """
    response = httpx.post(
        "https://api.anakin.ai/v1/agentic-search",
        headers={
            "Authorization": f"Bearer {ANAKIN_API_KEY}",
            "Content-Type": "application/json",
        },
        json={"query": question},
        timeout=60.0,
    )
    response.raise_for_status()
    return response.json()

def scrape_page(url: str) -> str:
    """
    Fetch clean text from a specific URL when the agent needs to go deeper.
    """
    response = httpx.post(
        "https://api.anakin.ai/v1/scrape",
        headers={
            "Authorization": f"Bearer {ANAKIN_API_KEY}",
            "Content-Type": "application/json",
        },
        json={"url": url, "format": "markdown"},
        timeout=30.0,
    )
    response.raise_for_status()
    return response.json().get("content", "")

def research_with_followup(question: str, context: str = "") -> str:
    """
    Run agentic search, then optionally scrape a specific source
    if the answer references something worth digging into further.
    """
    full_query = f"{question}\n\nContext: {context}" if context else question
    result = agentic_search(full_query)
    answer = result.get("answer", "")
    sources = result.get("sources", [])

    print(f"Initial answer:\n{answer}\n")
    print(f"Sources used: {len(sources)}")
    for s in sources:
        print(f"  - {s.get('title')} ({s.get('url')})")

    # If the answer mentions a specific doc or page worth reading fully,
    # you can scrape it and pass that content back for a deeper pass.
    if sources:
        top_source_url = sources[0].get("url")
        print(f"\nScraping top source for deeper context: {top_source_url}")
        full_text = scrape_page(top_source_url)
        # Now run a follow-up with the full page content as grounding.
        followup_result = agentic_search(
            f"{question}\n\nFull text of primary source:\n{full_text[:4000]}"
        )
        return followup_result.get("answer", answer)

    return answer

if __name__ == "__main__":
    question = "What are the current rate limits for the OpenAI Assistants API and how do they differ by tier?"
    final_answer = research_with_followup(question)
    print(f"\nFinal answer:\n{final_answer}")
The key thing here is that the agentic search call is not just "search and return links." It's running its own internal loop: reformulating queries, following promising threads, discarding junk sources, and producing an answer with citations attached. Then the outer code can decide whether to go one level deeper on any cited source.
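One caveat: the response fields used above ("answer", plus "sources" entries with "title" and "url") are the shape this example assumes, not documented guarantees. A small defensive parser makes the assumption explicit and keeps the rest of the code from crashing on a missing field:

def parse_search_response(payload: dict) -> tuple[str, list[dict]]:
    """
    Pull out the fields the examples above rely on. This shape is an
    assumption inferred from the example code, not official API docs.
    """
    answer = payload.get("answer", "")
    sources = [
        {"title": s.get("title", "untitled"), "url": s["url"]}
        for s in payload.get("sources", [])
        if s.get("url")  # drop sources without a usable URL
    ]
    return answer, sources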
Why Snippets Are Not Enough for RAG
When you're building a RAG pipeline and you index search result snippets, you're indexing 150-character summaries written by search engines to generate clicks. Those snippets frequently omit the actual technical details: the exact configuration parameter, the version constraint, the exception to the rule that matters for your use case.
For simple factual lookups, snippets are often enough. For technical research, they're a consistent source of hallucination: the LLM fills in the missing detail from training data, confidently, and gets it wrong.
The fix is to fetch full page content for the sources that actually matter, get clean structured text (not raw HTML), and index that. When you integrate agentic search into a RAG pipeline, the output you want to embed is not the answer text. It's the source content the answer was grounded in, tagged with the query that surfaced it.
def build_rag_chunks(question: str) -> list[dict]:
    """
    Embed the grounded source content, tagged with the query that surfaced it,
    rather than the answer text itself.
    """
    result = agentic_search(question)
    chunks = []
    for source in result.get("sources", []):
        full_text = scrape_page(source["url"])
        chunks.append({
            "url": source["url"],
            "title": source["title"],
            "content": full_text,
            "query": question,
        })
    return chunks
Now you have grounded, full-text chunks you can actually embed and retrieve later, not snippets.
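One caveat before embedding: full pages routinely blow past embedding-model context limits, so you'd split each chunk's content first. Here's a minimal character-window splitter as a sketch; production pipelines usually split on tokens or headings instead:

def split_for_embedding(chunk: dict, size: int = 2000, overlap: int = 200) -> list[dict]:
    """
    Split one full-text chunk into overlapping windows small enough to embed.
    Character-based for simplicity; token- or heading-aware splitting is
    usually better in production.
    """
    text = chunk["content"]
    pieces = []
    start = 0
    while start < len(text):
        piece = text[start : start + size]
        pieces.append({**chunk, "content": piece})  # keep url/title/query metadata
        start += size - overlap
    return pieces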
Where to Go From Here
The research loop is not complicated conceptually, but it has real operational costs: latency goes up, token usage goes up, and you need a budget strategy so agents don't spin forever on hard questions. A step limit of 3 to 5 iterations covers most real-world queries without runaway costs.
If I were building this for production, I'd add a confidence score threshold to the evaluation step so the loop exits early when the answer quality is already high, rather than always burning the full budget. I'd also log every query and source fetched so I can audit where the agent went wrong when users report bad answers, because they will.
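The logging half is cheap to add. Here's a sketch that wraps the agentic_search function from earlier and emits one structured line per call:

import json
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("research-agent")

def logged_search(question: str) -> dict:
    """Wrap agentic_search so every query and cited source is auditable later."""
    started = time.monotonic()
    result = agentic_search(question)
    log.info(json.dumps({
        "query": question,
        "elapsed_s": round(time.monotonic() - started, 2),
        "sources": [s.get("url") for s in result.get("sources", [])],
    }))
    return result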
The link-fetching approach feels like research. The loop actually does it.