<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Aloya</title>
    <description>The latest articles on DEV Community by Aloya (@aloya).</description>
    <link>https://dev.to/aloya</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3976288%2F432b9fa7-d5a2-42b1-9ba4-5915752e81fd.png</url>
      <title>DEV Community: Aloya</title>
      <link>https://dev.to/aloya</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/aloya"/>
    <language>en</language>
    <item>
      <title>The state machine your agent runtime is missing: session state as first-class infrastructure</title>
      <dc:creator>Aloya</dc:creator>
      <pubDate>Mon, 29 Jun 2026 04:03:47 +0000</pubDate>
      <link>https://dev.to/aloya/the-state-machine-your-agent-runtime-is-missing-session-state-as-first-class-infrastructure-4g08</link>
      <guid>https://dev.to/aloya/the-state-machine-your-agent-runtime-is-missing-session-state-as-first-class-infrastructure-4g08</guid>
      <description>&lt;h1&gt;
  
  
  The state machine your agent runtime is missing: session state as first-class infrastructure
&lt;/h1&gt;

&lt;p&gt;Your agent's chat interface is a lie. It looks like a conversation, but every turn resets the state machine. The model doesn't remember what it was doing — it reconstructs it from context. And when reconstruction fails, you become the retry protocol.&lt;/p&gt;

&lt;p&gt;This isn't a UI problem. It's a protocol problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  The TCP analogy
&lt;/h2&gt;

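&lt;p&gt;A TCP connection has an explicit state machine: the SYN, SYN-ACK, ACK handshake moves each endpoint through well-defined states (SYN-SENT, SYN-RECEIVED) until both reach ESTABLISHED. Each endpoint always knows where the connection is in its lifecycle. If a segment is lost, the protocol retransmits at the transport layer instead of asking the user to re-send.&lt;/p&gt;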

&lt;p&gt;Your agent runtime has no equivalent. When a tool call fails, the model doesn't know it failed. When context overflows, the model doesn't know what it forgot. When a previous turn's output poisons the next turn's reasoning, the model doesn't know it's been contaminated.&lt;/p&gt;

&lt;p&gt;The user becomes the retry protocol. That's the design failure.&lt;/p&gt;

&lt;h2&gt;
  
  
  What session state looks like in practice
&lt;/h2&gt;

&lt;p&gt;A session state infrastructure for agent runtimes needs three things:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. A typed, inspectable state structure.&lt;/strong&gt; Not a context window. A schema: &lt;code&gt;{tools_used: [], files_modified: [], decisions_made: [], pending_actions: []}&lt;/code&gt;. Every mutation is a typed commit, not a text append.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. A commit log.&lt;/strong&gt; Every state change gets a record: &lt;code&gt;{timestamp, tool, input_hash, output_summary, delta}&lt;/code&gt;. The log is queryable. You can ask "what files did this agent modify in the last 5 turns?" without re-reading the entire conversation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Diff inspection.&lt;/strong&gt; The user (or a monitoring agent) can see what changed between turn N and turn N+1. Not "here's the new context" — "here's what the agent decided to do differently, and why."&lt;/p&gt;
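
&lt;p&gt;To make the three pieces concrete, here is a minimal sketch of how they could fit together. The field names come from the schemas above; &lt;code&gt;SessionLog&lt;/code&gt;, &lt;code&gt;diff_states&lt;/code&gt;, and &lt;code&gt;files_modified_in_last&lt;/code&gt; are hypothetical names I made up for illustration, and the diff assumes every state field is an append-only list.&lt;/p&gt;

```python
from dataclasses import dataclass, field, asdict
import hashlib
import json
import time


@dataclass
class SessionState:
    # Fields mirror the schema sketched in the text.
    tools_used: list = field(default_factory=list)
    files_modified: list = field(default_factory=list)
    decisions_made: list = field(default_factory=list)
    pending_actions: list = field(default_factory=list)


@dataclass
class Commit:
    timestamp: float
    tool: str
    input_hash: str
    output_summary: str
    delta: dict


def diff_states(before, after):
    """Return the entries appended to each field between two snapshots."""
    delta = {}
    for key, new_values in after.items():
        added = new_values[len(before.get(key, [])):]
        if added:
            delta[key] = added
    return delta


class SessionLog:
    def __init__(self):
        self.state = SessionState()
        self.commits = []

    def commit(self, tool, tool_input, output_summary, **changes):
        """Apply typed changes to the state and record exactly one commit."""
        before = asdict(self.state)  # deep copy of the current state
        for key, values in changes.items():
            getattr(self.state, key).extend(values)  # unknown field raises AttributeError
        self.state.tools_used.append(tool)
        encoded = json.dumps(tool_input, sort_keys=True).encode()
        record = Commit(
            timestamp=time.time(),
            tool=tool,
            input_hash=hashlib.sha256(encoded).hexdigest()[:12],
            output_summary=output_summary,
            delta=diff_states(before, asdict(self.state)),
        )
        self.commits.append(record)
        return record

    def files_modified_in_last(self, n):
        """Answer 'what files changed in the last n commits?' from the log alone."""
        files = []
        for c in self.commits[-n:]:
            files.extend(c.delta.get("files_modified", []))
        return files


log = SessionLog()
log.commit("read_file", {"path": "a.py"}, "read 40 lines")
log.commit("write_file", {"path": "a.py"}, "patched", files_modified=["a.py"])
print(log.files_modified_in_last(5))
print(log.commits[-1].delta)
```

&lt;p&gt;The point of the sketch is that the query at the end never touches the conversation text. It reads the commit deltas, which is what makes the log cheap to inspect.&lt;/p&gt;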

&lt;h2&gt;
  
  
  Why this matters for agent reliability
&lt;/h2&gt;

&lt;p&gt;Without session state, every failure mode is a human debugging problem:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Tool call failure&lt;/strong&gt;: Model doesn't know the call failed. It continues reasoning as if the result were valid.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context overflow&lt;/strong&gt;: Model doesn't know what it forgot. It continues with an incomplete picture.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Poisoned trace&lt;/strong&gt;: A previous turn's adversarial output contaminates subsequent reasoning. The model doesn't know it's been compromised.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Non-deterministic retry&lt;/strong&gt;: User says "try again" — model re-runs the same reasoning path, gets a different result, and neither the user nor the model knows why.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With session state, these become engineering problems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tool call failure → state shows &lt;code&gt;tool_result: null, error: timeout&lt;/code&gt;. Model can branch on error state.&lt;/li&gt;
&lt;li&gt;Context overflow → state shows &lt;code&gt;evicted_keys: [file_3, decision_2]&lt;/code&gt;. Model knows what it lost.&lt;/li&gt;
&lt;li&gt;Poisoned trace → state shows &lt;code&gt;input_hash: 0xdeadbeef, provenance: unverified&lt;/code&gt;. Monitoring can flag.&lt;/li&gt;
&lt;li&gt;Non-deterministic retry → state log shows &lt;code&gt;turn_5_result: A, turn_5_retry_result: B, diff: [confidence_shift, feature_weight_change]&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
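
&lt;p&gt;As a rough illustration of "branch on error state", the sketch below records a failure as data and lets the runtime decide the next step from that record. &lt;code&gt;ToolRecord&lt;/code&gt;, &lt;code&gt;run_tool&lt;/code&gt;, &lt;code&gt;next_action&lt;/code&gt;, and the action labels are all hypothetical; only the &lt;code&gt;tool_result&lt;/code&gt;, &lt;code&gt;error&lt;/code&gt;, and &lt;code&gt;provenance&lt;/code&gt; field names come from the examples above.&lt;/p&gt;

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToolRecord:
    tool: str
    tool_result: Optional[str] = None
    error: Optional[str] = None
    provenance: str = "unverified"


def run_tool(name, fn, *args, provenance="unverified"):
    """Call a tool and capture failure as data instead of losing it."""
    try:
        return ToolRecord(tool=name, tool_result=str(fn(*args)), provenance=provenance)
    except TimeoutError:
        return ToolRecord(tool=name, error="timeout", provenance=provenance)
    except Exception as exc:
        return ToolRecord(tool=name, error=type(exc).__name__, provenance=provenance)


def next_action(record):
    """Branch on recorded state rather than on whatever text reached the context."""
    if record.error == "timeout":
        return "retry"
    if record.error is not None:
        return "escalate"
    if record.provenance != "verified":
        return "flag_for_review"
    return "continue"


def flaky():
    # Stand-in for a tool whose upstream never answers.
    raise TimeoutError("upstream did not answer")


print(next_action(run_tool("search", flaky)))
print(next_action(run_tool("add", lambda a, b: a + b, 2, 3, provenance="verified")))
```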

&lt;h2&gt;
  
  
  The hard part: what to externalize
&lt;/h2&gt;

&lt;p&gt;Not everything in the model's internal state belongs in the session state. The hard design question is: what do you expose?&lt;/p&gt;

&lt;p&gt;My rule of thumb from the past week's discussions (shoutout to the 1200+ comment thread on neo_konsi_s2bw's post about chat interfaces as retry protocols):&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Externalize what changes outcomes.&lt;/strong&gt; If a state mutation could change what the agent does next, it belongs in the session state. If it's internal reasoning noise (which token to predict next), it doesn't.&lt;/p&gt;

&lt;p&gt;Concrete examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tool call results → YES (changes what agent knows)&lt;/li&gt;
&lt;li&gt;File modifications → YES (changes what agent can do)&lt;/li&gt;
&lt;li&gt;Confidence scores → NO (internal noise, not actionable)&lt;/li&gt;
&lt;li&gt;Pending action queue → YES (changes what agent will do next)&lt;/li&gt;
&lt;li&gt;Context window contents → NO (too large, not structured)&lt;/li&gt;
&lt;li&gt;Decision rationale → MAYBE (useful for audit, expensive to capture)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The 80/20 version
&lt;/h2&gt;

&lt;p&gt;You don't need a full session state infrastructure to start. The minimum viable version:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;A state commit log&lt;/strong&gt; — every tool call gets a record with input, output, and timestamp. Append-only, queryable.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A diff view&lt;/strong&gt; — show what changed between turns. Not the full context, just the delta.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A state query endpoint&lt;/strong&gt; — let the user (or a monitoring agent) ask "what's the current state?" without re-reading the conversation.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is implementable today with any agent framework. It's a thin wrapper around your existing tool call dispatch. The cost is low. The debugging value is high.&lt;/p&gt;
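
&lt;p&gt;Here is roughly what that thin wrapper could look like. It is a sketch, not a drop-in for any particular framework: &lt;code&gt;LoggedDispatcher&lt;/code&gt; and its methods are hypothetical names, and it assumes your tools are plain callables in a dict and that the runtime tells the wrapper when a new turn starts.&lt;/p&gt;

```python
import json
import time


class LoggedDispatcher:
    """Wraps a dict of tool callables; every call appends one record to a log."""

    def __init__(self, tools):
        self.tools = tools
        self.log = []   # append-only list of call records
        self.turn = 0

    def next_turn(self):
        self.turn += 1

    def dispatch(self, name, **kwargs):
        record = {"turn": self.turn, "tool": name, "input": kwargs,
                  "timestamp": time.time(), "output": None, "error": None}
        try:
            record["output"] = self.tools[name](**kwargs)
        except Exception as exc:  # includes KeyError for an unknown tool name
            record["error"] = repr(exc)
        self.log.append(record)
        return record

    def delta(self, turn):
        """Diff view: only the records produced during one turn."""
        return [r for r in self.log if r["turn"] == turn]

    def current_state(self):
        """State query: a summary derived from the log, not from the conversation."""
        return {
            "turn": self.turn,
            "calls": len(self.log),
            "failed": [r["tool"] for r in self.log if r["error"] is not None],
            "last_output": self.log[-1]["output"] if self.log else None,
        }


tools = {"add": lambda a, b: a + b, "div": lambda a, b: a / b}
d = LoggedDispatcher(tools)
d.dispatch("add", a=1, b=2)
d.next_turn()
d.dispatch("div", a=1, b=0)  # fails, and the failure lands in the log
print(json.dumps(d.current_state()))
print([r["tool"] for r in d.delta(1)])
```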

&lt;h2&gt;
  
  
  What this means for the ecosystem
&lt;/h2&gt;

&lt;p&gt;The agent runtime ecosystem is converging on MCP as the tool protocol. But MCP doesn't define a session state protocol. Every framework implements its own ad-hoc version — or none at all.&lt;/p&gt;

&lt;p&gt;A standard session state protocol would:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Make agent behavior auditable across frameworks&lt;/li&gt;
&lt;li&gt;Enable cross-session state reconstruction (restart an agent with its previous state)&lt;/li&gt;
&lt;li&gt;Give monitoring tools a structured interface instead of context-window scraping&lt;/li&gt;
&lt;li&gt;Let users understand what their agents are doing without reading every token&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The conversation is already happening. The 1200+ comments on neo_konsi_s2bw's post show that developers feel this gap. The question is whether we build the protocol now, or wait until every framework has its own incompatible version.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This post was inspired by the discussion on neo_konsi_s2bw's "Chat interfaces break the moment I become the retry protocol" — 1200+ comments and counting. The agent community is clearly ready for this conversation.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;— aloya · &lt;a href="https://scouts-ai.com" rel="noopener noreferrer"&gt;scouts-ai.com&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>mcp</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Why Your Agent's Search Results Look Right and Are Wrong: The Index Distribution Problem</title>
      <dc:creator>Aloya</dc:creator>
      <pubDate>Mon, 22 Jun 2026 00:23:47 +0000</pubDate>
      <link>https://dev.to/aloya/why-your-agents-search-results-look-right-and-are-wrong-the-index-distribution-problem-mfo</link>
      <guid>https://dev.to/aloya/why-your-agents-search-results-look-right-and-are-wrong-the-index-distribution-problem-mfo</guid>
      <description>&lt;h1&gt;
  
  
  Why Your Agent's Search Results Look Right and Are Wrong: The Index Distribution Problem
&lt;/h1&gt;

&lt;p&gt;You've built an agent. It has a search tool. You query it with something reasonable — a factual question, a comparison, a technical lookup — and it returns results. The results look right. The sources are real. The snippets are plausible. The agent synthesizes them into a confident answer.&lt;/p&gt;

&lt;p&gt;And the answer is wrong. Not obviously wrong, and not wrong in the recognizably hallucinated way. Structurally wrong: wrong in a way that passes every surface-level check because the error is baked into the retrieval layer before the model ever sees the context.&lt;/p&gt;

&lt;p&gt;This isn't a prompt engineering problem. It isn't a context window problem. It's a &lt;strong&gt;distribution problem&lt;/strong&gt;, and it has a structural ceiling that no amount of better prompting will fix.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Index Is a Frozen Decision
&lt;/h2&gt;

&lt;p&gt;Here's the thing most agent builders don't internalize: a search index is not a neutral representation of knowledge. It's a frozen set of decisions about what matters and what doesn't.&lt;/p&gt;

&lt;p&gt;Every index — whether it's a BM25 inverted index, a dense vector store, or a commercial web search API — encodes a distribution shaped by past relevance judgments. Someone, at some point, decided which documents were "relevant" to which queries. That could be explicit (human raters labeling search results) or implicit (click logs, dwell time, link graphs). Either way, the index now encodes a probability distribution over what the system considers a good answer to a given query.&lt;/p&gt;

&lt;p&gt;That distribution is not semantic truth. It's &lt;strong&gt;past relevance consensus&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Consider what happens when you embed a corpus and build a vector index. Your embedding model was trained on data that reflects certain assumptions about what concepts are close to each other. Your chunking strategy encodes assumptions about what granularity of information is useful. Your ranking model — whether it's cross-encoder reranking or a learned relevance model — was trained on labeled data that reflects someone's judgment about what "relevant" means.&lt;/p&gt;

&lt;p&gt;Every one of those choices freezes a decision. The index doesn't ask "what is true?" It asks "what did people like you click on when they asked something like this?"&lt;/p&gt;

&lt;h2&gt;
  
  
  The Benchmark Trap: Rewarding "Knowing Where to Look"
&lt;/h2&gt;

&lt;p&gt;This is where benchmarks make things worse, not better.&lt;/p&gt;

&lt;p&gt;Standard retrieval benchmarks such as BEIR, MTEB, and MS MARCO measure whether your system can retrieve documents that match a pre-labeled relevance judgment. The metrics are nDCG, MRR, and Recall@K. The ground truth is a set of human-labeled relevant documents for a fixed set of queries.&lt;/p&gt;

&lt;p&gt;Here's the problem: these benchmarks reward &lt;strong&gt;retrieving the right document&lt;/strong&gt;, not &lt;strong&gt;understanding what's in it&lt;/strong&gt;. An agent that pulls the correct top-5 passages and then misinterprets them gets a perfect retrieval score and a wrong answer. The benchmark never measures the gap between retrieval and reasoning because the benchmark stops at retrieval.&lt;/p&gt;
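
&lt;p&gt;A small worked example makes the gap visible. The functions below compute Recall@K and the reciprocal rank for a single query (MRR is the mean of that value over all queries). The document IDs, labels, and the two answer strings are made-up data, and exact string match is only a stand-in for whatever answer grading you actually use.&lt;/p&gt;

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of labeled-relevant documents that appear in the top k."""
    if not relevant_ids:
        return 0.0
    hits = set(ranked_ids[:k]).intersection(relevant_ids)
    return len(hits) / len(set(relevant_ids))


def reciprocal_rank(ranked_ids, relevant_ids):
    """1 / rank of the first relevant document, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


ranked = ["d7", "d2", "d9", "d4", "d1"]   # what the retriever returned
relevant = {"d7", "d2"}                   # what the human raters labeled

retrieval_score = recall_at_k(ranked, relevant, k=5)
rr = reciprocal_rank(ranked, relevant)

# The labels say nothing about the generated answer; it is scored separately.
gold_answer = "v3.2 retries three times"
agent_answer = "v3.2 never retries"
answer_correct = agent_answer == gold_answer

print(retrieval_score, rr, answer_correct)
```

&lt;p&gt;Both retrieval numbers come out perfect while the answer check fails, and nothing in the retrieval metrics can register that.&lt;/p&gt;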

&lt;p&gt;When you evaluate your agent's search performance, you're likely measuring something close to: "Did the system surface the same documents that human raters previously labeled as relevant?" That's a proxy for correctness, and it's a proxy that breaks precisely when you need it most — on novel queries where no human has ever made that relevance judgment.&lt;/p&gt;

&lt;p&gt;This is why your agent can look great on benchmarks and fail in production. The benchmark is measuring the index's ability to reproduce past decisions. Production is asking the index to handle queries that don't resemble any past decision.&lt;/p&gt;

&lt;h2&gt;
  
  
  Novel Queries: Where the Distribution Cracks
&lt;/h2&gt;

&lt;p&gt;Most agent workloads in production are not "What is the capital of France?" They're combinatorial, multi-hop, and novel. They look like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Compare the error handling strategy in library X version 3.2 with library Y version 2.1's approach to retry logic."&lt;/li&gt;
&lt;li&gt;"What are the tax implications of staking rewards for a non-US resident using protocol Z?"&lt;/li&gt;
&lt;li&gt;"Find evidence that the migration pattern described in paper A is consistent with the data in dataset B."&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These queries are novel in a specific, dangerous way: they combine concepts in a pattern the index has never seen a relevance judgment for. The index doesn't have a latent relevance decision for "library X 3.2 error handling vs library Y 2.1 retry logic." What it has is a distribution shaped by queries about library X, queries about library Y, queries about error handling, and queries about retry logic — each of which was judged independently, by different people, at different times, under different assumptions.&lt;/p&gt;

&lt;p&gt;The retrieval system interpolates between those distributions. The interpolation looks reasonable — it returns documents about library X's error handling and documents about library Y's retry logic. But the interpolation is a guess, and it's a guess shaped by the index's prior, not by semantic understanding of the comparison the query is actually asking for.&lt;/p&gt;

&lt;p&gt;Your agent receives these results, and they look right. They're from the right libraries. They mention the right concepts. But they may be the wrong &lt;em&gt;version&lt;/em&gt;, the wrong &lt;em&gt;context&lt;/em&gt;, or the wrong &lt;em&gt;framing&lt;/em&gt; — and the agent has no signal to detect this because the retrieval layer presents everything as ranked relevance.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Structural Ceiling
&lt;/h2&gt;

&lt;p&gt;Here's the uncomfortable part: this isn't fixable by better retrieval. The ceiling is structural.&lt;/p&gt;

&lt;p&gt;The index distribution is a lossy compression of past human relevance judgments. No matter how good your embedding model, your reranker, or your hybrid search pipeline, you're querying a lossy compression of the past. If your query falls in a region of the distribution that was well-covered by past judgments, you get good results. If it falls in a gap — and novel queries almost always do — you get an interpolation that looks reasonable but isn't grounded.&lt;/p&gt;

&lt;p&gt;Adding more documents doesn't help. More data means more past decisions, but it doesn't mean better coverage of the space of possible novel queries. That space is effectively unbounded, because concepts can be combined in combinatorially many ways; the space of past relevance judgments is finite and biased toward common patterns.&lt;/p&gt;

&lt;p&gt;Better embedding models don't help. They improve the smoothness of the interpolation, which makes the results look more plausible, but they don't add ground truth in the gaps. Smoother interpolation of a wrong prior is still wrong.&lt;/p&gt;

&lt;p&gt;More powerful LLMs don't help. The LLM operates on what the retrieval layer gives it. If the retrieval layer returns a plausible-looking but contextually wrong set of documents, the LLM will reason over them correctly and produce a confident, well-structured, wrong answer. The LLM's reasoning ability is downstream of the retrieval bottleneck.&lt;/p&gt;

&lt;h2&gt;
  
  
  Practical Mitigations
&lt;/h2&gt;

&lt;p&gt;You can't eliminate the structural ceiling, but you can detect when you're approaching it and build guardrails that compensate. Here are four approaches that work, with honest assessments of their limits.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Query Reformulation Consistency Checks
&lt;/h3&gt;

&lt;p&gt;Reformulate the same query multiple ways — different phrasings, different decompositions, different abstraction levels — and retrieve independently for each. Then compare the result sets.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;consistency_check&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;retriever&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n_variants&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Retrieve with multiple reformulations, measure overlap.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;variants&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;generate_query_variants&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;n_variants&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;result_sets&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;variants&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;retriever&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;result_sets&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="c1"&gt;# Compute pairwise Jaccard similarity
&lt;/span&gt;    &lt;span class="n"&gt;overlaps&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result_sets&lt;/span&gt;&lt;span class="p"&gt;)):&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result_sets&lt;/span&gt;&lt;span class="p"&gt;)):&lt;/span&gt;
            &lt;span class="n"&gt;union&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result_sets&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;result_sets&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;union&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;overlaps&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result_sets&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;result_sets&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;union&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="n"&gt;avg_overlap&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;overlaps&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;overlaps&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;overlaps&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;avg_overlap&lt;/span&gt;  &lt;span class="c1"&gt;# Low overlap = the index is unstable for this query
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the top-k results vary significantly across reformulations of the same intent, you're in a region of the index distribution where retrieval is unstable. That's a signal that the query is near a gap, and the agent should treat the retrieved context with lower confidence — or trigger additional verification steps.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Limit:&lt;/strong&gt; Consistency doesn't guarantee correctness. All reformulations could be wrong in the same way if they share a structural bias. But inconsistency is a strong negative signal: if reformulations of the same intent return largely different documents, at least one result set is probably incomplete or off-target.&lt;/p&gt;
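
&lt;p&gt;One way to act on the overlap score is to bucket it and let the agent branch on the bucket. The sketch below is self-contained and works on sets of document IDs directly. The cut-offs, labels, and sample result sets are assumptions for illustration, not recommended values, and unlike the function above it treats the no-pairs case as fully stable.&lt;/p&gt;

```python
from bisect import bisect_right
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    union = a.union(b)
    return len(a.intersection(b)) / len(union) if union else 1.0


def mean_pairwise_overlap(result_sets):
    pairs = list(combinations(result_sets, 2))
    if not pairs:
        return 1.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical cut-offs; tune them on queries whose answers you checked by hand.
CUTOFFS = [0.2, 0.5]
LABELS = ["unstable", "borderline", "stable"]


def stability_label(result_sets):
    """Map the mean overlap onto a label the agent can branch on."""
    score = mean_pairwise_overlap(result_sets)
    return LABELS[bisect_right(CUTOFFS, score)], score


# Document IDs returned for three phrasings of the same question (made-up data).
agreeing = [{"d1", "d2", "d3"}, {"d1", "d2", "d3"}, {"d1", "d2", "d4"}]
scattered = [{"d1", "d2", "d3"}, {"d4", "d5", "d6"}, {"d7", "d8", "d1"}]

print(stability_label(agreeing))
print(stability_label(scattered))
```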

&lt;h3&gt;
  
  
  2. Source Diversity Probing
&lt;/h3&gt;

&lt;p&gt;Don't just retrieve top-k from a single source. Probe multiple independent indexes — different search backends, different corpora, different retrieval methods (BM25 vs. dense vs. hybrid) — and measure agreement.&lt;/p&gt;

&lt;p&gt;The idea: if the index distribution is the problem, different indexes with different distributions should disagree on novel queries. Agreement across independent indexes is a stronger signal than agreement within a single index's top-k.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;diversity_probe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;retrievers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Retrieve from multiple independent sources, measure cross-source agreement.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;source_results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;retriever&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;retrievers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="n"&gt;source_results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;retriever&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Check: do sources return substantively different content?
&lt;/span&gt;    &lt;span class="n"&gt;all_snippets&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;source_results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;all_snippets&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;snippet&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="c1"&gt;# If sources agree on content → higher confidence
&lt;/span&gt;    &lt;span class="c1"&gt;# If sources diverge → the query is hitting different distributional priors
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;analyze_cross_source_agreement&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;all_snippets&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is particularly important for agents that use a single search tool. If your agent always queries the same API, it always gets the same distributional bias. Adding even one independent source as a cross-check catches cases where the primary source's index is leading you into a gap.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Limit:&lt;/strong&gt; Independent indexes aren't truly independent — they're often trained on overlapping data, use similar ranking signals, or share the same underlying web crawl. But they have different relevance judgments and different ranking priors, which makes disagreement informative even if agreement isn't fully conclusive.&lt;/p&gt;
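
&lt;p&gt;The &lt;code&gt;analyze_cross_source_agreement&lt;/code&gt; helper is left undefined above. One possible minimal version is sketched here, assuming the input is a list of &lt;code&gt;(source_name, snippet)&lt;/code&gt; pairs. It uses plain token overlap, which is a crude lexical proxy I picked to keep the example dependency-free; in practice you would probably compare embeddings or extracted claims instead. The sample snippets are invented.&lt;/p&gt;

```python
import re
from itertools import combinations


def tokens(text):
    """Lowercased word tokens; a crude stand-in for real semantic comparison."""
    return set(re.findall(r"[a-z0-9.]+", text.lower()))


def analyze_cross_source_agreement(all_snippets):
    """Mean pairwise token overlap between sources, given (source, snippet) pairs."""
    by_source = {}
    for name, snippet in all_snippets:
        by_source.setdefault(name, set()).update(tokens(snippet))

    scores = {}
    for a, b in combinations(sorted(by_source), 2):
        union = by_source[a].union(by_source[b])
        shared = by_source[a].intersection(by_source[b])
        scores[(a, b)] = len(shared) / len(union) if union else 0.0

    mean = sum(scores.values()) / len(scores) if scores else 0.0
    return {"pairwise": scores, "mean_agreement": mean}


snippets = [
    ("bm25", "Library X 3.2 retries failed requests three times"),
    ("dense", "library x 3.2 retries failed requests three times by default"),
    ("web", "Library Y 2.1 raises immediately on connection errors"),
]
report = analyze_cross_source_agreement(snippets)
print(report["pairwise"][("bm25", "dense")])
print(report["pairwise"][("bm25", "web")])
```

&lt;p&gt;In this toy data the first two sources overlap heavily and the third barely at all, which is the kind of divergence worth surfacing to the agent before it synthesizes an answer.&lt;/p&gt;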

&lt;h3&gt;
  
  
  3. Confidence Calibration Independent of Retrieval
&lt;/h3&gt;

&lt;p&gt;The most important mitigation: your agent's confidence in its answer should not be purely a function of retrieval success. A confident retrieval result does not mean a confident answer.&lt;/p&gt;

&lt;p&gt;Recent work on confidence calibration in RAG settings (NAACL Rules, CalibRAG) shows that LLMs are systematically overconfident when given retrieved context, even when that context is noisy or irrelevant. The retrieval layer provides a fluency signal — "I found documents and they look relevant" — that the model conflates with a correctness signal.&lt;/p&gt;

&lt;p&gt;To fix this, implement a confidence layer that operates independently of the retrieval pipeline:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Self-consistency sampling:&lt;/strong&gt; Generate multiple answers from the retrieved context (different temperatures, different framings) and measure agreement. Low agreement → lower confidence.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Counterfactual probing:&lt;/strong&gt; Ask the agent the same question &lt;em&gt;without&lt;/em&gt; the retrieved context. If the answer changes significantly, the retrieval is doing heavy lifting — which means retrieval quality matters more, and you should be less confident if the consistency check (mitigation #1) flagged instability.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Explicit uncertainty prompting:&lt;/strong&gt; Force the agent to enumerate what it &lt;em&gt;doesn't&lt;/em&gt; know from the retrieved context. If it can't articulate the gaps, it doesn't understand the limits of what it found.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;calibrate_confidence&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;retrieved_context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Independent confidence assessment, decoupled from retrieval success.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="c1"&gt;# Self-consistency: multiple generations, measure agreement
&lt;/span&gt;    &lt;span class="n"&gt;answers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;retrieved_context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;temp&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
              &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;
    &lt;span class="n"&gt;consistency&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;semantic_similarity_matrix&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;answers&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Counterfactual: answer without context
&lt;/span&gt;    &lt;span class="n"&gt;no_context_answer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;temp&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;context_dependence&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nf"&gt;semantic_similarity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;answers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;no_context_answer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Gap analysis: what's missing?
&lt;/span&gt;    &lt;span class="n"&gt;gaps&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;identify_gaps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;retrieved_context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;confidence&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;base_confidence&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;consistency&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;context_dependence&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;0.3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;gaps&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;confidence&lt;/span&gt; &lt;span class="o"&gt;*=&lt;/span&gt; &lt;span class="mf"&gt;0.7&lt;/span&gt;  &lt;span class="c1"&gt;# Many gaps → less confident
&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;confidence&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;consistency&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;consistency&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;context_dependence&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;context_dependence&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gaps_identified&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;gaps&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Limit:&lt;/strong&gt; Calibration is itself a learned function with its own distributional assumptions. You're trading one uncertainty for another. But calibrated uncertainty — "I'm 60% confident, and here's why" — is strictly more useful than uncalibrated confidence, even if the calibration isn't perfect.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Explicit Gap Detection in Retrieved Results
&lt;/h3&gt;

&lt;p&gt;Prompt your agent to look for what's &lt;em&gt;missing&lt;/em&gt; from retrieved results, not just what's present. This is a prompting and evaluation strategy, not a retrieval strategy, but it directly addresses the structural problem: the index returns what it has, not what's needed.&lt;/p&gt;

&lt;p&gt;If the query asks for a comparison, the agent should check: did I get results that actually cover both sides of the comparison, or did I get results that cover one side well and the other side poorly? If the query asks for a specific version, did the results actually specify the version, or are they version-agnostic?&lt;/p&gt;

&lt;p&gt;This is the cheapest mitigation and the one most likely to catch the "looks right, is wrong" failure mode, because it forces the agent to verify the retrieval rather than trusting it.&lt;/p&gt;
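
&lt;p&gt;A minimal sketch of the comparison check, assuming each retrieved result is a dict with &lt;code&gt;title&lt;/code&gt; and &lt;code&gt;content&lt;/code&gt; strings. The function name &lt;code&gt;uncovered_sides&lt;/code&gt; and the substring match are illustrative assumptions; a real agent would more likely ask the model itself or use an entity matcher.&lt;/p&gt;

```python
def uncovered_sides(sides, results):
    """Return the comparison sides that no retrieved result mentions."""
    missing = []
    for side in sides:
        needle = side.lower()
        # A side counts as covered if any title or snippet mentions it.
        mentioned = any(
            needle in (r.get("title", "") + " " + r.get("content", "")).lower()
            for r in results
        )
        if not mentioned:
            missing.append(side)
    return missing


# Hypothetical retrieval output for the query "asyncio vs trio".
results = [
    {"title": "asyncio in Python", "content": "a library to write concurrent code"},
    {"title": "asyncio tutorial", "content": "event loops and coroutines"},
]
print(uncovered_sides(["asyncio", "trio"], results))  # prints ['trio']
```

&lt;p&gt;A non-empty return value is a cheap signal to reformulate the query or lower confidence before answering.&lt;/p&gt;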

&lt;h2&gt;
  
  
  What This Means for Agent Design
&lt;/h2&gt;

&lt;p&gt;If you're building agents with search tools — whether that's a web search API, a RAG pipeline over your own corpus, or a tool-use agent that decides when to search — you need to treat the retrieval layer as a &lt;strong&gt;lossy, biased oracle&lt;/strong&gt;, not as a source of truth.&lt;/p&gt;

&lt;p&gt;The index distribution problem means:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Retrieval quality is not answer quality.&lt;/strong&gt; A perfect nDCG score doesn't mean your agent will produce a correct answer. Evaluate end-to-end, not just retrieval.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Novel queries are the failure mode, not the edge case.&lt;/strong&gt; Most real-world agent queries are novel in the distributional sense. Build for the gap, not for the center of the distribution.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Confidence must be decoupled from retrieval.&lt;/strong&gt; "I found results" is not the same as "I found the right results." Your agent needs independent signals about whether to trust what it retrieved.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Diversity is a feature, not a cost.&lt;/strong&gt; Multiple sources, multiple reformulations, and multiple retrieval methods aren't redundant — they're your best signal for detecting when the index distribution is misleading you.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;None of this fixes the structural ceiling. The ceiling is real. But understanding it — and building agents that know when they're near it — is the difference between an agent that's wrong confidently and an agent that's uncertain honestly.&lt;/p&gt;

&lt;p&gt;The latter is the one you can trust in production.&lt;/p&gt;




&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;BEIR: A heterogeneous benchmark for zero-shot evaluation of information retrieval models — &lt;a href="https://arxiv.org/abs/2104.08663" rel="noopener noreferrer"&gt;https://arxiv.org/abs/2104.08663&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;MTEB: Massive Text Embedding Benchmark — &lt;a href="https://arxiv.org/abs/2210.07316" rel="noopener noreferrer"&gt;https://arxiv.org/abs/2210.07316&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;MS MARCO: A Human Generated MAchine Reading COmprehension Dataset — &lt;a href="https://arxiv.org/abs/1611.09268" rel="noopener noreferrer"&gt;https://arxiv.org/abs/1611.09268&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Query Rewriting in Retrieval-Augmented Large Language Models — &lt;a href="https://arxiv.org/abs/2310.05029" rel="noopener noreferrer"&gt;https://arxiv.org/abs/2310.05029&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;NAACL Rules: Noise-Aware Verbal Confidence Calibration for LLMs in RAG Systems — &lt;a href="https://arxiv.org/abs/2601.11004" rel="noopener noreferrer"&gt;https://arxiv.org/abs/2601.11004&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;CalibRAG: Calibrated Decision-Making through LLM-Assisted Retrieval — &lt;a href="https://openreview.net/forum?id=nNQmZGjEVe" rel="noopener noreferrer"&gt;https://openreview.net/forum?id=nNQmZGjEVe&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Agentic Confidence Calibration — &lt;a href="https://arxiv.org/abs/2601.15778" rel="noopener noreferrer"&gt;https://arxiv.org/abs/2601.15778&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Bias Detection and Mitigation in RAG Systems — &lt;a href="https://articles.chatnexus.io/knowledge-base/bias-detection-and-mitigation-in-rag-systems" rel="noopener noreferrer"&gt;https://articles.chatnexus.io/knowledge-base/bias-detection-and-mitigation-in-rag-systems&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;STaRK: Benchmarking LLM Retrieval on Textual and Relational Knowledge Bases — &lt;a href="https://arxiv.org/abs/2404.13207" rel="noopener noreferrer"&gt;https://arxiv.org/abs/2404.13207&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Project home: &lt;a href="https://scouts-ai.com" rel="noopener noreferrer"&gt;https://scouts-ai.com&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>mcp</category>
      <category>search</category>
      <category>agents</category>
    </item>
    <item>
      <title>A no-key web search API for AI agents, and the MCP server that wraps it</title>
      <dc:creator>Aloya</dc:creator>
      <pubDate>Tue, 09 Jun 2026 16:19:39 +0000</pubDate>
      <link>https://dev.to/aloya/a-no-key-web-search-api-for-ai-agents-and-the-mcp-server-that-wraps-it-54f8</link>
      <guid>https://dev.to/aloya/a-no-key-web-search-api-for-ai-agents-and-the-mcp-server-that-wraps-it-54f8</guid>
      <description>&lt;p&gt;I have been building tooling for AI agents in Python for about a year. The thing I keep needing, over and over, is "give the agent a search bar." Every time, the search bar costs me an account, an API key, a billing relationship, and a way to keep that key out of the repo. The first three are friction; the fourth is risk.&lt;/p&gt;

&lt;p&gt;A few weeks ago I came across a public endpoint that does not have any of those: &lt;code&gt;GET https://scouts-ai.com/api/search&lt;/code&gt;. No header for auth, no signup, no rate-limit-agreement landing page. I tried it from a shell, and it returned a clean JSON response with title, url, content, engine, tookMs and a per-result &lt;code&gt;publishedAt&lt;/code&gt; field. I have been using it as a research scratchpad ever since. This post is the field guide I wish I had on day one: what the response actually looks like, what the rate limits are (they are real, and they are not in the README), what the &lt;code&gt;lang&lt;/code&gt; parameter actually does (it does not do what you think), and a 100-line MCP server you can install in 30 seconds that exposes the same thing as a single &lt;code&gt;web_search&lt;/code&gt; tool to Claude Desktop, Cursor, Cline, Open WebUI, or any other MCP host.&lt;/p&gt;

&lt;h2&gt;
  
  
  The endpoint
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;GET https://scouts-ai.com/api/search
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three query parameters do anything:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;q&lt;/code&gt; — the query, 1 to 512 characters&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;lang&lt;/code&gt; — BCP-47 code, default &lt;code&gt;en&lt;/code&gt;. Hint, not a filter (see Gotcha #1)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;limit&lt;/code&gt; — default 10, max 50&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No &lt;code&gt;Authorization&lt;/code&gt; header. No &lt;code&gt;X-API-Key&lt;/code&gt;. The endpoint resolves the client from the IP.&lt;/p&gt;

&lt;p&gt;A real response from a few minutes ago, against the wire:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-sS&lt;/span&gt; &lt;span class="s2"&gt;"https://scouts-ai.com/api/search?q=python+asyncio+tutorial&amp;amp;limit=3"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"query"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"python asyncio tutorial"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"lang"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"en"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"page"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"pageSize"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"cached"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"tookMs"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;970&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"results"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Python's asyncio: A Hands-On Walkthrough – Real Python"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://realpython.com/async-io-python/"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Jul 30, 2025 · Python's asyncio library enables you to write concurrent code using the async and await keywords…"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"publishedAt"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"engine"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"bing"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"asyncio — Asynchronous I/O — Python 3.14.5 documentation"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://docs.python.org/3/library/asyncio.html"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"asyncio is a library to write concurrent code using the async/await syntax…"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"publishedAt"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"engine"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"bing"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"asyncio in Python - GeeksforGeeks"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://www.geeksforgeeks.org/python/asyncio-in-python/"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Jul 23, 2025 · Asyncio is used as a foundation for multiple Python asynchronous frameworks…"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"publishedAt"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"engine"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"bing"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The shape is small and stable. The wrapper object (&lt;code&gt;query&lt;/code&gt;, &lt;code&gt;lang&lt;/code&gt;, &lt;code&gt;page&lt;/code&gt;, &lt;code&gt;pageSize&lt;/code&gt;, &lt;code&gt;cached&lt;/code&gt;, &lt;code&gt;tookMs&lt;/code&gt;, &lt;code&gt;results&lt;/code&gt;) is on every response. Each result carries &lt;code&gt;title&lt;/code&gt;, &lt;code&gt;url&lt;/code&gt;, &lt;code&gt;content&lt;/code&gt;, &lt;code&gt;publishedAt&lt;/code&gt;, &lt;code&gt;engine&lt;/code&gt;. That's the contract. If you build against it, you do not need to scrape anything.&lt;/p&gt;
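
&lt;p&gt;If you want that contract in typed form, something like the following would do. This is a sketch, not part of the published package: &lt;code&gt;SearchResult&lt;/code&gt; and &lt;code&gt;parse_results&lt;/code&gt; are names made up for illustration, and the sample payload is trimmed from the response above.&lt;/p&gt;

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SearchResult:
    title: str
    url: str
    content: str
    published_at: Optional[str]  # null in every sample response shown above
    engine: str


def parse_results(payload):
    """Convert a decoded response body into SearchResult objects."""
    return [
        SearchResult(
            title=item["title"],
            url=item["url"],
            content=item["content"],
            published_at=item.get("publishedAt"),
            engine=item["engine"],
        )
        for item in payload["results"]
    ]


sample = {
    "query": "python asyncio tutorial",
    "results": [
        {
            "title": "asyncio in Python - GeeksforGeeks",
            "url": "https://www.geeksforgeeks.org/python/asyncio-in-python/",
            "content": "Asyncio is used as a foundation...",
            "publishedAt": None,
            "engine": "bing",
        }
    ],
}
print(parse_results(sample)[0].engine)  # prints bing
```

&lt;p&gt;Picking fields by name, rather than unpacking each item wholesale, keeps the parser working if the endpoint ever adds a field.&lt;/p&gt;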

&lt;h2&gt;
  
  
  Rate limits, observed
&lt;/h2&gt;

&lt;p&gt;I hammered the endpoint a few times. Here is what the response headers actually say:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;x-ratelimit-limit: 60
x-ratelimit-remaining: 58
cache-control: max-age=3600, private
x-cache: MISS
x-cache-ttl: 3600
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;60 requests per minute, per IP.&lt;/strong&gt; That is enough for one agent, a small team, or a notebook. It is not enough for a horizontally scaled scraper.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The endpoint caches for an hour.&lt;/strong&gt; &lt;code&gt;cache-control: max-age=3600, private&lt;/code&gt;. Repeat the same query inside the window and you get &lt;code&gt;cached: true&lt;/code&gt; and a &lt;code&gt;tookMs&lt;/code&gt; closer to 200 than to 1000. This is great for an agent that repeats questions; it is a footgun for an evaluation harness that wants to measure cold-path latency.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The cache is &lt;code&gt;private&lt;/code&gt;.&lt;/strong&gt; No shared CDN copy. Two different IPs each get a fresh miss. This is the right design for a per-user agent and the wrong design for a fleet.&lt;/li&gt;
&lt;/ul&gt;
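
&lt;p&gt;One way to stay under the 60-per-minute budget is to space calls out on the client. The sketch below is a rough assumption of how that could look, not something the endpoint or the package provides: &lt;code&gt;Throttle&lt;/code&gt; is a hypothetical helper, and the &lt;code&gt;clock&lt;/code&gt; and &lt;code&gt;sleep&lt;/code&gt; parameters exist only so the timing can be tested without waiting.&lt;/p&gt;

```python
import time


class Throttle:
    """Space calls roughly evenly so about `per_minute` happen each minute."""

    def __init__(self, per_minute=60, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / per_minute
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = None

    def wait(self):
        """Block until the next call is allowed; return the seconds slept."""
        now = self.clock()
        if self.next_allowed is None:
            self.next_allowed = now
        delay = max(0.0, self.next_allowed - now)
        if delay:
            self.sleep(delay)
        # Schedule the following slot one interval after this one.
        self.next_allowed = max(now, self.next_allowed) + self.interval
        return delay
```

&lt;p&gt;Call &lt;code&gt;throttle.wait()&lt;/code&gt; before each request. Even spacing gives up burst capacity in exchange for never having to handle a rate-limit rejection.&lt;/p&gt;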

&lt;p&gt;If you outgrow any of these — sustained &amp;gt; 60/min, need a status page, need a contractual SLA — the honest answer is to pay for Brave, Tavily, or Exa. They are all good. The point of this endpoint is the case where "do I really need a vendor here?" can be answered no with a single curl.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Python package
&lt;/h2&gt;

&lt;p&gt;There is a thin wrapper on PyPI, &lt;code&gt;scouts-ai-mcp&lt;/code&gt;, version 0.1.4 at the time of writing. It is MIT-licensed, depends on &lt;code&gt;fastmcp&lt;/code&gt; v2 and &lt;code&gt;httpx&lt;/code&gt;, and requires Python 3.10 or newer.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;scouts-ai-mcp
scouts-ai-mcp                 &lt;span class="c"&gt;# stdio, default&lt;/span&gt;
scouts-ai-mcp &lt;span class="nt"&gt;--transport&lt;/span&gt; http &lt;span class="nt"&gt;--host&lt;/span&gt; 127.0.0.1 &lt;span class="nt"&gt;--port&lt;/span&gt; 8765
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The package exposes a single MCP tool, &lt;code&gt;web_search&lt;/code&gt;, with the signature:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;web_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lang&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;en&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each dict in the returned list has exactly the shape of an entry in the &lt;code&gt;results&lt;/code&gt; array from the raw HTTP response.&lt;/p&gt;

&lt;h3&gt;
  
  
  Wire it into Claude Desktop
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;claude_desktop_config.json&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"scouts-ai"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"scouts-ai-mcp"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Restart Claude Desktop. The &lt;code&gt;web_search&lt;/code&gt; tool appears in the tool picker.&lt;/p&gt;

&lt;h3&gt;
  
  
  Wire it into Cursor
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;Settings → MCP → Add new global MCP server&lt;/code&gt;. The server entry has the same shape as the Claude Desktop config above.&lt;/p&gt;

&lt;h3&gt;
  
  
  Wire it into any MCP host
&lt;/h3&gt;

&lt;p&gt;The server speaks stdio (default) or HTTP (&lt;code&gt;--transport http&lt;/code&gt;). Both transports are vanilla MCP. Anything that conforms to the spec works.&lt;/p&gt;

&lt;h2&gt;
  
  
  Calling the endpoint directly
&lt;/h2&gt;

&lt;p&gt;You do not need the MCP server. If you are building an agent loop in Python and you want to fetch results inline, &lt;code&gt;httpx&lt;/code&gt; is enough:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;httpx&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;web_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;httpx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://scouts-ai.com/api/search&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;q&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;limit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;raise_for_status&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;results&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you want the wrapper object too (so you can read &lt;code&gt;tookMs&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;httpx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://scouts-ai.com/api/search&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;q&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;limit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;results&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; hits in &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;tookMs&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;ms (cached=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;cached&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you are doing this from a shell, &lt;code&gt;curl&lt;/code&gt; is fine. If you are doing it from a TypeScript agent, &lt;code&gt;fetch&lt;/code&gt; is fine. If you are doing it from a Go binary, &lt;code&gt;net/http&lt;/code&gt; is fine. There is nothing to install to use the endpoint; the package is only useful if you specifically need an MCP host.&lt;/p&gt;

&lt;h2&gt;
  
  
  Gotchas I hit (in order of how annoyed I was)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. &lt;code&gt;lang&lt;/code&gt; is a hint, not a filter.&lt;/strong&gt; I tested &lt;code&gt;lang=ru&lt;/code&gt; against a Russian query. The results came back in English, with what looked like Russian tokenization applied to the query. If you need language-specific results, translate the query client-side and use &lt;code&gt;lang=en&lt;/code&gt;, or post-filter on the &lt;code&gt;url&lt;/code&gt; or &lt;code&gt;title&lt;/code&gt; field. The README is honest about this; the parameter name is not.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. The cache is your friend, then your enemy.&lt;/strong&gt; Re-running the same query within the 60-minute window returns &lt;code&gt;cached: true&lt;/code&gt; with a &lt;code&gt;tookMs&lt;/code&gt; roughly 5x lower. For an agent, this is exactly what you want. For a benchmark, it means you are measuring the warm path, not the cold one. Either bust the cache (different IP, different parameter order) or accept it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. The freshest results are not always fresh.&lt;/strong&gt; The index is refreshed periodically. Queries with strong time intent ("today's news", "this week") can return results that are days or weeks old. The &lt;code&gt;engine&lt;/code&gt; field is included in the response precisely so you can decide what to do with that fact. The ranking is the upstream's, not the API's.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. &lt;code&gt;POST&lt;/code&gt; returns 405.&lt;/strong&gt; The endpoint is &lt;code&gt;GET&lt;/code&gt; only. If your code path defaults to &lt;code&gt;POST&lt;/code&gt; (some proxies, some older HTTP clients), you will get a method-not-allowed error and a one-second wait. Always use &lt;code&gt;GET&lt;/code&gt; with query parameters.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. No SLA, no status page, no support tier.&lt;/strong&gt; This is a free public endpoint. Treat it accordingly. If you are putting it in front of paying users, build a fallback (a cached local index, a paid search provider) so a 503 on the upstream does not take down your agent.&lt;/p&gt;
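
&lt;p&gt;The fallback does not need to be clever. A sketch of the shape I have in mind, where &lt;code&gt;primary&lt;/code&gt; and &lt;code&gt;fallback&lt;/code&gt; are any two callables that take a query string and return a list of results. Both are placeholders you would supply yourself, and &lt;code&gt;search_with_fallback&lt;/code&gt; is a hypothetical name.&lt;/p&gt;

```python
from typing import Callable

# Any function that maps a query string to a list of result dicts.
SearchFn = Callable[[str], list[dict]]


def search_with_fallback(query: str, primary: SearchFn, fallback: SearchFn) -> list[dict]:
    """Try the primary provider; on any failure, use the fallback instead."""
    try:
        return primary(query)
    except Exception:
        # Broad on purpose: a 503, a timeout, and malformed JSON all end up here.
        return fallback(query)
```

&lt;p&gt;In a real service you would probably log the exception and put a timeout on the primary call, but the control flow stays the same.&lt;/p&gt;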

&lt;p&gt;&lt;strong&gt;6. The MCP server's tool surface is intentionally small.&lt;/strong&gt; The tool is &lt;code&gt;web_search(query, lang, limit)&lt;/code&gt;. There is no &lt;code&gt;recency_days&lt;/code&gt;, no &lt;code&gt;site:&lt;/code&gt;, no boolean operators, no &lt;code&gt;filetype:&lt;/code&gt;. That is the design. If you need a richer query language, you are looking for a different product.&lt;/p&gt;

&lt;h2&gt;
  
  
  When I would use it, and when I would not
&lt;/h2&gt;

&lt;p&gt;Use it when you are building a personal agent, an internal demo, a hackathon project, or a low-traffic production service that needs a working search bar without the procurement. Use it when you do not want to manage API keys, billing, or rate-limit agreements. Use it when you can live with 60 req/min, a 1-hour cache, and ~1s cold-path latency.&lt;/p&gt;

&lt;p&gt;Do not use it when you need an SLA, when you are running a horizontally scaled fleet, when you need time-bounded or boolean search, when you need a custom user-agent, or when you have a data-residency requirement (the upstream is Bing; check their terms).&lt;/p&gt;

&lt;p&gt;For everything in the first list, it is the simplest thing that works. For everything in the second list, pay for something else. Both are fine; the gap between them is just bigger than the marketing pages suggest.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I would build on top of it
&lt;/h2&gt;

&lt;p&gt;A few directions, in increasing order of ambition:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A drop-in &lt;code&gt;WebSearch&lt;/code&gt; provider for LangChain and LlamaIndex. Both have an abstract interface; a 30-line implementation against this endpoint would let an existing RAG pipeline swap providers in a config file.&lt;/li&gt;
&lt;li&gt;A citation post-processor. The response already includes the &lt;code&gt;engine&lt;/code&gt; field. A small helper that takes a list of search results plus an LLM answer, and re-renders the answer with inline numbered citations and a "Sources" footer, would be a useful standalone utility.&lt;/li&gt;
&lt;li&gt;An offline corpus mode. Hit the endpoint once with a query, persist the JSON to disk under a hash of the query string, serve subsequent requests from disk. Free, deterministic, perfect for tests and CI.&lt;/li&gt;
&lt;/ul&gt;
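
&lt;p&gt;The offline corpus idea is small enough to sketch here. This is one possible shape, not a finished tool: &lt;code&gt;fetch&lt;/code&gt; is whatever function you already use to call the endpoint and parse the JSON, and the name &lt;code&gt;cached_search&lt;/code&gt;, the SHA-256 choice, and the one-file-per-query layout are all my assumptions.&lt;/p&gt;

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def cached_search(query: str, fetch: Callable[[str], dict], cache_dir: Path) -> dict:
    """Return the stored response for `query`, fetching and saving it on first use."""
    # The file name is a hash of the query string, so any query maps to a safe path.
    key = hashlib.sha256(query.encode("utf-8")).hexdigest()
    path = cache_dir / f"{key}.json"
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    response = fetch(query)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(response, ensure_ascii=False), encoding="utf-8")
    return response
```

&lt;p&gt;Commit the cache directory alongside your tests and CI never touches the network. Note that the hash covers only the query string, so requests that differ in other parameters would need those folded into the key.&lt;/p&gt;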

&lt;p&gt;The MIT license on the package means you can build any of these and ship them under whatever license you want.&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Endpoint: &lt;code&gt;https://scouts-ai.com/api/search&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;MCP package on PyPI: &lt;code&gt;https://pypi.org/project/scouts-ai-mcp/&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Package source: &lt;code&gt;https://github.com/kecven/scouts-ai-mcp&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Project home: &lt;code&gt;https://scouts-ai.com&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;llms.txt&lt;/code&gt;: &lt;code&gt;https://scouts-ai.com/llms.txt&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>mcp</category>
      <category>opensource</category>
      <category>python</category>
    </item>
    <item>
      <title>probe</title>
      <dc:creator>Aloya</dc:creator>
      <pubDate>Tue, 09 Jun 2026 15:58:42 +0000</pubDate>
      <link>https://dev.to/aloya/probe-5cj8</link>
      <guid>https://dev.to/aloya/probe-5cj8</guid>
      <description>&lt;p&gt;probe&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
