Aloya

Posted on Jun 9

"A no-key web search API for AI agents, and the MCP server that wraps it"

#mcp #ai #python #opensource

I have been building tooling for AI agents in Python for about a year. The thing I keep needing, over and over, is "give the agent a search bar." Every time, the search bar costs me an account, an API key, a billing relationship, and a way to keep that key out of the repo. The first three are friction; the fourth is risk.

A few weeks ago I came across a public endpoint that has none of those: GET https://scouts-ai.com/api/search. No auth header, no signup, no rate-limit-agreement landing page. I tried it from a shell and it returned clean JSON: a wrapper with the query, tookMs and a results array, where each result carries title, url, content, engine and publishedAt. I have been using it as a research scratchpad ever since. This post is the field guide I wish I had on day one: what the response actually looks like, what the rate limits are (they are real, and they are not in the README), what the lang parameter actually does (not what you think), and a 100-line MCP server you can install in 30 seconds that exposes the same thing as a single web_search tool to Claude Desktop, Cursor, Cline, Open WebUI, or any other MCP host.

The endpoint

GET https://scouts-ai.com/api/search

Three query parameters do anything:

q — the query, 1 to 512 characters
lang — BCP-47 code, default en. Hint, not a filter (see Gotcha #1)
limit — default 10, max 50

No Authorization header. No X-API-Key. The endpoint resolves the client from the IP.
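
If you want to catch out-of-range values before they hit the wire, the documented bounds fit in a small helper. This is a minimal sketch, not part of the package: build_search_params and build_search_url are names I made up, the lower bound of 1 on limit is my assumption (the post only states the default and the max), and stripping whitespace from q is a choice of mine.

```python
from urllib.parse import urlencode

ENDPOINT = "https://scouts-ai.com/api/search"


def build_search_params(q: str, lang: str = "en", limit: int = 10) -> dict[str, str]:
    """Validate inputs against the documented bounds and return query params."""
    q = q.strip()  # assumption: leading/trailing whitespace is never intended
    if not 1 <= len(q) <= 512:
        raise ValueError("q must be between 1 and 512 characters")
    if not 1 <= limit <= 50:  # assumption: 1 is the smallest useful limit
        raise ValueError("limit must be between 1 and 50")
    return {"q": q, "lang": lang, "limit": str(limit)}


def build_search_url(q: str, lang: str = "en", limit: int = 10) -> str:
    """Return the full GET URL; urlencode turns spaces into '+'."""
    return f"{ENDPOINT}?{urlencode(build_search_params(q, lang, limit))}"


if __name__ == "__main__":
    print(build_search_url("python asyncio tutorial", limit=3))
```

Failing locally with a ValueError is cheaper than spending one of your 60 requests on a call that was never going to work.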

A real response from a few minutes ago, against the wire:

curl -sS "https://scouts-ai.com/api/search?q=python+asyncio+tutorial&limit=3"

{
  "query": "python asyncio tutorial",
  "lang": "en",
  "page": 1,
  "pageSize": 3,
  "cached": false,
  "tookMs": 970,
  "results": [
    {
      "title": "Python's asyncio: A Hands-On Walkthrough – Real Python",
      "url": "https://realpython.com/async-io-python/",
      "content": "Jul 30, 2025 · Python's asyncio library enables you to write concurrent code using the async and await keywords…",
      "publishedAt": null,
      "engine": "bing"
    },
    {
      "title": "asyncio — Asynchronous I/O — Python 3.14.5 documentation",
      "url": "https://docs.python.org/3/library/asyncio.html",
      "content": "asyncio is a library to write concurrent code using the async/await syntax…",
      "publishedAt": null,
      "engine": "bing"
    },
    {
      "title": "asyncio in Python - GeeksforGeeks",
      "url": "https://www.geeksforgeeks.org/python/asyncio-in-python/",
      "content": "Jul 23, 2025 · Asyncio is used as a foundation for multiple Python asynchronous frameworks…",
      "publishedAt": null,
      "engine": "bing"
    }
  ]
}

The shape is small and stable. The wrapper object (query, lang, page, pageSize, cached, tookMs, results) is on every response. Each result carries title, url, content, publishedAt, engine. That's the contract. If you build against it, you do not need to scrape anything.
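
If you would rather not pass raw dicts around your agent, the contract above maps onto two dataclasses. This is a sketch under assumptions: the class names and snake_case field names are mine, and publishedAt is typed loosely because every value in the sample response was null, so I have not seen its non-null format.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class SearchResult:
    title: str
    url: str
    content: str
    published_at: Any  # only null was observed in the sample above
    engine: str


@dataclass(frozen=True)
class SearchResponse:
    query: str
    lang: str
    page: int
    page_size: int
    cached: bool
    took_ms: int
    results: list[SearchResult]


def parse_response(payload: dict[str, Any]) -> SearchResponse:
    """Convert the decoded JSON body into typed objects.

    Raises KeyError if a field from the observed contract is missing.
    """
    results = [
        SearchResult(
            title=r["title"],
            url=r["url"],
            content=r["content"],
            published_at=r.get("publishedAt"),
            engine=r["engine"],
        )
        for r in payload["results"]
    ]
    return SearchResponse(
        query=payload["query"],
        lang=payload["lang"],
        page=payload["page"],
        page_size=payload["pageSize"],
        cached=payload["cached"],
        took_ms=payload["tookMs"],
        results=results,
    )
```

A KeyError at the parsing boundary is a deliberate choice here: if the shape ever changes, you find out at the edge instead of three tool calls later.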

Rate limits, observed

I hammered the endpoint a few times. Here is what the response headers actually say:

x-ratelimit-limit: 60
x-ratelimit-remaining: 58
cache-control: max-age=3600, private
x-cache: MISS
x-cache-ttl: 3600

60 requests per minute, per IP. That is enough for one agent, a small team, or a notebook. It is not enough for a horizontally-scaled scraper.
The endpoint caches for an hour (cache-control: max-age=3600, private). Repeat the same query inside the window and you get cached: true and a tookMs closer to 200 than to 1000. This is great for an agent that repeats questions; it is a footgun for an evaluation harness that wants to measure cold-path latency.
The cache is private. No shared CDN copy. Two different IPs each get a fresh miss. This is the right design for a per-user agent and the wrong design for a fleet.

If you outgrow any of these — sustained > 60/min, need a status page, need a contractual SLA — the honest answer is to pay for Brave, Tavily, or Exa. They are all good. The point of this endpoint is the case where "do I really need a vendor here?" can be answered no with a single curl.
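
To stay under the observed 60-per-minute ceiling rather than discover it through rejected requests, a client-side sliding-window limiter is enough. This is a generic sketch, not something the endpoint or the package provides: SlidingWindowLimiter is my own name, and the defaults simply mirror the x-ratelimit-limit header I saw.

```python
import time
from collections import deque
from typing import Callable


class SlidingWindowLimiter:
    """Allow at most max_calls calls in any rolling window of period seconds."""

    def __init__(
        self,
        max_calls: int = 60,
        period: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_calls = max_calls
        self.period = period
        self._clock = clock  # injectable so tests do not need real time
        self._calls: deque[float] = deque()

    def wait_time(self) -> float:
        """Seconds until the next call is allowed (0.0 if allowed now)."""
        now = self._clock()
        # Forget calls that have left the rolling window.
        while self._calls and now - self._calls[0] >= self.period:
            self._calls.popleft()
        if len(self._calls) < self.max_calls:
            return 0.0
        return self.period - (now - self._calls[0])

    def record(self) -> None:
        """Note that a call is being made right now."""
        self._calls.append(self._clock())

    def acquire(self, sleep: Callable[[float], None] = time.sleep) -> None:
        """Block until a call is allowed, then record it."""
        delay = self.wait_time()
        if delay > 0:
            sleep(delay)
        self.record()
```

Call limiter.acquire() immediately before each request. The limit is per IP, so one limiter per process only helps if that process is the only thing on your IP talking to the endpoint.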

The Python package

There is a thin wrapper on PyPI, scouts-ai-mcp, version 0.1.4 at the time of writing. It is MIT-licensed, depends on fastmcp v2 and httpx, and requires Python 3.10 or newer.

pip install scouts-ai-mcp
scouts-ai-mcp                 # stdio, default
scouts-ai-mcp --transport http --host 127.0.0.1 --port 8765

The package exposes a single MCP tool, web_search, with the signature:

web_search(query: str, lang: str = "en", limit: int = 10) -> list[dict]

The dict shape is exactly the results array from the raw HTTP response.

Wire it into Claude Desktop

claude_desktop_config.json:

{
  "mcpServers": {
    "scouts-ai": {
      "command": "scouts-ai-mcp"
    }
  }
}

Restart Claude Desktop. The web_search tool appears in the tool picker.
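
If you already have other servers configured, hand-editing the JSON is an easy way to lose a comma. A small merge script avoids that. This is a sketch with assumptions: add_mcp_server is my own helper, and the location of claude_desktop_config.json differs per platform, so the path is left for the caller to supply.

```python
import json
from pathlib import Path


def add_mcp_server(config_path: Path, name: str, command: str) -> dict:
    """Add or update one entry under "mcpServers", keeping everything else."""
    config: dict = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8")
        if text.strip():  # tolerate an empty file
            config = json.loads(text)
    # Only the named entry is touched; other servers and keys are preserved.
    config.setdefault("mcpServers", {})[name] = {"command": command}
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


if __name__ == "__main__":
    import sys

    # Usage: python add_server.py /path/to/claude_desktop_config.json
    add_mcp_server(Path(sys.argv[1]), "scouts-ai", "scouts-ai-mcp")
```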

Wire it into Cursor

Settings → MCP → Add new global MCP server, then paste the same mcpServers entry shown for Claude Desktop.

Wire it into any MCP host

The server speaks stdio (default) or HTTP (--transport http). Both transports are vanilla MCP. Anything that conforms to the spec works.

Calling the endpoint directly

You do not need the MCP server. If you are building an agent loop in Python and you want to fetch results inline, httpx is enough:

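import httpx

def web_search(query: str, limit: int = 5) -> list[dict]:
    """Return the results array for a query; raises on HTTP errors."""
    r = httpx.get(
        "https://scouts-ai.com/api/search",
        params={"q": query, "limit": limit},  # httpx URL-encodes these
        timeout=10,  # seconds
    )
    r.raise_for_status()  # surface 4xx/5xx instead of parsing an error body
    return r.json()["results"]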

If you want the wrapper object too (so you can read tookMs):

import httpx

query = "python asyncio tutorial"
data = httpx.get(
    "https://scouts-ai.com/api/search",
    params={"q": query, "limit": 5},
    timeout=10,
).json()
print(f"{len(data['results'])} hits in {data['tookMs']}ms (cached={data['cached']})")

If you are doing this from a shell, curl is fine. If you are doing it from a TypeScript agent, fetch is fine. If you are doing it from a Go binary, net/http is fine. There is nothing to install to use the endpoint; the package is only useful if you specifically need to plug search into an MCP host.

Gotchas I hit (in order of how annoyed I was)

1. lang is a hint, not a filter. I tested lang=ru against a Russian query. The results came back in English, with what looked like Russian tokenization applied to the query. If you need language-specific results, translate the query client-side and use lang=en, or post-filter on the url or title field. The README is honest about this; the parameter name is not.

2. The cache is your friend, then your enemy. Re-running the same query within the 60-minute window returns cached: true with a tookMs roughly 5x lower. For an agent, this is exactly what you want. For a benchmark, it means you are measuring the warm path, not the cold one. Either bust the cache (different IP, different parameter order) or accept it.

3. The freshest results are not always the freshest. The index is periodic. Queries with strong time intent ("today's news", "this week") can return results that are days or weeks old. The engine field is included in the response precisely so you can decide what to do with that fact. The ranking is the upstream's, not the API's.

4. POST returns 405. The endpoint is GET only. If your code path defaults to POST (some proxies, some older HTTP clients), you will get a method-not-allowed error and a one-second wait. Always use GET with query parameters.

5. No SLA, no status page, no support tier. This is a free public endpoint. Treat it accordingly. If you are putting it in front of paying users, build a fallback (a cached local index, a paid search provider) so a 503 on the upstream does not take down your agent.

6. The MCP server's tool surface is intentionally small. The tool is web_search(query, lang, limit). There is no recency_days, no site:, no boolean operators, no filetype:. That is the design. If you need a richer query language, you are looking for a different product.
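
Two of these gotchas have cheap workarounds on the client side: post-filtering by script for #1 and a fallback provider for #5. This is a sketch with made-up names (mostly_cyrillic, filter_by_title, search_with_fallback). The Cyrillic check is a crude script heuristic, not language detection, and the primary and fallback callables stand in for whatever search functions you actually have.

```python
import unicodedata
from typing import Callable

Results = list[dict]


def mostly_cyrillic(text: str, threshold: float = 0.5) -> bool:
    """True if at least `threshold` of the letters in text are Cyrillic."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return False
    cyrillic = sum(1 for ch in letters if "CYRILLIC" in unicodedata.name(ch, ""))
    return cyrillic / len(letters) >= threshold


def filter_by_title(results: Results, predicate: Callable[[str], bool]) -> Results:
    """Gotcha #1: keep only results whose title passes the predicate."""
    return [r for r in results if predicate(r.get("title", ""))]


def search_with_fallback(
    query: str,
    primary: Callable[[str], Results],
    fallback: Callable[[str], Results],
) -> Results:
    """Gotcha #5: try the free endpoint first, then a second provider."""
    try:
        return primary(query)
    except Exception:
        # Deliberately broad for the sketch; with httpx you would likely
        # narrow this to httpx.HTTPError so real bugs still surface.
        return fallback(query)
```

An empty result list from the primary is returned as-is rather than triggering the fallback. Whether "zero hits" should count as a failure is a policy decision for your agent, not for the helper.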

When I would use it, and when I would not

Use it when you are building a personal agent, an internal demo, a hackathon project, or a low-traffic production service that needs a working search bar without the procurement. Use it when you do not want to manage API keys, billing, or rate-limit agreements. Use it when you can live with 60 req/min, a 1-hour cache, and ~1s cold-path latency.

Do not use it when you need an SLA, when you are running a horizontally scaled fleet, when you need time-bounded or boolean search, when you need a custom user-agent, or when you have a data-residency requirement (every result I saw reported Bing as its engine; check their terms).

For everything in the first list, it is the simplest thing that works. For everything in the second list, pay for something else. Both are fine; the gap between them is just bigger than the marketing pages suggest.

What I would build on top of it

A few directions, in increasing order of ambition:

A drop-in WebSearch provider for LangChain and LlamaIndex. Both have an abstract interface; a 30-line implementation against this endpoint would let an existing RAG pipeline swap providers in a config file.
A citation post-processor. The engine field is there. A small helper that takes a list of search results plus an LLM answer, and re-renders the answer with inline numbered citations and a "Sources" footer, would be a useful standalone utility.
An offline corpus mode. Hit the endpoint once with a query, persist the JSON to disk under a hash of the query string, serve subsequent requests from disk. Free, deterministic, perfect for tests and CI.
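
The offline corpus idea is the smallest of the three, so here is roughly what I have in mind. This is a sketch, not shipped code: cached_search is a hypothetical name, and fetch is any callable that returns the decoded JSON for a query, such as a thin wrapper around the endpoint.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def cached_search(
    query: str,
    fetch: Callable[[str], dict],
    cache_dir: Path,
) -> dict:
    """Serve a query from disk if present; otherwise fetch once and persist."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hash the query so any string becomes a safe, fixed-length file name.
    key = hashlib.sha256(query.encode("utf-8")).hexdigest()
    path = cache_dir / f"{key}.json"
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    payload = fetch(query)
    path.write_text(
        json.dumps(payload, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return payload
```

Commit the cache directory alongside your tests and CI never touches the network. The key is the exact query string, so "Python asyncio" and "python asyncio" are stored as separate entries unless you normalize first.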

The MIT license on the package means you can build any of these and ship them under whatever license you want.
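
For the citation post-processor, the "Sources" footer is the easy half and needs nothing beyond the fields already in each result. This is a rough sketch with names of my own (render_sources, append_sources). It does not try to place inline [n] markers inside the answer, which is the harder part and depends on how your LLM is prompted.

```python
def render_sources(results: list[dict]) -> str:
    """Render a numbered Sources footer from the results array."""
    lines = ["Sources"]
    for i, r in enumerate(results, start=1):
        # engine is included so readers can see where each hit came from
        lines.append(f"[{i}] {r['title']} ({r['engine']}): {r['url']}")
    return "\n".join(lines)


def append_sources(answer: str, results: list[dict]) -> str:
    """Attach the footer to an LLM answer, separated by one blank line."""
    return f"{answer.rstrip()}\n\n{render_sources(results)}"
```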

References

Endpoint: https://scouts-ai.com/api/search
MCP package on PyPI: https://pypi.org/project/scouts-ai-mcp/
Package source: https://github.com/kecven/scouts-ai-mcp
Project home: https://scouts-ai.com
llms.txt: https://scouts-ai.com/llms.txt
