Agents that can't search the web are half-blind. Most frameworks solve
this with API keys — SerpAPI, Tavily, Google Custom Search. You sign up,
paste a key, configure a tool, hope the rate limits hold. We built it
into the binary instead.
Starting today, every OpenWalrus agent has two new built-in tools:
web_search and web_fetch. No API keys. No configuration. No
third-party accounts.
## Why not just use an API?
Search APIs work, but they come with baggage:
- Credentials to manage. One more API key per deployment, one more secret to rotate, one more service to monitor for billing surprises.
- Rate limits. Free tiers cap at 100–1,000 queries/day. Autonomous agents burn through that fast.
- Cost. SerpAPI runs $50–250/mo for serious usage. Tavily charges per query. These add up alongside LLM inference costs.
- Privacy. Every query goes to a third-party logging service. For a local-first runtime, routing agent searches through a cloud proxy defeats the point.
We wanted something simpler: search that works the moment you install
walrus, with zero setup and zero ongoing cost.
## The meta search approach
Instead of depending on a single search provider, we built a meta search
engine — walrus-search — that queries multiple free backends in
parallel, merges the results, and ranks by consensus.
The current backends are DuckDuckGo (via the Lite HTML endpoint) and
Wikipedia (via the OpenSearch API). Neither requires authentication.
Both are queried in parallel using tokio::task::JoinSet, so latency is
bounded by the slowest engine, not the sum.
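The fan-out can be sketched with plain threads standing in for tokio's JoinSet in the async version (engine names, latencies, and the `SearchResult` type below are illustrative, not the actual walrus-search types):

```rust
use std::thread;
use std::time::{Duration, Instant};

// Hypothetical result type for illustration only.
#[derive(Debug)]
struct SearchResult {
    engine: &'static str,
    urls: Vec<String>,
}

// Simulate one backend with an artificial latency.
fn query_engine(engine: &'static str, latency: Duration) -> SearchResult {
    thread::sleep(latency);
    SearchResult {
        engine,
        urls: vec![format!("https://example.com/{engine}")],
    }
}

fn main() {
    let start = Instant::now();

    // Spawn one task per engine; tokio::task::JoinSet plays the same role
    // in the daemon's async version.
    let handles = vec![
        thread::spawn(|| query_engine("duckduckgo", Duration::from_millis(120))),
        thread::spawn(|| query_engine("wikipedia", Duration::from_millis(80))),
    ];

    let results: Vec<SearchResult> =
        handles.into_iter().map(|h| h.join().unwrap()).collect();

    // Wall time tracks the slowest engine (~120 ms), not the sum (~200 ms).
    println!("{} engines in {:?}", results.len(), start.elapsed());
}
```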
Results are deduplicated by normalized URL — stripping trailing slashes,
www. prefixes, and tracking parameters (utm_*, fbclid, gclid).
When the same URL appears from multiple engines, the descriptions are
merged (longer one wins) and the result gets a consensus score boost.
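A minimal sketch of that normalization step, assuming only the stripping rules listed above (the function name and the real implementation's exact rules may differ):

```rust
use std::collections::HashMap;

/// Normalize a URL for dedup: drop the "www." prefix, trailing slashes,
/// and common tracking parameters (utm_*, fbclid, gclid).
fn normalize_url(url: &str) -> String {
    // Split off the query string, if any.
    let (base, query) = match url.split_once('?') {
        Some((b, q)) => (b, Some(q)),
        None => (url, None),
    };
    let base = base.trim_end_matches('/').replacen("://www.", "://", 1);

    // Keep only non-tracking query parameters.
    let kept: Vec<&str> = query
        .map(|q| {
            q.split('&')
                .filter(|p| {
                    let key = p.split('=').next().unwrap_or("");
                    !key.starts_with("utm_") && key != "fbclid" && key != "gclid"
                })
                .collect()
        })
        .unwrap_or_default();

    if kept.is_empty() {
        base.to_string()
    } else {
        format!("{base}?{}", kept.join("&"))
    }
}

fn main() {
    let urls = [
        "https://www.example.com/post/?utm_source=x",
        "https://example.com/post",
    ];
    // Both normalize to the same key, so they collapse to one entry.
    let mut seen: HashMap<String, u32> = HashMap::new();
    for u in urls {
        *seen.entry(normalize_url(u)).or_insert(0) += 1;
    }
    println!("{} unique result(s)", seen.len());
}
```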
## Consensus ranking

Ranking is simple and deterministic:

- Position score: `1.0 / (position + 1)`. Earlier results from each engine score higher.
- Consensus bonus: `0.5 * (engine_count - 1)`. Each additional engine that returns the same URL adds 0.5 to the score.
A result that DuckDuckGo ranks #1 and Wikipedia also returns will
outscore a result that only one engine knows about. No ML model, no
relevance tuning, no training data. Just arithmetic that rewards
agreement across independent sources.
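In code, the scoring reduces to a few lines. This sketch assumes per-engine position scores are summed when duplicates merge, and that positions are 0-based; both are assumptions about the walrus-search internals, not confirmed details:

```rust
/// Score a merged result given its 0-based position in each engine's ranking:
/// sum of 1.0 / (position + 1), plus 0.5 for every engine beyond the first.
fn score(positions: &[usize]) -> f64 {
    let position_score: f64 = positions.iter().map(|&p| 1.0 / (p as f64 + 1.0)).sum();
    let consensus_bonus = 0.5 * (positions.len() as f64 - 1.0);
    position_score + consensus_bonus
}

fn main() {
    // Ranked #1 (position 0) by one engine and #3 (position 2) by another:
    // 1.0 + 1/3 + 0.5 bonus.
    let both = score(&[0, 2]);
    // Ranked #1 by a single engine: 1.0, no bonus.
    let solo = score(&[0]);
    println!("two engines: {both:.2}, one engine: {solo:.2}");
}
```

Agreement dominates: a result two engines return beats a solo #1 even if neither engine ranks it first.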
## Page fetching
web_fetch downloads a URL and extracts clean text content. It strips
<script>, <style>, <nav>, <footer>, <header>, <aside>,
<noscript>, <svg>, <iframe>, and <form> subtrees entirely,
then walks the remaining DOM and collects text nodes with proper spacing.
The fetcher rotates through 8 realistic browser user-agent strings
(Chrome, Firefox, Safari, Edge across Windows/Mac/Linux) to avoid bot
detection. No headless browser, no Playwright dependency — just HTTP
requests and HTML parsing.
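A deliberately naive, string-scanning version of that extraction (the real fetcher walks a parsed DOM; this sketch assumes ASCII markup and ignores nested same-name tags):

```rust
/// Remove entire `<tag>...</tag>` subtrees for a list of noisy tags,
/// then strip the remaining markup and collapse whitespace.
fn extract_text(html: &str) -> String {
    let mut s = html.to_string();
    for tag in ["script", "style", "nav", "footer", "header", "aside",
                "noscript", "svg", "iframe", "form"] {
        let open = format!("<{tag}");
        let close = format!("</{tag}>");
        // Repeatedly cut the first open..close span for this tag.
        while let Some(start) = s.to_lowercase().find(&open) {
            match s.to_lowercase()[start..].find(&close) {
                Some(rel_end) => s.replace_range(start..start + rel_end + close.len(), " "),
                None => break, // unclosed tag: give up on this one
            }
        }
    }
    // Drop any remaining tags, keep the text nodes.
    let mut out = String::new();
    let mut in_tag = false;
    for c in s.chars() {
        match c {
            '<' => in_tag = true,
            '>' => in_tag = false,
            c if !in_tag => out.push(c),
            _ => {}
        }
    }
    out.split_whitespace().collect::<Vec<_>>().join(" ")
}

fn main() {
    let html = "<html><script>var x = 1;</script><p>Hello <b>world</b></p></html>";
    println!("{}", extract_text(html));
}
```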
## Zero configuration for agents
Both tools are registered in BASE_TOOLS, which means every agent gets
them automatically — even agents with scoped tool whitelists. The
aggregator and fetch client initialize at daemon startup with sensible
defaults:
| Setting | Default |
|---|---|
| Engines | DuckDuckGo, Wikipedia |
| Timeout | 10 seconds per engine |
| Max results | 20 |
| Cache TTL | 5 minutes |
An in-memory cache prevents redundant queries. Same search within the
TTL window returns instantly.
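The cache behavior can be sketched in a few lines of std-only Rust (the type and method names here are illustrative, not the daemon's actual API):

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Minimal in-memory TTL cache: entries older than `ttl` are treated as misses.
struct TtlCache {
    ttl: Duration,
    entries: HashMap<String, (Instant, String)>,
}

impl TtlCache {
    fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    /// A hit only while the entry is younger than the TTL.
    fn get(&self, query: &str) -> Option<&String> {
        self.entries
            .get(query)
            .filter(|(stored, _)| stored.elapsed() < self.ttl)
            .map(|(_, v)| v)
    }

    fn put(&mut self, query: &str, results: String) {
        self.entries.insert(query.to_string(), (Instant::now(), results));
    }
}

fn main() {
    // Mirror the 5-minute default from the table above.
    let mut cache = TtlCache::new(Duration::from_secs(300));
    cache.put("rust async runtime", "serialized results".to_string());
    println!("hit: {:?}", cache.get("rust async runtime").is_some());
}
```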
This follows the same design principle
behind the rest of OpenWalrus: batteries included, nothing to configure
for the common case.
## The standalone CLI
The search engine also ships as a standalone binary — wsearch — for
human use and debugging:
```bash
# Search
wsearch search "rust async runtime"
wsearch search "openwalrus" --engines wikipedia
wsearch search "hello world" -n 5 --format text

# Fetch a page
wsearch fetch "https://example.com"

# List available engines
wsearch engines
```
Configuration lives in `~/.config/wsearch/config.toml`:

```toml
engines = ["duckduckgo", "wikipedia"]
timeout_secs = 10
max_results = 20
cache_ttl_secs = 300
output_format = "json"
```
## How agents use it
The tools work like any other built-in. An agent searching for
information simply calls web_search, reviews the results, then
optionally calls web_fetch on the most relevant URLs to read full
page content:
```json
{
  "name": "web_search",
  "parameters": {
    "query": "rust error handling best practices",
    "max_results": 5
  }
}
```

```json
{
  "name": "web_fetch",
  "parameters": {
    "url": "https://doc.rust-lang.org/book/ch09-00-error-handling.html"
  }
}
```
No tool registration, no MCP server, no plugin. It just works.
In sandbox mode, agents still have
network access for search — the OS-level isolation restricts filesystem
access, not outbound HTTP.
## What's next
The meta search architecture is designed to grow. We're looking at:
- More engines — Brave Search, SearXNG instances, and domain-specific backends for code (GitHub, docs sites).
- Per-engine weighting — let users boost or suppress specific engines based on query type.
- Search-to-memory — automatically caching search results in the agent's graph memory so repeated research doesn't re-fetch the same pages.
The full tool reference is in the built-in tools docs.
Originally published at OpenWalrus.