Agents that can't search the web are half-blind. Most frameworks solve
this with API keys — SerpAPI, Tavily, Google Custom Search. You sign up,
paste a key, configure a tool, hope the rate limits hold. We built it
into the binary instead.
Starting today, every OpenWalrus agent has two new built-in tools:
web_search and web_fetch. No API keys. No configuration. No
third-party accounts.
## Why not just use an API?
Search APIs work, but they come with baggage:
- Credentials to manage. One more API key per deployment, one more secret to rotate, one more service to monitor for billing surprises.
- Rate limits. Free tiers cap at 100–1,000 queries/day. Autonomous agents burn through that fast.
- Cost. SerpAPI runs $50–250/mo for serious usage. Tavily charges per query. These add up alongside LLM inference costs.
- Privacy. Every query goes to a third-party logging service. For a local-first runtime, routing agent searches through a cloud proxy defeats the point.
We wanted something simpler: search that works the moment you install
walrus, with zero setup and zero ongoing cost.
## The meta search approach
Instead of depending on a single search provider, we built a meta search
engine — walrus-search — that queries multiple free backends in
parallel, merges the results, and ranks by consensus.
The current backends are DuckDuckGo (via the Lite HTML endpoint) and
Wikipedia (via the OpenSearch API). Neither requires authentication.
Both are queried in parallel using tokio::task::JoinSet, so latency is
bounded by the slowest engine, not the sum.
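The fan-out can be sketched with plain threads standing in for tokio's JoinSet in the async version (engine names, latencies, and the `SearchResult` type below are illustrative, not the actual walrus-search types):

```rust
use std::thread;
use std::time::{Duration, Instant};

// Hypothetical result type for illustration only.
#[derive(Debug)]
struct SearchResult {
    engine: &'static str,
    urls: Vec<String>,
}

// Simulate one backend with an artificial latency.
fn query_engine(engine: &'static str, latency: Duration) -> SearchResult {
    thread::sleep(latency);
    SearchResult {
        engine,
        urls: vec![format!("https://example.com/{engine}")],
    }
}

fn main() {
    let start = Instant::now();

    // Spawn one task per engine; tokio::task::JoinSet plays the same role
    // in the daemon's async version.
    let handles = vec![
        thread::spawn(|| query_engine("duckduckgo", Duration::from_millis(120))),
        thread::spawn(|| query_engine("wikipedia", Duration::from_millis(80))),
    ];

    let results: Vec<SearchResult> =
        handles.into_iter().map(|h| h.join().unwrap()).collect();

    // Wall time tracks the slowest engine (~120 ms), not the sum (~200 ms).
    println!("{} engines in {:?}", results.len(), start.elapsed());
}
```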
Results are deduplicated by normalized URL — stripping trailing slashes,
www. prefixes, and tracking parameters (utm_*, fbclid, gclid).
When the same URL appears from multiple engines, the descriptions are
merged (longer one wins) and the result gets a consensus score boost.
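A minimal sketch of that normalization step, assuming only the stripping rules listed above (the function name and the real implementation's exact rules may differ):

```rust
use std::collections::HashMap;

/// Normalize a URL for dedup: drop the "www." prefix, trailing slashes,
/// and common tracking parameters (utm_*, fbclid, gclid).
fn normalize_url(url: &str) -> String {
    // Split off the query string, if any.
    let (base, query) = match url.split_once('?') {
        Some((b, q)) => (b, Some(q)),
        None => (url, None),
    };
    let base = base.trim_end_matches('/').replacen("://www.", "://", 1);

    // Keep only non-tracking query parameters.
    let kept: Vec<&str> = query
        .map(|q| {
            q.split('&')
                .filter(|p| {
                    let key = p.split('=').next().unwrap_or("");
                    !key.starts_with("utm_") && key != "fbclid" && key != "gclid"
                })
                .collect()
        })
        .unwrap_or_default();

    if kept.is_empty() {
        base.to_string()
    } else {
        format!("{base}?{}", kept.join("&"))
    }
}

fn main() {
    let urls = [
        "https://www.example.com/post/?utm_source=x",
        "https://example.com/post",
    ];
    // Both normalize to the same key, so they collapse to one entry.
    let mut seen: HashMap<String, u32> = HashMap::new();
    for u in urls {
        *seen.entry(normalize_url(u)).or_insert(0) += 1;
    }
    println!("{} unique result(s)", seen.len());
}
```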
## Consensus ranking

Ranking is simple and deterministic:

- Position score: `1.0 / (position + 1)`. Earlier results from each engine score higher.
- Consensus bonus: `0.5 * (engine_count - 1)`. Each additional engine that returns the same URL adds 0.5 to the score.
A result that DuckDuckGo ranks #1 and Wikipedia also returns will
outscore a result that only one engine knows about. No ML model, no
relevance tuning, no training data. Just arithmetic that rewards
agreement across independent sources.
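In code, the scoring reduces to a few lines. This sketch assumes per-engine position scores are summed when duplicates merge, and that positions are 0-based; both are assumptions about the walrus-search internals, not confirmed details:

```rust
/// Score a merged result given its 0-based position in each engine's ranking:
/// sum of 1.0 / (position + 1), plus 0.5 for every engine beyond the first.
fn score(positions: &[usize]) -> f64 {
    let position_score: f64 = positions.iter().map(|&p| 1.0 / (p as f64 + 1.0)).sum();
    let consensus_bonus = 0.5 * (positions.len() as f64 - 1.0);
    position_score + consensus_bonus
}

fn main() {
    // Ranked #1 (position 0) by one engine and #3 (position 2) by another:
    // 1.0 + 1/3 + 0.5 bonus.
    let both = score(&[0, 2]);
    // Ranked #1 by a single engine: 1.0, no bonus.
    let solo = score(&[0]);
    println!("two engines: {both:.2}, one engine: {solo:.2}");
}
```

Agreement dominates: a result two engines return beats a solo #1 even if neither engine ranks it first.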
## Page fetching
web_fetch downloads a URL and extracts clean text content. It strips
<script>, <style>, <nav>, <footer>, <header>, <aside>,
<noscript>, <svg>, <iframe>, and <form> subtrees entirely,
then walks the remaining DOM and collects text nodes with proper spacing.
The fetcher rotates through 8 realistic browser user-agent strings
(Chrome, Firefox, Safari, Edge across Windows/Mac/Linux) to avoid bot
detection. No headless browser, no Playwright dependency — just HTTP
requests and HTML parsing.
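A deliberately naive, string-scanning version of that extraction (the real fetcher walks a parsed DOM; this sketch assumes ASCII markup and ignores nested same-name tags):

```rust
/// Remove entire `<tag>...</tag>` subtrees for a list of noisy tags,
/// then strip the remaining markup and collapse whitespace.
fn extract_text(html: &str) -> String {
    let mut s = html.to_string();
    for tag in ["script", "style", "nav", "footer", "header", "aside",
                "noscript", "svg", "iframe", "form"] {
        let open = format!("<{tag}");
        let close = format!("</{tag}>");
        // Repeatedly cut the first open..close span for this tag.
        while let Some(start) = s.to_lowercase().find(&open) {
            match s.to_lowercase()[start..].find(&close) {
                Some(rel_end) => s.replace_range(start..start + rel_end + close.len(), " "),
                None => break, // unclosed tag: give up on this one
            }
        }
    }
    // Drop any remaining tags, keep the text nodes.
    let mut out = String::new();
    let mut in_tag = false;
    for c in s.chars() {
        match c {
            '<' => in_tag = true,
            '>' => in_tag = false,
            c if !in_tag => out.push(c),
            _ => {}
        }
    }
    out.split_whitespace().collect::<Vec<_>>().join(" ")
}

fn main() {
    let html = "<html><script>var x = 1;</script><p>Hello <b>world</b></p></html>";
    println!("{}", extract_text(html));
}
```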
## Zero configuration for agents
Both tools are registered in BASE_TOOLS, which means every agent gets
them automatically — even agents with scoped tool whitelists. The
aggregator and fetch client initialize at daemon startup with sensible
defaults:
| Setting | Default |
|---|---|
| Engines | DuckDuckGo, Wikipedia |
| Timeout | 10 seconds per engine |
| Max results | 20 |
| Cache TTL | 5 minutes |
An in-memory cache prevents redundant queries. Same search within the
TTL window returns instantly.
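The cache behavior can be sketched in a few lines of std-only Rust (the type and method names here are illustrative, not the daemon's actual API):

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Minimal in-memory TTL cache: entries older than `ttl` are treated as misses.
struct TtlCache {
    ttl: Duration,
    entries: HashMap<String, (Instant, String)>,
}

impl TtlCache {
    fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    /// A hit only while the entry is younger than the TTL.
    fn get(&self, query: &str) -> Option<&String> {
        self.entries
            .get(query)
            .filter(|(stored, _)| stored.elapsed() < self.ttl)
            .map(|(_, v)| v)
    }

    fn put(&mut self, query: &str, results: String) {
        self.entries.insert(query.to_string(), (Instant::now(), results));
    }
}

fn main() {
    // Mirror the 5-minute default from the table above.
    let mut cache = TtlCache::new(Duration::from_secs(300));
    cache.put("rust async runtime", "serialized results".to_string());
    println!("hit: {:?}", cache.get("rust async runtime").is_some());
}
```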
This follows the same design principle
behind the rest of OpenWalrus: batteries included, nothing to configure
for the common case.
## The standalone CLI
The search engine also ships as a standalone binary — wsearch — for
human use and debugging:
```bash
# Search
wsearch search "rust async runtime"
wsearch search "openwalrus" --engines wikipedia
wsearch search "hello world" -n 5 --format text

# Fetch a page
wsearch fetch "https://example.com"

# List available engines
wsearch engines
```
Configuration lives in `~/.config/wsearch/config.toml`:

```toml
engines = ["duckduckgo", "wikipedia"]
timeout_secs = 10
max_results = 20
cache_ttl_secs = 300
output_format = "json"
```
## How agents use it
The tools work like any other built-in. An agent searching for
information simply calls web_search, reviews the results, then
optionally calls web_fetch on the most relevant URLs to read full
page content:
```json
{
  "name": "web_search",
  "parameters": {
    "query": "rust error handling best practices",
    "max_results": 5
  }
}
```

```json
{
  "name": "web_fetch",
  "parameters": {
    "url": "https://doc.rust-lang.org/book/ch09-00-error-handling.html"
  }
}
```
No tool registration, no MCP server, no plugin. It just works.
In sandbox mode, agents still have
network access for search — the OS-level isolation restricts filesystem
access, not outbound HTTP.
## What's next
The meta search architecture is designed to grow. We're looking at:
- More engines — Brave Search, SearXNG instances, and domain-specific backends for code (GitHub, docs sites).
- Per-engine weighting — let users boost or suppress specific engines based on query type.
- Search-to-memory — automatically caching search results in the agent's graph memory so repeated research doesn't re-fetch the same pages.
The full tool reference is in the built-in tools docs.
Originally published at OpenWalrus.