I was setting up web search for LibreChat — a self-hosted chat interface for AI models. The config has three required components:
- Search provider: Serper (paid API) or SearXNG (self-hosted)
- Scraper: Firecrawl — hosted API, costs money
- Reranker: Jina AI or Cohere — both paid APIs (a reranker re-scores search results by relevance to the query, rather than trusting whatever order the search engine returned)
I run a homelab specifically so I don't depend on paid APIs for things I can own. The search provider was easy — SearXNG is self-hosted and free. But the scraper and reranker had no obvious self-hosted path.
I asked Claude if there was a free way to run Firecrawl. It found firecrawl-simple, a lightweight local deployment of the same tool. Perfect.
For the reranker, I asked Claude to explain what Jina and Cohere were actually doing. When I said I didn't want to call another external API, it offered to just build one — a small FlashRank wrapper exposing a Jina-compatible /v1/rerank endpoint. That became the reranker that's been running in my stack ever since.
That was the seed. What I have now is searxng-mcp — a full private web search pipeline packaged as an MCP server. MCP (Model Context Protocol) is how AI clients like Claude Code connect to external tools; the server exposes web search as a set of callable tools that agents can use during a session. It's used by Claude Code agents and LibreChat every day.
## What it does
searxng-mcp exposes five tools over stdio MCP:
| Tool | What it does |
|---|---|
| `search` | Query SearXNG, rerank results with a local ML model, return top N |
| `search_and_fetch` | Same as `search`, then fetch full page content for the top 1–3 results |
| `search_and_summarize` | Search, fetch, then synthesize a structured summary via Ollama |
| `fetch_url` | Fetch and extract readable markdown from any URL |
| `clear_cache` | Purge the search or fetch cache when you need fresh results |
The design principle throughout: every external component is optional, and the server degrades gracefully when any of them are unavailable. If the reranker is down, you get results in SearXNG's native order. If Ollama isn't running, search_and_summarize falls back to raw fetched content. Nothing hard-fails.
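The pattern behind that principle is easy to sketch. This is not the server's actual code, just an illustration of the idea; `withFallback` and `rankResults` are made-up names:

```typescript
// Illustrative sketch of the degradation pattern (not the server's actual
// internals): try an optional component, and on any failure return a
// usable fallback instead of throwing.
async function withFallback<T>(primary: () => Promise<T>, fallback: T): Promise<T> {
  try {
    return await primary();
  } catch {
    return fallback; // component down or unconfigured: degrade, don't hard-fail
  }
}

// Example: if the reranker is unreachable, keep SearXNG's native order.
async function rankResults(
  results: string[],
  callReranker: (r: string[]) => Promise<string[]>,
): Promise<string[]> {
  return withFallback(() => callReranker(results), results);
}
```

The same wrapper shape works for the summarizer and the cache: the fallback value is just "whatever you already had."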
## The architecture
```
MCP client (stdio)
      │
      ▼
searxng-mcp ──────────────→ Valkey → result cache (search 1h, fetch 24h)
      │
      ├── expand (optional) → Ollama (qwen3:4b)  → rewritten query
      ├── search ───────────→ SearXNG            → raw results
      ├── rerank ───────────→ Reranker           → ranked results
      │                       (fallback: SearXNG order)
      ├── fetch content ────┬→ GitHub API  → markdown
      │                     ├→ Firecrawl   → page markdown (tier 1)
      │                     ├→ Crawl4AI    → page markdown (tier 2)
      │                     └→ Raw HTTP    → page text (tier 3)
      └── summarize (opt.)  → Ollama (qwen3:14b) → synthesized summary
```
The interesting part is the fetch cascade.
## The fetch cascade
Fetching web content for an AI agent turns out to be harder than it sounds. Firecrawl handles the majority of pages well — it renders JavaScript, extracts clean markdown, deals with most anti-bot measures. But some pages block it anyway. When that happens, Firecrawl returns success: true with empty content rather than throwing an error. That's a soft failure, not a hard one, and it took me a while to catch it.
The cascade handles this: if Firecrawl returns empty content, fall through to Crawl4AI, which uses a different extraction approach and handles JS-heavy pages differently. If Crawl4AI also fails or isn't configured, fall through to raw HTTP — just fetch the page and strip the HTML. Not perfect, but something.
Three tiers, each cheaper than the last, each a fallback for the one above it.
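In code form, the cascade is just a loop over tiers where empty content counts as a failure. A minimal sketch, assuming illustrative names (`Fetcher`, `fetchCascade`) rather than the server's actual internals:

```typescript
// Each tier takes a URL and returns markdown/text, or null/empty on failure.
type Fetcher = (url: string) => Promise<string | null>;

// Walk the tiers in order. The key detail from the post: a tier can report
// "success" with EMPTY content (Firecrawl's soft failure), so each tier is
// judged by what it actually returned, not by whether it threw.
async function fetchCascade(url: string, tiers: Fetcher[]): Promise<string | null> {
  for (const tier of tiers) {
    try {
      const content = await tier(url);
      if (content && content.trim().length > 0) return content; // real content
      // empty/whitespace content: soft failure, fall through to the next tier
    } catch {
      // hard failure (network error, blocked): also fall through
    }
  }
  return null; // every tier failed
}
```

Wiring in Firecrawl, Crawl4AI, and raw HTTP is then just a matter of ordering the `tiers` array, with unconfigured tiers omitted.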
```mermaid
flowchart TD
    A([URL]) --> B{github.com?}
    B -->|yes, repo root| C[GitHub API\nfetch README]
    B -->|yes, file blob| D[raw.githubusercontent.com]
    B -->|no| E[Firecrawl]
    E --> F{content\nreturned?}
    F -->|yes| G([page markdown])
    F -->|empty or blocked| H{Crawl4AI\nconfigured?}
    H -->|yes| I[Crawl4AI]
    I --> J{content\nreturned?}
    J -->|yes| K([page markdown])
    J -->|no| L[Raw HTTP fetch\nno redirects]
    H -->|no| L
    C --> M([page markdown])
    D --> N([raw file content])
    L --> O([page text])
```
GitHub URLs are handled natively outside the cascade entirely — repo roots fetch the README via the GitHub API, file blobs fetch from raw.githubusercontent.com. No Firecrawl needed.
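The routing logic for that special case is simple to sketch. The exact path handling here is an assumption about the implementation, but the two destinations (the GitHub REST API's README endpoint and `raw.githubusercontent.com`) are the ones the post names:

```typescript
// Map a GitHub URL to a direct fetch target, or null if it should go
// through the normal Firecrawl cascade instead.
function routeGithubUrl(url: string): string | null {
  const u = new URL(url);
  if (u.hostname !== "github.com") return null;
  const parts = u.pathname.split("/").filter(Boolean);
  if (parts.length === 2) {
    // repo root: fetch the README via the REST API
    return `https://api.github.com/repos/${parts[0]}/${parts[1]}/readme`;
  }
  if (parts.length >= 5 && parts[2] === "blob") {
    // file blob: fetch the raw file content directly
    const [owner, repo, , ref, ...path] = parts;
    return `https://raw.githubusercontent.com/${owner}/${repo}/${ref}/${path.join("/")}`;
  }
  return null; // issues, PRs, etc. go through the cascade
}
```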
## Reranking and why it matters
SearXNG aggregates results from multiple search engines. The order it returns them in is... whatever order the upstream engines agreed on. That's fine for casual browsing, not great for an AI agent that's going to fetch and read the top result.
The key mechanism: for every search, the server fetches 3x the requested results from SearXNG (capped at 20) to give the reranker a larger candidate pool. The reranker then re-scores all of them using a cross-encoder ML model that understands the relationship between the query and each result, and returns only the top N. A result that matches your query semantically surfaces above a result that just happens to rank well with Google — but only because it had more candidates to sort through in the first place.
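A sketch of that oversampling step. The 3× pool capped at 20 and the Jina-compatible `/v1/rerank` endpoint come from the post (the port matches the `RERANKER_URL` in the config example below); the exact request/response field names follow Jina's rerank API and are an assumption about the wrapper:

```typescript
// Oversample: request 3x the desired results from SearXNG, capped at 20,
// so the reranker has a larger candidate pool to choose from.
const candidatePoolSize = (requested: number): number => Math.min(requested * 3, 20);

interface RerankHit {
  index: number;           // position in the submitted documents array
  relevance_score: number; // cross-encoder score for (query, document)
}

// Call the Jina-compatible local reranker and return the indices of the
// top-N documents, best first.
async function rerank(query: string, documents: string[], topN: number): Promise<number[]> {
  const res = await fetch("http://localhost:8787/v1/rerank", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ query, documents, top_n: topN }),
  });
  const { results } = (await res.json()) as { results: RerankHit[] };
  return results.map((r) => r.index);
}
```

So asking for 5 results actually pulls 15 candidates from SearXNG, and the reranker picks the best 5 of those.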
In v3.2.0, I added recency weighting — a small exponential decay score based on publishedDate blended with the relevance score (weight 0.15 by default). Fresh results surface within relevance-close clusters without overriding large relevance gaps. It's skipped automatically when you've set a time_range, since the result pool is already date-filtered.
## Domain profiles
Not all search results are equally useful depending on what you're looking for. If I'm researching a Docker networking issue, I want results from the Docker docs, GitHub issues, and Linux sysadmin communities — not marketing pages that happen to mention Docker.
Domain profiles let you apply a named boost/block list per query:
- `homelab` — surfaces self-hosted and Linux documentation, suppresses content farms
- `dev` — surfaces Stack Overflow, MDN, npm docs

You pass `domain_profile: "homelab"` on any query and the domain filter applies. Profiles are defined in `domains.json`, which hot-reloads every 5 seconds — you can tune them without restarting the server.
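One way a profile's boost/block lists could be applied to a result set, as a sketch. The profile shape and the boost multiplier are assumptions; only the boost/block idea itself comes from the post:

```typescript
interface Profile {
  boost: string[]; // domain suffixes to promote
  block: string[]; // domain suffixes to drop entirely
}

// Drop blocked domains, multiply the score of boosted ones, re-sort.
// The 1.5x factor is an illustrative assumption, not the server's value.
function applyProfile<T extends { url: string; score: number }>(
  results: T[],
  profile: Profile,
  boostFactor = 1.5,
): T[] {
  return results
    .filter((r) => !profile.block.some((d) => new URL(r.url).hostname.endsWith(d)))
    .map((r) =>
      profile.boost.some((d) => new URL(r.url).hostname.endsWith(d))
        ? { ...r, score: r.score * boostFactor }
        : r,
    )
    .sort((a, b) => b.score - a.score);
}
```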
## Query expansion
For search and search_and_fetch, there's an optional expand parameter. When true, Ollama (qwen3:4b) generates 2-3 typed query variants — a technical rephrasing, a product/version-specific form, and a community phrasing (how someone would ask it on a forum). Those variants run in parallel with the original query, and the result pools are merged and deduplicated by URL before reranking.
It's not a serial rewrite — it's a parallel fan-out. If your first phrasing misses relevant results that a slightly different framing would surface, expansion catches them. Most useful for research queries; for precise lookups it adds latency (~3s) with less benefit.
You can also set `EXPAND_QUERIES=true` to enable it globally.
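The fan-out/merge step can be sketched in a few lines. `searchOne` here is a stand-in for the SearXNG call; the parallel execution and the dedupe-by-URL behavior are what the post describes:

```typescript
interface Result {
  url: string;
  title: string;
}

// Run the original query plus its expanded variants in parallel, then
// merge the pools, keeping the first occurrence of each URL.
async function fanOutSearch(
  queries: string[], // original + 2-3 Ollama-generated variants
  searchOne: (q: string) => Promise<Result[]>,
): Promise<Result[]> {
  const pools = await Promise.all(queries.map(searchOne)); // parallel fan-out, not serial
  const seen = new Set<string>();
  const merged: Result[] = [];
  for (const r of pools.flat()) {
    if (!seen.has(r.url)) { // dedupe by URL
      seen.add(r.url);
      merged.push(r);
    }
  }
  return merged; // this merged pool is what gets reranked
}
```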
## SSRF protections
This server runs on your local network and fetches arbitrary URLs. That creates SSRF risk — an attacker (or a confused agent) could potentially get it to fetch http://192.168.1.1/admin or http://localhost:2375 (Docker socket exposed, in the worst case).
fetch_url and search_and_fetch enforce a URL allowlist that blocks private IP ranges: 10.x, 192.168.x, 172.16–31.x, localhost, 127.x, IPv6 private ranges (::1, fc00::/7, fe80::/10), and non-HTTP protocols.
The IPv6 case caught me during a security pass — URL.hostname returns brackets for IPv6 addresses (e.g., [::1]), so naive regex matching against ::1 doesn't work. The fixed version matches the bracket-wrapped form.
There's also redirect blocking in the raw HTTP fetcher — rawFetch() refuses to follow redirects, preventing SSRF bypass via redirect chains to internal addresses. And Crawl4AI task_id values are validated before being interpolated into the poll URL to prevent path traversal.
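A sketch of a private-address check covering the ranges listed above, including the bracket detail. This is an illustration of the technique, not the server's actual code:

```typescript
// Return true if the URL must NOT be fetched (private/loopback address or
// non-HTTP scheme). Note: URL.hostname keeps the brackets for IPv6
// addresses ("[::1]"), so they have to be stripped before matching.
function isPrivateHost(rawUrl: string): boolean {
  const u = new URL(rawUrl);
  if (u.protocol !== "http:" && u.protocol !== "https:") return true; // block non-HTTP schemes
  const host = u.hostname.replace(/^\[|\]$/g, "").toLowerCase(); // strip IPv6 brackets
  if (host === "localhost") return true;
  if (host.includes(":")) {
    // IPv6: loopback ::1, unique-local fc00::/7, link-local fe80::/10
    return host === "::1" || /^f[cd]/.test(host) || /^fe[89ab]/.test(host);
  }
  // IPv4 private and loopback ranges
  if (/^127\./.test(host) || /^10\./.test(host) || /^192\.168\./.test(host)) return true;
  if (/^172\.(1[6-9]|2\d|3[01])\./.test(host)) return true; // 172.16.x - 172.31.x
  return false;
}
```

The `host.includes(":")` guard matters: without it, an ordinary domain that happens to start with `fc` or `fd` would be misclassified as an IPv6 unique-local address.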
## Caching
Valkey (Redis-compatible) is optional but worthwhile. Search results are cached for 1 hour, fetched pages for 24 hours. For the kind of research queries AI agents run — often the same topic from slightly different angles over a session — this saves meaningful latency and avoids hammering SearXNG and Firecrawl with redundant requests.
The clear_cache tool lets you purge when you need fresh results on a fast-moving topic.
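The cache-aside logic with the TTLs from the post (search 1h, fetch 24h) looks roughly like this. A `Map` with manual expiry stands in for a Valkey client to keep the sketch self-contained:

```typescript
// TTLs from the post, in seconds.
const TTL_SEARCH = 3600;  // search results: 1 hour
const TTL_FETCH = 86400;  // fetched pages: 24 hours

type Entry = { value: string; expiresAt: number };
const store = new Map<string, Entry>(); // stand-in for Valkey

// Cache-aside read: miss on absent OR expired entries.
function cacheGet(key: string, now: number = Date.now()): string | null {
  const e = store.get(key);
  if (!e || e.expiresAt <= now) return null;
  return e.value;
}

// Write with a TTL; Valkey would do this server-side via SET ... EX.
function cacheSet(key: string, value: string, ttlSeconds: number, now: number = Date.now()): void {
  store.set(key, { value, expiresAt: now + ttlSeconds * 1000 });
}
```

A `clear_cache` call then reduces to deleting the keys under the search or fetch prefix.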
## MCP client setup
For Claude Code, the recommended setup uses `claude mcp add-json`:

```bash
claude mcp add-json searxng --scope user '{
  "command": "node",
  "args": ["/path/to/searxng-mcp/build/src/index.js"],
  "env": {
    "SEARXNG_URL": "http://localhost:8081",
    "FIRECRAWL_URL": "http://localhost:3002",
    "RERANKER_URL": "http://localhost:8787",
    "OLLAMA_URL": "http://localhost:11434",
    "VALKEY_URL": "redis://localhost:6379",
    "CRAWL4AI_URL": "http://localhost:11235"
  }
}'
```
This writes to `~/.claude.json`. Don't add it to `~/.claude/settings.json` — that file isn't used for MCP env var injection in Claude Code.
For LibreChat, add it to `librechat.yaml` under `mcpServers` with `type: stdio`.
## What runs in practice
The full required stack:
- SearXNG — must have JSON format enabled in `settings.yml`
- Firecrawl (firecrawl-simple) — local deployment, no API key needed for local instances
- Reranker — FlashRank wrapper, reference implementation in homelab-agent/docker/reranker

Optional:

- Valkey — caching
- Crawl4AI — second-tier fetch fallback
- Ollama — query expansion and summarization (requires `qwen3:4b` and/or `qwen3:14b`)
The server starts fine without the optional components and tells you clearly when a feature isn't available because its dependency isn't configured.
## The repo
github.com/TadMSTR/searxng-mcp
MIT licensed. The full homelab stack it runs in — including the reranker Docker image — is documented in homelab-agent.
If you're already running SearXNG, the jump to a full agent-ready search pipeline is smaller than it looks.