While traditional metasearch engines are still optimizing for browser tabs, MetaSearchMCP is built for a different consumer entirely: AI agents that need search results as structured, machine-consumable data.
The Overlooked Problem
Most developers think of metasearch and immediately picture SearXNG — an excellent project, but one fundamentally designed for human eyes in a browser window. When you're building an AI agent that needs to retrieve information autonomously, SearXNG's HTML output becomes a liability. You end up writing scrapers, handling anti-bot measures, managing timeouts, and then normalizing a mess of heterogeneous results into something an LLM can actually reason about.
That's not a knock on SearXNG. It was built for humans. AI agents need something entirely different: a machine-consumable search API.
MetaSearchMCP was built for exactly this gap.
What Is It?
MetaSearchMCP is an open-source metasearch backend that exposes both an HTTP API and an MCP server. It unifies 20+ search providers — Google, DuckDuckGo, Brave, arXiv, GitHub, Stack Overflow, and more — behind a single standardized interface that returns structured JSON.
Quick stats:
- 39 Stars | Python 99.9% | MIT licensed
- 6 search categories: Web, Knowledge, Developer, Academic, Financial, and a dedicated Google chain
- Native MCP protocol support (works out of the box with Claude Desktop, Cline, and Continue)
Core Design Philosophy
1. Concurrent Multi-Provider Aggregation
The traditional approach: pick one search API and hope it stays up. MetaSearchMCP's approach: query multiple engines simultaneously and take the best of what comes back.
```
POST /search
{
  "query": "fastapi vs django performance 2025",
  "tags": ["web", "developer"],
  "max_results": 10
}
```

The `web` + `developer` tags auto-select a matching provider set — DuckDuckGo + Bing + GitHub + Stack Overflow.
A built-in deduplication engine merges results pointing to the same URL across providers, re-ranks them by relevance, and returns a uniform schema.
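The project's actual merge logic lives in its `merge.py` module; as a rough illustration of the idea only (the function names, normalization rules, and agreement-based re-ranking below are my own sketch, not the repo's code), URL-keyed deduplication across providers can look like this:

```python
from urllib.parse import urlsplit

def normalize_url(url: str) -> str:
    """Canonicalize a URL so trivially different forms compare equal."""
    parts = urlsplit(url.lower())
    host = parts.netloc.removeprefix("www.")
    path = parts.path.rstrip("/")
    return f"{host}{path}"

def merge_results(provider_batches):
    """Merge result lists from several providers, keeping the first
    (highest-ranked) hit per canonical URL and counting agreement."""
    seen = {}
    for batch in provider_batches:
        for rank, item in enumerate(batch):
            key = normalize_url(item["url"])
            if key in seen:
                seen[key]["providers"] += 1  # cross-engine agreement
            else:
                seen[key] = {**item, "providers": 1, "rank": rank}
    # Re-rank: more provider agreement first, then original rank
    return sorted(seen.values(), key=lambda r: (-r["providers"], r["rank"]))

ddg = [{"url": "https://www.example.com/fastapi/", "title": "FastAPI"}]
bing = [{"url": "https://example.com/fastapi", "title": "FastAPI"},
        {"url": "https://example.com/django", "title": "Django"}]
merged = merge_results([ddg, bing])
```

Here `www.` prefixes, trailing slashes, and case differences all collapse to one key, so the two FastAPI hits merge into a single result that outranks the Django one.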
2. Provider-Level Failure Isolation
One engine times out or throws an error? The rest keep running. Each provider gets its own timeout budget (default 10s). Partial failures are handled gracefully — you still get results from the providers that responded.
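This is the classic asyncio fan-out-with-isolation pattern. The sketch below is illustrative (the provider functions and parameter names are mine; only the per-provider timeout budget comes from the description above):

```python
import asyncio

async def query_provider(name, fetch, query, timeout):
    """Run one provider under its own timeout budget; never raise."""
    try:
        return name, await asyncio.wait_for(fetch(query), timeout)
    except Exception:  # timeout or provider error: isolate the failure
        return name, []

async def search_all(providers, query, timeout=10.0):
    """Fan out to every provider concurrently; keep whatever responds."""
    tasks = [query_provider(name, fetch, query, timeout)
             for name, fetch in providers.items()]
    outcomes = await asyncio.gather(*tasks)
    return {name: results for name, results in outcomes if results}

# Demo: one healthy provider, one that hangs past its budget
async def fast(q):
    return [f"{q} result"]

async def hung(q):
    await asyncio.sleep(60)

results = asyncio.run(
    search_all({"ddg": fast, "slow": hung}, "rag eval", timeout=0.2))
```

The hung provider burns only its own 0.2-second budget and contributes nothing, while the healthy provider's results come back untouched — exactly the partial-failure behavior described above.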
3. Agent-Friendly Payload Control
AI agents have limited context windows. MetaSearchMCP enforces max_results_per_provider caps to prevent dumping entire pages of HTML into a prompt. The output is clean, structured, and sized for LLM consumption.
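In spirit (the function and parameter names here are hypothetical, not the project's API), that kind of payload cap amounts to trimming both the result count and each snippet before anything reaches the prompt:

```python
def cap_payload(results, max_results=5, snippet_chars=200):
    """Keep the top N results and trim each snippet at a word boundary
    so the payload stays sized for an LLM context window."""
    trimmed = []
    for item in results[:max_results]:
        snippet = item.get("snippet", "")
        if len(snippet) > snippet_chars:
            snippet = snippet[:snippet_chars].rsplit(" ", 1)[0] + "…"
        trimmed.append({**item, "snippet": snippet})
    return trimmed

raw = [{"url": f"https://example.com/{i}", "snippet": "word " * 100}
       for i in range(8)]
capped = cap_payload(raw)
```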
4. MCP-First Architecture
Beyond the HTTP API, the core of the project is an MCP server that exposes tools over stdio:
| Tool | Purpose |
|---|---|
| search_web | General web search |
| search_google | Google-dedicated search chain |
| search_academic | Academic paper search |
| search_github | Code and repository search |
| compare_engines | Compare results across multiple engines |
This means you can say to Claude Desktop: "Find me recent papers on RAG evaluation" — and the search happens inline, without leaving the conversation.
Supported Search Providers
MetaSearchMCP's provider coverage is impressively broad:
Web Search: Google (direct scraping / SerpBase / Serper), DuckDuckGo, Bing, Brave, Yahoo, Yandex, Baidu, Ecosia, Qwant, Startpage, Mojeek, Mwmbl
Developer Search: GitHub, GitLab, Stack Overflow, Hacker News, Reddit, npm, PyPI, crates.io, Docker Hub, Go Packages
Academic Search: arXiv, PubMed, Semantic Scholar, CrossRef
Knowledge Bases: Wikipedia, Wikidata, Internet Archive, Open Library
Financial Data: Yahoo Finance, Alpha Vantage, Finnhub
Getting Started
Installation
```shell
git clone https://github.com/gefsikatsinelou/MetaSearchMCP
cd MetaSearchMCP
python scripts/install.py --dev --test --run
```
Run the HTTP API
```shell
python -m metasearchmcp.server
# Default: localhost:8000
```
Configure Providers
```shell
export SERPBASE_API_KEY="your_key"   # Preferred for Google search chain
export BRAVE_API_KEY="your_key"      # Fallback web search
export GITHUB_TOKEN="your_token"     # Developer search
```
The ENABLED_PROVIDERS env var gives you whitelist control over which engines are active.
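For example, to run with only a handful of engines active (the exact provider identifiers and list format are my guess — check the project's docs for the accepted names):

```shell
export ENABLED_PROVIDERS="duckduckgo,github,arxiv"
```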
MCP Integration (Claude Desktop)
Add to claude_desktop_config.json:
```json
{
  "mcpServers": {
    "metasearch": {
      "command": "python",
      "args": ["-m", "metasearchmcp.broker"]
    }
  }
}
```
Architecture at a Glance
The project is modular and lean:
| Module | Responsibility |
|---|---|
| contracts.py | Pydantic request/response models, unified schema |
| catalog.py | Provider discovery and selection (by name or semantic tag) |
| orchestrator.py | Concurrent execution, response assembly, timeout handling |
| merge.py | URL normalization and cross-engine deduplication |
| server.py | FastAPI HTTP entrypoint |
| broker.py | MCP stdio entrypoint |
No unnecessary abstraction layers. Each module does exactly one thing.
When to Use It
- AI Research Agents: Autonomously retrieve papers, code, and documentation to generate literature reviews
- RAG Pipelines: Replace a single search engine with diversified context sources
- Competitive Monitoring: Track keywords and content changes across multiple engines simultaneously
- Developer Tooling: Search Stack Overflow and GitHub inline from your IDE via MCP
Roadmap
The author's public todo list includes:
- Caching layer with provider-aware query deduplication (reduce redundant API spend)
- Cross-provider ranking signals — not just deduplication, but weighted relevance scoring
- Streaming aggregation responses (SSE push for better agent UX)
- Provider health telemetry (automatically downgrade unstable engines)
- More first-party API integrations
Bottom Line
MetaSearchMCP solves a real but rarely discussed problem: what search infrastructure should AI agents actually use?
It's not another SearXNG clone. It's a ground-up redesign — MCP-native, structured output, concurrent aggregation, failure isolation. If you're building any system that needs to retrieve information autonomously, this belongs on your evaluation shortlist.
Project: github.com/gefsikatsinelou/MetaSearchMCP
SerpBase is one of the Google search providers built into MetaSearchMCP, delivering structured JSON SERP data.