While traditional metasearch engines are still optimizing for browser tabs, MetaSearchMCP is built for a different consumer entirely: AI agents that need search results as structured, machine-consumable data.
The Overlooked Problem
Most developers think of metasearch and immediately picture SearXNG — an excellent project, but one fundamentally designed for human eyes in a browser window. When you're building an AI agent that needs to retrieve information autonomously, SearXNG's HTML output becomes a liability. You end up writing scrapers, handling anti-bot measures, managing timeouts, and then normalizing a mess of heterogeneous results into something an LLM can actually reason about.
That's not a knock on SearXNG. It was built for humans. AI agents need something entirely different: a machine-consumable search API.
MetaSearchMCP was built for exactly this gap.
What Is It?
MetaSearchMCP is an open-source metasearch backend that exposes both an HTTP API and an MCP server. It unifies 20+ search providers — Google, DuckDuckGo, Brave, arXiv, GitHub, Stack Overflow, and more — behind a single standardized interface that returns structured JSON.
Quick stats:
- 39 Stars | Python 99.9% | MIT licensed
- 6 search categories: Web, Knowledge, Developer, Academic, Financial, and a dedicated Google chain
- Native MCP protocol support (works out of the box with Claude Desktop, Cline, and Continue)
Core Design Philosophy
1. Concurrent Multi-Provider Aggregation
The traditional approach: pick one search API and hope it stays up. MetaSearchMCP's approach: query multiple engines simultaneously and take the best of what comes back.
```
POST /search
{
  "query": "fastapi vs django performance 2025",
  "tags": ["web", "developer"],
  "max_results": 10
}
```

The `web` + `developer` tags auto-select a matching provider set — DuckDuckGo + Bing + GitHub + Stack Overflow.
A built-in deduplication engine merges results pointing to the same URL across providers, re-ranks them by relevance, and returns a uniform schema.
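The project's actual merge logic lives in its `merge.py` module; as a rough illustration of the idea only (the function names, normalization rules, and agreement-based re-ranking below are my own sketch, not the repo's code), URL-keyed deduplication across providers can look like this:

```python
from urllib.parse import urlsplit

def normalize_url(url: str) -> str:
    """Canonicalize a URL so trivially different forms compare equal."""
    parts = urlsplit(url.lower())
    host = parts.netloc.removeprefix("www.")
    path = parts.path.rstrip("/")
    return f"{host}{path}"

def merge_results(provider_batches):
    """Merge result lists from several providers, keeping the first
    (highest-ranked) hit per canonical URL and counting agreement."""
    seen = {}
    for batch in provider_batches:
        for rank, item in enumerate(batch):
            key = normalize_url(item["url"])
            if key in seen:
                seen[key]["providers"] += 1  # cross-engine agreement
            else:
                seen[key] = {**item, "providers": 1, "rank": rank}
    # Re-rank: more provider agreement first, then original rank
    return sorted(seen.values(), key=lambda r: (-r["providers"], r["rank"]))

ddg = [{"url": "https://www.example.com/fastapi/", "title": "FastAPI"}]
bing = [{"url": "https://example.com/fastapi", "title": "FastAPI"},
        {"url": "https://example.com/django", "title": "Django"}]
merged = merge_results([ddg, bing])
```

Here `www.` prefixes, trailing slashes, and case differences all collapse to one key, so the two FastAPI hits merge into a single result that outranks the Django one.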
2. Provider-Level Failure Isolation
One engine times out or throws an error? The rest keep running. Each provider gets its own timeout budget (default 10s). Partial failures are handled gracefully — you still get results from the providers that responded.
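This is the classic asyncio fan-out-with-isolation pattern. The sketch below is illustrative (the provider functions and parameter names are mine; only the per-provider timeout budget comes from the description above):

```python
import asyncio

async def query_provider(name, fetch, query, timeout):
    """Run one provider under its own timeout budget; never raise."""
    try:
        return name, await asyncio.wait_for(fetch(query), timeout)
    except Exception:  # timeout or provider error: isolate the failure
        return name, []

async def search_all(providers, query, timeout=10.0):
    """Fan out to every provider concurrently; keep whatever responds."""
    tasks = [query_provider(name, fetch, query, timeout)
             for name, fetch in providers.items()]
    outcomes = await asyncio.gather(*tasks)
    return {name: results for name, results in outcomes if results}

# Demo: one healthy provider, one that hangs past its budget
async def fast(q):
    return [f"{q} result"]

async def hung(q):
    await asyncio.sleep(60)

results = asyncio.run(
    search_all({"ddg": fast, "slow": hung}, "rag eval", timeout=0.2))
```

The hung provider burns only its own 0.2-second budget and contributes nothing, while the healthy provider's results come back untouched — exactly the partial-failure behavior described above.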
3. Agent-Friendly Payload Control
AI agents have limited context windows. MetaSearchMCP enforces max_results_per_provider caps to prevent dumping entire pages of HTML into a prompt. The output is clean, structured, and sized for LLM consumption.
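In spirit (the function and parameter names here are hypothetical, not the project's API), that kind of payload cap amounts to trimming both the result count and each snippet before anything reaches the prompt:

```python
def cap_payload(results, max_results=5, snippet_chars=200):
    """Keep the top N results and trim each snippet at a word boundary
    so the payload stays sized for an LLM context window."""
    trimmed = []
    for item in results[:max_results]:
        snippet = item.get("snippet", "")
        if len(snippet) > snippet_chars:
            snippet = snippet[:snippet_chars].rsplit(" ", 1)[0] + "…"
        trimmed.append({**item, "snippet": snippet})
    return trimmed

raw = [{"url": f"https://example.com/{i}", "snippet": "word " * 100}
       for i in range(8)]
capped = cap_payload(raw)
```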
4. MCP-First Architecture
Beyond the HTTP API, the core of the project is an MCP server that exposes tools over stdio:
| Tool | Purpose |
|---|---|
| search_web | General web search |
| search_google | Google-dedicated search chain |
| search_academic | Academic paper search |
| search_github | Code and repository search |
| compare_engines | Compare results across multiple engines |
This means you can say to Claude Desktop: "Find me recent papers on RAG evaluation" — and the search happens inline, without leaving the conversation.
Supported Search Providers
MetaSearchMCP's provider coverage is impressively broad:
Web Search: Google (direct scraping / SerpBase / Serper), DuckDuckGo, Bing, Brave, Yahoo, Yandex, Baidu, Ecosia, Qwant, Startpage, Mojeek, Mwmbl
Developer Search: GitHub, GitLab, Stack Overflow, Hacker News, Reddit, npm, PyPI, crates.io, Docker Hub, Go Packages
Academic Search: arXiv, PubMed, Semantic Scholar, CrossRef
Knowledge Bases: Wikipedia, Wikidata, Internet Archive, Open Library
Financial Data: Yahoo Finance, Alpha Vantage, Finnhub
Getting Started
Installation
```shell
git clone https://github.com/gefsikatsinelou/MetaSearchMCP
cd MetaSearchMCP
python scripts/install.py --dev --test --run
```
Run the HTTP API
```shell
python -m metasearchmcp.server
# Default: localhost:8000
```
Configure Providers
```shell
export SERPBASE_API_KEY="your_key"   # Preferred for Google search chain
export BRAVE_API_KEY="your_key"      # Fallback web search
export GITHUB_TOKEN="your_token"     # Developer search
```
The ENABLED_PROVIDERS env var gives you whitelist control over which engines are active.
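For example, to run with only a handful of engines active (the exact provider identifiers and list format are my guess — check the project's docs for the accepted names):

```shell
export ENABLED_PROVIDERS="duckduckgo,github,arxiv"
```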
MCP Integration (Claude Desktop)
Add to claude_desktop_config.json:
```json
{
  "mcpServers": {
    "metasearch": {
      "command": "python",
      "args": ["-m", "metasearchmcp.broker"]
    }
  }
}
```
Architecture at a Glance
The project is modular and lean:
| Module | Responsibility |
|---|---|
| contracts.py | Pydantic request/response models, unified schema |
| catalog.py | Provider discovery and selection (by name or semantic tag) |
| orchestrator.py | Concurrent execution, response assembly, timeout handling |
| merge.py | URL normalization and cross-engine deduplication |
| server.py | FastAPI HTTP entrypoint |
| broker.py | MCP stdio entrypoint |
No unnecessary abstraction layers. Each module does exactly one thing.
When to Use It
- AI Research Agents: Autonomously retrieve papers, code, and documentation to generate literature reviews
- RAG Pipelines: Replace a single search engine with diversified context sources
- Competitive Monitoring: Track keywords and content changes across multiple engines simultaneously
- Developer Tooling: Search Stack Overflow and GitHub inline from your IDE via MCP
Roadmap
The author's public todo list includes:
- Caching layer with provider-aware query deduplication (reduce redundant API spend)
- Cross-provider ranking signals — not just deduplication, but weighted relevance scoring
- Streaming aggregation responses (SSE push for better agent UX)
- Provider health telemetry (automatically downgrade unstable engines)
- More first-party API integrations
Bottom Line
MetaSearchMCP solves a real but rarely discussed problem: what search infrastructure should AI agents actually use?
It's not another SearXNG clone. It's a ground-up redesign — MCP-native, structured output, concurrent aggregation, failure isolation. If you're building any system that needs to retrieve information autonomously, this belongs on your evaluation shortlist.
Project: github.com/gefsikatsinelou/MetaSearchMCP
SerpBase is one of the Google search providers built into MetaSearchMCP, delivering structured JSON SERP data.