Search is the most fundamental thing an agent does. Before it writes, plans, acts, or decides — it looks something up. You'd expect this to be a solved problem.
It mostly is. But "mostly" is where agents fail.
We scored five search APIs on Rhumb's AN Score framework: 20 dimensions covering execution reliability, error quality, auth predictability, and access readiness. The spread is narrower than CRMs or databases — search APIs are simpler primitives — but the differences compound when your agent runs search loops overnight.
The Scores
| API | AN Score | Tier | Key Strength |
|---|---|---|---|
| Exa | 8.7 | L4 Native | Neural retrieval, structured output, agent-first design |
| Tavily | 8.6 | L4 Native | Purpose-built for agents, clean response schema |
| Serper | 8.0 | L4 Native | Google results, developer-friendly, predictable errors |
| Brave Search | 7.1 | L3 Ready | Independent index, privacy-first, solid but generic |
| Perplexity | 6.8 | L3 Ready | Returns synthesis, not raw results — changes the contract |
These aren't bad scores. Search APIs score higher than CRMs (HubSpot: 4.6, Salesforce: 4.8) because the primitives are simpler. But a 1.9-point gap between Exa and Perplexity is significant when it's your agent doing 200 searches per day unattended.
Exa — 8.7/10
Exa is designed around semantic search and structured retrieval. The key differentiator: it returns structured data about pages — not just URLs and snippets — which means agents don't have to scrape or parse.
What earns 8.7:
- Neural embedding queries return relevance scores alongside results
- `contents` parameter returns clean text/HTML extraction as part of the search response (no separate scraping call required)
- API keys are self-provisionable: no support contact, no form, no delay
- Rate limit headers always present (`x-ratelimit-limit`, `x-ratelimit-remaining`)
- Errors are structured JSON with a specific `error` field and meaningful messages
- SDK available in Python and TypeScript with typed responses
Where it falls short:
- Neural/semantic search can behave unpredictably on highly specific technical queries — traditional keyword search sometimes returns more reliable results for code-related lookups
- `highlights` field (extracted key passages) is powerful but not always accurate; agents should treat it as a hint, not ground truth
- Free tier is generous, but rate limits drop hard at quota: no graceful degradation, just 429s
The 3am test: If your agent runs a nightly research loop and hits Exa's rate limit, it gets a clean 429 with a `Retry-After` header. It can back off and resume. The structured `contents` field means it doesn't need a separate extraction call. This is what L4 looks like.
Tavily — 8.6/10
Tavily was explicitly built for AI agents. It's not a general-purpose search API that developers later adapted — it's search designed for the consumption patterns agents actually have.
What earns 8.6:
- Response schema includes `answer` (synthesized), `results` (structured), and `images` (extracted) in one call
- `search_depth` parameter (`basic` vs `advanced`) lets agents trade latency for result quality based on task urgency
- `include_raw_content` boolean adds full page text extraction to any search — the same Exa-style "search + extract" in one call
- Per-API-key quotas with clear overage behavior (returns 429 with documented retry behavior)
- Python SDK with async support baked in
Where it falls short:
- Underlying results rely on multiple search backends, so result ordering can shift between API versions
- No way to force a specific search engine or verify which backend was used for a given query — opacity matters when debugging agent hallucinations
- `answer` synthesis is often good but shouldn't be used as ground truth — agents should treat it as a starting point
The 3am test: Tavily's `search_depth: "advanced"` mode costs 2x the credits but returns notably better results for ambiguous queries. An agent that can't reason about which mode to use will either overspend on credits or return shallow results. The API gives you the tool — the agent needs to be smart about using it.
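What "being smart about it" looks like depends on the agent, but the shape of the decision is simple: short, ambiguous queries benefit from depth; tight deadlines and already-specific queries don't. Here's a toy heuristic, entirely our assumption and not from Tavily's docs, that picks a depth from query shape and time budget:

```python
def choose_search_depth(query: str, deadline_s: float) -> str:
    """Pick a Tavily search_depth value for a query.

    Hypothetical heuristic: pay for "advanced" only when the query is
    short/ambiguous and there is time to wait for the deeper search.
    """
    if deadline_s < 2.0:
        return "basic"  # latency-sensitive: don't pay for depth
    # Long queries, quoted phrases, and operators like site: are
    # already specific enough that basic depth usually suffices
    specific = len(query.split()) >= 6 or any(c in query for c in ('"', ':'))
    return "basic" if specific else "advanced"
```

The point isn't this particular rule; it's that the depth decision should live in agent logic, not be hardcoded to one value.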
Serper — 8.0/10
Serper wraps Google Search. That means a fresh index, familiar results, and the predictability that comes with a mature product built on top of the world's most-used search engine.
What earns 8.0:
- Returns structured JSON including `organic`, `knowledgeGraph`, `answerBox`, `relatedSearches` — agents can extract signal from rich results without scraping
- Separate endpoints for news, images, shopping, and scholar let agents target the right index for the task
- API keys are self-service: signup, verify email, get key immediately
- Error responses are consistent: HTTP status code + structured JSON error body
- Location parameter works reliably for geo-targeted queries
Where it falls short:
- Depends on Google — any Google change or scraping policy enforcement can affect results without notice
- No semantic/neural mode: purely keyword-based, which can miss conceptually relevant results that Exa finds
- Rate limit headers are present, but response time on the `scholar` endpoint is slower and less predictable than on `organic`
The 3am test: For agents that need current events, news, or high-confidence freshness, Serper consistently outperforms neural search options. The news endpoint with `tbs: qdr:d` (past-day filter) is genuinely useful for agents that need up-to-date information. The dependency on Google is a business risk more than a technical one.
Brave Search — 7.1/10
Brave runs an independent web index — not Google, not Bing. For agents with privacy requirements, regulatory constraints, or a need to avoid Google's results patterns, this matters.
What earns 7.1:
- Independent index means results genuinely differ from Google/Bing — useful for diversification or bias testing
- `extra_snippets` parameter returns additional context beyond the standard snippet
- API key provisioning is clean: signup, verify, get key
- Returns a `freshness` field with the query timestamp — useful for agents tracking information currency
Where it falls short:
- Index coverage is smaller than Google/Bing — some niche technical queries return sparse results
- Error responses lack specificity: 400 errors don't always explain what was malformed
- No semantic search mode — purely traditional search
- Rate limit documentation is less precise than Exa/Tavily; agents need more defensive retry logic
The 3am test: Brave is a good secondary search source for agents that want to cross-validate results or have compliance requirements against Google. As a primary search source for general research agents, the index coverage gap is real. The 7.1 reflects solid fundamentals with execution gaps on error specificity.
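The cross-validation role is cheap to implement: compare what your primary source returned against what a second index surfaces for the same query. A sketch that measures domain-level overlap (the threshold for "thin coverage" is whatever your agent decides; the comparison logic here is our own, not a Brave feature):

```python
from urllib.parse import urlparse


def overlap_ratio(primary: list[str], secondary: list[str]) -> float:
    """Fraction of the primary source's result domains that a
    secondary source (e.g. Brave) also surfaced. Low overlap can flag
    niche queries where an independent index has thin coverage, or
    where the primary source's ranking is skewed.
    """
    def domains(urls: list[str]) -> set[str]:
        return {urlparse(u).netloc.removeprefix("www.") for u in urls}

    p, s = domains(primary), domains(secondary)
    if not p:
        return 0.0
    return len(p & s) / len(p)
```

A research agent might only escalate to a human, or re-query with different terms, when the ratio drops below some floor it has learned to trust.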
Perplexity — 6.8/10
Perplexity is the most interesting entry in this comparison because it changes the fundamental contract. Other search APIs return results — documents, URLs, snippets. Perplexity returns a synthesized answer with citations.
This is powerful for some agent patterns and problematic for others.
What earns 6.8:
- Synthesis model is genuinely good — for "explain this topic" queries, the answer quality is high
- Citations are structured and machine-readable
- `search_recency_filter` controls how fresh the underlying sources are
- `model` parameter lets agents select speed vs quality tradeoffs
Where it falls short:
- Agents can't verify the synthesis: you get an answer, not raw results to reason over
- Synthesis can hallucinate — the answer looks authoritative but may contain errors the agent can't detect
- Rate limits are lower than Serper/Exa at equivalent price points
- Not suitable for agents that need raw web data to make their own inferences — the synthesis happens inside Perplexity, not inside your agent
- Error messages on malformed requests can be opaque
The 3am test: If your agent is a research assistant that writes summaries, Perplexity's synthesis is useful input. If your agent needs to reason over raw data, extract specific entities, or verify claims — Perplexity is the wrong tool. The 6.8 isn't a bad score; it reflects a tool that does a different job than the others.
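One way to make the synthesis contract slightly safer is a sanity check: verify that every citation the answer references actually resolves to a returned source. The sketch below assumes a bracketed `[n]` citation convention in the answer text — that convention is our assumption, so check it against the response format you actually receive:

```python
import re


def unresolved_citations(answer: str, citations: list[str]) -> list[int]:
    """Return citation indices referenced in the answer (as [n]) that
    have no matching entry in the 1-indexed citations list.

    A non-empty result means the synthesis cites sources it didn't
    return — a signal to distrust the answer rather than pass it on.
    """
    refs = {int(m) for m in re.findall(r"\[(\d+)\]", answer)}
    return sorted(n for n in refs if not (1 <= n <= len(citations)))
```

This doesn't catch hallucinated content backed by a real citation — nothing short of fetching and reading the source does — but it catches the cheapest failure mode for free.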
Which to Use When
Semantic/conceptual research: Exa (8.7). Neural embeddings find related content that keyword search misses. Best for research agents that explore topic clusters.
Agent-first search with synthesis: Tavily (8.6). Purpose-built for the patterns agents actually use. The `search_depth` parameter is a genuine quality lever.
Current events and Google-familiar results: Serper (8.0). Best freshness, most familiar results patterns, strong structured output from rich results.
Privacy/compliance or Google alternatives: Brave (7.1). Independent index, solid fundamentals, coverage gap on niche queries.
Synthesis over raw retrieval: Perplexity (6.8). Use when you want a synthesized starting point, not when you need raw data.
The Pattern
Search APIs cluster higher than most categories (all five score ≥ 6.8) because search is a relatively simple primitive: query in, results out. The design surface is smaller than CRMs or databases.
But the failure modes are subtle:
- Result instability — what ranked #1 yesterday may not today; agents that cache results need freshness awareness
- Rate limit handling — all five have limits; only the top three communicate them clearly enough for agent backoff
- Synthesis vs retrieval mismatch — Perplexity's 6.8 is partly about using synthesis when your agent needs raw retrieval
- Index coverage assumptions — agents trained on Google-scale expectations will underperform with smaller indexes
For most agent builders, Exa or Tavily is the right default. Both are L4 Native, both score above 8.5, and both return structured data that reduces downstream processing. Exa wins on semantic retrieval; Tavily wins on agent-specific ergonomics.
Rhumb scores 645+ APIs on 20 execution dimensions — execution reliability, error quality, auth predictability, rate limit transparency, and more. All scores at rhumb.dev.
Also in this series: LLM APIs for AI Agents · Payment APIs · Database APIs · CRM APIs