toolfreebie

Posted on May 28 • Originally published at toolfreebie.com

Tavily vs Brave vs Exa: Free Search APIs for AI Agents

#ai #api #opensource

Every AI Agent Needs a Search Tool — Here Are the Three Free Ones That Actually Work

If you build AI agents in 2026, your model is brilliant at reasoning over text and useless at knowing what happened yesterday. The fix is the same one every framework — CrewAI, LangGraph, AutoGPT, Aider, every MCP server with a “web search” tool — eventually arrives at: hand the agent a search API.

Google’s official Search API is closed off behind enterprise contracts. Bing’s search API is being retired. SerpAPI starts at $75/month. For developers prototyping or running personal agents, the practical free options have narrowed down to three serious providers, each with a different definition of “free”:

Tavily — purpose-built for LLM retrieval, 1,000 free API credits per month, no credit card
Brave Search API — independent web index, 2,000 free queries per month at 1 query/second, no credit card
Exa — neural search engine designed for AI, $10 of free signup credit (≈1,000 searches), email signup only

All three are production-ready, all three publish OpenAPI specs you can paste into a tool definition, and all three plug straight into the agent frameworks you are already using. This guide breaks down what each free tier really gets you, which one to pair with which framework, and the corner cases where you should reach for a different tool entirely.

Quick Comparison: Tavily vs Brave vs Exa Free Tiers

Feature	Tavily (Free)	Brave Search API (Free)	Exa (Free)
Free quota	1,000 API credits/month	2,000 queries/month	$10 signup credit (~1,000 searches)
Rate limit	~10 requests/sec, no daily cap	1 query/sec	~5 requests/sec
Credit card needed	No	No	No
Free tier resets	Monthly	Monthly	One-time credit
Index type	Aggregated (Bing/others) + own crawl	Independent crawl, own index	Embedding-based neural index
Optimized for	LLM RAG / agent retrieval	Traditional web search	Semantic / similarity search
Content extraction	Yes — built-in `include_raw_content`	Snippet only (paid tier adds extraction)	Yes — built-in `contents.text`
News endpoint	Yes (`topic="news"`)	Yes (dedicated news API)	Yes (via `type="neural"` + filters)
Domain include/exclude	Yes	Limited (goggles)	Yes
Best for	RAG agents that need clean text	Cheap, high-volume search at scale	Finding similar pages / research

The short version: Tavily is what you reach for when an LLM is going to read the result; Brave is what you reach for when you want a lot of independent search results cheaply; Exa is what you reach for when you want results that share semantic meaning rather than just keywords.

What Is Tavily?

Tavily is a search API built specifically for LLMs and AI agents. Founded in 2023 and now used by tens of thousands of developers, it has become the default search tool in LangChain, the recommended tool in the CrewAI documentation, and the example most MCP search servers ship with.

The pitch is straightforward: a normal search API gives you ten blue links and snippets. An agent then has to spend additional turns visiting each URL, parsing HTML, stripping ads and navigation, and producing usable text. Tavily collapses that entire pipeline into a single API call — you send a query, you get back ranked URLs plus a clean, model-ready text answer extracted from the top results, with optional raw content of each page.

For agents, this matters in two practical ways. First, it cuts token usage: instead of feeding 10 noisy HTML pages into your context window, you feed one cleaned summary plus three extracted snippets. Second, it cuts latency: one HTTP call instead of one search call plus ten fetch calls.

Tavily Free Tier: What You Actually Get

The free tier is generous for prototyping and personal agents:

1,000 API credits per month, refreshed at the start of each calendar month
1 credit = 1 basic search; search_depth="advanced" costs 2 credits per call
No credit card required — sign up with email or GitHub and your key is live immediately
Full API access — every endpoint and parameter that paid users get
~10 requests per second rate limit (not officially published, but consistent in practice)

For a hobby agent doing 30 searches per day, you will not hit the limit. For a production app, the next paid tier (Researcher) is $30/month for 4,000 credits, with usage-based billing on top.
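The credit arithmetic is easy to sanity-check before you wire anything up. Here is a minimal sketch, assuming the 1-credit basic and 2-credit advanced costs quoted above and a 30-day month; the function name and parameters are illustrative and not part of the Tavily SDK:

```python
def monthly_credits(basic_per_day: int, advanced_per_day: int = 0, days: int = 30) -> int:
    """Estimate monthly Tavily credit use from the per-call costs quoted above."""
    basic_cost = 1      # credits per search_depth="basic" call
    advanced_cost = 2   # credits per search_depth="advanced" call
    return days * (basic_per_day * basic_cost + advanced_per_day * advanced_cost)


FREE_CREDITS = 1_000

hobby = monthly_credits(basic_per_day=30)
print(hobby, hobby <= FREE_CREDITS)    # 900 True

mixed = monthly_credits(basic_per_day=20, advanced_per_day=10)
print(mixed, mixed <= FREE_CREDITS)    # 1200 False
```

The second case shows how quickly a handful of advanced calls per day pushes a hobby agent past the free allowance.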

One thing to know: Tavily does not run its own crawler at the scale of Google. It aggregates from upstream providers (Bing API is a major one) plus a curated crawl of high-quality sources, then re-ranks the combined results for relevance to your specific LLM query. The ranking quality is the real product, not the raw index size.

Getting Started with Tavily

1. Get Your Free API Key

Go to tavily.com and click Get API Key
Sign in with GitHub or email — no credit card form appears
Copy the key from your dashboard (it starts with tvly-)

2. Call the API from Python

pip install tavily-python

from tavily import TavilyClient

client = TavilyClient(api_key="tvly-YOUR_KEY")

response = client.search(
    query="What were the major Claude 4.7 release notes?",
    search_depth="basic",         # "advanced" gives deeper crawl, costs 2 credits
    max_results=5,
    include_answer=True,          # LLM-generated summary of top results
    include_raw_content=False,    # set True to get full extracted page text
)

print(response["answer"])
for r in response["results"]:
    print(f"{r['title']} - {r['url']}")
    print(r["content"][:200])

3. Direct curl Without the SDK

curl -X POST https://api.tavily.com/search \
  -H "Content-Type: application/json" \
  -d '{
    "api_key": "tvly-YOUR_KEY",
    "query": "latest open-source LLM benchmarks",
    "search_depth": "basic",
    "include_answer": true,
    "max_results": 5
  }'

4. Drop Tavily into a CrewAI Agent

from crewai import Agent
from crewai_tools import TavilySearchTool

researcher = Agent(
    role="Research Analyst",
    goal="Find authoritative sources for any topic",
    backstory="You search the open web and cite primary sources only.",
    tools=[TavilySearchTool(api_key="tvly-YOUR_KEY")],
    verbose=True,
)

That is the whole integration — CrewAI ships the tool wrapper, and the agent will now call Tavily whenever its reasoning step decides to “look something up.” For setting up the CrewAI side, see our free CrewAI guide.

What Is Brave Search API?

Brave Search API is the developer-facing endpoint of the same search index that powers the Brave Browser’s default search. Unlike Tavily (which sits on top of upstream APIs) or Exa (which is a semantic engine), Brave runs its own independent web crawler and serves ~30 billion pages from infrastructure it controls.

That independence is the entire pitch. Brave is not paying Microsoft for every query, and it is not subject to Bing’s rate limits or pricing changes. If you are building a product whose value is “we search the open web and synthesize answers” — for example, a competitor to Perplexity — the underlying index has to be one you actually control or license cheaply at scale. Brave is currently the most realistic option in that category.

Brave also exposes several specialized endpoints out of the box: /web/search, /news/search, /videos/search, /images/search, and a /suggest autocomplete endpoint. For an agent that needs different result types in different turns, having all of that under one key is genuinely convenient.

Brave Search API Free Tier: What You Actually Get

The free plan, called Data for Free, is the lowest-friction one of the three:

2,000 queries per month across the web search endpoint
1 query per second rate limit (this is the most-cited gotcha — you cannot fan out 10 parallel searches at once)
No credit card required; you add a card only if you upgrade
Access to web, news, video, image, and suggest endpoints on the free plan
Goggles support — custom rerank rules to bias toward specific domains

The free tier returns snippets, not extracted page bodies. If you want extracted markdown content with the search result, that requires the Data for AI plan, which costs $5 per 1,000 queries and is the cheapest pure-search-plus-extraction price on the market.

The 1 query/second rate limit on the free tier is the single most important number to internalize. If your agent does parallel fan-out search (a common pattern in LangGraph workflows), you will hit 429s immediately. The simplest fix is a throttling wrapper around the client that spaces calls at least one second apart.

Getting Started with Brave Search

1. Get Your Free Key

Go to api.search.brave.com and click Get Started Free
Sign up with email; verify; choose the Data for Free plan
Generate a subscription token from API Keys

2. curl First Call

curl -s "https://api.search.brave.com/res/v1/web/search?q=open+source+LLM+benchmarks&count=10" \
  -H "Accept: application/json" \
  -H "X-Subscription-Token: YOUR_TOKEN"

3. Python Client with Rate Limiting Built In

import os, time, requests

class BraveSearch:
    def __init__(self, token, rps=1):
        self.token = token
        self.min_interval = 1.0 / rps   # minimum gap between calls, in seconds
        self.last_call = None           # monotonic time of the previous call

    def _throttle(self):
        # Sleep only for what is left of the minimum gap, so the sleep
        # duration can never go negative when rps is raised above 1.
        if self.last_call is not None:
            wait = self.min_interval - (time.monotonic() - self.last_call)
            if wait > 0:
                time.sleep(wait)
        self.last_call = time.monotonic()

    def search(self, q, count=10, country="us"):
        self._throttle()
        r = requests.get(
            "https://api.search.brave.com/res/v1/web/search",
            headers={
                "Accept": "application/json",
                "X-Subscription-Token": self.token,
            },
            params={"q": q, "count": count, "country": country},
            timeout=20,
        )
        r.raise_for_status()
        return r.json()

brave = BraveSearch(os.environ["BRAVE_TOKEN"])
data = brave.search("free vector databases for RAG 2026")
for r in data["web"]["results"][:5]:
    print(r["title"], "-", r["url"])

4. Use Brave with LangChain

import os

from langchain_community.tools import BraveSearch

tool = BraveSearch.from_api_key(
    api_key=os.environ["BRAVE_TOKEN"],
    search_kwargs={"count": 5},
)

print(tool.run("Latest GPT-4.5 evaluations"))

What Is Exa?

Exa (formerly Metaphor Systems) is a semantic search engine built around dense vector embeddings rather than a keyword-based inverted index. Instead of matching the words in your query against words on pages, Exa converts your query and the indexed web into the same embedding space, then returns pages whose meaning is closest, even if they share zero surface vocabulary with the query.

This sounds like a niche distinction until you actually use it. Two examples that illustrate where Exa shines:

“Articles by someone who used to work at OpenAI and now does longevity research” — a query with no good keywords to match on; Exa returns relevant blog posts; Google returns junk.
“Pages similar to this Anthropic safety post” — Exa has a dedicated find_similar endpoint that returns semantically nearest pages to a URL you supply; the closest equivalent on Google is “site:” with a list you maintain yourself.

Exa is the right tool when your agent’s task is research, similarity discovery, or finding non-obvious sources. It is the wrong tool when you need the absolute newest news article from this morning, because the embedding index is updated continuously but not in real time.

Exa Free Tier: What You Actually Get

Exa structures its free path differently from the other two:

$10 of free credit at signup, no credit card required
Pricing: $5 per 1,000 searches for the basic search endpoint, $10 per 1,000 for search + contents
Effective free quota at those prices: ~2,000 search-only calls or ~1,000 search-plus-contents calls
Once the $10 runs out, you must add a card to continue — there is no monthly refill
Full feature access on the free credit: neural search, keyword search, find-similar, contents extraction, livecrawl, summaries

If you blow through $10 in a week of heavy experimentation, that signals either that the tool is genuinely valuable for your use case (in which case pay) or that you are using it wrong (search-plus-contents in a loop where you should be caching). Either way, the trial credit is enough to make a real go/no-go decision.

Getting Started with Exa

1. Sign Up and Grab Your Key

Go to exa.ai and click Get API Key
Sign in with Google or email; you land on the dashboard with $10 of credit visible
Copy your key from API Keys

2. Neural Search with Contents Extraction

pip install exa-py

from exa_py import Exa

exa = Exa(api_key="YOUR_KEY")

result = exa.search_and_contents(
    "research papers about retrieval-augmented generation evaluation",
    type="neural",
    num_results=5,
    text={"max_characters": 2000},   # extracted, cleaned page text
)

for r in result.results:
    print(r.title, "-", r.url)
    print(r.text[:300])
    print()

3. Find Similar Pages

# Given any URL, return semantically similar pages
similar = exa.find_similar_and_contents(
    "https://www.anthropic.com/research/constitutional-ai-harmlessness-from-ai-feedback",
    num_results=5,
    text=True,
)

for r in similar.results:
    print(r.url, "score:", round(r.score, 3))

4. curl Without the SDK

curl -s https://api.exa.ai/search \
  -H "x-api-key: YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "papers comparing dense and sparse retrievers",
    "numResults": 5,
    "type": "neural",
    "contents": {"text": {"maxCharacters": 1500}}
  }'

Head-to-Head: Tavily vs Brave vs Exa

Quota Math for a Real Agent

The “quota per month” numbers look comparable until you do the math against a realistic agent loop. Say your agent does 5 searches per user interaction, and you have 20 daily active users:

Daily searches: 20 users × 5 searches = 100 searches/day = 3,000/month
Tavily free (1,000/mo): covers ~6 users/day, then hard stop
Brave free (2,000/mo): covers ~13 users/day, plus the 1 req/sec ceiling caps your parallelism
Exa $10 credit: ~1,000 searches, gone in 10 days, then pay
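The same arithmetic as a small script, in case you want to plug in your own traffic. This is a sketch under the assumptions used above (5 searches per interaction, a 30-day month, roughly 1,000 calls from the Exa credit); the function names are illustrative:

```python
def users_covered(monthly_quota: int, searches_per_user: int = 5, days: int = 30) -> float:
    """Daily active users that a recurring monthly quota can sustain."""
    return monthly_quota / days / searches_per_user


def days_until_exhausted(one_time_calls: int, users: int = 20, searches_per_user: int = 5) -> float:
    """Days a one-time allowance lasts at a fixed daily load."""
    return one_time_calls / (users * searches_per_user)


print(int(users_covered(1_000)))      # 6   (Tavily free)
print(int(users_covered(2_000)))      # 13  (Brave free)
print(days_until_exhausted(1_000))    # 10.0 (Exa signup credit)
```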

For anything beyond hobby scale, you will hit a paid tier. The choice is then about which paid pricing makes sense for your access pattern: Brave’s $5/1,000 with extraction is the cheapest in absolute terms, Tavily’s $30/4,000 includes LLM-tuned ranking, and Exa’s $10/1,000 with contents gives you semantic search that the other two do not offer.

Result Quality for LLM Consumption

This is what actually matters when an agent is the consumer. We are not optimizing for human eyeballs; we are optimizing for token efficiency and answer faithfulness downstream.

Tavily wins on this axis by design. The include_answer=True flag returns an LLM-generated summary that is already cleaned, deduplicated, and citation-tagged. The include_raw_content=True flag returns extracted page text without HTML, ads, or navigation — exactly what you would pipe into a system message for a downstream LLM call. You pay no extra credits for this.

Brave on the free tier returns search snippets only — typically the first ~150 characters of a result, with some metadata. To get clean page bodies you need either the paid Data for AI plan or a separate scraping step. For agents, this means an extra fetch hop per result.

Exa ties with Tavily on content extraction (contents.text returns cleaned page bodies) and uniquely offers contents.summary which runs an LLM over each result to compress it further. It uses credit faster but the output is the most LLM-ready of the three.

Latency

Measured from a single US datacenter, p50 round-trip on the basic search endpoint with no extraction:

Brave: ~250-400 ms
Tavily basic: ~400-700 ms
Tavily advanced: ~1.5-3 s (deeper crawl)
Exa neural: ~400-800 ms; with contents: ~1-2 s

For interactive agents, Brave is the fastest, Tavily basic and Exa neural are comparable, and Tavily advanced is in a different latency class — only worth it when answer quality justifies the wait.
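These figures depend heavily on where you call from, so it is worth measuring from your own deployment region. A minimal sketch of a p50 timer follows; the helper name is made up, and the `time.sleep` call is only a stand-in for a real search request:

```python
import statistics
import time


def p50_latency_ms(call, runs=5):
    """Median wall-clock latency of `call()` in milliseconds over `runs` tries."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        samples.append((time.perf_counter() - start) * 1000)
    return statistics.median(samples)


# Stand-in workload; swap in a real request, e.g. lambda: brave.search("test").
p50 = p50_latency_ms(lambda: time.sleep(0.01))
print(f"p50: {p50:.0f} ms")
```

Keep `runs` small on a free tier, since every timed call against a real provider spends quota.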

Index Freshness

Brave wins on freshness — its independent crawler updates within hours for major news sources, and the /news/search endpoint is specifically optimized for recency.

Tavily inherits the freshness of its upstream Bing API plus its own curated crawl; typically within 1-6 hours for news. The topic="news" parameter biases toward recency.

Exa updates continuously but with an embedding step in between, so very recent content (last few hours) may not yet be in the neural index. The livecrawl="always" parameter forces a real-time crawl for the top hits, but costs more credit.

Working With Your Agent Framework

Framework	Tavily	Brave	Exa
LangChain / LangGraph	Native (`TavilySearchResults`)	Native (`BraveSearch`)	Native (`ExaSearchRetriever`)
CrewAI	Native (`TavilySearchTool`)	Via custom `BaseTool`	Native (`EXASearchTool`)
MCP servers	Official `mcp-server-tavily`	Community `brave-search-mcp`	Official `exa-mcp-server`
OpenAI Assistants	Function calling wrapper	Function calling wrapper	Function calling wrapper
Anthropic Claude tool use	Tool definition snippet	Tool definition snippet	Tool definition snippet

All three publish well-maintained MCP servers, which is the path that lets your AI assistant (Claude Desktop, Cursor, Cline, etc.) gain search powers without writing any code at all. For background on MCP, see our guide to the Model Context Protocol.
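The "function calling wrapper" and "tool definition snippet" rows in the table boil down to a JSON schema that describes the search call to the model. Here is a hedged sketch: the tool name `web_search`, its description, and the `query` / `intent` parameters are invented for illustration, and only the outer shapes (Anthropic's `name` / `description` / `input_schema`, OpenAI's `type: "function"` wrapper) follow the respective tool-use formats:

```python
import json

# Anthropic-style tool definition (hypothetical tool; schema is plain JSON Schema).
web_search_tool = {
    "name": "web_search",
    "description": "Search the web and return titles, URLs, and text snippets.",
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "What to search for."},
            "intent": {
                "type": "string",
                "enum": ["general", "news", "research", "similar"],
                "description": "Hint for choosing a search provider.",
            },
        },
        "required": ["query"],
    },
}


def to_openai_function(tool: dict) -> dict:
    """Re-wrap the same schema in the OpenAI function-calling shape."""
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool["description"],
            "parameters": tool["input_schema"],
        },
    }


print(json.dumps(to_openai_function(web_search_tool), indent=2))
```

Whichever provider sits behind the tool, the model only ever sees this schema, which is what makes swapping providers cheap.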

Which One Should You Use? A Decision Tree

Work through these questions in order and you will land on the right tool almost every time.

Is the consumer an LLM that needs clean, summarized text? → Tavily. The include_answer and include_raw_content options give you exactly that.
Do you need a high volume of cheap web searches with an independent index, and you are willing to do extraction yourself? → Brave. The $5/1,000 paid tier is unbeatable on raw search cost.
Are you doing research, similarity discovery, or finding non-obvious sources? → Exa. Neural search and find_similar have no real free-tier competitor.
Do you need news in the last hour? → Brave (/news/search) or Tavily (topic="news"); avoid Exa for breaking news.
Are you building a Perplexity-style product where the index is the moat? → Brave. The independent crawl matters at scale.
Are you prototyping an agent and just want one search call that “works”? → Tavily. Easiest setup, cleanest output, and a free quota that refreshes every month.

Combining All Three: The “Search Router” Pattern

For serious agent systems, a single search provider is a brittle dependency. A common pattern in 2026 is to wrap all three behind a single internal tool that routes by query type:

from tavily import TavilyClient
from exa_py import Exa
import requests, os

tavily = TavilyClient(os.environ["TAVILY_KEY"])
exa = Exa(os.environ["EXA_KEY"])
BRAVE_TOKEN = os.environ["BRAVE_TOKEN"]

def smart_search(query: str, intent: str = "general"):
    """Route to the best search provider for the intent.

    intent: 'news' | 'research' | 'similar' | 'general'
    """
    if intent == "news":
        # Brave news endpoint, freshest index
        resp = requests.get(
            "https://api.search.brave.com/res/v1/news/search",
            headers={"X-Subscription-Token": BRAVE_TOKEN, "Accept": "application/json"},
            params={"q": query, "count": 5},
            timeout=15,
        )
        resp.raise_for_status()   # surface 429s instead of silently returning no results
        r = resp.json()
        return [{"title": x["title"], "url": x["url"], "text": x.get("description", "")}
                for x in r.get("results", [])]

    if intent == "research":
        # Exa neural search for semantic matching
        res = exa.search_and_contents(query, type="neural", num_results=5,
                                       text={"max_characters": 1500})
        return [{"title": r.title, "url": r.url, "text": r.text} for r in res.results]

    if intent == "similar":
        # Exa find-similar (query should be a URL)
        res = exa.find_similar_and_contents(query, num_results=5, text=True)
        return [{"title": r.title, "url": r.url, "text": r.text} for r in res.results]

    # default: Tavily for LLM-optimized general retrieval
    res = tavily.search(query=query, search_depth="basic",
                        include_answer=True, max_results=5)
    return [{"answer": res["answer"]}] + \
           [{"title": r["title"], "url": r["url"], "text": r["content"]}
            for r in res["results"]]

The agent’s reasoning step picks the intent based on its own plan, and the router transparently uses whichever provider is best — and whichever still has free quota left. Pair this with a 24-hour cache keyed by (query, intent) and your real search bill stays near zero for a long time.
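The router above picks a provider by intent but does not itself notice when a quota has run out. One way to add that is a fallback chain, sketched here with stand-in provider functions so it runs without API keys. `QuotaExhausted` is a hypothetical exception that your own provider wrappers would raise on an HTTP 429 or a spent credit balance:

```python
class QuotaExhausted(Exception):
    """Hypothetical error raised by a provider wrapper when its quota is spent."""


def search_with_fallback(query, providers):
    """Try each (name, search_fn) pair in order and return the first success."""
    failures = []
    for name, search_fn in providers:
        try:
            return name, search_fn(query)
        except QuotaExhausted as exc:
            failures.append(f"{name}: {exc}")
    raise QuotaExhausted("all providers exhausted: " + "; ".join(failures))


# Stand-in providers; real ones would wrap the Tavily, Brave, and Exa calls.
def tavily_stub(query):
    raise QuotaExhausted("monthly credits used up")


def brave_stub(query):
    return [{"title": "example", "url": "https://example.com", "text": query}]


provider, results = search_with_fallback(
    "latest open-source LLM benchmarks",
    [("tavily", tavily_stub), ("brave", brave_stub)],
)
print(provider, len(results))   # brave 1
```

Ordering the list by remaining free quota, rather than hard-coding it, is the obvious next refinement.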

Common Gotchas

Tavily: Watch the `search_depth` Default

The Python SDK defaults to search_depth="basic" (1 credit) but the LangChain wrapper has at times defaulted to advanced (2 credits). With a 1,000-credit free tier, this halves your usable quota if you do not notice. Always pass search_depth explicitly.

Brave: Parallel Fan-Out Will Get You 429’d

The free tier caps you at exactly 1 query per second. If your LangGraph workflow does asyncio.gather() over 5 sub-queries at once, four of them come back as 429s. Either throttle the calls (see the Python client above) or upgrade to the paid plan, which lifts the limit to 20 queries/second.
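For asyncio-based fan-out, a blocking time.sleep() would stall the event loop, so the limiter has to be awaitable. The following is a rough sketch, not a library API: `AsyncRateLimiter` and `fake_search` are illustrative names, and the demo runs at 10 calls/second only to finish quickly (use `rps=1` for the Brave free tier):

```python
import asyncio
import time


class AsyncRateLimiter:
    """Space out coroutine calls so that at most `rps` start per second."""

    def __init__(self, rps: float = 1.0):
        self.min_interval = 1.0 / rps
        self._next_slot = 0.0   # monotonic time at which the next call may start

    async def wait(self) -> None:
        # No await happens between reading and updating _next_slot, so each
        # coroutine reserves its slot atomically on a single event loop.
        now = time.monotonic()
        start_at = max(now, self._next_slot)
        self._next_slot = start_at + self.min_interval
        if start_at > now:
            await asyncio.sleep(start_at - now)


async def fake_search(limiter, query, starts):
    await limiter.wait()
    starts.append(time.monotonic())   # a real client would send the HTTP request here
    return query


async def main():
    limiter = AsyncRateLimiter(rps=10)
    starts = []
    results = await asyncio.gather(
        *(fake_search(limiter, f"q{i}", starts) for i in range(4))
    )
    return results, starts


results, starts = asyncio.run(main())
print(results)
print([round(b - a, 2) for a, b in zip(starts, starts[1:])])
```

All four coroutines are launched at once, but each one starts its request in its own time slot.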

Exa: `type="auto"` Costs More Than You Think

With type="auto", Exa chooses between neural and keyword search for you, and neural costs more credit. If you know your query is keyword-heavy (“Anthropic blog post May 2026”), force type="keyword" to save credit. Save type="neural" for queries that benefit from semantic matching.

All Three: Cache Aggressively

An agent that asks “what is the latest Llama model” twenty times in a debugging session burns 20 credits on the same answer. A trivial in-memory or SQLite cache keyed by the query string saves more credit than any other optimization you will do. The cache TTL should be 1 hour for general queries, 15 minutes for news, 24 hours for stable reference material.
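A minimal in-memory version of that cache, using the TTLs suggested above. The class name, the `kind` labels, and the `fetch` callback are assumptions made for illustration; a SQLite-backed variant would follow the same shape with wall-clock timestamps:

```python
import time

# TTLs from the paragraph above, in seconds.
TTL_SECONDS = {"general": 3600, "news": 900, "reference": 86400}


class SearchCache:
    """In-memory cache keyed by (normalized query, kind) with per-kind TTLs."""

    def __init__(self, clock=time.monotonic):
        self._store = {}
        self._clock = clock   # injectable so expiry can be tested without waiting

    def get_or_fetch(self, query, kind, fetch):
        # Normalize case and whitespace so trivially different queries share a key.
        key = (" ".join(query.lower().split()), kind)
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < TTL_SECONDS[kind]:
            return entry[1]
        value = fetch(query)
        self._store[key] = (now, value)
        return value


calls = []


def fake_search(query):
    calls.append(query)   # counts how often the "provider" is actually hit
    return f"results for {query}"


cache = SearchCache()
for _ in range(20):
    cache.get_or_fetch("what is the latest Llama model", "general", fake_search)
print(len(calls))   # 1
```

Twenty identical questions cost one provider call; the other nineteen are served from memory.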

Pairing Search With a Free LLM

None of these search APIs do anything on their own — they feed text to an LLM that produces the actual user-facing answer. The cheapest production stack we have seen in 2026 pairs:

Search: Tavily free (1,000 monthly) for general retrieval + Brave free (2,000 monthly) for news fan-out
LLM: Free tier from Groq (14,400 requests/day), Gemini (1M token context), or Together AI (Llama 3.3 70B free tier)
Orchestration: CrewAI for multi-agent flows, LangGraph for stateful workflows, or a vanilla function-calling loop for simple cases
Observability: Langfuse self-hosted or Hobby tier to trace every search call and LLM call

The total monthly bill at hobby scale, with all of the above: $0. The total at small production scale (a few hundred daily users): typically $30-80, almost all of it search-API overage above the free tiers.

Frequently Asked Questions

Can I use these search APIs for commercial products?

Yes — all three offer commercial use on every tier including the free one. Read each provider’s Terms of Service for redistribution restrictions (typically you cannot resell raw search results as a competing search engine, but you can use them in any agent or end-product feature).

What about SerpAPI / ScraperAPI / SearXNG?

SerpAPI is the long-standing Google-results scraper used by many older LangChain examples. It starts at $75/month with only a 100-search trial — fine for production, expensive for prototyping. ScraperAPI is similar. SearXNG is a self-hosted metasearch aggregator — free if you host it, but the throughput and stability depend on your hosting and on upstream search engines not rate-limiting your IP.

Does Google offer a free search API in 2026?

No public, generally-available one. Google Custom Search JSON API has a free tier of 100 queries/day, but it is limited to “site search” on a list of domains you specify in advance — it is not a general web search API. Google’s Vertex AI Search is enterprise-only.

Which one works best in MCP setups?

Tavily’s official mcp-server-tavily is the most polished and the one Anthropic uses in its example MCP configs. Exa’s exa-mcp-server is also official and adds the find_similar tool which is uniquely useful inside Claude Desktop. Brave has only community-maintained MCP servers but they work fine.

Can I use these inside an MCP server I build myself?

Yes — all three are just HTTP APIs. Wrap whichever one you prefer in an MCP tool definition and your assistant inherits web search capability. See our MCP explainer for the full server pattern.

How do I know I am hitting the free-tier ceiling?

Tavily and Exa both expose usage on their dashboards in near-real time. Brave shows usage on the dashboard with a 5-10 minute delay. All three return a structured error with the rate-limit headers (x-ratelimit-remaining, retry-after) on 429 responses — log those headers in your client so you can alert before you hit the cap rather than after.
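Logging those headers can be as small as the helper below. This is a sketch that assumes the two header names mentioned above; the values are kept as raw strings because their exact format may differ between providers, and the function name is illustrative:

```python
import logging

log = logging.getLogger("search.ratelimit")


def rate_limit_info(status_code, headers):
    """Pull the rate-limit headers named above out of a response, if present."""
    lowered = {k.lower(): v for k, v in headers.items()}
    info = {
        "limited": status_code == 429,
        "remaining": lowered.get("x-ratelimit-remaining"),   # raw string, not parsed
        "retry_after": lowered.get("retry-after"),
    }
    if info["limited"]:
        log.warning("rate limited: %s", info)
    return info


# With requests, pass r.status_code and r.headers from any response.
info = rate_limit_info(429, {"X-RateLimit-Remaining": "0", "Retry-After": "1"})
print(info)
```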

Is there a single “best free search API”?

No, and any article that claims one is gaming a keyword. For LLM-consumed agent retrieval, Tavily is the cleanest default. For independent index and high volume, Brave wins. For semantic search and find-similar, Exa is the only real option. The “search router” pattern earlier in this guide is the answer when you cannot pick.

Bottom Line

The free-search-API market in 2026 has stabilized into three genuinely useful options, each with a clear specialty. Pick by access pattern, not by raw quota:

Building an agent that needs search? Start with Tavily. The clean text output and the monthly 1,000-credit refresh make it the lowest-friction first integration.
Need cheap volume? Add Brave. 2,000 free queries plus the cheapest paid tier in the market mean it is the natural second provider when Tavily runs out.
Doing research or similarity work? Reach for Exa. Neural and find-similar are unique capabilities the other two simply do not offer.

Wire up the router pattern, cache aggressively, and you can run a production-grade agent with web search capability for $0/month at hobby scale and roughly $30-80 at small production scale. Combined with a free LLM tier from Groq or Gemini, that is a complete agent stack that costs nothing meaningful until you actually have users.