Every AI Agent Needs a Search Tool — Here Are the Three Free Ones That Actually Work
If you build AI agents in 2026, your model is brilliant at reasoning over text and useless at knowing what happened yesterday. The fix is the same one every framework — CrewAI, LangGraph, AutoGPT, Aider, every MCP server with a “web search” tool — eventually arrives at: hand the agent a search API.
Google’s official Search API is closed off behind enterprise contracts. Bing’s search API is being retired. SerpAPI starts at $75/month. For developers prototyping or running personal agents, the practical free options have narrowed down to three serious providers, each with a different definition of “free”:
- Tavily — purpose-built for LLM retrieval, 1,000 free API credits per month, no credit card
- Brave Search API — independent web index, 2,000 free queries per month at 1 query/second, no credit card
- Exa — neural search engine designed for AI, $10 of free signup credit (≈1,000 searches), email signup only
All three are production-ready, all three publish OpenAPI specs you can paste into a tool definition, and all three plug straight into the agent frameworks you are already using. This guide breaks down what each free tier really gets you, which one to pair with which framework, and the corner cases where you should reach for a different tool entirely.
Quick Comparison: Tavily vs Brave vs Exa Free Tiers
| Feature | Tavily (Free) | Brave Search API (Free) | Exa (Free) |
|---|---|---|---|
| Free quota | 1,000 API credits/month | 2,000 queries/month | $10 signup credit (~1,000 searches) |
| Rate limit | ~10 requests/sec, no daily cap | 1 query/sec | ~5 requests/sec |
| Credit card needed | No | No | No |
| Free tier resets | Monthly | Monthly | One-time credit |
| Index type | Aggregated (Bing/others) + own crawl | Independent crawl, own index | Embedding-based neural index |
| Optimized for | LLM RAG / agent retrieval | Traditional web search | Semantic / similarity search |
| Content extraction | Yes, built-in include_raw_content | Snippet only (paid tier adds extraction) | Yes, built-in contents.text |
| News endpoint | Yes (topic="news") | Yes (dedicated news API) | Yes (via type="neural" + filters) |
| Domain include/exclude | Yes | Limited (goggles) | Yes |
| Best for | RAG agents that need clean text | Cheap, high-volume search at scale | Finding similar pages / research |
The short version: Tavily is what you reach for when an LLM is going to read the result; Brave is what you reach for when you want a lot of independent search results cheaply; Exa is what you reach for when you want results that share semantic meaning rather than just keywords.
What Is Tavily?
Tavily is a search API built specifically for LLMs and AI agents. Founded in 2023 and now used by tens of thousands of developers, it has become the default search tool in LangChain, the recommended tool in the CrewAI documentation, and the example most MCP search servers ship with.
The pitch is straightforward: a normal search API gives you ten blue links and snippets. An agent then has to spend additional turns visiting each URL, parsing HTML, stripping ads and navigation, and producing usable text. Tavily collapses that entire pipeline into a single API call — you send a query, you get back ranked URLs plus a clean, model-ready text answer extracted from the top results, with optional raw content of each page.
For agents, this matters in two practical ways. First, it cuts token usage: instead of feeding 10 noisy HTML pages into your context window, you feed one cleaned summary plus three extracted snippets. Second, it cuts latency: one HTTP call instead of one search call plus ten fetch calls.
Tavily Free Tier: What You Actually Get
The free tier is generous for prototyping and personal agents:
- 1,000 API credits per month, refreshed at the start of each calendar month
- 1 credit = 1 basic search; search_depth="advanced" costs 2 credits per call
- No credit card required: sign up with email or GitHub and your key is live immediately
- Full API access — every endpoint and parameter that paid users get
- ~10 requests per second rate limit (not officially published, but consistent in practice)
For a hobby agent doing 30 searches per day (about 900 per month), you will stay under the limit. For a production app, the next paid tier (Researcher) is $30/month for 4,000 credits, with usage-based billing on top.
One thing to know: Tavily does not run its own crawler at the scale of Google. It aggregates from upstream providers (Bing API is a major one) plus a curated crawl of high-quality sources, then re-ranks the combined results for relevance to your specific LLM query. The ranking quality is the real product, not the raw index size.
Getting Started with Tavily
1. Get Your Free API Key
- Go to tavily.com and click Get API Key
- Sign in with GitHub or email — no credit card form appears
- Copy the key from your dashboard (it starts with tvly-)
2. Call the API from Python
pip install tavily-python
from tavily import TavilyClient
client = TavilyClient(api_key="tvly-YOUR_KEY")
response = client.search(
    query="What were the major Claude 4.7 release notes?",
    search_depth="basic",        # "advanced" gives deeper crawl, costs 2 credits
    max_results=5,
    include_answer=True,         # LLM-generated summary of top results
    include_raw_content=False,   # set True to get full extracted page text
)

print(response["answer"])
for r in response["results"]:
    print(f"{r['title']} - {r['url']}")
    print(r["content"][:200])
3. Direct curl Without the SDK
curl -X POST https://api.tavily.com/search \
  -H "Content-Type: application/json" \
  -d '{
    "api_key": "tvly-YOUR_KEY",
    "query": "latest open-source LLM benchmarks",
    "search_depth": "basic",
    "include_answer": true,
    "max_results": 5
  }'
4. Drop Tavily into a CrewAI Agent
from crewai import Agent
from crewai_tools import TavilySearchTool
researcher = Agent(
    role="Research Analyst",
    goal="Find authoritative sources for any topic",
    backstory="You search the open web and cite primary sources only.",
    tools=[TavilySearchTool(api_key="tvly-YOUR_KEY")],
    verbose=True,
)
That is the whole integration — CrewAI ships the tool wrapper, and the agent will now call Tavily whenever its reasoning step decides to “look something up.” For setting up the CrewAI side, see our free CrewAI guide.
What Is Brave Search API?
Brave Search API is the developer-facing endpoint of the same search index that powers the Brave Browser’s default search. Unlike Tavily (which sits on top of upstream APIs) or Exa (which is a semantic engine), Brave runs its own independent web crawler and serves ~30 billion pages from infrastructure it controls.
That independence is the entire pitch. Brave is not paying Microsoft for every query, and it is not subject to Bing’s rate limits or pricing changes. If you are building a product whose value is “we search the open web and synthesize answers” — for example, a competitor to Perplexity — the underlying index has to be one you actually control or license cheaply at scale. Brave is currently the most realistic option in that category.
Brave also exposes several specialized endpoints out of the box: /web/search, /news/search, /videos/search, /images/search, and a /suggest autocomplete endpoint. For an agent that needs different result types in different turns, having all of that under one key is genuinely convenient.
Brave Search API Free Tier: What You Actually Get
The free plan, called Data for Free, is the lowest-friction one of the three:
- 2,000 queries per month across the web search endpoint
- 1 query per second rate limit (this is the most-cited gotcha — you cannot fan out 10 parallel searches at once)
- No credit card required; a card is requested only if you upgrade
- Access to web, news, video, image, and suggest endpoints on the free plan
- Goggles support — custom rerank rules to bias toward specific domains
The free tier returns snippets, not extracted page bodies. If you want extracted markdown content with the search result, that requires the Data for AI plan, which costs $5 per 1,000 queries and is the cheapest pure-search-plus-extraction price on the market.
The 1 query/second rate limit on the free tier is the single most important number to internalize. If your agent does parallel fan-out search (a common pattern in LangGraph workflows), you will hit 429s immediately. The simplest fix is a throttling wrapper around the client that spaces calls at least one second apart.
Getting Started with Brave Search
1. Get Your Free Key
- Go to api.search.brave.com and click Get Started Free
- Sign up with email; verify; choose the Data for Free plan
- Generate a subscription token from API Keys
2. curl First Call
curl -s "https://api.search.brave.com/res/v1/web/search?q=open+source+LLM+benchmarks&count=10" \
  -H "Accept: application/json" \
  -H "X-Subscription-Token: YOUR_TOKEN"
3. Python Client with Rate Limiting Built In
import os
import time
from collections import deque

import requests


class BraveSearch:
    def __init__(self, token, rps=1):
        self.token = token
        self.min_interval = 1.0 / rps
        self.calls = deque()  # timestamps of calls made within the last second

    def _throttle(self):
        # Block until at least min_interval has passed since the previous call.
        now = time.time()
        while self.calls and now - self.calls[0] > 1.0:
            self.calls.popleft()
        if self.calls:
            # Clamp at zero: time.sleep() raises ValueError on a negative value,
            # which the unclamped subtraction can produce when rps > 1.
            time.sleep(max(0.0, self.min_interval - (now - self.calls[-1])))
        self.calls.append(time.time())

    def search(self, q, count=10, country="us"):
        self._throttle()
        r = requests.get(
            "https://api.search.brave.com/res/v1/web/search",
            headers={
                "Accept": "application/json",
                "X-Subscription-Token": self.token,
            },
            params={"q": q, "count": count, "country": country},
            timeout=20,
        )
        r.raise_for_status()
        return r.json()


brave = BraveSearch(os.environ["BRAVE_TOKEN"])
data = brave.search("free vector databases for RAG 2026")
for r in data["web"]["results"][:5]:
    print(r["title"], "-", r["url"])
4. Use Brave with LangChain
import os

from langchain_community.tools import BraveSearch

tool = BraveSearch.from_api_key(
    api_key=os.environ["BRAVE_TOKEN"],
    search_kwargs={"count": 5},
)
print(tool.run("Latest GPT-4.5 evaluations"))
What Is Exa?
Exa (formerly Metaphor Systems) is a semantic search engine built around dense vector embeddings rather than keyword inversion. Instead of matching the words in your query against words on pages, Exa converts your query and the entire indexed web into the same embedding space, then returns pages whose meaning is closest — even if they share zero surface vocabulary with the query.
This sounds like a niche distinction until you actually use it. Two examples that illustrate where Exa shines:
- “Articles by someone who used to work at OpenAI and now does longevity research” — a query with no good keywords to match on; Exa returns relevant blog posts; Google returns junk.
- “Pages similar to this Anthropic safety post”: Exa has a dedicated find_similar endpoint that returns the semantically nearest pages to a URL you supply; the closest equivalent on Google is “site:” with a list you maintain yourself.
Exa is the right tool when your agent’s task is research, similarity discovery, or finding non-obvious sources. It is the wrong tool when you need the absolute newest news article from this morning, because the embedding index is updated continuously but not in real time.
Exa Free Tier: What You Actually Get
Exa structures its free path differently from the other two:
- $10 of free credit at signup, no credit card required
- Pricing: $5 per 1,000 searches for the basic search endpoint, $10 per 1,000 for search + contents
- Effective free quota: ~1,000 search-only calls or ~500 search-plus-contents calls
- Once the $10 runs out, you must add a card to continue — there is no monthly refill
- Full feature access on the free credit: neural search, keyword search, find-similar, contents extraction, livecrawl, summaries
If you blow through $10 in a week of heavy experimentation, that signals either that the tool is genuinely valuable for your use case (in which case pay) or that you are using it wrong (search-plus-contents in a loop where you should be caching). Either way, the trial credit is enough to make a real go/no-go decision.
Getting Started with Exa
1. Sign Up and Grab Your Key
- Go to exa.ai and click Get API Key
- Sign in with Google or email; you land on the dashboard with $10 of credit visible
- Copy your key from API Keys
2. Neural Search with Contents Extraction
pip install exa-py
from exa_py import Exa
exa = Exa(api_key="YOUR_KEY")
result = exa.search_and_contents(
    "research papers about retrieval-augmented generation evaluation",
    type="neural",
    num_results=5,
    text={"max_characters": 2000},  # extracted, cleaned page text
)

for r in result.results:
    print(r.title, "-", r.url)
    print(r.text[:300])
    print()
3. Find Similar Pages
# Given any URL, return semantically similar pages
similar = exa.find_similar_and_contents(
    "https://www.anthropic.com/research/constitutional-ai-harmlessness-from-ai-feedback",
    num_results=5,
    text=True,
)
for r in similar.results:
    print(r.url, "score:", round(r.score, 3))
4. curl Without the SDK
curl -s https://api.exa.ai/search \
  -H "x-api-key: YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "papers comparing dense and sparse retrievers",
    "numResults": 5,
    "type": "neural",
    "contents": {"text": {"maxCharacters": 1500}}
  }'
Head-to-Head: Tavily vs Brave vs Exa
Quota Math for a Real Agent
The “quota per month” numbers look comparable until you do the math against a realistic agent loop. Say your agent does 5 searches per user interaction, and you have 20 daily active users:
- Daily searches: 20 users × 5 searches = 100 searches/day = 3,000/month
- Tavily free (1,000/mo): covers ~6 users/day, then hard stop
- Brave free (2,000/mo): covers ~13 users/day, plus the 1 req/sec ceiling caps your parallelism
- Exa $10 credit: ~1,000 search-only calls, gone in 10 days, then pay
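The arithmetic above can be reproduced in a few lines. This is a minimal sketch that uses only the quota figures quoted in this guide; the function names and the 30-day month are assumptions made for illustration.

```python
# Quota figures as quoted in this guide; adjust them if the providers change their plans.
SEARCHES_PER_INTERACTION = 5
DAYS_PER_MONTH = 30  # assumption: a 30-day month


def users_covered(monthly_quota: int) -> int:
    """Daily users a monthly quota supports at one interaction per user per day."""
    return monthly_quota // (SEARCHES_PER_INTERACTION * DAYS_PER_MONTH)


def days_until_exhausted(total_calls: int, daily_users: int) -> int:
    """Days a one-time allowance lasts at the given daily load."""
    return total_calls // (daily_users * SEARCHES_PER_INTERACTION)


print(users_covered(1_000))                          # Tavily free tier -> 6
print(users_covered(2_000))                          # Brave free tier -> 13
print(days_until_exhausted(1_000, daily_users=20))   # Exa signup credit -> 10
```

Plugging in your own searches-per-interaction figure is usually the fastest way to see which free tier breaks first.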
For anything beyond hobby scale, you will hit a paid tier. The choice then comes down to which paid pricing fits your access pattern: Brave’s $5/1,000 with extraction is the cheapest in absolute terms, Tavily’s $30/4,000 includes LLM-tuned ranking, and Exa’s $10/1,000 with contents gives you semantic search that neither of the others offers.
Result Quality for LLM Consumption
This is what actually matters when an agent is the consumer. We are not optimizing for human eyeballs; we are optimizing for token efficiency and answer faithfulness downstream.
Tavily wins on this axis by design. The include_answer=True flag returns an LLM-generated summary that is already cleaned, deduplicated, and citation-tagged. The include_raw_content=True flag returns extracted page text without HTML, ads, or navigation — exactly what you would pipe into a system message for a downstream LLM call. You pay no extra credits for this.
Brave on the free tier returns search snippets only — typically the first ~150 characters of a result, with some metadata. To get clean page bodies you need either the paid Data for AI plan or a separate scraping step. For agents, this means an extra fetch hop per result.
Exa ties with Tavily on content extraction (contents.text returns cleaned page bodies) and uniquely offers contents.summary which runs an LLM over each result to compress it further. It uses credit faster but the output is the most LLM-ready of the three.
Latency
Measured from a single US datacenter, p50 round-trip on the basic search endpoint with no extraction:
- Brave: ~250-400 ms
- Tavily basic: ~400-700 ms
- Tavily advanced: ~1.5-3 s (deeper crawl)
- Exa neural: ~400-800 ms; with contents: ~1-2 s
For interactive agents, Brave is the fastest, Tavily basic and Exa neural are comparable, and Tavily advanced is in a different latency class — only worth it when answer quality justifies the wait.
Index Freshness
Brave wins on freshness — its independent crawler updates within hours for major news sources, and the /news/search endpoint is specifically optimized for recency.
Tavily inherits the freshness of its upstream Bing API plus its own curated crawl; typically within 1-6 hours for news. The topic="news" parameter biases toward recency.
Exa updates continuously but with an embedding step in between, so very recent content (last few hours) may not yet be in the neural index. The livecrawl="always" parameter forces a real-time crawl for the top hits, but costs more credit.
Working With Your Agent Framework
| Framework | Tavily | Brave | Exa |
|---|---|---|---|
| LangChain / LangGraph | Native (TavilySearchResults) | Native (BraveSearch) | Native (ExaSearchRetriever) |
| CrewAI | Native (TavilySearchTool) | Via custom BaseTool | Native (EXASearchTool) |
| MCP servers | Official mcp-server-tavily | Community brave-search-mcp | Official exa-mcp-server |
| OpenAI Assistants | Function calling wrapper | Function calling wrapper | Function calling wrapper |
| Anthropic Claude tool use | Tool definition snippet | Tool definition snippet | Tool definition snippet |
All three have well-maintained MCP servers (community-maintained in Brave’s case), which is the path that lets your AI assistant (Claude Desktop, Cursor, Cline, etc.) gain search powers without writing any code at all. For background on MCP, see our guide to the Model Context Protocol.
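For the two function-calling rows in the table, the "tool definition" is just a small JSON Schema object. The sketch below shows one possible definition in the shape Claude's tool-use API expects (name, description, input_schema) plus a converter to OpenAI's function-calling shape. The tool name web_search and its two parameters are illustrative choices, not something any of the three providers mandates.

```python
# A provider-agnostic search tool definition for Claude tool use.
# The name and parameters are illustrative assumptions; map them onto whichever
# search API you call inside your tool handler.
WEB_SEARCH_TOOL = {
    "name": "web_search",
    "description": (
        "Search the web and return titles, URLs, and short text extracts. "
        "Use for anything that may have changed after the model's training cutoff."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "The search query."},
            "max_results": {
                "type": "integer",
                "description": "How many results to return.",
                "minimum": 1,
                "maximum": 10,
            },
        },
        "required": ["query"],
    },
}


def to_openai_tool(tool: dict) -> dict:
    """Convert the definition above to OpenAI's function-calling shape."""
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool["description"],
            "parameters": tool["input_schema"],
        },
    }


print(to_openai_tool(WEB_SEARCH_TOOL)["function"]["name"])  # web_search
```

Keeping one schema and converting it avoids maintaining two drifting copies when the same agent runs against both model families.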
Which One Should You Use? A Decision Tree
Use this decision logic, in order, and you will land on the right tool nearly every time.
- Is the consumer an LLM that needs clean, summarized text? → Tavily. The include_answer + include_raw_content options are exactly what you want.
- Do you need a high volume of cheap web searches with an independent index, and you are willing to do extraction yourself? → Brave. The $5/1,000 paid tier is unbeatable on raw search cost.
- Are you doing research, similarity discovery, or finding non-obvious sources? → Exa. Neural search and find_similar have no real free-tier competitor.
- Do you need news in the last hour? → Brave (/news/search) or Tavily (topic="news"); avoid Exa for breaking news.
- Are you building a Perplexity-style product where the index is the moat? → Brave. The independent crawl matters at scale.
- Are you prototyping an agent and just want one search call that “works”? → Tavily. Easiest setup, cleanest output, and a free quota that refreshes every month.
Combining All Three: The “Search Router” Pattern
For serious agent systems, a single search provider is a brittle dependency. A common pattern in 2026 is to wrap all three behind a single internal tool that routes by query type:
import os

import requests
from exa_py import Exa
from tavily import TavilyClient

tavily = TavilyClient(os.environ["TAVILY_KEY"])
exa = Exa(os.environ["EXA_KEY"])
BRAVE_TOKEN = os.environ["BRAVE_TOKEN"]


def smart_search(query: str, intent: str = "general"):
    """Route to the best search provider for the intent.

    intent: 'news' | 'research' | 'similar' | 'general'
    """
    if intent == "news":
        # Brave news endpoint, freshest index
        resp = requests.get(
            "https://api.search.brave.com/res/v1/news/search",
            headers={"X-Subscription-Token": BRAVE_TOKEN, "Accept": "application/json"},
            params={"q": query, "count": 5},
            timeout=15,
        )
        resp.raise_for_status()  # surface 429s instead of parsing an error body
        r = resp.json()
        return [{"title": x["title"], "url": x["url"], "text": x.get("description", "")}
                for x in r.get("results", [])]

    if intent == "research":
        # Exa neural search for semantic matching
        res = exa.search_and_contents(query, type="neural", num_results=5,
                                      text={"max_characters": 1500})
        return [{"title": r.title, "url": r.url, "text": r.text} for r in res.results]

    if intent == "similar":
        # Exa find-similar (query should be a URL)
        res = exa.find_similar_and_contents(query, num_results=5, text=True)
        return [{"title": r.title, "url": r.url, "text": r.text} for r in res.results]

    # default: Tavily for LLM-optimized general retrieval
    res = tavily.search(query=query, search_depth="basic",
                        include_answer=True, max_results=5)
    return [{"answer": res["answer"]}] + \
           [{"title": r["title"], "url": r["url"], "text": r["content"]}
            for r in res["results"]]
The agent’s reasoning step picks the intent based on its own plan, and the router transparently uses whichever provider suits that intent best. It can also be extended to skip a provider whose free quota is spent. Pair this with a 24-hour cache keyed by (query, intent) and your real search bill stays near zero for a long time.
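The router as written does not check remaining quota. One way to add that is a fallback chain that tries providers in order and moves on when one reports it is out of quota. This is a sketch under stated assumptions: QuotaExhausted is a hypothetical exception you would raise yourself from each provider wrapper (for example on a 429), and the two stand-in functions replace real API calls so the example runs without keys.

```python
class QuotaExhausted(Exception):
    """Hypothetical error: raise it from a provider wrapper on a 429 or a spent quota."""


def search_with_fallback(query, providers):
    """Try (name, search_fn) pairs in order and return the first success.

    A provider that raises QuotaExhausted is skipped; any other exception propagates.
    """
    skipped = []
    for name, search_fn in providers:
        try:
            return name, search_fn(query)
        except QuotaExhausted as exc:
            skipped.append(f"{name}: {exc}")
    raise RuntimeError("all providers are out of quota: " + "; ".join(skipped))


# Stand-ins for real provider wrappers, so the sketch runs without API keys.
def spent(query):
    raise QuotaExhausted("monthly credits used up")


def healthy(query):
    return [{"title": "stub", "url": "https://example.com", "text": query}]


provider, results = search_with_fallback(
    "vector databases", [("tavily", spent), ("brave", healthy)]
)
print(provider)  # brave
```

Ordering the list by cost, cheapest or freest first, turns the same function into a simple spend-minimizing policy.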
Common Gotchas
Tavily: Watch the search_depth Default
The Python SDK defaults to search_depth="basic" (1 credit) but the LangChain wrapper has at times defaulted to advanced (2 credits). With a 1,000-credit free tier, this halves your usable quota if you do not notice. Always pass search_depth explicitly.
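A thin wrapper can make the depth explicit on every call and keep a local tally of credits. The sketch below assumes only the per-call costs quoted in this guide (1 for basic, 2 for advanced). The class name and the budget check are illustrative, and the tally is a local estimate, not the provider's own accounting.

```python
CREDIT_COST = {"basic": 1, "advanced": 2}  # per-call costs as quoted in this guide


class CreditCountingSearch:
    """Wrap a search callable so search_depth is always explicit and credits are tallied."""

    def __init__(self, search_fn, monthly_budget=1000):
        self._search_fn = search_fn
        self.monthly_budget = monthly_budget
        self.used = 0  # local estimate; the dashboard remains the source of truth

    def search(self, query, search_depth="basic", **kwargs):
        cost = CREDIT_COST[search_depth]  # KeyError on an unknown depth
        if self.used + cost > self.monthly_budget:
            raise RuntimeError(f"credit budget exceeded: {self.used}/{self.monthly_budget}")
        self.used += cost
        # search_depth is passed explicitly, so a wrapper-level default cannot change it.
        return self._search_fn(query=query, search_depth=search_depth, **kwargs)


# Stand-in for a real client method such as TavilyClient.search.
def fake_search(query, search_depth, **kwargs):
    return {"query": query, "depth": search_depth}


counted = CreditCountingSearch(fake_search, monthly_budget=3)
counted.search("llm benchmarks")                           # 1 credit
counted.search("llm benchmarks", search_depth="advanced")  # 2 credits
print(counted.used)  # 3
```

With the real SDK, passing client.search from the earlier Tavily example in place of fake_search should work the same way, since that method accepts query and search_depth as keyword arguments.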
Brave: Parallel Fan-Out Will Get You 429’d
The free tier caps you at exactly 1 query per second. If your LangGraph workflow does asyncio.gather() over 5 sub-queries at once, four of them come back as 429s. Either throttle the calls (see the Python client above) or upgrade to the paid plan, which lifts the limit to 20 queries/second.
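The blocking client shown earlier does not help inside asyncio.gather(), because each coroutine would need to share one limiter. A minimal sketch of an asyncio-aware throttle follows; it is a minimum-interval limiter rather than a full token bucket, the class and function names are made up for this example, and fake_search stands in for a blocking call such as brave.search.

```python
import asyncio
import time


class AsyncThrottle:
    """Serialize callers so that calls start at least 1/rps seconds apart."""

    def __init__(self, rps: float = 1.0):
        self.min_interval = 1.0 / rps
        self._lock = asyncio.Lock()
        self._last = None  # monotonic time of the most recent permitted call

    async def wait(self):
        async with self._lock:
            if self._last is not None:
                delay = self.min_interval - (time.monotonic() - self._last)
                if delay > 0:
                    await asyncio.sleep(delay)
            self._last = time.monotonic()


def fake_search(q):
    # Stand-in for a blocking HTTP call; returns when it ran and what it searched.
    return time.monotonic(), q


async def fan_out(queries, rps=1.0):
    # Create the throttle inside the running loop so the lock binds to it.
    throttle = AsyncThrottle(rps)

    async def one(q):
        await throttle.wait()
        # Run the blocking call in a worker thread so the event loop stays free.
        return await asyncio.to_thread(fake_search, q)

    # gather() still launches everything at once; the throttle spaces the calls out.
    return await asyncio.gather(*(one(q) for q in queries))
```

Calling asyncio.run(fan_out(["q1", "q2", "q3"], rps=1)) takes roughly two seconds, since the second and third calls each wait their turn. Results come back in the same order as the queries.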
Exa: type="auto" Costs More Than You Think
Exa auto-selects between neural and keyword search, and neural costs more credit. If you know your query is keyword-heavy (“Anthropic blog post May 2026”), force type="keyword" to save credit. Save type="neural" for queries that benefit from semantic matching.
All Three: Cache Aggressively
An agent that asks “what is the latest Llama model” twenty times in a debugging session burns 20 credits on the same answer. A trivial in-memory or SQLite cache keyed by the query string saves more credit than any other optimization you will do. The cache TTL should be 1 hour for general queries, 15 minutes for news, 24 hours for stable reference material.
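A cache along those lines fits in about twenty lines. The sketch below is in-memory only and uses the three TTLs suggested above; the category names, the class name, and the injectable clock (there to make expiry testable) are assumptions for illustration.

```python
import time
from typing import Any, Callable

# TTLs in seconds, taken from the guidance above; the category names are assumptions.
TTL_SECONDS = {"general": 3600, "news": 900, "reference": 86400}


class SearchCache:
    """In-memory cache keyed by (normalized query, category) with a per-category TTL."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._store: dict = {}  # (query, category) -> (stored_at, result)

    def get_or_fetch(self, query: str, category: str, fetch: Callable[[str], Any]) -> Any:
        key = (query.strip().lower(), category)
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < TTL_SECONDS[category]:
            return hit[1]        # fresh entry: no API call, no credit spent
        result = fetch(query)    # miss or expired: call the provider once
        self._store[key] = (now, result)
        return result


# Twenty identical queries should cost one provider call.
calls = []


def fake_search(q):
    calls.append(q)
    return [f"result for {q}"]


cache = SearchCache()
for _ in range(20):
    cache.get_or_fetch("what is the latest Llama model", "general", fake_search)
print(len(calls))  # 1
```

Swapping the dict for a SQLite table keeps the same interface and lets the cache survive restarts.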
Pairing Search With a Free LLM
None of these search APIs do anything on their own — they feed text to an LLM that produces the actual user-facing answer. The cheapest production stack we have seen in 2026 pairs:
- Search: Tavily free (1,000 monthly) for general retrieval + Brave free (2,000 monthly) for news fan-out
- LLM: Free tier from Groq (14,400 requests/day), Gemini (1M token context), or Together AI (Llama 3.3 70B free tier)
- Orchestration: CrewAI for multi-agent flows, LangGraph for stateful workflows, or a vanilla function-calling loop for simple cases
- Observability: Langfuse self-hosted or Hobby tier to trace every search call and LLM call
The total monthly bill at hobby scale, with all of the above: $0. The total at small production scale (a few hundred daily users): typically $30-80, almost all of it search-API overage above the free tiers.
Frequently Asked Questions
Can I use these search APIs for commercial products?
Yes — all three offer commercial use on every tier including the free one. Read each provider’s Terms of Service for redistribution restrictions (typically you cannot resell raw search results as a competing search engine, but you can use them in any agent or end-product feature).
What about SerpAPI / ScraperAPI / SearXNG?
SerpAPI is the long-standing Google-results scraper used by many older LangChain examples. It starts at $75/month with only a 100-search trial — fine for production, expensive for prototyping. ScraperAPI is similar. SearXNG is a self-hosted metasearch aggregator — free if you host it, but the throughput and stability depend on your hosting and on upstream search engines not rate-limiting your IP.
Does Google offer a free search API in 2026?
No public, generally-available one. Google Custom Search JSON API has a free tier of 100 queries/day, but it is limited to “site search” on a list of domains you specify in advance — it is not a general web search API. Google’s Vertex AI Search is enterprise-only.
Which one works best in MCP setups?
Tavily’s official mcp-server-tavily is the most polished and the one Anthropic uses in its example MCP configs. Exa’s exa-mcp-server is also official and adds the find_similar tool which is uniquely useful inside Claude Desktop. Brave has only community-maintained MCP servers but they work fine.
Can I use these inside an MCP server I build myself?
Yes — all three are just HTTP APIs. Wrap whichever one you prefer in an MCP tool definition and your assistant inherits web search capability. See our MCP explainer for the full server pattern.
How do I know I am hitting the free-tier ceiling?
Tavily and Exa both expose usage on their dashboards in near-real time. Brave shows usage on the dashboard with a 5-10 minute delay. All three return a structured error with the rate-limit headers (x-ratelimit-remaining, retry-after) on 429 responses — log those headers in your client so you can alert before you hit the cap rather than after.
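Reading those headers takes only a small helper. The sketch below parses the two header names mentioned above from any mapping of response headers. It assumes a header value may carry several comma-separated numbers (one per rate-limit window) and that the last one is the longest window; both are assumptions to check against each provider's documentation.

```python
def parse_rate_limit(headers):
    """Return (remaining, retry_after) from a mapping of response headers.

    remaining is a list because a header may carry several comma-separated
    values, one per rate-limit window; retry_after is in seconds, or None.
    """
    lowered = {k.lower(): v for k, v in headers.items()}
    remaining = [
        int(part)
        for part in lowered.get("x-ratelimit-remaining", "").split(",")
        if part.strip().isdigit()
    ]
    retry = lowered.get("retry-after", "").strip()
    return remaining, (int(retry) if retry.isdigit() else None)


def quota_low(headers, threshold=50, window=-1):
    """True when the chosen window has `threshold` or fewer calls left.

    window=-1 assumes the last value is the longest (e.g. monthly) window.
    """
    remaining, _ = parse_rate_limit(headers)
    return bool(remaining) and remaining[window] <= threshold


sample = {"X-RateLimit-Remaining": "0, 1999", "Retry-After": "1"}
print(parse_rate_limit(sample))  # ([0, 1999], 1)
print(quota_low(sample))         # False
```

Calling quota_low(response.headers) after every request, not only on 429s, is what lets you alert before the cap rather than after.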
Is there a single “best free search API”?
No, and any article that claims one is gaming a keyword. For LLM-consumed agent retrieval, Tavily is the cleanest default. For independent index and high volume, Brave wins. For semantic search and find-similar, Exa is the only real option. The “search router” pattern earlier in this guide is the answer when you cannot pick.
Bottom Line
The free-search-API market in 2026 has stabilized into three genuinely useful options, each with a clear specialty. Pick by access pattern, not by raw quota:
- Building an agent that needs search? Start with Tavily. The clean text output and the monthly 1,000-credit refresh make it the lowest-friction first integration.
- Need cheap volume? Add Brave. 2,000 free queries plus the cheapest paid tier in the market mean it is the natural second provider when Tavily runs out.
- Doing research or similarity work? Reach for Exa. Neural and find-similar are unique capabilities the other two simply do not offer.
Wire up the router pattern, cache aggressively, and you can run a production-grade agent with web search capability for $0/month at hobby scale and a predictable $30-80 at small production scale. Combined with a free LLM tier from Groq or Gemini, that is a complete agent stack that costs nothing meaningful until you actually have users.
Related Reads
- CrewAI: Free Open-Source Multi-Agent AI Framework for Python — the most natural framework to pair with these search APIs
- MCP (Model Context Protocol): Connect AI Agents to Any Tool or API — how all three providers ship MCP servers
- CrewAI vs AutoGPT vs LangGraph: Which Free Agent Framework Should You Use in 2026? — choosing the orchestrator that consumes your search results
- Langfuse: Free Open-Source LLM Observability — trace every search call and LLM call your agent makes
- Groq API: The Fastest Free AI API in 2026 — the LLM half of the free agent stack
Originally published at toolfreebie.com.
