Ollama's Python library now includes native web search capabilities. With just a few lines of code, you can augment your local LLMs with real-time information from the web, reducing hallucinations and improving accuracy.
Getting Started
How do I install the Ollama Python library for web search? Install version 0.6.0 or higher using pip install 'ollama>=0.6.0'. This version includes the web_search and web_fetch functions.
pip install 'ollama>=0.6.0'
For managing Python environments and packages, consider using uv, a fast Python package manager, or set up a virtual environment using venv to keep your dependencies isolated.
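For example, here is one possible setup with either tool (the uv commands assume uv is already installed; paths shown for macOS/Linux):

# Option 1: uv (creates a virtual environment and installs in one step)
uv venv
uv pip install 'ollama>=0.6.0'

# Option 2: standard library venv
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install 'ollama>=0.6.0'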
Create an API key from your Ollama account and set it as an environment variable:
export OLLAMA_API_KEY="your_api_key"
On Windows PowerShell:
$env:OLLAMA_API_KEY = "your_api_key"
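Before making any calls, it can help to verify the key is actually visible to your Python process. This quick check uses only the standard library and is not part of the Ollama API:

import os

# web_search and web_fetch authenticate via the OLLAMA_API_KEY environment
# variable, so fail fast with a clear message if it isn't set
if not os.environ.get("OLLAMA_API_KEY"):
    raise RuntimeError("OLLAMA_API_KEY is not set")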
Basic Web Search
The simplest way to search the web with Ollama:
import ollama
# Simple web search
response = ollama.web_search("What is Ollama?")
print(response)
Output:
results = [
  {
    "title": "Ollama",
    "url": "https://ollama.com/",
    "content": "Cloud models are now available in Ollama..."
  },
  {
    "title": "What is Ollama? Features, Pricing, and Use Cases",
    "url": "https://www.walturn.com/insights/what-is-ollama",
    "content": "Our services..."
  },
  {
    "title": "Complete Ollama Guide: Installation, Usage & Code Examples",
    "url": "https://collabnix.com/complete-ollama-guide",
    "content": "Join our Discord Server..."
  }
]
Controlling Result Count
import ollama

# Get more results
response = ollama.web_search("latest AI news", max_results=10)

for result in response.results:
    print(f"📌 {result.title}")
    print(f"   {result.url}")
    print(f"   {result.content[:100]}...")
    print()
Fetching Full Page Content
What is web_search vs web_fetch in Ollama Python? web_search queries the internet and returns multiple search results with titles, URLs, and snippets. web_fetch retrieves the full content of a specific URL, returning the page title, markdown content, and links. The markdown content returned by web_fetch is perfect for further processing—if you need to convert HTML to markdown in other contexts, see our guide on converting HTML to Markdown with Python.
from ollama import web_fetch
result = web_fetch('https://ollama.com')
print(result)
Output:
WebFetchResponse(
    title='Ollama',
    content='[Cloud models](https://ollama.com/blog/cloud-models) are now available in Ollama\n\n**Chat & build with open models**\n\n[Download](https://ollama.com/download) [Explore models](https://ollama.com/models)\n\nAvailable for macOS, Windows, and Linux',
    links=['https://ollama.com/', 'https://ollama.com/models', 'https://github.com/ollama/ollama']
)
Combining Search and Fetch
A common pattern is to search first, then fetch full content from relevant results:
from ollama import web_search, web_fetch

# Search for information
search_results = web_search("Ollama new features 2025")

# Fetch full content from the first result
if search_results.results:
    first_url = search_results.results[0].url
    full_content = web_fetch(first_url)
    print(f"Title: {full_content.title}")
    print(f"Content: {full_content.content[:500]}...")
    print(f"Links found: {len(full_content.links)}")
Building a Search Agent
Which Python models work best for Ollama search agents? Models with strong tool-use capabilities work best, including qwen3, gpt-oss, and cloud models like qwen3:480b-cloud and deepseek-v3.1-cloud. For more advanced use cases requiring structured outputs from these models, check out our guide on LLMs with Structured Output using Ollama and Qwen3.
First, pull a capable model:
ollama pull qwen3:4b
Simple Search Agent
Here's a basic search agent that can autonomously decide when to search:
from ollama import chat, web_fetch, web_search

available_tools = {'web_search': web_search, 'web_fetch': web_fetch}

messages = [{'role': 'user', 'content': "what is ollama's new engine"}]

while True:
    response = chat(
        model='qwen3:4b',
        messages=messages,
        tools=[web_search, web_fetch],
        think=True
    )
    if response.message.thinking:
        print('🧠 Thinking:', response.message.thinking[:200], '...')
    if response.message.content:
        print('💬 Response:', response.message.content)
    messages.append(response.message)
    if response.message.tool_calls:
        print('🔧 Tool calls:', response.message.tool_calls)
        for tool_call in response.message.tool_calls:
            function_to_call = available_tools.get(tool_call.function.name)
            if function_to_call:
                args = tool_call.function.arguments
                result = function_to_call(**args)
                print('📥 Result:', str(result)[:200], '...')
                # Truncate result to fit limited context lengths
                messages.append({
                    'role': 'tool',
                    'content': str(result)[:2000 * 4],
                    'tool_name': tool_call.function.name
                })
            else:
                messages.append({
                    'role': 'tool',
                    'content': f'Tool {tool_call.function.name} not found',
                    'tool_name': tool_call.function.name
                })
    else:
        break
How do I handle large web search results in Python? Truncate results to fit context limits. The recommended approach is to slice the result string to approximately 8000 characters (2000 tokens × 4 chars) before passing to the model.
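As a minimal sketch, that advice reduces to a small helper (the helper name and the exact 8000-character cap are illustrative, not part of the library):

def truncate_for_context(result, max_chars: int = 8000) -> str:
    """Clip a tool result to roughly 2000 tokens, assuming ~4 chars per token."""
    text = str(result)
    return text if len(text) <= max_chars else text[:max_chars] + "... [truncated]"

Pass the clipped string as the tool message content, e.g. truncate_for_context(result) instead of str(result).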
Advanced Search Agent with Error Handling
Here's an enhanced version with better error handling:
from ollama import chat, web_fetch, web_search


class SearchAgent:
    def __init__(self, model: str = 'qwen3:4b'):
        self.model = model
        self.tools = {'web_search': web_search, 'web_fetch': web_fetch}
        self.messages = []
        self.max_iterations = 10

    def query(self, question: str) -> str:
        self.messages = [{'role': 'user', 'content': question}]
        for iteration in range(self.max_iterations):
            try:
                response = chat(
                    model=self.model,
                    messages=self.messages,
                    tools=[web_search, web_fetch],
                    think=True
                )
            except Exception as e:
                return f"Error during chat: {e}"
            self.messages.append(response.message)
            # If no tool calls, we have a final answer
            if not response.message.tool_calls:
                return response.message.content or "No response generated"
            # Execute tool calls
            for tool_call in response.message.tool_calls:
                result = self._execute_tool(tool_call)
                self.messages.append({
                    'role': 'tool',
                    'content': result,
                    'tool_name': tool_call.function.name
                })
        return "Max iterations reached without final answer"

    def _execute_tool(self, tool_call) -> str:
        func_name = tool_call.function.name
        args = tool_call.function.arguments
        if func_name not in self.tools:
            return f"Unknown tool: {func_name}"
        try:
            result = self.tools[func_name](**args)
            # Truncate for context limits
            result_str = str(result)
            if len(result_str) > 8000:
                result_str = result_str[:8000] + "... [truncated]"
            return result_str
        except Exception as e:
            return f"Tool error: {e}"


# Usage
agent = SearchAgent(model='qwen3:4b')
answer = agent.query("What are the latest features in Ollama?")
print(answer)
Async Web Search
Can I use Ollama Python web search with async code? Yes, the Ollama Python library supports async operations. Use AsyncClient for non-blocking web search and fetch operations in async applications. For performance comparisons between Python and other languages in serverless contexts, see our analysis of AWS Lambda performance across JavaScript, Python, and Golang.
import asyncio
from ollama import AsyncClient


async def async_search():
    client = AsyncClient()
    # Perform multiple searches concurrently
    tasks = [
        client.web_search("Ollama features"),
        client.web_search("local LLM tools"),
        client.web_search("AI search agents"),
    ]
    results = await asyncio.gather(*tasks)
    for i, result in enumerate(results):
        print(f"Search {i + 1}:")
        for r in result.results[:2]:
            print(f"  - {r.title}")
        print()


# Run async search
asyncio.run(async_search())
Async Search Agent
import asyncio
from ollama import AsyncClient


async def async_research_agent(question: str):
    client = AsyncClient()
    messages = [{'role': 'user', 'content': question}]
    while True:
        response = await client.chat(
            model='qwen3:4b',
            messages=messages,
            tools=[client.web_search, client.web_fetch],
        )
        messages.append(response.message)
        if not response.message.tool_calls:
            return response.message.content
        # Build one coroutine per tool call
        tool_names = []
        tool_coros = []
        for tool_call in response.message.tool_calls:
            if tool_call.function.name == 'web_search':
                tool_coros.append(client.web_search(**tool_call.function.arguments))
            elif tool_call.function.name == 'web_fetch':
                tool_coros.append(client.web_fetch(**tool_call.function.arguments))
            else:
                continue
            tool_names.append(tool_call.function.name)
        # Execute tool calls concurrently
        results = await asyncio.gather(*tool_coros)
        for tool_name, result in zip(tool_names, results):
            messages.append({
                'role': 'tool',
                'content': str(result)[:8000],  # truncate for context limits
                'tool_name': tool_name
            })


# Run
answer = asyncio.run(async_research_agent("What's new in Python 3.13?"))
print(answer)
Context Length and Performance
What context length should I set for Python search agents? Set context length to approximately 32000 tokens for reasonable performance. Search agents work best with full context length since web_search and web_fetch can return thousands of tokens.
from ollama import chat, web_search

# Set higher context for search-heavy tasks
response = chat(
    model='qwen3:4b',
    messages=[{'role': 'user', 'content': 'Research the latest AI developments'}],
    tools=[web_search],
    options={
        'num_ctx': 32768,  # 32K context
    }
)
MCP Server Integration
Ollama provides a Python MCP server that enables web search in any MCP client. For a comprehensive guide on building MCP servers in Python with web search and scraping capabilities, see our detailed tutorial on Building MCP Servers in Python.
Cline Integration
Configure MCP servers in Cline settings:
Manage MCP Servers → Configure MCP Servers → Add:
{
  "mcpServers": {
    "web_search_and_fetch": {
      "type": "stdio",
      "command": "uv",
      "args": ["run", "path/to/web-search-mcp.py"],
      "env": { "OLLAMA_API_KEY": "your_api_key_here" }
    }
  }
}
Codex Integration
Add to ~/.codex/config.toml:
[mcp_servers.web_search]
command = "uv"
args = ["run", "path/to/web-search-mcp.py"]
env = { "OLLAMA_API_KEY" = "your_api_key_here" }
Creating Your Own MCP Server
#!/usr/bin/env python3
"""Simple MCP server for Ollama web search."""
# Uses FastMCP from the official MCP Python SDK ('mcp' package)
from mcp.server.fastmcp import FastMCP

from ollama import web_fetch, web_search

mcp = FastMCP("ollama-web-search")


@mcp.tool()
def search_web(query: str, max_results: int = 5) -> str:
    """Search the web for information."""
    results = web_search(query, max_results=max_results)
    output = []
    for r in results.results:
        output.append(f"**{r.title}**\n{r.url}\n{r.content}\n")
    return "\n---\n".join(output)


@mcp.tool()
def fetch_page(url: str) -> str:
    """Fetch the full content of a web page."""
    result = web_fetch(url)
    return f"# {result.title}\n\n{result.content}"


if __name__ == "__main__":
    mcp.run()  # defaults to stdio transport
Practical Examples
These examples demonstrate real-world applications of Ollama's web search API. You can extend these patterns to build more complex systems—for instance, combining search results with PDF generation in Python to create research reports.
News Summarizer
from ollama import chat, web_search


def summarize_news(topic: str) -> str:
    # Search for recent news
    results = web_search(f"{topic} latest news", max_results=5)
    # Format search results for the model
    news_content = "\n\n".join([
        f"**{r.title}**\n{r.content}"
        for r in results.results
    ])
    # Ask the model to summarize
    response = chat(
        model='qwen3:4b',
        messages=[{
            'role': 'user',
            'content': f"Summarize these news items about {topic}:\n\n{news_content}"
        }]
    )
    return response.message.content


summary = summarize_news("artificial intelligence")
print(summary)
Research Assistant
from dataclasses import dataclass

from ollama import chat, web_fetch, web_search


@dataclass
class ResearchResult:
    question: str
    sources: list
    answer: str


def research(question: str) -> ResearchResult:
    # Search for relevant information
    search_results = web_search(question, max_results=3)
    # Fetch full content from top sources
    sources = []
    full_content = []
    for result in search_results.results[:3]:
        try:
            page = web_fetch(result.url)
            sources.append(result.url)
            full_content.append(f"Source: {result.url}\n{page.content[:2000]}")
        except Exception:
            # Skip pages that fail to fetch
            continue
    # Generate comprehensive answer
    context = "\n\n---\n\n".join(full_content)
    response = chat(
        model='qwen3:4b',
        messages=[{
            'role': 'user',
            'content': f"""Based on the following sources, answer this question: {question}

Sources:
{context}

Provide a comprehensive answer with citations to the sources."""
        }]
    )
    return ResearchResult(
        question=question,
        sources=sources,
        answer=response.message.content
    )


# Usage
result = research("How does Ollama's new model scheduling work?")
print(f"Question: {result.question}")
print(f"Sources: {result.sources}")
print(f"Answer: {result.answer}")
Recommended Models
| Model | Parameters | Best For |
|---|---|---|
| qwen3:4b | 4B | Quick local searches |
| qwen3 | 8B | General purpose agent |
| gpt-oss | Various | Research tasks |
| qwen3:480b-cloud | 480B | Complex reasoning (cloud) |
| gpt-oss:120b-cloud | 120B | Long-form research (cloud) |
| deepseek-v3.1-cloud | - | Advanced analysis (cloud) |
Best Practices
- Truncate Results: Always truncate web results to fit context limits (~8000 chars)
- Error Handling: Wrap tool calls in try/except for network failures (see the sketch after this list)
- Rate Limiting: Respect Ollama's API rate limits for web search
- Context Length: Use ~32000 tokens for search agents
- Async for Scale: Use AsyncClient for concurrent operations
- Testing: Write unit tests for your search agents to ensure reliability
- Python Basics: Keep a Python cheatsheet handy for quick reference on syntax and common patterns
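To tie the first three practices together, here is a minimal sketch of a search wrapper with truncation, error handling, and a crude backoff (the function name, retry count, and backoff schedule are illustrative, not part of the Ollama API):

import time

from ollama import web_search


def safe_search(query: str, retries: int = 2, max_chars: int = 8000) -> str:
    """Run web_search with basic retry, exponential backoff, and truncation."""
    for attempt in range(retries + 1):
        try:
            result = web_search(query)
            return str(result)[:max_chars]  # keep within context limits
        except Exception as e:
            if attempt == retries:
                return f"Search failed after {retries + 1} attempts: {e}"
            time.sleep(2 ** attempt)  # back off to stay under rate limits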
Useful Links
- Ollama Web Search Blog Post
- Ollama Python Library
- Ollama Official Documentation
- Ollama GitHub Repository
- Python Cheatsheet
- Converting HTML to Markdown with Python: A Comprehensive Guide
- Building MCP Servers in Python: WebSearch & Scrape
- LLMs with Structured Output: Ollama, Qwen3 & Python or Go
- Unit Testing in Python
- AWS lambda performance: JavaScript vs Python vs Golang
- venv Cheatsheet
- uv - Python Package, Project, and Environment Manager
- Generating PDF in Python - Libraries and examples