DEV Community

Martin Simons

The Only Honest Guide to Web Search in Ollama

Every other tutorial gets this wrong. Here's what actually works.


I wanted to add web search to my local Ollama setup. I found a Medium article
that looked promising. It used DuckDuckGo's free API. I followed it exactly.

It returned empty results. Every time. For every query.

So I spent two days testing every approach on a real Debian 13 server with
Ollama 0.18.2. This is what I found.


First: The DuckDuckGo Lie

Every tutorial recommends this:

import requests

response = requests.get(
    "https://api.duckduckgo.com/",
    params={"q": query, "format": "json"}
)

Here's what it actually returns:

{
  "Abstract": "",
  "AbstractText": "",
  "RelatedTopics": [],
  "Results": []
}

Empty. I verified it with curl:

curl "https://api.duckduckgo.com/?q=ollama+latest+version&format=json&no_html=1"

The meta section reveals why — it's a test endpoint in "development" state,
marked offline. DuckDuckGo's free API is not a real search API.
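You can confirm the problem without any tooling. Here's a minimal sketch (the helper name is mine) that checks whether an Instant Answer payload carries anything usable:

```python
import json

# Sample payload matching what api.duckduckgo.com returns for most queries
sample = '{"Abstract": "", "AbstractText": "", "RelatedTopics": [], "Results": []}'

def has_real_results(payload: str) -> bool:
    """Return True only if the Instant Answer response carries usable content."""
    data = json.loads(payload)
    return bool(data.get("AbstractText") or data.get("Results") or data.get("RelatedTopics"))

print(has_real_results(sample))  # False, for virtually every query
```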


What Actually Works

After testing every option, here's the honest comparison:

Backend               API Key       Real Results   Private   Cost
DuckDuckGo free API   Not needed    ❌ Empty       ❌        Free
SearXNG               Self-hosted   ✅ Full        ✅        Free
Google Custom Search  Required      ✅ Excellent   ❌        100/day free
Bing Search API       Required      —              ❌        1000/month free
OpenClaw built-in     —             ⚠️             —         Free

Winner: SearXNG. Self-hosted, private, no API key, real results from
multiple engines simultaneously.


The Timeout Problem Nobody Mentions

The second mistake most tutorials make: no streaming.

# This will timeout on any real query
response = requests.post(OLLAMA_API, json=payload, timeout=60)

Ollama needs time to think, search, and generate. 60 seconds isn't enough.
The solution is streaming — you start receiving output immediately:

with requests.post(OLLAMA_API, json=payload, stream=True) as response:
    for line in response.iter_lines():
        chunk = json.loads(line)
        content = chunk.get("message", {}).get("content", "")
        if content:
            print(content, end="", flush=True)

No timeout. Real-time output. Works every time.
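To see why this pattern can't stall on a long generation, here is the same NDJSON handling run against simulated chunks (the sample lines are mine, shaped like the objects /api/chat streams, one JSON object per line):

```python
import json

# Simulated NDJSON chunks in the shape Ollama's /api/chat streams them
chunks = [
    '{"message": {"role": "assistant", "content": "Hel"}, "done": false}',
    '{"message": {"role": "assistant", "content": "lo!"}, "done": false}',
    '{"message": {"role": "assistant", "content": ""}, "done": true}',
]

answer = ""
for line in chunks:
    obj = json.loads(line)
    # Each chunk arrives as soon as the model emits it, so the client is
    # never waiting on one giant response body.
    answer += obj.get("message", {}).get("content", "")
    if obj.get("done"):
        break

print(answer)  # Hello!
```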


Which Model Should You Use?

Not all Ollama models support tool calling. Tested on Ollama 0.18.2:

Model             Tool Calling   Speed on CPU
llama3.1:8B       ✅ Reliable    ~2-5 min
qwen3:8B          ✅ Excellent   ⚠️ 16+ min*
mistral-nemo:12b  ⚠️ Moderate    ~3-7 min
deepseek-r1:14b   ⚠️ Limited     Very slow

*qwen3 enables extended reasoning ("thinking mode") by default. On a CPU-only
server this caused a 16-minute hang for a simple web search query, consuming
21 GB of RAM. On a GPU it's the best choice; on CPU, use llama3.1.
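If you do need qwen3 on CPU, recent Ollama releases accept a top-level think flag on /api/chat that skips the reasoning phase. A sketch of the payload (verify your Ollama version supports the flag):

```python
# Hedged sketch: the "think" flag suppresses extended reasoning on
# thinking-capable models such as qwen3.
payload = {
    "model": "qwen3:8b",
    "messages": [{"role": "user", "content": "What is the latest Ollama version?"}],
    "stream": True,
    "think": False,  # skip the reasoning phase that caused the 16-minute CPU hang
}
```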


Setting Up SearXNG

# Start SearXNG
docker run -d \
  --name searxng \
  --restart always \
  -p 8080:8080 \
  -e BASE_URL=http://localhost:8080 \
  searxng/searxng:latest

SearXNG needs JSON format enabled — it's off by default:

docker cp searxng:/etc/searxng/settings.yml ./settings.yml

Add json to the formats list under search:, and disable the rate limiter under server::

search:
  formats:
    - html
    - json

server:
  limiter: false

Restart with config mounted:

docker stop searxng && docker rm searxng
docker run -d --name searxng --restart always \
  -p 8080:8080 \
  -e BASE_URL=http://localhost:8080 \
  -v $(pwd)/settings.yml:/etc/searxng/settings.yml \
  searxng/searxng:latest

Verify:

curl "http://localhost:8080/search?q=ollama&format=json" | jq .results[0].title
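The same check works from Python. This sketch separates out the parsing so it can be tested offline (the helper name and sample body are mine; real calls go to http://localhost:8080/search with format=json):

```python
import json

def first_title(searxng_json: str) -> str:
    """Pull the first result title from a SearXNG JSON response body."""
    results = json.loads(searxng_json).get("results", [])
    return results[0]["title"] if results else "No results"

# Sample shaped like SearXNG's JSON output
sample = '{"results": [{"title": "Ollama", "url": "https://ollama.com", "content": "..."}]}'
print(first_title(sample))  # Ollama
```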

The Working Python Example

import json, requests, sys

OLLAMA_API = "http://localhost:11434/api/chat"
SEARXNG_URL = "http://localhost:8080/search"
MODEL = "llama3.1:latest"

def web_search(query: str) -> str:
    response = requests.get(SEARXNG_URL,
        params={"q": query, "format": "json", "language": "en"},
        timeout=10)
    results = response.json().get("results", [])[:5]
    return "\n\n".join(
        f"[{i+1}] {r['title']}\n    {r.get('content', '')}"
        for i, r in enumerate(results)
    ) or "No results found."

def stream_response(messages, tools=None):
    payload = {"model": MODEL, "messages": messages, "stream": True}
    if tools: payload["tools"] = tools
    tool_calls = []
    with requests.post(OLLAMA_API, json=payload, stream=True, timeout=30) as r:
        for line in r.iter_lines():
            if not line: continue
            chunk = json.loads(line)
            msg = chunk.get("message", {})
            if msg.get("tool_calls"): tool_calls.extend(msg["tool_calls"])
            if msg.get("content"): print(msg["content"], end="", flush=True)
            if chunk.get("done"): break
    return tool_calls

def chat(question: str):
    tools = [{"type": "function", "function": {
        "name": "web_search",
        "description": "Search the web for current information",
        "parameters": {"type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"]}
    }}]
    messages = [{"role": "user", "content": question}]
    print(f"\n❓ {question}\n🤔 ", end="", flush=True)
    tool_calls = stream_response(messages, tools)
    if not tool_calls: return
    messages.append({"role": "assistant", "content": "", "tool_calls": tool_calls})
    for tc in tool_calls:
        query = tc["function"]["arguments"].get("query", "")
        print(f"\n\n🔍 {query}")
        messages.append({"role": "tool", "content": web_search(query)})
        print("📄")
    print("\n💬 ", end="", flush=True)
    stream_response(messages)

chat(sys.argv[1] if len(sys.argv) > 1 else "What is the latest version of Ollama?")

Run it:

python3 websearch_searxng.py "What is the weather in Nijmegen today?"

Output:

❓ What is the weather in Nijmegen today?
🤔
🔍
📄
💬
According to AccuWeather, it will be cloudy with a chance of rain today...

It works. Unlike every other tutorial.


What About Ollama's Built-in Web Search?

Ollama 0.18 introduced OpenClaw integration with built-in web search.
The reality is more nuanced than the release notes suggest:

  • Web search lives inside OpenClaw — a coding assistant, not a standalone feature
  • Requires Node.js 22.12+ (not documented)
  • Must install openclaw under nvm's npm, not system npm
  • Requires psmisc package for the --force flag (sudo apt install psmisc)
  • Gateway must be started before ollama launch
  • Not recommended for CPU-only servers — session auto-resume causes runaway inference

For developers who want web search in their own applications, the API +
SearXNG approach gives full control. OpenClaw is a UI tool, not a programmable API.


The Full Code

All examples — Python, Node.js, and shell/curl — are on GitHub:
github.com/Webhuis/ollama-websearch

Includes:

  • SearXNG setup with Docker
  • Google Custom Search API integration
  • Bing Search API integration
  • Streaming responses in all three languages
  • Production tips for multi-user servers
  • systemd service for SearXNG auto-start

Tested on a real Debian 13 server. No VMs, no Docker-in-Docker, no assumptions.
If something doesn't work, open an issue.


Update: Ollama's Native Web Search API

Since this guide was written, Ollama has added a native web search API
available via their cloud service:

import ollama

# Requires an API key from Ollama's cloud service (OLLAMA_API_KEY)
results = ollama.web_search("what is ollama?")

Key differences from our SearXNG approach:

                Ollama Cloud Search   Our SearXNG Approach
API Key         ✅ Required           ❌ Not needed
Privacy         ❌ Cloud              ✅ Fully local
Cost            Paid tier             Free
Control         ❌ Black box          ✅ Full control
Works offline   ❌ No                 ✅ Yes

For privacy-first, offline, or production deployments — SearXNG remains
the better choice.
