This article was originally published on aifoss.dev
TL;DR: Open WebUI Pipelines is a Python middleware server that runs between Open WebUI and your LLM, adding custom logic without touching Open WebUI's source code. It deploys in one docker run command. Most users running Open WebUI never touch it — which means most users are missing the feature that turns a personal chat UI into a multi-user platform with web access and programmable behavior.
After this guide you'll have:
- A Pipelines server connected to Open WebUI v0.9.5, running alongside Ollama
- A working web search pipeline that pulls live results before your LLM responds
- A rate limit filter that caps requests per user for shared team deployments
What Pipelines Actually Is
Pipelines is not a plugin that loads inside Open WebUI. It's a separate server — a transparent OpenAI API proxy running on port 9099. Open WebUI talks to it exactly like it talks to Ollama or any OpenAI-compatible backend. Pipelines does whatever you've programmed it to do, then either returns a result directly or forwards the (possibly modified) request to your actual LLM.
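Because Pipelines presents itself as an OpenAI-compatible server, you can also call it without Open WebUI in the loop. Here is a minimal standard-library sketch of such a request. It assumes the server is on `localhost:9099` with the default key and serves the usual `/v1/chat/completions` route. The model id you pass is whatever pipeline id your own server lists.

```python
import json
import urllib.request

PIPELINES_URL = "http://localhost:9099"  # assumption: default port used in this guide
API_KEY = "0p3n-w3bu!"  # default key; replace with your own


def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request aimed at Pipelines."""
    payload = {
        "model": model,  # a pipeline id, as reported by the server
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return urllib.request.Request(
        f"{PIPELINES_URL}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )
```

To send it, pass the request to `urllib.request.urlopen(req, timeout=120)` and read `choices[0].message.content` from the JSON reply, exactly as you would with any OpenAI-compatible backend.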
That architecture matters because it means:
- Pipelines logic runs server-side, not in the browser
- Users can't bypass it by switching clients
- You can maintain state across requests (request counters, caches, memory)
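The Pipelines process is long-lived, so a pipeline can keep ordinary Python objects on `self` between calls. Below is a minimal sketch of one kind of state: a time-limited cache that a search pipeline could use to avoid repeating identical queries. The `TTLCache` class is hypothetical and not part of Pipelines. Its contents live only in memory, so they vanish when the container restarts.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """In-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._store.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self.clock() - stored_at > self.ttl:
            del self._store[key]  # expired: drop it and report a miss
            return None
        return value

    def set(self, key: str, value: str) -> None:
        self._store[key] = (self.clock(), value)
```

A search pipeline could create one cache in `__init__`, call `get()` with the query before hitting the search API, and call `set()` with the formatted results afterward.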
There are two types of pipelines:
| Type | How it works | Use for |
|---|---|---|
| Filter | Wraps the request: `inlet()` → LLM → `outlet()` | Rate limiting, logging, system prompt injection, content filtering |
| Pipe | Replaces the LLM entirely; appears as a "model" in Open WebUI | Web search, custom RAG, wrapping non-OpenAI APIs |
A filter adds behavior around your existing model. A pipe is the model.
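The following minimal sketch shows the two shapes side by side, using the hook names from the table. It uses only the standard library. In a real pipeline file, each class would be named `Pipeline` and would also declare a pydantic `Valves` model, as the full examples in this guide do. The class names, system prompt, and echo behavior here are illustrative only.

```python
import asyncio
from typing import List, Optional


class FilterPipeline:
    """Filter shape: inlet() edits the request, outlet() sees the response."""

    def __init__(self):
        self.type = "filter"  # filters declare their type explicitly
        self.name = "System Prompt Filter"

    async def inlet(self, body: dict, user: Optional[dict] = None) -> dict:
        # Runs before the LLM: prepend a system message to the conversation
        body["messages"] = [
            {"role": "system", "content": "Answer concisely."},
            *body.get("messages", []),
        ]
        return body

    async def outlet(self, body: dict, user: Optional[dict] = None) -> dict:
        # Runs after the LLM has answered; this sketch passes the body through
        return body


class PipePipeline:
    """Pipe shape: pipe() produces the reply itself, so it is listed as a model."""

    def __init__(self):
        self.name = "Echo"

    def pipe(self, user_message: str, model_id: str, messages: List[dict], body: dict) -> str:
        # No upstream LLM here: the return value is the whole reply
        return f"You said: {user_message}"


if __name__ == "__main__":
    body = asyncio.run(FilterPipeline().inlet({"messages": [{"role": "user", "content": "hi"}]}))
    print(body["messages"][0]["content"])  # Answer concisely.
    print(PipePipeline().pipe("hi", "echo", [], {}))  # You said: hi
```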
Prerequisites
- Open WebUI v0.9.5 running (see the Ollama + Open WebUI Linux setup guide)
- Docker installed on the same host
- Ollama running on port 11434 (default)
- Python 3.11 only if you want to develop pipelines locally — Docker handles it otherwise
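Before wiring anything together, it can save debugging time to confirm two things: that Ollama answers on its default port, and that it already has the model you plan to use. Here is a small sketch against Ollama's `/api/tags` endpoint, which lists locally pulled models. The helper names are hypothetical.

```python
import json
import urllib.request
from typing import List

OLLAMA_URL = "http://localhost:11434"  # default Ollama port from the prerequisites


def model_names(tags_payload: dict) -> List[str]:
    """Pull model names (e.g. "llama3.2:3b") out of an /api/tags response."""
    return [m["name"] for m in tags_payload.get("models", [])]


def ollama_has_model(model: str) -> bool:
    """Return True if Ollama responds and lists the given model as pulled."""
    with urllib.request.urlopen(f"{OLLAMA_URL}/api/tags", timeout=5) as resp:
        return model in model_names(json.load(resp))
```

For example, `ollama_has_model("llama3.2:3b")` checks for the default model used by the web search example later in this guide. A connection error means Ollama is not reachable on that port.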
Step 1: Deploy the Pipelines Server
```bash
docker run -d \
  -p 9099:9099 \
  --add-host=host.docker.internal:host-gateway \
  -v pipelines:/app/pipelines \
  --name pipelines \
  --restart always \
  ghcr.io/open-webui/pipelines:main
```
Flag breakdown:
- `-p 9099:9099`: exposes the Pipelines API on your host
- `--add-host=host.docker.internal:host-gateway`: lets the container reach services on your host machine (Ollama, local APIs, SearXNG)
- `-v pipelines:/app/pipelines`: persists your pipeline Python files to a named Docker volume, so they survive container restarts and updates
Confirm it's alive:
```bash
curl http://localhost:9099/
# Any JSON reply, such as {"detail":"Not Found"}, means the server is responding
```
The default API key is `0p3n-w3bu!`. It's public knowledge: fine for localhost, not fine for anything network-accessible. Override it by adding `-e PIPELINES_API_KEY=your-actual-secret` to the `docker run` command.
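Any long random string makes a better key than the default. One way to generate one is Python's standard `secrets` module. The 32-byte size below is an arbitrary but common choice. Whatever value you pick goes into both the container's environment and Open WebUI's connection settings.

```python
import secrets


def generate_api_key(num_bytes: int = 32) -> str:
    """Return a random, URL-safe string suitable for use as a bearer token."""
    return secrets.token_urlsafe(num_bytes)


if __name__ == "__main__":
    print(generate_api_key())
```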
Step 2: Connect Pipelines to Open WebUI
- Open WebUI → Admin Panel → Settings → Connections
- Click + to add a new OpenAI-compatible connection
- API URL: `http://localhost:9099`
  - If Open WebUI itself runs in Docker, use `http://host.docker.internal:9099` instead
- API key: `0p3n-w3bu!` (or whatever you set)
- Save, then refresh the page
Pipe-type pipelines now appear in Open WebUI's model picker. Filter-type pipelines appear under Admin Panel → Pipelines where you assign them to specific models or all models.
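If a pipeline you expect does not appear, you can ask the Pipelines server directly what it is serving. The minimal sketch below makes three assumptions: the server exposes the OpenAI-style `GET /v1/models` route that OpenAI-compatible backends provide, it listens on the default port, and it accepts the key as a bearer token. The helper names are hypothetical.

```python
import json
import urllib.request
from typing import List

PIPELINES_URL = "http://localhost:9099"  # assumption: default port from Step 1
API_KEY = "0p3n-w3bu!"  # replace with the key you configured


def parse_model_ids(payload: dict) -> List[str]:
    """Extract ids from an OpenAI-style model listing ({"data": [{"id": ...}]})."""
    return [entry["id"] for entry in payload.get("data", [])]


def list_pipeline_ids() -> List[str]:
    """Ask the Pipelines server which model ids it currently reports."""
    req = urllib.request.Request(
        f"{PIPELINES_URL}/v1/models",
        headers={"Authorization": f"Bearer {API_KEY}"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return parse_model_ids(json.load(resp))
```

The list returned by `list_pipeline_ids()` should include your pipe-type pipelines. An authentication error usually means the key in Open WebUI and the key on the server differ.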
Pipeline Example 1: Live Web Search
This pipe pipeline fetches search results and injects them as context before forwarding the query to your local Ollama model. The result: your LLM can answer questions about current events without any fine-tuning.
A word on DuckDuckGo: DDG's unofficial scraping API is the obvious free choice, but in 2026 it rate-limits hard — you hit 202 Ratelimit errors within a few queries from the same IP. It works for light personal use with delays between requests, but it's unreliable for a pipeline that runs on every message. The two practical alternatives are:
- Brave Search API — free tier, 2,000 queries/month, real JSON API
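- SearXNG (self-hosted, zero cost, no API quota of its own): swap the API call and you're done. First enable JSON output by adding `json` under `search.formats` in SearXNG's `settings.yml`; otherwise `format=json` requests are rejected with HTTP 403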
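Whichever provider you pick, free tiers throttle bursts, and DuckDuckGo only tolerates spaced-out requests. The following minimal sketch retries with exponential backoff, and a pipeline could wrap it around its search call. The helper name and the 1s/2s/4s schedule are illustrative choices. It retries on any exception, which you may want to narrow to rate-limit errors.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_backoff(
    call: Callable[[], T],
    retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(); on failure wait base_delay * 2**attempt seconds and retry."""
    for attempt in range(retries + 1):
        try:
            return call()
        except Exception:
            if attempt == retries:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError("retries must be >= 0")
```

Inside a pipe this might read `context = with_backoff(lambda: self._search_brave(user_message))`. The `sleep` parameter exists so tests can record delays instead of actually waiting.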
Create the file where Docker maps your pipelines volume. On a default install, find the path with:
```bash
docker inspect pipelines | grep -A5 Mounts
# Look for the "Source" path, typically /var/lib/docker/volumes/pipelines/_data
```
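`docker volume inspect` reports the same location as JSON under a `Mountpoint` key, which is easier to consume from a script than grep output. Here is a small sketch. It assumes the volume name `pipelines` from the `docker run` command and a user allowed to run Docker. The function names are hypothetical.

```python
import json
import subprocess


def parse_mountpoint(inspect_output: str) -> str:
    """Return the host path from `docker volume inspect` JSON (a list of volumes)."""
    return json.loads(inspect_output)[0]["Mountpoint"]


def find_mountpoint(volume: str = "pipelines") -> str:
    """Run `docker volume inspect` and return where the volume lives on the host."""
    result = subprocess.run(
        ["docker", "volume", "inspect", volume],
        capture_output=True,
        text=True,
        check=True,  # raise if the volume does not exist or docker is unavailable
    )
    return parse_mountpoint(result.stdout)
```

Writing into that directory usually requires root, since it sits under Docker's data root.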
Save this as web_search_pipeline.py in that directory:
```python
from typing import List

import requests
from pydantic import BaseModel


class Pipeline:
    class Valves(BaseModel):
        pipelines: List[str] = ["*"]
        search_api_key: str = ""  # Brave API key
        searxng_url: str = ""  # e.g. http://host.docker.internal:8080
        num_results: int = 5
        ollama_model: str = "llama3.2:3b"

    def __init__(self):
        self.name = "Web Search"
        self.valves = self.Valves()

    def _search_brave(self, query: str) -> str:
        headers = {
            "Accept": "application/json",
            "X-Subscription-Token": self.valves.search_api_key,
        }
        r = requests.get(
            "https://api.search.brave.com/res/v1/web/search",
            params={"q": query, "count": self.valves.num_results},
            headers=headers,
            timeout=10,
        )
        r.raise_for_status()  # fail loudly on a bad key or quota error
        results = r.json().get("web", {}).get("results", [])
        return "\n\n".join(
            f"**{res.get('title', '')}**\n{res.get('description', '')}\n{res.get('url', '')}"
            for res in results
        )

    def _search_searxng(self, query: str) -> str:
        r = requests.get(
            f"{self.valves.searxng_url}/search",
            params={"q": query, "format": "json"},
            timeout=10,
        )
        r.raise_for_status()  # a 403 here usually means JSON output is not enabled
        # SearXNG has no result-count parameter, so trim the list here
        results = r.json().get("results", [])[: self.valves.num_results]
        return "\n\n".join(
            f"**{res.get('title', '')}**\n{res.get('content', '')}\n{res.get('url', '')}"
            for res in results
        )

    # A plain def, as in the upstream example pipelines; the blocking
    # requests calls below gain nothing from async def
    def pipe(
        self,
        user_message: str,
        model_id: str,
        messages: List[dict],
        body: dict,
    ) -> str:
        if self.valves.searxng_url:
            context = self._search_searxng(user_message)
        elif self.valves.search_api_key:
            context = self._search_brave(user_message)
        else:
            return "Configure either searxng_url or search_api_key in Valves."

        # Forward to Ollama's OpenAI-compatible endpoint on the Docker host,
        # using requests so no extra client library has to be installed
        r = requests.post(
            "http://host.docker.internal:11434/v1/chat/completions",
            json={
                "model": self.valves.ollama_model,
                "stream": False,
                "messages": [
                    {
                        "role": "system",
                        "content": f"Answer using these current search results:\n\n{context}",
                    },
                    *messages,
                ],
            },
            timeout=120,
        )
        r.raise_for_status()
        return r.json()["choices"][0]["message"]["content"]
```
After saving, go to Admin Panel → Pipelines and click Refresh. "Web Search" appears. Configure the Valves (Brave key or SearXNG URL) through the UI — changes apply immediately without restarting anything.
Pipeline Example 2: Per-User Rate Limiting
If more than one person uses your Open WebUI instance, you need rate limits. Without them, one heavy user can queue up requests that lock everyone else out. This filter tracks requests per user ID with a sliding window:
Save as rate_limit_filter.py:
```python
from typing import List, Optional
from datetime import datetime, timedelta
from pydantic import BaseModel


class Pipeline:
    class Valves(BaseModel):
        pipelines: List[str] = ["*"]
        priority: int = 0
        requests_per_minute: Optional[int] = 10
        requests_per_hour: Optional[int] = 100

    def __init__(self):
        self.name = "Rate Limit Filter"
        self.type = "f
```
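The sliding-window bookkeeping this filter relies on can be sketched on its own. This minimal version assumes per-user timestamps are held in memory, so counts reset when the container restarts, and that only accepted requests count toward the limits. The `SlidingWindowLimiter` class is hypothetical and not part of Pipelines, though its two limits mirror the `requests_per_minute` and `requests_per_hour` Valves above.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Optional


class SlidingWindowLimiter:
    """Per-user limits over rolling one-minute and one-hour windows."""

    def __init__(
        self,
        per_minute: Optional[int] = 10,
        per_hour: Optional[int] = 100,
        clock: Callable[[], float] = time.time,
    ):
        self.per_minute = per_minute
        self.per_hour = per_hour
        self.clock = clock  # injectable so tests don't have to wait
        # user_id -> timestamps of accepted requests, oldest first
        self.history: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, user_id: str) -> bool:
        now = self.clock()
        stamps = self.history[user_id]
        # Forget requests older than the longest window (one hour)
        while stamps and now - stamps[0] >= 3600:
            stamps.popleft()
        last_minute = sum(1 for t in stamps if now - t < 60)
        if self.per_minute is not None and last_minute >= self.per_minute:
            return False
        if self.per_hour is not None and len(stamps) >= self.per_hour:
            return False
        stamps.append(now)  # only accepted requests count toward the limits
        return True
```

Inside the filter, `__init__` could create one limiter, and `inlet(self, body, user)` could call `allow(user["id"])` and raise an exception when it returns `False`. That stops the request before it reaches the model. Because the window slides, a user who hits the per-minute cap regains capacity as soon as their oldest request ages past 60 seconds, rather than waiting for a fixed reset time.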