Jovan Chan

Posted on • Originally published at aifoss.dev

Open WebUI Pipelines Guide 2026: Web Search, Rate Limiting, and Custom Logic for Your Local LLM

TL;DR: Open WebUI Pipelines is a Python middleware server that runs between Open WebUI and your LLM, adding custom logic without touching Open WebUI's source code. It deploys in one docker run command. Most users running Open WebUI never touch it — which means most users are missing the feature that turns a personal chat UI into a multi-user platform with web access and programmable behavior.

After this guide you'll have:

  • A Pipelines server connected to Open WebUI v0.9.5, running alongside Ollama
  • A working web search pipeline that pulls live results before your LLM responds
  • A rate limit filter that caps requests per user for shared team deployments

What Pipelines Actually Is

Pipelines is not a plugin that loads inside Open WebUI. It's a separate server — a transparent OpenAI API proxy running on port 9099. Open WebUI talks to it exactly like it talks to Ollama or any OpenAI-compatible backend. Pipelines does whatever you've programmed it to do, then either returns a result directly or forwards the (possibly modified) request to your actual LLM.

That architecture matters because it means:

  • Pipelines logic runs server-side, not in the browser
  • Users can't bypass it by switching clients
  • You can maintain state across requests (request counters, caches, memory)

There are two types of pipelines:

Type   | How it works                                                  | Use for
Filter | Wraps the request: inlet() → LLM → outlet()                   | Rate limiting, logging, system prompt injection, content filtering
Pipe   | Replaces the LLM entirely; appears as a "model" in Open WebUI | Web search, custom RAG, wrapping non-OpenAI APIs

A filter adds behavior around your existing model. A pipe is the model.
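
To make the difference concrete, here is a minimal sketch of both shapes. It assumes the inlet()/outlet() and pipe() method signatures used in the Pipelines example files; the class names FilterSketch and PipeSketch are hypothetical (a real pipeline file names its class Pipeline), and the injected system prompt is only an illustration.

```python
import asyncio
from typing import List, Optional


class FilterSketch:
    """Filter shape: wraps an existing model with inlet() and outlet() hooks."""

    def __init__(self):
        self.type = "filter"
        self.name = "System Prompt Filter (sketch)"

    async def inlet(self, body: dict, user: Optional[dict] = None) -> dict:
        # Runs before the request reaches the model: prepend a system message.
        system = {"role": "system", "content": "Answer concisely."}
        body["messages"] = [system, *body.get("messages", [])]
        return body

    async def outlet(self, body: dict, user: Optional[dict] = None) -> dict:
        # Runs after the model has answered: pass the body through unchanged.
        return body


class PipeSketch:
    """Pipe shape: listed as a model, and produces the reply itself."""

    def __init__(self):
        self.name = "Echo Pipe (sketch)"

    def pipe(self, user_message: str, model_id: str, messages: List[dict], body: dict) -> str:
        # No upstream LLM is called here; the pipe is the "model".
        return f"You said: {user_message}"


if __name__ == "__main__":
    request = {"messages": [{"role": "user", "content": "hi"}]}
    print(asyncio.run(FilterSketch().inlet(request)))
    print(PipeSketch().pipe("hi", "echo", request["messages"], request))
```

The filter only edits the request on its way through; the pipe decides what the reply is.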

Prerequisites

  • Open WebUI v0.9.5 running (see the Ollama + Open WebUI Linux setup guide)
  • Docker installed on the same host
  • Ollama running on port 11434 (default)
  • Python 3.11 only if you want to develop pipelines locally — Docker handles it otherwise

Step 1: Deploy the Pipelines Server

docker run -d \
  -p 9099:9099 \
  --add-host=host.docker.internal:host-gateway \
  -v pipelines:/app/pipelines \
  --name pipelines \
  --restart always \
  ghcr.io/open-webui/pipelines:main

Flag breakdown:

  • -p 9099:9099 — exposes the Pipelines API on your host
  • --add-host=host.docker.internal:host-gateway — lets the container reach services on your host machine (Ollama, local APIs, SearXNG)
  • -v pipelines:/app/pipelines — persists your pipeline Python files to a named Docker volume; they survive container restarts and updates

Confirm it's alive:

curl http://localhost:9099/
# Expected: a small JSON body such as {"status":true}; any HTTP response means the server is up

The default API key is 0p3n-w3bu!. It's public knowledge: fine for localhost, not fine for anything network-accessible. Override it by adding -e PIPELINES_API_KEY=your-actual-secret to the docker run command.
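
If you want to check the key from a script rather than curl, the sketch below builds an authenticated request for the model list. It assumes the server exposes the OpenAI-style /v1/models route and accepts the key as a bearer token; the helper names build_models_request and list_model_ids are made up for this example.

```python
import json
import urllib.request


def build_models_request(base_url: str, api_key: str) -> urllib.request.Request:
    """Build a GET request for the OpenAI-style model list with a bearer token."""
    url = base_url.rstrip("/") + "/v1/models"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})


def list_model_ids(base_url: str, api_key: str, timeout: float = 5.0) -> list:
    """Return the model ids the server reports (this performs a network call)."""
    req = build_models_request(base_url, api_key)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        payload = json.load(resp)
    # Assumes the usual OpenAI list shape: {"data": [{"id": ...}, ...]}
    return [m.get("id") for m in payload.get("data", [])]


if __name__ == "__main__":
    print(list_model_ids("http://localhost:9099", "0p3n-w3bu!"))
```

An HTTP 401 or 403 from this call is a quick sign that the key Open WebUI will send does not match the one the container was started with.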

Step 2: Connect Pipelines to Open WebUI

  1. Open WebUI → Admin Panel → Settings → Connections
  2. Click + to add a new OpenAI-compatible connection
  3. API URL: http://localhost:9099
    • If Open WebUI itself runs in Docker, use http://host.docker.internal:9099 instead
  4. API key: 0p3n-w3bu! (or whatever you set)
  5. Save, refresh the page

Pipe-type pipelines now appear in Open WebUI's model picker. Filter-type pipelines appear under Admin Panel → Pipelines where you assign them to specific models or all models.

Pipeline Example 1: Live Web Search

This pipe pipeline fetches search results and injects them as context before forwarding the query to your local Ollama model. The result: your LLM can answer questions about current events without any fine-tuning.

A word on DuckDuckGo: DDG's unofficial scraping API is the obvious free choice, but in 2026 it rate-limits hard — you hit 202 Ratelimit errors within a few queries from the same IP. It works for light personal use with delays between requests, but it's unreliable for a pipeline that runs on every message. The two practical alternatives are:

  • Brave Search API — free tier, 2,000 queries/month, real JSON API
  • SearXNG (self-hosted, zero cost, zero rate limits) — swap the API call and you're done
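
Whichever backend you pick, repeated questions do not need a fresh search every time. Because the Pipelines server keeps state between requests, a small in-memory cache can stretch a monthly quota. The following is a sketch only: TTLCache and its get_or_fetch method are hypothetical names, and the five-minute lifetime is an arbitrary assumption.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Cache search results per normalized query for a fixed lifetime."""

    def __init__(self, ttl_seconds: float = 300.0, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so the cache can be tested without sleeping
        self._store: Dict[str, Tuple[float, str]] = {}

    def get_or_fetch(self, query: str, fetch: Callable[[str], str]) -> str:
        key = query.strip().lower()
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # still fresh: skip the search call
        value = fetch(query)
        self._store[key] = (now, value)
        return value
```

Inside a pipeline you would create one TTLCache in __init__ and route each query through get_or_fetch with the search method as the fetch callback. The cache lives in process memory, so it is emptied whenever the container restarts.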

Create the file where Docker maps your pipelines volume. On a default install, find the path with:

docker inspect pipelines | grep -A5 Mounts
# Look for the "Source" path, typically /var/lib/docker/volumes/pipelines/_data

Save this as web_search_pipeline.py in that directory:

from typing import List
import requests
from pydantic import BaseModel

class Pipeline:
    class Valves(BaseModel):
        pipelines: List[str] = ["*"]
        search_api_key: str = ""       # Brave API key
        searxng_url: str = ""          # e.g. http://host.docker.internal:8080
        num_results: int = 5
        ollama_model: str = "llama3.2:3b"

    def __init__(self):
        self.name = "Web Search"
        self.valves = self.Valves()

    def _search_brave(self, query: str) -> str:
        headers = {
            "Accept": "application/json",
            "X-Subscription-Token": self.valves.search_api_key,
        }
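        r = requests.get(
            "https://api.search.brave.com/res/v1/web/search",
            params={"q": query, "count": self.valves.num_results},
            headers=headers,
            timeout=10,
        )
        r.raise_for_status()  # surface HTTP errors (bad key, quota exceeded)
        results = r.json().get("web", {}).get("results", [])
        # Use .get() so a result with a missing field does not raise KeyError.
        return "\n\n".join(
            f"**{res.get('title', '')}**\n{res.get('description', '')}\n{res.get('url', '')}"
            for res in results
        )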

    def _search_searxng(self, query: str) -> str:
        # The JSON output format must be enabled in the SearXNG settings.
        r = requests.get(
            f"{self.valves.searxng_url}/search",
            params={"q": query, "format": "json"},
            timeout=10,
        )
        r.raise_for_status()
        # The search API has no result-count parameter, so trim client-side.
        results = r.json().get("results", [])[: self.valves.num_results]
        return "\n\n".join(
            f"**{res.get('title','')}**\n{res.get('content','')}\n{res.get('url','')}"
            for res in results
        )

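    # Regular (non-async) method: everything below is a blocking HTTP call,
    # and the Pipelines examples define pipe() this way.
    def pipe(
        self,
        user_message: str,
        model_id: str,
        messages: List[dict],
        body: dict,
    ) -> str: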
        if self.valves.searxng_url:
            context = self._search_searxng(user_message)
        elif self.valves.search_api_key:
            context = self._search_brave(user_message)
        else:
            return "Configure either searxng_url or search_api_key in Valves."

        import openai
        client = openai.OpenAI(
            base_url="http://host.docker.internal:11434/v1",
            api_key="ollama",
        )
        response = client.chat.completions.create(
            model=self.valves.ollama_model,
            messages=[
                {
                    "role": "system",
                    "content": f"Answer using these current search results:\n\n{context}"
                },
                *messages,
            ],
        )
        return response.choices[0].message.content

After saving, go to Admin Panel → Pipelines and click Refresh. "Web Search" appears. Configure the Valves (Brave key or SearXNG URL) through the UI — changes apply immediately without restarting anything.

Pipeline Example 2: Per-User Rate Limiting

If more than one person uses your Open WebUI instance, you need rate limits. Without them, one heavy user can queue up requests that lock everyone else out. This filter tracks requests per user ID with a sliding window:

Save as rate_limit_filter.py:


from typing import List, Optional
from datetime import datetime, timedelta
from pydantic import BaseModel

class Pipeline:
    class Valves(BaseModel):
        pipelines: List[str] = ["*"]
        priority: int = 0
        requests_per_minute: Optional[int] = 10
        requests_per_hour: Optional[int] = 100

    def __init__(self):
        self.name = "Rate Limit Filter"
        self.type = "filter"
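
The sliding-window bookkeeping itself is independent of Pipelines and fits in a few lines. The sketch below is one possible way to do it, not the filter's actual implementation: SlidingWindowLimiter and allow() are hypothetical names, and the limit and window values are whatever you pass in.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Allow at most `limit` requests per user within the last `window_seconds`."""

    def __init__(self, limit: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock  # injectable so tests do not need to sleep
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, user_id: str) -> bool:
        now = self.clock()
        hits = self._hits[user_id]
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # over the limit: the request is not recorded
        hits.append(now)
        return True
```

In a filter, inlet() would call allow() with the requesting user's ID and raise an exception when it returns False, so the request never reaches the model. Which key of the user dict holds that ID is an assumption to verify against your Open WebUI version. Two limiter instances, one per Valves setting, cover the per-minute and per-hour caps.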
