Perplexica Review 2026: Open-Source AI Search With Ollama

#opensource #ai #selfhosted #linux

This article was originally published on aifoss.dev

TL;DR: Perplexica (officially renamed Vane in March 2026) is the most mature open-source Perplexity alternative. It chains SearXNG for web search with any Ollama-compatible LLM to produce cited answers, all on your own hardware. Setup takes about 15 minutes via Docker Compose. Below 14B parameters, citation quality is noticeably shallow; at 14B+ it holds up for moderate research without sending your questions to a hosted AI service.

	Perplexica / Vane	Perplexity AI	SearXNG (standalone)
Best for	Privacy-conscious devs who want cited answers offline	Fast, reliable web answers with zero setup	Ultra-private meta-search with no AI layer
Setup complexity	Docker Compose, ~15 min	None — it's a website	Docker, ~5 min
Search + citation quality	Good at 14B+; mediocre at 7B	Consistently good; Pro is excellent	No AI — raw results only
Privacy	Strong: the LLM runs locally; only web searches and page fetches leave your network	Queries logged by Perplexity (US company)	Full — no AI, no external logging
Monthly cost	~$0–$10 electricity depending on GPU	Free (rate-limited) / $20/mo Pro	~$0–$2 electricity
The catch	Page fetches blocked by Cloudflare-protected sites; no mobile app	Your data on their servers	No LLM, no summarization

Honest take: If you're a light-to-moderate Perplexity user and privacy matters more than response latency, run Perplexica with a 14B model. Heavy Perplexity Pro users will feel the drop in source freshness and coherence — the gap is real, especially for current events.

What Perplexica actually is

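Perplexica is an open-source AI answering engine that gives you a Perplexity-like experience (type a question, get a cited answer with linked sources) without your questions ever reaching a hosted AI service. The only outbound traffic is the web searches and page fetches needed to build the answer.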

The project launched in early 2024 under the name Perplexica and crossed 33,000 GitHub stars by early 2026. In March 2026, the maintainer (ItzCrazyKns) renamed it Vane — both to reduce confusion with the Perplexity brand and to reflect that the scope had grown beyond "Perplexity clone." The old GitHub URL redirects to github.com/ItzCrazyKns/Vane, the Docker images support both names, and every tutorial written before March 2026 still works. This review uses Perplexica because that is still what most people search for.

Current version: v1.12.2, released April 2026. The 1.12.x series added a Chromium-based scraper for better compatibility with JavaScript-heavy pages, timeout validation to prevent hung requests, and an updated deep research mode with improved context management.

License: MIT. No AGPL complications — build on it, fork it, ship a commercial product with it.

How it works under the hood

The query pipeline:

Your question
  → Perplexica backend (query rewriting)
  → SearXNG (hits Google, Bing, DuckDuckGo, Brave simultaneously)
  → Top pages fetched and chunked
  → nomic-embed-text ranks chunks by semantic similarity to your query
  → LLM synthesizes a cited answer from the top-ranked chunks
  → Frontend renders response with inline numbered citations

The critical design decision is the ranking step. The LLM never sees raw, unfiltered search results — the similarity search filters out low-relevance content before it reaches the model's context window. This reduces hallucinations and keeps prompts focused. With smaller 7B models, the chunking still happens but the model struggles to correctly attribute which source said what when the citations are more than a sentence apart.
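To make the ranking step concrete, here is a minimal sketch of cosine-similarity filtering, the same measure selected by `SIMILARITY_MEASURE = "cosine"` in the config. It assumes the chunk and query embeddings have already been computed (in Perplexica they come from nomic-embed-text; the three-dimensional vectors below are toy stand-ins). The names `cosine`, `top_chunks` and the `min_score` cutoff are illustrative, not Perplexica's actual code.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_chunks(query_vec, chunks, k=3, min_score=0.0):
    """Rank (source_number, text, embedding) chunks against the query embedding.

    Returns up to k (score, source_number, text) tuples, best first, dropping
    anything below min_score so it never reaches the LLM's context window.
    """
    scored = [(cosine(query_vec, emb), src, text) for src, text, emb in chunks]
    kept = [item for item in scored if item[0] >= min_score]
    kept.sort(key=lambda item: item[0], reverse=True)
    return kept[:k]


if __name__ == "__main__":
    # Toy 3-dimensional vectors standing in for nomic-embed-text embeddings.
    query = [1.0, 0.0, 1.0]
    chunks = [
        (1, "Ollama listens on 127.0.0.1:11434 by default.", [0.9, 0.1, 0.8]),
        (2, "Accept cookies to continue.", [0.0, 1.0, 0.0]),
        (3, "Set OLLAMA_HOST=0.0.0.0 to listen on all interfaces.", [0.7, 0.0, 0.9]),
    ]
    for score, src, text in top_chunks(query, chunks, k=2, min_score=0.5):
        print(f"[{src}] {score:.2f} {text}")
```

The cookie-banner chunk scores 0 against the query and is filtered out, so only the two relevant chunks, still tagged with their source numbers, would be handed to the model for citation.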

SearXNG is the privacy layer. It queries multiple search engines simultaneously without identifying you to any of them. Perplexica's Docker Compose file spins up its own SearXNG instance — you do not configure SearXNG manually.
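A minimal sketch of the kind of request a backend sends to SearXNG, assuming the instance has JSON output enabled (SearXNG only serves `format=json` when `json` is listed under `search.formats` in its settings.yml). The `/search` endpoint, the `q`, `format` and `engines` parameters, and the `results` / `url` / `title` / `content` response fields belong to SearXNG; the helper names `searxng_search_url` and `extract_sources` are hypothetical.

```python
import json
from typing import Optional
from urllib.parse import urlencode


def searxng_search_url(base_url: str, query: str, engines: Optional[list[str]] = None) -> str:
    """Build a SearXNG search URL that asks for JSON instead of HTML."""
    params = {"q": query, "format": "json"}
    if engines:
        # SearXNG accepts a comma-separated list of engine names.
        params["engines"] = ",".join(engines)
    return f"{base_url.rstrip('/')}/search?{urlencode(params)}"


def extract_sources(response_body: str, limit: int = 5) -> list[dict]:
    """Keep only what later stages need from each result: url, title, snippet."""
    data = json.loads(response_body)
    return [
        {"url": r["url"], "title": r.get("title", ""), "snippet": r.get("content", "")}
        for r in data.get("results", [])[:limit]
    ]
```

Passing `engines=["reddit"]` narrows a query to a single engine, which is roughly how a focus mode could be layered on top; that mapping is an assumption, not taken from Perplexica's source.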

For more on how this retrieval pipeline compares to full document-based RAG setups, see the RAG Architecture Deep Dive — Perplexica uses a simpler single-hop retrieval against live web results rather than a pre-indexed corpus.

Setup: Docker Compose in about 15 minutes

Prerequisites: Docker and Docker Compose, plus a running Ollama install if you want local models.

git clone https://github.com/ItzCrazyKns/Vane.git perplexica
cd perplexica

cp sample.config.toml config.toml

Edit config.toml before starting anything:

[GENERAL]
PORT = 3001
SIMILARITY_MEASURE = "cosine"
KEEP_ALIVE = "5m"

[API_KEYS]
OPENAI = ""          # leave blank if using Ollama
GROQ   = ""          # optional — Groq's free tier is fast for testing

[API_ENDPOINTS]
SEARXNG = "http://searxng:8080"                    # SearXNG's container port on the internal Docker network
OLLAMA  = "http://host.docker.internal:11434"      # your local Ollama instance

Start the stack:

docker compose up -d

Three containers come up: the Perplexica frontend (port 3000), backend API (port 3001), and SearXNG (port 4000, internal only). First pull takes 2–5 minutes. Open http://localhost:3000 and you have a working Perplexity-like interface.

Linux Ollama gotcha: If Ollama is running as a systemd service, it binds to localhost by default. Docker containers can't reach localhost on the host — you need to tell Ollama to listen on all interfaces:

# Run `sudo systemctl edit ollama.service` and add these lines to the override
[Service]
Environment="OLLAMA_HOST=0.0.0.0"

sudo systemctl daemon-reload && sudo systemctl restart ollama

macOS users with Docker Desktop get host.docker.internal resolved automatically and don't need this change.
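The rule behind this gotcha can be written as a tiny hypothetical helper: given the value you set for `OLLAMA_HOST` (or none), would Ollama accept connections only from the host itself? It assumes Ollama's default of 127.0.0.1:11434 when the variable is empty and does not try to resolve hostnames other than `localhost`.

```python
import ipaddress


def binds_loopback_only(ollama_host: str) -> bool:
    """True if an OLLAMA_HOST value would only accept connections from the host itself.

    An empty value is treated as Ollama's default of 127.0.0.1:11434.
    """
    value = ollama_host.strip() or "127.0.0.1:11434"
    # Drop an optional scheme such as "http://".
    if "://" in value:
        value = value.split("://", 1)[1]
    # Split off the port; IPv6 literals are written as [addr]:port.
    if value.startswith("["):
        host = value[1:value.index("]")]
    else:
        host = value.rsplit(":", 1)[0] if ":" in value else value
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # a hostname we cannot judge without resolving it
```

A `True` result means Docker containers will not reach Ollama through `host.docker.internal`, which is exactly the situation the `OLLAMA_HOST=0.0.0.0` override fixes.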

Model selection and hardware requirements

After the stack starts, go to Settings → Models in the UI, select Ollama as the chat model provider, and pick a model you've already pulled.

Hardware	Model	Citation quality
8 GB RAM, CPU only	`mistral:7b` or `llama3.2:3b`	Shallow — usable for simple factual lookups
8 GB VRAM	`llama3.1:8b` or `qwen2.5:7b`	Reasonable for technical documentation
16 GB VRAM	`mistral-nemo:12b` or `qwen2.5:14b`	Solid — comparable to Perplexity free tier
24 GB VRAM	`qwen2.5:32b` or `deepseek-r1:14b`	Strong — approaches Perplexity Pro for research

Pull the embedding model and your chat model before you start:

ollama pull nomic-embed-text
ollama pull qwen2.5:14b   # or your chosen chat model
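Before picking a model in Settings, it is worth confirming that both pulls actually landed. A sketch, assuming Ollama is on its default port; it uses Ollama's `GET /api/tags` endpoint, which lists locally available models under `models[].name`. The helper names `missing_models` and `fetch_tags` are hypothetical.

```python
import json
from urllib.request import urlopen


def missing_models(tags_json: str, wanted: list[str]) -> list[str]:
    """Return the entries in `wanted` that do not appear in an /api/tags response."""
    installed = {m["name"] for m in json.loads(tags_json).get("models", [])}

    def normalize(name: str) -> str:
        # A pull without a tag (e.g. nomic-embed-text) is stored as name:latest.
        return name if ":" in name else f"{name}:latest"

    return [w for w in wanted if normalize(w) not in installed]


def fetch_tags(base_url: str = "http://localhost:11434") -> str:
    """Fetch the raw model list from a running Ollama server."""
    with urlopen(f"{base_url.rstrip('/')}/api/tags", timeout=5) as resp:
        return resp.read().decode("utf-8")
```

`missing_models(fetch_tags(), ["nomic-embed-text", "qwen2.5:14b"])` should return an empty list once both pulls finish.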

The 14B inflection point is genuine. Below 14B, models frequently lose track of which source supports which claim mid-answer, leading to citations that point to the wrong link. The problem isn't hallucination exactly — the content is usually in the sources — it's attribution failure. At 14B, that largely disappears.
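Attribution failure is easy to spot-check mechanically. The sketch below is a crude word-overlap heuristic, not something Perplexica ships: it assumes citations appear as `[n]` before a sentence's closing punctuation, and the `min_overlap` threshold of 0.3 is an arbitrary illustrative value. A low score only means a citation deserves a manual look, not that it is wrong.

```python
import re

WORD = re.compile(r"[a-z0-9]+")
CITE = re.compile(r"\[(\d+)\]")


def _words(text: str) -> set[str]:
    # Lowercased alphanumeric tokens longer than three characters.
    return {w for w in WORD.findall(text.lower()) if len(w) > 3}


def suspect_citations(answer: str, sources: dict[int, str], min_overlap: float = 0.3):
    """Flag (sentence, source_number, overlap) where the cited source shares few words with the sentence."""
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        claim = _words(CITE.sub("", sentence))
        if not claim:
            continue
        for num in map(int, CITE.findall(sentence)):
            overlap = len(claim & _words(sources.get(num, ""))) / len(claim)
            if overlap < min_overlap:
                flagged.append((sentence, num, round(overlap, 2)))
    return flagged


if __name__ == "__main__":
    sources = {
        1: "Ollama binds to 127.0.0.1 port 11434 by default.",
        2: "SearXNG is a free metasearch engine.",
    }
    answer = ("Ollama binds to localhost port 11434 by default [1]. "
              "Set OLLAMA_HOST to change the bind address [2].")
    for sentence, num, score in suspect_citations(answer, sources):
        print(f"check [{num}] (overlap {score}): {sentence}")
```

In the example, the second sentence cites a source about SearXNG for a claim about Ollama, which is the kind of mismatch smaller models tend to produce, and it is the only one flagged.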

If you don't own a GPU capable of running 14B+ models, RunPod lets you rent an RTX 4090 pod by the hour and run Ollama there, pointing Perplexica at the external endpoint.

For a full review of Ollama and model selection guidance, see the Ollama 2026 review.

Focus modes

Six modes are available from the search bar:

All (General Web) — Default. Routes through SearXNG against general web results. Works for open-ended factual queries, recent software changes, product comparisons, anything you'd normally search.

Academic — Prioritizes scholarly sources: arXiv, Semantic Scholar, PubMed. Useful for literature review and understanding research areas. Quality depends on the model — 7B models struggle to synthesize dense academic prose into coherent answers; 14B+ handles it cleanly.

YouTube — Searches YouTube and summarizes based on video titles, descriptions, and available transcripts. Works well when the top results have actual transcripts; gives thin summaries of titles when transcripts aren't available.

Reddit — Routes through Reddit discussions. Surprisingly effective for "what do real users think of X" questions — software frustrations, product failure modes, niche community knowledge that never makes it into official documentation.

Wolfram Alpha — Handles computational queries: unit conversions, math, scientific constants, date ar