# How to Build a Privacy-First Search Engine in 2026
Most search engines today operate on a simple business model: you are the product. Every query you type, every result you click, every page you visit — it all feeds into a profile that gets sold to advertisers.
I spent six months building an alternative. Here's what I learned.
## The Problem: Privacy Is an Afterthought
Google processes 8.5 billion searches per day. DuckDuckGo is better but still routes queries through Microsoft's infrastructure. Startpage proxies Google results. Brave Search exists but its index is limited.
None of these give you control over:
- Where your data is stored
- Who processes your queries
- Whether your search history is retained
- What jurisdiction governs your data
For European businesses — especially in Germany, where DSGVO (the GDPR) compliance isn't optional — this matters.
## Architecture: What You Actually Need
A privacy-first search engine has three components:
### 1. The Index
You don't need to crawl the entire web. You need federated search across multiple sources:
- Bing Web Search API (Microsoft's commercial index)
- Brave Search API (privacy-focused index)
- SearXNG instances (community-maintained meta-search)
- Your own focused crawlers for niche domains
The key insight: aggregation beats ownership. A meta-search layer that queries 20+ sources and ranks results locally gives you better coverage than any single index.
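As a minimal sketch of what that local aggregation layer can look like (the function names and the `sources`/`best_rank` fields are illustrative, not our actual API):

```python
from urllib.parse import urlsplit

def normalize_url(url: str) -> str:
    """Canonicalize a URL so the same page from two sources compares equal."""
    parts = urlsplit(url)
    host = parts.netloc.lower().removeprefix("www.")
    path = parts.path.rstrip("/")
    return f"{host}{path}"

def merge_results(per_source: dict[str, list[dict]]) -> list[dict]:
    """Merge ranked result lists from several sources, deduplicating by URL.

    `per_source` maps a source name to its ordered results, each a dict
    with at least a "url" key. The first source to return a URL wins;
    later duplicates only record that another source agreed.
    """
    merged: dict[str, dict] = {}
    for source, results in per_source.items():
        for rank, result in enumerate(results):
            key = normalize_url(result["url"])
            if key in merged:
                merged[key]["sources"].append(source)  # cross-source agreement
            else:
                merged[key] = {**result, "sources": [source], "best_rank": rank}
    # Results confirmed by more sources float up; ties break on best position.
    return sorted(merged.values(),
                  key=lambda r: (-len(r["sources"]), r["best_rank"]))
```

Deduplicating on a normalized URL rather than the raw string matters in practice: sources disagree on `www.` prefixes and trailing slashes for the same page.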
### 2. The Query Processor
This is where privacy lives. The query processor:
- Receives user queries
- Dispatches them to index sources in parallel
- Aggregates, deduplicates, and ranks results
- Never stores the query text
- Returns results via encrypted connection
We built ours in Python with FastAPI. Each query gets a UUID for session correlation, but the query text itself is discarded after 30 seconds.
### 3. The Frontend
The frontend is just a search box and results. No tracking pixels. No third-party analytics. No cookies for "personalization."
We use Next.js with server-side rendering. The server makes the API calls, so the user's IP never touches the index sources directly.
## Technical Decisions That Matter
### Rate Limiting and Politeness
If you're querying multiple search APIs, you will hit rate limits. We implemented:
- Token bucket rate limiting per source
- Exponential backoff on 429 responses
- Request deduplication (same query within 60s = cached)
- Automatic failover to secondary sources
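The first two pieces can be sketched in a few lines (the class and parameter names are illustrative; full jitter on the backoff is a common choice, not necessarily ours):

```python
import random
import time

class TokenBucket:
    """Per-source rate limiter: refills `rate` tokens/second, bursts to `capacity`."""
    def __init__(self, rate: float, capacity: float):
        self.rate, self.capacity = rate, capacity
        self.tokens = capacity
        self.updated = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Exponential backoff with full jitter, for retrying after a 429."""
    return random.uniform(0, min(cap, base * 2 ** attempt))
```

When `allow()` returns False the query falls through to the next source instead of waiting, which is what makes the failover automatic.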
### Result Ranking
Meta-search ranking is harder than single-index ranking because scores aren't comparable across sources. We use a weighted combination:
- Source authority weight (Bing = 1.0, niche crawler = 0.3)
- Position decay (result #1 at source A vs. #5 at source B)
- Content freshness (recency boost for time-sensitive queries)
- Click-through rate from our own anonymized logs
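One way to combine those four signals into a single local score (the weights, decay constants, and field names below are illustrative placeholders, not our production values):

```python
import math
import time

SOURCE_WEIGHT = {"bing": 1.0, "brave": 0.9, "niche_crawler": 0.3}  # illustrative

def score(result: dict, now=None) -> float:
    """Combine incomparable per-source signals into one local rank score.

    Expects: result["source"], result["position"] (0-based rank at the source),
    and optionally result["published_ts"] (unix seconds) and result["ctr"]
    (0..1, from anonymized logs).
    """
    now = time.time() if now is None else now
    authority = SOURCE_WEIGHT.get(result["source"], 0.5)
    position = 1.0 / math.log2(result["position"] + 2)   # logarithmic decay
    age_days = (now - result.get("published_ts", now)) / 86400
    freshness = math.exp(-age_days / 30)                 # ~monthly decay
    ctr = result.get("ctr", 0.0)
    return authority * position * (0.7 + 0.3 * freshness) + 0.2 * ctr
```

Logarithmic position decay keeps result #1 at a weak source from automatically beating result #5 at a strong one, which is exactly the cross-source comparison problem the raw scores can't answer.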
### DSGVO Compliance by Design
Three principles:
- Data minimization: We store only what's necessary (search UUID, timestamp, result count — no query text)
- Purpose limitation: Data is used only for service improvement, never profiling
- Storage limitation: Logs rotate after 7 days; analytics are aggregate-only
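The storage-limitation rule is simple enough to enforce in code rather than policy. A sketch (function names and the log-entry shape are illustrative):

```python
import datetime as dt

RETENTION_DAYS = 7

def rotate_logs(entries: list[dict], now: dt.datetime) -> list[dict]:
    """Keep only entries inside the retention window (storage limitation)."""
    cutoff = now - dt.timedelta(days=RETENTION_DAYS)
    return [e for e in entries if e["ts"] >= cutoff]

def daily_aggregate(entries: list[dict]) -> dict[str, int]:
    """Aggregate-only analytics: query counts per day, nothing per-user."""
    counts: dict[str, int] = {}
    for e in entries:
        day = e["ts"].date().isoformat()
        counts[day] = counts.get(day, 0) + 1
    return counts
```

Because the raw entries never contain query text to begin with, rotation plus aggregation means there is nothing profile-shaped left to hand over even under a data request.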
## What It Cost
| Component | Cost |
|---|---|
| VPS (Hetzner, 4 vCPU / 8GB) | €15/month |
| Bing Search API (1,000 queries/day) | ~$7/month |
| Brave Search API (10,000 queries/month) | Free tier |
| Domain + SSL | €12/year |
| Total | ~€25/month |
This is not a $50K infrastructure project. It's a lean, focused build that one person can run, and that scales to thousands of users on commodity hardware.
## Launch and First Users
We launched asearchz.online in March 2026 with zero marketing budget. First users came from:
- Dev.to technical articles
- Reddit r/privacy and r/selfhosted
- Hacker News "Show HN" post
- GitHub README backlinks
Current stats: ~200 daily active users, 65% from Germany and EU. Not viral, but the right users.
## Lessons Learned
**Privacy is a feature, not a product.** Users don't switch search engines for privacy alone. They switch because the tool is better AND private.

**Speed matters more than comprehensiveness.** A search that returns 10 excellent results in 300ms beats one that returns 100 mediocre results in 2s.

**DSGVO compliance is a competitive advantage.** German businesses actively ask about data residency and processing agreements. Having a clear privacy policy and technical architecture gives you an edge over US competitors.

**Open source builds trust.** We publish our architecture and API patterns. Users can verify our claims instead of trusting them.
## What's Next
We're building:
- AI summarization of search results (local LLM, no data sent to OpenAI)
- Enterprise self-hosted version for companies that want full control
- German-language optimization (better stemming, compound word handling)
If you're building something similar — or want to — I'm happy to share more specifics. Drop a comment or find me at grahammiranda.com.
Graham Miranda is the founder of Graham Miranda UG (Berlin, HRB 36794), building privacy-first automation tools and search infrastructure.