Just testing out my automated dev-log pipeline for SearchWala. Moving the core from Python to Rust dropped idle RAM from 512 MB to 38 MB and cut cold starts from ~4 s to ~0.3 s. Here's the short version of why and how.
## The Problem
SearchWala aggregates results from 90+ search engines — Google, Bing, DuckDuckGo, Brave, Mojeek, and dozens of niche/academic sources. The original Python stack (FastAPI + asyncio + BeautifulSoup) worked, but:
- RAM hungry: Each worker held parsed DOM trees in memory. Under load, a single instance ate ~512 MB.
- Cold start pain: On a fresh container, Python import chains + dependency init took 4-6 seconds.
- GIL bottleneck: Async I/O gave concurrency across the 90 engines, but the GIL still serialized the CPU-bound HTML parsing.
## The Rust Rewrite
I rewrote the core in Rust using tokio for async, reqwest for HTTP, and scraper for HTML parsing. The results:
| Metric | Python | Rust |
|---|---|---|
| RAM (idle) | 512 MB | 38 MB |
| Cold start | 4.2s | 0.3s |
| P95 latency (90 engines) | 2.8s | 0.9s |
| Deploy size | ~180 MB (venv + deps) | 12 MB (static binary) |
The dual-path LLM synthesis pipeline (lite mode for speed, research mode for depth) stayed as a sidecar microservice, but all search orchestration, ranking (BM25 + Reciprocal Rank Fusion), and content extraction now run natively in Rust.
## Key Takeaway
If your I/O-heavy Python service is eating memory and you need predictable latency — Rust with tokio is the move. Not everything needs a rewrite, but the hot path absolutely does.
Check out the full source code and drop a star on GitHub: SearchWala on GitHub