DuckDB vs PostgreSQL vs Meilisearch: Full-Text Search at Scale
When dealing with 100 million documents, the choice of a full-text search engine isn't just about features—it's about raw performance, resource efficiency, and how well the tool fits your workload. A recent benchmark comparing DuckDB, PostgreSQL, and Meilisearch reveals surprising tradeoffs in build times, query latency, and memory usage that could reshape how you approach search infrastructure.
The Test Setup: Real-World Workload, Real Hardware
The benchmark used a 100-million-document corpus of Reddit comments (~50GB raw text, 14.8GB compressed Parquet) on a Hetzner AX-52 server (AMD Ryzen 7 7700, 64GB RAM, 2x 1TB NVMe). This wasn’t a synthetic test—queries were derived from production search logs, covering four classes: simple matches, multi-word phrases, fuzzy matches, and boolean queries. Each engine was tested with its latest stable version (DuckDB 1.1, PostgreSQL 17.4, Meilisearch 1.10) and optimized for performance.
Key Finding 1: Index Build Time—DuckDB Surprises
DuckDB’s FTS extension dominated cold builds, completing in 38 minutes—2.4x faster than PostgreSQL’s 91 minutes and roughly on par with Meilisearch’s 44 minutes. The key advantage? Columnar I/O and pipelined tokenization. DuckDB reads only the indexed column (body) from Parquet, avoiding unnecessary data movement. PostgreSQL, by contrast, ingests rows into heap pages before building a GIN index, doubling the I/O overhead.
Meilisearch, while fast, was memory-hungry, peaking at 29GB RAM during indexing—prohibitive for smaller deployments. PostgreSQL won on incremental updates (14 seconds for 1M new docs) thanks to its GIN index, but DuckDB’s columnar architecture made partial updates cheaper than a full rebuild.
Key Finding 2: Query Latency—Specialization Matters
Query performance varied sharply by workload:
- PostgreSQL GIN excelled at simple boolean AND queries (P50 latency: 4ms), leveraging its mature query planner and index optimizations.
- DuckDB dominated fuzzy and analytical queries (e.g., Levenshtein matches), outperforming PostgreSQL by 4x. Its columnar design allows fast scans and aggregations, making it ideal for search-as-an-analytics-primitive.
- Meilisearch delivered the best typo-tolerant ranking but struggled at scale—P99 latencies hit 800ms+ at 100M documents, likely due to its single-shard design.
Key Finding 3: Resource Efficiency—DuckDB’s Disk Advantage
DuckDB’s index was 3x smaller than Meilisearch’s, thanks to its compressed columnar storage. PostgreSQL’s GIN index was larger than DuckDB’s but more compact than Meilisearch’s. For disk-constrained environments, this alone could tip the scales.
The Verdict: No Universal Winner
- Choose PostgreSQL if you need OLTP-integrated search with fast boolean queries and incremental updates.
- Choose DuckDB if you prioritize fast analytical queries, low disk usage, and batch indexing.
- Choose Meilisearch if typo tolerance and developer experience are critical— but only for smaller corpora or with horizontal scaling.
Read the full article at novvista.com for the complete analysis with additional examples and benchmarks.
Originally published at NovVista
Top comments (0)