Elasticsearch vs. PostgreSQL Full-Text: Search Engine Indexing Deep Dive
Every search engine needs an index. The question is: which technology builds it?
PostgreSQL Full-Text Search
Built-in since PostgreSQL 8.3. Surprisingly capable.
Pros:
- Zero additional infrastructure
- ACID compliance (index stays consistent with data)
- Good for < 1M documents
- Supports German stemming, compound word handling
- tsvector/tsquery is fast for simple needs
Cons:
- No distributed search
- Limited faceting and aggregation
- Relevance scoring is basic
- No built-in synonym support
Elasticsearch
The industry standard for search.
Pros:
- Distributed by design (shards, replicas)
- Advanced relevance scoring (BM25, custom functions)
- Real-time indexing
- Faceting, aggregation, geospatial
- Plugin ecosystem (synonyms, analyzers)
Cons:
- Additional infrastructure (cluster management)
- Memory hungry (heap size tuning required)
- Eventually consistent (not ACID)
- Complex query DSL learning curve
What We Chose (and Why)
We use both:
- PostgreSQL: Primary data store, user accounts, query logs (minimal), metadata
- Elasticsearch: Search index only, rebuilt from PostgreSQL nightly
This hybrid gives us ACID for critical data and search performance for queries. If Elasticsearch fails, we can rebuild from PostgreSQL. If PostgreSQL is slow for search, Elasticsearch handles it.
German Language Challenges
German search is harder than English:
- Compound words ("Datenschutzgrundverordnung")
- Umlaut normalization (ä → ae or a?)
- Case inflection
- Dialect variations
Our solution: Custom Elasticsearch analyzer chain:
- ICU tokenizer (handles compound words)
- German stemmer (Snowball)
- Umlaut normalizer (ä → a, ö → o, ü → u)
- Synonym filter (DSGVO → Datenschutzgrundverordnung)
Graham Miranda builds search infrastructure at Graham Miranda UG (Berlin, HRB 36794).
Top comments (0)