James

Search Index Architecture for Privacy-First Engines

Elasticsearch vs. PostgreSQL Full-Text: Search Engine Indexing Deep Dive

Every search engine needs an index. The question is: which technology builds it?

PostgreSQL Full-Text Search

Built-in since PostgreSQL 8.3. Surprisingly capable.

Pros:

  • Zero additional infrastructure
  • ACID compliance (index stays consistent with data)
  • Good for < 1M documents
  • Ships a German Snowball stemmer; compound splitting additionally requires an Ispell/Hunspell dictionary with compound support
  • tsvector/tsquery with a GIN index is fast for simple needs

Cons:

  • No distributed search
  • Limited faceting and aggregation
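  • Relevance scoring is basic (ts_rank and ts_rank_cd use no corpus-wide statistics, so there is no IDF or BM25)
  • Synonyms only through file-based synonym or thesaurus dictionaries configured on the database server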
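
To make the tsvector/tsquery workflow concrete, here is a minimal sketch of a German full-text setup and a ranked query. The table and column names (documents, title, body, search_vec) are hypothetical, the `%s` placeholders assume a psycopg-style driver, and the generated column assumes PostgreSQL 12 or later.

```python
# Sketch only: table/column names are hypothetical, not from the original setup.

# Stored tsvector column using the built-in 'german' configuration, plus a GIN index.
SETUP_SQL = """
ALTER TABLE documents
    ADD COLUMN search_vec tsvector
    GENERATED ALWAYS AS (
        to_tsvector('german', coalesce(title, '') || ' ' || coalesce(body, ''))
    ) STORED;
CREATE INDEX documents_search_idx ON documents USING GIN (search_vec);
"""

# Ranked search; websearch_to_tsquery accepts free-form user input (PostgreSQL 11+).
SEARCH_SQL = """
SELECT id, title, ts_rank(search_vec, query) AS rank
FROM documents, websearch_to_tsquery('german', %s) AS query
WHERE search_vec @@ query
ORDER BY rank DESC
LIMIT %s;
"""


def search(conn, text, limit=10):
    """Run the ranked query on a DB-API connection (e.g. one from psycopg)."""
    with conn.cursor() as cur:
        cur.execute(SEARCH_SQL, (text, limit))
        return cur.fetchall()
```

Because the index lives in the same table as the data, it is updated in the same transaction as every write, which is where the consistency advantage comes from.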

Elasticsearch

The de facto industry standard for dedicated search.

Pros:

  • Distributed by design (shards, replicas)
  • Advanced relevance scoring (BM25, custom functions)
  • Near-real-time indexing (new documents become searchable after a refresh, about one second by default)
  • Faceting, aggregation, geospatial
  • Rich text analysis: built-in analyzers and synonym filters, plus plugins such as ICU

Cons:

  • Additional infrastructure (cluster management)
  • Memory hungry (heap size tuning required)
  • Not ACID: no multi-document transactions, and writes become searchable only after the next refresh
  • Complex query DSL learning curve
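
For a feel of what the query DSL looks like, here is a small sketch of index settings and a search body, written as plain Python dicts that could be sent as JSON. The field names (title, body, popularity, language) and the shard and replica counts are illustrative assumptions, not values from a real deployment.

```python
import json

# Hypothetical index settings: shard/replica counts and refresh interval are examples.
INDEX_SETTINGS = {
    "settings": {
        "index": {
            "number_of_shards": 3,
            "number_of_replicas": 1,
            "refresh_interval": "1s",  # how often new writes become searchable
        }
    }
}


def build_search_body(text, size=10):
    """Build a BM25 text query, adjusted by a numeric field, with one facet."""
    return {
        "size": size,
        "query": {
            "function_score": {
                # BM25 is the default similarity for text fields.
                "query": {
                    "multi_match": {"query": text, "fields": ["title^2", "body"]}
                },
                # Custom scoring function: add log10(1 + popularity) to the score.
                "field_value_factor": {
                    "field": "popularity",
                    "modifier": "log1p",
                    "missing": 0,
                },
                "boost_mode": "sum",
            }
        },
        # Facet: document counts per language (assumes a keyword field).
        "aggs": {"by_language": {"terms": {"field": "language"}}},
    }


if __name__ == "__main__":
    print(json.dumps(build_search_body("datenschutz"), indent=2))
```

The nesting shows both sides of the trade-off: scoring, boosting, and faceting fit in one request, but the structure takes some time to learn.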

What We Chose (and Why)

We use both:

  • PostgreSQL: Primary data store, user accounts, query logs (minimal), metadata
  • Elasticsearch: Search index only, rebuilt from PostgreSQL nightly

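This hybrid gives us ACID guarantees for critical data and a dedicated engine for search queries. PostgreSQL remains the source of truth, so if Elasticsearch fails or its index is corrupted, we rebuild it from PostgreSQL. Search traffic goes to Elasticsearch, so slow full-text queries never load the primary database. The trade-off of a nightly rebuild is that search results can lag the primary data by up to a day.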
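
The post does not say how the nightly rebuild is implemented. One common pattern, sketched here as an assumption, is to bulk-index into a fresh dated index and then repoint an alias in a single request, so searches never see a half-built index. The row shape (id, title, body) and the index and alias names are hypothetical.

```python
import json


def bulk_ndjson(rows, index):
    """Serialize rows into the newline-delimited body expected by the _bulk API."""
    lines = []
    for row in rows:
        # Action line, then the document source line.
        lines.append(json.dumps({"index": {"_index": index, "_id": str(row["id"])}}))
        lines.append(
            json.dumps({"title": row["title"], "body": row["body"]}, ensure_ascii=False)
        )
    # Every line, including the last, must end with a newline.
    return "".join(line + "\n" for line in lines)


def alias_swap(alias, old_index, new_index):
    """Build an _aliases request body that repoints the alias in one atomic step."""
    actions = []
    if old_index is not None:
        actions.append({"remove": {"index": old_index, "alias": alias}})
    actions.append({"add": {"index": new_index, "alias": alias}})
    return {"actions": actions}


if __name__ == "__main__":
    rows = [{"id": 1, "title": "DSGVO", "body": "Datenschutzgrundverordnung"}]
    print(bulk_ndjson(rows, "docs-2024-01-02"), end="")
    print(json.dumps(alias_swap("docs", "docs-2024-01-01", "docs-2024-01-02")))
```

In this pattern queries always target the alias; the previous index can be deleted once the swap succeeds, or kept for a day as a rollback.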

German Language Challenges

German search is harder than English:

  • Compound words ("Datenschutzgrundverordnung")
  • Umlaut normalization (ä → ae or a?)
  • Inflection by grammatical case ("das Haus", "des Hauses")
  • Dialect variations
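
The umlaut question matters because users type the same name in several ways. This small, purely illustrative sketch compares the two folding conventions on three spellings of one surname; the function and table names are made up for the example.

```python
import unicodedata

# Convention 1: transliterate (ä → ae). Convention 2: strip the diacritic (ä → a).
TRANSLITERATE = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"})
STRIP = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})


def fold(text, table):
    """Lowercase and fold umlauts; NFC first so 'u' + combining dots becomes 'ü'."""
    return unicodedata.normalize("NFC", text).lower().translate(table)


variants = ["Müller", "Mueller", "Muller"]
print([fold(v, TRANSLITERATE) for v in variants])  # ['mueller', 'mueller', 'muller']
print([fold(v, STRIP) for v in variants])          # ['muller', 'mueller', 'muller']
```

Neither convention alone makes all three spellings match: transliteration unifies "Müller" with "Mueller", while stripping unifies "Müller" with "Muller". Whichever is chosen, the same folding has to run at index time and at query time.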

Our solution is a custom Elasticsearch analyzer chain:

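  1. ICU tokenizer (Unicode-aware word segmentation; it does not split compounds on its own, so decompounding needs a separate dictionary- or hyphenation-based token filter)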
  2. German stemmer (Snowball)
  3. Umlaut normalizer (ä → a, ö → o, ü → u)
  4. Synonym filter (DSGVO → Datenschutzgrundverordnung)
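
As a rough sketch, the chain above could be expressed as index settings like the following. Several details are assumptions: the analyzer and filter names are invented, a lowercase filter is added because the synonym rule is written in lowercase, the `stemmer` filter with language "german" is assumed to be the Snowball stemmer the chain refers to, and the built-in `german_normalization` filter stands in for the umlaut normalizer (it also folds ß to ss and ae/oe/ue spellings). The `icu_tokenizer` requires the analysis-icu plugin to be installed.

```python
import json

# Sketch of index settings for the analyzer chain; names are hypothetical.
GERMAN_ANALYSIS = {
    "settings": {
        "analysis": {
            "filter": {
                "german_snowball": {"type": "stemmer", "language": "german"},
                "german_synonyms": {
                    "type": "synonym",
                    # Rule from the list above, lowercased to match the chain.
                    "synonyms": ["dsgvo => datenschutzgrundverordnung"],
                },
            },
            "analyzer": {
                "german_search": {
                    "type": "custom",
                    "tokenizer": "icu_tokenizer",  # needs the analysis-icu plugin
                    "filter": [
                        "lowercase",             # assumption: not in the original list
                        "german_snowball",       # step 2: stemming
                        "german_normalization",  # step 3: ä → a, ö → o, ü → u
                        "german_synonyms",       # step 4: synonyms
                    ],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            "title": {"type": "text", "analyzer": "german_search"},
            "body": {"type": "text", "analyzer": "german_search"},
        }
    },
}

if __name__ == "__main__":
    print(json.dumps(GERMAN_ANALYSIS, indent=2, ensure_ascii=False))
```

Because the synonym filter sits last, its rules are parsed through the preceding filters, so both sides of the rule are lowercased, stemmed, and normalized before matching. Running sample text through the `_analyze` API is a good way to check what tokens actually come out.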

Graham Miranda builds search infrastructure at Graham Miranda UG (Berlin, HRB 36794).
