Imagine you're building a search feature for your product catalog. You have 10 million products, and you need to return relevant results in under 100 milliseconds. You decide to use PostgreSQL's full-text search, so you write:
SELECT * FROM products WHERE to_tsvector('english', title) @@ plainto_tsquery('english', 'wireless headphones');
It works. But then you get 100 million products. Then a billion. The queries crawl from 100ms to 5 seconds. Your users leave. Your boss asks why.
The answer isn't "use a bigger database." The answer is "use a different data structure."
Elasticsearch doesn't store data the way PostgreSQL does. It uses something called an inverted index, and that one difference is why Elasticsearch can search a billion documents in single-digit milliseconds while traditional databases take seconds.
This post dives into how that magic works.
What Is an Inverted Index?
Think of a book. At the back, there's an index:
Elasticsearch ... pages 45, 78, 120, 156
Performance ... pages 45, 89, 203
Database ... pages 12, 78, 200
The index maps words to page numbers. When you want to find information about "Performance," you look it up once and jump directly to those pages. You don't read every single page.
That's the core idea of an inverted index.
Now imagine instead of a book, you have documents. Your "index" maps terms to document IDs:
elasticsearch -> [doc1, doc3, doc5, doc8, ...]
performance -> [doc1, doc2, doc7, ...]
database -> [doc3, doc4, doc9, ...]
It's "inverted" because it flips the relationship. A forward index says "doc1 contains terms: elasticsearch, performance, scalability." An inverted index says "term elasticsearch is in documents: 1, 3, 5, 8."
Why does this matter? Because searching becomes trivially fast.
When someone searches for "elasticsearch," Elasticsearch doesn't scan all documents. It looks up "elasticsearch" in the term dictionary once and gets back a list of document IDs. Done. One term lookup plus a single postings list traversal.
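The idea above can be sketched in a few lines of Python. This is a toy model for illustration, not Elasticsearch's actual implementation:

```python
from collections import defaultdict

# Toy corpus: doc ID -> text
docs = {
    1: "elasticsearch is powerful",
    2: "elasticsearch scales horizontally",
    3: "the database scales",
}

# Forward index: doc -> the terms it contains
forward = {doc_id: text.split() for doc_id, text in docs.items()}

# Inverted index: term -> sorted list of doc IDs (the postings list)
inverted = defaultdict(list)
for doc_id, terms in forward.items():
    for term in set(terms):
        inverted[term].append(doc_id)

# A search is now a single lookup, not a scan of every document
print(inverted["elasticsearch"])  # [1, 2]
print(inverted["scales"])         # [2, 3]
```

Notice that answering "which documents contain X?" never touches documents that don't contain X. That's the whole trick.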
Under the Hood: How Elasticsearch Builds and Uses Inverted Indices
Step 1: Text Analysis (Before Indexing)
Before a document gets indexed, its text goes through an analyzer pipeline:
{
  "settings": {
    "analysis": {
      "analyzer": {
        "custom_analyzer": {
          "type": "custom",
          "char_filter": ["html_strip"],
          "tokenizer": "standard",
          "filter": ["lowercase", "stop", "porter_stem"]
        }
      }
    }
  }
}
This pipeline:
- Removes HTML tags (character filter)
- Splits text into tokens (tokenizer): "Elasticsearch is powerful" becomes ["Elasticsearch", "is", "powerful"]
- Lowercases tokens (filter): ["elasticsearch", "is", "powerful"]
- Removes stop words (filter): ["elasticsearch", "powerful"]
- Stems words (filter): ["elasticsearch", "power"]
Now "powerful" and "powers" both map to the same root "power," so a search for "power" finds both.
The analyzer is completely customizable. For medical documents, you might preserve technical terms. For e-commerce, you might add synonym expansion (so "laptop" matches "notebook").
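A stripped-down version of that pipeline in Python. Real analyzers use proper Unicode tokenization and the Porter stemmer; the regex-based "stemmer" here is a crude stand-in:

```python
import re

STOP_WORDS = {"is", "a", "an", "the", "and", "of"}

def analyze(text: str) -> list[str]:
    # Tokenizer: split on anything that isn't alphanumeric
    tokens = re.findall(r"[a-zA-Z0-9]+", text)
    # Lowercase filter
    tokens = [t.lower() for t in tokens]
    # Stop-word filter
    tokens = [t for t in tokens if t not in STOP_WORDS]
    # Crude "stemmer": strip a common suffix (stand-in for porter_stem)
    tokens = [re.sub(r"(ful|s)$", "", t) for t in tokens]
    return tokens

print(analyze("Elasticsearch is powerful"))  # ['elasticsearch', 'power']
print(analyze("The powers"))                 # ['power']
```

Both "powerful" and "powers" end up as the term "power", which is exactly why a search for "power" matches both.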
Step 2: Segment Creation (The Immutable Index)
Here's where Elasticsearch gets clever. Instead of maintaining one large mutable index, it creates immutable segments.
When documents arrive:
- They sit in an in-memory buffer
- Every ~1 second (the refresh interval), the buffer is written out as a new searchable segment
- Each segment is an inverted index, and it is immutable
- Multiple segments are searched in parallel
Why immutable? Because it's fast. You never have to lock or rebalance; you just add new segments. Durability is handled separately: a refresh makes documents searchable, but segments aren't fsynced to disk until a flush, so if a crash happens mid-write, Elasticsearch replays the translog to recover.
Here's what a tiny two-document segment looks like:
INVERTED INDEX (Segment 1):
Term | Postings List (Doc IDs)
--------------|----------------------
elasticsearch | [1, 2]
powerful | [1]
scales | [2]
horizontally | [2]
DOCUMENT STORE:
Doc 1: "elasticsearch is powerful"
Doc 2: "elasticsearch scales horizontally"
Step 3: Term Lookup (Lightning Fast)
When you search for "elasticsearch," here's what happens:
- The query arrives at a coordinator node
- It broadcasts the query to all relevant shards
- Each shard performs a binary search on the sorted terms in its segments
- Found "elasticsearch"? Return the postings list: [1, 2]
- Fetch those documents from the document store
- Return to coordinator, which merges results from all shards
The magic: binary search on sorted terms is O(log N). On a million terms, that's ~20 comparisons. Then you get the postings list and you're done.
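Assuming the segment's term dictionary is a sorted array (a simplification of Lucene's actual term dictionary), the lookup can be sketched with Python's bisect module:

```python
import bisect

# A segment's sorted term dictionary, with parallel postings lists
terms = ["database", "elasticsearch", "horizontally", "powerful", "scales"]
postings = [[3, 4, 9], [1, 2], [2], [1], [2]]

def lookup(term: str) -> list[int]:
    """Binary search the sorted term dictionary: O(log N) comparisons."""
    i = bisect.bisect_left(terms, term)
    if i < len(terms) and terms[i] == term:
        return postings[i]
    return []  # term not present in this segment

print(lookup("elasticsearch"))  # [1, 2]
print(lookup("kubernetes"))     # []
```

The cost of the lookup grows with the log of the number of distinct terms, not with the number of documents, which is why it stays fast at any corpus size.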
Step 4: Segment Merging (Background Optimization)
Over time, you accumulate many small segments. Searching 100 segments is slower than searching 1 large segment. So Elasticsearch periodically merges them:
Segment 1 (1000 docs) + Segment 2 (1000 docs) -> Merged Segment (2000 docs)
The merge is invisible to you. It happens in the background. Old segments are deleted. The new merged segment is searched going forward.
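Because segments are immutable sorted structures, a merge is essentially a streaming union of their term dictionaries and postings lists. A Python sketch of the idea (real Lucene merges also drop deleted documents and remap doc IDs, which is omitted here):

```python
def merge_segments(seg_a: dict, seg_b: dict) -> dict:
    """Merge two term -> postings-list maps into one."""
    merged = {}
    for term in sorted(set(seg_a) | set(seg_b)):
        # Combine postings, keeping doc IDs sorted
        merged[term] = sorted(seg_a.get(term, []) + seg_b.get(term, []))
    return merged

seg1 = {"elasticsearch": [1, 2], "powerful": [1]}
seg2 = {"elasticsearch": [3], "scales": [3]}
print(merge_segments(seg1, seg2))
# {'elasticsearch': [1, 2, 3], 'powerful': [1], 'scales': [3]}
```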
Why Inverted Index Beats Traditional Databases
Let's compare searching 1 billion documents for "elasticsearch":
PostgreSQL with Full-Text Search
SELECT id, title FROM products
WHERE to_tsvector('english', title) @@ plainto_tsquery('english', 'elasticsearch')
LIMIT 10;
Internally, this query has two problems:
- As written, to_tsvector('english', title) is computed for every row, so without a matching expression index PostgreSQL falls back to a sequential scan over the whole table
- Even with a GIN index (PostgreSQL's own inverted index for full-text search), the engine isn't built around relevance-ranked retrieval. It must:
- Find all rows matching the query terms
- Recheck and fetch those rows from the table heap
- Rank them by relevance with ts_rank, reading each matching document
On a billion products, this takes 3-10 seconds.
Elasticsearch
GET /products/_search
{
  "query": {
    "match": {
      "title": "elasticsearch"
    }
  }
}
Elasticsearch:
- Looks up "elasticsearch" in the inverted index (binary search on the sorted terms, a few dozen comparisons at most)
- Gets back a postings list
- Fetches the top 10 documents
- Returns results
Time: single-digit milliseconds on a properly sized cluster.
The difference: inverted indices are designed specifically for text search. B-trees are not.
The Trade-Off: Updates vs Queries
But inverted indices have a cost: updates are expensive.
When you update a document in Elasticsearch:
- The old document is marked for deletion
- A new document is indexed (goes through analysis, creates new segment)
- A merge eventually removes the deleted document
This takes milliseconds to seconds, not microseconds. Elasticsearch is near-real-time: changes become visible on the next refresh, not instantly.
In PostgreSQL, you just UPDATE a row. Done immediately.
So when do you use each?
- PostgreSQL: Transactional workloads, frequent updates, complex joins
- Elasticsearch: Text search, logs, analytics, observability
Real-World Performance Numbers
Here are representative numbers; exact figures depend on hardware, mappings, and cluster sizing:
| Scenario | Elasticsearch | PostgreSQL |
|---|---|---|
| Search 1M docs | 2-3ms | 100-200ms |
| Search 1B docs | 5-10ms | 3-10s |
| Aggregation (cardinality) | 5-20ms | 500ms-2s |
| Index throughput | 100K docs/sec | 10K-50K docs/sec |
| Memory per 50GB data | 8-16GB (compressed) | 50GB+ (uncompressed) |
The compression factor is huge: Elasticsearch's inverted index compresses 3-5x tighter than raw JSON because terms are deduplicated and encoded efficiently.
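One reason for that compression: because doc IDs in a postings list are sorted, Elasticsearch stores the gaps between consecutive IDs rather than the IDs themselves, which keeps the numbers small and highly compressible. A sketch of the idea (Lucene's actual on-disk encoding is a block-based format, not shown here):

```python
def delta_encode(postings: list[int]) -> list[int]:
    # Store each doc ID as the gap from the previous one
    return [postings[0]] + [b - a for a, b in zip(postings, postings[1:])]

def delta_decode(deltas: list[int]) -> list[int]:
    out, total = [], 0
    for d in deltas:
        total += d
        out.append(total)
    return out

ids = [100, 105, 106, 230, 231]
gaps = delta_encode(ids)
print(gaps)  # [100, 5, 1, 124, 1]
assert delta_decode(gaps) == ids  # round-trips losslessly
```

Small gaps fit in fewer bits than full IDs, so dense postings lists compress extremely well.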
Relevance Scoring with BM25
Now that we can find documents fast, the next question is: which results should be first?
Elasticsearch uses BM25, a probabilistic relevance framework:
score = BM25(term_frequency, document_length, inverse_document_frequency)
In plain English:
- Term frequency: how many times does "elasticsearch" appear in the document? (more = higher score)
- Inverse document frequency: how rare is "elasticsearch"? (rare terms like "llama-index" rank higher than common terms like "the")
- Document length normalization: prevent long documents from always ranking highest
So if you search "elasticsearch performance," a document mentioning "elasticsearch" 5 times and "performance" 3 times ranks higher than a document mentioning each once.
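Here is a minimal BM25 scorer in Python using the standard formula with the usual defaults k1 = 1.2 and b = 0.75. Elasticsearch/Lucene adds refinements, so treat this as an illustration of the shape of the formula, not a reproduction of its exact scores:

```python
import math

def bm25_score(query_terms, doc, corpus, k1=1.2, b=0.75):
    """Score one document (a list of terms) against a query over a corpus."""
    n = len(corpus)
    avgdl = sum(len(d) for d in corpus) / n
    score = 0.0
    for term in query_terms:
        tf = doc.count(term)                      # term frequency in this doc
        df = sum(1 for d in corpus if term in d)  # docs containing the term
        idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
        # Length normalization: long docs need more matches to score high
        denom = tf + k1 * (1 - b + b * len(doc) / avgdl)
        score += idf * (tf * (k1 + 1)) / denom
    return score

corpus = [
    ["elasticsearch", "performance", "elasticsearch"],
    ["elasticsearch", "tutorial"],
    ["database", "tutorial"],
]
# The doc mentioning "elasticsearch" twice outscores the one mentioning it once
print(bm25_score(["elasticsearch"], corpus[0], corpus) >
      bm25_score(["elasticsearch"], corpus[1], corpus))  # True
```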
You can customize this with field boosting:
{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": { "query": "elasticsearch", "boost": 2 } } },
        { "match": { "body": "elasticsearch" } }
      ]
    }
  }
}
Now matches in the title count twice as much. Perfect for building relevant search experiences.
Common Mistakes (And How to Avoid Them)
Mistake 1: Too Many Shards
You have 1GB of data and create 100 shards. Each shard has 10MB.
Problem: search latency goes through the roof because you're coordinating across 100 shards, and overhead dominates.
Rule of thumb: aim for 10-50GB per shard. If you have 1TB of data, 20-100 shards is reasonable.
Mistake 2: Ignoring Refresh Interval
You index a document and try to search it immediately. Nothing.
That's because the default refresh interval is 1 second. Your data sits in the buffer for up to 1 second before becoming searchable.
For near-real-time search, you might lower this to 100ms. But each refresh creates a new segment, and merging costs CPU.
Balance: 500ms-1s for most use cases. Only lower for critical real-time systems.
Mistake 3: Bad Analyzer Configuration
You don't configure an analyzer, so Elasticsearch uses the default standard analyzer.
Now when users search "Wi-Fi", documents that say "wifi" return nothing, because:
- "Wi-Fi" tokenizes to "wi" and "fi" (the standard tokenizer splits on the hyphen)
- "wifi" stays a single token, so the terms never match
A custom analyzer with synonym expansion fixes this:
"filter": {
  "synonyms": {
    "type": "synonym_graph",
    "synonyms": ["wi fi, wifi", "machine learning, ml"]
  }
}
Conclusion: Why Inverted Index Matters
The inverted index is deceptively simple: a mapping from terms to document IDs. But this simple data structure enables Elasticsearch to do what traditional databases struggle with: search billions of documents in milliseconds.
The key insights:
- Inverted index is designed for text search, not general-purpose queries
- Immutable segments enable fast, lockless indexing
- Binary search on terms makes lookup blazing fast
- BM25 scoring automatically ranks results by relevance
- The trade-off: fast reads, slower updates. Worth it for search workloads
If you're building a search feature, a logging system, or an observability platform, understanding how Elasticsearch works under the hood will save you from common mistakes and help you build systems that scale.
Next step? Learn how to scale Elasticsearch horizontally with sharding, tune refresh and flush intervals for your workload, and customize analyzers for your domain.
Happy searching.
Key Resources
- Elasticsearch Inverted Index Documentation
- BM25 Algorithm Explained
- GitHub: Elasticsearch Source Code
- How to Tune Elasticsearch for Your Workload
I'm Prithvi S, Staff Software Engineer at Cloudera and open-source enthusiast. Follow my work on GitHub: https://github.com/iprithv