Md. Monowarul Amin

Posted on Feb 8

Step by Step Guide to Elasticsearch

#backend #beginners #database #tutorial

Complete Guide to Elasticsearch

What is Elasticsearch?
Why Use Elasticsearch?
When NOT to Use Elasticsearch
Core Concepts
Elasticsearch vs SQL - Side by Side
Query Types Explained
Real-World Use Cases
Performance Impact
Implementation Guide
Best Practices
Common Pitfalls

What is Elasticsearch?

Elasticsearch is a distributed, open-source search and analytics engine built on Apache Lucene. It is designed for horizontal scalability, reliability, and near real-time search.

Simple Analogy

SQL Database = Library with a card catalog (search by exact criteria)
Elasticsearch = Google for your data (relevance-ranked full-text search, with opt-in typo tolerance)

Key Characteristics

Document-oriented - Stores data as JSON documents
Full-text search - Optimized for text search and analysis
Near real-time - Newly indexed data becomes searchable after the next refresh (every 1 second by default)
Distributed - Scales horizontally across multiple nodes
Schema-flexible - Fields are mapped dynamically on first use, but you can (and usually should) define an explicit mapping

Why Use Elasticsearch?

1. Full-Text Search Capabilities

Traditional SQL

-- ❌ Poor full-text search
SELECT * FROM products 
WHERE description LIKE '%laptop%';

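-- Problems:
-- - No relevance scoring
-- - No typo tolerance
-- - No linguistic matching: "lap-top" is missed, and "laptops" only matches because "laptop" happens to be a substring
-- - Case sensitivity depends on the column's collation
-- - The leading % prevents normal B-tree index use, so large tables need a full scan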

Elasticsearch

// ✅ Powerful full-text search
GET /products/_search
{
  "query": {
    "match": {
      "description": "laptop"
    }
  }
}

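// With the default "standard" analyzer this handles:
// - Case differences: "laptop", "LAPTOP", "Laptop"
// - Plurals such as "laptops" only if the field uses a stemming analyzer (e.g. "english")
// - Typos such as "loptop" only if you add "fuzziness": "AUTO"
// - Relevance scoring (BM25)
// - Fast lookups on millions of documents via the inverted index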

2. Search Speed

Dataset: 10 million product records

SQL LIKE Query:
SELECT * FROM products WHERE description LIKE '%gaming laptop%';
→ Full table scan: 15-30 seconds

Elasticsearch:
GET /products/_search { "query": { "match": { "description": "gaming laptop" }}}
→ Inverted index: 50-200 milliseconds

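Speed: roughly two orders of magnitude faster (illustrative figures; actual timings depend on hardware, data, and query)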
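Why is the inverted index so much faster than a LIKE scan? The toy sketch below (not Lucene's real data structure) maps each analyzed term to the set of document IDs that contain it, so a query costs one lookup per query term instead of one pass over every document. The analyzer (lowercase, split on non-alphanumerics), the class name `InvertedIndex`, and the AND-style intersection (like a match query with `"operator": "and"`) are all assumptions made for illustration.

```java
import java.util.*;

// Toy inverted index: analyzed term -> sorted set of doc IDs containing that term.
class InvertedIndex {
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    // Toy analyzer: lowercase, then split on anything that is not a letter or digit.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) terms.add(token);
        }
        return terms;
    }

    void add(int docId, String text) {
        for (String term : analyze(text)) {
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
        }
    }

    // One map lookup per query term, then intersect the posting lists (AND semantics).
    SortedSet<Integer> docsWithAllTerms(String query) {
        SortedSet<Integer> result = null;
        for (String term : analyze(query)) {
            SortedSet<Integer> docs = postings.getOrDefault(term, new TreeSet<>());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        InvertedIndex index = new InvertedIndex();
        index.add(1, "Gaming Laptop with RTX graphics");
        index.add(2, "Office laptop, lightweight");
        index.add(3, "Gaming mouse");
        System.out.println(index.docsWithAllTerms("gaming laptop")); // [1]
    }
}
```

The lookup cost depends on the number of query terms and the length of their posting lists, not on the total number of documents, which is the core reason for the speed gap above.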

3. Complex Search Requirements

Scenario: E-commerce product search

Search across multiple fields (title, description, brand)
Fuzzy matching (typo tolerance)
Boost certain fields (title more important than description)
Filter by price range, category, rating
Sort by relevance, price, or rating
Faceted search (aggregations)

SQL: Requires complex, slow queries with multiple JOINs and LIKE clauses

Elasticsearch: Built for this exact use case

4. Log Analysis & Monitoring

Use Case: Analyze 1 billion application logs

SQL Database:
- Insert: Slow (indexes slow down writes)
- Query: "Find all errors in the last hour" → Slow
- Aggregation: "Count errors by service" → Very slow
- Storage: Expensive for time-series data

Elasticsearch:
- Insert: Fast with the bulk API (writes are buffered and flushed as new segments)
- Query: Fast (time-based indices limit how much data a time-range query touches)
- Aggregation: Milliseconds (built-in analytics)
- Storage: Time-based indices, easy to delete old data

5. Real-Time Analytics

-- SQL: GROUP BY over a large table (can be slow without suitable indexes)
SELECT 
  category, 
  AVG(price), 
  COUNT(*) 
FROM products 
WHERE created_at > NOW() - INTERVAL 1 DAY
GROUP BY category;

// Elasticsearch: Fast aggregations
GET /products/_search
{
  "size": 0,
  "query": {
    "range": {
      "created_at": { "gte": "now-1d" }
    }
  },
  "aggs": {
    "by_category": {
      "terms": { "field": "category.keyword" },
      "aggs": {
        "avg_price": { "avg": { "field": "price" }}
      }
    }
  }
}
// Typically returns in milliseconds, even with millions of docs

6. Autocomplete & Suggestions

// User types "iph" → Suggest "iPhone 15 Pro Max"
GET /products/_search
{
  "suggest": {
    "product-suggest": {
      "prefix": "iph",
      "completion": {
        "field": "name.suggest"
      }
    }
  }
}

// SQL: LIKE 'iph%' can use an index for plain prefixes, but ranking, weighting, and typo tolerance need extra work

7. Geospatial Search

// Find restaurants within 5km
GET /restaurants/_search
{
  "query": {
    "bool": {
      "must": { "match_all": {} },
      "filter": {
        "geo_distance": {
          "distance": "5km",
          "location": {
            "lat": 23.8103,
            "lon": 90.4125
          }
        }
      }
    }
  }
}

// SQL: Requires a spatial extension such as PostGIS, plus a spatial (GiST) index to be fast
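Conceptually, a `geo_distance` filter keeps every document whose point lies within the given great-circle distance of the origin. The sketch below uses the haversine formula on a sphere with an assumed mean radius of 6,371 km; Elasticsearch's own arc-distance calculation may differ slightly, and the class name `GeoDistanceFilter` is hypothetical.

```java
import java.util.*;

// Sketch of a geo_distance-style filter using the haversine formula on a sphere.
class GeoDistanceFilter {
    static final double EARTH_RADIUS_KM = 6371.0; // assumed mean Earth radius

    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    // Keep only places within maxKm of the origin (like "distance": "5km").
    // places maps a name to {lat, lon}.
    static List<String> within(Map<String, double[]> places, double lat, double lon, double maxKm) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, double[]> e : places.entrySet()) {
            double[] p = e.getValue();
            if (haversineKm(lat, lon, p[0], p[1]) <= maxKm) hits.add(e.getKey());
        }
        Collections.sort(hits);
        return hits;
    }
}
```

With the origin from the query above (23.8103, 90.4125), a point 0.01° further north is about 1.1 km away and passes the 5 km filter, while one 0.1° north (about 11 km) does not.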

When NOT to Use Elasticsearch

❌ Don't Use Elasticsearch For:

1. Transactional Data (ACID Requirements)

Use Case: Banking transactions, order processing
Problem: No multi-document transactions, and search results are only near real-time
Solution: Use PostgreSQL/MySQL for transactions, sync to ES for search

2. Primary Data Store

Use Case: Customer records, inventory
Problem: No foreign keys or multi-document transactions, limited join support, and it is not designed to be the system of record
Solution: Use RDBMS as source of truth, ES as search layer

3. Complex Relationships

Use Case: Social network (users, friends, posts, comments, likes)
Problem: No JOINs, denormalization required
Solution: Use graph database (Neo4j) or RDBMS

4. Small Datasets (< 10,000 records)

Use Case: Small product catalog
Problem: Overhead not worth it
Solution: PostgreSQL full-text search is usually sufficient

5. Strong Consistency Requirements

Use Case: Inventory management (exact stock counts)
Problem: Near real-time (1s delay), eventual consistency
Solution: RDBMS for writes, ES for search (sync with delay acceptable)

6. Frequent Updates to Same Document

Use Case: Real-time stock prices, live scores
Problem: Lucene segments are immutable, so every update reindexes the whole document and marks the old version as deleted, which is costly at high update rates
Solution: Use Redis or time-series database

Core Concepts

SQL vs Elasticsearch Terminology

SQL	Elasticsearch	Description
Database	Cluster (loosely)	Top-level container for data
Table	Index	Collection of documents (mapping types were deprecated in 7.x and removed in 8.0)
Row	Document	Single record (JSON)
Column	Field	Attribute of document
Schema	Mapping	Field definitions
Index	Inverted Index	Data structure for fast search
SELECT	Query DSL	Retrieve data
INSERT	Index API	Add data
UPDATE	Update API	Modify data
DELETE	Delete API	Remove data

Document Example

SQL Row:

id | name              | price | category     | in_stock
1  | iPhone 15 Pro Max | 1199  | Electronics  | true

Elasticsearch Document:

{
  "_index": "products",
  "_id": "1",
  "_source": {
    "name": "iPhone 15 Pro Max",
    "price": 1199,
    "category": "Electronics",
    "in_stock": true,
    "tags": ["smartphone", "apple", "5g"],
    "specs": {
      "screen": "6.7 inch",
      "storage": "256GB"
    }
  }
}

Index Structure

SQL Database:

database: ecommerce
  ├── table: products
  ├── table: orders
  └── table: customers

Elasticsearch:

cluster: ecommerce-cluster
  ├── index: products
  │   ├── shard 0 (primary)
  │   ├── shard 0 (replica)
  │   ├── shard 1 (primary)
  │   └── shard 1 (replica)
  ├── index: orders
  └── index: customers

Elasticsearch vs SQL - Side by Side

1. Simple SELECT

SQL:

SELECT * FROM products WHERE id = 1;

Elasticsearch:

GET /products/_doc/1

2. SELECT with WHERE Clause

SQL:

SELECT * FROM products 
WHERE category = 'Electronics' 
  AND price < 1000;

Elasticsearch:

GET /products/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "category.keyword": "Electronics" }},
        { "range": { "price": { "lt": 1000 }}}
      ]
    }
  }
}

3. LIKE Query (Partial Match)

SQL:

SELECT * FROM products 
WHERE name LIKE '%iPhone%';

Elasticsearch:

GET /products/_search
{
  "query": {
    "match": {
      "name": "iPhone"
    }
  }
}

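OR, closer to LIKE semantics (matches inside the whole, case-sensitive name, but slow because of the leading wildcard):

GET /products/_search
{
  "query": {
    "wildcard": {
      "name.keyword": { "value": "*iPhone*" }
    }
  }
}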

4. Full-Text Search (Advanced)

SQL:

-- ❌ This is painful and slow
SELECT * FROM products 
WHERE LOWER(name) LIKE '%gaming%laptop%'
   OR LOWER(description) LIKE '%gaming%laptop%'
ORDER BY (
  CASE WHEN name LIKE '%gaming laptop%' THEN 1
       WHEN name LIKE '%gaming%' THEN 2
       ELSE 3 END
);

Elasticsearch:

// ✅ Clean and fast
GET /products/_search
{
  "query": {
    "multi_match": {
      "query": "gaming laptop",
      "fields": ["name^2", "description"],
      "fuzziness": "AUTO"
    }
  }
}

// Explanation:
// - Searches both name and description
// - name^2 = boost name field 2x (more important)
// - fuzziness: AUTO = tolerates typos
// - Automatic relevance scoring

5. ORDER BY

SQL:

SELECT * FROM products 
ORDER BY price DESC 
LIMIT 10;

Elasticsearch:

GET /products/_search
{
  "query": { "match_all": {} },
  "sort": [
    { "price": "desc" }
  ],
  "size": 10
}

6. Pagination

SQL:

SELECT * FROM products 
LIMIT 10 OFFSET 20;

Elasticsearch:

GET /products/_search
{
  "query": { "match_all": {} },
  "from": 20,
  "size": 10
}

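⚠️ Warning: By default, from + size cannot exceed 10,000 (the index.max_result_window setting), and deep pages get slower because every shard must collect from + size hits.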

Solution: Use search_after (cursor-based pagination), ideally together with a point in time (PIT) for a consistent view

// First page
// product_id: assumed unique keyword field used as a tiebreaker
// (sorting on _id is disabled by default in 8.x)
GET /products/_search
{
  "query": { "match_all": {} },
  "size": 10,
  "sort": [{ "price": "asc" }, { "product_id": "asc" }]
}

// Next page (pass the last hit's sort values)
GET /products/_search
{
  "query": { "match_all": {} },
  "size": 10,
  "sort": [{ "price": "asc" }, { "product_id": "asc" }],
  "search_after": [1199, "abc123"]
}
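The cursor logic behind search_after can be sketched in memory: sort by (price, tiebreaker), then return the first `size` documents that sort strictly after the previous page's last sort values. This is a simplified model, not the Elasticsearch client API; the `SearchAfterDemo` class and the `productId` tiebreaker are hypothetical.

```java
import java.util.*;

// In-memory sketch of search_after: sort by (price asc, productId asc) and return the
// next page of documents that sort strictly after the previous page's last hit.
class SearchAfterDemo {
    static final class Product {
        final String productId; // hypothetical unique tiebreaker field
        final double price;
        Product(String productId, double price) { this.productId = productId; this.price = price; }
    }

    static final Comparator<Product> SORT =
            Comparator.comparingDouble((Product p) -> p.price).thenComparing(p -> p.productId);

    // afterPrice/afterId are the last hit's sort values; pass null for the first page.
    static List<Product> page(List<Product> all, Double afterPrice, String afterId, int size) {
        List<Product> sorted = new ArrayList<>(all);
        sorted.sort(SORT);
        List<Product> result = new ArrayList<>();
        for (Product p : sorted) {
            boolean after = afterPrice == null
                    || p.price > afterPrice
                    || (p.price == afterPrice && p.productId.compareTo(afterId) > 0);
            if (after) {
                result.add(p);
                if (result.size() == size) break;
            }
        }
        return result;
    }
}
```

Without the tiebreaker, several products sharing the same price would be ambiguous at a page boundary and could be skipped or repeated; the unique second sort key makes the cursor position exact.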

7. COUNT

SQL:

SELECT COUNT(*) FROM products 
WHERE category = 'Electronics';

Elasticsearch:

GET /products/_count
{
  "query": {
    "term": { "category.keyword": "Electronics" }
  }
}

8. GROUP BY (Aggregations)

SQL:

SELECT 
  category,
  COUNT(*) as count,
  AVG(price) as avg_price,
  MAX(price) as max_price
FROM products
GROUP BY category;

Elasticsearch:

GET /products/_search
{
  "size": 0,
  "aggs": {
    "by_category": {
      // terms returns the top 10 buckets by default; raise "size" to get more groups
      "terms": { "field": "category.keyword" },
      "aggs": {
        "avg_price": { "avg": { "field": "price" }},
        "max_price": { "max": { "field": "price" }}
      }
    }
  }
}

Response:

{
  "aggregations": {
    "by_category": {
      "buckets": [
        {
          "key": "Clothing",
          "doc_count": 5678,
          "avg_price": { "value": 49.99 },
          "max_price": { "value": 299.99 }
        },
        {
          "key": "Electronics",
          "doc_count": 1234,
          "avg_price": { "value": 599.99 },
          "max_price": { "value": 1999.99 }
        }
      ]
    }
  }
}

9. BETWEEN

SQL:

SELECT * FROM products 
WHERE price BETWEEN 100 AND 500;

Elasticsearch:

GET /products/_search
{
  "query": {
    "range": {
      "price": {
        "gte": 100,
        "lte": 500
      }
    }
  }
}

10. IN Clause

SQL:

SELECT * FROM products 
WHERE category IN ('Electronics', 'Books', 'Toys');

Elasticsearch:

GET /products/_search
{
  "query": {
    "terms": {
      "category.keyword": ["Electronics", "Books", "Toys"]
    }
  }
}

11. NOT / Negation

SQL:

SELECT * FROM products 
WHERE category != 'Clothing';

Elasticsearch:

GET /products/_search
{
  "query": {
    "bool": {
      "must_not": [
        { "term": { "category.keyword": "Clothing" }}
      ]
    }
  }
}

12. AND / OR Logic

SQL:

SELECT * FROM products 
WHERE (category = 'Electronics' OR category = 'Books')
  AND price < 100;

Elasticsearch:

GET /products/_search
{
  "query": {
    "bool": {
      "must": [
        { "range": { "price": { "lt": 100 }}}
      ],
      "should": [
        { "term": { "category.keyword": "Electronics" }},
        { "term": { "category.keyword": "Books" }}
      ],
      "minimum_should_match": 1
    }
  }
}

13. Nested Queries (Subqueries)

SQL:

SELECT * FROM products 
WHERE price > (
  SELECT AVG(price) FROM products
);

Elasticsearch:

// Two-step approach (no subqueries in ES)

// Step 1: Get average
GET /products/_search
{
  "size": 0,
  "aggs": {
    "avg_price": { "avg": { "field": "price" }}
  }
}
// Returns: avg_price = 150

// Step 2: Query with result
GET /products/_search
{
  "query": {
    "range": {
      "price": { "gt": 150 }
    }
  }
}

14. JOIN (Parent-Child Relationship)

SQL:

SELECT 
  orders.id,
  orders.total,
  customers.name
FROM orders
JOIN customers ON orders.customer_id = customers.id
WHERE customers.country = 'USA';

Elasticsearch:

// ❌ Elasticsearch doesn't support JOINs like SQL
// ✅ Use denormalization (embed customer data in order)

// Document structure:
{
  "_index": "orders",
  "_id": "order123",
  "_source": {
    "order_id": "order123",
    "total": 299.99,
    "customer": {
      "id": "cust456",
      "name": "John Doe",
      "country": "USA"
    }
  }
}

// Query:
GET /orders/_search
{
  "query": {
    "term": { "customer.country.keyword": "USA" }
  }
}

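OR use a parent-child (join) relationship (complex: parents and children must live in the same index, and each child must be indexed with routing set to its parent's ID):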

// Define mapping
PUT /orders
{
  "mappings": {
    "properties": {
      "join_field": {
        "type": "join",
        "relations": {
          "customer": "order"
        }
      }
    }
  }
}

// Query with has_parent
GET /orders/_search
{
  "query": {
    "has_parent": {
      "parent_type": "customer",
      "query": {
        "term": { "country.keyword": "USA" }
      }
    }
  }
}

15. Date Range Query

SQL:

SELECT * FROM orders 
WHERE created_at >= '2024-01-01' 
  AND created_at < '2024-02-01';

Elasticsearch:

GET /orders/_search
{
  "query": {
    "range": {
      "created_at": {
        "gte": "2024-01-01",
        "lt": "2024-02-01"
      }
    }
  }
}

OR using relative dates:

GET /orders/_search
{
  "query": {
    "range": {
      "created_at": {
        "gte": "now-7d",
        "lte": "now"
      }
    }
  }
}

Query Types Explained

1. Match Query (Full-Text Search)

What: Analyzes the search text and finds relevant documents

GET /products/_search
{
  "query": {
    "match": {
      "description": "gaming laptop"
    }
  }
}

How it works:

Analyzes "gaming laptop" → ["gaming", "laptop"]
Searches for documents containing either word
Scores by relevance (both words > one word)
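The three steps above can be modelled in a few lines. This is a deliberately simplified sketch: the score is just the number of distinct query terms a document contains (Elasticsearch uses BM25), the analyzer is a toy lowercase-and-split, and the `MatchQuerySketch` name is hypothetical. It does show the default OR behaviour (one matching term is enough) and why documents with both words rank first.

```java
import java.util.*;

// Simplified match query: analyze the query, OR the terms together, and rank
// documents by how many distinct query terms they contain (a stand-in for BM25).
class MatchQuerySketch {
    static Set<String> analyze(String text) {
        Set<String> terms = new LinkedHashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }

    // Returns IDs of docs matching at least one term, best matches first (ties by ID).
    static List<String> match(Map<String, String> docs, String query) {
        Set<String> queryTerms = analyze(query);
        Map<String, Integer> scores = new HashMap<>();
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            Set<String> docTerms = analyze(doc.getValue());
            int score = 0;
            for (String term : queryTerms) {
                if (docTerms.contains(term)) score++;
            }
            if (score > 0) scores.put(doc.getKey(), score); // OR: one term is enough
        }
        List<String> ids = new ArrayList<>(scores.keySet());
        ids.sort(Comparator.comparing((String id) -> -scores.get(id)).thenComparing(id -> id));
        return ids;
    }
}
```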

Use when:

Full-text search across analyzed fields
Natural language queries
Typo tolerance needed (add "fuzziness": "AUTO")

2. Term Query (Exact Match)

What: Exact match on a single term (the query value is not analyzed)

GET /products/_search
{
  "query": {
    "term": {
      "category.keyword": "Electronics"
    }
  }
}

Use when:

Filtering by exact values (IDs, statuses, categories)
Keyword fields
Not for analyzed text fields

⚠️ Common mistake:

// ❌ WRONG: Won't find "Electronics" if field is analyzed
{
  "query": {
    "term": { "category": "Electronics" }
  }
}

// ✅ CORRECT: Use .keyword subfield or match query
{
  "query": {
    "term": { "category.keyword": "Electronics" }
  }
}
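The mistake above comes down to where analysis happens. A text field stores analyzed (lowercased) tokens, a keyword field stores the original string as one term, a term query looks its value up verbatim, and a match query analyzes its value first. The toy model below (hypothetical `TermVsMatch` class, simplified analyzer standing in for the standard analyzer) reproduces all four cases.

```java
import java.util.*;

// Why "term" misses on a text field: the text field indexes analyzed (lowercased)
// tokens, while the keyword sub-field indexes the original string as one exact term.
class TermVsMatch {
    // Toy stand-in for the standard analyzer: lowercase and split on non-alphanumerics.
    static List<String> standardLike(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    static Set<String> textFieldTerms(String value) { return new HashSet<>(standardLike(value)); }

    static Set<String> keywordFieldTerms(String value) { return Set.of(value); } // not analyzed

    // term query: the query value is NOT analyzed and must equal an indexed term.
    static boolean termQuery(Set<String> indexedTerms, String value) {
        return indexedTerms.contains(value);
    }

    // match query: the query value IS analyzed with the same analyzer first.
    static boolean matchQuery(Set<String> indexedTerms, String value) {
        for (String t : standardLike(value)) {
            if (indexedTerms.contains(t)) return true;
        }
        return false;
    }
}
```

Note the flip side: a term query for "electronics" (lowercase) against the keyword field misses, because keyword values are stored exactly as sent.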

3. Bool Query (Combine Multiple Queries)

What: Combines queries with boolean logic

GET /products/_search
{
  "query": {
    "bool": {
      "must": [
        // AND logic - must match
        { "match": { "name": "laptop" }}
      ],
      "should": [
        // OR logic - nice to have (boosts score)
        { "match": { "brand": "Apple" }}
      ],
      "must_not": [
        // NOT logic - exclude
        { "term": { "in_stock": false }}
      ],
      "filter": [
        // AND logic - must match (no scoring)
        { "range": { "price": { "lte": 2000 }}}
      ]
    }
  }
}

Clauses:

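must: Documents MUST match (affects score)
should: Documents SHOULD match (boosts score); if the query has no must or filter clause, at least one should clause must match
must_not: Documents MUST NOT match (filter context, no scoring)
filter: Documents MUST match (no scoring, cacheable)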
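The clause rules can be captured in a small evaluator. In this sketch each matching must or should clause simply adds 1.0 to the score (real queries add relevance scores), and filter and must_not never change the score. The `BoolQuerySketch` class and its predicate-based clauses are hypothetical; the minimum_should_match default (1 when there are should clauses but no must or filter clauses, otherwise 0) follows the documented behaviour.

```java
import java.util.*;
import java.util.function.Predicate;

// Simplified bool query semantics. Each matching must/should clause adds 1.0 to the
// score (a stand-in for relevance); filter and must_not never affect the score.
class BoolQuerySketch<D> {
    final List<Predicate<D>> must = new ArrayList<>();
    final List<Predicate<D>> should = new ArrayList<>();
    final List<Predicate<D>> mustNot = new ArrayList<>();
    final List<Predicate<D>> filter = new ArrayList<>();
    Integer minimumShouldMatch = null; // null = use the default described below

    // Returns the score, or -1 if the document does not match.
    double score(D doc) {
        for (Predicate<D> p : filter) if (!p.test(doc)) return -1;
        for (Predicate<D> p : mustNot) if (p.test(doc)) return -1;
        double score = 0;
        for (Predicate<D> p : must) {
            if (!p.test(doc)) return -1;
            score += 1.0;
        }
        int shouldHits = 0;
        for (Predicate<D> p : should) if (p.test(doc)) shouldHits++;
        // Default: with no must/filter clauses, at least one should clause must match;
        // otherwise should clauses are optional and only boost the score.
        int required = minimumShouldMatch != null ? minimumShouldMatch
                : (must.isEmpty() && filter.isEmpty() && !should.isEmpty()) ? 1 : 0;
        if (shouldHits < required) return -1;
        return score + shouldHits;
    }
}
```

Modelled on the query above, an in-stock Apple laptop under 2000 scores 2.0, a Dell laptop scores 1.0 (should is optional because must and filter are present), and an out-of-stock laptop is excluded regardless of brand.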

Use when:

Combining multiple conditions
Complex search logic

4. Range Query

What: Find documents within a range

GET /products/_search
{
  "query": {
    "range": {
      "price": {
        "gte": 100,   // Greater than or equal
        "lte": 1000   // Less than or equal
      }
    }
  }
}

Operators:

gte: >= (greater than or equal)
gt: > (greater than)
lte: <= (less than or equal)
lt: < (less than)

Use when:

Price ranges
Date ranges
Numeric ranges

5. Multi-Match Query

What: Search across multiple fields

GET /products/_search
{
  "query": {
    "multi_match": {
      "query": "gaming laptop",
      "fields": ["name^3", "description", "brand^2"],
      "fuzziness": "AUTO"
    }
  }
}

Field boosting:

name^3: Boost name field 3x
brand^2: Boost brand field 2x
description: No boost (1x)
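With the default `best_fields` type, multi_match scores each field separately, multiplies by the field's boost, and keeps the best field's score; the optional `tie_breaker` (default 0.0) adds a fraction of the other fields' scores. The sketch below models only that combination step: per-field scoring is a toy count of matching whitespace-separated terms, not BM25, and `MultiMatchSketch` is a hypothetical name.

```java
import java.util.*;

// Sketch of multi_match "best_fields": score each field, apply its boost, keep the best,
// and add tieBreaker * (sum of the other fields). Per-field scoring is a toy term count.
class MultiMatchSketch {
    static int fieldScore(String fieldText, String query) {
        Set<String> fieldTerms = new HashSet<>(Arrays.asList(fieldText.toLowerCase(Locale.ROOT).split("\\s+")));
        int hits = 0;
        for (String term : query.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (fieldTerms.contains(term)) hits++;
        }
        return hits;
    }

    static double bestFields(Map<String, String> doc, Map<String, Double> boosts,
                             String query, double tieBreaker) {
        double best = 0, sum = 0;
        for (Map.Entry<String, Double> f : boosts.entrySet()) {
            double s = fieldScore(doc.getOrDefault(f.getKey(), ""), query) * f.getValue();
            best = Math.max(best, s);
            sum += s;
        }
        return best + tieBreaker * (sum - best);
    }
}
```

For a document whose name is "gaming laptop", with boosts name^3, description, brand^2, the name field dominates (2 matches x 3 = 6), so a strong title match outranks the same words buried in the description.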

Use when:

Searching across multiple fields
Different field importance
Google-like search

6. Fuzzy Query (Typo Tolerance)

What: Finds terms within edit distance

GET /products/_search
{
  "query": {
    "fuzzy": {
      "name": {
        "value": "loptop",
        "fuzziness": "AUTO"
      }
    }
  }
}

Finds: "laptop", "laptops" (even with typo "loptop")

Fuzziness values:

0: No typos allowed
1: 1 character difference
2: 2 character differences
AUTO: Adaptive (0 for 1-2 chars, 1 for 3-5 chars, 2 for 6+ chars)
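The AUTO rule and the edit-distance check can be sketched directly. The fuzzy query counts an adjacent transposition ("lpatop" vs "laptop") as a single edit by default, so the sketch uses optimal string alignment distance rather than plain Levenshtein. It scans a plain list of terms for clarity, whereas Lucene uses automata over the term dictionary; `FuzzySketch` is a hypothetical name.

```java
import java.util.*;

// Sketch of fuzzy matching: AUTO picks the allowed edits from the query term's length,
// and distance counts insertions, deletions, substitutions, and adjacent transpositions.
class FuzzySketch {
    static int autoFuzziness(String term) {
        int len = term.length();
        if (len <= 2) return 0;
        if (len <= 5) return 1;
        return 2;
    }

    // Optimal string alignment distance (Levenshtein plus adjacent transpositions).
    static int osaDistance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1), d[i - 1][j - 1] + cost);
                if (i > 1 && j > 1 && a.charAt(i - 1) == b.charAt(j - 2) && a.charAt(i - 2) == b.charAt(j - 1)) {
                    d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1);
                }
            }
        }
        return d[a.length()][b.length()];
    }

    static List<String> fuzzyMatches(Collection<String> indexedTerms, String query) {
        int maxEdits = autoFuzziness(query);
        List<String> out = new ArrayList<>();
        for (String term : indexedTerms) {
            if (osaDistance(query, term) <= maxEdits) out.add(term);
        }
        Collections.sort(out);
        return out;
    }
}
```

"loptop" has 6 characters, so AUTO allows 2 edits: "laptop" (1 substitution) and "laptops" (1 substitution + 1 insertion) match, while "desktop" does not.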

Use when:

User input may have typos
Autocomplete/suggestions

7. Wildcard Query

What: Pattern matching with wildcards

GET /products/_search
{
  "query": {
    "wildcard": {
      "name": { "value": "*phone*" }  // lowercase: text fields store lowercased terms
    }
  }
}

Wildcards:

*: Any number of characters
?: Single character

⚠️ Warning: Slow! Avoid leading wildcards (*phone), which force a scan of every term in the field

Use when:

Pattern matching
Prefix/suffix search (but prefer prefix query for performance)

8. Prefix Query

What: Matches terms starting with prefix

GET /products/_search
{
  "query": {
    "prefix": {
      "name": "iph"
    }
  }
}

Finds: "iPhone", "iPhone 15", "iPhones"

Use when:

Autocomplete
Typeahead search
Fast prefix matching

9. Exists Query

What: Checks if field exists

GET /products/_search
{
  "query": {
    "exists": {
      "field": "discount"
    }
  }
}

SQL equivalent: WHERE discount IS NOT NULL (roughly: null values and empty arrays also count as missing)

Use when:

Check for presence of field
Filter documents with/without specific fields

10. Aggregations (GROUP BY on steroids)

What: Analytics and statistics

GET /products/_search
{
  "size": 0,
  "aggs": {
    "price_ranges": {
      "range": {
        "field": "price",
        "ranges": [
          { "to": 100 },
          { "from": 100, "to": 500 },
          { "from": 500 }
        ]
      }
    },
    "popular_categories": {
      "terms": {
        "field": "category.keyword",
        "size": 10
      }
    },
    "stats": {
      "stats": { "field": "price" }
    }
  }
}

Response:

{
  "aggregations": {
    "price_ranges": {
      "buckets": [
        { "key": "*-100.0", "doc_count": 234 },
        { "key": "100.0-500.0", "doc_count": 567 },
        { "key": "500.0-*", "doc_count": 123 }
      ]
    },
    "popular_categories": {
      "buckets": [
        { "key": "Electronics", "doc_count": 456 },
        { "key": "Books", "doc_count": 234 }
      ]
    },
    "stats": {
      "count": 924,
      "min": 9.99,
      "max": 1999.99,
      "avg": 299.50,
      "sum": 276738.0
    }
  }
}
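The range buckets in that response follow a simple rule: `from` is inclusive, `to` is exclusive, and a missing bound is open-ended, so a price of exactly 100 lands in "100.0-500.0", not "*-100.0". The sketch below (hypothetical `RangeAggSketch` class, prices as a plain array) applies that rule and builds keys in the same style as the response.

```java
import java.util.*;

// Sketch of a range aggregation: "from" is inclusive, "to" is exclusive, and a
// missing bound (null) is open-ended. Keys mimic the "*-100.0" style in the response.
class RangeAggSketch {
    static String key(Double from, Double to) {
        return (from == null ? "*" : from.toString()) + "-" + (to == null ? "*" : to.toString());
    }

    // ranges: each element is {from, to}; null means unbounded.
    static Map<String, Integer> aggregate(double[] prices, Double[][] ranges) {
        Map<String, Integer> buckets = new LinkedHashMap<>();
        for (Double[] r : ranges) {
            int count = 0;
            for (double p : prices) {
                boolean aboveFrom = r[0] == null || p >= r[0];
                boolean belowTo = r[1] == null || p < r[1];
                if (aboveFrom && belowTo) count++;
            }
            buckets.put(key(r[0], r[1]), count);
        }
        return buckets;
    }
}
```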

Types:

Metric aggregations: avg, max, min, sum, stats
Bucket aggregations: terms, range, date_histogram
Pipeline aggregations: derivative, cumulative_sum

Real-World Use Cases

1. E-Commerce Product Search

Requirements:

Search by product name, description
Filter by category, price, brand
Faceted search (category count, price ranges)
Autocomplete
"Did you mean?" suggestions

Implementation:

Index Mapping

PUT /products
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "fields": {
          "keyword": { "type": "keyword" },
          "suggest": { "type": "completion" }
        }
      },
      "description": { "type": "text" },
      "category": {
        "type": "text",
        "fields": { "keyword": { "type": "keyword" }}
      },
      "brand": { "type": "keyword" },
      "price": { "type": "float" },
      "rating": { "type": "float" },
      "in_stock": { "type": "boolean" },
      "tags": { "type": "keyword" }
    }
  }
}

Search Query

GET /products/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "query": "gaming laptop",
            "fields": ["name^3", "description", "tags^2"],
            "fuzziness": "AUTO"
          }
        }
      ],
      "filter": [
        { "term": { "in_stock": true }},
        { "range": { "price": { "lte": 2000 }}},
        { "terms": { "brand": ["Dell", "HP", "Lenovo"] }}
      ]
    }
  },
  "aggs": {
    "categories": {
      "terms": { "field": "category.keyword" }
    },
    "price_ranges": {
      "range": {
        "field": "price",
        "ranges": [
          { "key": "Under $500", "to": 500 },
          { "key": "$500-$1000", "from": 500, "to": 1000 },
          { "key": "$1000-$2000", "from": 1000, "to": 2000 },
          { "key": "Over $2000", "from": 2000 }
        ]
      }
    },
    "brands": {
      "terms": { "field": "brand", "size": 20 }
    }
  },
  "sort": [
    { "_score": "desc" },
    { "rating": "desc" }
  ],
  "from": 0,
  "size": 20
}

2. Log Analysis (ELK Stack)

Scenario: Analyze application logs

Index Mapping

PUT /logs-2024.02
{
  "mappings": {
    "properties": {
      "timestamp": { "type": "date" },
      "level": { "type": "keyword" },
      "service": { "type": "keyword" },
      "message": { "type": "text" },
      "user_id": { "type": "keyword" },
      "ip_address": { "type": "ip" },
      "response_time": { "type": "integer" }
    }
  }
}

Query: Find Errors in Last Hour

GET /logs-2024.02/_search
{
  "query": {
    "bool": {
      "must": [
        { "term": { "level": "ERROR" }},
        { "range": { "timestamp": { "gte": "now-1h" }}}
      ]
    }
  },
  "aggs": {
    "errors_by_service": {
      "terms": { "field": "service" }
    },
    "errors_over_time": {
      "date_histogram": {
        "field": "timestamp",
        "fixed_interval": "5m"
      }
    }
  }
}
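The `errors_over_time` aggregation puts each timestamp into the bucket whose key is the timestamp rounded down to a multiple of the interval. The sketch below models that for a fixed interval in UTC with no offset, and also keeps empty buckets between the first and last hit, as date_histogram does with its default min_doc_count of 0. The class name `DateHistogramSketch` is hypothetical.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.*;

// Sketch of date_histogram with a fixed_interval: bucket key = timestamp rounded down to
// a multiple of the interval (UTC, no offset). Empty buckets between the first and last
// hit are kept, mirroring the default min_doc_count of 0.
class DateHistogramSketch {
    static SortedMap<Instant, Integer> histogram(List<Instant> timestamps, Duration interval) {
        long step = interval.toMillis();
        SortedMap<Instant, Integer> buckets = new TreeMap<>();
        for (Instant ts : timestamps) {
            long key = Math.floorDiv(ts.toEpochMilli(), step) * step;
            buckets.merge(Instant.ofEpochMilli(key), 1, Integer::sum);
        }
        if (!buckets.isEmpty()) {
            long first = buckets.firstKey().toEpochMilli();
            long last = buckets.lastKey().toEpochMilli();
            for (long k = first; k <= last; k += step) {
                buckets.putIfAbsent(Instant.ofEpochMilli(k), 0);
            }
        }
        return buckets;
    }
}
```

Errors at 10:01:00 and 10:04:59 both land in the 10:00 bucket, an error at 10:12:30 lands in 10:10, and the 10:05 bucket is reported with a count of 0, which keeps charts continuous.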

Query: Slow Requests

GET /logs-2024.02/_search
{
  "query": {
    "range": {
      "response_time": { "gte": 5000 }
    }
  },
  "aggs": {
    "slow_endpoints": {
      // "endpoint" is not in the explicit mapping above; dynamic mapping indexes it as text + .keyword
      "terms": { "field": "endpoint.keyword" }
    }
  }
}

3. Content Management System

Scenario: Blog/news website

Index Mapping

PUT /articles
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "fields": { "keyword": { "type": "keyword" }}
      },
      "content": { "type": "text" },
      "author": { "type": "keyword" },
      "tags": { "type": "keyword" },
      "published_at": { "type": "date" },
      "view_count": { "type": "integer" },
      "category": { "type": "keyword" }
    }
  }
}

Query: Find Related Articles

GET /articles/_search
{
  "query": {
    "more_like_this": {
      "fields": ["title", "content", "tags"],
      "like": [
        {
          "_index": "articles",
          "_id": "current-article-id"
        }
      ],
      "min_term_freq": 2,
      "max_query_terms": 12
    }
  },
  "size": 5
}

4. Autocomplete Search

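// Index document with completion suggester
// (assumes "name_suggest" is mapped with "type": "completion"; dynamic mapping would map it as a plain object)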
PUT /products/_doc/1
{
  "name": "iPhone 15 Pro Max",
  "name_suggest": {
    "input": ["iPhone", "iPhone 15", "iPhone Pro", "iPhone 15 Pro Max"],
    "weight": 10
  }
}

// Autocomplete query
GET /products/_search
{
  "suggest": {
    "product-suggest": {
      "prefix": "iph",
      "completion": {
        "field": "name_suggest",
        "size": 5,
        "fuzzy": {
          "fuzziness": "AUTO"
        }
      }
    }
  }
}
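Functionally, the completion suggester stores every `input` string, finds inputs that start with the typed prefix, and ranks the owning suggestions by `weight`. Elasticsearch does this with an in-memory FST; the sketch below only mirrors the behaviour with a sorted map and range lookup. It assumes lowercase matching (similar to the default simple analyzer), returns one entry per suggestion, and uses the hypothetical class name `CompletionSketch`.

```java
import java.util.*;

// Sketch of a completion suggester: every input string is stored (lowercased) in a
// sorted map, a prefix lookup scans the matching key range, and suggestions are
// ranked by weight (ties by name). Elasticsearch uses an FST; this mirrors behaviour only.
class CompletionSketch {
    // input -> (suggestion -> weight)
    private final TreeMap<String, Map<String, Integer>> inputs = new TreeMap<>();

    void add(String suggestion, List<String> inputStrings, int weight) {
        for (String in : inputStrings) {
            inputs.computeIfAbsent(in.toLowerCase(Locale.ROOT), k -> new HashMap<>())
                  .merge(suggestion, weight, Integer::max);
        }
    }

    List<String> suggest(String prefix, int size) {
        String p = prefix.toLowerCase(Locale.ROOT);
        Map<String, Integer> best = new HashMap<>();
        // Every key k with p <= k < p + '\uffff' starts with p.
        for (Map<String, Integer> hits : inputs.subMap(p, true, p + Character.MAX_VALUE, false).values()) {
            hits.forEach((s, w) -> best.merge(s, w, Integer::max));
        }
        List<String> ranked = new ArrayList<>(best.keySet());
        ranked.sort(Comparator.comparing((String s) -> -best.get(s)).thenComparing(s -> s));
        return ranked.subList(0, Math.min(size, ranked.size()));
    }
}
```

Listing several inputs per product (as in the document above) is what lets "iph" and "iPhone 15" both reach "iPhone 15 Pro Max", and the weight decides ordering when many products share a prefix.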

Performance Impact

Index Size Comparison

Dataset: 1 million products

PostgreSQL:
- Table size: 500 MB
- Index size (B-tree): 200 MB
- Total: 700 MB

Elasticsearch:
- Document storage (_source): 600 MB
- Inverted index: 800 MB
- Doc values (on-disk columnar data for sorting/aggregations): 200 MB
- Total: 1.6 GB

Elasticsearch uses roughly 2x more disk space in this illustrative example

Query Performance

Query: Full-text search "gaming laptop" across 10M documents

PostgreSQL (LIKE):
SELECT * FROM products 
WHERE description LIKE '%gaming%laptop%';
→ Time: 15-30 seconds (full table scan)

PostgreSQL (Full-Text Search):
SELECT * FROM products 
WHERE to_tsvector(description) @@ to_tsquery('gaming & laptop');
→ Time: 2-5 seconds without a GIN index (better, but still slow; a GIN index on the tsvector narrows the gap considerably)

Elasticsearch:
GET /products/_search { "query": { "match": { "description": "gaming laptop" }}}
→ Time: 50-200 milliseconds

Speed improvement: roughly 10-100x over PostgreSQL full-text search without an index, and more over LIKE (illustrative figures)

Write Performance

Scenario: Insert 100,000 documents

PostgreSQL:
- With indexes: 60-120 seconds
- Without indexes: 10-20 seconds (but queries slow)

Elasticsearch:
- Bulk indexing: 5-15 seconds
- Individual inserts: 40-60 seconds

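In this illustrative comparison, Elasticsearch bulk indexing is faster (writes are buffered and flushed as new immutable segments); individual requests are much slower, so batch your writes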

Memory Requirements

Small deployment (1M documents):
- PostgreSQL: 2-4 GB RAM
- Elasticsearch: 4-8 GB RAM (JVM heap, usually no more than half of RAM, plus OS filesystem cache)

Medium deployment (10M documents):
- PostgreSQL: 8-16 GB RAM
- Elasticsearch: 16-32 GB RAM

Large deployment (100M documents):
- PostgreSQL: 32-64 GB RAM
- Elasticsearch: 64-128 GB RAM (distributed across nodes)

Rule of thumb (from the figures above): plan for roughly 2x the RAM of an equivalent PostgreSQL setup

CPU Impact

CPU Usage Pattern:

PostgreSQL:
- Idle: 1-5%
- Search query: 20-40% (typically one core per query unless PostgreSQL's parallel query is used)
- Complex aggregation: 60-80% (typically one core, same caveat)

Elasticsearch:
- Idle: 5-10%
- Search query: 10-30% (distributed across cores)
- Complex aggregation: 40-60% (parallel processing)

Elasticsearch utilizes multiple cores better (distributed)

Implementation Guide

1. Java Client Setup

Maven Dependency

<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>elasticsearch-rest-high-level-client</artifactId>
    <version>7.17.15</version>
</dependency>

<!-- OR use the new Java API Client (ES 8+); the High Level REST Client is deprecated since 7.15 -->
<dependency>
    <groupId>co.elastic.clients</groupId>
    <artifactId>elasticsearch-java</artifactId>
    <version>8.11.0</version>
</dependency>

Spring Boot Configuration

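// Spring Data Elasticsearch 4.x API (RestClients was removed in 5.0)
import java.time.Duration;

import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.elasticsearch.client.ClientConfiguration;
import org.springframework.data.elasticsearch.client.RestClients;

@Configuration
public class ElasticsearchConfig {

    @Bean
    public RestHighLevelClient elasticsearchClient() {
        ClientConfiguration clientConfiguration = ClientConfiguration.builder()
                .connectedTo("localhost:9200")
                .withConnectTimeout(Duration.ofSeconds(5))
                .withSocketTimeout(Duration.ofSeconds(30))
                .build();

        return RestClients.create(clientConfiguration).rest();
    }
}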

2. Document Operations

Index a Document (INSERT)

@Service
public class ProductService {

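    @Autowired
    private RestHighLevelClient client;

    // JSON (de)serialization with Jackson and logging with SLF4J, used by the methods below
    private final ObjectMapper objectMapper = new ObjectMapper();
    private static final Logger logger = LoggerFactory.getLogger(ProductService.class);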

    public void indexProduct(Product product) throws IOException {
        IndexRequest request = new IndexRequest("products")
                .id(product.getId())
                .source(objectMapper.writeValueAsString(product), XContentType.JSON);

        IndexResponse response = client.index(request, RequestOptions.DEFAULT);

        // Result is CREATED for a new ID, UPDATED if the document already existed
        logger.info("Product {} indexed: {}", product.getId(), response.getResult());
    }
}

Bulk Index (Batch INSERT)

public void bulkIndexProducts(List<Product> products) throws IOException {
    BulkRequest bulkRequest = new BulkRequest();

    for (Product product : products) {
        IndexRequest request = new IndexRequest("products")
                .id(product.getId())
                .source(objectMapper.writeValueAsString(product), XContentType.JSON);
        bulkRequest.add(request);
    }

    BulkResponse bulkResponse = client.bulk(bulkRequest, RequestOptions.DEFAULT);

    if (bulkResponse.hasFailures()) {
        logger.error("Bulk indexing failures: {}", bulkResponse.buildFailureMessage());
    } else {
        logger.info("Indexed {} products", products.size());
    }
}

Get a Document (SELECT by ID)

public Product getProduct(String id) throws IOException {
    GetRequest request = new GetRequest("products", id);
    GetResponse response = client.get(request, RequestOptions.DEFAULT);

    if (response.isExists()) {
        String sourceAsString = response.getSourceAsString();
        return objectMapper.readValue(sourceAsString, Product.class);
    }
    return null;
}

Update a Document

public void updateProduct(String id, Map<String, Object> updates) throws IOException {
    UpdateRequest request = new UpdateRequest("products", id)
            .doc(updates);

    UpdateResponse response = client.update(request, RequestOptions.DEFAULT);

    if (response.getResult() == DocWriteResponse.Result.UPDATED) {
        logger.info("Product updated: {}", id);
    }
}

Delete a Document

public void deleteProduct(String id) throws IOException {
    DeleteRequest request = new DeleteRequest("products", id);
    DeleteResponse response = client.delete(request, RequestOptions.DEFAULT);

    if (response.getResult() == DocWriteResponse.Result.DELETED) {
        logger.info("Product deleted: {}", id);
    }
}

3. Search Operations

Simple Search

public List<Product> searchProducts(String query) throws IOException {
    SearchRequest searchRequest = new SearchRequest("products");
    SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();

    sourceBuilder.query(QueryBuilders.matchQuery("name", query));
    sourceBuilder.from(0);
    sourceBuilder.size(20);

    searchRequest.source(sourceBuilder);
    SearchResponse response = client.search(searchRequest, RequestOptions.DEFAULT);

    List<Product> products = new ArrayList<>();
    for (SearchHit hit : response.getHits().getHits()) {
        Product product = objectMapper.readValue(hit.getSourceAsString(), Product.class);
        products.add(product);
    }

    return products;
}

Complex Search with Filters

public SearchResult searchProductsAdvanced(ProductSearchRequest request) throws IOException {
    SearchRequest searchRequest = new SearchRequest("products");
    SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();

    // Build bool query
    BoolQueryBuilder boolQuery = QueryBuilders.boolQuery();

    // Text search
    if (request.getQuery() != null) {
        boolQuery.must(QueryBuilders.multiMatchQuery(request.getQuery())
                .field("name", 3.0f)
                .field("description")
                .fuzziness(Fuzziness.AUTO));
    }

    // Filters
    if (request.getCategory() != null) {
        boolQuery.filter(QueryBuilders.termQuery("category.keyword", request.getCategory()));
    }

    if (request.getMinPrice() != null || request.getMaxPrice() != null) {
        RangeQueryBuilder rangeQuery = QueryBuilders.rangeQuery("price");
        if (request.getMinPrice() != null) {
            rangeQuery.gte(request.getMinPrice());
        }
        if (request.getMaxPrice() != null) {
            rangeQuery.lte(request.getMaxPrice());
        }
        boolQuery.filter(rangeQuery);
    }

    if (request.isInStockOnly()) {
        boolQuery.filter(QueryBuilders.termQuery("in_stock", true));
    }

    sourceBuilder.query(boolQuery);

    // Sorting
    if ("price_asc".equals(request.getSort())) {
        sourceBuilder.sort("price", SortOrder.ASC);
    } else if ("price_desc".equals(request.getSort())) {
        sourceBuilder.sort("price", SortOrder.DESC);
    } else {
        sourceBuilder.sort("_score", SortOrder.DESC);
    }

    // Pagination
    sourceBuilder.from(request.getPage() * request.getSize());
    sourceBuilder.size(request.getSize());

    // Aggregations
    sourceBuilder.aggregation(
        AggregationBuilders.terms("categories")
            .field("category.keyword")
            .size(10)
    );

    sourceBuilder.aggregation(
        AggregationBuilders.range("price_ranges")
            .field("price")
            .addRange(0, 100)
            .addRange(100, 500)
            .addRange(500, 1000)
            .addUnboundedFrom(1000)
    );

    searchRequest.source(sourceBuilder);
    SearchResponse response = client.search(searchRequest, RequestOptions.DEFAULT);

    // Parse results (parseSearchResponse is an application-specific helper, not shown)
    return parseSearchResponse(response);
}

4. Spring Data Elasticsearch (Simpler)

Entity

@Document(indexName = "products")
public class Product {

    @Id
    private String id;

    @Field(type = FieldType.Text)
    private String name;

    @Field(type = FieldType.Text)
    private String description;

    @Field(type = FieldType.Keyword)
    private String category;

    @Field(type = FieldType.Float)
    private Double price;

    @Field(name = "in_stock", type = FieldType.Boolean) // match the index's field name
    private Boolean inStock;

    // Getters and setters
}

Repository

public interface ProductRepository extends ElasticsearchRepository<Product, String> {

    // Method name query
    List<Product> findByCategory(String category);

    List<Product> findByPriceBetween(Double minPrice, Double maxPrice);

    List<Product> findByNameContaining(String name);

    // Custom query
    @Query("{\"bool\": {\"must\": [{\"match\": {\"name\": \"?0\"}}]}}")
    List<Product> searchByName(String name);
}

Service

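import java.util.List;
import org.springframework.stereotype.Service;

@Service
public class ProductService {

    private final ProductRepository repository;

    // Constructor injection: the dependency is final and easy to mock in tests
    public ProductService(ProductRepository repository) {
        this.repository = repository;
    }

    public List<Product> searchProducts(String query) {
        return repository.findByNameContaining(query);
    }

    public List<Product> getProductsByCategory(String category) {
        return repository.findByCategory(category);
    }

    public void saveProduct(Product product) {
        repository.save(product);
    }
}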

Best Practices

1. Index Design

Use Time-Based Indices for Logs

❌ BAD: Single index
logs (all logs from day 1)

✅ GOOD: Time-based indices
logs-2024.02.01
logs-2024.02.02
logs-2024.02.03
...

Benefits:
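- Easy to delete old data (drop a whole index instead of running a slow delete-by-query)
- Better performance (smaller indices; queries can target only the days they need, e.g. logs-2024.02.*)
- Easier backup/restore (snapshot or restore individual days)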
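Writing to and cleaning up daily indices mostly comes down to date arithmetic on index names. The sketch below assumes the `logs-yyyy.MM.dd` naming shown above and a retention period in days; the class name and the retention value are hypothetical, and the actual cleanup would be one `DELETE /<index>` request per expired name.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for daily indices named logs-yyyy.MM.dd.
final class DailyIndices {
    private static final DateTimeFormatter FORMAT = DateTimeFormatter.ofPattern("yyyy.MM.dd");
    private static final String PREFIX = "logs-";

    // Name of the index that should receive documents for the given day.
    static String indexFor(LocalDate day) {
        return PREFIX + FORMAT.format(day);
    }

    // Returns the daily indices strictly older than (today - retentionDays).
    static List<String> expired(List<String> indexNames, LocalDate today, int retentionDays) {
        LocalDate cutoff = today.minusDays(retentionDays);
        List<String> result = new ArrayList<>();
        for (String name : indexNames) {
            if (!name.startsWith(PREFIX)) {
                continue; // not one of our daily indices
            }
            LocalDate day;
            try {
                day = LocalDate.parse(name.substring(PREFIX.length()), FORMAT);
            } catch (DateTimeParseException e) {
                continue; // ignore names that don't follow the pattern
            }
            if (day.isBefore(cutoff)) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        LocalDate today = LocalDate.of(2024, 2, 3);
        List<String> names = List.of("logs-2024.02.01", "logs-2024.02.02", "logs-2024.02.03");
        System.out.println(indexFor(today));          // logs-2024.02.03
        System.out.println(expired(names, today, 1)); // [logs-2024.02.01]
    }
}
```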

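Separate Read/Write Aliases

// Create the concrete index with both aliases pointing at it
PUT /products-000001
{
  "aliases": {
    "products-write": { "is_write_index": true },
    "products-read": {}
  }
}

// Application uses only the aliases, never the concrete index name
GET /products-read/_search
POST /products-write/_doc

Because clients never reference products-000001 directly, you can later reindex into a new index and move both aliases to it in one atomic POST /_aliases request, with no application change.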

2. Mapping Design

Define Mappings Explicitly

// ❌ BAD: Auto-mapping (Elasticsearch guesses types)
PUT /products/_doc/1
{
  "name": "iPhone",
  "price": "1199"  // A JSON string: dynamically mapped as text (+ keyword), not as a number!
}

// ✅ GOOD: Explicit mapping
PUT /products
{
  "mappings": {
    "properties": {
      "name": { "type": "text" },
      "price": { "type": "float" },
      "category": {
        "type": "text",
        "fields": {
          "keyword": { "type": "keyword" }
        }
      }
    }
  }
}

Use Keyword for Exact Matches

{
  "category": {
    "type": "text",              // For full-text search
    "fields": {
      "keyword": { "type": "keyword" }  // For exact match, aggregations
    }
  }
}

// Full-text search
GET /products/_search
{
  "query": { "match": { "category": "electronics" }}
}

// Exact match / aggregation
GET /products/_search
{
  "query": { "term": { "category.keyword": "Electronics" }},
  "aggs": {
    "categories": { "terms": { "field": "category.keyword" }}
  }
}

3. Query Optimization

Use Filters for Non-Scoring Queries

// ❌ SLOW: Everything in must (computes scores unnecessarily)
{
  "query": {
    "bool": {
      "must": [
        { "match": { "name": "laptop" }},
        { "term": { "in_stock": true }},
        { "range": { "price": { "lte": 1000 }}}
      ]
    }
  }
}

// ✅ FAST: Use filter for exact matches (cached, no scoring)
{
  "query": {
    "bool": {
      "must": [
        { "match": { "name": "laptop" }}  // Only this needs scoring
      ],
      "filter": [
        { "term": { "in_stock": true }},
        { "range": { "price": { "lte": 1000 }}}
      ]
    }
  }
}

Avoid Deep Pagination

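// ❌ BAD: Deep pagination with from/size
GET /products/_search
{
  "from": 10000,
  "size": 10
}
// Every shard must collect and sort from + size hits, and this request is
// rejected by default because from + size exceeds index.max_result_window (10,000)

// ✅ GOOD: Use search_after with a unique tiebreaker field
GET /products/_search
{
  "size": 10,
  "sort": [{ "price": "asc" }, { "product_id": "asc" }],
  "search_after": [1199, "abc123"]
}
// search_after takes the "sort" values of the last hit on the previous page.
// product_id is assumed to be a unique keyword field; sorting on _id is
// disabled by default in Elasticsearch 8.x.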
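`search_after` behaves like keyset pagination: hits are sorted by a total order, and each page starts strictly after the sort values of the previous page's last hit. The in-memory sketch below illustrates only those semantics (it does not call Elasticsearch); the `Hit` record and the `productId` tiebreaker are hypothetical, and records need Java 16+.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// In-memory illustration of search_after semantics, not a client call.
// Hits are sorted by (price asc, productId asc); the unique productId
// breaks ties so the order is total and no hit is skipped or repeated.
final class SearchAfterDemo {
    record Hit(double price, String productId) {}

    static final Comparator<Hit> SORT =
        Comparator.comparingDouble(Hit::price).thenComparing(Hit::productId);

    // Returns up to `size` hits that sort strictly after `after`
    // (null for the first page), like "search_after": [price, productId].
    static List<Hit> page(List<Hit> all, Hit after, int size) {
        List<Hit> sorted = new ArrayList<>(all);
        sorted.sort(SORT);
        List<Hit> result = new ArrayList<>();
        for (Hit h : sorted) {
            if (after != null && SORT.compare(h, after) <= 0) {
                continue; // already returned on an earlier page
            }
            result.add(h);
            if (result.size() == size) {
                break;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Hit> all = List.of(
            new Hit(999, "a1"), new Hit(1199, "b2"),
            new Hit(1199, "abc123"), new Hit(1499, "c3"));
        List<Hit> first = page(all, null, 2);
        Hit last = first.get(first.size() - 1); // its sort values feed the next request
        System.out.println(first);
        System.out.println(page(all, last, 2));
    }
}
```

Two hits share the price 1199; without the tiebreaker, a page boundary between them could drop or duplicate one.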

Use Bulk API for Indexing

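// ❌ BAD: Individual requests (one HTTP round trip per document)
for (Product product : products) {
    client.index(new IndexRequest("products").source(...), RequestOptions.DEFAULT);
}

// ✅ GOOD: Bulk request (many documents per round trip)
BulkRequest bulkRequest = new BulkRequest();
for (Product product : products) {
    bulkRequest.add(new IndexRequest("products").source(...));
}
BulkResponse bulkResponse = client.bulk(bulkRequest, RequestOptions.DEFAULT);
// A bulk call can partially fail: check the per-item results
if (bulkResponse.hasFailures()) {
    logger.warn(bulkResponse.buildFailureMessage());
}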
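For very large loads, a single bulk request holding everything can exceed the HTTP request size limit (`http.max_content_length`, 100 MB by default). A common compromise is to send bulk requests in fixed-size batches. This sketch only does the partitioning; the class name is hypothetical, the batch size is an assumption you would tune by measurement, and each batch would become one `BulkRequest` as above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: split items into batches of at most batchSize,
// so that each batch can be sent as one bulk request.
final class Batches {
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        // prints [[1, 2], [3, 4], [5]]
        System.out.println(partition(List.of(1, 2, 3, 4, 5), 2));
    }
}
```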

4. Performance Tuning

Increase Refresh Interval for Bulk Indexing

// Default: refresh every 1 second
PUT /products/_settings
{
  "index": {
    "refresh_interval": "30s"  // During bulk indexing ("-1" disables refresh entirely)
  }
}

// After bulk indexing complete
PUT /products/_settings
{
  "index": {
    "refresh_interval": "1s"  // Back to normal
  }
}

Disable Replicas During Initial Load

PUT /products/_settings
{
  "index": {
    "number_of_replicas": 0  // During bulk load
  }
}

// After load complete
PUT /products/_settings
{
  "index": {
    "number_of_replicas": 1  // Enable replicas
  }
}
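Both settings changes above are easy to forget to undo if the load fails halfway. One way to guard against that is to restore them in a `finally` block. The sketch below assumes a hypothetical `IndexSettingsApi` interface standing in for whatever client call sends `PUT /<index>/_settings`; it is not an Elasticsearch client API, and the restored values simply mirror the ones shown above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a client call that sends PUT /<index>/_settings.
interface IndexSettingsApi {
    void putSettings(String index, Map<String, String> settings);
}

final class BulkLoadSettings {
    static void runBulkLoad(IndexSettingsApi api, String index, Runnable load) {
        // Relax settings for the initial load
        api.putSettings(index, Map.of(
            "index.refresh_interval", "30s",
            "index.number_of_replicas", "0"));
        try {
            load.run();
        } finally {
            // Restore even if the load throws, so the index is never left without replicas
            api.putSettings(index, Map.of(
                "index.refresh_interval", "1s",
                "index.number_of_replicas", "1"));
        }
    }

    public static void main(String[] args) {
        List<String> calls = new ArrayList<>();
        IndexSettingsApi fake = (index, settings) ->
            calls.add(index + " replicas=" + settings.get("index.number_of_replicas"));
        runBulkLoad(fake, "products", () -> calls.add("load"));
        System.out.println(calls); // [products replicas=0, load, products replicas=1]
    }
}
```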

5. Monitoring

Monitor Cluster Health

@Scheduled(fixedRate = 60000)
public void monitorClusterHealth() throws IOException {
    ClusterHealthRequest request = new ClusterHealthRequest();
    ClusterHealthResponse response = client.cluster().health(request, RequestOptions.DEFAULT);

    ClusterHealthStatus status = response.getStatus();
    int numberOfNodes = response.getNumberOfNodes();
    int numberOfDataNodes = response.getNumberOfDataNodes();

    logger.info("Cluster status: {}, Nodes: {}, Data Nodes: {}", 
                status, numberOfNodes, numberOfDataNodes);

    if (status == ClusterHealthStatus.RED) {
        alertOps("Elasticsearch cluster is RED!");
    }
}

Monitor Index Stats

public void monitorIndexStats() throws IOException {
    IndicesStatsRequest request = new IndicesStatsRequest();
    request.indices("products");

    IndicesStatsResponse response = client.indices().stats(request, RequestOptions.DEFAULT);

    CommonStats stats = response.getTotal();
    long docCount = stats.getDocs().getCount();
    long storeSize = stats.getStore().getSizeInBytes();

    logger.info("Index stats - Docs: {}, Size: {} MB", 
                docCount, storeSize / 1024 / 1024);
}

Common Pitfalls

Pitfall 1: Using Text Field for Exact Match

// ❌ WRONG: This won't work as expected
{
  "query": {
    "term": { "category": "Electronics" }
  }
}

// Why: at index time the text field stores "Electronics" as the token "electronics",
// but a term query is not analyzed, so it looks for exact "Electronics" → No match!

// ✅ CORRECT: Use keyword field
{
  "query": {
    "term": { "category.keyword": "Electronics" }
  }
}

// OR use match query on text field
{
  "query": {
    "match": { "category": "Electronics" }
  }
}
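To see why the term query misses while the match query hits, it helps to approximate what the analyzer does. The sketch below is a rough stand-in for the standard analyzer (split on non-alphanumeric characters, then lowercase); the real analyzer follows Unicode text segmentation rules, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Rough approximation of the standard analyzer, for illustration only.
final class AnalyzerDemo {
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // A term query is not analyzed: it looks for the exact term among the indexed tokens.
    static boolean termMatches(String indexedText, String term) {
        return analyze(indexedText).contains(term);
    }

    // A match query analyzes its input with the same analyzer first
    // (any token may match, as with the default "or" operator).
    static boolean matchMatches(String indexedText, String query) {
        List<String> indexed = analyze(indexedText);
        for (String token : analyze(query)) {
            if (indexed.contains(token)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(termMatches("Electronics", "Electronics"));  // false
        System.out.println(matchMatches("Electronics", "Electronics")); // true
    }
}
```

The `category.keyword` sub-field skips analysis entirely, which is why the term query against it compares "Electronics" as stored.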

Pitfall 2: Not Handling Null Values

// ❌ BAD: NullPointerException if the field is missing (or explicitly null)
SearchHit hit = ...;
String category = hit.getSourceAsMap().get("category").toString();

// ✅ GOOD: Fall back to a default when the value is absent or null
// (containsKey alone is not enough: "category": null still throws)
Map<String, Object> source = hit.getSourceAsMap();
String category = Objects.toString(source.get("category"), "Unknown");

Pitfall 3: Split Brain Problem

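Scenario: Network partition in a 3-node cluster

Nodes 1, 2, 3
Network splits: [Node 1, 2] vs [Node 3]

If both sides elect their own master, they form two clusters → Data inconsistency!

Solution: Require a majority (quorum) of master-eligible nodes to elect a master
quorum = (master_eligible_nodes / 2) + 1   (integer division)
For 3 nodes: (3 / 2) + 1 = 2, so only [Node 1, 2] can elect a master

Elasticsearch 6.x and earlier: set discovery.zen.minimum_master_nodes to this value.
Elasticsearch 7.0+: that setting is ignored; the cluster manages its voting quorum automatically.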
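The majority arithmetic above can be checked with a few lines of Java. This is a minimal sketch with hypothetical names; it models only the quorum rule, not Elasticsearch's actual election protocol.

```java
// Hypothetical helper for the majority-quorum rule behind split-brain protection.
final class Quorum {
    // Integer division makes (n / 2) + 1 a strict majority of n nodes.
    static int quorum(int masterEligibleNodes) {
        return masterEligibleNodes / 2 + 1;
    }

    // A partition may elect a master only if it still sees a majority.
    static boolean canElectMaster(int nodesInPartition, int masterEligibleNodes) {
        return nodesInPartition >= quorum(masterEligibleNodes);
    }

    public static void main(String[] args) {
        int total = 3;
        System.out.println(quorum(total));            // 2
        System.out.println(canElectMaster(2, total)); // true  ([Node 1, 2])
        System.out.println(canElectMaster(1, total)); // false ([Node 3])
    }
}
```

Because at most one side of any split can hold a strict majority, two masters cannot exist at once. With 4 nodes split 2 and 2, neither side reaches the quorum of 3, which is why odd numbers of master-eligible nodes are usually preferred.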

Pitfall 4: Not Closing Client

// ❌ BAD: Resource leak
RestHighLevelClient client = new RestHighLevelClient(...);
// Use client
// Never closed → Resource leak

// ✅ GOOD: Always close
@PreDestroy
public void cleanup() throws IOException {
    if (client != null) {
        client.close();
    }
}

Pitfall 5: Over-Sharding

// ❌ BAD: Too many shards for small index
PUT /products
{
  "settings": {
    "number_of_shards": 50  // For 1000 documents!
  }
}

// ✅ GOOD: Right-size shards
PUT /products
{
  "settings": {
    "number_of_shards": 1  // 1 shard sufficient for small datasets
  }
}

// Rule of thumb:
// - Shard size: 10-50 GB
// - Small index (< 1GB): 1 shard
// - Medium (1-50GB): 1-5 shards
// - Large (> 50GB): total_size / 30GB, rounded up
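The sizing rule of thumb above can be expressed as a small helper. This sketch assumes the guide's 30 GB target per shard and rounds up so the average shard stays at or below that target; the name is hypothetical, and real sizing should also weigh query load and node count.

```java
// Hypothetical helper for the shard-sizing rule of thumb:
// primary shards = ceil(total_size / 30 GB), with a minimum of one shard.
final class ShardSizing {
    static final double TARGET_SHARD_GB = 30.0; // assumption taken from the guide

    static int suggestedPrimaryShards(double indexSizeGb) {
        if (indexSizeGb < 0) {
            throw new IllegalArgumentException("size must be non-negative");
        }
        return Math.max(1, (int) Math.ceil(indexSizeGb / TARGET_SHARD_GB));
    }

    public static void main(String[] args) {
        System.out.println(suggestedPrimaryShards(0.5)); // 1
        System.out.println(suggestedPrimaryShards(40));  // 2
        System.out.println(suggestedPrimaryShards(300)); // 10
    }
}
```

For small and medium indices this naturally yields 1 or 2 shards, which stays inside the ranges listed above.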

Architecture Patterns

Pattern 1: Elasticsearch as Search Layer

┌─────────────┐
│   Client    │
└──────┬──────┘
       │
       ├──────────────────┬──────────────────┐
       │                  │                  │
       ▼                  ▼                  ▼
┌──────────────┐   ┌──────────────┐   ┌─────────────┐
│   Write API  │   │   Read API   │   │ Search API  │
└──────┬───────┘   └──────┬───────┘   └──────┬──────┘
       │                  │                  │
       ▼                  ▼                  ▼
┌──────────────┐   ┌──────────────┐   ┌─────────────┐
│  PostgreSQL  │   │  PostgreSQL  │   │Elasticsearch│
│  (Primary)   │   │(Read Replica)│   │(Search Only)│
└──────┬───────┘   └──────────────┘   └──────▲──────┘
       │                                     │
       └─────────────────────────────────────┘
                 Sync (Logstash/Kafka)

Use when:

Need ACID transactions
Relational data model
Fast search required

Pattern 2: Change Data Capture (CDC)

PostgreSQL
    │
    ├─── Write operation
    │
    ▼
WAL (Write-Ahead Log)
    │
    ▼
Debezium (CDC)
    │
    ▼
Kafka
    │
    ▼
Kafka Consumer
    │
    ▼
Elasticsearch

Implementation:

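@Service
public class ProductSyncService {

    // Assumed application service that wraps the index/delete calls
    private final ElasticsearchService elasticsearchService;

    public ProductSyncService(ElasticsearchService elasticsearchService) {
        this.elasticsearchService = elasticsearchService;
    }

    // ProductChangeEvent and Operation are assumed application-defined types
    // deserialized from the Debezium change message
    @KafkaListener(topics = "postgres.products")
    public void syncProduct(ProductChangeEvent event) {
        if (event.getOperation() == Operation.CREATE || 
            event.getOperation() == Operation.UPDATE) {
            // If the product ID is used as the document ID, a redelivered event overwrites the same document
            elasticsearchService.indexProduct(event.getProduct());
        } else if (event.getOperation() == Operation.DELETE) {
            elasticsearchService.deleteProduct(event.getProductId());
        }
    }
}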

Pattern 3: Event Sourcing

Command → Event Store (PostgreSQL)
            │
            ├─── Event Published
            │
            ▼
        Event Bus (Kafka)
            │
            ├─────────────┬─────────────┐
            ▼             ▼             ▼
    Projection 1    Projection 2    Elasticsearch
   (PostgreSQL)     (MongoDB)      (Search View)

Summary

Elasticsearch Strengths ✅

Full-text search (typo tolerance, relevance)
Fast aggregations
Near real-time analytics
Log analysis
Geospatial queries
Autocomplete/suggestions
Horizontal scalability

Elasticsearch Weaknesses ❌

Not ACID compliant
No JOINs (denormalization required)
Higher resource usage (RAM, disk)
Eventual consistency
No multi-document transactions or referential integrity

Golden Rules

✅ Use Elasticsearch for search, not as primary database
✅ Denormalize data - Embed related data in documents
✅ Use filters for non-scoring queries - Cached and faster
✅ Define mappings explicitly - Don't rely on auto-mapping
✅ Monitor cluster health - RED/YELLOW/GREEN status
✅ Use bulk API - For batch operations
✅ Close clients - Prevent resource leaks
✅ Right-size shards - 10-50GB per shard
❌ Don't use for transactions - Use RDBMS
❌ Don't over-shard - Too many shards = poor performance

When to Use Elasticsearch

Scenario	Use Elasticsearch?	Alternative
Product search	✅ Yes	-
Log analysis	✅ Yes	-
Autocomplete	✅ Yes	-
User authentication	❌ No	PostgreSQL
Financial transactions	❌ No	PostgreSQL
Shopping cart	❌ No	Redis/PostgreSQL
Analytics dashboard	✅ Yes	-
Social network	❌ No	Graph DB (Neo4j)
Order processing	❌ No	PostgreSQL
Content search (CMS)	✅ Yes	-

Final Recommendation: Use Elasticsearch as a specialized search layer on top of your primary database, not as a replacement for it. This gives you the best of both worlds: ACID transactions in your RDBMS and lightning-fast search in Elasticsearch.
