Complete Guide to Elasticsearch
Table of Contents
- What is Elasticsearch?
- Why Use Elasticsearch?
- When NOT to Use Elasticsearch
- Core Concepts
- Elasticsearch vs SQL - Side by Side
- Query Types Explained
- Real-World Use Cases
- Performance Impact
- Implementation Guide
- Best Practices
- Common Pitfalls
What is Elasticsearch?
Elasticsearch is a distributed, open-source search and analytics engine built on Apache Lucene. It is designed for horizontal scalability, reliability through replication, and near real-time search.
Simple Analogy
- SQL Database = Library with a card catalog (search by exact criteria)
- Elasticsearch = Google for your data (fuzzy search, typo-tolerant, relevance-ranked)
Key Characteristics
- Document-oriented - Stores data as JSON documents
- Full-text search - Optimized for text search and analysis
- Near real-time - Documents become searchable after the next index refresh, about 1 second after indexing by default
- Distributed - Scales horizontally across multiple nodes
- Schema-free - Dynamic mapping infers field types (an explicit mapping can still be defined)
Why Use Elasticsearch?
1. Full-Text Search Capabilities
Traditional SQL
-- ❌ Poor full-text search
SELECT * FROM products
WHERE description LIKE '%laptop%';
-- Problems:
-- - No relevance scoring
-- - No typo tolerance
-- - Doesn't handle "laptops", "LAPTOP", "lap-top"
-- - Slow on large datasets
Elasticsearch
// ✅ Powerful full-text search
GET /products/_search
{
"query": {
"match": {
"description": "laptop"
}
}
}
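// With the default standard analyzer this handles:
// - Case differences: "laptop", "LAPTOP", "Laptop"
// - Relevance scoring
// - Fast lookups even with millions of documents
// Needs extra configuration:
// - Plurals ("laptops"): a stemming analyzer such as "english"
// - Typos ("loptop" → "laptop"): the "fuzziness" parameter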
2. Search Speed
Dataset: 10 million product records
SQL LIKE Query:
SELECT * FROM products WHERE description LIKE '%gaming laptop%';
→ Full table scan: 15-30 seconds
Elasticsearch:
GET /products/_search { "query": { "match": { "description": "gaming laptop" }}}
→ Inverted index: 50-200 milliseconds
Speed: roughly 75-600x faster on these illustrative figures
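The gap comes from the data structure rather than from raw hardware: a LIKE '%...%' query has to read every row, while an inverted index jumps straight from a term to the documents that contain it. The following is a minimal sketch of that idea in plain Java. The class name TinyInvertedIndex and its methods are hypothetical, and the tokenizer is only a crude stand-in for a real analyzer; Lucene's actual index is far more sophisticated.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index: maps each lowercased token to the IDs of the documents containing it.
class TinyInvertedIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Lowercase and split on non-alphanumeric characters (a crude stand-in for an analyzer).
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Index time: record the document ID under every token of the text.
    void add(int docId, String text) {
        for (String token : tokenize(text)) {
            postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
        }
    }

    // Query time: intersect the posting lists of all query tokens (AND semantics).
    Set<Integer> searchAll(String query) {
        Set<Integer> result = null;
        for (String token : tokenize(query)) {
            Set<Integer> docs = postings.getOrDefault(token, Collections.emptySet());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }
}
```

A lookup touches only the posting lists for the query terms, so its cost depends on how many documents contain those terms rather than on the total size of the table.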
3. Complex Search Requirements
Scenario: E-commerce product search
- Search across multiple fields (title, description, brand)
- Fuzzy matching (typo tolerance)
- Boost certain fields (title more important than description)
- Filter by price range, category, rating
- Sort by relevance, price, or rating
- Faceted search (aggregations)
SQL: Requires complex, slow queries with multiple JOINs and LIKE clauses
Elasticsearch: Built for this exact use case
4. Log Analysis & Monitoring
Use Case: Analyze 1 billion application logs
SQL Database:
- Insert: Slow (indexes slow down writes)
- Query: "Find all errors in the last hour" → Slow
- Aggregation: "Count errors by service" → Very slow
- Storage: Expensive for time-series data
Elasticsearch:
- Insert: Fast (optimized for writes)
- Query: Near instant (time-based indexing)
- Aggregation: Milliseconds (built-in analytics)
- Storage: Time-based indices, easy to delete old data
5. Real-Time Analytics
-- SQL: GROUP BY query (slow on large tables)
SELECT
category,
AVG(price),
COUNT(*)
FROM products
WHERE created_at > NOW() - INTERVAL 1 DAY
GROUP BY category;
// Elasticsearch: Fast aggregations
GET /products/_search
{
"query": {
"range": {
"created_at": { "gte": "now-1d" }
}
},
"aggs": {
"by_category": {
"terms": { "field": "category.keyword" },
"aggs": {
"avg_price": { "avg": { "field": "price" }}
}
}
}
}
// Returns in milliseconds, even with millions of docs
6. Autocomplete & Suggestions
// User types "iph" → Suggest "iPhone 15 Pro Max"
GET /products/_search
{
"suggest": {
"product-suggest": {
"prefix": "iph",
"completion": {
"field": "name.suggest"
}
}
}
}
// SQL: A prefix LIKE 'iph%' can use an index, but ranked or typo-tolerant suggestions are hard to implement efficiently
7. Geospatial Search
// Find restaurants within 5km
GET /restaurants/_search
{
"query": {
"bool": {
"must": { "match_all": {} },
"filter": {
"geo_distance": {
"distance": "5km",
"location": {
"lat": 23.8103,
"lon": 90.4125
}
}
}
}
}
}
// SQL: Requires spatial support such as the PostGIS extension for PostgreSQL
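Conceptually, the geo_distance filter keeps every document whose location lies within the given distance of the center point. The sketch below shows that check with the haversine great-circle formula in plain Java. It is an illustration only: the class and method names are hypothetical, the Earth radius is the usual 6371 km approximation, and Elasticsearch's internal distance calculation and geo indexing are more involved than this.

```java
// Illustrative distance check behind a "within N km" filter.
class GeoDistanceSketch {
    // Mean Earth radius in kilometers (an approximation).
    static final double EARTH_RADIUS_KM = 6371.0;

    // Great-circle distance between two points given in degrees, using the haversine formula.
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    // True when the candidate point lies within maxKm of the center point.
    static boolean withinKm(double centerLat, double centerLon,
                            double lat, double lon, double maxKm) {
        return haversineKm(centerLat, centerLon, lat, lon) <= maxKm;
    }
}
```

For example, GeoDistanceSketch.withinKm(23.8103, 90.4125, 23.8203, 90.4125, 5) checks a point 0.01 degrees of latitude north of the center used in the query above.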
When NOT to Use Elasticsearch
❌ Don't Use Elasticsearch For:
1. Transactional Data (ACID Requirements)
Use Case: Banking transactions, order processing
Problem: Elasticsearch has no multi-document transactions or rollbacks, and search results are eventually consistent, not ACID
Solution: Use PostgreSQL/MySQL for transactions, sync to ES for search
2. Primary Data Store
Use Case: Customer records, inventory
Problem: No foreign keys or constraints, joins are expensive, and it is not designed to be the only copy of critical data
Solution: Use RDBMS as source of truth, ES as search layer
3. Complex Relationships
Use Case: Social network (users, friends, posts, comments, likes)
Problem: No JOINs, denormalization required
Solution: Use graph database (Neo4j) or RDBMS
4. Small Datasets (< 10,000 records)
Use Case: Small product catalog
Problem: Overhead not worth it
Solution: PostgreSQL with full-text search is sufficient
5. Strong Consistency Requirements
Use Case: Inventory management (exact stock counts)
Problem: Near real-time (1s delay), eventual consistency
Solution: RDBMS for writes, ES for search (sync with delay acceptable)
6. Frequent Updates to Same Document
Use Case: Real-time stock prices, live scores
Problem: Each update reindexes the whole document and marks the old version as deleted (inefficient at high update rates)
Solution: Use Redis or time-series database
Core Concepts
SQL vs Elasticsearch Terminology
| SQL | Elasticsearch | Description |
|---|---|---|
| Database | Cluster (a group of indices) | Container for data |
| Table | Index (mapping types were deprecated in 7.x and removed in 8.x) | Collection of documents |
| Row | Document | Single record (JSON) |
| Column | Field | Attribute of document |
| Schema | Mapping | Field definitions |
| Index | Inverted Index | Data structure for fast search |
| SELECT | Query DSL | Retrieve data |
| INSERT | Index API | Add data |
| UPDATE | Update API | Modify data |
| DELETE | Delete API | Remove data |
Document Example
SQL Row:
id | name | price | category | in_stock
1 | iPhone 15 Pro Max | 1199 | Electronics | true
Elasticsearch Document:
{
"_index": "products",
"_id": "1",
"_source": {
"name": "iPhone 15 Pro Max",
"price": 1199,
"category": "Electronics",
"in_stock": true,
"tags": ["smartphone", "apple", "5g"],
"specs": {
"screen": "6.7 inch",
"storage": "256GB"
}
}
}
Index Structure
SQL Database:
database: ecommerce
├── table: products
├── table: orders
└── table: customers
Elasticsearch:
cluster: ecommerce-cluster
├── index: products
│ ├── shard 0 (primary)
│ ├── shard 0 (replica)
│ ├── shard 1 (primary)
│ └── shard 1 (replica)
├── index: orders
└── index: customers
Elasticsearch vs SQL - Side by Side
1. Simple SELECT
SQL:
SELECT * FROM products WHERE id = 1;
Elasticsearch:
GET /products/_doc/1
2. SELECT with WHERE Clause
SQL:
SELECT * FROM products
WHERE category = 'Electronics'
AND price < 1000;
Elasticsearch:
GET /products/_search
{
"query": {
"bool": {
"must": [
{ "term": { "category.keyword": "Electronics" }},
{ "range": { "price": { "lt": 1000 }}}
]
}
}
}
3. LIKE Query (Partial Match)
SQL:
SELECT * FROM products
WHERE name LIKE '%iPhone%';
Elasticsearch:
GET /products/_search
{
"query": {
"match": {
"name": "iPhone"
}
}
}
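OR (closer to LIKE semantics, but slower; the pattern is not analyzed, so target the keyword subfield):
GET /products/_search
{
"query": {
"wildcard": {
"name.keyword": { "value": "*iPhone*" }
}
}
}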
4. Full-Text Search (Advanced)
SQL:
-- ❌ This is painful and slow
SELECT * FROM products
WHERE LOWER(name) LIKE '%gaming%laptop%'
OR LOWER(description) LIKE '%gaming%laptop%'
ORDER BY (
CASE WHEN name LIKE '%gaming laptop%' THEN 1
WHEN name LIKE '%gaming%' THEN 2
ELSE 3 END
);
Elasticsearch:
// ✅ Clean and fast
GET /products/_search
{
"query": {
"multi_match": {
"query": "gaming laptop",
"fields": ["name^2", "description"],
"fuzziness": "AUTO"
}
}
}
// Explanation:
// - Searches both name and description
// - name^2 = boost name field 2x (more important)
// - fuzziness: AUTO = tolerates typos
// - Automatic relevance scoring
5. ORDER BY
SQL:
SELECT * FROM products
ORDER BY price DESC
LIMIT 10;
Elasticsearch:
GET /products/_search
{
"query": { "match_all": {} },
"sort": [
{ "price": "desc" }
],
"size": 10
}
6. Pagination
SQL:
SELECT * FROM products
LIMIT 10 OFFSET 20;
Elasticsearch:
GET /products/_search
{
"query": { "match_all": {} },
"from": 20,
"size": 10
}
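⚠️ Warning: Deep pagination is expensive, and by default Elasticsearch rejects requests where from + size exceeds 10,000 (index.max_result_window).
Solution: Use search_after (cursor-based pagination) with a sort that ends in a unique tiebreaker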
// First page
GET /products/_search
{
"query": { "match_all": {} },
"size": 10,
"sort": [{ "price": "asc" }, { "_id": "asc" }]
}
// Next page (using last doc's sort values)
GET /products/_search
{
"query": { "match_all": {} },
"size": 10,
"sort": [{ "price": "asc" }, { "_id": "asc" }],
"search_after": [1199, "abc123"]
}
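The reason search_after scales better is that each page is defined by "everything that sorts strictly after the last hit I saw" instead of "skip N results". The sketch below simulates that rule in memory with plain Java. The Hit class and page method are hypothetical stand-ins; a real cluster applies the same comparison on each shard, and the (price, id) pair mirrors the sort used in the requests above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// In-memory simulation of cursor-based pagination with a unique tiebreaker.
class SearchAfterSketch {
    // Minimal stand-in for a search hit: only the two sort values matter here.
    static final class Hit {
        final double price;
        final String id;

        Hit(double price, String id) {
            this.price = price;
            this.id = id;
        }
    }

    // Sort by price ascending, then by id ascending so that ties are ordered deterministically.
    static final Comparator<Hit> SORT =
            Comparator.comparingDouble((Hit h) -> h.price).thenComparing((Hit h) -> h.id);

    // Returns up to `size` hits that sort strictly after `after`; null means "first page".
    static List<Hit> page(List<Hit> all, Hit after, int size) {
        List<Hit> out = new ArrayList<>();
        if (size <= 0) {
            return out;
        }
        List<Hit> sorted = new ArrayList<>(all);
        sorted.sort(SORT);
        for (Hit h : sorted) {
            if (after != null && SORT.compare(h, after) <= 0) {
                continue; // already returned on an earlier page
            }
            out.add(h);
            if (out.size() == size) {
                break;
            }
        }
        return out;
    }
}
```

The caller passes the last hit of the previous page as the cursor. Without the id tiebreaker, two products with the same price could be skipped or repeated across pages.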
7. COUNT
SQL:
SELECT COUNT(*) FROM products
WHERE category = 'Electronics';
Elasticsearch:
GET /products/_count
{
"query": {
"term": { "category.keyword": "Electronics" }
}
}
8. GROUP BY (Aggregations)
SQL:
SELECT
category,
COUNT(*) as count,
AVG(price) as avg_price,
MAX(price) as max_price
FROM products
GROUP BY category;
Elasticsearch:
GET /products/_search
{
"size": 0,
"aggs": {
"by_category": {
"terms": { "field": "category.keyword" },
"aggs": {
"avg_price": { "avg": { "field": "price" }},
"max_price": { "max": { "field": "price" }}
}
}
}
}
Response:
{
"aggregations": {
"by_category": {
"buckets": [
{
"key": "Electronics",
"doc_count": 1234,
"avg_price": { "value": 599.99 },
"max_price": { "value": 1999.99 }
},
{
"key": "Clothing",
"doc_count": 5678,
"avg_price": { "value": 49.99 },
"max_price": { "value": 299.99 }
}
]
}
}
}
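For readers who think in Java collections, a terms aggregation with nested metric aggregations behaves much like grouping a stream by key and summarizing each group. The sketch below mirrors the by_category, avg_price and max_price structure in memory. The Product class and byCategory method are hypothetical, and a real aggregation runs distributed across shards rather than over a local list.

```java
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// In-memory analogue of a terms aggregation with avg/max sub-aggregations.
class TermsAggSketch {
    static final class Product {
        final String category;
        final double price;

        Product(String category, double price) {
            this.category = category;
            this.price = price;
        }
    }

    // One bucket per category; each bucket carries count, average, min, max and sum of price.
    static Map<String, DoubleSummaryStatistics> byCategory(List<Product> products) {
        return products.stream().collect(Collectors.groupingBy(
                (Product p) -> p.category,
                TreeMap::new,
                Collectors.summarizingDouble((Product p) -> p.price)));
    }
}
```

Each map entry corresponds to one bucket in the response above: getCount() plays the role of doc_count, while getAverage() and getMax() correspond to avg_price and max_price.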
9. BETWEEN
SQL:
SELECT * FROM products
WHERE price BETWEEN 100 AND 500;
Elasticsearch:
GET /products/_search
{
"query": {
"range": {
"price": {
"gte": 100,
"lte": 500
}
}
}
}
10. IN Clause
SQL:
SELECT * FROM products
WHERE category IN ('Electronics', 'Books', 'Toys');
Elasticsearch:
GET /products/_search
{
"query": {
"terms": {
"category.keyword": ["Electronics", "Books", "Toys"]
}
}
}
11. NOT / Negation
SQL:
SELECT * FROM products
WHERE category != 'Clothing';
Elasticsearch:
GET /products/_search
{
"query": {
"bool": {
"must_not": [
{ "term": { "category.keyword": "Clothing" }}
]
}
}
}
12. AND / OR Logic
SQL:
SELECT * FROM products
WHERE (category = 'Electronics' OR category = 'Books')
AND price < 100;
Elasticsearch:
GET /products/_search
{
"query": {
"bool": {
"must": [
{ "range": { "price": { "lt": 100 }}}
],
"should": [
{ "term": { "category.keyword": "Electronics" }},
{ "term": { "category.keyword": "Books" }}
],
"minimum_should_match": 1
}
}
}
13. Nested Queries (Subqueries)
SQL:
SELECT * FROM products
WHERE price > (
SELECT AVG(price) FROM products
);
Elasticsearch:
// Two-step approach (no subqueries in ES)
// Step 1: Get average
GET /products/_search
{
"size": 0,
"aggs": {
"avg_price": { "avg": { "field": "price" }}
}
}
// Returns: avg_price = 150
// Step 2: Query with result
GET /products/_search
{
"query": {
"range": {
"price": { "gt": 150 }
}
}
}
14. JOIN (Parent-Child Relationship)
SQL:
SELECT
orders.id,
orders.total,
customers.name
FROM orders
JOIN customers ON orders.customer_id = customers.id
WHERE customers.country = 'USA';
Elasticsearch:
// ❌ Elasticsearch doesn't support JOINs like SQL
// ✅ Use denormalization (embed customer data in order)
// Document structure:
{
"_index": "orders",
"_id": "order123",
"_source": {
"order_id": "order123",
"total": 299.99,
"customer": {
"id": "cust456",
"name": "John Doe",
"country": "USA"
}
}
}
// Query (term needs a keyword field; with dynamic mapping that is customer.country.keyword):
GET /orders/_search
{
"query": {
"term": { "customer.country.keyword": "USA" }
}
}
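OR use a parent-child relationship via a join field (more complex; parent and child documents must live in the same index and on the same shard):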
// Define mapping
PUT /orders
{
"mappings": {
"properties": {
"join_field": {
"type": "join",
"relations": {
"customer": "order"
}
}
}
}
}
// Query with has_parent (assumes country is mapped as keyword on the customer documents)
GET /orders/_search
{
"query": {
"has_parent": {
"parent_type": "customer",
"query": {
"term": { "country": "USA" }
}
}
}
}
15. Date Range Query
SQL:
SELECT * FROM orders
WHERE created_at >= '2024-01-01'
AND created_at < '2024-02-01';
Elasticsearch:
GET /orders/_search
{
"query": {
"range": {
"created_at": {
"gte": "2024-01-01",
"lt": "2024-02-01"
}
}
}
}
OR using relative dates:
GET /orders/_search
{
"query": {
"range": {
"created_at": {
"gte": "now-7d",
"lte": "now"
}
}
}
}
Query Types Explained
1. Match Query (Full-Text Search)
What: Analyzes the search text and finds relevant documents
GET /products/_search
{
"query": {
"match": {
"description": "gaming laptop"
}
}
}
How it works:
- Analyzes "gaming laptop" → ["gaming", "laptop"]
- Searches for documents containing either word
- Scores by relevance (both words > one word)
Use when:
- Full-text search across analyzed fields
- Natural language queries
- Typo tolerance needed (add the fuzziness parameter)
2. Term Query (Exact Match)
What: Exact match on a single term (not analyzed)
GET /products/_search
{
"query": {
"term": {
"category.keyword": "Electronics"
}
}
}
Use when:
- Filtering by exact values (IDs, statuses, categories)
- Keyword fields
- Not for analyzed text fields
⚠️ Common mistake:
// ❌ WRONG: Won't find "Electronics" if field is analyzed
{
"query": {
"term": { "category": "Electronics" }
}
}
// ✅ CORRECT: Use .keyword subfield or match query
{
"query": {
"term": { "category.keyword": "Electronics" }
}
}
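The difference between the match and term queries comes down to whether the query string is analyzed before it is compared with the indexed terms. The plain-Java sketch below imitates that behavior. All names are hypothetical, and the analyze method is only a rough approximation of the standard analyzer (lowercase, split on non-alphanumeric characters).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Why "term" misses on an analyzed text field while "match" finds the document.
class AnalyzedFieldSketch {
    // Rough approximation of the standard analyzer: lowercase, then split into tokens.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // term query: the raw query string is compared with the indexed terms, no analysis.
    static boolean termMatches(List<String> indexedTerms, String rawQuery) {
        return indexedTerms.contains(rawQuery);
    }

    // match query: the query is analyzed first; any matching token is enough (default OR).
    static boolean matchMatches(List<String> indexedTerms, String query) {
        for (String token : analyze(query)) {
            if (indexedTerms.contains(token)) {
                return true;
            }
        }
        return false;
    }
}
```

A text field stores analyze("Electronics"), that is ["electronics"], so termMatches with "Electronics" fails while matchMatches succeeds. A keyword field stores the value unchanged, which is why term works against category.keyword.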
3. Bool Query (Combine Multiple Queries)
What: Combines queries with boolean logic
GET /products/_search
{
"query": {
"bool": {
"must": [
// AND logic - must match
{ "match": { "name": "laptop" }}
],
"should": [
// OR logic - nice to have (boosts score)
{ "match": { "brand": "Apple" }}
],
"must_not": [
// NOT logic - exclude
{ "term": { "in_stock": false }}
],
"filter": [
// AND logic - must match (no scoring)
{ "range": { "price": { "lte": 2000 }}}
]
}
}
}
Clauses:
- must: Documents MUST match (affects score)
- should: Documents SHOULD match (boosts score)
- must_not: Documents MUST NOT match (filter)
- filter: Documents MUST match (no scoring, cacheable)
Use when:
- Combining multiple conditions
- Complex search logic
4. Range Query
What: Find documents within a range
GET /products/_search
{
"query": {
"range": {
"price": {
"gte": 100, // Greater than or equal
"lte": 1000 // Less than or equal
}
}
}
}
Operators:
- gte: >= (greater than or equal)
- gt: > (greater than)
- lte: <= (less than or equal)
- lt: < (less than)
Use when:
- Price ranges
- Date ranges
- Numeric ranges
5. Multi-Match Query
What: Search across multiple fields
GET /products/_search
{
"query": {
"multi_match": {
"query": "gaming laptop",
"fields": ["name^3", "description", "brand^2"],
"fuzziness": "AUTO"
}
}
}
Field boosting:
- name^3: Boost name field 3x
- brand^2: Boost brand field 2x
- description: No boost (1x)
Use when:
- Searching across multiple fields
- Different field importance
- Google-like search
6. Fuzzy Query (Typo Tolerance)
What: Finds terms within edit distance
GET /products/_search
{
"query": {
"fuzzy": {
"name": {
"value": "loptop",
"fuzziness": "AUTO"
}
}
}
}
Finds: "laptop", "laptops" (even with typo "loptop")
Fuzziness values:
- 0: No typos allowed
- 1: 1 character difference
- 2: 2 character differences
- AUTO: Adaptive (0 for 1-2 chars, 1 for 3-5 chars, 2 for 6+ chars)
Use when:
- User input may have typos
- Autocomplete/suggestions
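The fuzziness values above are edit distances. The sketch below computes a classic Levenshtein distance and applies the AUTO thresholds from the list to decide whether a candidate term counts as a fuzzy match. It is an approximation under stated assumptions: the class and method names are hypothetical, and Elasticsearch by default also treats swapping two adjacent characters as a single edit, which plain Levenshtein does not.

```java
// Edit distance plus the AUTO fuzziness thresholds described above.
class FuzzinessSketch {
    // Allowed edits under AUTO: 0 for 1-2 chars, 1 for 3-5 chars, 2 for 6+ chars.
    static int autoFuzziness(int termLength) {
        if (termLength <= 2) {
            return 0;
        }
        if (termLength <= 5) {
            return 1;
        }
        return 2;
    }

    // Classic Levenshtein distance (insert, delete, substitute), using two rolling rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // True when the indexed term is within the AUTO edit budget of the query term.
    static boolean fuzzyMatches(String query, String indexedTerm) {
        return levenshtein(query, indexedTerm) <= autoFuzziness(query.length());
    }
}
```

With the query "loptop" (6 characters, so up to 2 edits), both "laptop" (1 edit) and "laptops" (2 edits) fall inside the budget.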
7. Wildcard Query
What: Pattern matching with wildcards
GET /products/_search
{
"query": {
"wildcard": {
"name": "*Phone*"
}
}
}
Wildcards:
- *: Any number of characters
- ?: Single character
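⚠️ Warning: Slow! Avoid leading wildcards (*Phone). The pattern is not analyzed, so on a text field it must match the lowercased indexed terms (or set case_insensitive to true on 7.10+)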
Use when:
- Pattern matching
- Prefix/suffix search (but prefer prefix query for performance)
8. Prefix Query
What: Matches terms starting with prefix
GET /products/_search
{
"query": {
"prefix": {
"name": "iph"
}
}
}
Finds: "iPhone", "iPhone 15", "iPhones"
Use when:
- Autocomplete
- Typeahead search
- Fast prefix matching
9. Exists Query
What: Checks if field exists
GET /products/_search
{
"query": {
"exists": {
"field": "discount"
}
}
}
SQL equivalent: WHERE discount IS NOT NULL
Use when:
- Check for presence of field
- Filter documents with/without specific fields
10. Aggregations (GROUP BY on steroids)
What: Analytics and statistics
GET /products/_search
{
"size": 0,
"aggs": {
"price_ranges": {
"range": {
"field": "price",
"ranges": [
{ "to": 100 },
{ "from": 100, "to": 500 },
{ "from": 500 }
]
}
},
"popular_categories": {
"terms": {
"field": "category.keyword",
"size": 10
}
},
"stats": {
"stats": { "field": "price" }
}
}
}
Response:
{
"aggregations": {
"price_ranges": {
"buckets": [
{ "key": "*-100.0", "doc_count": 234 },
{ "key": "100.0-500.0", "doc_count": 567 },
{ "key": "500.0-*", "doc_count": 123 }
]
},
"popular_categories": {
"buckets": [
{ "key": "Electronics", "doc_count": 456 },
{ "key": "Books", "doc_count": 234 }
]
},
"stats": {
"count": 924,
"min": 9.99,
"max": 1999.99,
"avg": 299.50,
"sum": 276738.0
}
}
}
Types:
- Metric aggregations: avg, max, min, sum, stats
- Bucket aggregations: terms, range, date_histogram
- Pipeline aggregations: derivative, cumulative_sum
Real-World Use Cases
1. E-Commerce Product Search
Requirements:
- Search by product name, description
- Filter by category, price, brand
- Faceted search (category count, price ranges)
- Autocomplete
- "Did you mean?" suggestions
Implementation:
Index Mapping
PUT /products
{
"mappings": {
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": { "type": "keyword" },
"suggest": { "type": "completion" }
}
},
"description": { "type": "text" },
"category": {
"type": "text",
"fields": { "keyword": { "type": "keyword" }}
},
"brand": { "type": "keyword" },
"price": { "type": "float" },
"rating": { "type": "float" },
"in_stock": { "type": "boolean" },
"tags": { "type": "keyword" }
}
}
}
Search Query
GET /products/_search
{
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "gaming laptop",
"fields": ["name^3", "description", "tags^2"],
"fuzziness": "AUTO"
}
}
],
"filter": [
{ "term": { "in_stock": true }},
{ "range": { "price": { "lte": 2000 }}},
{ "terms": { "brand": ["Dell", "HP", "Lenovo"] }}
]
}
},
"aggs": {
"categories": {
"terms": { "field": "category.keyword" }
},
"price_ranges": {
"range": {
"field": "price",
"ranges": [
{ "key": "Under $500", "to": 500 },
{ "key": "$500-$1000", "from": 500, "to": 1000 },
{ "key": "$1000-$2000", "from": 1000, "to": 2000 },
{ "key": "Over $2000", "from": 2000 }
]
}
},
"brands": {
"terms": { "field": "brand", "size": 20 }
}
},
"sort": [
{ "_score": "desc" },
{ "rating": "desc" }
],
"from": 0,
"size": 20
}
2. Log Analysis (ELK Stack)
Scenario: Analyze application logs
Index Mapping
PUT /logs-2024.02
{
"mappings": {
"properties": {
"timestamp": { "type": "date" },
"level": { "type": "keyword" },
"service": { "type": "keyword" },
"message": { "type": "text" },
"user_id": { "type": "keyword" },
"ip_address": { "type": "ip" },
"response_time": { "type": "integer" }
}
}
}
Query: Find Errors in Last Hour
GET /logs-2024.02/_search
{
"query": {
"bool": {
"must": [
{ "term": { "level": "ERROR" }},
{ "range": { "timestamp": { "gte": "now-1h" }}}
]
}
},
"aggs": {
"errors_by_service": {
"terms": { "field": "service" }
},
"errors_over_time": {
"date_histogram": {
"field": "timestamp",
"fixed_interval": "5m"
}
}
}
}
Query: Slow Requests
GET /logs-2024.02/_search
{
"query": {
"range": {
"response_time": { "gte": 5000 }
}
},
"aggs": {
"slow_endpoints": {
"terms": { "field": "endpoint.keyword" }
}
}
}
3. Content Management System
Scenario: Blog/news website
Index Mapping
PUT /articles
{
"mappings": {
"properties": {
"title": {
"type": "text",
"fields": { "keyword": { "type": "keyword" }}
},
"content": { "type": "text" },
"author": { "type": "keyword" },
"tags": { "type": "keyword" },
"published_at": { "type": "date" },
"view_count": { "type": "integer" },
"category": { "type": "keyword" }
}
}
}
Query: Find Related Articles
GET /articles/_search
{
"query": {
"more_like_this": {
"fields": ["title", "content", "tags"],
"like": [
{
"_index": "articles",
"_id": "current-article-id"
}
],
"min_term_freq": 2,
"max_query_terms": 12
}
},
"size": 5
}
4. Autocomplete Search
// Index a document with explicit suggestions (assumes a top-level name_suggest field mapped as type completion)
PUT /products/_doc/1
{
"name": "iPhone 15 Pro Max",
"name_suggest": {
"input": ["iPhone", "iPhone 15", "iPhone Pro", "iPhone 15 Pro Max"],
"weight": 10
}
}
// Autocomplete query
GET /products/_search
{
"suggest": {
"product-suggest": {
"prefix": "iph",
"completion": {
"field": "name_suggest",
"size": 5,
"fuzzy": {
"fuzziness": "AUTO"
}
}
}
}
}
Performance Impact
Index Size Comparison
Dataset: 1 million products
PostgreSQL:
- Table size: 500 MB
- Index size (B-tree): 200 MB
- Total: 700 MB
Elasticsearch:
- Document storage: 600 MB
- Inverted index: 800 MB
- Field data cache: 200 MB (held in JVM heap, not on disk)
- Total: 1.6 GB (1.4 GB of it on disk)
Elasticsearch uses ~2x more disk space in this example
Query Performance
Query: Full-text search "gaming laptop" across 10M documents
PostgreSQL (LIKE):
SELECT * FROM products
WHERE description LIKE '%gaming%laptop%';
→ Time: 15-30 seconds (full table scan)
PostgreSQL (Full-Text Search):
SELECT * FROM products
WHERE to_tsvector(description) @@ to_tsquery('gaming & laptop');
→ Time: 2-5 seconds without a GIN index (an indexed tsvector column is typically much faster)
Elasticsearch:
GET /products/_search { "query": { "match": { "description": "gaming laptop" }}}
→ Time: 50-200 milliseconds
Speed improvement: roughly 75-600x over LIKE and 10-100x over the unindexed full-text query, on these illustrative figures
Write Performance
Scenario: Insert 100,000 documents
PostgreSQL:
- With indexes: 60-120 seconds
- Without indexes: 10-20 seconds (but queries slow)
Elasticsearch:
- Bulk indexing: 5-15 seconds
- Individual inserts: 40-60 seconds
In this comparison Elasticsearch is faster for bulk writes (it is optimized for append-heavy workloads)
Memory Requirements
Small deployment (1M documents):
- PostgreSQL: 2-4 GB RAM
- Elasticsearch: 4-8 GB RAM (needs heap + OS cache)
Medium deployment (10M documents):
- PostgreSQL: 8-16 GB RAM
- Elasticsearch: 16-32 GB RAM
Large deployment (100M documents):
- PostgreSQL: 32-64 GB RAM
- Elasticsearch: 64-128 GB RAM (distributed across nodes)
Rule of thumb: Elasticsearch needs 2x RAM of PostgreSQL
CPU Impact
CPU Usage Pattern:
PostgreSQL:
- Idle: 1-5%
- Search query: 20-40% (single core)
- Complex aggregation: 60-80% (single core)
Elasticsearch:
- Idle: 5-10%
- Search query: 10-30% (distributed across cores)
- Complex aggregation: 40-60% (parallel processing)
Elasticsearch utilizes multiple cores better (distributed)
Implementation Guide
1. Java Client Setup
Maven Dependency
<dependency>
<groupId>org.elasticsearch.client</groupId>
<artifactId>elasticsearch-rest-high-level-client</artifactId>
<version>7.17.15</version>
</dependency>
<!-- OR use the newer Java API Client (ES 8+); the High Level REST Client above is deprecated since 7.15 -->
<dependency>
<groupId>co.elastic.clients</groupId>
<artifactId>elasticsearch-java</artifactId>
<version>8.11.0</version>
</dependency>
Spring Boot Configuration
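import java.time.Duration;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.elasticsearch.client.ClientConfiguration;
import org.springframework.data.elasticsearch.client.RestClients;

// Uses the Spring Data Elasticsearch 4.x helpers (RestClients), which target the 7.x client
@Configuration
public class ElasticsearchConfig {
    @Bean
    public RestHighLevelClient elasticsearchClient() {
        ClientConfiguration clientConfiguration = ClientConfiguration.builder()
            .connectedTo("localhost:9200")
            .withConnectTimeout(Duration.ofSeconds(5))
            .withSocketTimeout(Duration.ofSeconds(30))
            .build();
        return RestClients.create(clientConfiguration).rest();
    }
}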
2. Document Operations
Index a Document (INSERT)
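@Service
public class ProductService {
    private static final Logger logger = LoggerFactory.getLogger(ProductService.class);
    @Autowired
    private RestHighLevelClient client;
    @Autowired
    private ObjectMapper objectMapper; // Jackson mapper used to serialize products to JSON
    public void indexProduct(Product product) throws IOException {
        IndexRequest request = new IndexRequest("products")
            .id(product.getId())
            .source(objectMapper.writeValueAsString(product), XContentType.JSON);
        IndexResponse response = client.index(request, RequestOptions.DEFAULT);
        // Result is CREATED for a new ID and UPDATED when an existing document is overwritten
        if (response.getResult() == DocWriteResponse.Result.CREATED) {
            logger.info("Product indexed: {}", product.getId());
        }
    }
}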
Bulk Index (Batch INSERT)
public void bulkIndexProducts(List<Product> products) throws IOException {
BulkRequest bulkRequest = new BulkRequest();
for (Product product : products) {
IndexRequest request = new IndexRequest("products")
.id(product.getId())
.source(objectMapper.writeValueAsString(product), XContentType.JSON);
bulkRequest.add(request);
}
BulkResponse bulkResponse = client.bulk(bulkRequest, RequestOptions.DEFAULT);
if (bulkResponse.hasFailures()) {
logger.error("Bulk indexing failures: {}", bulkResponse.buildFailureMessage());
} else {
logger.info("Indexed {} products", products.size());
}
}
Get a Document (SELECT by ID)
public Product getProduct(String id) throws IOException {
GetRequest request = new GetRequest("products", id);
GetResponse response = client.get(request, RequestOptions.DEFAULT);
if (response.isExists()) {
String sourceAsString = response.getSourceAsString();
return objectMapper.readValue(sourceAsString, Product.class);
}
return null;
}
Update a Document
public void updateProduct(String id, Map<String, Object> updates) throws IOException {
UpdateRequest request = new UpdateRequest("products", id)
.doc(updates);
UpdateResponse response = client.update(request, RequestOptions.DEFAULT);
if (response.getResult() == DocWriteResponse.Result.UPDATED) {
logger.info("Product updated: {}", id);
}
}
Delete a Document
public void deleteProduct(String id) throws IOException {
DeleteRequest request = new DeleteRequest("products", id);
DeleteResponse response = client.delete(request, RequestOptions.DEFAULT);
if (response.getResult() == DocWriteResponse.Result.DELETED) {
logger.info("Product deleted: {}", id);
}
}
3. Search Operations
Simple Search
public List<Product> searchProducts(String query) throws IOException {
SearchRequest searchRequest = new SearchRequest("products");
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
sourceBuilder.query(QueryBuilders.matchQuery("name", query));
sourceBuilder.from(0);
sourceBuilder.size(20);
searchRequest.source(sourceBuilder);
SearchResponse response = client.search(searchRequest, RequestOptions.DEFAULT);
List<Product> products = new ArrayList<>();
for (SearchHit hit : response.getHits().getHits()) {
Product product = objectMapper.readValue(hit.getSourceAsString(), Product.class);
products.add(product);
}
return products;
}
Complex Search with Filters
public SearchResult searchProductsAdvanced(ProductSearchRequest request) throws IOException {
SearchRequest searchRequest = new SearchRequest("products");
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
// Build bool query
BoolQueryBuilder boolQuery = QueryBuilders.boolQuery();
// Text search
if (request.getQuery() != null) {
boolQuery.must(QueryBuilders.multiMatchQuery(request.getQuery())
.field("name", 3.0f)
.field("description")
.fuzziness(Fuzziness.AUTO));
}
// Filters
if (request.getCategory() != null) {
boolQuery.filter(QueryBuilders.termQuery("category.keyword", request.getCategory()));
}
if (request.getMinPrice() != null || request.getMaxPrice() != null) {
RangeQueryBuilder rangeQuery = QueryBuilders.rangeQuery("price");
if (request.getMinPrice() != null) {
rangeQuery.gte(request.getMinPrice());
}
if (request.getMaxPrice() != null) {
rangeQuery.lte(request.getMaxPrice());
}
boolQuery.filter(rangeQuery);
}
if (request.isInStockOnly()) {
boolQuery.filter(QueryBuilders.termQuery("in_stock", true));
}
sourceBuilder.query(boolQuery);
// Sorting
if ("price_asc".equals(request.getSort())) {
sourceBuilder.sort("price", SortOrder.ASC);
} else if ("price_desc".equals(request.getSort())) {
sourceBuilder.sort("price", SortOrder.DESC);
} else {
sourceBuilder.sort("_score", SortOrder.DESC);
}
// Pagination
sourceBuilder.from(request.getPage() * request.getSize());
sourceBuilder.size(request.getSize());
// Aggregations
sourceBuilder.aggregation(
AggregationBuilders.terms("categories")
.field("category.keyword")
.size(10)
);
sourceBuilder.aggregation(
AggregationBuilders.range("price_ranges")
.field("price")
.addRange(0, 100)
.addRange(100, 500)
.addRange(500, 1000)
.addUnboundedFrom(1000)
);
searchRequest.source(sourceBuilder);
SearchResponse response = client.search(searchRequest, RequestOptions.DEFAULT);
// Parse results
return parseSearchResponse(response);
}
4. Spring Data Elasticsearch (Simpler)
Entity
@Document(indexName = "products")
public class Product {
@Id
private String id;
@Field(type = FieldType.Text)
private String name;
@Field(type = FieldType.Text)
private String description;
@Field(type = FieldType.Keyword)
private String category;
@Field(type = FieldType.Double)
private Double price;
@Field(type = FieldType.Boolean)
private Boolean inStock;
// Getters and setters
}
Repository
public interface ProductRepository extends ElasticsearchRepository<Product, String> {
// Method name query
List<Product> findByCategory(String category);
List<Product> findByPriceBetween(Double minPrice, Double maxPrice);
List<Product> findByNameContaining(String name);
// Custom query
@Query("{\"bool\": {\"must\": [{\"match\": {\"name\": \"?0\"}}]}}")
List<Product> searchByName(String name);
}
Service
@Service
public class ProductService {
@Autowired
private ProductRepository repository;
public List<Product> searchProducts(String query) {
return repository.findByNameContaining(query);
}
public List<Product> getProductsByCategory(String category) {
return repository.findByCategory(category);
}
public void saveProduct(Product product) {
repository.save(product);
}
}
Best Practices
1. Index Design
Use Time-Based Indices for Logs
❌ BAD: Single index
logs (all logs from day 1)
✅ GOOD: Time-based indices
logs-2024.02.01
logs-2024.02.02
logs-2024.02.03
...
Benefits:
- Easy to delete old data (delete entire index)
- Better performance (smaller indices)
- Easier backup/restore
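The naming scheme and the "delete entire index" benefit can be sketched with a small helper that derives the daily index name from a date and picks out the indices that have aged past a retention window. This is a plain-Java illustration; the class name, the logs- prefix handling and the retention parameter are assumptions, and in practice index lifecycle management can automate the same policy.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

// Helpers for daily indices named like logs-2024.02.01.
class DailyIndexNames {
    private static final String PREFIX = "logs-";
    private static final DateTimeFormatter SUFFIX = DateTimeFormatter.ofPattern("yyyy.MM.dd");

    // Name of the index that receives documents for the given day.
    static String indexFor(LocalDate day) {
        return PREFIX + day.format(SUFFIX);
    }

    // Indices whose day is older than `retentionDays` before `today`; these can be dropped whole.
    static List<String> expired(List<String> indexNames, LocalDate today, int retentionDays) {
        LocalDate cutoff = today.minusDays(retentionDays);
        List<String> out = new ArrayList<>();
        for (String name : indexNames) {
            LocalDate day = LocalDate.parse(name.substring(PREFIX.length()), SUFFIX);
            if (day.isBefore(cutoff)) {
                out.add(name);
            }
        }
        return out;
    }
}
```

Dropping an expired index is a single delete-index call, which is far cheaper than deleting millions of individual documents from one large index.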
Separate Read/Write Aliases
// Create the backing index with a write alias and a read alias
// (an alias cannot have the same name as an index)
PUT /products-000001
{
"aliases": {
"products-write": { "is_write_index": true },
"products-read": {}
}
}
// Application uses the aliases, never the concrete index name
GET /products-read/_search
POST /products-write/_doc
2. Mapping Design
Define Mappings Explicitly
// ❌ BAD: Auto-mapping (Elasticsearch guesses types)
PUT /products/_doc/1
{
"name": "iPhone",
"price": "1199" // Will be mapped as text, not number!
}
// ✅ GOOD: Explicit mapping
PUT /products
{
"mappings": {
"properties": {
"name": { "type": "text" },
"price": { "type": "float" },
"category": {
"type": "text",
"fields": {
"keyword": { "type": "keyword" }
}
}
}
}
}
Use Keyword for Exact Matches
{
"category": {
"type": "text", // For full-text search
"fields": {
"keyword": { "type": "keyword" } // For exact match, aggregations
}
}
}
// Full-text search
GET /products/_search
{
"query": { "match": { "category": "electronics" }}
}
// Exact match / aggregation
GET /products/_search
{
"query": { "term": { "category.keyword": "Electronics" }},
"aggs": {
"categories": { "terms": { "field": "category.keyword" }}
}
}
3. Query Optimization
Use Filters for Non-Scoring Queries
// ❌ SLOW: Everything in must (computes scores unnecessarily)
{
"query": {
"bool": {
"must": [
{ "match": { "name": "laptop" }},
{ "term": { "in_stock": true }},
{ "range": { "price": { "lte": 1000 }}}
]
}
}
}
// ✅ FAST: Use filter for exact matches (cached, no scoring)
{
"query": {
"bool": {
"must": [
{ "match": { "name": "laptop" }} // Only this needs scoring
],
"filter": [
{ "term": { "in_stock": true }},
{ "range": { "price": { "lte": 1000 }}}
]
}
}
}
Avoid Deep Pagination
// ❌ BAD: Deep pagination (slow; rejected by default once from + size exceeds 10,000)
GET /products/_search
{
"from": 10000,
"size": 10
}
// ✅ GOOD: Use search_after
GET /products/_search
{
"size": 10,
"sort": [{ "price": "asc" }, { "_id": "asc" }],
"search_after": [1199, "abc123"]
}
Use Bulk API for Indexing
// ❌ BAD: Individual requests (slow)
for (Product product : products) {
client.index(new IndexRequest("products").source(...), RequestOptions.DEFAULT);
}
// ✅ GOOD: Bulk request (fast)
BulkRequest bulkRequest = new BulkRequest();
for (Product product : products) {
bulkRequest.add(new IndexRequest("products").source(...));
}
client.bulk(bulkRequest, RequestOptions.DEFAULT);
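When the product list is very large, putting everything into a single bulk request can produce an oversized payload. One common approach, shown here as an assumption rather than something the guide prescribes, is to split the list into fixed-size batches and send one bulk request per batch. The helper below is plain Java; the class name and the batch size passed in are hypothetical and should be tuned by measurement.

```java
import java.util.ArrayList;
import java.util.List;

// Splits a list into consecutive batches, e.g. one batch per bulk request.
class BulkBatches {
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the sub-list so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }
}
```

Each batch returned by BulkBatches.partition(products, batchSize) can then be turned into its own BulkRequest in the same way as the loop above.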
4. Performance Tuning
Increase Refresh Interval for Bulk Indexing
// Default: refresh every 1 second
PUT /products/_settings
{
"index": {
"refresh_interval": "30s" // During bulk indexing
}
}
// After bulk indexing complete
PUT /products/_settings
{
"index": {
"refresh_interval": "1s" // Back to normal
}
}
Disable Replicas During Initial Load
PUT /products/_settings
{
"index": {
"number_of_replicas": 0 // During bulk load
}
}
// After load complete
PUT /products/_settings
{
"index": {
"number_of_replicas": 1 // Enable replicas
}
}
5. Monitoring
Monitor Cluster Health
@Scheduled(fixedRate = 60000)
public void monitorClusterHealth() throws IOException {
ClusterHealthRequest request = new ClusterHealthRequest();
ClusterHealthResponse response = client.cluster().health(request, RequestOptions.DEFAULT);
ClusterHealthStatus status = response.getStatus();
int numberOfNodes = response.getNumberOfNodes();
int numberOfDataNodes = response.getNumberOfDataNodes();
logger.info("Cluster status: {}, Nodes: {}, Data Nodes: {}",
status, numberOfNodes, numberOfDataNodes);
if (status == ClusterHealthStatus.RED) {
alertOps("Elasticsearch cluster is RED!");
}
}
Monitor Index Stats
public void monitorIndexStats() throws IOException {
IndicesStatsRequest request = new IndicesStatsRequest();
request.indices("products");
IndicesStatsResponse response = client.indices().stats(request, RequestOptions.DEFAULT);
CommonStats stats = response.getTotal();
long docCount = stats.getDocs().getCount();
long storeSize = stats.getStore().getSizeInBytes();
logger.info("Index stats - Docs: {}, Size: {} MB",
docCount, storeSize / 1024 / 1024);
}
Common Pitfalls
Pitfall 1: Using Text Field for Exact Match
// ❌ WRONG: This won't work as expected
{
"query": {
"term": { "category": "Electronics" }
}
}
// Why: a text field is analyzed at index time, so "Electronics" is stored as the token "electronics"
// A term query is not analyzed and looks for the exact term "Electronics" → No match!
// ✅ CORRECT: Use keyword field
{
"query": {
"term": { "category.keyword": "Electronics" }
}
}
// OR use match query on text field
{
"query": {
"match": { "category": "Electronics" }
}
}
Pitfall 2: Not Handling Null Values
// ❌ BAD: NullPointerException if field missing
SearchHit hit = ...;
String category = hit.getSourceAsMap().get("category").toString();
// ✅ GOOD: Check for null (covers both a missing field and an explicit null value)
Map<String, Object> source = hit.getSourceAsMap();
Object value = source.get("category");
String category = value != null ? value.toString() : "Unknown";
Pitfall 3: Split Brain Problem
Scenario: Network partition in cluster
Node 1, 2, 3
Network splits: [Node 1, 2] vs [Node 3]
Both form their own clusters → Data inconsistency!
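Solution (Elasticsearch 6.x and earlier): Set discovery.zen.minimum_master_nodes to a majority of the master-eligible nodes
minimum_master_nodes = (master_eligible_nodes / 2) + 1 (integer division)
For 3 nodes: (3 / 2) + 1 = 2
Elasticsearch 7.0 and later ignore this setting and manage the voting quorum automatically.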
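The quorum formula relies on integer division, which is easy to get wrong by hand. A minimal sketch of the arithmetic, assuming a hypothetical MasterQuorum helper that is not part of any Elasticsearch API:

```java
// Hypothetical helper for the majority calculation.
final class MasterQuorum {

    // Smallest number of master-eligible nodes that forms a strict majority.
    static int minimumMasterNodes(int masterEligibleNodes) {
        if (masterEligibleNodes < 1) {
            throw new IllegalArgumentException("need at least one master-eligible node");
        }
        return masterEligibleNodes / 2 + 1; // integer division rounds down
    }

    public static void main(String[] args) {
        for (int nodes : new int[] {1, 2, 3, 4, 5}) {
            System.out.println(nodes + " nodes -> quorum " + minimumMasterNodes(nodes));
        }
    }
}
```

In the 3-node split from the scenario, only the [Node 1, 2] side reaches the quorum of 2, so Node 3 cannot elect itself master. Note that 4 nodes need a quorum of 3 and therefore tolerate only one failure, the same as 3 nodes, which is why odd counts of master-eligible nodes are usually preferred.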
Pitfall 4: Not Closing Client
// ❌ BAD: Resource leak
RestHighLevelClient client = new RestHighLevelClient(...);
// Use client
// Never closed → Resource leak
// ✅ GOOD: Always close
@PreDestroy
public void cleanup() throws IOException {
if (client != null) {
client.close();
}
}
Pitfall 5: Over-Sharding
// ❌ BAD: Too many shards for small index
PUT /products
{
"settings": {
"number_of_shards": 50 // For 1000 documents!
}
}
// ✅ GOOD: Right-size shards
PUT /products
{
"settings": {
"number_of_shards": 1 // 1 shard sufficient for small datasets
}
}
// Rule of thumb:
// - Shard size: 10-50 GB
// - Small index (< 1GB): 1 shard
// - Medium (1-50GB): 1-5 shards
// - Large (> 50GB): Calculate: total_size / 30GB
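The last line of the rule of thumb can be turned into a small calculation. The sketch below implements only the "total_size / target shard size" part, rounding up and never returning fewer than one shard. The ShardSizing class and its method name are hypothetical, and the 30 GB target is simply the figure from the rule above, not a hard limit.

```java
// Hypothetical helper: estimates a primary shard count from the index size.
final class ShardSizing {

    static final long GB = 1024L * 1024L * 1024L;

    // Recommended primary shard count, targeting roughly `targetShardSizeGb` per shard.
    static int recommendedShards(long indexSizeBytes, int targetShardSizeGb) {
        if (indexSizeBytes < 0 || targetShardSizeGb <= 0) {
            throw new IllegalArgumentException("sizes must be non-negative and target positive");
        }
        long target = targetShardSizeGb * GB;
        long shards = (indexSizeBytes + target - 1) / target; // ceiling division
        return (int) Math.max(1, shards); // even an empty index needs one shard
    }

    public static void main(String[] args) {
        long[] sizesGb = {1, 45, 90, 100, 600};
        for (long sizeGb : sizesGb) {
            System.out.println(sizeGb + " GB -> "
                    + recommendedShards(sizeGb * GB, 30) + " shard(s)");
        }
    }
}
```

Treat the result as a starting point: expected growth, query load, and node count can all justify a different number.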
Architecture Patterns
Pattern 1: Elasticsearch as Search Layer
┌───────────────┐
│    Client     │
└───────┬───────┘
        │
        ├──────────────────┬──────────────────┐
        │                  │                  │
        ▼                  ▼                  ▼
┌───────────────┐  ┌───────────────┐  ┌───────────────┐
│   Write API   │  │   Read API    │  │  Search API   │
└───────┬───────┘  └───────┬───────┘  └───────┬───────┘
        │                  │                  │
        ▼                  ▼                  ▼
┌───────────────┐  ┌───────────────┐  ┌───────────────┐
│  PostgreSQL   │  │  PostgreSQL   │  │ Elasticsearch │
│   (Primary)   │  │ (Read Replica)│  │ (Search Only) │
└───────┬───────┘  └───────────────┘  └───────▲───────┘
        │                                     │
        └─────────────────────────────────────┘
                 Sync (Logstash/Kafka)
Use when:
- Need ACID transactions
- Relational data model
- Fast search required
Pattern 2: Change Data Capture (CDC)
PostgreSQL
│
├─── Write operation
│
▼
WAL (Write-Ahead Log)
│
▼
Debezium (CDC)
│
▼
Kafka
│
▼
Kafka Consumer
│
▼
Elasticsearch
Implementation:
@Service
public class ProductSyncService {
@KafkaListener(topics = "postgres.products")
public void syncProduct(ProductChangeEvent event) {
if (event.getOperation() == Operation.CREATE ||
event.getOperation() == Operation.UPDATE) {
elasticsearchService.indexProduct(event.getProduct());
} else if (event.getOperation() == Operation.DELETE) {
elasticsearchService.deleteProduct(event.getProductId());
}
}
}
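The listener above relies on ProductChangeEvent and Operation, which are not shown here. The sketch below gives one possible shape for them, with a HashMap standing in for the Elasticsearch index so the routing logic can be run on its own. All names and fields are hypothetical; the real payload depends on how Debezium and the consumer are configured.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, self-contained version of the sync logic.
final class ProductSyncSketch {

    enum Operation { CREATE, UPDATE, DELETE }

    // Assumed shape of a change event; not the actual Debezium format.
    static final class ProductChangeEvent {
        final Operation operation;
        final String productId;
        final String productJson; // null for DELETE

        ProductChangeEvent(Operation operation, String productId, String productJson) {
            this.operation = operation;
            this.productId = productId;
            this.productJson = productJson;
        }
    }

    // In-memory stand-in for the search index: document id -> JSON source.
    private final Map<String, String> index = new HashMap<>();

    void apply(ProductChangeEvent event) {
        switch (event.operation) {
            case CREATE:
            case UPDATE:
                index.put(event.productId, event.productJson); // upsert by id
                break;
            case DELETE:
                index.remove(event.productId); // no-op if already absent
                break;
        }
    }

    Map<String, String> snapshot() {
        return new HashMap<>(index);
    }

    public static void main(String[] args) {
        ProductSyncSketch sync = new ProductSyncSketch();
        ProductChangeEvent create =
                new ProductChangeEvent(Operation.CREATE, "p1", "{\"name\":\"laptop\"}");
        sync.apply(create);
        sync.apply(create); // redelivered event: same end state
        System.out.println(sync.snapshot());
    }
}
```

Writing each document under its product ID means a redelivered event overwrites the same document rather than creating a duplicate, which is useful because a Kafka consumer may see the same message more than once.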
Pattern 3: Event Sourcing
Command → Event Store (PostgreSQL)
│
├─── Event Published
│
▼
Event Bus (Kafka)
│
├───────────────┬───────────────┐
▼               ▼               ▼
Projection 1    Projection 2    Elasticsearch
(PostgreSQL)    (MongoDB)       (Search View)
Summary
Elasticsearch Strengths ✅
- Full-text search (typo tolerance, relevance)
- Fast aggregations
- Near real-time analytics
- Log analysis
- Geospatial queries
- Autocomplete/suggestions
- Scalability (horizontal)
Elasticsearch Weaknesses ❌
- Not ACID compliant
- No JOINs (denormalization required)
- Higher resource usage (RAM, disk)
- Eventual consistency
- No support for complex operations (multi-document transactions, referential integrity)
Golden Rules
- ✅ Use Elasticsearch for search, not as primary database
- ✅ Denormalize data - Embed related data in documents
- ✅ Use filters for non-scoring queries - Cached and faster
- ✅ Define mappings explicitly - Don't rely on auto-mapping
- ✅ Monitor cluster health - RED/YELLOW/GREEN status
- ✅ Use bulk API - For batch operations
- ✅ Close clients - Prevent resource leaks
- ✅ Right-size shards - 10-50GB per shard
- ❌ Don't use for transactions - Use RDBMS
- ❌ Don't over-shard - Too many shards = poor performance
When to Use Elasticsearch
| Scenario | Use Elasticsearch? | Alternative |
|---|---|---|
| Product search | ✅ Yes | - |
| Log analysis | ✅ Yes | - |
| Autocomplete | ✅ Yes | - |
| User authentication | ❌ No | PostgreSQL |
| Financial transactions | ❌ No | PostgreSQL |
| Shopping cart | ❌ No | Redis/PostgreSQL |
| Analytics dashboard | ✅ Yes | - |
| Social network | ❌ No | Graph DB (Neo4j) |
| Order processing | ❌ No | PostgreSQL |
| Content search (CMS) | ✅ Yes | - |
Final Recommendation: Use Elasticsearch as a specialized search layer on top of your primary database, not as a replacement for it. This gives you the best of both worlds: ACID transactions in your RDBMS and lightning-fast search in Elasticsearch.