1. Horizontal Scaling with Sharding
Sharding is a fundamental strategy for scaling a search system. It involves splitting the data into smaller pieces, called shards, and distributing them across multiple nodes. This allows the system to process queries in parallel, significantly reducing the load on any single machine.
1.1 How Sharding Works
Each shard is a subset of the entire dataset. When a search query is made, the system routes the query to the relevant shards, processes the results independently on each shard, and then merges the results before returning them to the user. This parallelism enables the system to handle large datasets efficiently.
For example, in Elasticsearch, you can create an index with multiple shards like this:
PUT /products
{
  "settings": {
    "number_of_shards": 5
  },
  "mappings": {
    "properties": {
      "name": { "type": "text" },
      "price": { "type": "float" }
    }
  }
}
In this setup, the data is distributed across five shards. Queries can be processed simultaneously on all shards, improving the overall performance as the dataset grows.
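Note that an ordinary search needs no shard-awareness on the client's part: the coordinating node fans the query out to all five shards, each shard returns its top matches, and the results are merged before the response is sent. For example, this match query against the name field defined above runs on every shard in parallel:

GET /products/_search
{
  "query": {
    "match": { "name": "wireless keyboard" }
  }
}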
1.2 Choosing a Shard Key
The shard key determines how data is distributed across the shards. It is important to select a shard key that ensures data is evenly distributed. A poor shard key can result in uneven load distribution, with some shards becoming "hot" (overloaded with queries) while others remain underutilized.
For instance, if you are designing a system for a global e-commerce platform, a user ID might be a poor shard key if a handful of power users generate far more queries than others. A neutral key, such as a hash of the product ID, typically distributes load more evenly across the shards.
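In Elasticsearch, routing defaults to a hash of the document _id, which already spreads documents evenly across shards. If you do need a custom key, you can pass a routing value when indexing; the region-eu value below is purely illustrative:

PUT /products/_doc/1?routing=region-eu
{
  "name": "Wireless Keyboard",
  "price": 49.99
}

Searches that supply the same routing value are sent only to the matching shard, which cuts query fan-out, at the cost of hotspots if the routing key is skewed.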
1.3 Monitoring Shard Distribution
Effective monitoring of shard performance is essential for ensuring that data is evenly distributed and that no shard is overloaded. Tools like Elasticsearch’s Kibana can help visualize shard allocation and performance metrics. By monitoring query latencies and CPU usage per shard, you can detect imbalances early and take corrective action, such as re-sharding or migrating data.
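If you prefer the command line over Kibana, the cat shards API reports each shard's document count, on-disk size, and host node, which makes skew easy to spot:

GET /_cat/shards/products?v&h=shard,prirep,state,docs,store,node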
2. Replication for High Availability
While sharding focuses on distributing data for performance, replication ensures that the system remains highly available even in the event of hardware failure. Replication creates copies of each shard, called replica shards, which are stored on different nodes.
2.1 How Replication Works
When replication is enabled, the system creates one or more copies of each shard. If a node that holds a primary shard goes down, the replica shard can take over, ensuring that data remains accessible and the system continues to serve search queries.
In Elasticsearch, you can enable replication like this:
PUT /products
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 2
  }
}
Here, each of the five primary shards gets two replicas, stored on nodes other than the one holding the primary. If a node fails, a surviving replica is promoted to primary, so the cluster can lose up to two nodes holding copies of a shard without losing data or availability.
2.2 Balancing Availability and Performance
While replicas are crucial for high availability, they also play a role in balancing query load. Both primary and replica shards can serve read queries, meaning the system can distribute search requests across both types of shards, which reduces the load on the primary shards.
However, it’s essential to balance replication carefully. Too many replicas can consume unnecessary resources, while too few replicas may not provide sufficient redundancy. A common best practice is to have one or two replicas per shard, depending on the criticality of your data and the scale of your infrastructure.
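Unlike the shard count, which is fixed when an index is created, the replica count can be adjusted on a live index, so you can tune this trade-off at any time without reindexing:

PUT /products/_settings
{
  "index": {
    "number_of_replicas": 1
  }
}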
3. Caching for Performance Optimization
One of the most effective ways to improve search performance, especially for frequent queries, is by implementing a caching layer. Caching stores the results of common queries, enabling the system to serve these results from memory rather than executing a full search.
3.1 Implementing Caching in Java
In a Spring Boot application, you can implement caching with Spring's cache abstraction, using Redis as the backing store. Here's an example that caches search results (ProductRepository and Product are assumed to be defined elsewhere in the application):

import java.util.List;
import org.springframework.cache.annotation.Cacheable;
import org.springframework.stereotype.Service;

@Service
public class ProductService {
    private final ProductRepository productRepository;

    public ProductService(ProductRepository productRepository) {
        this.productRepository = productRepository;
    }

    // Repeat queries for the same term are served from the cache.
    @Cacheable(value = "searchCache", key = "#searchTerm")
    public List<Product> searchProducts(String searchTerm) {
        return productRepository.findByTerm(searchTerm);
    }
}
In this example, Spring's cache abstraction, backed by Redis when it is configured as the cache provider, caches the results of searchProducts keyed by the searchTerm. If the same query is made again, the result is served from the cache, drastically reducing the load on the search engine.
3.2 Optimizing Cache Lifetime
It’s essential to manage the cache expiration strategy to avoid serving stale data. For a search system, you might want to invalidate the cache periodically or when data changes. For example, product information might be updated daily, so setting a cache expiration of 24 hours could balance performance and freshness of data.
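One way to enforce this in Spring Boot, assuming Redis is the configured cache provider, is to declare a default TTL through a RedisCacheConfiguration bean. This is a minimal sketch; setting spring.cache.redis.time-to-live in the application properties achieves the same effect:

import java.time.Duration;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.redis.cache.RedisCacheConfiguration;

@Configuration
public class CacheConfig {
    // Expire cached search results after 24 hours so daily product
    // updates are picked up on the next query after expiry.
    @Bean
    public RedisCacheConfiguration cacheConfiguration() {
        return RedisCacheConfiguration.defaultCacheConfig()
                .entryTtl(Duration.ofHours(24));
    }
}

For invalidation driven by data changes rather than time, Spring's @CacheEvict annotation on your update methods is the usual complement.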
4. Indexing Strategies for Better Performance
Efficient indexing is the backbone of any scalable search system. How you structure and store your data in the index affects both the speed of query execution and the system’s ability to scale.
4.1 Full-Text Search Optimization
Full-text search engines like Elasticsearch use inverted indices to quickly locate documents that match a search query. An inverted index maps each term in your documents to the list of documents that contain that term, allowing for fast lookups.
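As a toy illustration, indexing the two documents "wireless keyboard" (doc 1) and "wireless mouse" (doc 2) produces postings like:

wireless -> [1, 2]
keyboard -> [1]
mouse    -> [2]

A query for "wireless" resolves to documents 1 and 2 with a single lookup, regardless of how many documents the index holds.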
To optimize full-text search, you can configure analyzers that control how text is tokenized and indexed. For instance, you may want to implement stemming or remove stop words to reduce the size of the index and improve query performance.
Here’s an example of configuring a custom analyzer in Elasticsearch:
PUT /products
{
  "settings": {
    "analysis": {
      "analyzer": {
        "custom_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "stop", "snowball"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "description": { "type": "text", "analyzer": "custom_analyzer" }
    }
  }
}
In this configuration, we create a custom analyzer that lowercases every token, removes common stop words, and applies stemming via the Snowball filter. This shrinks the index and lets a query such as "running" match documents that contain "run".
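You can verify the analyzer's behavior with the _analyze API before indexing any data:

POST /products/_analyze
{
  "analyzer": "custom_analyzer",
  "text": "Running shoes for the marathon"
}

The response lists the tokens that would actually be stored (here roughly run, shoe, and marathon), which makes it easy to confirm that stop-word removal and stemming behave as expected.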
4.2 Reindexing for Performance
As your data evolves, reindexing becomes necessary to maintain performance. Reindexing rebuilds the index from scratch into a new index, which lets you apply updated mappings and analyzers, change the shard count (which cannot be altered on an existing index), and reclaim space held by deleted documents.
In Elasticsearch, you can reindex using the following command:
POST /_reindex
{
  "source": {
    "index": "old_products"
  },
  "dest": {
    "index": "new_products"
  }
}
This reindex operation copies documents from the old_products index into the new new_products index, applying whatever settings and mappings new_products was created with. The source index remains untouched until you delete it, so you can validate the new index before switching traffic over.
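To make the cutover invisible to clients, a common pattern is to query through an alias (the products_alias name below is hypothetical) and repoint it atomically once the new index is verified:

POST /_aliases
{
  "actions": [
    { "remove": { "index": "old_products", "alias": "products_alias" } },
    { "add": { "index": "new_products", "alias": "products_alias" } }
  ]
}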
5. Conclusion
Designing a scalable search system requires careful planning and the implementation of strategies like sharding, replication, caching, and efficient indexing. By focusing on these best practices, you can ensure that your search system can scale as your data and query volumes grow, maintaining both high performance and availability.
If you have any questions or need further clarification on any of the techniques discussed, feel free to comment below!