Prithvi S

Posted on Jun 6

Search Configuration Management in OpenSearch: Tuning Search Without Deploying Code

#opensearch #search #database #data

Introduction

Every engineering team has been there. A product manager says the search results are "off." Users are complaining. Someone suggests tweaking the field boost or adding a synonym. And then someone else says, "That requires a code change, a PR review, and a deployment."

What if it didn't? What if search tuning was a configuration change, not a code change? What if you could experiment with different search strategies, compare them side-by-side, and deploy the winner without touching a single line of production code?

This is exactly what the OpenSearch Search Relevance plugin enables through its search configuration management system. In this post, I will show you how to treat search as a configurable layer, how to version and test configurations safely, and how to move from guesswork to data-driven search tuning.

The Problem: Search Tuning Is Too Slow

In most organizations, improving search relevance follows a painfully slow cycle:

Identify a search quality problem (usually from user complaints)
An engineer proposes a fix (boosting, analyzer change, query rewrite)
Write the code, open a PR, wait for review
Deploy to staging, test manually with a few queries
Deploy to production, monitor for regressions
If it breaks, roll back and start over

This cycle takes days or weeks. And the feedback loop is fuzzy. Did the change actually improve search quality? Most teams have no systematic way to know. They rely on gut feeling and a handful of manual test queries.

The Configuration-First Approach

The Search Relevance plugin introduces a fundamental shift: search behavior is defined by configurations, not code. A search configuration is a declarative object that specifies how queries should be executed against an index. You create it, version it, test it, and apply it, all through the OpenSearch REST API or Dashboards UI.

What Is a Search Configuration?

At its core, a search configuration is a structured object that captures:

Analyzer settings: Which tokenizer, filters, and character mappings to use
Query type: Match, multi-match, bool, query_string, or custom
Field boosts: Relative weights for different fields (title vs. description vs. tags)
Query parameters: Fuzziness, minimum_should_match, operator (AND/OR)
Scoring functions: Script scoring, decay functions, field value factors
Filtering logic: Category filters, date ranges, custom constraints

These configurations are stored as documents in a dedicated system index (.search-relevance-config or similar). They are not embedded in application code. They are not compiled into your search service. They are first-class objects that you can manage independently.

Why This Matters

When search behavior is configuration-driven, several things become possible:

Non-engineers can tune search: Product managers, data analysts, or search specialists can create and test configurations without engineering support
No deployment required: A new configuration is just a document stored in OpenSearch. No code review, no CI/CD pipeline, and rolling back means pointing at the previous configuration rather than redeploying
A/B testing is natural: Run two configurations side-by-side in an experiment and compare metrics. No feature flags, no code branches
History is automatic: Every configuration is stored as its own document with an ID and a creation timestamp, so you can see what changed and when
Reversibility is instant: If a configuration underperforms, switch back to the previous one by referencing its ID

Creating Your First Search Configuration

Let me walk you through the practical process of creating and managing search configurations.

Step 1: Define the Baseline

Start with your current production search behavior. Capture it as a configuration document. This is your baseline, the benchmark against which all future changes are measured.

Here is a concrete example of a baseline configuration for an e-commerce product search:

{
  "name": "baseline-standard",
  "description": "Current production search with standard analyzer and basic field boosting",
  "index": "products",
  "query": {
    "type": "multi_match",
    "fields": ["title^2.0", "description^1.0", "tags^0.5"],
    "analyzer": "standard",
    "operator": "or",
    "minimum_should_match": "1"
  },
  "sort": [{"_score": "desc"}]
}

This configuration says: search across title, description, and tags. Title matches are weighted twice as heavily as description matches, and tag matches half as heavily. Use the standard analyzer. Match any single term (OR logic).
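In a configuration-driven setup, something has to turn this stored document into an actual OpenSearch query. Below is a minimal Python sketch of that translation. It assumes the configuration shape used in this post: a `query` object with `type`, `fields`, and optional parameters. The function name `build_search_body` is hypothetical, but its output is a standard `multi_match` request body that you could send to `products/_search`.

```python
import json


def build_search_body(config: dict, search_text: str, size: int = 10) -> dict:
    """Translate a stored configuration (this post's illustrative shape) into query DSL."""
    q = config["query"]
    if q["type"] != "multi_match":
        raise ValueError(f"unsupported query type: {q['type']}")
    multi_match = {
        "query": search_text,
        "fields": list(q["fields"]),  # "title^2.0" boost syntax is native to multi_match
        "operator": q.get("operator", "or"),
    }
    # Copy optional parameters only when the configuration sets them.
    for key in ("analyzer", "minimum_should_match", "fuzziness"):
        if key in q:
            multi_match[key] = q[key]
    return {
        "size": size,
        "query": {"multi_match": multi_match},
        "sort": config.get("sort", [{"_score": "desc"}]),
    }


baseline = {
    "name": "baseline-standard",
    "index": "products",
    "query": {
        "type": "multi_match",
        "fields": ["title^2.0", "description^1.0", "tags^0.5"],
        "analyzer": "standard",
        "operator": "or",
        "minimum_should_match": "1",
    },
    "sort": [{"_score": "desc"}],
}

# Request body for GET products/_search
print(json.dumps(build_search_body(baseline, "wireless headphones"), indent=2))
```

The `^` boost syntax is understood natively by `multi_match`, so the field list passes through unchanged.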

Notice that this is just a JSON document. No code. No compiled logic. No deployment.

Step 2: Create a Challenger

Now create a configuration that tests a hypothesis. For example, you might hypothesize that adding synonym expansion and switching to AND logic will improve precision for multi-word queries.

{
  "name": "challenger-synonym-and",
  "description": "Custom analyzer with synonym expansion and AND operator for multi-word queries",
  "index": "products",
  "query": {
    "type": "multi_match",
    "fields": ["title^3.0", "description^1.0", "tags^0.5"],
    "analyzer": "custom-synonym",
    "operator": "and",
    "minimum_should_match": "2<75%"
  },
  "sort": [{"_score": "desc"}]
}

Key differences from the baseline:

Custom analyzer with synonym expansion ("laptop" matches "notebook")
AND operator for stricter matching
Title boost increased to 3x (aggressive emphasis on product names)
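Minimum should match ("2<75%"): queries of one or two terms must match every term, while longer queries must match 75% of their terms. With the AND operator every term is already required, so this setting only changes behavior when the operator is OR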
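The conditional syntax is easy to misread, so it helps to compute it. The Python sketch below converts a `minimum_should_match` value into a required term count. It covers only a plain integer, a percentage, and a single `N<P%` conditional. The full syntax also allows negative values and multiple conditionals, which this sketch leaves out.

```python
def required_terms(spec: str, num_terms: int) -> int:
    """How many query terms must match under a minimum_should_match spec.

    Sketch only: handles "2" (count), "75%" (percentage) and a single
    "2<75%" conditional. Negative values and multiple conditionals are omitted.
    """
    spec = spec.strip()
    if "<" in spec:
        threshold, rest = spec.split("<", 1)
        if num_terms <= int(threshold):
            return num_terms  # at or below the threshold, every term is required
        return required_terms(rest, num_terms)
    if spec.endswith("%"):
        # The percentage is applied to the term count and rounded down.
        return num_terms * int(spec[:-1]) // 100
    return int(spec)


for n in range(1, 6):
    print(n, "terms ->", required_terms("2<75%", n), "required")
```

Percentages are rounded down, which is why a three-term query needs two matching terms rather than three.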

Step 3: Store in OpenSearch

These configurations are stored as documents in the search relevance system index. You can create, update, and retrieve them via the REST API:

curl -X POST "http://localhost:9200/.search-relevance-config/_doc" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "challenger-synonym-and",
    "description": "Custom analyzer with synonym expansion...",
    "config": { ... }
  }'

The plugin assigns a unique ID. You reference this ID when creating experiments or applying configurations to search requests.

The Power of Configuration Versioning

One of the most underrated features of this system is implicit versioning. Every configuration is a distinct document with a unique ID, a creation timestamp, and a description. The rule that makes this work is simple: never overwrite a configuration; create a new one for every change.

Benefits of Immutable Configurations

Audit trail: You can trace exactly which configuration was active at any point in time
Rollback safety: If a new configuration causes problems, you can instantly revert to the previous configuration ID
Parallel testing: Multiple configurations can coexist. You can run experiments comparing any two historical configurations
Cross-environment portability: A configuration created in development can be exported and imported into staging or production

Practical Versioning Workflow

Config v1 (baseline-standard) → Created April 1
Config v2 (challenger-synonym-and) → Created April 15
Config v3 (challenger-recency-boost) → Created April 22
Config v4 (challenger-synonym-and-v2) → Created May 1

At any point, you can run an experiment comparing v1 vs. v3, or v2 vs. v4. The history is preserved. Because the configurations are immutable, the comparisons remain valid as long as the underlying index has not changed.
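The append-only discipline can be sketched in a few lines of Python. This is a hypothetical in-memory model: `SearchConfig`, `ConfigRegistry`, and the activation log are illustrations, not the plugin's data model. In this model:

- creating a configuration with an existing ID fails;
- rolling back means activating an older ID again;
- the activation log answers "which configuration was live on a given date".

The dates mirror the timeline above; the year is arbitrary.

```python
from dataclasses import dataclass
from datetime import datetime
from types import MappingProxyType
from typing import List, Mapping, Optional, Tuple


@dataclass(frozen=True)
class SearchConfig:
    config_id: str
    name: str
    created: datetime
    body: Mapping  # read-only view; nested objects are not deep-frozen


class ConfigRegistry:
    """Hypothetical append-only store: configurations are never edited in place."""

    def __init__(self) -> None:
        self._configs: dict = {}
        self._activations: List[Tuple[datetime, str]] = []  # kept in time order

    def create(self, config_id: str, name: str, created: datetime, body: dict) -> SearchConfig:
        if config_id in self._configs:
            raise ValueError(f"{config_id} exists; create a new configuration instead")
        cfg = SearchConfig(config_id, name, created, MappingProxyType(dict(body)))
        self._configs[config_id] = cfg
        return cfg

    def activate(self, config_id: str, at: datetime) -> None:
        """Make a configuration live. Rolling back is just activating an older ID."""
        if config_id not in self._configs:
            raise KeyError(config_id)
        if self._activations and at < self._activations[-1][0]:
            raise ValueError("activations must be recorded in time order")
        self._activations.append((at, config_id))

    def active_at(self, when: datetime) -> Optional[str]:
        """Audit trail: which configuration was live at a given moment."""
        current = None
        for at, config_id in self._activations:
            if at > when:
                break
            current = config_id
        return current


reg = ConfigRegistry()
reg.create("v1", "baseline-standard", datetime(2025, 4, 1), {"fields": ["title^2.0"]})
reg.create("v2", "challenger-synonym-and", datetime(2025, 4, 15), {"fields": ["title^3.0"]})
reg.activate("v1", datetime(2025, 4, 1))
reg.activate("v2", datetime(2025, 4, 16))
reg.activate("v1", datetime(2025, 4, 18))  # rollback to the baseline
print(reg.active_at(datetime(2025, 4, 17)))  # v2: the challenger was live that day
```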

Running Experiments Without Code Changes

This is where the configuration-driven approach shines. You create an experiment that references your query set and two configurations. The plugin orchestrates everything.

Experiment Definition

{
  "name": "baseline-vs-synonym-and",
  "description": "Testing synonym expansion and AND operator against baseline",
  "query_set": "ecommerce-top-100",
  "configurations": ["baseline-standard", "challenger-synonym-and"],
  "type": "PAIRWISE_COMPARISON",
  "metrics": ["nDCG", "Precision@10", "MRR"]
}

The experiment engine does the following:

Loads the query set (100 representative queries)
For each query, executes it against both configurations
Captures the top-K results from each configuration
Stores the results for judgment collection
Computes metrics once judgments are available
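The run phase of that loop is simple to sketch. The Python below assumes a hypothetical `search_fn(config_name, query, k)` that executes one query under one configuration and returns ranked document IDs. In practice that function would call OpenSearch; here a dictionary of canned rankings stands in for it. The output maps each query to each configuration's top-k list, which is what judgments are later collected against.

```python
from typing import Callable, Dict, List

# Hypothetical signature: (config_name, query_text, k) -> ranked document IDs.
SearchFn = Callable[[str, str, int], List[str]]


def run_experiment(query_set: List[str], configs: List[str],
                   search_fn: SearchFn, k: int = 10) -> Dict[str, Dict[str, List[str]]]:
    """Execute every query under every configuration and keep the top-k IDs."""
    results: Dict[str, Dict[str, List[str]]] = {}
    for query in query_set:
        results[query] = {cfg: search_fn(cfg, query, k)[:k] for cfg in configs}
    return results


# Stand-in for OpenSearch: canned rankings per (configuration, query).
CANNED = {
    ("baseline-standard", "programming laptop"): ["d7", "d2", "d9"],
    ("challenger-synonym-and", "programming laptop"): ["d2", "d4", "d7"],
}


def fake_search(config: str, query: str, k: int) -> List[str]:
    return CANNED.get((config, query), [])[:k]


runs = run_experiment(["programming laptop"],
                      ["baseline-standard", "challenger-synonym-and"],
                      fake_search, k=2)
print(runs)
```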

No code changes. No deployment. No manual testing. The entire comparison is automated and reproducible.

Applying Configurations to Production Search

Once you have identified a winning configuration through experiments, the final step is applying it to your production search traffic. The approach depends on your architecture.

Option 1: Configuration-Driven Search Service

If your application is already built to use the Search Relevance plugin, you can reference the configuration by name or ID in your search requests. The search service reads the configuration from the system index and applies it dynamically.

{
  "search_configuration": "challenger-synonym-and",
  "query": "wireless headphones"
}

The application does not need to know the details of the configuration. It just references the ID. The plugin handles the rest.

Option 2: Export and Integrate

If your application uses its own search logic, you can export the winning configuration and integrate the relevant settings into your application code. This is less ideal because it reintroduces code changes, but the experiment process has already validated that the configuration works. You are not guessing. You are deploying a proven winner.

Option 3: Gradual Rollout with A/B Testing

For the most advanced setups, you can route a percentage of traffic to the new configuration while keeping the rest on the baseline. Monitor metrics like click-through rate, conversion rate, and search refinement rate. This gives you real-world validation beyond the controlled experiment.
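A common way to split traffic is to hash a stable user identifier into buckets rather than choosing randomly per request. Each user then sees one configuration consistently, so their clicks and conversions can be attributed to it. Below is a Python sketch; the function name and the 100-bucket scheme are assumptions.

```python
import hashlib


def assign_config(user_id: str, rollout_percent: int,
                  baseline: str = "baseline-standard",
                  challenger: str = "challenger-synonym-and") -> str:
    """Route a user to the challenger or the baseline, deterministically.

    The user ID is hashed into one of 100 buckets; buckets below
    rollout_percent get the challenger.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return challenger if bucket < rollout_percent else baseline


print(assign_config("user-42", rollout_percent=10))
```

A user's bucket never changes. Raising `rollout_percent` from 10 to 50 therefore only moves users from the baseline to the challenger, never back.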

Real-World Scenario: E-Commerce Search Tuning

Let me walk through a realistic scenario that shows the power of configuration-driven search tuning.

The Problem

An e-commerce platform sells electronics. The search team notices that users searching for "developer laptop" find good results, but users searching for "programming laptop" get irrelevant results. The synonym gap is hurting conversion.

The Configuration Approach

Baseline: Current production configuration with standard analyzer and no synonyms
Challenger 1: Custom analyzer with synonym expansion ("developer" ↔ "programming" ↔ "coding")
Challenger 2: Same as challenger 1 but with title boost increased to 3x
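The synonym challengers refer to an analyzer named `custom-synonym`. OpenSearch rejects a query that names an analyzer the index does not define, so this analyzer has to exist in the `products` index settings. The Python sketch below builds such an analysis block using OpenSearch's built-in `synonym` token filter. The helper name and the filter name it generates are hypothetical.

```python
import json


def synonym_analyzer_settings(analyzer_name: str, synonym_groups: list) -> dict:
    """Index settings defining a custom analyzer with a synonym token filter.

    Each group is a comma-separated list of equivalent terms, e.g.
    "developer, programming, coding"; any term in a group matches the others.
    """
    filter_name = f"{analyzer_name}-filter"  # hypothetical naming scheme
    return {
        "settings": {
            "analysis": {
                "filter": {
                    filter_name: {"type": "synonym", "synonyms": list(synonym_groups)},
                },
                "analyzer": {
                    analyzer_name: {
                        "type": "custom",
                        "tokenizer": "standard",
                        # lowercase first so "Programming" also expands
                        "filter": ["lowercase", filter_name],
                    },
                },
            }
        }
    }


settings = synonym_analyzer_settings("custom-synonym", ["developer, programming, coding"])
print(json.dumps(settings, indent=2))
```

This body has the shape used when creating an index. Analysis settings are static, so adding the analyzer to an existing index generally means closing the index, updating its settings, and reopening it. If the analyzer is used only at search time, no reindex is needed.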

The Experiment

The team creates a query set of 50 queries related to laptops and developer tools. They run a pairwise comparison experiment: baseline vs. challenger 1, and baseline vs. challenger 2.

Results

Configuration	nDCG@10	Precision@10	MRR
Baseline	0.68	0.72	0.55
Challenger 1 (synonym)	0.74	0.75	0.61
Challenger 2 (synonym + boost)	0.76	0.78	0.63

Challenger 2 wins on all three metrics. The team applies the configuration to production. No code deployment, just a configuration change that can be reverted as easily as it was applied.

The Follow-Up

Two weeks later, the team notices that users searching for "budget laptop" are seeing high-end products. They hypothesize that adding a price-based filter to the configuration will help. They create a new challenger, run an experiment, and iterate again. The cycle continues, but each iteration is fast, safe, and measurable.
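One way to express that hypothesis is a `bool` query. It keeps the existing relevance query under `must` and adds a `range` clause under `filter`, which restricts matches without changing scores. The Python sketch below assumes a numeric `price` field and an illustrative cap; both are hypothetical. When the filter should apply (on every query, or only on queries such as "budget laptop") is left to the caller.

```python
import copy


def with_price_cap(search_body: dict, max_price: float, field: str = "price") -> dict:
    """Return a copy of search_body whose query only matches items at or below max_price."""
    capped = copy.deepcopy(search_body)
    capped["query"] = {
        "bool": {
            # The original relevance query still drives scoring...
            "must": [copy.deepcopy(search_body["query"])],
            # ...while the filter clause excludes documents without affecting scores.
            "filter": [{"range": {field: {"lte": max_price}}}],
        }
    }
    return capped


body = {"size": 10, "query": {"multi_match": {"query": "budget laptop",
                                             "fields": ["title^3.0", "description^1.0"]}}}
print(with_price_cap(body, max_price=800))  # 800 is an illustrative cap
```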

Common Pitfalls and How to Avoid Them

Pitfall 1: Configuration Sprawl

It is easy to create dozens of configurations and lose track of which ones are active. Use clear naming conventions and descriptions. Delete or archive obsolete configurations regularly.
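A naming convention only helps if someone checks it. Below is a Python sketch of a periodic audit with two assumptions, both hypothetical:

- a naming convention modeled on the names in this post (`baseline-` or `challenger-`, followed by a lowercase, hyphenated hypothesis);
- a record of when each configuration was last used in an experiment.

```python
import re
from datetime import date
from typing import Dict, List, Optional, Tuple

# Hypothetical convention modeled on this post's names:
# "baseline-" or "challenger-", then lowercase hyphen-separated words (e.g. "-v2").
CONFIG_NAME = re.compile(r"^(baseline|challenger)(-[a-z0-9]+)+$")


def audit_configs(names: List[str], last_used: Dict[str, Optional[date]],
                  cutoff: date) -> Tuple[List[str], List[str]]:
    """Return (badly named configs, configs not used in any experiment since cutoff)."""
    badly_named = [n for n in names if not CONFIG_NAME.match(n)]
    stale = [n for n in names if (last_used.get(n) or date.min) < cutoff]
    return badly_named, stale


names = ["baseline-standard", "challenger-synonym-and", "challenger-recency-boost",
         "challenger-synonym-and-v2", "test_new_boost"]
last_used = {"baseline-standard": date(2025, 5, 1),
             "challenger-synonym-and-v2": date(2025, 5, 1)}
print(audit_configs(names, last_used, cutoff=date(2025, 4, 20)))
```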

Pitfall 2: Ignoring Index Consistency

Experiments only work if both configurations query the same index with the same data. If you change the index mappings or reindex data between experiments, results from before and after the change are no longer comparable.
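A cheap guard is to record a fingerprint of the index alongside each experiment and refuse to compare runs whose fingerprints differ. The Python sketch below expects the mapping from `GET products/_mapping` and the count from `GET products/_count`; the function name is hypothetical. A document count will not catch in-place updates, so treat this as a coarse check.

```python
import hashlib
import json


def index_fingerprint(mapping: dict, doc_count: int) -> str:
    """Coarse fingerprint of the index an experiment ran against.

    mapping: response body of GET <index>/_mapping; doc_count: the "count"
    from GET <index>/_count. Keys are sorted so equal mappings hash equally.
    """
    canonical = json.dumps(mapping, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(f"{canonical}|{doc_count}".encode("utf-8")).hexdigest()


m1 = {"products": {"mappings": {"properties": {"title": {"type": "text"},
                                               "price": {"type": "float"}}}}}
m2 = {"products": {"mappings": {"properties": {"price": {"type": "float"},
                                               "title": {"type": "text"}}}}}
# Same mapping with different key order -> same fingerprint
print(index_fingerprint(m1, 1200) == index_fingerprint(m2, 1200))
```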

Pitfall 3: Over-Optimizing for One Metric

A configuration that maximizes nDCG might hurt diversity or recall. A configuration that maximizes precision might hurt coverage. Use at least two metrics to capture the full picture.
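All three metrics used in this post come from the same judgments, so reporting them side by side costs nothing. The Python sketch below uses the standard definitions:

- graded relevance for nDCG, using the linear-gain variant (some tools use 2^grade - 1 instead);
- binary relevance for Precision@K and reciprocal rank;
- MRR as the mean reciprocal rank over the query set.

The function names are illustrative, not the plugin's implementation.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the top-k positions holding a relevant document (divides by k)."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / position of the first relevant document, or 0 if none is retrieved."""
    for position, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(runs: List[Tuple[Sequence[str], Set[str]]]) -> float:
    """MRR: mean reciprocal rank over all queries in the query set."""
    return sum(reciprocal_rank(ranked, rel) for ranked, rel in runs) / len(runs)


def ndcg_at_k(ranked: Sequence[str], grades: Dict[str, int], k: int = 10) -> float:
    """nDCG@k with linear gain; unjudged documents count as grade 0."""
    def dcg(gains: List[int]) -> float:
        # Position i (0-based) is discounted by log2(i + 2).
        return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

    actual = dcg([grades.get(doc, 0) for doc in ranked[:k]])
    ideal = dcg(sorted(grades.values(), reverse=True)[:k])
    return actual / ideal if ideal > 0 else 0.0


grades = {"a": 0, "b": 2, "c": 1, "d": 3}      # judged grades for one query
relevant = {doc for doc, g in grades.items() if g > 0}
ranked = ["a", "b", "c"]                        # one configuration's results
print(ndcg_at_k(ranked, grades, k=3), precision_at_k(ranked, relevant, k=3),
      reciprocal_rank(ranked, relevant))
```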

Pitfall 4: Forgetting About Performance

A configuration with better relevance but 10x slower query time is not a win. Track latency alongside relevance metrics. OpenSearch's concurrent segment search (enabled by default in OpenSearch 3.0) can help, but it trades extra CPU for lower latency, so monitor CPU usage.
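Latency can be gated the same way as relevance. The Python sketch below compares 95th-percentile latency between runs. The 20% tolerance is a hypothetical threshold your team would choose. The latency samples would come from your own measurements, for example the `took` field in each search response.

```python
import statistics
from typing import Sequence


def p95(samples_ms: Sequence[float]) -> float:
    """95th percentile: the last of the 19 cut points from quantiles(n=20)."""
    return statistics.quantiles(samples_ms, n=20)[-1]


def latency_acceptable(baseline_ms: Sequence[float], challenger_ms: Sequence[float],
                       max_ratio: float = 1.2) -> bool:
    """Hypothetical gate: the challenger's p95 may exceed the baseline's by at most 20%."""
    return p95(challenger_ms) <= max_ratio * p95(baseline_ms)


baseline_ms = [float(x) for x in range(20, 120)]  # illustrative samples, not real data
challenger_ms = [x * 10 for x in baseline_ms]     # a "10x slower" challenger
print(latency_acceptable(baseline_ms, challenger_ms))
```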

When Configuration-Driven Search Is Not Enough

Configuration-driven search tuning is powerful, but it has limits. It works best for:

Adjusting analyzers, boosts, and query parameters
Adding synonyms, stop words, or custom token filters
Tuning ranking functions and scoring logic
Filtering and faceting rules

It does not replace code when you need:

Custom query parsers or entirely new query types
External data integration (e.g., user personalization signals)
Complex reranking models that require machine learning
Real-time feature computation

For these cases, the Search Relevance plugin still helps by providing the experiment framework. You can test the impact of custom code changes using the same query sets and metrics. The configuration layer and the code layer complement each other.

Getting Started Today

If you are running OpenSearch and you want to move to a configuration-driven search tuning model, here is your starting checklist:

Install the Search Relevance plugin: Follow the official documentation at https://github.com/opensearch-project/search-relevance
Capture your baseline: Document your current search behavior as a configuration object
Build a query set: Start with 30-50 representative queries from your search logs
Create one challenger: Pick one hypothesis to test (synonyms, boosts, query type)
Run an experiment: Use the plugin to compare baseline vs. challenger
Collect judgments: Have domain experts grade a subset of results
Review metrics: Did the challenger improve nDCG or Precision@K?
Apply the winner: Update your search service to use the winning configuration
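For the query-set step, a reasonable starting point is the most frequent queries in your logs after light normalization. The Python sketch below assumes the log is available as a list of raw query strings; the helper name is hypothetical. Frequency-based sampling covers the head of the distribution. You may want to add some rarer queries by hand so the set stays representative.

```python
from collections import Counter
from typing import Iterable, List


def build_query_set(logged_queries: Iterable[str], size: int = 50) -> List[str]:
    """Most frequent queries after lowercasing and collapsing whitespace."""
    normalized = (" ".join(q.lower().split()) for q in logged_queries)
    counts = Counter(q for q in normalized if q)  # drop empty queries
    return [query for query, _ in counts.most_common(size)]


log = ["Laptop", "laptop ", "budget  laptop", "mouse", "laptop", "", "Mouse"]
print(build_query_set(log, size=2))
```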

With a small query set, the entire cycle can be completed in a single day. No code deployment. No engineering bottleneck. Just fast, measurable search improvement.

Conclusion

Search tuning should not require a code deployment. It should be a configuration change, tested, measured, and applied with confidence. The OpenSearch Search Relevance plugin makes this possible by treating search configurations as first-class objects: versioned, testable, and instantly deployable.

If your team is still tuning search through code changes and manual testing, you are moving too slowly. The configuration-driven approach gives you speed, safety, and measurability. Start with one experiment. Prove the value. Then scale it across your organization.

Your search quality will improve. Your engineers will thank you. And your users will finally find what they are looking for.

About the Author:
I'm Prithvi S, Staff Software Engineer at Cloudera and open-source enthusiast. I maintain the dashboards-search-relevance plugin for OpenSearch and contribute to the search-relevance backend. Follow my work on GitHub: https://github.com/iprithv
