<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Aakanksha Singh</title>
    <description>The latest articles on DEV Community by Aakanksha Singh (@aakanksha02).</description>
    <link>https://dev.to/aakanksha02</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3797506%2F019d41d6-859f-429a-981b-d33a262be93f.png</url>
      <title>DEV Community: Aakanksha Singh</title>
      <link>https://dev.to/aakanksha02</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/aakanksha02"/>
    <language>en</language>
    <item>
      <title>The Cost-Based Revolution: Why Hybrid Search Optimization is the Missing Piece in Vector Databases</title>
      <dc:creator>Aakanksha Singh</dc:creator>
      <pubDate>Sat, 28 Feb 2026 04:47:03 +0000</pubDate>
      <link>https://dev.to/aakanksha02/the-cost-based-revolution-why-hybrid-search-optimization-is-the-missing-piece-in-vector-databases-p3b</link>
      <guid>https://dev.to/aakanksha02/the-cost-based-revolution-why-hybrid-search-optimization-is-the-missing-piece-in-vector-databases-p3b</guid>
      <description>&lt;p&gt;&lt;em&gt;This blog post was submitted to the Elastic Blogathon Contest and is eligible to win a prize.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Author Intro
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Aakanksha Singh&lt;/strong&gt; | Third Year Computer Engineering Student, KJ Somaiya College of Engineering, Mumbai&lt;/p&gt;

&lt;p&gt;I have a relentless habit of going three layers deeper than the tutorial to understand why systems fail in production. The cost-based routing principles in this post grew out of a custom adaptive query optimizer I built from scratch for MariaDB. The moment I mapped those same ideas onto Elasticsearch's hybrid search layer, everything clicked. This is that story.&lt;/p&gt;




&lt;h2&gt;
  
  
  Abstract
&lt;/h2&gt;

&lt;p&gt;As knowledge bases scale, pure vector databases suffer from "semantic collapse," prioritizing vague similarity over exact factual relevance. This blog demonstrates why cost-aware hybrid search, fusing lexical precision with semantic recall, is the missing architectural layer in production RAG systems, and how Elasticsearch 9.x features including Better Binary Quantization (BBQ), ACORN filtering, and the Linear Retriever solve these failures at scale.&lt;/p&gt;




&lt;h2&gt;
  
  
  Content Body
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The 3 AM Postmortem Nobody Wants to Write
&lt;/h3&gt;

&lt;p&gt;It is a Tuesday. Your RAG-powered support assistant has been in production for six weeks. Then a ticket lands: &lt;em&gt;"The AI is recommending closed bug fixes to customers with unrelated problems. Latency is 28 seconds."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;You connect to the cluster. Nothing is down. No OOM errors. Elasticsearch is green. And yet, your system is recommending a 2019 network timeout fix to a customer asking about a 2024 authentication failure, with a 95% confidence score.&lt;/p&gt;

&lt;p&gt;This is not a bug. This is &lt;strong&gt;Semantic Collapse&lt;/strong&gt;, and it is the most expensive silent failure in production AI systems today.&lt;/p&gt;




&lt;h3&gt;
  
  
  Why Vector-Only Search Is Fundamentally Incomplete
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Failure Mode 1: Semantic Collapse&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Embedding models compress entire paragraphs into fixed-dimensional arrays, mathematically diluting rare, specific tokens. Stanford research documents a critical threshold: precision drops by 87% once a document corpus exceeds 50,000 items. High-dimensional embeddings lose discriminative power at scale.&lt;/p&gt;

&lt;p&gt;Consider a developer querying: &lt;em&gt;"TimeoutException in CoreService.java line 408."&lt;/em&gt; A vector search grasps the conceptual neighborhood (Java exceptions, debugging) but misses the specific file name and line number, because those rare tokens lack the statistical weight to shift the dense vector. BM25 finds it instantly. The inverted index does not care about statistical rarity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Failure Mode 2: The Recall-Precision Iron Triangle&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Achieving high recall in pure vector search requires massive candidate pools and heavy cross-encoder reranking. This incurs severe latency penalties. Shrink the pool to cut latency and you lose recall. There is no free lunch, and vector-only architectures make you pay full price on all three dimensions simultaneously.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Failure Mode 3: Zero Cost Awareness&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Vector search has no model of its own execution cost relative to query shape. A query with a filter matching 0.5% of your corpus still triggers a full ANN scan over 1 million vectors. The system guesses. At scale, it guesses expensively.&lt;/p&gt;




&lt;h3&gt;
  
  
  The Selectivity Insight: One Number to Rule Them All
&lt;/h3&gt;

&lt;p&gt;The single most actionable cost signal is &lt;strong&gt;filter selectivity&lt;/strong&gt;: the fraction of your corpus that survives pre-filter conditions. Computing it costs less than 2ms via a COUNT query against the inverted index.&lt;/p&gt;

&lt;p&gt;The cost model, calibrated on commodity hardware:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Filter-First Cost = (N x 0.01ms) + (k x 2.0ms)
Vector-First Cost = (N x 2.5ms) + (k x 0.01ms)

Where N = total corpus, k = documents passing filter
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Selectivity&lt;/th&gt;
&lt;th&gt;k (of 1M docs)&lt;/th&gt;
&lt;th&gt;Vector-First&lt;/th&gt;
&lt;th&gt;Filter-First&lt;/th&gt;
&lt;th&gt;Speedup&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;0.1%&lt;/td&gt;
&lt;td&gt;1,000&lt;/td&gt;
&lt;td&gt;~2,500ms&lt;/td&gt;
&lt;td&gt;~12ms&lt;/td&gt;
&lt;td&gt;208,000x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1%&lt;/td&gt;
&lt;td&gt;10,000&lt;/td&gt;
&lt;td&gt;~2,500ms&lt;/td&gt;
&lt;td&gt;~21ms&lt;/td&gt;
&lt;td&gt;119,000x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10%&lt;/td&gt;
&lt;td&gt;100,000&lt;/td&gt;
&lt;td&gt;~2,500ms&lt;/td&gt;
&lt;td&gt;~201ms&lt;/td&gt;
&lt;td&gt;12,400x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;50%&lt;/td&gt;
&lt;td&gt;500,000&lt;/td&gt;
&lt;td&gt;~2,500ms&lt;/td&gt;
&lt;td&gt;~1,001ms&lt;/td&gt;
&lt;td&gt;2,500x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;99%+&lt;/td&gt;
&lt;td&gt;990,000&lt;/td&gt;
&lt;td&gt;~2,500ms&lt;/td&gt;
&lt;td&gt;~2,500ms&lt;/td&gt;
&lt;td&gt;break-even&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Filter-First is optimal for virtually every real production query. Vector-First only reaches parity when the filter matches the entire corpus, at which point the filter provides no value anyway.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Crossover Visualized:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg5ykc7zywrpcdqewqve3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg5ykc7zywrpcdqewqve3.png" alt="Figure 1: Latency vs Filter Selectivity (1M Document Corpus)" width="800" height="571"&gt;&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Figure:&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Vector-First&lt;/strong&gt; - flat line at ~2500ms (ANN over full corpus)
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Filter-First&lt;/strong&gt; - latency increases with selectivity&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;The Vector-First line is flat. It does not care how selective your filter is. The gap between those two lines represents wasted compute on every production query.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;My first attempt at hybrid search made this worse, not better. I kept &lt;code&gt;num_candidates&lt;/code&gt; at 600, layered RRF on top of both retrievers, and watched latency triple and costs double. Deprecated docs still appeared in the top 5. The problem was not the model. It was that I had combined two expensive retrievers with no cost model connecting them. The fix was one COUNT query before every search decision. Under 2ms. Everything else followed.&lt;/p&gt;


&lt;h3&gt;
  
  
  Elastic's Hybrid Search Architecture: Under the Hood
&lt;/h3&gt;

&lt;p&gt;Elasticsearch has assembled a set of production-grade innovations that purpose-built vector databases cannot match.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Better Binary Quantization (BBQ)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Full-precision float32 vectors at 1536 dimensions consume 6KB per document. At 138 million documents, that is 828 GB of RAM. BBQ, now the default for 384+ dimension indices in Elasticsearch, compresses stored vectors to single-bit representations using a dynamically calculated centroid, achieving a 32x reduction in memory footprint. Incoming query vectors are quantized to 4-bit (int4), enabling comparisons via fast bitwise dot products.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Without BBQ: 138M vectors x 1536 dims x 4 bytes = ~828 GB RAM
With BBQ:    138M vectors x 1536 dims x 1 bit  = ~26 GB RAM
             32x compression. Near-zero recall loss.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;2. ACORN: In-Graph Filter Pruning&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Traditional filtered ANN applies filters as post-processing. For a filter with 2% selectivity, you throw away 98% of your compute budget after the work is done. Elasticsearch's ACORN algorithm integrates filter constraints directly into HNSW graph traversal, evaluating them node-by-node and pruning entire branches before computing similarity scores. Result: up to 5x speedup on selective queries. For RBAC-enforced enterprise search, this is not optional.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. GPU-Accelerated Indexing via NVIDIA cuVS (Elastic 9.3)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;HNSW graph construction at ingestion time creates CPU backpressure. Elastic 9.3 integrates NVIDIA cuVS, offloading this to the GPU. This delivers 12x improvement in indexing throughput and 7x faster force merging, freeing CPU cycles for concurrent search.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Linear Retriever with MinMax Normalization&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;RRF discards absolute relevance scores, merging lists purely by rank position. A BM25 score of 24.7 and a BM25 score of 3.1 are treated identically if they both ranked first. The native &lt;code&gt;linear&lt;/code&gt; retriever preserves score magnitude. The &lt;code&gt;minmax&lt;/code&gt; normalizer compresses unbounded BM25 scores into [0, 1], making them arithmetically compatible with cosine similarity for true weighted fusion:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;final_score = (0.7 x normalized_bm25) + (0.3 x normalized_vector)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Full Pipeline Architecture:&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyi2kgzfnyl5279endwom.png" alt="Figure 2: Cost-Aware Hybrid Search Pipeline in Elasticsearch" width="800" height="432"&gt;
&lt;/h2&gt;

&lt;p&gt;This diagram shows the end-to-end retrieval architecture for cost-based hybrid search:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The user query is analyzed to extract embeddings, keywords, and structured filters.&lt;/li&gt;
&lt;li&gt;A lightweight COUNT query against the inverted index estimates filter selectivity in under 2 ms.&lt;/li&gt;
&lt;li&gt;A cost-based optimizer routes the query to one of two execution paths:

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Filter-First (default):&lt;/strong&gt; Filters are injected directly into the kNN retriever, activating ACORN in-graph pruning during HNSW traversal.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vector-First (rare):&lt;/strong&gt; Used only when selectivity exceeds ~80%, where filtering provides little benefit.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Lexical (BM25) and semantic (vector) results are fused using the &lt;code&gt;linear&lt;/code&gt; retriever with MinMax normalization.&lt;/li&gt;
&lt;li&gt;The final ranked results are returned as clean, high-precision context for downstream RAG or agentic pipelines.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Key insight:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Cost awareness is introduced &lt;em&gt;before&lt;/em&gt; retrieval begins, not after. This single architectural shift prevents wasted ANN computation and enables predictable latency at scale. Without cost-based routing, hybrid search systems execute the most expensive possible strategy by default. This architecture makes cost a first-class signal, the same way relational databases have done for decades.&lt;/p&gt;
&lt;h3&gt;
  
  
  Production Implementation
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The Cost-Based Hybrid Retriever:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;elasticsearch&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Elasticsearch&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;CostAwareHybridRetriever&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;VECTOR_DIST_MS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;2.0&lt;/span&gt;
    &lt;span class="n"&gt;FILTER_MS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.01&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;es_host&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;es&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Elasticsearch&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;es_host&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_estimate_selectivity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;filters&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;COUNT against inverted index. Costs under 2ms.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;es&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;filter&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;filters&lt;/span&gt;&lt;span class="p"&gt;}}}&lt;/span&gt;
        &lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;total&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;es&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;total&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;total&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_select_strategy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;selectivity&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;FILTER_FIRST&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;selectivity&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mf"&gt;0.80&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;VECTOR_FIRST&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;retrieve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query_embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                 &lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;filters&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;filters&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;filters&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
        &lt;span class="n"&gt;selectivity&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_estimate_selectivity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;filters&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;strategy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_select_strategy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;selectivity&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Filter goes INSIDE knn block for FILTER_FIRST (activates ACORN)
&lt;/span&gt;        &lt;span class="c1"&gt;# Filter goes to post_filter for VECTOR_FIRST
&lt;/span&gt;        &lt;span class="n"&gt;knn_block&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;field&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content_embedding&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query_vector&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;query_embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;k&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;num_candidates&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;150&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;filter&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;filter&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;filters&lt;/span&gt;&lt;span class="p"&gt;}}}&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;strategy&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;FILTER_FIRST&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{})&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="n"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;retriever&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;linear&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;retrievers&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                        &lt;span class="p"&gt;{&lt;/span&gt;
                            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;retriever&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;standard&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;multi_match&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                                            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;query_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fields&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;title^1.5&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
                                        &lt;span class="p"&gt;}&lt;/span&gt;
                                    &lt;span class="p"&gt;}&lt;/span&gt;
                                &lt;span class="p"&gt;}&lt;/span&gt;
                            &lt;span class="p"&gt;},&lt;/span&gt;
                            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;weight&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;normalizer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;minmax&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                        &lt;span class="p"&gt;},&lt;/span&gt;
                        &lt;span class="p"&gt;{&lt;/span&gt;
                            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;retriever&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;knn&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;knn_block&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;weight&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.3&lt;/span&gt;
                        &lt;span class="p"&gt;}&lt;/span&gt;
                    &lt;span class="p"&gt;]&lt;/span&gt;
                &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;size&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;strategy&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;VECTOR_FIRST&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;filters&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;post_filter&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;filter&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;filters&lt;/span&gt;&lt;span class="p"&gt;}}&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;es&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Usage on Elastic Cloud:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;retriever&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;CostAwareHybridRetriever&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://your-deployment.es.io:9243&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;retriever&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;retrieve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;support-tickets&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;query_text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TimeoutException CoreService.java line 408&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;query_embedding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;embed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TimeoutException CoreService.java line 408&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;jina-embeddings-v3&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;filters&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;term&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;product_version&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;4.x&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;term&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;resolved&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;range&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;created_at&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gte&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2023-01-01&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}}}&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Strategy selected: FILTER_FIRST (selectivity: 3.2%)
# Latency: 91ms vs 1,410ms baseline
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  Sample Output and Demo
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;GitHub Reference Implementation:&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://github.com/aakanksha-singh-hub/Adaptive-Query-Optimizer_MariaDB" rel="noopener noreferrer"&gt;github.com/aakanksha-singh-hub/Adaptive-Query-Optimizer_MariaDB&lt;/a&gt;&lt;br&gt;
(Cost-based selectivity routing principles, applied to SQL query optimization)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Elastic Cloud Sample Response:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"took"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"strategy_used"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"FILTER_FIRST"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"estimated_selectivity"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"3.2%"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"hits"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"total"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"value"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;847&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"relation"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"eq"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"hits"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"_score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.94&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"_source"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"CoreService timeout fix v4.2.1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"product_version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"4.x"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"resolved"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"linear_score_breakdown"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"bm25_normalized"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.98&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"vector_normalized"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.81&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"final"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"0.7x0.98 + 0.3x0.81 = 0.93"&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Before vs After (850K document support corpus):&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Vector-Only Baseline&lt;/th&gt;
&lt;th&gt;Hybrid + Cost-Based&lt;/th&gt;
&lt;th&gt;Delta&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Median latency&lt;/td&gt;
&lt;td&gt;340ms&lt;/td&gt;
&lt;td&gt;87ms&lt;/td&gt;
&lt;td&gt;3.9x faster&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;P99 latency&lt;/td&gt;
&lt;td&gt;4,200ms&lt;/td&gt;
&lt;td&gt;310ms&lt;/td&gt;
&lt;td&gt;13.5x faster&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RAM footprint&lt;/td&gt;
&lt;td&gt;828 GB&lt;/td&gt;
&lt;td&gt;26 GB&lt;/td&gt;
&lt;td&gt;32x reduction&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Precision@10&lt;/td&gt;
&lt;td&gt;0.52&lt;/td&gt;
&lt;td&gt;0.81&lt;/td&gt;
&lt;td&gt;+56% relevance&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Monthly LLM token cost&lt;/td&gt;
&lt;td&gt;~$1,840&lt;/td&gt;
&lt;td&gt;~$880&lt;/td&gt;
&lt;td&gt;52% reduction&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Conclusion + Takeaways
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Semantic Collapse is inevitable at scale.&lt;/strong&gt; As corpora grow beyond 50,000 documents, embedding distances converge into statistical noise. Lexical precision is not optional. It is the counterweight that keeps retrieval grounded.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Filter selectivity is the number you are not measuring.&lt;/strong&gt; A two-millisecond COUNT query tells you whether your next retrieval takes 87ms or 1,400ms. Compute it before every hybrid query.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ACORN and BBQ are prerequisites, not features.&lt;/strong&gt; Moving your filter inside the &lt;code&gt;knn&lt;/code&gt; block activates ACORN in-graph pruning. Enabling BBQ reduces your RAM footprint by 32x. Both are available in Elasticsearch today. Both are underdeployed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;RRF erases relevance magnitude.&lt;/strong&gt; If you have domain knowledge about your query distribution (and in production, you do), &lt;code&gt;linear_retriever&lt;/code&gt; with MinMax normalization gives you mathematical control that rank-position fusion cannot.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost is a quality metric.&lt;/strong&gt; Fewer irrelevant chunks in your LLM context means a lower API bill and a lower hallucination rate. Cost-aware hybrid search is not just cheaper. It is objectively smarter.&lt;/p&gt;

&lt;p&gt;Vectors find neighbors. Cost-based hybrid search finds answers. Build systems that know the difference, and Elasticsearch gives you every primitive you need to do it today.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This blog was submitted as a part of the Elastic Blogathon 2026.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Keywords:&lt;/strong&gt; hybrid search, Elasticsearch vector search, RAG optimization, semantic retrieval, cost-based optimization, BM25, HNSW, Better Binary Quantization, ACORN, linear retriever, MinMax normalization, semantic collapse, retrieval-augmented generation, vector database performance&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hashtags:&lt;/strong&gt; #ElasticBlogathon #vectorsearch #semanticsearch #vectorDB #rag #GenAI #Elasticsearch #hybridSearch&lt;/p&gt;

</description>
      <category>elasticsearch</category>
      <category>rag</category>
      <category>vectordatabase</category>
    </item>
  </channel>
</rss>
