<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Sergey Nikolaev</title>
    <description>The latest articles on DEV Community by Sergey Nikolaev (@sanikolaev).</description>
    <link>https://dev.to/sanikolaev</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F363352%2F6f7a2da7-fa00-47f5-aaca-a007b1d43350.jpeg</url>
      <title>DEV Community: Sergey Nikolaev</title>
      <link>https://dev.to/sanikolaev</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/sanikolaev"/>
    <language>en</language>
    <item>
      <title>14× faster embeddings: how we rebuilt the ONNX path in Manticore</title>
      <dc:creator>Sergey Nikolaev</dc:creator>
      <pubDate>Thu, 25 Jun 2026 11:50:04 +0000</pubDate>
      <link>https://dev.to/sanikolaev/14-faster-embeddings-how-we-rebuilt-the-onnx-path-in-manticore-4eom</link>
      <guid>https://dev.to/sanikolaev/14-faster-embeddings-how-we-rebuilt-the-onnx-path-in-manticore-4eom</guid>
      <description>&lt;p&gt;When we shipped &lt;a href="https://manticoresearch.com/blog/auto-embeddings/" rel="noopener noreferrer"&gt;Auto Embeddings&lt;/a&gt;, the feature that turns any text column into a vector automatically with no separate model service to run, the most common piece of feedback was about speed. The previous path went through SentenceTransformers on top of &lt;a href="https://github.com/huggingface/candle" rel="noopener noreferrer"&gt;Candle&lt;/a&gt;, Hugging Face's pure-Rust ML inference runtime, and it left a lot of CPU on the floor: most workloads sat at roughly 5–11 docs/sec no matter how we fed them, and concurrent calls serialised on a single model session.&lt;/p&gt;

&lt;p&gt;So we spent a few weeks rebuilding how Manticore runs ONNX models. The new ONNX Runtime backend shipped in &lt;a href="https://manticoresearch.com/blog/manticore-search-27-1-5/" rel="noopener noreferrer"&gt;Manticore Search 27.1.5&lt;/a&gt;. ONNX (Open Neural Network Exchange) is the portable model format in which most of the popular open-source embedding models (MiniLM, BGE, E5, and friends) are already published. The result is a backend that's &lt;strong&gt;~14× faster on average than the previous SentenceTransformers/Candle path&lt;/strong&gt; on the same hardware (an ordinary, inexpensive 16-core / 32-thread server), with the same model and the same weights, averaged over the full &lt;code&gt;threads × batch&lt;/code&gt; workload grid. That advantage holds whether you run 1 client thread or 32: the old path stayed in the 5–11 docs/sec range across the entire grid, while the new one lives in the 70–230 docs/sec band.&lt;/p&gt;

&lt;p&gt;This post is the engineering log: what we tried, what surprised us, what we threw away, and what the final design looks like.&lt;/p&gt;

&lt;h2&gt;TL;DR&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;~14× faster on average than the previous SentenceTransformers/Candle path&lt;/strong&gt;, averaged across the full &lt;code&gt;threads × batch&lt;/code&gt; workload grid (1 / 2 / 4 / 8 / 16 / 32 threads × batch sizes 1…128) on the same box (16 cores / 32 threads), same model, same weights.&lt;/li&gt;
&lt;li&gt;Released in &lt;a href="https://manticoresearch.com/blog/manticore-search-27-1-5/" rel="noopener noreferrer"&gt;Manticore Search 27.1.5&lt;/a&gt;, the new ONNX path is now the default fast path for any HuggingFace model that ships an &lt;code&gt;.onnx&lt;/code&gt; file.&lt;/li&gt;
&lt;li&gt;On &lt;code&gt;all-MiniLM-L12-v2&lt;/code&gt;, the old Candle path sat at &lt;strong&gt;5–11 docs/sec&lt;/strong&gt; across every configuration we tried. The new ONNX path lands in the &lt;strong&gt;70–230 docs/sec&lt;/strong&gt; range — the &lt;strong&gt;same ~14× margin holds whether you run 1 client thread or 32&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Single-insert latency on our test box: &lt;strong&gt;~14 ms&lt;/strong&gt; with a single client, &lt;strong&gt;~56 ms&lt;/strong&gt; under 8-way concurrent load — both well below the 200+ ms Candle was hitting.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Want maximum bulk ingest throughput?&lt;/strong&gt; Use a &lt;strong&gt;high batch size&lt;/strong&gt; (32–128) on a &lt;strong&gt;single client thread&lt;/strong&gt;. The new backend parallelises inside the call, so client-side fan-out just piles coordination overhead on top — peak on our box was &lt;strong&gt;233 docs/sec at 1 thread + batch=64&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;The two changes that mattered most: turning &lt;strong&gt;&lt;code&gt;intra_op_spinning&lt;/code&gt; off&lt;/strong&gt;, and giving up on batching documents inside the worker.&lt;/li&gt;
&lt;li&gt;No user-facing API changes. A table that already points at an ONNX-capable &lt;code&gt;MODEL_NAME&lt;/code&gt; picks up the new path automatically. Switching an existing table to a different model isn't a one-liner — Manticore doesn't allow altering &lt;code&gt;MODEL_NAME&lt;/code&gt; on a &lt;code&gt;FLOAT_VECTOR&lt;/code&gt; field in place — but you don't have to recreate the whole table either: you can add a new column with the new model alongside, rebuild its embeddings, and drop the old one.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Why this matters&lt;/h2&gt;

&lt;p&gt;With auto-embeddings, the database itself runs the model on every &lt;code&gt;INSERT&lt;/code&gt;. That means embedding speed &lt;em&gt;is&lt;/em&gt; INSERT speed — your ingest throughput is whatever the embedding step can sustain.&lt;/p&gt;

&lt;p&gt;The old SentenceTransformers/Candle path left performance on the table. Concurrency hit lock contention, batched calls plateaued because of padding overhead, and between calls the runtime parked threads in ways that prevented the next call from picking up where the previous one left off. The headline symptom was simple: &lt;code&gt;top&lt;/code&gt; would show the box well under full load no matter what you threw at it. The whole sweep — single-row INSERTs, 128-row bulk INSERTs, one client thread, thirty-two client threads — sat at &lt;strong&gt;5–11 docs/sec&lt;/strong&gt;, because nothing about how you fed it could buy you more CPU.&lt;/p&gt;

&lt;p&gt;The new ONNX path raises the floor by an order of magnitude &lt;em&gt;and&lt;/em&gt; gives users meaningful performance tuning options. A single-thread, single-row INSERT now lands at &lt;strong&gt;72 docs/sec&lt;/strong&gt;, already ~7× the old Candle ceiling. Add concurrency or batch size and it climbs into the &lt;strong&gt;130–230 docs/sec&lt;/strong&gt; range, with the top of the grid at &lt;strong&gt;233 docs/sec on a single client thread at &lt;code&gt;--batch-size=64&lt;/code&gt;&lt;/strong&gt;. Averaged across the whole &lt;code&gt;threads × batch&lt;/code&gt; matrix, the new path is &lt;strong&gt;~14× the old one&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;Why ONNX, and not Candle&lt;/h2&gt;

&lt;p&gt;Manticore's embeddings library has supported a few backends for a while. The Candle path is great for correctness and easy to ship. But for production inference of small encoder models like the MiniLM and BGE family, ONNX Runtime is hard to beat:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ONNX Runtime (or &lt;strong&gt;ORT&lt;/strong&gt;, Microsoft's official, hand-tuned C++ inference engine for ONNX models) does graph fusion, constant folding, and kernel autotuning.&lt;/li&gt;
&lt;li&gt;Most of the popular embedding models on HuggingFace already publish a pre-fused &lt;code&gt;model.onnx&lt;/code&gt; in their &lt;code&gt;onnx/&lt;/code&gt; directory. The on-disk file is already in the shape ORT wants.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;On the same &lt;code&gt;all-MiniLM-L12-v2&lt;/code&gt; weights, on CPU, the ONNX path is a noticeable step up over the Candle path. Same quality, much less per-document work.&lt;/p&gt;

&lt;p&gt;The ORT session is created with a small set of opinions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;ort&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;session&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;Session&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;
    &lt;span class="nf"&gt;.with_optimization_level&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;GraphOptimizationLevel&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Level3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;
    &lt;span class="nf"&gt;.with_intra_threads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;            &lt;span class="c1"&gt;// let ORT pick (= all cores)&lt;/span&gt;
    &lt;span class="nf"&gt;.with_intra_op_spinning&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;false&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;    &lt;span class="c1"&gt;// do NOT busy-wait between calls&lt;/span&gt;
    &lt;span class="nf"&gt;.with_flush_to_zero&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;             &lt;span class="c1"&gt;// kill denormals on attention softmax&lt;/span&gt;
    &lt;span class="nf"&gt;.with_approximate_gelu&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;          &lt;span class="c1"&gt;// ~10% faster activation, no quality loss&lt;/span&gt;
    &lt;span class="nf"&gt;.commit_from_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;onnx_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Most of these are uncontroversial, "of course you turn that on" knobs. One is not: &lt;code&gt;intra_op_spinning(false)&lt;/code&gt;. We'll come back to it — it's the single biggest win in the whole branch, and it's not really an ORT setting so much as a load-shape decision.&lt;/p&gt;

&lt;h2&gt;The concurrency model: the part most readers will find new&lt;/h2&gt;

&lt;p&gt;If you give a Rust developer "make ONNX go fast" with no other constraints, they reach for one of two patterns. We tried both. They are both wrong for this workload.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pattern 1: a single shared &lt;code&gt;Session&lt;/code&gt; behind a &lt;code&gt;Mutex&lt;/code&gt;&lt;/strong&gt; (a &lt;code&gt;Mutex&lt;/code&gt; is a lock that lets only one thread touch the session at a time). Easy to reason about, easy to get right. Throughput collapses under concurrency because every caller serialises on the lock. Fine for a CLI tool, awful for a database serving many concurrent INSERTs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pattern 2: a session pool, one &lt;code&gt;Session&lt;/code&gt; per CPU.&lt;/strong&gt; No more lock contention, but cold-start time multiplies, RAM use multiplies, and small inputs pay a dispatch cost just to land on a session. We had a working version of this in a development branch and it never quite delivered.&lt;/p&gt;

&lt;p&gt;The thing that unlocked the design is something most Rust ONNX wrappers get wrong: &lt;strong&gt;on Linux and macOS, ORT's C &lt;code&gt;Run()&lt;/code&gt; API is thread-safe.&lt;/strong&gt; You can share one &lt;code&gt;Session&lt;/code&gt; across many concurrent callers without any locking. The C++ side already serialises what needs serialising; the Rust API just hides it behind borrow-checker rules that do not match what the underlying library actually allows.&lt;/p&gt;

&lt;p&gt;So we wrap the session in a small platform-aware type:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="nd"&gt;#[cfg(not(target_os&lt;/span&gt; &lt;span class="nd"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"windows"&lt;/span&gt;&lt;span class="nd"&gt;))]&lt;/span&gt;
&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;SessionWrapper&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;inner&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nn"&gt;std&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;cell&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;UnsafeCell&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nn"&gt;ort&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;session&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Session&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nd"&gt;#[cfg(not(target_os&lt;/span&gt; &lt;span class="nd"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"windows"&lt;/span&gt;&lt;span class="nd"&gt;))]&lt;/span&gt;
&lt;span class="k"&gt;unsafe&lt;/span&gt; &lt;span class="k"&gt;impl&lt;/span&gt; &lt;span class="nb"&gt;Sync&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;SessionWrapper&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
&lt;span class="nd"&gt;#[cfg(not(target_os&lt;/span&gt; &lt;span class="nd"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"windows"&lt;/span&gt;&lt;span class="nd"&gt;))]&lt;/span&gt;
&lt;span class="k"&gt;unsafe&lt;/span&gt; &lt;span class="k"&gt;impl&lt;/span&gt; &lt;span class="nb"&gt;Send&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;SessionWrapper&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

&lt;span class="k"&gt;impl&lt;/span&gt; &lt;span class="n"&gt;SessionWrapper&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="n"&gt;with_session&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;R&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;impl&lt;/span&gt; &lt;span class="nf"&gt;FnOnce&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="n"&gt;Session&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;R&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;R&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nf"&gt;f&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;unsafe&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="py"&gt;.inner&lt;/span&gt;&lt;span class="nf"&gt;.get&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Yes, this is &lt;code&gt;unsafe&lt;/code&gt;. We're taking the borrow checker out of the loop because the underlying library is documented to be safe under the access pattern we're using. It's a deliberate &lt;code&gt;unsafe&lt;/code&gt; with a one-line justification, not a foot-gun.&lt;/p&gt;

&lt;p&gt;On Windows, ORT's threading model has known issues, so we serialise &lt;code&gt;Run()&lt;/code&gt; with a &lt;code&gt;Mutex&lt;/code&gt;. Importantly, the lock is held &lt;em&gt;for the entire closure&lt;/em&gt;, not just the call to &lt;code&gt;run()&lt;/code&gt; — that's what fixed the race we saw on Windows where one thread's &lt;code&gt;SessionOutputs&lt;/code&gt; were still being read while another thread had already started a new &lt;code&gt;run()&lt;/code&gt;. Closure-scoped locking, not call-scoped.&lt;/p&gt;

&lt;h2&gt;Adaptive parallelism: the wrong turns we took&lt;/h2&gt;

&lt;p&gt;This is the part of the work that took the longest, because every textbook says "to make ONNX fast, batch your inputs". So our first attempts followed the textbook.&lt;/p&gt;

&lt;p&gt;We tokenized chunks of 8, 16, 32 documents at a time, padded them to &lt;code&gt;max_len&lt;/code&gt;, and ran a single forward pass per worker thread. The throughput numbers came back lower than processing the same texts one-by-one through the same session. We ran it again. Same result. We spent a while trying to disprove it before accepting it. The reverted commit &lt;code&gt;980b24b "Revert: perf(model): batch inference in worker threads"&lt;/code&gt; is the moment we stopped fighting and rebuilt around what the profiler kept telling us.&lt;/p&gt;

&lt;p&gt;Two things were behind the surprise.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The padding tax.&lt;/strong&gt; A batch of mixed-length texts pads every row up to the longest row. The model then does work proportional to &lt;code&gt;batch_size * max_len * hidden_dim&lt;/code&gt;, regardless of how much real content is in the batch. Real text inputs are highly variable in length: a typical batch of 8 random sentences might have one 60-token outlier and seven 8-token rows. The model spends most of its cycles multiplying padding tokens against attention weights. With one-doc batches, the model only does work proportional to that doc's actual token count. Per-document, "no batching" is cheaper than "batching" once the variance in input length is realistic.&lt;/p&gt;
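&lt;p&gt;To put a number on that example, here is a minimal sketch of the arithmetic. The helper names &lt;code&gt;padded_slots&lt;/code&gt; and &lt;code&gt;unpadded_slots&lt;/code&gt; are hypothetical and are not part of Manticore or ORT. The sketch counts token slots only; self-attention cost grows faster than linearly with sequence length, so the real gap is probably larger than this estimate.&lt;/p&gt;

```rust
// Hypothetical helpers: only the arithmetic comes from the padding example above.

/// Token slots the model processes when every row is padded to the longest row.
fn padded_slots(lens: [usize; 8]) -> usize {
    let max_len = lens.iter().copied().max().unwrap_or(0);
    max_len * lens.len()
}

/// Token slots processed when each document runs on its own, unpadded.
fn unpadded_slots(lens: [usize; 8]) -> usize {
    lens.iter().sum()
}

fn main() {
    // One 60-token outlier and seven 8-token rows.
    let lens = [60, 8, 8, 8, 8, 8, 8, 8];
    let padded = padded_slots(lens); // 8 rows * 60 tokens
    let real = unpadded_slots(lens); // 60 + 7 * 8
    let wasted_pct = 100 * (padded - real) / padded; // integer percent
    println!("padded={padded} real={real} wasted={wasted_pct}%");
}
```

&lt;p&gt;For this batch the padded forward pass touches 480 token slots against 116 real tokens, so roughly three quarters of the work is padding.&lt;/p&gt;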

&lt;p&gt;&lt;strong&gt;Spinning.&lt;/strong&gt; ORT's intra-op thread pool defaults to &lt;em&gt;spinning&lt;/em&gt; between dispatches: threads burn CPU in a tight loop waiting for the next chunk of work. With one big batch per session call this is invisible, because the thread is always busy with real work. With many concurrent small calls, it becomes a disaster: every worker's intra-op pool is pinned at 100% CPU between calls, and there's no CPU left for anything else. We saw exactly this pattern in &lt;code&gt;top&lt;/code&gt;: every core at 100%, yet throughput &lt;em&gt;lower&lt;/em&gt; than with spinning off. This sounds wrong until you remember the rest of the system needs CPU time too: the tokenizer, the HNSW build, the rest of &lt;code&gt;searchd&lt;/code&gt;. Setting &lt;code&gt;with_intra_op_spinning(false)&lt;/code&gt; was a one-line change that immediately raised throughput and dropped CPU usage at the same time.&lt;/p&gt;

&lt;p&gt;So the final shape is the opposite of the textbook recipe:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;One shared session&lt;/strong&gt;, no pool.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;One document per inference call&lt;/strong&gt;, no batching inside the worker.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Many concurrent callers&lt;/strong&gt;, scaled to CPU count.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No spinning&lt;/strong&gt; between calls — yield the CPU like a polite citizen.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;predict_pipelined&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;texts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;Result&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nb"&gt;Vec&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nb"&gt;Vec&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nb"&gt;f32&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;bs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;batch_size&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="c1"&gt;// Small input — single tokenize + infer, no thread overhead.&lt;/span&gt;
    &lt;span class="c1"&gt;// This is the path a 1-doc INSERT takes.&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;texts&lt;/span&gt;&lt;span class="nf"&gt;.len&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="n"&gt;bs&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;Self&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;tokenize_and_infer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="py"&gt;.session&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="py"&gt;.tokenizer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;texts&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;// Large input — split across workers, each running 1-doc-at-a-time&lt;/span&gt;
    &lt;span class="c1"&gt;// through the SHARED session. This deliberately mimics the&lt;/span&gt;
    &lt;span class="c1"&gt;// many-concurrent-callers pattern that ORT is happiest with.&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;num_workers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;texts&lt;/span&gt;&lt;span class="nf"&gt;.len&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;bs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="nf"&gt;.min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;available_cpus&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;&lt;span class="nf"&gt;.max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;docs_per_worker&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;texts&lt;/span&gt;&lt;span class="nf"&gt;.len&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="nf"&gt;.div_ceil&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;num_workers&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="nn"&gt;std&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;thread&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;scope&lt;/span&gt;&lt;span class="p"&gt;(|&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;worker_texts&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;texts&lt;/span&gt;&lt;span class="nf"&gt;.chunks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;docs_per_worker&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="nf"&gt;.spawn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;move&lt;/span&gt; &lt;span class="p"&gt;||&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;worker_texts&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="k"&gt;Self&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;tokenize_and_infer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                             &lt;span class="nn"&gt;std&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;slice&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;from_ref&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
                &lt;span class="p"&gt;}&lt;/span&gt;
                &lt;span class="nf"&gt;Ok&lt;/span&gt;&lt;span class="p"&gt;(())&lt;/span&gt;
            &lt;span class="p"&gt;});&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="c1"&gt;// ...&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The two-branch design is on purpose. A 1-row INSERT comes in with &lt;code&gt;texts.len() == 1&lt;/code&gt;, which is &lt;code&gt;&amp;lt;= bs&lt;/code&gt;, so it takes the fast path with &lt;strong&gt;zero thread spawning, zero channel sends, zero coordination overhead&lt;/strong&gt;. A bulk REPLACE INTO with thousands of rows takes the parallel branch and gets the throughput benefit. The cheap case stays cheap, the expensive case stays parallel.&lt;/p&gt;
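&lt;p&gt;The worker-split arithmetic in the parallel branch can be tried in isolation. This is a sketch, not the library code: &lt;code&gt;plan_workers&lt;/code&gt; is a hypothetical name, and the batch threshold of 8 and the 32 CPUs are illustrative assumptions, since the post does not state what &lt;code&gt;batch_size()&lt;/code&gt; or &lt;code&gt;available_cpus()&lt;/code&gt; return.&lt;/p&gt;

```rust
/// Mirrors the worker-split arithmetic shown in predict_pipelined.
/// bs and cpus are passed in explicitly; assumes bs is at least 1.
fn plan_workers(total_docs: usize, bs: usize, cpus: usize) -> (usize, usize) {
    // One worker per bs documents, capped at the CPU count, never fewer than one.
    let num_workers = (total_docs / bs).min(cpus).max(1);
    // Round up so that every document lands in some chunk.
    let docs_per_worker = total_docs.div_ceil(num_workers);
    (num_workers, docs_per_worker)
}

fn main() {
    // Illustrative values only: bs = 8 and 32 CPUs are assumptions.
    for total in [20, 100, 1000] {
        let (workers, per_worker) = plan_workers(total, 8, 32);
        println!("{total} docs: {workers} workers, up to {per_worker} docs each");
    }
}
```

&lt;p&gt;Under these assumed values, worker count grows with input size until it reaches the CPU cap, after which only the per-worker share keeps growing.&lt;/p&gt;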

&lt;p&gt;We also enable parallel tokenization once at startup (&lt;code&gt;TOKENIZERS_PARALLELISM=true&lt;/code&gt;) and pre-truncate inputs by character count before tokenization, so a 100 KB blob of text doesn't pin a CPU on the tokenizer for a second before the model even sees it.&lt;/p&gt;

&lt;h2&gt;Numbers&lt;/h2&gt;

&lt;p&gt;All runs on our standard benchmark box, using &lt;code&gt;all-MiniLM-L12-v2-onnx&lt;/code&gt;, 1000 documents per run. Generated with &lt;a href="https://manticoresearch.com/blog/manticore-load/" rel="noopener noreferrer"&gt;manticore-load&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;manticore-load &lt;span class="nt"&gt;--quiet&lt;/span&gt; &lt;span class="nt"&gt;--drop&lt;/span&gt; &lt;span class="nt"&gt;--batch-size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1 &lt;span class="nt"&gt;--threads&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;8 &lt;span class="nt"&gt;--total&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1000 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--init&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"CREATE TABLE t (
    f text,
    v FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2'
      MODEL_NAME='onnx-models/all-MiniLM-L12-v2-onnx' FROM=''
  )"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--load&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"INSERT INTO t(f) VALUES('&amp;lt;text/10/100&amp;gt;')"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Results for the same command with &lt;code&gt;--batch-size&lt;/code&gt; set to &lt;code&gt;1&lt;/code&gt;, &lt;code&gt;2&lt;/code&gt;, &lt;code&gt;8&lt;/code&gt;, &lt;code&gt;32&lt;/code&gt;, and &lt;code&gt;128&lt;/code&gt;, all at 8 threads:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;code&gt;--batch-size&lt;/code&gt;&lt;/th&gt;
&lt;th&gt;docs/sec&lt;/th&gt;
&lt;th&gt;avg call latency (ms)&lt;/th&gt;
&lt;th&gt;per-doc latency (ms)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;143&lt;/td&gt;
&lt;td&gt;55.9&lt;/td&gt;
&lt;td&gt;55.9&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;113&lt;/td&gt;
&lt;td&gt;141.6&lt;/td&gt;
&lt;td&gt;70.8&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;91&lt;/td&gt;
&lt;td&gt;703.3&lt;/td&gt;
&lt;td&gt;87.9&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;32&lt;/td&gt;
&lt;td&gt;146&lt;/td&gt;
&lt;td&gt;1753.4&lt;/td&gt;
&lt;td&gt;54.8&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;128&lt;/td&gt;
&lt;td&gt;147&lt;/td&gt;
&lt;td&gt;6966.0&lt;/td&gt;
&lt;td&gt;54.4&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Compared against Candle at the same 8 threads — which sat flat at &lt;strong&gt;10 docs/sec across every batch size&lt;/strong&gt; — that's between &lt;strong&gt;9× and 15× more documents per second&lt;/strong&gt; depending on the batch you pick. The "avg call latency" column is the time for one full &lt;code&gt;INSERT&lt;/code&gt; statement to return, not per document; divide by the batch size and the per-doc cost lands in the 55–90 ms band.&lt;/p&gt;
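&lt;p&gt;The table's columns are tied together by simple arithmetic, which the sketch below reproduces from the latency figures. It assumes a closed loop in which each client thread sends its next &lt;code&gt;INSERT&lt;/code&gt; only after the previous one returns; that is an assumption about how the load was driven, not something stated above. &lt;code&gt;implied_docs_per_sec&lt;/code&gt; is a hypothetical helper name.&lt;/p&gt;

```rust
/// Steady-state throughput implied by a closed loop of client threads:
/// each thread completes one batch of batch_size docs every call_latency_ms.
fn implied_docs_per_sec(threads: u32, batch_size: u32, call_latency_ms: f64) -> f64 {
    f64::from(threads * batch_size) * 1000.0 / call_latency_ms
}

fn main() {
    // (batch size, avg call latency in ms) rows from the 8-thread table.
    let rows = [(1, 55.9), (2, 141.6), (8, 703.3), (32, 1753.4), (128, 6966.0)];
    for (batch, latency_ms) in rows {
        // Per-document cost is the call latency spread over the batch.
        let per_doc_ms = latency_ms / f64::from(batch);
        let dps = implied_docs_per_sec(8, batch, latency_ms);
        println!("batch={batch}: {per_doc_ms:.1} ms/doc, {dps:.0} docs/sec");
    }
}
```

&lt;p&gt;Within rounding, the implied figures agree with the docs/sec and per-doc latency columns, which suggests the measured throughput is essentially threads × batch size divided by call latency.&lt;/p&gt;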

&lt;p&gt;If you rerun the same sweep with 1 client thread, the configuration that turns out to be optimal for bulk loading, the numbers at large batch sizes climb further: &lt;strong&gt;72 / 76 / 93 / 175 / 233 / 222 docs/sec&lt;/strong&gt; at batches 1 / 2 / 8 / 32 / 64 / 128. The peak in the entire grid is &lt;strong&gt;233 docs/sec at 1 thread × batch=64&lt;/strong&gt;, which works out to &lt;strong&gt;~4.3 ms&lt;/strong&gt; per document.&lt;/p&gt;

&lt;h3&gt;How to feed it for maximum throughput&lt;/h3&gt;

&lt;p&gt;If you're loading a lot of data in bulk and want maximum docs/sec, the recipe is straightforward: send large &lt;code&gt;INSERT ... VALUES (..), (..), ...&lt;/code&gt; statements (batch 32–128) from a &lt;strong&gt;single client thread&lt;/strong&gt;, not many small inserts from many threads. The new backend already parallelises &lt;em&gt;inside&lt;/em&gt; the call (see the &lt;code&gt;predict_pipelined&lt;/code&gt; code above), so client-side fan-out just piles coordination overhead on top of what ORT is already doing — that's why 1 thread × batch=64 (233 docs/sec) beats 8 threads × batch=128 (147 docs/sec) by a clear margin.&lt;/p&gt;

&lt;p&gt;If your workload is naturally one-row-at-a-time (web requests, queue consumers, MCP servers), just use single-row &lt;code&gt;INSERT INTO&lt;/code&gt; statements. The single-thread / single-row floor of 72 docs/sec is already ~7× the old Candle path, and at ~14 ms per insert the latency is low enough that this isn't a tier you need to optimise around any more.&lt;/p&gt;

&lt;h3&gt;Before vs after, across the whole grid&lt;/h3&gt;

&lt;p&gt;To make the before/after concrete, we also swept the full &lt;code&gt;threads × batch&lt;/code&gt; grid against the old Candle/&lt;code&gt;trans&lt;/code&gt; path on the same box, same weights:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fq25ro59w1ilutzdxb24z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fq25ro59w1ilutzdxb24z.png" alt=" " width="800" height="193"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Each X-axis tick is &lt;code&gt;backend threads/batch-size&lt;/code&gt;. The left half (&lt;code&gt;trans …&lt;/code&gt;) is the old Candle path — docs/sec sits at 5–11 across the entire grid no matter how many threads or how large the batch, while CPU is already pinned. The right half (&lt;code&gt;onnx …&lt;/code&gt;) is the new path — docs/sec is an order of magnitude higher across the whole sweep. Within the new path: at small batches, adding client threads helps (1T/batch=1 = 72 → 8T/batch=1 = 143); at large batches, a single client thread wins (1T/batch=64 = 233 is the global peak).&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Frhikrnu4ck1fpi9fadup.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Frhikrnu4ck1fpi9fadup.png" alt=" " width="800" height="195"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Same sweep, but plotting efficiency (docs/sec per % CPU) alongside docs/sec. On the Candle (&lt;code&gt;trans&lt;/code&gt;) side, both lines hug the floor — the box is spending CPU without producing documents. On the ONNX (&lt;code&gt;onnx&lt;/code&gt;) side, efficiency is highest at 1–2 threads with mid-sized batches, where each percent of CPU buys the most embeddings, and it stays well above the old path even as we crank threads up to 32.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What's next
&lt;/h2&gt;

&lt;p&gt;A few things are queued behind this work:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GPU path.&lt;/strong&gt; The current ONNX setup is CPU-only. The &lt;code&gt;_use_gpu&lt;/code&gt; parameter is plumbed through but not yet wired to the ORT CUDA execution provider.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Windows perf parity.&lt;/strong&gt; We currently serialise on Windows because of an ORT threading bug. Once that bug is resolved upstream, Windows should get the same shared-session behaviour Linux/macOS already have.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;More architectures down the ONNX path.&lt;/strong&gt; Right now ONNX is the path for BERT-family encoders. T5, causal-LM and quantized GGUF models still go through Candle for now.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;

&lt;p&gt;If your existing table is already pointed at an ONNX-capable model, the new path takes over once you upgrade to Manticore Search 27.1.5 or newer — no schema changes, no re-ingest. You should just see your INSERTs go faster.&lt;/p&gt;

&lt;p&gt;If you're not on an ONNX model yet — or you want to move to a smaller / faster one to take maximum advantage of the new backend — note that &lt;strong&gt;you can't swap the model on an existing field&lt;/strong&gt;. Manticore doesn't support altering &lt;code&gt;MODEL_NAME&lt;/code&gt; on an existing &lt;code&gt;FLOAT_VECTOR&lt;/code&gt; field, so migrating in place isn't an option. You have two practical paths to choose between, depending on what's easier in your setup:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Option A — dump, edit, reload.&lt;/strong&gt; Even if you no longer have the original source data, you can &lt;code&gt;mysqldump&lt;/code&gt; the existing table to a SQL file, edit the &lt;code&gt;CREATE TABLE&lt;/code&gt; in that dump to point &lt;code&gt;MODEL_NAME&lt;/code&gt; at the ONNX-optimised model you want, and replay the dump into a fresh table. Manticore will re-embed every row through the new path on the way in.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Option B — add a new column alongside, rebuild, drop the old one.&lt;/strong&gt; If you'd rather stay in SQL and avoid the dump round-trip, add a new &lt;code&gt;FLOAT_VECTOR&lt;/code&gt; column on the same table that points at the ONNX model, then trigger a one-shot re-embed of that column from the source text:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="k"&gt;ADD&lt;/span&gt; &lt;span class="k"&gt;COLUMN&lt;/span&gt; &lt;span class="n"&gt;v_new&lt;/span&gt; &lt;span class="n"&gt;FLOAT_VECTOR&lt;/span&gt; &lt;span class="n"&gt;KNN_TYPE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'hnsw'&lt;/span&gt;
  &lt;span class="n"&gt;HNSW_SIMILARITY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'l2'&lt;/span&gt;
  &lt;span class="n"&gt;MODEL_NAME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'Xenova/all-MiniLM-L6-v2'&lt;/span&gt;
  &lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'text_field'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="n"&gt;REBUILD&lt;/span&gt; &lt;span class="n"&gt;EMBEDDINGS&lt;/span&gt; &lt;span class="n"&gt;v_new&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="c1"&gt;-- once you've cut over reads to v_new, drop the old column&lt;/span&gt;
&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="k"&gt;DROP&lt;/span&gt; &lt;span class="k"&gt;COLUMN&lt;/span&gt; &lt;span class="n"&gt;v_old&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;See the &lt;a href="https://manual.manticoresearch.com/dev/Updating_table_schema_and_settings#Rebuilding-embeddings" rel="noopener noreferrer"&gt;Rebuilding embeddings&lt;/a&gt; section of the docs for the exact syntax and constraints.&lt;/p&gt;

&lt;p&gt;On brand-new tables, none of this matters — just pick an ONNX-optimised &lt;code&gt;MODEL_NAME&lt;/code&gt; from the start.&lt;/p&gt;

&lt;p&gt;A good place to shop for ONNX-ready embedding models is the &lt;a href="https://huggingface.co/Xenova/models" rel="noopener noreferrer"&gt;Xenova collection on Hugging Face&lt;/a&gt; — these are pre-converted to ONNX and ready to drop into &lt;code&gt;MODEL_NAME='...'&lt;/code&gt;. Filter the list by the &lt;strong&gt;feature-extraction&lt;/strong&gt; task to narrow it down to embedding-style models. Some sensible starting points:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;Xenova/all-MiniLM-L6-v2&lt;/code&gt; — small and fast, 384-dim, great default.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Xenova/all-MiniLM-L12-v2&lt;/code&gt; — the model we benchmarked in this post, 384-dim, a step up in quality.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Xenova/bge-small-en-v1.5&lt;/code&gt; — strong English retrieval, 384-dim.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Xenova/multilingual-e5-small&lt;/code&gt; — multilingual coverage, 384-dim.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you aren't using auto-embeddings yet at all, the &lt;a href="https://manticoresearch.com/blog/auto-embeddings/" rel="noopener noreferrer"&gt;original announcement&lt;/a&gt; walks through the SQL from scratch.&lt;/p&gt;

&lt;p&gt;📚 &lt;a href="https://manual.manticoresearch.com/Searching/KNN" rel="noopener noreferrer"&gt;KNN search documentation&lt;/a&gt;&lt;br&gt;
💬 &lt;a href="https://slack.manticoresearch.com/" rel="noopener noreferrer"&gt;Slack community&lt;/a&gt; — we'd love to see how the new path holds up on your data.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>The Ukrainian lemmatizer is now built into Manticore Search</title>
      <dc:creator>Sergey Nikolaev</dc:creator>
      <pubDate>Tue, 23 Jun 2026 11:14:40 +0000</pubDate>
      <link>https://dev.to/sanikolaev/ukrayinskii-liematizator-tiepier-vbudovano-v-manticore-search-5aoa</link>
      <guid>https://dev.to/sanikolaev/ukrayinskii-liematizator-tiepier-vbudovano-v-manticore-search-5aoa</guid>
      <description>&lt;h2&gt;
  
  
  In short
&lt;/h2&gt;

&lt;p&gt;Starting with release &lt;code&gt;27.1.5&lt;/code&gt;, the Ukrainian lemmatizer no longer needs a separate Python stack.&lt;br&gt;
Previously you had to install a separate package, Python 3.9, &lt;code&gt;pymorphy2&lt;/code&gt;, and the Ukrainian dictionaries.&lt;br&gt;
The good news: the dictionary now ships with Manticore.&lt;/p&gt;

&lt;p&gt;All you need to do is enable morphology explicitly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;morphology&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'lemmatize_uk_all'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You no longer need to add Ukrainian characters to &lt;code&gt;charset_table&lt;/code&gt; separately either: the standard &lt;code&gt;non_cont&lt;/code&gt; contains mappings for &lt;code&gt;є&lt;/code&gt;, &lt;code&gt;і&lt;/code&gt;, &lt;code&gt;ї&lt;/code&gt;, &lt;code&gt;ґ&lt;/code&gt;.&lt;br&gt;
The apostrophe, however, matters for Ukrainian, and there is one important caveat. If you simply add it to &lt;code&gt;charset_table&lt;/code&gt;, it can affect English-language data, where the apostrophe is also used.&lt;/p&gt;

&lt;p&gt;That is why, for Ukrainian texts, we recommend using a separate table with its own &lt;code&gt;charset_table&lt;/code&gt; that includes the apostrophe, rather than mixing Ukrainian with English or other languages in one table.&lt;/p&gt;

&lt;p&gt;That is all you need to take into account for full Ukrainian support in Manticore Search. No dictionaries, packages, or scripts. Everything now works out of the box.&lt;/p&gt;
&lt;h2&gt;
  
  
  What a lemmatizer is
&lt;/h2&gt;

&lt;p&gt;In full-text search, you often need to find a word in more than just the form the user typed. A document may contain &lt;code&gt;мрії&lt;/code&gt; (an inflected form of "dream") while the user searches for &lt;code&gt;мрія&lt;/code&gt;. Or the text contains &lt;code&gt;інтернет-магазину&lt;/code&gt; (an inflected form of "online store") while the query is &lt;code&gt;інтернет-магазин&lt;/code&gt;. A person easily sees that these are forms of the same word. To a search engine without morphology, they are different tokens.&lt;/p&gt;

&lt;p&gt;This is what search engines use stemming and lemmatization for.&lt;/p&gt;

&lt;p&gt;A stemmer usually works by rules: it strips or replaces endings. This is fast, but the result can be crude and does not always resemble a real word.&lt;/p&gt;

&lt;p&gt;A lemmatizer relies on a dictionary and morphology to produce the normal form of a word. For Ukrainian this is especially noticeable because of cases, gender, and number.&lt;/p&gt;
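
&lt;p&gt;To make the contrast concrete, here is a toy Python sketch. The suffix rules and the tiny dictionary are made up for illustration (the three word pairs are taken from the &lt;code&gt;CALL KEYWORDS&lt;/code&gt; output later in this post), and the helper names &lt;code&gt;stem&lt;/code&gt; and &lt;code&gt;lemmatize&lt;/code&gt; are hypothetical; this is not how Manticore's lemmatizer is implemented.&lt;/p&gt;

```python
# Toy contrast between rule-based stemming and dictionary-based lemmatization.
# The suffix rules and the dictionary are illustrative only.

SUFFIXES = ("ї", "у", "а")

LEMMAS = {
    "мрії": "мрія",
    "червона": "червоний",
    "магазину": "магазин",
}


def stem(word):
    """Strip the first matching ending; the result may not be a real word."""
    for suffix in SUFFIXES:
        if word.endswith(suffix):
            return word[:-len(suffix)]
    return word


def lemmatize(word):
    """Look the word up in the dictionary; unknown words pass through."""
    return LEMMAS.get(word, word)


for token in ("мрії", "червона", "магазину"):
    print(token, stem(token), lemmatize(token))
```

&lt;p&gt;The stemmer output for the first two tokens is a truncated fragment, while the dictionary lookup returns the normal form that a query can match.&lt;/p&gt;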
&lt;h2&gt;
  
  
  What has changed
&lt;/h2&gt;

&lt;p&gt;If you have already tried Ukrainian lemmatization in Manticore, the problem may have been not in the search itself but in the installation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a separate &lt;code&gt;manticore-lemmatizer-uk&lt;/code&gt; package;&lt;/li&gt;
&lt;li&gt;Python 3.9 with &lt;code&gt;--enable-shared&lt;/code&gt;;&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;pymorphy2&lt;/code&gt; and &lt;code&gt;pymorphy2-dicts-uk&lt;/code&gt;;&lt;/li&gt;
&lt;li&gt;additional system dependencies.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The Ukrainian dictionary now ships as a regular language file, &lt;code&gt;uk.pak&lt;/code&gt;, and Manticore loads it directly. All that is left for you is to configure the table: specify the required &lt;code&gt;morphology&lt;/code&gt; and carry on.&lt;/p&gt;
&lt;h2&gt;
  
  
  Minimal configuration
&lt;/h2&gt;

&lt;p&gt;Let's create a table for Ukrainian texts:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;uk_docs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;morphology&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'lemmatize_uk_all'&lt;/span&gt;
  &lt;span class="n"&gt;charset_table&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'non_cont,U+0027'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The important part here is enabling morphology:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;morphology='lemmatize_uk_all'&lt;/code&gt; enables the Ukrainian lemmatizer and indexes all normal forms found.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For Ukrainian we add only the apostrophe (&lt;code&gt;U+0027&lt;/code&gt;), so that words like &lt;code&gt;обов'язковим&lt;/code&gt; are indexed as a single token.&lt;/p&gt;

&lt;p&gt;For a single normal form, &lt;code&gt;lemmatize_uk&lt;/code&gt; is enough. To index all possible forms, choose &lt;code&gt;lemmatize_uk_all&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Let's check with an example
&lt;/h2&gt;

&lt;p&gt;Let's add a few documents:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;uk_docs&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'мрії про червону сукню'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'каталог інтернет-магазину'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'команд-учасниць запросили на зустріч'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The query &lt;code&gt;мрія&lt;/code&gt; finds the document where the word is written as &lt;code&gt;мрії&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;uk_docs&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="k"&gt;MATCH&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'мрія'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;ASC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+------+---------------------------+
| id   | title                     |
+------+---------------------------+
|    1 | мрії про червону сукню    |
+------+---------------------------+
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The query &lt;code&gt;червоний&lt;/code&gt; finds &lt;code&gt;червону&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;uk_docs&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="k"&gt;MATCH&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'червоний'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;ASC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+------+---------------------------+
| id   | title                     |
+------+---------------------------+
|    1 | мрії про червону сукню    |
+------+---------------------------+
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And &lt;code&gt;інтернет-магазин&lt;/code&gt; finds &lt;code&gt;інтернет-магазину&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;uk_docs&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="k"&gt;MATCH&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'інтернет-магазин'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;ASC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+------+---------------------------+
| id   | title                     |
+------+---------------------------+
|    2 | каталог інтернет-магазину |
+------+---------------------------+
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What happens to the tokens
&lt;/h2&gt;

&lt;p&gt;If you want to see not only the search result but the normalization itself, use &lt;code&gt;CALL KEYWORDS&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CALL&lt;/span&gt; &lt;span class="n"&gt;KEYWORDS&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="s1"&gt;'мрії червона інтернет-магазину команд-учасниць'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="s1"&gt;'uk_docs'&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+------+--------------------+--------------+
| qpos | tokenized          | normalized   |
+------+--------------------+--------------+
| 1    | мрії               | мрія         |
| 2    | червона            | червоний     |
| 3    | інтернет           | інтернет     |
| 4    | магазину           | магазин      |
| 5    | команд             | команда      |
| 6    | учасниць           | учасниця     |
+------+--------------------+--------------+
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This clearly shows the difference from simply trimming endings: the output consists of normal word forms that you can then search by. &lt;code&gt;мрії&lt;/code&gt; becomes &lt;code&gt;мрія&lt;/code&gt;, &lt;code&gt;червона&lt;/code&gt; becomes &lt;code&gt;червоний&lt;/code&gt;, and &lt;code&gt;магазину&lt;/code&gt; becomes &lt;code&gt;магазин&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to keep in mind
&lt;/h2&gt;

&lt;p&gt;Using the Ukrainian lemmatizer has become simpler, but it still has to be enabled explicitly for each table via &lt;code&gt;morphology&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The standard &lt;code&gt;charset_table=non_cont&lt;/code&gt; already covers the Ukrainian characters &lt;code&gt;є&lt;/code&gt;, &lt;code&gt;і&lt;/code&gt;, &lt;code&gt;ї&lt;/code&gt;, &lt;code&gt;ґ&lt;/code&gt;. If you are defining a table specifically for Ukrainian texts, it is enough to add the apostrophe to it: &lt;code&gt;charset_table='non_cont,U+0027'&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;If you use the official Manticore Search packages or images of current versions, the Ukrainian &lt;code&gt;uk.pak&lt;/code&gt; should already be in place. If you have a custom build or a non-standard file layout, check that &lt;code&gt;lemmatizer_base&lt;/code&gt; points to the directory that contains &lt;code&gt;uk.pak&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;You can read more about configuring morphology in the documentation: &lt;a href="https://manual.manticoresearch.com/Creating_a_table/NLP_and_tokenization/Morphology#morphology" rel="noopener noreferrer"&gt;morphology&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>database</category>
      <category>news</category>
      <category>nlp</category>
      <category>sql</category>
    </item>
    <item>
      <title>Faster KNN search in Manticore: 2-pass HNSW, batched distances, and AVX-512</title>
      <dc:creator>Sergey Nikolaev</dc:creator>
      <pubDate>Tue, 23 Jun 2026 10:43:59 +0000</pubDate>
      <link>https://dev.to/sanikolaev/faster-knn-search-in-manticore-2-pass-hnsw-batched-distances-and-avx-512-4gd2</link>
      <guid>https://dev.to/sanikolaev/faster-knn-search-in-manticore-2-pass-hnsw-batched-distances-and-avx-512-4gd2</guid>
      <description>&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; Three changes to the HNSW search engine improve KNN throughput by up to 29% at high k, with over 20% gains under concurrent load. No API changes, no index rebuild, no configuration. Just faster searches.&lt;/p&gt;

&lt;h1&gt;
  
  
  Faster KNN search in Manticore
&lt;/h1&gt;

&lt;p&gt;Manticore's KNN search is built on top of &lt;a href="https://github.com/nmslib/hnswlib" rel="noopener noreferrer"&gt;hnswlib&lt;/a&gt;, an open-source HNSW implementation. Historically, most of our KNN work focused on custom distance functions, such as those used for binary quantization, rather than on hnswlib's core search loop. We also added features like &lt;a href="https://manual.manticoresearch.com/Searching/KNN#Filtering-strategies:-prefilter-vs.-postfilter" rel="noopener noreferrer"&gt;prefiltering with ACORN-1&lt;/a&gt; and &lt;a href="https://manual.manticoresearch.com/Searching/KNN#Early-termination" rel="noopener noreferrer"&gt;early termination&lt;/a&gt;, but the main search loop stayed the same: hnswlib still visited neighbors, computed distances, and maintained its set of candidates the same way.&lt;/p&gt;

&lt;p&gt;These changes go further, modifying hnswlib's core search loop itself - restructuring how it traverses neighbors, how it calls distance functions, and how it interacts with the CPU's memory hierarchy. Combined with new AVX-512 distance implementations in the columnar library, these changes target three sources of overhead: inefficient memory access patterns, redundant data loads, and indirect function call overhead.&lt;/p&gt;

&lt;h2&gt;
  
  
  Compile-time distance function specialization
&lt;/h2&gt;

&lt;p&gt;Previously, the distance function was a runtime function pointer stored in the HNSW index and called for every candidate. For large search budgets, that can mean a large number of indirect calls per query. Indirect calls prevent the compiler from inlining the distance function into the search loop, and they create branch prediction overhead.&lt;/p&gt;

&lt;p&gt;The new code resolves the distance function at compile time using C++ templates. When the search begins, a single switch statement selects the right template specialization based on the distance metric and quantization settings. From that point on, the entire inner loop - neighbor traversal, distance computation, candidate set updates - runs as one monolithic function with the distance calculation fully inlined. The compiler can now optimize register allocation, instruction scheduling, and loop unrolling across the distance computation boundary.&lt;/p&gt;

&lt;h2&gt;
  
  
  2-pass neighbor processing
&lt;/h2&gt;

&lt;p&gt;The HNSW algorithm explores the graph by visiting nodes and computing distances to their neighbors. In the original implementation, each neighbor was processed in a single pass: check if visited, fetch its vector data, compute distance, update the set of candidates. This meant that memory prefetch hints had little time to take effect before the data was needed.&lt;/p&gt;

&lt;p&gt;The new implementation splits this into two passes. Pass 1 iterates all neighbors of the current node, skips already-visited ones, and collects the unvisited neighbors into a small batch array. As each neighbor is added to the batch, a prefetch hint is issued for its vector data. Pass 2 iterates the batch and computes distances. By the time Pass 2 reaches each vector, the prefetch from Pass 1 has had time to bring the data into cache.&lt;/p&gt;

&lt;p&gt;Pass 2 walks a compact sequential array of candidate IDs, not the graph structure itself. The underlying vector loads are still scattered, but the data has been prefetched ahead of time.&lt;/p&gt;
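
&lt;p&gt;The two passes can be sketched roughly as follows. This is an illustrative Python model, not Manticore's or hnswlib's actual C++ code: &lt;code&gt;prefetch&lt;/code&gt; is a no-op stand-in for a hardware prefetch hint, and &lt;code&gt;expand_node&lt;/code&gt;, &lt;code&gt;l2_squared&lt;/code&gt;, and the dictionary-based vector store are hypothetical names chosen for the example.&lt;/p&gt;

```python
# Illustrative two-pass expansion of one HNSW node (a model, not the real code).


def l2_squared(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def prefetch(vector):
    """Stand-in for a hardware prefetch hint; does nothing in Python."""
    return None


def expand_node(neighbors, vectors, visited, query):
    # Pass 1: skip visited neighbors, collect the rest into a compact batch,
    # and issue a prefetch hint for each collected vector.
    batch = []
    for node_id in neighbors:
        if node_id in visited:
            continue
        visited.add(node_id)
        batch.append(node_id)
        prefetch(vectors[node_id])

    # Pass 2: walk the compact batch and compute the distances.
    return [(l2_squared(query, vectors[node_id]), node_id) for node_id in batch]


if __name__ == "__main__":
    vectors = {1: [0.0, 0.0], 2: [3.0, 4.0], 3: [1.0, 1.0]}
    visited = {2}
    print(expand_node([1, 2, 3], vectors, visited, [0.0, 0.0]))
```

&lt;p&gt;In the model, node 2 is skipped in Pass 1 because it was already visited, so Pass 2 only scores nodes 1 and 3.&lt;/p&gt;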

&lt;p&gt;For unfiltered queries (no WHERE clause on the KNN search), the new code also takes a fast path that eliminates the per-candidate filter check entirely.&lt;/p&gt;

&lt;h2&gt;
  
  
  Batched distance computation
&lt;/h2&gt;

&lt;p&gt;The 2-pass structure helps in two ways: it gives prefetching more time to work, and it makes batching easy. Once Pass 2 has a compact list of candidates, it can score them two at a time instead of one by one.&lt;/p&gt;

&lt;p&gt;When scoring two candidates, the query vector is loaded once per SIMD iteration and reused for both distance computations, eliminating redundant loads.&lt;/p&gt;

&lt;p&gt;This reduces repeated query-side loads and lets the scoring loop process candidates in pairs, with a fallback for an odd remainder. Batch-2 functions are provided for inner product, L2, and their binary-quantized variants.&lt;/p&gt;
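
&lt;p&gt;A scalar Python sketch of the pairing logic, assuming it mirrors the structure described above. The real code is SIMD C++; the function names here are hypothetical, and Python will not show the performance benefit, only the control flow: each query element is read once and feeds two accumulators, and an odd trailing candidate falls back to the single-vector function.&lt;/p&gt;

```python
# Scalar model of batch-2 inner-product scoring with an odd-remainder fallback.


def dot_batch2(query, cand_a, cand_b):
    """Inner products of one query against two candidates in a single loop."""
    acc_a = 0.0
    acc_b = 0.0
    for q, a, b in zip(query, cand_a, cand_b):
        # q is read once per iteration and reused for both accumulators.
        acc_a += q * a
        acc_b += q * b
    return acc_a, acc_b


def dot_single(query, cand):
    """Inner product of one query against one candidate."""
    return sum(q * c for q, c in zip(query, cand))


def score_batch(query, candidates):
    """Score candidates in pairs; a leftover candidate is scored on its own."""
    scores = []
    paired_end = len(candidates) - len(candidates) % 2
    for i in range(0, paired_end, 2):
        scores.extend(dot_batch2(query, candidates[i], candidates[i + 1]))
    if paired_end != len(candidates):
        scores.append(dot_single(query, candidates[-1]))
    return scores


if __name__ == "__main__":
    print(score_batch([1.0, 2.0], [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]))
```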

&lt;h2&gt;
  
  
  AVX-512 support
&lt;/h2&gt;

&lt;p&gt;The new AVX-512 distance code processes 16 floats per iteration instead of 8 with AVX2. For inner product and L2 distance, the core loop uses fused multiply-add (&lt;code&gt;_mm512_fmadd_ps&lt;/code&gt;), which combines multiplication and accumulation in a single instruction. For binary-quantized vectors, the AVX-512 VPOPCNTDQ extension speeds up bit-counting operations used in distance calculation.&lt;/p&gt;

&lt;p&gt;Manticore now ships three library variants: a baseline build, an AVX2 build, and an AVX-512 build. At startup, Manticore detects the CPU's capabilities and loads the appropriate library automatically. No configuration is needed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Benchmark results
&lt;/h2&gt;

&lt;p&gt;The following benchmarks were run on the &lt;a href="https://storage.googleapis.com/ann-filtered-benchmark/datasets/dbpedia_openai_1M.tgz" rel="noopener noreferrer"&gt;dbpedia-openai-1M-1536-angular&lt;/a&gt; dataset (1M vectors, 1536 dimensions, cosine distance) on an AMD Ryzen 7 9700X (Zen 5, 8 physical cores / 16 logical cores). All data uses 1-bit binary quantization with oversampling and rescoring disabled. For multithreaded runs, throughput is reported as average per-thread queries per second: each worker runs its own batch of queries, its QPS is measured independently, and the final number is the average across workers. Each result is the average of 6 independent runs. Early termination was also disabled to isolate the effect of these optimizations on raw HNSW traversal.&lt;/p&gt;

&lt;p&gt;Zen 5 was chosen because it supports AVX-512 with native 512-bit datapaths, avoiding the split-512 execution behavior and heavy AVX-512 downclocking associated with some older Intel processors. This helps isolate the algorithmic effects of these changes from CPU-specific AVX-512 throttling behavior.&lt;/p&gt;

&lt;h3&gt;
  
  
  Algorithmic improvements alone
&lt;/h3&gt;

&lt;p&gt;The first chart isolates the effect of the algorithmic changes (2-pass processing, batched distances, compile-time dispatch) by comparing the new AVX2 build against the previous AVX2 build. Both builds use the same SIMD instruction set, so the difference is purely from the new code structure.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fyyy05atim9h0xyis7xw6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fyyy05atim9h0xyis7xw6.png" alt=" " width="800" height="457"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;On a single thread, the gain grows steadily from +3% at k=10 to +24% at k=1000 as distance computation comes to dominate the search workload. With more threads competing for memory bandwidth, the per-thread gain shrinks: +9-10% at 4 or 8 threads, and only +2-5% at 16 threads.&lt;/p&gt;

&lt;p&gt;The 16-thread case is SMT (each physical core runs two threads). Distance computation is memory-bound, so when two threads share a core's L1/L2 caches, the prefetching and batching wins are partially absorbed by shared-resource contention. The algorithmic improvements still help, but the headroom shrinks.&lt;/p&gt;

&lt;h3&gt;
  
  
  SIMD width benefit (AVX-512 vs AVX2)
&lt;/h3&gt;

&lt;p&gt;The second chart isolates the effect of AVX-512 by comparing the AVX-512 build against the new AVX2 build (both share the same algorithmic improvements).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F2s077qw27ggqh2gvo66h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F2s077qw27ggqh2gvo66h.png" alt=" " width="800" height="457"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;AVX-512 is slightly slower than AVX2 at k=10 (around -2%) regardless of thread count. This is specific to AVX-512: the algorithmic improvements alone don't show this regression, so it's not a uniform per-query overhead. From k=30 upward, AVX-512 pulls ahead at every thread count.&lt;/p&gt;

&lt;p&gt;The interesting pattern is that AVX-512's benefit grows with thread count. Although this benchmark disables oversampling, the default Manticore KNN query uses &lt;code&gt;LIMIT 20&lt;/code&gt;, and with the default &lt;code&gt;oversampling=3.0&lt;/code&gt; (which multiplies the effective HNSW search budget for rescoring after quantized search) that becomes k=60 internally. At k=60, AVX-512 vs AVX2 (new) is +1.2% on a single thread, +2.6% at 4 threads, +3.4% at 8 threads, and +6.5% at 16 threads.&lt;/p&gt;
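
&lt;p&gt;The k=60 figure is simple arithmetic. A minimal sketch, assuming the effective search budget is the &lt;code&gt;LIMIT&lt;/code&gt; multiplied by the oversampling factor and rounded up (the rounding rule and the helper name &lt;code&gt;effective_k&lt;/code&gt; are assumptions, not taken from Manticore's source):&lt;/p&gt;

```python
import math


def effective_k(limit, oversampling=3.0):
    """Effective HNSW search budget for a given LIMIT and oversampling factor."""
    return math.ceil(limit * oversampling)


print(effective_k(20))       # default LIMIT 20 with oversampling=3.0 gives 60
print(effective_k(20, 1.0))  # factor 1.0 leaves the budget at 20
```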

&lt;h3&gt;
  
  
  Combined improvement (AVX-512 vs the old code)
&lt;/h3&gt;

&lt;p&gt;The third chart shows the cumulative effect: AVX-512 with all the new code, compared against the previous AVX2 build. This is what a user upgrading from the previous Manticore version to the new one would see if their CPU supports AVX-512.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fajkius5p93q71qtg1wcm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fajkius5p93q71qtg1wcm.png" alt=" " width="800" height="457"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The single-thread curve climbs from +0.5% at k=10 to +29% at k=1000. The multi-thread curves all reach +22-24% at k=1000. The improvement is broadly distributed across thread counts - the algorithmic and SIMD gains compose differently at different concurrency levels, but the combined result is consistently large at moderate-to-high k.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why the gain grows with k
&lt;/h3&gt;

&lt;p&gt;All three charts show the same shape: small improvement at low k, large at high k. The reason is that low-k queries spend a larger share of their time on graph traversal (visiting nodes, checking visited bits, popping the candidate set) - work that scales with the graph structure, not k. As k grows, the effective search budget grows proportionally, and the queries spend more time on distance computation. The optimizations target distance computation and the loops around it, so their benefit scales with the share of work that distance computation represents.&lt;/p&gt;
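
&lt;p&gt;A toy cost model makes this concrete. The sketch below assumes a fixed traversal cost per query plus a distance cost that grows linearly with k. The constants and the &lt;code&gt;query_gain&lt;/code&gt; helper are illustrative assumptions, not measurements from this benchmark.&lt;/p&gt;

```python
# Toy cost model: every constant here is a made-up illustrative value,
# not a number measured in the benchmark.
TRAVERSAL_COST = 100.0   # per-query graph-traversal work that does not depend on k
DISTANCE_COST = 1.0      # cost of one distance computation before optimization
DISTANCE_SPEEDUP = 1.4   # hypothetical speedup of the distance code


def query_gain(k):
    """Relative gain for a query whose distance work scales linearly with k."""
    old_time = TRAVERSAL_COST + DISTANCE_COST * k
    new_time = TRAVERSAL_COST + DISTANCE_COST * k / DISTANCE_SPEEDUP
    return old_time / new_time - 1.0


gains = {k: query_gain(k) for k in (10, 60, 1000)}
for k, gain in gains.items():
    print(f"k={k}: +{gain:.1%}")
```

&lt;p&gt;With constants of this shape, the gain shrinks toward zero when traversal dominates and approaches the raw distance speedup as k grows.&lt;/p&gt;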

&lt;h2&gt;
  
  
  What this means for you
&lt;/h2&gt;

&lt;p&gt;These improvements require no action. They are available in the recent &lt;a href="https://manticoresearch.com/blog/manticore-search-27-1-5/"&gt;Manticore Search 27.1.5 release&lt;/a&gt;; there are no API changes, no new configuration options, and no need to rebuild indexes.&lt;/p&gt;

&lt;p&gt;The gains stack with &lt;a href="https://manual.manticoresearch.com/Searching/KNN#Early-termination" rel="noopener noreferrer"&gt;KNN early termination&lt;/a&gt;: early termination reduces the number of distance computations per query, and these optimizations make each computation faster.&lt;/p&gt;

&lt;p&gt;The biggest improvements show up with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;High-dimensional vectors&lt;/strong&gt; (more arithmetic per distance computation, more SIMD benefit)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Large k values&lt;/strong&gt; (more total distance computations, more opportunity for batching and cache optimization)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Queries with oversampling&lt;/strong&gt; (oversampling multiplies the effective k, pushing queries into the range where gains are largest)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Further reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://manual.manticoresearch.com/Searching/KNN#Early-termination" rel="noopener noreferrer"&gt;KNN early termination documentation&lt;/a&gt; - how Manticore detects HNSW convergence and stops early&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://manual.manticoresearch.com/Searching/KNN#Filtering-strategies:-prefilter-vs.-postfilter" rel="noopener noreferrer"&gt;KNN filtering documentation&lt;/a&gt; - how prefiltering works with ACORN-1&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://manual.manticoresearch.com/Searching/KNN" rel="noopener noreferrer"&gt;KNN vector search reference&lt;/a&gt; - full syntax, parameters, quantization, rescoring&lt;/li&gt;
&lt;/ul&gt;

</description>
    </item>
    <item>
      <title>Manticore Search 27.1.5: Authentication, sharded tables, conversational search and faster vector search</title>
      <dc:creator>Sergey Nikolaev</dc:creator>
      <pubDate>Mon, 22 Jun 2026 02:19:45 +0000</pubDate>
      <link>https://dev.to/sanikolaev/manticore-search-2715-authentication-sharded-tables-conversational-search-and-faster-vector-204g</link>
      <guid>https://dev.to/sanikolaev/manticore-search-2715-authentication-sharded-tables-conversational-search-and-faster-vector-204g</guid>
      <description>&lt;p&gt;&lt;a href="https://manticoresearch.com/install/"&gt;Manticore Search 27.1.5&lt;/a&gt; has been released. This release brings built-in authentication and authorization, sharded tables, conversational search, faster HNSW builds, better faceting and aggregations, and a long list of fixes across KNN, replication, protocol compatibility, and other areas.&lt;/p&gt;

&lt;p&gt;This post is a catch-up for everything shipped from &lt;strong&gt;25.0.1 through 27.1.5&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Upgrade Notes
&lt;/h2&gt;

&lt;p&gt;Please review these before upgrading:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;27.0.0 adds built-in auth/authz, and enabling it changes access assumptions.&lt;/strong&gt; Auth is not enabled by default, but once you enable it, anonymous access no longer works. Roll it out in stages: upgrade remote agents and replication peers first, then upgrade the masters that query or manage them, and enable auth only after the whole topology is on the new version. Distributed remote-agent and replication-related operations also need matching stored auth data across the participating daemons. A successful &lt;code&gt;JOIN CLUSTER&lt;/code&gt; replaces the joining node's local auth data with the donor cluster's auth data. (&lt;a href="https://github.com/manticoresoftware/manticoresearch/issues/2833" rel="noopener noreferrer"&gt;Issue #2833&lt;/a&gt;, &lt;a href="https://github.com/manticoresoftware/manticoresearch/pull/3648" rel="noopener noreferrer"&gt;PR #3648&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;26.0.0 changed replication storage layout.&lt;/strong&gt; Incoming replicated tables now live under the normal &lt;a href="https://manual.manticoresearch.com/Server_settings/Searchd#data_dir" rel="noopener noreferrer"&gt;&lt;code&gt;data_dir/&amp;lt;table&amp;gt;&lt;/code&gt;&lt;/a&gt; layout instead of the cluster &lt;code&gt;path&lt;/code&gt;. If you run replication clusters with a custom &lt;code&gt;path&lt;/code&gt;, you may need to move or re-synchronize replicated tables after upgrade. Downgrade is only safe before the new layout is adopted. (&lt;a href="https://github.com/manticoresoftware/manticoresearch/issues/4431" rel="noopener noreferrer"&gt;Issue #4431&lt;/a&gt;, &lt;a href="https://github.com/manticoresoftware/manticoresearch/pull/4598" rel="noopener noreferrer"&gt;PR #4598&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;If you manage MCL separately from the daemon, upgrade it together with Manticore.&lt;/strong&gt; This release line moves through several &lt;a href="https://github.com/manticoresoftware/columnar" rel="noopener noreferrer"&gt;MCL&lt;/a&gt; updates, from vector-performance work to multithreaded HNSW builds and later stability fixes. Mixing an older library with a newer daemon is not recommended. (&lt;a href="https://github.com/manticoresoftware/manticoresearch/releases/tag/25.2.0" rel="noopener noreferrer"&gt;25.2.0&lt;/a&gt;, &lt;a href="https://github.com/manticoresoftware/manticoresearch/releases/tag/25.15.0" rel="noopener noreferrer"&gt;25.15.0&lt;/a&gt;, &lt;a href="https://github.com/manticoresoftware/manticoresearch/releases/tag/26.0.3" rel="noopener noreferrer"&gt;26.0.3&lt;/a&gt;, &lt;a href="https://github.com/manticoresoftware/manticoresearch/releases/tag/26.3.2" rel="noopener noreferrer"&gt;26.3.2&lt;/a&gt;, &lt;a href="https://github.com/manticoresoftware/manticoresearch/releases/tag/27.1.0" rel="noopener noreferrer"&gt;27.1.0&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Highlights
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Built-in authentication and authorization
&lt;/h3&gt;

&lt;p&gt;Manticore now supports &lt;a href="https://manual.manticoresearch.com/Security/Authentication_and_authorization" rel="noopener noreferrer"&gt;users, passwords, bearer tokens, and fine-grained permissions&lt;/a&gt; across MySQL, HTTP/HTTPS, distributed remote agents, and replication-related operations. This makes access control a first-class part of the product instead of something that always has to be handled outside the database.&lt;/p&gt;

&lt;h3&gt;
  
  
  Sharded tables
&lt;/h3&gt;

&lt;p&gt;Manticore can now create and manage &lt;a href="https://manual.manticoresearch.com/Creating_a_table/Creating_a_sharded_table/Creating_a_sharded_table" rel="noopener noreferrer"&gt;sharded tables&lt;/a&gt;, distribute inserts across shards, and handle more of the surrounding lifecycle in one place. That makes larger write-heavy deployments easier to operate and reduces the amount of sharding-specific logic that has to live outside the engine.&lt;/p&gt;

&lt;h3&gt;
  
  
  Conversational search
&lt;/h3&gt;

&lt;p&gt;This release adds &lt;a href="https://manual.manticoresearch.com/Searching/Conversational_search" rel="noopener noreferrer"&gt;conversational search&lt;/a&gt; to Manticore Search. It is exposed through &lt;a href="https://manual.manticoresearch.com/Searching/Conversational_search" rel="noopener noreferrer"&gt;&lt;code&gt;CREATE CHAT MODEL&lt;/code&gt;&lt;/a&gt; and &lt;a href="https://manual.manticoresearch.com/Searching/Conversational_search" rel="noopener noreferrer"&gt;&lt;code&gt;CALL CHAT&lt;/code&gt;&lt;/a&gt;, so you can ask questions over an existing vectorized table instead of building a separate retrieval layer around the same data.&lt;/p&gt;

&lt;p&gt;Under the hood, Manticore Search runs KNN on a &lt;code&gt;FLOAT_VECTOR&lt;/code&gt; field, builds LLM context from that field's &lt;code&gt;from='...'&lt;/code&gt; source columns, keeps conversation history by &lt;code&gt;conversation_uuid&lt;/code&gt;, and returns both the answer and the supporting &lt;code&gt;sources&lt;/code&gt;. If you already keep embeddings in Manticore, this makes document Q&amp;amp;A and support-style assistants much easier to wire up.&lt;/p&gt;

&lt;h3&gt;
  
  
  Faster vector builds and KNN improvements
&lt;/h3&gt;

&lt;p&gt;Vector search kept improving throughout this cycle.&lt;/p&gt;

&lt;p&gt;Manticore improved KNN performance, added local ONNX embeddings support, sped up ONNX inference, and then made HNSW build and rebuild work much faster with multithreaded index construction.&lt;/p&gt;

&lt;p&gt;A few important steps in that work:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/manticoresoftware/manticoresearch/releases/tag/25.1.0" rel="noopener noreferrer"&gt;25.1.0&lt;/a&gt; improved KNN distance calculation and AVX-512 loading.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/manticoresoftware/manticoresearch/releases/tag/25.2.0" rel="noopener noreferrer"&gt;25.2.0&lt;/a&gt; added local ONNX embeddings support in MCL and improved vector-search performance further.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/manticoresoftware/manticoresearch/releases/tag/25.14.0" rel="noopener noreferrer"&gt;25.14.0&lt;/a&gt; and &lt;a href="https://github.com/manticoresoftware/manticoresearch/releases/tag/25.15.0" rel="noopener noreferrer"&gt;25.15.0&lt;/a&gt; added multithreaded HNSW builds together with the required library support.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The biggest practical improvements here are much faster auto-embedding and shorter build and rebuild times for large vector tables. Initial KNN builds, chunk merges, and &lt;code&gt;ALTER TABLE ... REBUILD KNN&lt;/code&gt; are all affected.&lt;/p&gt;

&lt;h3&gt;
  
  
  Better faceting and aggregations
&lt;/h3&gt;

&lt;p&gt;Faceting and aggregations also became more useful.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://manual.manticoresearch.com/Searching/Faceted_search" rel="noopener noreferrer"&gt;&lt;code&gt;facet_filter_mode&lt;/code&gt;&lt;/a&gt; makes it easier to build e-commerce-style filters that preserve selected, available, and unavailable buckets under active filtering.&lt;/p&gt;

&lt;p&gt;On the analytics side:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://manual.manticoresearch.com/Functions/Date_and_time_functions#DATE_HISTOGRAM%28%29" rel="noopener noreferrer"&gt;&lt;code&gt;date_histogram()&lt;/code&gt;&lt;/a&gt; gained &lt;code&gt;time_zone&lt;/code&gt; and &lt;code&gt;offset&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;OpenSearch Dashboards support&lt;/li&gt;
&lt;li&gt;Manticore added statistical aggregations such as &lt;code&gt;percentiles&lt;/code&gt;, &lt;code&gt;percentile_ranks&lt;/code&gt;, and &lt;code&gt;mad&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Other Notable Improvements
&lt;/h2&gt;

&lt;p&gt;This release line also includes several smaller but useful additions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://manual.manticoresearch.com/Starting_the_server/Manually#searchd-command-line-options" rel="noopener noreferrer"&gt;&lt;code&gt;searchd --check&lt;/code&gt;&lt;/a&gt; validates configuration before startup without side effects.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://manual.manticoresearch.com/Creating_a_cluster/Setting_up_replication/Managing_replication_nodes#EXIT-CLUSTER" rel="noopener noreferrer"&gt;&lt;code&gt;EXIT CLUSTER&lt;/code&gt;&lt;/a&gt; lets a node leave a replication cluster online without restarting.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://manual.manticoresearch.com/Creating_a_table/NLP_and_tokenization/Low-level_tokenization#dict" rel="noopener noreferrer"&gt;&lt;code&gt;dict=keywords_32k&lt;/code&gt;&lt;/a&gt; makes it possible to index very long machine-generated tokens such as hashes and message IDs without silent truncation.&lt;/li&gt;
&lt;li&gt;The &lt;a href="https://manual.manticoresearch.com/Creating_a_table/NLP_and_tokenization/Morphology#morphology" rel="noopener noreferrer"&gt;built-in Ukrainian lemmatizer&lt;/a&gt; expands native morphology support for Ukrainian text search.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/manticoresoftware/manticoresearch/releases/tag/25.4.0" rel="noopener noreferrer"&gt;Systemd &lt;code&gt;Type=notify&lt;/code&gt;&lt;/a&gt; improves startup and shutdown supervision.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;searchd&lt;/code&gt; running under systemd now logs to the systemd journal.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;JOIN&lt;/code&gt; queries now support explicit left-table column prefixes.&lt;/li&gt;
&lt;li&gt;OpenSearch Dashboards support.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;manticore-load&lt;/code&gt; gained multi-query support.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Bug Fixes
&lt;/h2&gt;

&lt;p&gt;This release line also includes &lt;strong&gt;65 changelog-listed fixes&lt;/strong&gt;. The latest follow-up releases added a few more worth calling out:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;27.1.5 fixed a crash when fetching columnar &lt;code&gt;float_vector&lt;/code&gt; attributes.&lt;/li&gt;
&lt;li&gt;27.1.4 fixed &lt;code&gt;ALTER TABLE ... RECONFIGURE&lt;/code&gt; and &lt;code&gt;SHOW CREATE TABLE&lt;/code&gt; for one-way upgrades from &lt;code&gt;dict='keywords'&lt;/code&gt; to &lt;code&gt;dict=keywords_32k&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;27.1.3 updated Buddy to 4.0.1 and tightened Queue-plugin mutation permission handling under auth.&lt;/li&gt;
&lt;li&gt;KNN-by-&lt;code&gt;doc_id&lt;/code&gt; queries now preserve &lt;code&gt;offset&lt;/code&gt; and &lt;code&gt;max_matches&lt;/code&gt; correctly.&lt;/li&gt;
&lt;li&gt;KNN rescoring order was fixed, so explicit &lt;code&gt;ORDER BY&lt;/code&gt; tie-breakers work again.&lt;/li&gt;
&lt;li&gt;Hybrid fused queries with &lt;code&gt;GROUP BY&lt;/code&gt; on columnar tables stopped crashing.&lt;/li&gt;
&lt;li&gt;Replication and node-rejoin crash paths were cleaned up further.&lt;/li&gt;
&lt;li&gt;Binary MySQL protocol behavior was fixed in 25.12.1, which matters for integrations that expect real client compatibility.&lt;/li&gt;
&lt;li&gt;Fluent Bit bulk-ingest interoperability was fixed, preventing successful responses from being replayed as duplicate inserts.&lt;/li&gt;
&lt;li&gt;27.1.2 fixed &lt;code&gt;sql_attr_multi&lt;/code&gt; handling for plain indexes built from multiple &lt;code&gt;source&lt;/code&gt; blocks.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For the complete list, see the &lt;a href="https://manual.manticoresearch.com/Changelog" rel="noopener noreferrer"&gt;changelog&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Need help or want to connect?
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Join our &lt;a href="https://slack.manticoresearch.com" rel="noopener noreferrer"&gt;Slack&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Visit the &lt;a href="https://forum.manticoresearch.com" rel="noopener noreferrer"&gt;Forum&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Report issues or suggest features on &lt;a href="https://github.com/manticoresoftware/manticoresearch/issues" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Email us at &lt;code&gt;contact@manticoresearch.com&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>database</category>
      <category>news</category>
      <category>security</category>
    </item>
    <item>
      <title>The Evolution of 'More Like This'</title>
      <dc:creator>Sergey Nikolaev</dc:creator>
      <pubDate>Tue, 02 Jun 2026 04:01:20 +0000</pubDate>
      <link>https://dev.to/sanikolaev/the-evolution-of-more-like-this-5225</link>
      <guid>https://dev.to/sanikolaev/the-evolution-of-more-like-this-5225</guid>
      <description>&lt;p&gt;In many search scenarios, the user does not start from an empty query box, but from an existing result.&lt;/p&gt;

&lt;p&gt;A user opens an article and wants to find related material. A buyer views a product card and looks for close alternatives. A support engineer investigates an incident and wants to see earlier cases with the same symptoms. In all these situations, the user already has a relevant document to start from.&lt;/p&gt;

&lt;p&gt;This scenario is traditionally called &lt;strong&gt;More Like This (MLT)&lt;/strong&gt;: a function for finding documents similar to the selected one. In this article, MLT means search that starts from a known document, not from a newly typed query.&lt;/p&gt;

&lt;p&gt;The classic MLT approach, or similar-document search, was based on comparing textual matches. Modern implementations increasingly use embeddings: numerical representations of documents. A search index stores embeddings as vectors, and the search system can find documents with close vector representations.&lt;/p&gt;

&lt;h2&gt;
  
  
  Short glossary
&lt;/h2&gt;

&lt;p&gt;To avoid repeating definitions throughout the article, here are the main terms:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Term&lt;/th&gt;
&lt;th&gt;Meaning in this article&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;More Like This (MLT)&lt;/td&gt;
&lt;td&gt;search for documents similar to an already selected document&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;embedding&lt;/td&gt;
&lt;td&gt;a numerical representation of text, a product, an image, or another object&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;embedding vector&lt;/td&gt;
&lt;td&gt;an embedding as stored in the index: a fixed-length array of numbers used to find similar objects by vector proximity&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;KNN, nearest-neighbor search&lt;/td&gt;
&lt;td&gt;search for the k objects whose vectors are closest to a given vector&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ANN, approximate nearest neighbors&lt;/td&gt;
&lt;td&gt;approximate nearest-neighbor search; it speeds up KNN on large datasets without scanning every vector&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RAG, Retrieval-Augmented Generation&lt;/td&gt;
&lt;td&gt;an approach where the search system retrieves context for a generative model&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;hybrid search&lt;/td&gt;
&lt;td&gt;combining full-text search and vector search in one scenario&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;reranking&lt;/td&gt;
&lt;td&gt;an additional sorting step for already retrieved candidates using a more precise model or rule&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  What classic More Like This did
&lt;/h2&gt;

&lt;p&gt;Classic MLT was lexical. It answered a simple question: which documents use similar important words?&lt;/p&gt;

&lt;p&gt;The process usually looked like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The search system took the source document.&lt;/li&gt;
&lt;li&gt;It analyzed its text.&lt;/li&gt;
&lt;li&gt;It selected informative terms.&lt;/li&gt;
&lt;li&gt;It built a query from those terms.&lt;/li&gt;
&lt;li&gt;It searched for documents with a similar set of words.&lt;/li&gt;
&lt;li&gt;It returned a list of similar documents.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Internally, this used familiar full-text search mechanisms: TF-IDF or BM25, term frequency, stopwords, field boosts, and document-frequency limits. That is why older MLT implementations exposed parameters such as &lt;code&gt;min_term_freq&lt;/code&gt;, &lt;code&gt;min_doc_freq&lt;/code&gt;, &lt;code&gt;max_doc_freq&lt;/code&gt;, and &lt;code&gt;max_query_terms&lt;/code&gt;.&lt;/p&gt;
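
&lt;p&gt;The steps above can be sketched in a few lines. This is a deliberately simplified illustration, not how any particular engine implements MLT: the whitespace tokenizer, the TF-IDF formula, and the helper names &lt;code&gt;mlt_query_terms&lt;/code&gt; and &lt;code&gt;more_like_this&lt;/code&gt; are assumptions made for the example, and only three of the parameters mentioned above are modeled.&lt;/p&gt;

```python
import math
from collections import Counter


def tokenize(text):
    """Naive analyzer: lowercase and split on whitespace (an assumption)."""
    return text.lower().split()


def mlt_query_terms(source, corpus, min_term_freq=1, min_doc_freq=1, max_query_terms=5):
    """Select the most informative terms of the source document by TF-IDF."""
    doc_tokens = [set(tokenize(doc)) for doc in corpus]
    scored = []
    for term, tf in Counter(tokenize(source)).items():
        # Document frequency: how many corpus documents contain the term.
        df = sum(term in tokens for tokens in doc_tokens)
        if tf >= min_term_freq and df >= max(min_doc_freq, 1):
            idf = math.log(len(corpus) / df) + 1.0
            scored.append((tf * idf, term))
    scored.sort(reverse=True)
    return [term for _, term in scored[:max_query_terms]]


def more_like_this(source, corpus, **params):
    """Rank other documents by how many selected terms they share with the source."""
    terms = set(mlt_query_terms(source, corpus, **params))
    ranked = sorted(corpus, key=lambda doc: len(terms.intersection(tokenize(doc))), reverse=True)
    return [doc for doc in ranked if doc != source and terms.intersection(tokenize(doc))]


corpus = [
    "payment service crashes with ERR_404 after deploy",
    "checkout returns ERR_404 for guest users",
    "memory leak in image resize worker",
    "unbounded heap growth in thumbnail worker",
]
similar = more_like_this(corpus[0], corpus, min_doc_freq=2)
print(similar)
```

&lt;p&gt;Setting &lt;code&gt;min_doc_freq=2&lt;/code&gt; drops terms that occur only in the source document, which cannot match anything else. In this tiny corpus that leaves the shared error code as the only query term.&lt;/p&gt;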

&lt;p&gt;This was not just an interface element, but a full search mechanism. MLT was used for related articles and products, duplicate detection, support-ticket matching, legal search, patent research, and internal knowledge bases.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where the lexical approach is still strong
&lt;/h2&gt;

&lt;p&gt;Lexical MLT works well when specific words, identifiers, and stable formulations matter.&lt;/p&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;error codes;&lt;/li&gt;
&lt;li&gt;product SKUs;&lt;/li&gt;
&lt;li&gt;part numbers;&lt;/li&gt;
&lt;li&gt;function names;&lt;/li&gt;
&lt;li&gt;stack traces;&lt;/li&gt;
&lt;li&gt;legal wording;&lt;/li&gt;
&lt;li&gt;nearly identical product or ticket descriptions.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The reason is that exact matching is critical here. If two incident reports contain the same error code or the same stack trace, full-text search sees a direct match. For example, when searching tickets with the code &lt;code&gt;ERR_404&lt;/code&gt;, lexical MLT quickly finds every mention of that code, while vector search may return tickets that describe similar but not identical problems.&lt;/p&gt;

&lt;p&gt;Lexical MLT had another advantage: it was cheap to run. The inverted index is already in the search engine. The analyzers are already configured. Ranking already works. There is no need to deploy separate search infrastructure just to support a “find similar” feature.&lt;/p&gt;

&lt;p&gt;The limitation is also clear. If two documents describe the same thing in different words, lexical MLT may fail to connect them. Synonyms work unevenly. Paraphrases are harder. Cross-lingual similarity is usually unavailable. For example, &lt;code&gt;memory leak&lt;/code&gt; and &lt;code&gt;unbounded heap growth&lt;/code&gt; may describe the same problem, but a standard analyzer sees different tokens.&lt;/p&gt;

&lt;p&gt;Lexical MLT efficiently finds documents with matching or similar wording. Semantic search helps when the meaning matches, not the words.&lt;/p&gt;

&lt;h2&gt;
  
  
  What embeddings change
&lt;/h2&gt;

&lt;p&gt;Using &lt;a href="https://manticoresearch.com/blog/vector-search-deep-dive/" rel="noopener noreferrer"&gt;embeddings&lt;/a&gt; — numerical representations of documents — changes the comparison principle: instead of words, the system compares vector representations.&lt;/p&gt;

&lt;p&gt;A document no longer has to be represented only as a set of weighted terms. It can be stored as a dense vector. Nearby vectors usually correspond to documents that are similar in meaning, even if they are written in different words.&lt;/p&gt;

&lt;p&gt;The lexical approach looks for matches by words and terms, while embedding search looks at the proximity of document vector representations. The first approach is optimal for exact matches such as error codes and SKUs. The second finds semantically close documents, even when they are phrased differently.&lt;/p&gt;

&lt;p&gt;This expands the scope of this kind of search. You can compare not only articles, but also products, images, code fragments, user events, or context fragments in a RAG system. In RAG, the search system first retrieves relevant context, and then the generative model uses that context to produce an answer.&lt;/p&gt;

&lt;p&gt;Lexical search does not disappear. Exact error codes, SKUs, names, statute references, and near duplicates are still better handled lexically. That is why production systems often use &lt;a href="https://manticoresearch.com/blog/hybrid-search/" rel="noopener noreferrer"&gt;hybrid search&lt;/a&gt;: full-text search provides exact matches, vector search adds results by meaning, filters constrain the search space, and reranking refines the final order.&lt;/p&gt;
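
&lt;p&gt;One common way to merge a full-text ranking with a vector ranking is reciprocal rank fusion (RRF). The sketch below is a generic illustration of that technique, not a description of how any specific engine fuses results. The function name, the document IDs, and the constant &lt;code&gt;k=60&lt;/code&gt; are assumptions chosen for the example.&lt;/p&gt;

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one list.

    Each document scores 1 / (k + rank) per list it appears in,
    so documents ranked well by several lists rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


lexical_hits = [7, 3, 9]   # hypothetical full-text ranking
vector_hits = [3, 5, 7]    # hypothetical vector ranking
fused = reciprocal_rank_fusion([lexical_hits, vector_hits])
print(fused)
```

&lt;p&gt;Documents 3 and 7 appear in both lists, so they end up ahead of documents that only one method found.&lt;/p&gt;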

&lt;p&gt;As shown in our &lt;a href="https://manticoresearch.com/blog/lexical-search-vs-vector-search/" rel="noopener noreferrer"&gt;comparison of lexical and vector search&lt;/a&gt;, lexical search wins on strict exact matches, while vector search improves coverage of semantic relationships.&lt;/p&gt;

&lt;h2&gt;
  
  
  MLT as lookup by a vector from the index
&lt;/h2&gt;

&lt;p&gt;If a vector representation has already been computed for a document and stored in the index, modern MLT reduces to four steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Take the source document.&lt;/li&gt;
&lt;li&gt;Retrieve its precomputed vector representation from the index.&lt;/li&gt;
&lt;li&gt;Find the nearest vectors.&lt;/li&gt;
&lt;li&gt;Return the documents those vectors belong to.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is still More Like This: the user starts from one document and gets related results. Only the comparison method changes. Instead of extracting terms, the search system uses the vector representation of the source document.&lt;/p&gt;

&lt;p&gt;In Manticore Search, this operation can be performed directly at the search-engine level: the query specifies the ID of the source document, and Manticore takes its embedding vector from the index and runs KNN search. The application does not need to fetch the vector separately, serialize hundreds or thousands of numbers, and send them back in a second request.&lt;/p&gt;

&lt;p&gt;A minimal SQL example looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;knn_dist&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;knn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;123&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, &lt;code&gt;embedding&lt;/code&gt; is the field with the precomputed embedding vector, &lt;code&gt;123&lt;/code&gt; is the ID of the source document, and &lt;code&gt;10&lt;/code&gt; is the number of nearest documents to return. The &lt;code&gt;knn_dist()&lt;/code&gt; function returns the distance between vectors: a smaller value means greater semantic proximity to the source document. The same operation can be performed through the HTTP JSON API; the search logic does not change. The application passes the document ID, and Manticore performs the lookup using that document’s vector from the index.&lt;/p&gt;
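
&lt;p&gt;For the HTTP JSON API, the request body could be built roughly like this. The key names (&lt;code&gt;table&lt;/code&gt;, &lt;code&gt;knn&lt;/code&gt;, &lt;code&gt;field&lt;/code&gt;, &lt;code&gt;doc_id&lt;/code&gt;, &lt;code&gt;k&lt;/code&gt;) are an assumption about the request shape and should be checked against the KNN reference for your version. The helper &lt;code&gt;mlt_knn_request&lt;/code&gt; is hypothetical and only assembles the body without sending it.&lt;/p&gt;

```python
import json


def mlt_knn_request(table, field, doc_id, k):
    """Build a JSON search body asking for the k nearest neighbors of doc_id."""
    # Assumed request shape; confirm the key names against the KNN reference.
    return {
        "table": table,
        "knn": {"field": field, "doc_id": doc_id, "k": k},
    }


body = mlt_knn_request("products", "embedding", 123, 10)
print(json.dumps(body, indent=2))
```

&lt;p&gt;As in the SQL version, the application sends only the document ID; no vector crosses the wire.&lt;/p&gt;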

&lt;p&gt;For large datasets, KNN is usually implemented through an ANN index. This speeds up search through approximate computation and avoids scanning every vector. For the user, the important part is not the internal structure of the index, but the result: quickly finding documents that are close to the source in meaning.&lt;/p&gt;
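
&lt;p&gt;To see what an ANN index approximates, here is exact KNN by brute force: compute the distance from the source vector to every other vector and keep the closest ones. The data and the helper names are made up for illustration, and Euclidean distance is an assumption; a real index would use whichever metric the table is configured with.&lt;/p&gt;

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_knn(vectors, source_id, k):
    """Return the IDs of the k vectors closest to the source document's vector."""
    source = vectors[source_id]
    candidates = [
        (l2_distance(source, vec), doc_id)
        for doc_id, vec in vectors.items()
        if doc_id != source_id
    ]
    # Scans every vector: simple and exact, but linear in the size of the dataset.
    return [doc_id for _, doc_id in sorted(candidates)[:k]]


vectors = {1: [0.0, 0.0], 2: [0.1, 0.0], 3: [1.0, 1.0], 4: [0.0, 0.2]}
neighbors = exact_knn(vectors, source_id=1, k=2)
print(neighbors)
```

&lt;p&gt;An ANN index aims to return the same neighbors most of the time while visiting only a small fraction of the vectors.&lt;/p&gt;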

&lt;h2&gt;
  
  
  Why search is better handled in the engine
&lt;/h2&gt;

&lt;p&gt;You can implement this scenario in the application: first fetch the document, then extract its vector, then send a separate KNN query, and then combine the result with filters.&lt;/p&gt;

&lt;p&gt;That approach makes the system architecture more complex. The application has to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;pass the vector between services;&lt;/li&gt;
&lt;li&gt;keep vectors from being logged accidentally;&lt;/li&gt;
&lt;li&gt;check the embedding model version;&lt;/li&gt;
&lt;li&gt;keep data synchronized with the main index;&lt;/li&gt;
&lt;li&gt;apply the same filters used in normal search.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When the search system performs the lookup itself, the path is shorter:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The application passes the ID of the source document.&lt;/li&gt;
&lt;li&gt;The search system finds the precomputed vector representation in the index.&lt;/li&gt;
&lt;li&gt;The search system runs nearest-neighbor search (KNN) or its approximate variant (ANN).&lt;/li&gt;
&lt;li&gt;The search system returns the found documents with the same access filters and metadata.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Benefits of this approach:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;fewer inter-service requests from the application;&lt;/li&gt;
&lt;li&gt;large vectors do not have to be sent through external APIs;&lt;/li&gt;
&lt;li&gt;filters stay close to search;&lt;/li&gt;
&lt;li&gt;the result is easier to reproduce and debug;&lt;/li&gt;
&lt;li&gt;the application does not need an additional layer for similarity calculation — everything runs inside the search engine.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This will not fix poor embeddings or remove the need to tune ranking. But it reduces the number of interacting components in the search chain, which makes the system easier to maintain.&lt;/p&gt;

&lt;h2&gt;
  
  
  Practical examples and the evolution of MLT
&lt;/h2&gt;

&lt;p&gt;Search from an existing object is especially useful when the user has already found a relevant starting point.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Source object&lt;/th&gt;
&lt;th&gt;What to find&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Support&lt;/td&gt;
&lt;td&gt;ticket with an error&lt;/td&gt;
&lt;td&gt;past tickets with similar symptoms and related fixes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Catalog&lt;/td&gt;
&lt;td&gt;product card&lt;/td&gt;
&lt;td&gt;close alternatives, similar models, or products from the same category&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RAG&lt;/td&gt;
&lt;td&gt;relevant fragment already found by the first search&lt;/td&gt;
&lt;td&gt;context expansion: neighboring sections of the same document, related documentation fragments, or similar discussions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Developer tools&lt;/td&gt;
&lt;td&gt;stack trace, diff, or bug description&lt;/td&gt;
&lt;td&gt;related code changes, discussions, and past incidents&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;In these examples, there is no need to type a new query manually. The system uses the source object as a reference point and finds documents similar to it lexically, semantically, or by both criteria.&lt;/p&gt;

&lt;p&gt;In the context of RAG, this is not about the primary search by the user’s query, but about subsequent context selection: the system has already found a relevant fragment and uses it as the reference object to collect surrounding context. This is useful when one fragment is too narrow: nearby content may include a term definition, a configuration example, a related discussion, or a neighboring section of the same guide.&lt;/p&gt;

&lt;p&gt;In systems with personalization or AI agents, it is important to clearly define which data is used for search: the system may consider the user’s search-query history, the context of previous interactions, or saved working notes. This makes it clear which data participates in retrieval and why the result is considered similar.&lt;/p&gt;

&lt;p&gt;The evolution of MLT can be described like this:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Period&lt;/th&gt;
&lt;th&gt;What changed&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;2000s&lt;/td&gt;
&lt;td&gt;MLT mostly relied on lexical analysis, TF-IDF, BM25, and term overlap.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2010s&lt;/td&gt;
&lt;td&gt;Word2Vec and GloVe appeared and became widely used, making it possible to build semantic embeddings of words and texts.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Late 2010s to early 2020s&lt;/td&gt;
&lt;td&gt;FAISS and similar ANN libraries made it possible to run vector search efficiently even on very large datasets.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mid-2020s&lt;/td&gt;
&lt;td&gt;RAG, recommendations, and search from an existing object made lookup by stored vectors a common product scenario.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The evolution of MLT is a shift from lexical comparison to matching document vector representations. But the practical request stayed the same: find documents relevant to the source result.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to keep in mind
&lt;/h2&gt;

&lt;p&gt;Semantic MLT does not replace all search engineering.&lt;/p&gt;

&lt;p&gt;Production systems still need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;exact search for identifiers, error codes, and other strict matches;&lt;/li&gt;
&lt;li&gt;embedding model metadata and versioning;&lt;/li&gt;
&lt;li&gt;ACL filters: rules for document access by roles or users;&lt;/li&gt;
&lt;li&gt;tenant filters: data isolation between customers or workspaces;&lt;/li&gt;
&lt;li&gt;hybrid search when both meaning and exact matches matter;&lt;/li&gt;
&lt;li&gt;reranking when result order is critical;&lt;/li&gt;
&lt;li&gt;search-quality monitoring: precision and recall metrics, false-positive frequency, and missed relevant documents caused by ANN-index approximation errors.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Lexical MLT can miss documents that use different words. Vector search sometimes returns overly broad results (false positives) and can miss relevant documents because of the approximate nature of ANN indexes. That is why the quality of this kind of search should be evaluated on real queries and real data.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;More Like This has moved from purely lexical search to hybrid solutions that combine lexical, vector, and filtering mechanisms.&lt;/p&gt;

&lt;p&gt;The core concept remains the same: the user selects a source document, and the system finds materials relevant to it, taking both lexical and semantic similarity into account.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>algorithms</category>
      <category>machinelearning</category>
      <category>nlp</category>
    </item>
    <item>
      <title>KNN early termination in Manticore Search</title>
      <dc:creator>Sergey Nikolaev</dc:creator>
      <pubDate>Mon, 01 Jun 2026 09:51:55 +0000</pubDate>
      <link>https://dev.to/sanikolaev/knn-early-termination-in-manticore-search-4fa5</link>
      <guid>https://dev.to/sanikolaev/knn-early-termination-in-manticore-search-4fa5</guid>
      <description>&lt;p&gt;Modern search engines do more than match keywords. When you search for "cozy mystery set in Paris" and get results for "atmospheric detective novel in France", that's vector search at work: documents and queries are converted into lists of numbers, called embeddings, and the search engine finds the documents whose numbers are closest to the query's.&lt;/p&gt;

&lt;p&gt;Manticore Search supports this natively. Under the hood, it uses a data structure called HNSW: a graph that connects nearby vectors, so it can find nearest neighbors quickly without scanning every document. That makes vector search fast enough to run on millions of documents in milliseconds.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;But HNSW has an inefficiency. Early in the traversal, almost every distance computation finds a better candidate than the ones already in the result set.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;As the search goes on, those improvements become rarer, but the algorithm keeps traversing the graph until it exhausts its exploration budget. By that point, the result set has often already converged, and the remaining work does little or nothing to improve it. Early termination fixes this by detecting that point and stopping early.&lt;/p&gt;

&lt;p&gt;The effect becomes more noticeable as &lt;code&gt;k&lt;/code&gt; grows, where &lt;code&gt;k&lt;/code&gt; is the number of nearest neighbors the query asks Manticore to return. Returning more neighbors requires more graph exploration, and much of that extra work happens after the result set has already stabilized. That also makes early termination more valuable, because it has more unnecessary work to cut.&lt;/p&gt;

&lt;p&gt;This gets more pronounced with &lt;a href="https://manual.manticoresearch.com/Searching/KNN#Vector-quantization" rel="noopener noreferrer"&gt;vector quantization&lt;/a&gt;. Quantization compresses stored vectors to save memory, which slightly lowers search precision. To recover it, Manticore uses &lt;a href="https://manual.manticoresearch.com/Searching/KNN#KNN-vector-search" rel="noopener noreferrer"&gt;oversampling&lt;/a&gt;: by default it fetches 3x as many candidates as requested, then rescores them using the original full-precision vectors, so HNSW explores many more candidates per query. Large &lt;code&gt;k&lt;/code&gt; values often come from this kind of candidate expansion: an application may ask the vector index for hundreds or thousands of candidates, then rescore, rerank, or filter them down to a much smaller final result set to improve recall and precision. That raises latency, and early termination helps win some of it back.&lt;/p&gt;

&lt;p&gt;The waste is measurable. Benchmarks on a 1M-vector dataset show that at &lt;code&gt;k=60&lt;/code&gt;, which is what the default result limit of 20 becomes under the default 3x oversampling, early termination reduces distance computations to about 65% of the full search. At &lt;code&gt;k=1000&lt;/code&gt;, they drop to about 30% of the full search, and at &lt;code&gt;k=10000&lt;/code&gt;, to about 20%. The search converges long before the exploration budget runs out, and the savings grow with &lt;code&gt;k&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Early termination lets Manticore detect this convergence and stop. The algorithm was designed with a specific precision target: keep the loss of result-set precision within 2–4% compared to a full HNSW search.&lt;/p&gt;

&lt;h2&gt;
  
  
  How it works
&lt;/h2&gt;

&lt;p&gt;The algorithm tracks a simple signal, the discovery rate: the fraction of distance computations that actually improve the result set.&lt;/p&gt;

&lt;p&gt;Each time a new node's distance is computed, one of two things happens: either it's good enough to enter the heap (the priority queue that holds the current best candidate neighbors), or it's worse than everything already there and gets discarded. Entering the heap counts as a "discovery." Early in the search, discoveries are frequent: the heap is filling up and most candidates are useful. As the search progresses and the heap saturates with good results, discoveries become rare. Most new distance computations just confirm that the algorithm has already found the best candidates.&lt;/p&gt;

&lt;p&gt;Manticore monitors this transition. After each round of neighbor expansion, it computes the discovery rate:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;discovery_rate &lt;span class="o"&gt;=&lt;/span&gt; new_candidates_collected / distances_computed
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If this rate stays below a threshold for several rounds in a row, the search stops. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The idea is simple: if the algorithm keeps computing distances but nothing improves the result, the search has converged.&lt;/p&gt;
&lt;/blockquote&gt;
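
&lt;p&gt;To make the bookkeeping concrete, here is a minimal Python sketch of one expansion round, assuming a bounded heap of size &lt;code&gt;ef&lt;/code&gt; that keeps the closest candidates. The function name &lt;code&gt;expand_round&lt;/code&gt; and the sample distances are hypothetical; this illustrates the discovery-rate idea and is not Manticore's actual implementation.&lt;/p&gt;

```python
import heapq


def expand_round(heap, ef, distances):
    """Offer one round of freshly computed distances to a bounded result heap.

    `heap` stores negated distances, so -heap[0] is the current worst result.
    Returns the round's discovery rate: accepted candidates / distances computed.
    """
    discoveries = 0
    for d in distances:
        if ef > len(heap):
            heapq.heappush(heap, -d)       # heap still filling: always a discovery
            discoveries += 1
        elif -heap[0] > d:
            heapq.heapreplace(heap, -d)    # closer than the current worst: replace it
            discoveries += 1
        # otherwise the candidate is discarded; it only cost a distance computation
    return discoveries / len(distances) if distances else 0.0


# Hypothetical distances for two rounds with ef = 4.
heap = []
early = expand_round(heap, 4, [0.9, 0.8, 0.7, 0.6])
late = expand_round(heap, 4, [0.95, 0.85, 0.5, 0.99])
print(early, late)
```

&lt;p&gt;The first round fills the heap, so every computation is a discovery (rate 1.0). In the second round only two of the four candidates beat the current worst result, so the rate falls to 0.5.&lt;/p&gt;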

&lt;h2&gt;
  
  
  The threshold: quantile-based adaptation
&lt;/h2&gt;

&lt;p&gt;That raises the obvious next question: what threshold should count as "low"? A fixed threshold wouldn't work well, because different datasets, and different regions of the same dataset, have wildly different discovery rate distributions. What counts as "low" depends on context.&lt;/p&gt;

&lt;p&gt;Manticore uses a quantile-based adaptive threshold. Instead of comparing the discovery rate against a fixed number, it continuously estimates a low percentile of the discovery rates seen in recent rounds (the 20th percentile, or the 14th for L2 distance) and uses that as the baseline. This keeps the method lightweight while letting it adapt to different datasets and different regions of the graph.&lt;/p&gt;

&lt;p&gt;In other words, the threshold adapts to the local search pattern. If the algorithm enters a sparse region of the graph, the threshold drops and avoids stopping too early. If it enters a richer region, the threshold rises.&lt;/p&gt;
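
&lt;p&gt;One way to picture an adaptive threshold is a sliding window over recent discovery rates with a low quantile read from it. The sketch below makes that assumption explicit: the class name &lt;code&gt;QuantileThreshold&lt;/code&gt;, the window size, and the sample rates are all hypothetical, and Manticore's real estimator may work differently (for example, with a streaming quantile estimate instead of a sorted window).&lt;/p&gt;

```python
from collections import deque


class QuantileThreshold:
    """Sliding-window estimate of a low quantile of recent discovery rates."""

    def __init__(self, quantile=0.20, window=8):
        self.quantile = quantile
        self.rates = deque(maxlen=window)   # old rounds fall out automatically

    def update(self, rate):
        self.rates.append(rate)

    def value(self):
        if not self.rates:
            return 0.0
        ordered = sorted(self.rates)
        index = int(self.quantile * (len(ordered) - 1))
        return ordered[index]


threshold = QuantileThreshold(quantile=0.20, window=8)

# Hypothetical rates from a "rich" region of the graph.
for rate in [0.50, 0.60, 0.40, 0.70, 0.55, 0.65, 0.45, 0.60]:
    threshold.update(rate)
rich = threshold.value()

# Hypothetical rates from a "sparse" region push the old rounds out of the window.
for rate in [0.05, 0.10, 0.02, 0.08, 0.04, 0.06, 0.03, 0.07]:
    threshold.update(rate)
sparse = threshold.value()

print(rich, sparse)
```

&lt;p&gt;With these sample numbers the threshold is 0.45 in the rich region and drops to 0.03 once the window contains only sparse-region rounds, which is the adaptive behavior described above.&lt;/p&gt;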

&lt;h2&gt;
  
  
  Patience: how many bad rounds before stopping
&lt;/h2&gt;

&lt;p&gt;The threshold alone does not decide when to stop. A single round with a low discovery rate could just be a temporary dip before the search finds a better path, so Manticore uses a "patience counter" that requires multiple consecutive bad rounds before terminating.&lt;/p&gt;

&lt;p&gt;The patience value scales inversely with &lt;code&gt;ef&lt;/code&gt;, the HNSW exploration factor that controls how many candidates the search keeps exploring. For example, patience ranges from 9 at low &lt;code&gt;ef&lt;/code&gt; values down to 6 at very high &lt;code&gt;ef&lt;/code&gt;. Larger &lt;code&gt;ef&lt;/code&gt; values mean more total rounds, so even with lower patience the algorithm has seen more evidence before deciding to stop. The counter resets to zero whenever a round has a healthy discovery rate, so a single good round restarts the patience window. This prevents the algorithm from stopping during a temporary plateau that leads to a productive region of the graph.&lt;/p&gt;
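
&lt;p&gt;The reset behavior is easy to get wrong, so here is a small Python sketch of a patience counter. The class name &lt;code&gt;PatienceCounter&lt;/code&gt;, the patience value of 3, and the sample rates are hypothetical and chosen only to keep the trace short; how Manticore maps &lt;code&gt;ef&lt;/code&gt; to a patience value is not modeled here.&lt;/p&gt;

```python
class PatienceCounter:
    """Signals a stop only after `patience` consecutive low-discovery rounds."""

    def __init__(self, patience):
        self.patience = patience
        self.bad_rounds = 0

    def observe(self, rate, threshold):
        if threshold > rate:
            self.bad_rounds += 1    # another round below the threshold
        else:
            self.bad_rounds = 0     # one healthy round restarts the window
        return self.bad_rounds >= self.patience


counter = PatienceCounter(patience=3)
rates = [0.01, 0.02, 0.30, 0.01, 0.01, 0.01]   # hypothetical per-round rates
decisions = [counter.observe(rate, threshold=0.05) for rate in rates]
print(decisions)
```

&lt;p&gt;The healthy third round (0.30) wipes out the two bad rounds before it, so the stop signal only fires on the sixth round, after three uninterrupted bad rounds.&lt;/p&gt;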

&lt;h2&gt;
  
  
  Warm-up phase
&lt;/h2&gt;

&lt;p&gt;The algorithm ignores the termination signal while the heap is still filling up, meaning fewer than &lt;code&gt;ef&lt;/code&gt; candidates have been collected. During this phase, discovery rates are artificially high because almost everything enters the heap, so the signal is not useful. Early termination only starts once the heap is full and new candidates must replace existing ones.&lt;/p&gt;
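
&lt;p&gt;The warm-up gate can be sketched as a guard in front of the patience logic. In this simplified Python example the threshold is a fixed number (an assumption made to keep the example short), each round is a hypothetical &lt;code&gt;(heap_size, discovery_rate)&lt;/code&gt; pair, and &lt;code&gt;first_stop_round&lt;/code&gt; is an illustrative name rather than anything from Manticore's code.&lt;/p&gt;

```python
def first_stop_round(rounds, ef, threshold, patience):
    """Return the index of the round where the search would stop, or None.

    `rounds` is a list of (heap_size, discovery_rate) pairs, one per round.
    """
    bad_rounds = 0
    for index, (heap_size, rate) in enumerate(rounds):
        if ef > heap_size:
            continue   # warm-up: the heap is not full yet, so the signal is ignored
        bad_rounds = bad_rounds + 1 if threshold > rate else 0
        if bad_rounds >= patience:
            return index
    return None


# Hypothetical trace with ef = 4: one warm-up round, then a dip, a recovery,
# and finally three consecutive low-discovery rounds.
trace = [(2, 1.0), (4, 0.9), (4, 0.02), (4, 0.01),
         (4, 0.6), (4, 0.01), (4, 0.02), (4, 0.0)]
print(first_stop_round(trace, ef=4, threshold=0.05, patience=3))

# While the heap is still filling, low rates never trigger a stop.
print(first_stop_round([(1, 0.0), (2, 0.0), (3, 0.0)], ef=4, threshold=0.05, patience=2))
```

&lt;p&gt;The first call stops at round index 7, and the second returns &lt;code&gt;None&lt;/code&gt; because every round falls inside the warm-up phase.&lt;/p&gt;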

&lt;h2&gt;
  
  
  Benchmark results
&lt;/h2&gt;

&lt;p&gt;The quantile thresholds were tuned to keep precision loss within 2–4%. They were tuned separately for L2 and cosine/IP distance metrics, and validated across both &lt;a href="https://manual.manticoresearch.com/Searching/KNN#Vector-quantization" rel="noopener noreferrer"&gt;quantized and non-quantized&lt;/a&gt; data.&lt;/p&gt;

&lt;p&gt;The following benchmarks were run on the &lt;a href="https://huggingface.co/datasets/KShivendu/dbpedia-entities-openai-1M" rel="noopener noreferrer"&gt;dbpedia-entities&lt;/a&gt; dataset (1M vectors, 768 dimensions) on a machine with 8 physical cores / 16 logical cores.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Precision" here means the fraction of true k-nearest neighbors that appear in the result set (with fixed k, this is the same as recall@k).&lt;/li&gt;
&lt;li&gt;"Precision ratio" is the precision of HNSW with early termination ("ET") divided by precision without it (1.0 means no precision loss). &lt;/li&gt;
&lt;li&gt;"Visit ratio" is the fraction of distance computations performed compared to full HNSW search (lower is better). &lt;/li&gt;
&lt;/ul&gt;
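
&lt;p&gt;For readers who want to reproduce these metrics on their own data, the three definitions can be written as a short Python sketch. The function names and all ids and counts below are made up for illustration and are not taken from the benchmark.&lt;/p&gt;

```python
def precision_at_k(true_neighbors, returned):
    """Fraction of the true k nearest neighbors present in the returned set."""
    truth = set(true_neighbors)
    return len(truth.intersection(returned)) / len(truth)


def ratio(with_et, without_et):
    """Generic 'early termination / full search' ratio."""
    return with_et / without_et


# Hypothetical ground truth and result sets for k = 10.
truth = list(range(1, 11))
full_search = [1, 2, 3, 4, 5, 6, 7, 8, 9, 42]     # misses one true neighbor
with_et = [1, 2, 3, 4, 5, 6, 7, 8, 42, 43]        # misses two true neighbors

p_full = precision_at_k(truth, full_search)
p_et = precision_at_k(truth, with_et)
precision_ratio = ratio(p_et, p_full)

# Hypothetical distance-computation counts for the same query.
visit_ratio = ratio(290, 1000)

print(p_full, p_et, round(precision_ratio, 3), visit_ratio)
```

&lt;p&gt;A precision ratio close to 1.0 together with a low visit ratio is the outcome early termination aims for: nearly the same result set for a fraction of the distance computations.&lt;/p&gt;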

&lt;p&gt;&lt;a href="https://manual.manticoresearch.com/Searching/KNN#KNN-vector-search" rel="noopener noreferrer"&gt;Oversampling and rescoring&lt;/a&gt; were disabled to isolate the effect of early termination on raw HNSW traversal.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh7unr6rkgltv8ws2y2sr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh7unr6rkgltv8ws2y2sr.png" alt=" " width="800" height="461"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The green line on the chart (precision) stays almost flat across all &lt;code&gt;k&lt;/code&gt; values, with the precision ratio remaining above 0.97 throughout the benchmark. Meanwhile, the orange line (visit ratio) drops steeply. At &lt;code&gt;k=100&lt;/code&gt;, early termination cuts distance computations nearly in half. At &lt;code&gt;k=1000&lt;/code&gt;, it saves about 70%, and at &lt;code&gt;k=10000&lt;/code&gt;, about 80%.&lt;/p&gt;

&lt;p&gt;At &lt;code&gt;k &amp;lt;= 10&lt;/code&gt;, early termination is disabled because the search is already cheap and the savings are too small to justify any precision loss. The savings grow with &lt;code&gt;k&lt;/code&gt;, because larger result sets lead to more rounds of neighbor expansion and more chances to detect convergence early.&lt;/p&gt;

&lt;h2&gt;
  
  
  Performance under concurrent load
&lt;/h2&gt;

&lt;p&gt;The benchmarks above show that early termination cuts distance computations a lot while preserving precision. But what does that mean for actual query latency, especially under concurrent load? The chart below shows latency ratios (ET / no ET) at 1, 8, and 16 concurrent threads on the same dbpedia dataset:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx61vv2acc8ytv5jd7dsa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx61vv2acc8ytv5jd7dsa.png" alt=" " width="800" height="461"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;At &lt;code&gt;k=1000&lt;/code&gt;, early termination reduces distance computations by 71% (ratio 0.29). The latency improvement depends on how many threads are running at the same time:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;1 thread:&lt;/strong&gt; 24% faster (ratio 0.76)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;8 threads:&lt;/strong&gt; 45% faster (ratio 0.55)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;16 threads:&lt;/strong&gt; 48% faster (ratio 0.52)&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;The distance computation savings stay the same regardless of thread count, but the latency benefit nearly doubles from 1 to 16 threads.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The main reason is lower pressure on the CPU memory system. Each distance computation pulls vector data and graph links into cache. When several threads run HNSW traversal at the same time, they compete for shared cache and memory bandwidth. Doing fewer distance computations per query reduces memory traffic, keeps each thread’s working set smaller, and lowers cache churn between queries. As a result, each thread finishes faster and interferes less with the others.&lt;/p&gt;

&lt;p&gt;Single-thread benchmarks understate the benefit of early termination. Under production-like concurrent load, the percentage latency reduction is roughly twice as large.&lt;/p&gt;
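
&lt;p&gt;The latency ratios above can also be read as speedup factors. The short Python sketch below does that arithmetic on the three ratios reported for &lt;code&gt;k=1000&lt;/code&gt;; the helper name &lt;code&gt;latency_summary&lt;/code&gt; is hypothetical, and the speedup figures are derived from the ratios rather than measured separately.&lt;/p&gt;

```python
def latency_summary(ratio):
    """Turn a latency ratio (ET / no ET) into percent saved and a speedup factor."""
    percent_saved = round((1 - ratio) * 100)
    speedup = round(1 / ratio, 2)
    return percent_saved, speedup


# Latency ratios reported above for k=1000.
for threads, ratio in [(1, 0.76), (8, 0.55), (16, 0.52)]:
    percent_saved, speedup = latency_summary(ratio)
    print(threads, percent_saved, speedup)
```

&lt;p&gt;Read this way, a single thread answers queries about 1.32x faster with early termination, while at 16 threads the factor is about 1.92x.&lt;/p&gt;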
&lt;h2&gt;
  
  
  When early termination kicks in (and when it doesn't)
&lt;/h2&gt;

&lt;p&gt;Early termination is enabled by default and works on both quantized and non-quantized vector data. It is automatically disabled when &lt;code&gt;k &amp;lt;= 10&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The benefit grows with the effective exploration budget, which is &lt;code&gt;max(ef, k)&lt;/code&gt;. Since hnswlib uses this internally as the number of candidates it keeps in play, larger &lt;code&gt;k&lt;/code&gt; means more candidates, more rounds, and more chances to detect convergence.&lt;/p&gt;

&lt;p&gt;Quantized vectors are typically used with rescoring and oversampling (both enabled by default) to recover precision lost from quantization. Oversampling (default 3x) multiplies the effective &lt;code&gt;k&lt;/code&gt; passed to HNSW. For example, a query with &lt;code&gt;k=100&lt;/code&gt; uses 300 candidates internally when oversampling is 3x. That larger search budget gives early termination more room to detect convergence and stop early. Since the performance benefit of early termination grows with &lt;code&gt;k&lt;/code&gt;, oversampling pushes queries into the range where the savings are largest.&lt;/p&gt;
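
&lt;p&gt;The arithmetic in the last two paragraphs can be sketched in a few lines of Python. The function name &lt;code&gt;effective_budget&lt;/code&gt;, the rounding of the oversampled &lt;code&gt;k&lt;/code&gt;, and the &lt;code&gt;ef&lt;/code&gt; value of 200 are assumptions made for illustration. The sketch only combines the two rules stated above: oversampling multiplies &lt;code&gt;k&lt;/code&gt;, and the exploration budget is &lt;code&gt;max(ef, k)&lt;/code&gt;.&lt;/p&gt;

```python
def effective_budget(k, ef, oversampling=3.0):
    """Candidates HNSW keeps in play: max(ef, k after oversampling).

    Pass oversampling=1.0 to model a query with oversampling disabled.
    """
    hnsw_k = round(k * oversampling)   # assumption: the oversampled k is rounded
    return max(ef, hnsw_k)


print(effective_budget(100, ef=200))                     # k=100 becomes 300 internally
print(effective_budget(20, ef=200))                      # 60 candidates, ef still dominates
print(effective_budget(100, ef=200, oversampling=1.0))   # no oversampling: ef dominates
```

&lt;p&gt;Only the first query ends up with a budget larger than &lt;code&gt;ef&lt;/code&gt;, which is the situation where oversampling gives early termination the most extra work to cut.&lt;/p&gt;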
&lt;h2&gt;
  
  
  Syntax
&lt;/h2&gt;

&lt;p&gt;Early termination is on by default. To disable it:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SQL:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- default: early termination enabled&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;knn_dist&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;knn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;45&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;78&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;33&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;

&lt;span class="c1"&gt;-- explicitly disable early termination&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;knn_dist&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;knn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;45&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;78&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;33&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;early_termination&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;-- combine with other KNN options&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;knn_dist&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;knn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;45&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;78&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;33&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;ef&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;early_termination&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;JSON:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;POST&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;/search&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"table"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"products"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"knn"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"field"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"embedding"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"query"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;0.12&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.45&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.78&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.33&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"early_termination"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  When to disable it
&lt;/h2&gt;

&lt;p&gt;There are a few scenarios where you might want to turn early termination off:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Maximum precision is critical.&lt;/strong&gt; Early termination trades a small amount of recall for speed. If your application requires the absolute best recall that HNSW can provide at a given &lt;code&gt;ef&lt;/code&gt;, disable it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Small k values (&amp;lt;= 30).&lt;/strong&gt; The algorithm auto-disables for &lt;code&gt;k &amp;lt;= 10&lt;/code&gt;, but even for &lt;code&gt;k&lt;/code&gt; between 11 and 30, the performance benefit is modest. If you notice any recall difference in this range, disabling early termination costs little in latency.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Benchmarking HNSW recall.&lt;/strong&gt; If you are measuring HNSW recall, you probably want deterministic behavior without adaptive shortcuts. Disable early termination to get a clean baseline.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How it relates to other KNN optimizations
&lt;/h2&gt;

&lt;p&gt;Early termination is one of several optimizations that Manticore applies to KNN search. It works independently of and stacks with the others:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://manticoresearch.com/blog/knn-prefiltering/" rel="noopener noreferrer"&gt;Prefiltering&lt;/a&gt; reduces wasted work by skipping filtered-out documents during HNSW traversal. Early termination reduces wasted work by stopping the traversal once the result set has converged. They solve different problems and work well together.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://manticoresearch.com/blog/quantization/#why-oversampling--rescoring-matters" rel="noopener noreferrer"&gt;Oversampling&lt;/a&gt; retrieves more candidates than &lt;code&gt;k&lt;/code&gt; to improve recall after rescoring. Early termination can reduce the cost of that expanded search by stopping once enough good candidates have been found.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://manticoresearch.com/blog/quantization/#why-oversampling--rescoring-matters" rel="noopener noreferrer"&gt;Rescoring&lt;/a&gt; recalculates distances using full-precision vectors after the initial search with quantized vectors. Early termination operates during the initial quantized search phase, reducing the number of candidates evaluated before rescoring kicks in.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automatic brute-force fallback&lt;/strong&gt; skips HNSW entirely when a linear scan is cheaper. Early termination only applies when HNSW is actually used.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>algorithms</category>
      <category>database</category>
      <category>machinelearning</category>
      <category>performance</category>
    </item>
    <item>
      <title>How to Make xt850 Match xt 850</title>
      <dc:creator>Sergey Nikolaev</dc:creator>
      <pubDate>Fri, 08 May 2026 05:30:14 +0000</pubDate>
      <link>https://dev.to/sanikolaev/how-to-make-xt850-match-xt-850-o15</link>
      <guid>https://dev.to/sanikolaev/how-to-make-xt850-match-xt-850-o15</guid>
      <description>&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;Since version &lt;code&gt;23.0.0&lt;/code&gt;, Manticore can make searches like &lt;code&gt;xt850&lt;/code&gt; match &lt;code&gt;xt 850&lt;/code&gt; using &lt;a href="https://manual.manticoresearch.com/dev/Creating_a_table/NLP_and_tokenization/Low-level_tokenization#bigram_delimiter" rel="noopener noreferrer"&gt;bigram_delimiter&lt;/a&gt; together with digit-aware &lt;a href="https://manual.manticoresearch.com/dev/Creating_a_table/NLP_and_tokenization/Low-level_tokenization#bigram_index" rel="noopener noreferrer"&gt;bigram_index&lt;/a&gt; modes.&lt;/p&gt;

&lt;p&gt;This solves a common tokenization mismatch in product search, where users remove spaces from model names but the source data stores them as separate tokens.&lt;/p&gt;

&lt;h2&gt;
  
  
  Assumptions and verification
&lt;/h2&gt;

&lt;p&gt;This article assumes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;RT tables created with SQL examples exactly as shown&lt;/li&gt;
&lt;li&gt;default tokenization unless the example explicitly changes a setting&lt;/li&gt;
&lt;li&gt;ASCII digits in model names, because &lt;code&gt;second_numeric&lt;/code&gt; and &lt;code&gt;second_has_digit&lt;/code&gt; are digit-aware modes built around &lt;code&gt;0-9&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All SQL examples and expected outputs in this article were verified against a real Manticore &lt;code&gt;23.0.0&lt;/code&gt; instance before publishing, using fresh tables created from scratch for each scenario.&lt;/p&gt;

&lt;h2&gt;
  
  
  The broader search problem
&lt;/h2&gt;

&lt;p&gt;Imagine a catalog containing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;xt 850 action camera&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;iphone 5se battery case&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;canon eos 80d body&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;thinkpad x1 carbon&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now imagine users searching for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;xt850&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;iphone5se&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;eos80d&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;thinkpadx1&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;From the user's point of view, these should obviously match. From the engine's point of view, they often do not, because the indexed text is tokenized as separate terms.&lt;/p&gt;

&lt;p&gt;Search systems usually attack that mismatch in one of four ways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;index prefixes or infixes&lt;/li&gt;
&lt;li&gt;add custom normalization rules&lt;/li&gt;
&lt;li&gt;duplicate content into alternate normalized fields&lt;/li&gt;
&lt;li&gt;index adjacent token pairs and optionally store glued variants too&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Manticore's newer bigram functionality is a structured way to do the fourth option without awkward field duplication.&lt;/p&gt;

&lt;h2&gt;
  
  
  Baseline: why &lt;code&gt;xt850&lt;/code&gt; fails by default
&lt;/h2&gt;

&lt;p&gt;Here is the problem in its simplest form:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;DROP&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="n"&gt;bi_default_demo&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;bi_default_demo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;bi_default_demo&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'xt 850 action camera'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;bi_default_demo&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="k"&gt;MATCH&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'xt850'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Expected result:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;Empty&lt;/span&gt; &lt;span class="k"&gt;set&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Why does this fail?&lt;/p&gt;

&lt;p&gt;Because the document is indexed as two separate tokens, &lt;code&gt;xt&lt;/code&gt; and &lt;code&gt;850&lt;/code&gt;, while the query is a single token, &lt;code&gt;xt850&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;By default, Manticore does not assume that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;xt850&lt;/code&gt; should be split into &lt;code&gt;xt&lt;/code&gt; + &lt;code&gt;850&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;or &lt;code&gt;xt&lt;/code&gt; + &lt;code&gt;850&lt;/code&gt; should also be searchable as &lt;code&gt;xt850&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So this is not really a typo-tolerance problem or a phrase problem. It is a tokenization mismatch: the index sees two tokens, while the query provides one.&lt;/p&gt;

&lt;p&gt;That is the gap the newer bigram settings are designed to close. They let Manticore index selected adjacent token pairs in a form that can also match glued queries.&lt;/p&gt;
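
&lt;p&gt;The mismatch, and the effect of indexing a glued pair, can be shown with a toy model in Python. This is not Manticore's tokenizer: it splits on whitespace only, the function name &lt;code&gt;index_terms&lt;/code&gt; is hypothetical, and the rule "glue a pair when the second token contains an ASCII digit" is an assumption loosely modeled on the digit-aware modes mentioned earlier.&lt;/p&gt;

```python
ASCII_DIGITS = "0123456789"


def index_terms(text, glue_digit_pairs=False):
    """Toy index: whitespace tokens, plus optional glued pairs like 'xt850'."""
    tokens = text.lower().split()
    terms = set(tokens)
    if glue_digit_pairs:
        for first, second in zip(tokens, tokens[1:]):
            # Only glue pairs whose second token looks like part of a model name.
            if any(ch in ASCII_DIGITS for ch in second):
                terms.add(first + second)
    return terms


doc = "xt 850 action camera"

plain = index_terms(doc)
glued = index_terms(doc, glue_digit_pairs=True)

print("xt850" in plain)   # the query token has no counterpart in the plain index
print("xt850" in glued)   # the glued pair makes the same query match
print(sorted(glued))
```

&lt;p&gt;In the plain index the single query token &lt;code&gt;xt850&lt;/code&gt; matches nothing, while the glued variant adds exactly one extra term for the pair that looks like a model name and leaves pairs such as "action camera" alone.&lt;/p&gt;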

&lt;h2&gt;
  
  
  Why bigrams help here
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://manual.manticoresearch.com/dev/Creating_a_table/NLP_and_tokenization/Low-level_tokenization#bigram_index" rel="noopener noreferrer"&gt;bigram_index&lt;/a&gt; can help with both &lt;a href="https://manticoresearch.com/blog/how-to-speed-up-phrase-search-with-bigram-index/" rel="noopener noreferrer"&gt;phrase acceleration&lt;/a&gt; and model-name matching. This article focuses on the &lt;code&gt;xt 850&lt;/code&gt; vs &lt;code&gt;xt850&lt;/code&gt; problem.&lt;/p&gt;

&lt;p&gt;The key idea is simple:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;detect adjacent token pairs that look like model names&lt;/li&gt;
&lt;li&gt;store those pairs in a glued form too&lt;/li&gt;
&lt;li&gt;let queries such as &lt;code&gt;xt850&lt;/code&gt;, &lt;code&gt;iphone5se&lt;/code&gt;, or &lt;code&gt;thinkpadx1&lt;/code&gt; hit the spaced text&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is where &lt;a href="https://manual.manticoresearch.com/dev/Creating_a_table/NLP_and_tokenization/Low-level_tokenization#bigram_delimiter" rel="noopener noreferrer"&gt;bigram_delimiter&lt;/a&gt; matters.&lt;/p&gt;

&lt;h2&gt;
  
  
  A note about &lt;a href="https://manual.manticoresearch.com/dev/Creating_a_table/NLP_and_tokenization/Low-level_tokenization#bigram_delimiter" rel="noopener noreferrer"&gt;bigram_delimiter&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;bigram_index&lt;/code&gt; decides which adjacent pairs are eligible.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;bigram_delimiter&lt;/code&gt; decides how eligible bigrams are stored:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;true&lt;/code&gt;: internal delimited token only&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;none&lt;/code&gt;: glued token only, such as &lt;code&gt;galaxy24&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;both&lt;/code&gt;: both forms&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The practical difference is easiest to understand from the query side:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;with &lt;code&gt;true&lt;/code&gt;, Manticore keeps the internal bigram form used for phrase optimization, but it does not keep the glued user-facing form, so a query like &lt;code&gt;xt850&lt;/code&gt; will not match &lt;code&gt;xt 850&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;with &lt;code&gt;none&lt;/code&gt;, Manticore keeps only the glued form, so &lt;code&gt;xt850&lt;/code&gt; can match &lt;code&gt;xt 850&lt;/code&gt;, but those pairs no longer have the internal bigram form used for phrase optimization&lt;/li&gt;
&lt;li&gt;with &lt;code&gt;both&lt;/code&gt;, Manticore keeps both the internal bigram representation and the glued form, so &lt;code&gt;xt850&lt;/code&gt; can match &lt;code&gt;xt 850&lt;/code&gt; without giving up ordinary phrase behavior&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For this use case, &lt;code&gt;both&lt;/code&gt; is usually the safer default because it covers the user-visible problem directly while keeping behavior less surprising for normal phrase queries and mixed workloads.&lt;/p&gt;
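
&lt;p&gt;The three settings can be modelled as a small function that returns the extra tokens stored for one eligible pair. This is a sketch under stated assumptions: the &lt;code&gt;stored_forms&lt;/code&gt; helper is hypothetical, and the &lt;code&gt;|&lt;/code&gt; character only stands in for the internal delimited form, whose real representation is not described here.&lt;/p&gt;

```python
# Placeholder for the internal delimiter (an assumption for illustration only).
HYPOTHETICAL_DELIMITER = "|"


def stored_forms(first, second, bigram_delimiter):
    """Return the extra tokens this toy model stores for one eligible pair."""
    internal = first + HYPOTHETICAL_DELIMITER + second
    glued = first + second
    if bigram_delimiter == "true":
        return [internal]
    if bigram_delimiter == "none":
        return [glued]
    if bigram_delimiter == "both":
        return [internal, glued]
    raise ValueError("unknown bigram_delimiter: " + bigram_delimiter)


for mode in ("true", "none", "both"):
    forms = stored_forms("xt", "850", mode)
    # The last column shows whether the glued query 'xt850' has something to hit.
    print(mode, forms, "xt850" in forms)
```

&lt;p&gt;In this model only &lt;code&gt;none&lt;/code&gt; and &lt;code&gt;both&lt;/code&gt; produce a token that the glued query can hit, which mirrors the behavior described in the list above.&lt;/p&gt;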

&lt;h2&gt;
  
  
  Mode 1: &lt;code&gt;second_numeric&lt;/code&gt;
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="py"&gt;bigram_index&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;second_numeric&lt;/span&gt;
&lt;span class="py"&gt;bigram_delimiter&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;both&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This mode is aimed at model names where the second token is purely numeric.&lt;/p&gt;

&lt;p&gt;That is common in product catalogs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;xt 850&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;galaxy 24&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;playstation 5&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;pixel 8&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Users often search for these as glued terms such as &lt;code&gt;xt850&lt;/code&gt;, &lt;code&gt;galaxy24&lt;/code&gt;, or &lt;code&gt;playstation5&lt;/code&gt;, even though the source text stores them with a space.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;second_numeric&lt;/code&gt; stores a pair only when the second token consists solely of ASCII digits.&lt;/p&gt;

&lt;p&gt;Use it when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;you have product generations and numbered models&lt;/li&gt;
&lt;li&gt;users often remove spaces in search&lt;/li&gt;
&lt;li&gt;the second token is usually just digits&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Example
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;DROP&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="n"&gt;bi_second_numeric_demo&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;bi_second_numeric_demo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;bigram_index&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'second_numeric'&lt;/span&gt;
  &lt;span class="n"&gt;bigram_delimiter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'both'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;bi_second_numeric_demo&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'xt 850 action camera'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'galaxy 24 ultra'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'playstation 5 slim'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'iphone 5se case'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'canon eos 80d body'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'thinkpad x1 carbon'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then test the queries one by one:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;bi_second_numeric_demo&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="k"&gt;MATCH&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'xt850'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="c1"&gt;------+----------------------+&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;   &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;                &lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="c1"&gt;------+----------------------+&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;    &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;xt&lt;/span&gt; &lt;span class="mi"&gt;850&lt;/span&gt; &lt;span class="n"&gt;action&lt;/span&gt; &lt;span class="n"&gt;camera&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="c1"&gt;------+----------------------+&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;bi_second_numeric_demo&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="k"&gt;MATCH&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'galaxy24'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="c1"&gt;------+-----------------+&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;   &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;           &lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="c1"&gt;------+-----------------+&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;    &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;galaxy&lt;/span&gt; &lt;span class="mi"&gt;24&lt;/span&gt; &lt;span class="n"&gt;ultra&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="c1"&gt;------+-----------------+&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;bi_second_numeric_demo&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="k"&gt;MATCH&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'playstation5'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="c1"&gt;------+--------------------+&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;   &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;              &lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="c1"&gt;------+--------------------+&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;    &lt;span class="mi"&gt;3&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;playstation&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt; &lt;span class="n"&gt;slim&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="c1"&gt;------+--------------------+&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;bi_second_numeric_demo&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="k"&gt;MATCH&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'iphone5se'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="n"&gt;Empty&lt;/span&gt; &lt;span class="k"&gt;set&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;bi_second_numeric_demo&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="k"&gt;MATCH&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'eos80d'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="n"&gt;Empty&lt;/span&gt; &lt;span class="k"&gt;set&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;bi_second_numeric_demo&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="k"&gt;MATCH&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'thinkpadx1'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="n"&gt;Empty&lt;/span&gt; &lt;span class="k"&gt;set&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The three empty results mark the boundary that defines this mode:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;850&lt;/code&gt;, &lt;code&gt;24&lt;/code&gt;, and &lt;code&gt;5&lt;/code&gt; qualify&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;5se&lt;/code&gt;, &lt;code&gt;80d&lt;/code&gt;, and &lt;code&gt;x1&lt;/code&gt; do not&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Mode 2: &lt;code&gt;second_has_digit&lt;/code&gt;
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="py"&gt;bigram_index&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;second_has_digit&lt;/span&gt;
&lt;span class="py"&gt;bigram_delimiter&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;both&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This mode is the more flexible sibling of &lt;code&gt;second_numeric&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;It stores the pair when the second token contains at least one ASCII digit. That makes it a much better fit for real product catalogs, where model identifiers are often mixed alphanumeric strings:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;xt 850&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;iphone 5se&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;eos 80d&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;thinkpad x1&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Use it when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;your model names mix letters and digits&lt;/li&gt;
&lt;li&gt;users frequently remove spaces in their searches&lt;/li&gt;
&lt;li&gt;you want catalog-friendly matching without indexing every pair in the table&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Example
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;DROP&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="n"&gt;bi_second_has_digit_demo&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;bi_second_has_digit_demo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;bigram_index&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'second_has_digit'&lt;/span&gt;
  &lt;span class="n"&gt;bigram_delimiter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'both'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;bi_second_has_digit_demo&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'xt 850 action camera'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'galaxy 24 ultra'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'playstation 5 slim'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'iphone 5se case'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'canon eos 80d body'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'thinkpad x1 carbon'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'kindle paperwhite signature'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then test the queries one by one:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;bi_second_has_digit_demo&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="k"&gt;MATCH&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'xt850'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="c1"&gt;------+----------------------+&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;   &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;                &lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="c1"&gt;------+----------------------+&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;    &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;xt&lt;/span&gt; &lt;span class="mi"&gt;850&lt;/span&gt; &lt;span class="n"&gt;action&lt;/span&gt; &lt;span class="n"&gt;camera&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="c1"&gt;------+----------------------+&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;bi_second_has_digit_demo&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="k"&gt;MATCH&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'galaxy24'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="c1"&gt;------+-----------------+&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;   &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;           &lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="c1"&gt;------+-----------------+&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;    &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;galaxy&lt;/span&gt; &lt;span class="mi"&gt;24&lt;/span&gt; &lt;span class="n"&gt;ultra&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="c1"&gt;------+-----------------+&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;bi_second_has_digit_demo&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="k"&gt;MATCH&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'iphone5se'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="c1"&gt;------+---------------------+&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;   &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;               &lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="c1"&gt;------+---------------------+&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;    &lt;span class="mi"&gt;4&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;iphone&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="n"&gt;se&lt;/span&gt; &lt;span class="k"&gt;case&lt;/span&gt;     &lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="c1"&gt;------+---------------------+&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;bi_second_has_digit_demo&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="k"&gt;MATCH&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'eos80d'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="c1"&gt;------+---------------------+&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;   &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;               &lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="c1"&gt;------+---------------------+&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;    &lt;span class="mi"&gt;5&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;canon&lt;/span&gt; &lt;span class="n"&gt;eos&lt;/span&gt; &lt;span class="mi"&gt;80&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="n"&gt;body&lt;/span&gt;  &lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="c1"&gt;------+---------------------+&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;bi_second_has_digit_demo&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="k"&gt;MATCH&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'thinkpadx1'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="c1"&gt;------+---------------------+&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;   &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;               &lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="c1"&gt;------+---------------------+&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;    &lt;span class="mi"&gt;6&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;thinkpad&lt;/span&gt; &lt;span class="n"&gt;x1&lt;/span&gt; &lt;span class="n"&gt;carbon&lt;/span&gt;  &lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="c1"&gt;------+---------------------+&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;bi_second_has_digit_demo&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="k"&gt;MATCH&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'kindlesignature'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="n"&gt;Empty&lt;/span&gt; &lt;span class="k"&gt;set&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



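&lt;p&gt;The last query returns nothing because no token in &lt;code&gt;kindle paperwhite signature&lt;/code&gt; contains a digit, so no pair from that title qualifies. This mode is often the better fit for mixed model identifiers, because real catalog data frequently includes forms like &lt;code&gt;5se&lt;/code&gt;, &lt;code&gt;80d&lt;/code&gt;, or &lt;code&gt;x1&lt;/code&gt; rather than only clean numeric suffixes like &lt;code&gt;24&lt;/code&gt;.&lt;/p&gt;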

&lt;h2&gt;
  
  
  How to choose between the two
&lt;/h2&gt;

&lt;p&gt;If your search problem is specifically "How do I make &lt;code&gt;xt850&lt;/code&gt; find &lt;code&gt;xt 850&lt;/code&gt;?", the practical rule is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;use &lt;code&gt;second_numeric&lt;/code&gt; when the second token is digits-only&lt;/li&gt;
&lt;li&gt;use &lt;code&gt;second_has_digit&lt;/code&gt; when the second token may be mixed, like &lt;code&gt;5se&lt;/code&gt;, &lt;code&gt;80d&lt;/code&gt;, or &lt;code&gt;x1&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
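
&lt;p&gt;The two eligibility rules can be compared side by side with a short sketch. The function names below simply reuse the mode names and are hypothetical helpers, not Manticore APIs; they encode only the rules as stated in this article (ASCII digits only, versus at least one ASCII digit).&lt;/p&gt;

```python
import string


def second_numeric(second):
    # Eligible only when the second token is non-empty and ASCII digits only.
    return second != "" and all(ch in string.digits for ch in second)


def second_has_digit(second):
    # Eligible when the second token contains at least one ASCII digit.
    return any(ch in string.digits for ch in second)


# Adjacent pairs taken from the example titles in this article.
pairs = [
    ("xt", "850"),
    ("galaxy", "24"),
    ("playstation", "5"),
    ("iphone", "5se"),
    ("eos", "80d"),
    ("thinkpad", "x1"),
    ("paperwhite", "signature"),
]

for first, second in pairs:
    print(first + second, second_numeric(second), second_has_digit(second))
```

&lt;p&gt;The first three pairs qualify under both rules, the next three only under &lt;code&gt;second_has_digit&lt;/code&gt;, and the last under neither, which lines up with the query results shown in the two examples.&lt;/p&gt;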

&lt;p&gt;This matching remains compatible with other common text-processing settings in the straightforward case: a query for &lt;code&gt;xt850&lt;/code&gt; still matches &lt;code&gt;xt 850&lt;/code&gt; with &lt;code&gt;morphology='stem_en'&lt;/code&gt; enabled and with a wordforms rule enabled.&lt;/p&gt;

&lt;p&gt;The caveat is that those settings do not rewrite the glued query for you. In tests, a document containing &lt;code&gt;iphones 5&lt;/code&gt; was matched by the query &lt;code&gt;iphones5&lt;/code&gt; but not by &lt;code&gt;iphone5&lt;/code&gt;, even with stemming or a wordforms rule mapping &lt;code&gt;iphones&lt;/code&gt; to &lt;code&gt;iphone&lt;/code&gt;. In short: basic &lt;code&gt;xt 850&lt;/code&gt; vs &lt;code&gt;xt850&lt;/code&gt; matching stays compatible with morphology and wordforms, but if you rely on them, test the exact query shape you care about.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final takeaway
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;xt850&lt;/code&gt; problem is not really about one product name. It is about a broader mismatch between how users type model names and how search engines tokenize them.&lt;/p&gt;

&lt;p&gt;Since version &lt;code&gt;23.0.0&lt;/code&gt;, Manticore gives you a built-in way to handle that mismatch with &lt;code&gt;bigram_delimiter&lt;/code&gt; plus the digit-aware &lt;code&gt;bigram_index&lt;/code&gt; modes, which is much cleaner than duplicating fields or inventing custom preprocessing pipelines.&lt;/p&gt;

&lt;p&gt;If your main problem is phrase-search performance rather than glued model-name matching, see &lt;a href="https://manticoresearch.com/blog/how-to-speed-up-phrase-search-with-bigram-index/" rel="noopener noreferrer"&gt;How to Speed Up Phrase Search with bigram_index&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>database</category>
      <category>nlp</category>
      <category>sql</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>How to Speed Up Phrase Search with bigram_index</title>
      <dc:creator>Sergey Nikolaev</dc:creator>
      <pubDate>Thu, 07 May 2026 08:50:15 +0000</pubDate>
      <link>https://dev.to/sanikolaev/how-to-speed-up-phrase-search-with-bigramindex-l4f</link>
      <guid>https://dev.to/sanikolaev/how-to-speed-up-phrase-search-with-bigramindex-l4f</guid>
      <description>&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://manual.manticoresearch.com/Creating_a_table/NLP_and_tokenization/Low-level_tokenization#bigram_index" rel="noopener noreferrer"&gt;bigram_index&lt;/a&gt; serves several purposes. This article focuses on phrase-search performance: on the 1M-document benchmark below, &lt;code&gt;bigram_index='all'&lt;/code&gt; improved QPS by about &lt;code&gt;2.9x&lt;/code&gt; and cut average phrase-query latency by about &lt;code&gt;3.2x&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;If your main problem is matching &lt;code&gt;xt850&lt;/code&gt; against &lt;code&gt;xt 850&lt;/code&gt; rather than speeding up phrase search, see &lt;a href="https://manticoresearch.com/blog/how-to-make-searches-like-xt850-match-xt-850/" rel="noopener noreferrer"&gt;How to Make xt850 Match xt 850&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Phrase search can be expensive. Even when a query is short, the engine still has to verify ordering and adjacency, and that work gets more noticeable when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the individual words are common&lt;/li&gt;
&lt;li&gt;the dataset is large&lt;/li&gt;
&lt;li&gt;phrase queries are frequent in your workload&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is exactly what &lt;a href="https://manual.manticoresearch.com/Creating_a_table/NLP_and_tokenization/Low-level_tokenization#bigram_index" rel="noopener noreferrer"&gt;bigram_index&lt;/a&gt; is for.&lt;/p&gt;

&lt;h2&gt;
  
  
  What bigram indexing actually does
&lt;/h2&gt;

&lt;p&gt;Normally, a phrase like &lt;code&gt;"noise cancelling headphones"&lt;/code&gt; is handled as separate tokens that also need to appear in the right order and next to each other. Bigram indexing lets Manticore pre-store adjacent token pairs such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;noise cancelling&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;cancelling headphones&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That gives the engine a faster way to narrow down candidate documents during phrase matching.&lt;/p&gt;
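
&lt;p&gt;As a rough illustration of what "adjacent token pairs" means, the sketch below derives them from a phrase. It assumes a toy tokenizer (lowercase, split on whitespace) and a hypothetical &lt;code&gt;bigrams&lt;/code&gt; helper; it says nothing about how Manticore stores these pairs internally.&lt;/p&gt;

```python
def bigrams(text):
    # Toy tokenizer (an assumption): lowercase and split on whitespace.
    tokens = text.lower().split()
    # Pair each token with the token that immediately follows it.
    return list(zip(tokens, tokens[1:]))


print(bigrams("noise cancelling headphones"))
# [('noise', 'cancelling'), ('cancelling', 'headphones')]
```

&lt;p&gt;A phrase of three tokens yields two pairs, the same two listed above.&lt;/p&gt;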

&lt;p&gt;This article focuses specifically on phrase acceleration.&lt;/p&gt;

&lt;h2&gt;
  
  
  Important caveat: bigrams work at tokenization level
&lt;/h2&gt;

&lt;p&gt;This is the part that is easy to miss when you only look at the happy-path speedup story.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;bigram_index&lt;/code&gt; works at the tokenization level only. It does not account for later transformations such as morphology, wordforms, or stopwords, and that can materially change phrase-matching expectations.&lt;/p&gt;

&lt;p&gt;The practical conclusion is simple: bigrams can be excellent for phrase speed, but if your index relies heavily on morphology, wordforms, or stopwords, test the actual phrase behavior you care about before rolling the setting out broadly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Mode 1: Default behavior
&lt;/h2&gt;

&lt;p&gt;This is the baseline. No explicit bigram indexing is enabled, so no bigram posting lists are stored.&lt;/p&gt;

&lt;p&gt;Use it when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;phrase search is rare&lt;/li&gt;
&lt;li&gt;documents are short&lt;/li&gt;
&lt;li&gt;you want the leanest indexing path&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Example
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;DROP&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="n"&gt;bi_none_demo&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;bi_none_demo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;bi_none_demo&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'wireless noise cancelling headphones'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'noise cancelling microphone'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'wireless gaming headset'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;bi_none_demo&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="k"&gt;MATCH&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'"noise cancelling"'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The query matches rows 1 and 2, but Manticore has no precomputed bigram posting lists to help resolve the phrase more efficiently.&lt;/p&gt;

&lt;h2&gt;
  
  
  Mode 2: &lt;code&gt;all&lt;/code&gt;
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="py"&gt;bigram_index&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;all&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the most aggressive phrase-acceleration mode. Every adjacent token pair gets indexed as a bigram.&lt;/p&gt;

&lt;p&gt;Use it when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;exact phrase search is a core feature&lt;/li&gt;
&lt;li&gt;phrase queries often include common words and produce many candidates&lt;/li&gt;
&lt;li&gt;you want the strongest phrase acceleration&lt;/li&gt;
&lt;li&gt;you do not want to tune a frequent-word list&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Example
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;DROP&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="n"&gt;bi_all_demo&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;bi_all_demo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;bigram_index&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'all'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;bi_all_demo&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'lord of the rings trilogy'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'house of the dragon season 2'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'made for iphone charger'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;bi_all_demo&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="k"&gt;MATCH&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'"house of the dragon"'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;bi_all_demo&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="k"&gt;MATCH&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'"made for iphone"'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The point here is not that the matches differ, but that the indexing strategy does: &lt;code&gt;all&lt;/code&gt; stores every adjacent pair, so phrase queries have the maximum amount of bigram help available at search time.&lt;/p&gt;

&lt;p&gt;Choose &lt;code&gt;all&lt;/code&gt; when phrase search is expensive because many documents match the individual words, which forces Manticore to do more positional verification to confirm the exact phrase. &lt;code&gt;all&lt;/code&gt; helps by narrowing candidates earlier.&lt;/p&gt;
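
&lt;p&gt;To get a feel for how much work a phrase query did, you can follow it with &lt;code&gt;SHOW META&lt;/code&gt; in the same session. This sketch reuses the &lt;code&gt;bi_all_demo&lt;/code&gt; table from the example above. On a three-row table the timing will be negligible, so treat it as a template for checking your own data rather than a measurement.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;SELECT id, title FROM bi_all_demo WHERE MATCH('"house of the dragon"');

/* Reports timing and per-keyword document and hit counts for the previous query */
SHOW META;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;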

&lt;h2&gt;
  
  
  Mode 3: &lt;code&gt;first_freq&lt;/code&gt;
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="py"&gt;bigram_index&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;first_freq&lt;/span&gt;
&lt;span class="py"&gt;bigram_freq_words&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;for, of, the, with&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This mode stores a pair only when the first token is in your frequent-word list.&lt;/p&gt;

&lt;p&gt;Use it when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;phrase search matters&lt;/li&gt;
&lt;li&gt;you want a lighter alternative to &lt;code&gt;all&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;many phrases in your data contain words that are genuinely frequent in your own corpus&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With the list above:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;for iphone&lt;/code&gt; is eligible&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;of the&lt;/code&gt; is eligible&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;the dragon&lt;/code&gt; is eligible&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;made for&lt;/code&gt; is not eligible&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;lord of&lt;/code&gt; is not eligible&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For production use, do not pick &lt;code&gt;bigram_freq_words&lt;/code&gt; from memory. Derive it from your own data. A practical way is to dump dictionary stats with &lt;a href="https://manual.manticoresearch.com/Miscellaneous_tools#indextool" rel="noopener noreferrer"&gt;indextool&lt;/a&gt; using &lt;code&gt;--dumpdict ... --stats&lt;/code&gt;, review the most frequent tokens, and then build a small &lt;code&gt;bigram_freq_words&lt;/code&gt; list from those results.&lt;/p&gt;
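
&lt;p&gt;If you already have a shortlist of candidate words, a quick sanity check from SQL is &lt;code&gt;CALL KEYWORDS&lt;/code&gt; with statistics enabled, which reports how many documents and hits each token has in a table. This complements the &lt;code&gt;indextool&lt;/code&gt; dump rather than replacing it, because it only covers the words you pass in. The table name &lt;code&gt;products&lt;/code&gt; below is a placeholder for your own table.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;/* 'products' is a hypothetical table name; 1 AS stats adds docs and hits columns */
CALL KEYWORDS('for of the with', 'products', 1 AS stats);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Words with low document counts in your corpus are weak candidates for &lt;code&gt;bigram_freq_words&lt;/code&gt;, however common they are in general English.&lt;/p&gt;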

&lt;h3&gt;
  
  
  Example
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;DROP&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="n"&gt;bi_first_freq_demo&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;bi_first_freq_demo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;bigram_index&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'first_freq'&lt;/span&gt;
  &lt;span class="n"&gt;bigram_freq_words&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'for,of,the,with'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;bi_first_freq_demo&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'made for iphone charger'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'lord of the rings trilogy'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'house of the dragon season 2'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;bi_first_freq_demo&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="k"&gt;MATCH&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'"made for iphone"'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;bi_first_freq_demo&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="k"&gt;MATCH&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'"lord of the"'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Both queries still return the expected rows (row 1 and row 2, respectively). What changes is which pairs get indexed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;"made for iphone"&lt;/code&gt; benefits from &lt;code&gt;for iphone&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;"lord of the"&lt;/code&gt; benefits from &lt;code&gt;of the&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This makes &lt;code&gt;first_freq&lt;/code&gt; a lighter alternative to &lt;code&gt;all&lt;/code&gt; when many useful phrases involve common bridge words.&lt;/p&gt;

&lt;h2&gt;
  
  
  Mode 4: &lt;code&gt;both_freq&lt;/code&gt;
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="py"&gt;bigram_index&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;both_freq&lt;/span&gt;
&lt;span class="py"&gt;bigram_freq_words&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;for, of, the, with&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the narrowest frequency-based mode. A pair is stored only when both tokens are in the frequent-word list.&lt;/p&gt;

&lt;p&gt;Use it when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;you want the most conservative bigram footprint&lt;/li&gt;
&lt;li&gt;you mainly care about pairs built from words that are highly frequent in your corpus&lt;/li&gt;
&lt;li&gt;you are tuning a large corpus and do not want to index every adjacent pair&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With the same list:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;of the&lt;/code&gt; is eligible&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;for iphone&lt;/code&gt; is not eligible&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;the dragon&lt;/code&gt; is not eligible&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Example
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;DROP&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="n"&gt;bi_both_freq_demo&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;bi_both_freq_demo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;bigram_index&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'both_freq'&lt;/span&gt;
  &lt;span class="n"&gt;bigram_freq_words&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'for,of,the,with'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;bi_both_freq_demo&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'lord of the rings trilogy'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'house of the dragon season 2'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'made for iphone charger'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;bi_both_freq_demo&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="k"&gt;MATCH&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'"lord of the"'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;bi_both_freq_demo&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="k"&gt;MATCH&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'"made for iphone"'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Both queries still match. What differs is which stored pairs are available to speed them up:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;"lord of the"&lt;/code&gt; includes &lt;code&gt;of the&lt;/code&gt;, which &lt;code&gt;both_freq&lt;/code&gt; is willing to store&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;"made for iphone"&lt;/code&gt; includes &lt;code&gt;for iphone&lt;/code&gt;, which &lt;code&gt;first_freq&lt;/code&gt; would cover but &lt;code&gt;both_freq&lt;/code&gt; would not&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Which performance mode should you choose?
&lt;/h2&gt;

&lt;p&gt;The benchmark later in this article shows that &lt;code&gt;all&lt;/code&gt; can deliver a strong speedup, but it is still just one benchmark on one workload.&lt;/p&gt;

&lt;p&gt;Manticore's own documentation says that for most use cases, &lt;code&gt;both_freq&lt;/code&gt; is the best mode. That is a sensible default because it aims for a more balanced trade-off between phrase acceleration and indexing cost.&lt;/p&gt;

&lt;p&gt;Use the modes like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;choose &lt;code&gt;both_freq&lt;/code&gt; as the default starting point for general phrase-search workloads&lt;/li&gt;
&lt;li&gt;choose &lt;code&gt;all&lt;/code&gt; when phrase search is especially important and you want the strongest acceleration, accepting higher indexing cost&lt;/li&gt;
&lt;li&gt;choose &lt;code&gt;first_freq&lt;/code&gt; when many useful phrases in your data involve common bridge words and you want something broader than &lt;code&gt;both_freq&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;choose the default behavior when phrase acceleration is not important&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Benchmark: does bigram indexing really speed up phrase search?
&lt;/h2&gt;

&lt;p&gt;Yes. In a simple local benchmark, the difference was easy to measure.&lt;/p&gt;

&lt;p&gt;I used &lt;code&gt;manticore-load&lt;/code&gt; to build two 1M-document tables against the same Manticore instance:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;one with no explicit &lt;code&gt;bigram_index&lt;/code&gt; setting&lt;/li&gt;
&lt;li&gt;one with &lt;code&gt;bigram_index='all'&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The documents were random 60-80 word texts, and the benchmark repeatedly ran random 2-word phrase queries.&lt;/p&gt;

&lt;p&gt;For clarity, both indexing and search were run with &lt;code&gt;--threads=1&lt;/code&gt;. Multi-threaded numbers would of course be higher, but single-thread runs make it easier to see what the feature changes on one CPU core.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;bench_bigram_&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="k"&gt;MATCH&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'"&amp;lt;text/2/2&amp;gt;"'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Benchmark setup
&lt;/h3&gt;

&lt;p&gt;Data load without bigrams:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;manticore-load &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--drop&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--wait&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--threads&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--batch-size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1000 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--total&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1000000 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--init&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"CREATE TABLE bench_bigram_none_rand(title text)"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--load&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"INSERT INTO bench_bigram_none_rand(id,title) VALUES(&amp;lt;increment&amp;gt;,'&amp;lt;text/60/80&amp;gt;')"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Data load with all bigrams:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;manticore-load &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--drop&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--wait&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--threads&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--batch-size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1000 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--total&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1000000 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--init&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"CREATE TABLE bench_bigram_all_rand(title text) bigram_index='all'"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--load&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"INSERT INTO bench_bigram_all_rand(id,title) VALUES(&amp;lt;increment&amp;gt;,'&amp;lt;text/60/80&amp;gt;')"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Search benchmark without bigrams:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;manticore-load &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--threads&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--total&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;5000 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--load&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"SELECT COUNT(*) FROM bench_bigram_none_rand WHERE MATCH('&lt;/span&gt;&lt;span class="se"&gt;\\\"&lt;/span&gt;&lt;span class="s2"&gt;&amp;lt;text/2/2&amp;gt;&lt;/span&gt;&lt;span class="se"&gt;\\\"&lt;/span&gt;&lt;span class="s2"&gt;')"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Search benchmark with all bigrams:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;manticore-load &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--threads&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--total&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;5000 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--load&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"SELECT COUNT(*) FROM bench_bigram_all_rand WHERE MATCH('&lt;/span&gt;&lt;span class="se"&gt;\\\"&lt;/span&gt;&lt;span class="s2"&gt;&amp;lt;text/2/2&amp;gt;&lt;/span&gt;&lt;span class="se"&gt;\\\"&lt;/span&gt;&lt;span class="s2"&gt;')"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  What I observed
&lt;/h3&gt;

&lt;p&gt;On this local run:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Table&lt;/th&gt;
&lt;th&gt;QPS&lt;/th&gt;
&lt;th&gt;Avg latency&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;bench_bigram_none_rand&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;755&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;1.3 ms&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;bench_bigram_all_rand&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;2175&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;0.4 ms&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That is roughly a &lt;code&gt;2.9x&lt;/code&gt; improvement in QPS (2175 vs. 755) and about a &lt;code&gt;3.2x&lt;/code&gt; reduction in average latency (1.3 ms vs. 0.4 ms) on the same 1M-document workload.&lt;/p&gt;

&lt;p&gt;Indexing was slower with &lt;code&gt;bigram_index='all'&lt;/code&gt;, by roughly 2.6x, which is expected because every adjacent pair has to be stored:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;without bigrams: about &lt;code&gt;45k docs/sec&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;with &lt;code&gt;all&lt;/code&gt;: about &lt;code&gt;17k docs/sec&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That trade-off is exactly why multiple modes exist.&lt;/p&gt;
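
&lt;p&gt;Indexing speed is only one side of the cost, since stored bigrams also take space. One way to compare the two benchmark tables is &lt;code&gt;SHOW TABLE ... STATUS&lt;/code&gt;, which reports document counts along with disk and RAM usage. No sizes were recorded for the run above, so this is a sketch of how to measure, not a reported result.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;/* Compare indexed_documents, disk_bytes and ram_bytes between the two tables */
SHOW TABLE bench_bigram_none_rand STATUS;
SHOW TABLE bench_bigram_all_rand STATUS;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;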

&lt;h2&gt;
  
  
  Final takeaway
&lt;/h2&gt;

&lt;p&gt;If your main problem is phrase-search performance, treat &lt;code&gt;bigram_index&lt;/code&gt; first and foremost as an acceleration feature.&lt;/p&gt;

&lt;p&gt;For most real workloads, start with &lt;code&gt;both_freq&lt;/code&gt; and measure. Move to &lt;code&gt;all&lt;/code&gt; if you need a stronger effect and can afford the extra indexing cost. Consider &lt;code&gt;first_freq&lt;/code&gt; when your phrase workload is heavily shaped by common bridge words.&lt;/p&gt;

</description>
      <category>database</category>
      <category>nlp</category>
      <category>performance</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Build a Searchable Catalog with Filters, Facets, and Semantic Search</title>
      <dc:creator>Sergey Nikolaev</dc:creator>
      <pubDate>Wed, 06 May 2026 06:17:07 +0000</pubDate>
      <link>https://dev.to/sanikolaev/build-a-searchable-catalog-with-filters-facets-and-semantic-search-24bm</link>
      <guid>https://dev.to/sanikolaev/build-a-searchable-catalog-with-filters-facets-and-semantic-search-24bm</guid>
      <description>&lt;p&gt;A search box is easy. A searchable catalog that keeps being useful after the first query is the harder part.&lt;/p&gt;

&lt;p&gt;That is the problem this demo takes on. It uses a small board-game catalog, but the shape of the problem is familiar: users type something half-remembered, misspell it, narrow by constraints, keep browsing, open a result, then want "more like this" without starting over. If your product has that flow, most of the work is not the UI polish. It is getting the search behavior right without turning the stack into a science project.&lt;/p&gt;

&lt;p&gt;In this article, we build a searchable catalog with autocomplete, typo tolerance, filters, facets, deep pagination, semantic search, and similar-item recommendations.&lt;/p&gt;

&lt;p&gt;You can try the hosted version first:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://catalog.manticoresearch.com" rel="noopener noreferrer"&gt;https://catalog.manticoresearch.com&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffn97e2bt8zs5ckipsuqm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffn97e2bt8zs5ckipsuqm.png" alt=" " width="800" height="434"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The app itself is implemented in PHP, but that is not really the story here. The interesting part is how little ceremony you need to get from a basic query box to something that already feels like a working catalog: search, filters, facets, and similar-item discovery all show up quickly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Run it locally
&lt;/h2&gt;

&lt;p&gt;To run the same demo locally, you only need PHP 8.1+, Composer, and Docker (or any other way to run Manticore).&lt;/p&gt;

&lt;p&gt;In this setup, Manticore is the search engine behind the catalog: it handles indexing, filtering, faceting, and semantic retrieval. The repo already includes a Docker setup for it, so the quickest way to get the demo running is to clone the repo and start Manticore from the project root:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/manticoresoftware/php-catalog-demo
&lt;span class="nb"&gt;cd &lt;/span&gt;php-catalog-demo
docker compose up &lt;span class="nt"&gt;-d&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;docker compose ps&lt;/code&gt; should show the container as running.&lt;/p&gt;

&lt;p&gt;Inside the cloned repo, create the app environment file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cp &lt;/span&gt;app/.env.example app/.env
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For a local run, the important part is just how the app reaches Manticore:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;MANTICORE_HOST=127.0.0.1
MANTICORE_PORT=9308
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Install dependencies:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;app
composer &lt;span class="nb"&gt;install&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The demo reads those settings and creates a Manticore client:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="nv"&gt;$settings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;require&lt;/span&gt; &lt;span class="nv"&gt;$root&lt;/span&gt; &lt;span class="mf"&gt;.&lt;/span&gt; &lt;span class="s1"&gt;'/config/settings.php'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="nv"&gt;$client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
    &lt;span class="s1"&gt;'host'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$settings&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'manticore'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="s1"&gt;'host'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="s1"&gt;'port'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$settings&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'manticore'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="s1"&gt;'port'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="s1"&gt;'transport'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'Http'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;]);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then load the demo dataset:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;php bin/bootstrap-demo.php
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That command recreates the demo table and imports the starter catalog, so you begin from a known state instead of debugging old data.&lt;/p&gt;
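
&lt;p&gt;If you want to confirm the import before starting the app, you can connect with any MySQL-compatible client and list the tables. This assumes the repo's Docker setup exposes Manticore's SQL port (9306 by default), which is not shown in the steps above.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;/* The demo table created by the bootstrap script should appear in this list */
SHOW TABLES;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;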

&lt;p&gt;Start the app:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;php &lt;span class="nt"&gt;-S&lt;/span&gt; localhost:8081 &lt;span class="nt"&gt;-t&lt;/span&gt; public
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Open &lt;code&gt;http://localhost:8081/&lt;/code&gt; and you have a working catalog to search.&lt;/p&gt;

&lt;p&gt;Not glamorous. Still worth it. A lot of search demos lose people before the first query because setup sprawls. This one does not need much.&lt;/p&gt;

&lt;h2&gt;
  
  
  What makes the app feel usable
&lt;/h2&gt;

&lt;p&gt;The part I care about most is not that the demo returns results. Plenty of demos do that. It is that the search flow holds together as users get more specific.&lt;/p&gt;

&lt;h3&gt;
  
  
  Start with autocomplete
&lt;/h3&gt;

&lt;p&gt;People usually begin with fragments. Sometimes they remember the exact game title. Often they do not.&lt;/p&gt;

&lt;p&gt;So the first layer is autocomplete:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="nv"&gt;$payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="s1"&gt;'body'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="s1"&gt;'query'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$term&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s1"&gt;'table'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;tableName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s1"&gt;'options'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'limit'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$limit&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'force_bigrams'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="nv"&gt;$suggestions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;autocomplete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$payload&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



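&lt;p&gt;Setting &lt;code&gt;force_bigrams&lt;/code&gt; here makes the matcher work with two-character n-grams for every word length instead of trigrams. That tightens typo-tolerant matching for short or slightly wrong input, such as transposed letters, which is exactly where autocomplete can otherwise get mushy.&lt;/p&gt;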
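
&lt;p&gt;To see why two-character n-grams help with short, slightly wrong input, here is a rough standalone illustration. It is not Manticore's matching code: the &lt;code&gt;ngrams&lt;/code&gt; and &lt;code&gt;sharedNgrams&lt;/code&gt; helpers and the sample words are made up for this sketch, and all it does is count which character n-grams a typo still shares with the intended title.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&amp;lt;?php
declare(strict_types=1);

// Hypothetical helper: every distinct run of $n consecutive characters in $word.
function ngrams(string $word, int $n): array
{
    $grams = [];
    for ($i = 0; $i &amp;lt;= strlen($word) - $n; $i++) {
        $grams[] = substr($word, $i, $n);
    }
    return array_values(array_unique($grams));
}

// Hypothetical helper: the n-grams two words have in common.
function sharedNgrams(string $a, string $b, int $n): array
{
    return array_values(array_intersect(ngrams($a, $n), ngrams($b, $n)));
}

$typed = 'catna'; // made-up typo for "catan"
$bigrams = sharedNgrams($typed, 'catan', 2);  // ['ca', 'at']
$trigrams = sharedNgrams($typed, 'catan', 3); // ['cat']
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With the transposed &lt;code&gt;catna&lt;/code&gt;, two bigrams still line up with &lt;code&gt;catan&lt;/code&gt; but only one trigram does. Shorter n-grams leave more evidence to match on when the input is short.&lt;/p&gt;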

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fla04xej74ti60i55veya.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fla04xej74ti60i55veya.gif" alt=" " width="600" height="407"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is a small feature, but it changes the feel of the app immediately. Users stop guessing what your catalog calls things.&lt;/p&gt;

&lt;h3&gt;
  
  
  Make the first results page forgiving
&lt;/h3&gt;

&lt;p&gt;Once the query is submitted, the first page needs to be useful even when the spelling is off by a bit.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="nv"&gt;$search&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;setTable&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;tableName&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$limit&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$query&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nv"&gt;$search&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$query&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$fuzzy&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nv"&gt;$search&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;option&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'fuzzy'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;option&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'force_bigrams'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nv"&gt;$search&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'*'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Fuzzy mode is doing plain practical work here: recovering close matches when users do not type the title exactly right.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4tuup98snh18vufjy3tf.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4tuup98snh18vufjy3tf.gif" alt=" " width="720" height="488"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you want the lower-level details, see &lt;a href="https://manual.manticoresearch.com/Searching/Spell_correction#Fuzzy-Search" rel="noopener noreferrer"&gt;Spell correction and fuzzy search&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Let users narrow without rewriting
&lt;/h3&gt;

&lt;p&gt;This is where many search interfaces get annoying. The query is close enough, but the result set is still too broad, so now the user has to reformulate it from scratch.&lt;/p&gt;

&lt;p&gt;Better to let them narrow in place.&lt;/p&gt;

&lt;p&gt;Range filters handle constraints like price, player count, play time, and release year. Facets expose the shape of the current result set so users can click into categories or tags instead of thinking up a more precise sentence.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="nv"&gt;$attributeFilters&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="s1"&gt;'price_min'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$priceMin&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'price_max'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$priceMax&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'play_time_min'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$playTimeMin&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'play_time_max'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$playTimeMax&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'player_count_min'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$playerCountMin&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'player_count_max'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$playerCountMax&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'release_year_min'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$yearMin&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'release_year_max'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$yearMax&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;];&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$categoryIds&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="p"&gt;[])&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nv"&gt;$search&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'category_id'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'in'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;$categoryIds&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$tagIds&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="p"&gt;[])&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nv"&gt;$search&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'tag_id'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'in'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;$tagIds&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;applyNumericFilters&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$search&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;$attributeFilters&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nv"&gt;$search&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;facet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'category_id'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;facet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'tag_id'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3w1n3fpnpafz5vrso6mr.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3w1n3fpnpafz5vrso6mr.gif" alt=" " width="720" height="488"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That combination matters more than it may look on paper. In practice, this is where the catalog starts feeling easy to use: a broad query can shrink fast once you click into a category or tag, without losing the original search intent.&lt;/p&gt;
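
&lt;p&gt;The &lt;code&gt;applyNumericFilters&lt;/code&gt; helper is not shown in the excerpt above. As a minimal sketch of what such a step could look like, the function below turns the min/max pairs into &lt;code&gt;range&lt;/code&gt; clauses. The name &lt;code&gt;buildRangeClauses&lt;/code&gt;, the &lt;code&gt;_min&lt;/code&gt;/&lt;code&gt;_max&lt;/code&gt; suffix convention, and the inclusive &lt;code&gt;gte&lt;/code&gt;/&lt;code&gt;lte&lt;/code&gt; bounds are assumptions for illustration, not the demo's actual code. It needs PHP 8 for &lt;code&gt;str_ends_with&lt;/code&gt;.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&amp;lt;?php
declare(strict_types=1);

// Hypothetical helper, not the demo's own code: turns "*_min" / "*_max" inputs
// into one range clause per attribute and skips bounds the user left empty.
function buildRangeClauses(array $filters): array
{
    $bounds = [];
    foreach ($filters as $key =&amp;gt; $value) {
        if ($value === null || $value === '') {
            continue; // no bound entered for this side of the range
        }
        $key = (string) $key;
        if (str_ends_with($key, '_min')) {
            $bounds[substr($key, 0, -4)]['gte'] = $value;
        } elseif (str_ends_with($key, '_max')) {
            $bounds[substr($key, 0, -4)]['lte'] = $value;
        }
    }

    $clauses = [];
    foreach ($bounds as $attribute =&amp;gt; $range) {
        $clauses[] = ['range' =&amp;gt; [$attribute =&amp;gt; $range]];
    }
    return $clauses;
}

$clauses = buildRangeClauses([
    'price_min' =&amp;gt; 20,
    'price_max' =&amp;gt; 60,
    'play_time_min' =&amp;gt; null, // left empty in the form
    'play_time_max' =&amp;gt; 90,
]);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Leaving a bound empty simply drops that side of the range, so a user can set a maximum play time without picking a minimum.&lt;/p&gt;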

&lt;h3&gt;
  
  
  Keep deep pagination stable
&lt;/h3&gt;

&lt;p&gt;If people browse further, offset pagination starts showing its age. Data changes between requests, offsets get larger, and eventually "show more" becomes less trustworthy than it should be.&lt;/p&gt;

&lt;p&gt;This demo uses scroll tokens instead:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Page 1 starts a fresh scroll session; later pages continue with the returned token.&lt;/span&gt;
&lt;span class="nv"&gt;$effectiveScrollToken&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$page&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt; &lt;span class="nv"&gt;$scrollToken&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="nv"&gt;$search&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;option&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'scroll'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;$effectiveScrollToken&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="nv"&gt;$resultSet&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ResultSet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s1"&gt;'body'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$body&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;span class="nv"&gt;$nextScroll&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$resultSet&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;getScroll&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="nv"&gt;$hasMore&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$nextScroll&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nv"&gt;$nextScroll&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That gives the app a much better foundation for deep pagination: each request continues from a returned token rather than recomputing larger and larger offsets.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7xmopdkplhlf2f6nc9gk.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7xmopdkplhlf2f6nc9gk.gif" alt=" " width="800" height="527"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Operationally, this is one of those choices users never notice when it works and definitely notice when it does not. More on the mechanism here: &lt;a href="https://manticoresearch.com/blog/pagination/#scroll-based-pagination" rel="noopener noreferrer"&gt;Scroll-Based Pagination&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Add semantic retrieval where keywords fail
&lt;/h3&gt;

&lt;p&gt;Keyword search gets you far. It does not solve everything.&lt;/p&gt;

&lt;p&gt;Sometimes users describe something in roughly the right language, but not in the same words your catalog uses. That is where hybrid search earns its keep.&lt;/p&gt;

&lt;h4&gt;
  
  
  Use hybrid search on the results page
&lt;/h4&gt;

&lt;p&gt;In this demo, one request includes both a lexical &lt;code&gt;query&lt;/code&gt; block and a semantic &lt;code&gt;knn&lt;/code&gt; block, then combines them with reciprocal rank fusion via &lt;code&gt;options.fusion_method = rrf&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="nv"&gt;$body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="s1"&gt;'query'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'bool'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'must'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="s1"&gt;'query_string'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'query'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$query&lt;/span&gt;&lt;span class="p"&gt;]]]]],&lt;/span&gt;
    &lt;span class="s1"&gt;'knn'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="s1"&gt;'field'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'description_vector'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s1"&gt;'query'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="s1"&gt;'options'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'fusion_method'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'rrf'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="s1"&gt;'limit'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$limit&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;];&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The vector field uses auto-embeddings, so the app does not have to generate query vectors on its own:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="s1"&gt;'description_vector'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="s1"&gt;'type'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'float_vector'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'options'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="s1"&gt;'MODEL_NAME'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'sentence-transformers/all-MiniLM-L6-v2'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s1"&gt;'FROM'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'description'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;],&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Because &lt;code&gt;description_vector&lt;/code&gt; is declared with a model and a source column, the &lt;code&gt;knn&lt;/code&gt; block only has to name that field (&lt;code&gt;'field' =&amp;gt; 'description_vector'&lt;/code&gt;) and pass the raw query text. Manticore embeds the text itself before running the KNN search.&lt;/p&gt;

&lt;p&gt;That keeps the application logic simpler than many teams expect when they first hear "semantic search." It also lets the results page stay in one flow instead of bolting a separate semantic experience onto the side.&lt;/p&gt;
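
&lt;p&gt;To make the fusion step concrete, here is a small standalone sketch of reciprocal rank fusion in plain PHP. It uses the common textbook formulation, where each result list adds &lt;code&gt;1 / (k + rank)&lt;/code&gt; to a document's score. The function name &lt;code&gt;fuseWithRrf&lt;/code&gt;, the constant &lt;code&gt;k = 60&lt;/code&gt;, and the document IDs are assumptions for illustration. This is not Manticore's internal implementation, which does the fusion server-side.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&amp;lt;?php
declare(strict_types=1);

// Illustrative reciprocal rank fusion: every list contributes
// 1 / ($k + rank) for each document it returns (rank starts at 1).
// $k = 60 is a commonly used constant, assumed here for the example.
function fuseWithRrf(array $rankedLists, int $k = 60): array
{
    $scores = [];
    foreach ($rankedLists as $list) {
        foreach (array_values($list) as $position =&amp;gt; $docId) {
            $scores[$docId] = ($scores[$docId] ?? 0.0) + 1.0 / ($k + $position + 1);
        }
    }
    arsort($scores); // highest fused score first, keyed by document ID
    return $scores;
}

$lexical = [101, 205, 309];  // made-up IDs, ranked by the keyword query
$semantic = [205, 412, 101]; // made-up IDs, ranked by KNN
$fused = fuseWithRrf([$lexical, $semantic]);
// Fused order: 205, 101, 412, 309
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Items that both the lexical and the semantic side return rise to the top, while an item only one side found still makes the list. That is the behavior you want when the user's wording only partly overlaps with the catalog's.&lt;/p&gt;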

&lt;h4&gt;
  
  
  Use similar-item discovery on the detail page
&lt;/h4&gt;

&lt;p&gt;The same vector field does a different job on the item page: "show me similar games" without forcing the user to invent another query. This part runs KNN against the current item's ID rather than a text query, and a &lt;code&gt;notFilter&lt;/code&gt; keeps the item out of its own recommendations.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="nv"&gt;$search&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nv"&gt;$search&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;setTable&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;tableName&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;knn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'description_vector'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;$source&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;getId&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;SIMILAR_KNN_LIMIT&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;notFilter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'id'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'in'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;$source&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;getId&lt;/span&gt;&lt;span class="p"&gt;()])&lt;/span&gt;
    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;SIMILAR_RESULT_LIMIT&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="nv"&gt;$resultSet&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$search&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="nv"&gt;$hits&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;formatResultSet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$resultSet&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="s1"&gt;'hits'&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nb"&gt;array_slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$hits&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;SIMILAR_RESULT_LIMIT&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1t1nlbnhbgd3f8es907n.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1t1nlbnhbgd3f8es907n.gif" alt=" " width="720" height="488"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That is where search stops being a utility and starts helping discovery. On a real detail page, this is the part that makes it easy to keep exploring instead of bouncing back to the search box.&lt;/p&gt;

&lt;p&gt;For reference: &lt;a href="https://manual.manticoresearch.com/Searching/KNN#Auto-Embeddings-(Recommended)" rel="noopener noreferrer"&gt;Auto Embeddings&lt;/a&gt; and &lt;a href="https://manual.manticoresearch.com/Searching/KNN#KNN-vector-search" rel="noopener noreferrer"&gt;KNN Search&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Keeping writes and search results in sync
&lt;/h2&gt;

&lt;p&gt;A demo app is easy to trust when the data never changes. Real apps do not get that luxury.&lt;/p&gt;

&lt;p&gt;Here, the table stays in sync through the same application flow users and admins already touch: bootstrap for a clean baseline, batched imports from the admin UI, and update/delete actions for individual items.&lt;/p&gt;

&lt;p&gt;Prepared imports use the client's batch write methods:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="nv"&gt;$table&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;table&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;indexConfig&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'name'&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$appendAsNewIds&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;addDocuments&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$batch&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;replaceDocuments&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$batch&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For individual item changes, the app uses the table API directly:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Save: replace an existing item, or add a new one when there is no ID yet.&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$id&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;table&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;replaceDocument&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$document&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;$id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;table&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;addDocument&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$document&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Delete: a separate action, shown here next to the save path.&lt;/span&gt;
&lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;table&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;deleteDocument&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F826omlznhffvf8qofq45.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F826omlznhffvf8qofq45.gif" alt=" " width="720" height="488"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And if you want to reset the experiment, the admin UI can drop imported records and return to the baseline dataset:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="nv"&gt;$baseMaxId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;resolveBaseMaxId&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;table&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;deleteDocuments&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
    &lt;span class="s1"&gt;'range'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="s1"&gt;'id'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'gt'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$baseMaxId&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;]);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No extra background machinery in the demo, no detached sync story to explain away. Just writes going where they need to go.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this matters
&lt;/h2&gt;

&lt;p&gt;What this demo shows is not just that Manticore can return results. It shows you can assemble a searchable catalog that feels complete: users can start loosely, narrow quickly with filters and facets, recover from imperfect queries, open an item, and keep discovering from there without the whole stack getting complicated.&lt;/p&gt;

&lt;p&gt;That is already enough to make search feel like part of the product, not a bolt-on feature.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>database</category>
      <category>showdev</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Why monitoring your search engine matters: Manticore ➡ Prometheus ➡ Grafana</title>
      <dc:creator>Sergey Nikolaev</dc:creator>
      <pubDate>Thu, 09 Apr 2026 04:13:56 +0000</pubDate>
      <link>https://dev.to/sanikolaev/why-monitoring-your-search-engine-matters-manticore-prometheus-grafana-51g3</link>
      <guid>https://dev.to/sanikolaev/why-monitoring-your-search-engine-matters-manticore-prometheus-grafana-51g3</guid>
      <description>&lt;p&gt;One of our users reached out recently with a familiar problem: search had suddenly become noticeably slower, even though nothing looked obviously broken.&lt;/p&gt;

&lt;p&gt;The service was up, no errors in the logs, CPU usage looked normal — yet users were starting to complain that results felt sluggish.&lt;/p&gt;

&lt;p&gt;This is how search problems usually show up in production. Not with a dramatic outage, but as a slow, creeping degradation. A little more traffic here, some extra indexing there, and before you know it, performance has slipped.&lt;/p&gt;

&lt;p&gt;By the time users notice, the real issue has often been building for hours. Without good visibility, you’re left guessing: Is the system overloaded? Is one table eating up resources? Or is something else quietly going wrong?&lt;/p&gt;

&lt;p&gt;That’s why monitoring matters. It turns the vague “search feels slow” complaint into something you can actually diagnose and fix.&lt;/p&gt;

&lt;h2&gt;
  
  
  Introducing the Manticore Grafana dashboard
&lt;/h2&gt;

&lt;p&gt;This is exactly what our new Manticore Grafana dashboard is built for.&lt;/p&gt;

&lt;p&gt;Instead of raw metrics, it gives you a clean, practical view of what really matters when running search in production. At a glance you can see:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Is the node healthy?&lt;/li&gt;
&lt;li&gt;How heavy is the current load?&lt;/li&gt;
&lt;li&gt;Are queries slowing down?&lt;/li&gt;
&lt;li&gt;Which tables are using the most resources?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It’s designed to help you move quickly from a user symptom to the actual root cause.&lt;/p&gt;

&lt;h2&gt;
  
  
  How the stack works
&lt;/h2&gt;

&lt;p&gt;The setup is straightforward: &lt;strong&gt;Manticore → Prometheus → Grafana&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Manticore exposes rich internal metrics, Prometheus collects and stores them as time-series data, and Grafana visualizes everything with our pre-built dashboard — including &lt;strong&gt;21 production-ready alerts&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;You can launch the entire stack with a single Docker command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;MANTICORE_TARGETS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;localhost:9308 &lt;span class="nt"&gt;-p&lt;/span&gt; 3000:3000 manticoresearch/dashboard
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;(Just change the &lt;code&gt;MANTICORE_TARGETS&lt;/code&gt; environment variable if your Manticore instance is running somewhere else.)&lt;/p&gt;

&lt;p&gt;If you prefer to set things up manually, grab these files:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://raw.githubusercontent.com/manticoresoftware/grafana-dashboard/main/grafana/dashboards/manticore-dashboard.json" rel="noopener noreferrer"&gt;Dashboard JSON&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://raw.githubusercontent.com/manticoresoftware/grafana-dashboard/main/prometheus/rules/manticore-alerts.yml" rel="noopener noreferrer"&gt;Alert rules&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Example &lt;a href="https://raw.githubusercontent.com/manticoresoftware/grafana-dashboard/main/prometheus/prometheus.yml" rel="noopener noreferrer"&gt;Prometheus config&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Minimal Prometheus scrape config:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;scrape_configs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;job_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;manticore"&lt;/span&gt;
    &lt;span class="na"&gt;static_configs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;targets&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;localhost:9308"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Exploring the dashboard
&lt;/h2&gt;

&lt;p&gt;The dashboard is laid out so you can follow a natural troubleshooting flow.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Health summary (start here)
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmamj2w49gnkrc2xm6d5q.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmamj2w49gnkrc2xm6d5q.jpeg" alt=" " width="800" height="163"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Open the dashboard and look at the top row first. It gives you an instant picture of the node’s overall health.&lt;/p&gt;

&lt;p&gt;Key panels to watch:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Health / Up&lt;/strong&gt; — Is Prometheus even able to scrape metrics?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Health / Crash indicator&lt;/strong&gt; — Any recent crashes?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Workers Utilization %&lt;/strong&gt; + &lt;strong&gt;Load / Queue pressure&lt;/strong&gt; — These two together are gold. High utilization plus rising queue pressure is one of the clearest early signs the node is approaching saturation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;strong&gt;System Score&lt;/strong&gt; panel also gives you an overall health rating at a glance.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Query load and latency
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F01t1b1j8aknf8fqbfjy7.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F01t1b1j8aknf8fqbfjy7.jpeg" alt=" " width="800" height="376"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuwxzmfukmebspz6bwxjz.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuwxzmfukmebspz6bwxjz.jpeg" alt=" " width="800" height="233"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next, check what kind of workload the system is handling.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;QPS Total&lt;/strong&gt; shows overall traffic levels.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Search Latency (p95/p99)&lt;/strong&gt; is one of the most important panels — averages can hide problems, but percentiles show what your users are really experiencing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Slowest Thread&lt;/strong&gt; helps spot expensive or stuck queries.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Work Queue Length&lt;/strong&gt; and &lt;strong&gt;Worker Saturation&lt;/strong&gt; together tell you whether the node is keeping up or starting to fall behind.&lt;/li&gt;
&lt;/ul&gt;
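
&lt;p&gt;To see why percentiles matter more than the average, here is a minimal shell sketch with made-up numbers (not taken from the dashboard): 98 requests at 10 ms and 2 requests at 500 ms. The function name &lt;code&gt;latency_summary&lt;/code&gt; and the nearest-rank percentile method are assumptions chosen for illustration.&lt;/p&gt;

```shell
# Hypothetical helper: reads one latency in ms per line on stdin and
# prints the mean plus the nearest-rank 99th percentile.
latency_summary() {
  sort -n | awk '
    { v[NR] = $1; sum += $1 }
    END {
      rank = int((NR * 99 + 99) / 100)   # ceil(0.99 * NR) for whole-number NR
      printf "mean=%.1f p99=%d\n", sum / NR, v[rank]
    }'
}

# Made-up sample: 98 fast requests (10 ms) and 2 slow ones (500 ms).
awk 'BEGIN { for (i = 1; i != 99; i++) print 10; print 500; print 500 }' | latency_summary
# prints: mean=19.8 p99=500
```

&lt;p&gt;The mean of about 20 ms looks healthy, while p99 reveals that some users are waiting half a second. That gap is what the percentile panels are meant to surface.&lt;/p&gt;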

&lt;h3&gt;
  
  
  3. Memory and resources
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fikr650w81upzodhm2kll.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fikr650w81upzodhm2kll.jpeg" alt=" " width="800" height="377"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This section is one of the most useful because memory pressure is a very common (and often hidden) cause of slowdowns in search engines. Instead of showing one vague number, the dashboard breaks it down so you can see exactly where the growth is happening.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Searchd RSS&lt;/strong&gt; and &lt;strong&gt;Buddy RSS&lt;/strong&gt; show the &lt;em&gt;total resident memory&lt;/em&gt; — how much physical RAM the main search daemon (searchd) and the Buddy helper process are actually using right now.&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;Anon RSS&lt;/strong&gt; panels go one level deeper. “Anonymous” memory is the private, dynamic RAM allocated by Manticore itself: heap, query caches, loaded data structures, temporary buffers, and everything else not backed by a file on disk. File-mapped pages can usually be dropped by the OS and re-read from disk when needed, whereas anonymous pages can only be reclaimed by swapping them out, so anon memory is what usually puts real pressure on your system.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Why show both RSS &lt;em&gt;and&lt;/em&gt; Anon RSS? Total RSS gives you the big picture, but Anon RSS tells you the story behind it. If total RSS is climbing but Anon RSS is stable, the growth might be harmless (e.g. more pages of memory-mapped table files resident in RAM). If Anon RSS is also rising fast, that’s usually a sign that Manticore’s own data structures or query activity are consuming more and more memory, which is exactly the kind of thing that leads to slower queries or even swapping.&lt;/p&gt;
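
&lt;p&gt;The same split can be inspected by hand on Linux. The sketch below is an illustration only, not how the dashboard collects its data: it assumes a Linux kernel that exposes the &lt;code&gt;VmRSS&lt;/code&gt;, &lt;code&gt;RssAnon&lt;/code&gt;, and &lt;code&gt;RssFile&lt;/code&gt; fields in &lt;code&gt;/proc/PID/status&lt;/code&gt;, and the helper name &lt;code&gt;rss_breakdown&lt;/code&gt; is hypothetical.&lt;/p&gt;

```shell
# Hypothetical helper: print total, anonymous, and file-backed resident
# memory for a process, as reported by the Linux kernel.
rss_breakdown() {
  # $1 is a process ID
  awk '/^(VmRSS|RssAnon|RssFile):/ { print $1, $2, $3 }' "/proc/$1/status"
}

# Demo on the current shell; for Manticore you would pass the searchd PID.
rss_breakdown "$$"
```

&lt;p&gt;&lt;code&gt;VmRSS&lt;/code&gt; is the total, &lt;code&gt;RssAnon&lt;/code&gt; is the anonymous part, and &lt;code&gt;RssFile&lt;/code&gt; is the file-backed part.&lt;/p&gt;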

&lt;p&gt;At the bottom you’ll also see several quick counters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Resources / FDs (searchd)&lt;/strong&gt; — current number of open file descriptors used by the search daemon. Manticore opens a lot of files for indexes (especially large real-time tables with many disk chunks). If this number gets too high you can hit the OS limit and start seeing “Too many open files” errors. You can raise the soft limit with the &lt;code&gt;max_open_files&lt;/code&gt; setting (see the &lt;a href="https://manual.manticoresearch.com/Server_settings/Searchd#max_open_files" rel="noopener noreferrer"&gt;Manticore docs on server settings&lt;/a&gt;).&lt;/li&gt;
&lt;li&gt;Active workers, table counts, and non-served tables — all quick signals that something might need attention.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  4. Table-level insights
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3wvh6hf6or0d5s3ikjlj.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3wvh6hf6or0d5s3ikjlj.jpeg" alt=" " width="800" height="465"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now zoom in on the data itself.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Document counts per table&lt;/li&gt;
&lt;li&gt;Top 10 tables by RAM and disk usage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tables / Health&lt;/strong&gt; panel — this one is particularly valuable because it combines docs, RAM, disk, and state flags (locked/optimizing) in a single view.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  5. Cluster state and history
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwfar2cbhgdxep5qsv8w8.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwfar2cbhgdxep5qsv8w8.jpeg" alt=" " width="800" height="91"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyyv61ev2ltjw4dkr49fj.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyyv61ev2ltjw4dkr49fj.jpeg" alt=" " width="800" height="523"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For distributed setups you get node status and sync state. The history section is excellent for answering the most important question during any incident: &lt;em&gt;what changed right before things slowed down?&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Remember the user who reached out because search had suddenly become noticeably slower?&lt;/p&gt;

&lt;p&gt;Once he enabled this dashboard, the problem became obvious almost immediately: workers were getting busier, queues were growing, and memory pressure was building — all before any obvious errors or crashes appeared. With clear visibility into what was actually happening inside the engine, he quickly pinpointed the root cause, made the right adjustments, and got performance back to the fast, reliable level his users expected.&lt;/p&gt;

&lt;p&gt;The real value of monitoring isn’t just seeing pretty graphs. It’s catching those creeping issues early — before they cost you money or customers.&lt;/p&gt;

&lt;p&gt;This dashboard removes that blind spot. It gives you the visibility you need to keep your search fast and reliable.&lt;/p&gt;

</description>
      <category>database</category>
      <category>devops</category>
      <category>monitoring</category>
      <category>performance</category>
    </item>
    <item>
      <title>Monitor Manticore Search in Grafana with One Command</title>
      <dc:creator>Sergey Nikolaev</dc:creator>
      <pubDate>Wed, 08 Apr 2026 02:27:24 +0000</pubDate>
      <link>https://dev.to/sanikolaev/monitor-manticore-search-in-grafana-with-one-command-d04</link>
      <guid>https://dev.to/sanikolaev/monitor-manticore-search-in-grafana-with-one-command-d04</guid>
      <description>&lt;p&gt;The most annoying kind of incident is when a database doesn’t go down completely; it just gets slower.&lt;/p&gt;

&lt;p&gt;Users start noticing it right away. Complaints come in. Everything is technically still running, but clearly something is off.&lt;/p&gt;

&lt;p&gt;And that is usually the hardest part: not noticing the problem, but figuring out what is actually happening.&lt;/p&gt;

&lt;h2&gt;
  
  
  When everything looks fine, but search is still slow
&lt;/h2&gt;

&lt;p&gt;Let’s take a pretty normal scenario.&lt;/p&gt;

&lt;p&gt;Search starts slowing down. It is not crashing. It is not returning obvious errors. The service is up. From the outside, nothing looks broken in a dramatic way.&lt;/p&gt;

&lt;p&gt;But users can feel it.&lt;/p&gt;

&lt;p&gt;So you open your monitoring:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CPU looks fine.&lt;/li&gt;
&lt;li&gt;Average latency does not look too bad.&lt;/li&gt;
&lt;li&gt;No obvious alerts.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At first glance, nothing really explains the slowdown.&lt;/p&gt;

&lt;p&gt;So you keep digging...&lt;/p&gt;

&lt;p&gt;You check the queue. Nothing jumps out immediately.&lt;br&gt;
You look at worker usage. They are busy, but not in a way that tells you much on its own.&lt;br&gt;
You check the logs. Still nothing obvious.&lt;/p&gt;

&lt;p&gt;And after a while you get to that frustrating point where you realize you have already checked the usual things, and you still do not know where the problem is.&lt;/p&gt;

&lt;p&gt;Each metric, by itself, looks more or less okay. But together, the system is clearly degrading.&lt;/p&gt;

&lt;p&gt;So now you are no longer following a clear line of investigation. You are just checking everything you can think of and hoping the pattern shows up.&lt;/p&gt;

&lt;p&gt;Meanwhile, time is passing.&lt;/p&gt;
&lt;h2&gt;
  
  
  What was actually going on
&lt;/h2&gt;

&lt;p&gt;A couple of hours later, the picture finally starts to make sense.&lt;/p&gt;

&lt;p&gt;It turns out:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the request queue has been slowly growing;&lt;/li&gt;
&lt;li&gt;workers have been sitting near 100% utilization;&lt;/li&gt;
&lt;li&gt;one heavy query keeps blocking execution from time to time;&lt;/li&gt;
&lt;li&gt;p99 latency is much worse than the average suggests;&lt;/li&gt;
&lt;li&gt;and one of the nodes restarted recently.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So the signals were there all along.&lt;/p&gt;

&lt;p&gt;The problem was that they were scattered across different places, and it took too long to connect them into one clear story.&lt;/p&gt;
&lt;h2&gt;
  
  
  The solution: see the whole picture right away
&lt;/h2&gt;

&lt;p&gt;Instead of spending hours piecing all of that together by hand, it is much better to have one place where the important signals are already visible.&lt;/p&gt;

&lt;p&gt;That is why we put together a ready-to-use dashboard for Manticore Search that starts with a single Docker command. It comes with Grafana, Prometheus, a preconfigured data source, and built-in alerts.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;-p&lt;/span&gt; 3000:3000 manticoresearch/dashboard
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Environment variables
&lt;/h3&gt;

&lt;p&gt;The container supports two &lt;a href="https://github.com/manticoresoftware/grafana-dashboard?tab=readme-ov-file#environment-variables" rel="noopener noreferrer"&gt;environment variables&lt;/a&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;MANTICORE_TARGETS&lt;/code&gt; - comma-separated list of Manticore Search instances (default: &lt;code&gt;localhost:9308&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;GF_AUTH_ENABLED&lt;/code&gt; - set to &lt;code&gt;true&lt;/code&gt; to enable Grafana login (by default, anonymous admin access is enabled)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;-p&lt;/span&gt; 3000:3000 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;MANTICORE_TARGETS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;your-host:9308 &lt;span class="se"&gt;\&lt;/span&gt;
  manticoresearch/dashboard
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you monitor multiple nodes, pass them as a comma-separated list:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;-p&lt;/span&gt; 3000:3000 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;MANTICORE_TARGETS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;node1:9308,node2:9308,node3:9308 &lt;span class="se"&gt;\&lt;/span&gt;
  manticoresearch/dashboard
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
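
&lt;p&gt;Before starting the container, it can help to confirm that each target actually serves metrics. A minimal sketch, assuming Prometheus scrapes its default &lt;code&gt;/metrics&lt;/code&gt; path over plain HTTP (the helper name &lt;code&gt;metrics_urls&lt;/code&gt; is hypothetical, and the node names are the placeholders from the example above):&lt;/p&gt;

```shell
# Hypothetical helper: turn a comma-separated MANTICORE_TARGETS value
# into one URL per line, assuming the default /metrics path over HTTP.
metrics_urls() {
  old_ifs=$IFS
  IFS=,
  for target in $1; do
    printf 'http://%s/metrics\n' "$target"
  done
  IFS=$old_ifs
}

metrics_urls "node1:9308,node2:9308,node3:9308"
# prints:
# http://node1:9308/metrics
# http://node2:9308/metrics
# http://node3:9308/metrics
```

&lt;p&gt;Each printed URL can then be passed to &lt;code&gt;curl -fsS&lt;/code&gt; from the machine that will run the container. An error there most likely means Prometheus would not be able to scrape that node either.&lt;/p&gt;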



&lt;h3&gt;
  
  
  If Manticore is running on a remote server
&lt;/h3&gt;

&lt;p&gt;By default, the dashboard expects Manticore at &lt;code&gt;localhost:9308&lt;/code&gt;. If your instance is running on a remote machine, the simplest option is SSH port forwarding:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ssh &lt;span class="nt"&gt;-L&lt;/span&gt; 9308:localhost:9308 user@your-server
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After that, local connections to &lt;code&gt;localhost:9308&lt;/code&gt; will be forwarded to the remote server, so the dashboard can connect without additional changes.&lt;/p&gt;

&lt;p&gt;A minute later, you have a usable overview of your system.&lt;/p&gt;

&lt;p&gt;Not just a pile of graphs, but a dashboard that helps you quickly answer the questions you actually care about when something feels wrong.&lt;/p&gt;

&lt;p&gt;You can see queue growth, worker saturation, latency, process state, and query behavior in one place, instead of bouncing between tools and trying to stitch the story together in your head.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the dashboard shows
&lt;/h2&gt;

&lt;p&gt;The value here is not that there are a lot of panels. The value is that the panels answer the right questions quickly.&lt;/p&gt;

&lt;p&gt;The first place to look is the overall system view:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feh7qbvu0wv7o7hd6oo0d.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feh7qbvu0wv7o7hd6oo0d.jpeg" alt=" " width="800" height="163"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This gives you the basic picture right away:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;is the service up;&lt;/li&gt;
&lt;li&gt;has it restarted recently;&lt;/li&gt;
&lt;li&gt;is there queue pressure;&lt;/li&gt;
&lt;li&gt;are workers already under load.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If this row looks healthy, maybe the issue is narrow and local. If it does not, you know right away that the system is under real pressure.&lt;/p&gt;

&lt;p&gt;Then you move to load and query behavior:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8j7ttiqfehxif501w84g.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8j7ttiqfehxif501w84g.jpeg" alt=" " width="800" height="376"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is where you can quickly see:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;whether work is starting to pile up;&lt;/li&gt;
&lt;li&gt;whether workers are saturated;&lt;/li&gt;
&lt;li&gt;whether latency is getting worse, especially p95 and p99;&lt;/li&gt;
&lt;li&gt;whether one slow thread is causing a disproportionate amount of trouble.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And if you need more context, you can drill down into the rest of the dashboard:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;cluster state:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo3x0i86suqqt44ktchci.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo3x0i86suqqt44ktchci.jpeg" alt=" " width="800" height="91"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;tables and data:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz5c5m9fp73wj14obwdab.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz5c5m9fp73wj14obwdab.jpeg" alt=" " width="800" height="465"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;At that point, you are no longer looking at disconnected metrics. You are looking at the system as a whole.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this matters
&lt;/h2&gt;

&lt;p&gt;A situation that used to take a couple of hours just to understand can now usually be narrowed down in a few minutes.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You can see that the queue is growing.&lt;/li&gt;
&lt;li&gt;You can see that workers are pinned.&lt;/li&gt;
&lt;li&gt;You can see that p99 is climbing.&lt;/li&gt;
&lt;li&gt;You can see that one node restarted.&lt;/li&gt;
&lt;li&gt;You can see that one query is probably doing most of the damage.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That does not mean the dashboard magically fixes the issue for you.&lt;/p&gt;

&lt;p&gt;What it does do is remove the slowest part of the whole process: figuring out where to look.&lt;/p&gt;

&lt;p&gt;And in practice, that is often the difference between spending two hours trying to understand the incident and spending five minutes getting to the real problem.&lt;/p&gt;

</description>
      <category>database</category>
      <category>monitoring</category>
      <category>performance</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Parallel chunk merging in Manticore Search</title>
      <dc:creator>Sergey Nikolaev</dc:creator>
      <pubDate>Tue, 07 Apr 2026 03:52:38 +0000</pubDate>
      <link>https://dev.to/sanikolaev/parallel-chunk-merging-in-manticore-search-47h2</link>
      <guid>https://dev.to/sanikolaev/parallel-chunk-merging-in-manticore-search-47h2</guid>
      <description>&lt;p&gt;Starting with &lt;strong&gt;Manticore Search 24.4.0&lt;/strong&gt;, RT table compaction has a more capable execution model. Instead of merging chunk pairs one by one in a serial flow, optimization now supports two important improvements:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;disk chunk merges can run in parallel&lt;/li&gt;
&lt;li&gt;each merge job can merge more than two chunks at once&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Two server settings control this behavior:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://manual.manticoresearch.com/Server_settings/Searchd#parallel_chunk_merges" rel="noopener noreferrer"&gt;parallel_chunk_merges&lt;/a&gt;: how many RT disk chunk merge jobs may run at the same time&lt;/li&gt;
&lt;li&gt;&lt;a href="https://manual.manticoresearch.com/Server_settings/Searchd#merge_chunks_per_job" rel="noopener noreferrer"&gt;merge_chunks_per_job&lt;/a&gt;: how many RT disk chunks a single job can merge in one pass&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The compaction docs were also updated to describe optimization as an &lt;strong&gt;N-way merge&lt;/strong&gt; handled by a &lt;strong&gt;background worker pool&lt;/strong&gt; rather than a single serial merge thread.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this matters
&lt;/h2&gt;

&lt;p&gt;For RT workloads, the interesting number is often not just how fast you can insert documents, but how long it takes until compaction catches up and the table returns to its target chunk count.&lt;/p&gt;

&lt;p&gt;That is especially noticeable when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;you ingest data at a sustained rate&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;optimize_cutoff&lt;/code&gt; is low enough that merges kick in early&lt;/li&gt;
&lt;li&gt;you wait for compaction to finish before considering the load fully complete&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This matters most in two common cases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;you are doing an initial bulk upload into a real-time table and want the table not just searchable, but already compacted to its steady state before putting more pressure on it&lt;/li&gt;
&lt;li&gt;you regularly ingest large batches and want each batch to finish cleanly before the next one arrives&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The table is searchable before compaction finishes, but "fully searchable" and "fully optimized" are not the same thing. A higher chunk count can still matter if you care about keeping the table close to its target shape, limiting background merge work before the next ingest wave, or reducing the window where storage is busy with post-load compaction.&lt;/p&gt;

&lt;p&gt;To show the difference, we loaded &lt;strong&gt;10 million documents&lt;/strong&gt; into an RT table. Each document contains:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;id bigint&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;name text&lt;/code&gt; with generated text between 10 and 100 words&lt;/li&gt;
&lt;li&gt;&lt;code&gt;type int&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The table was created with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;bigint&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;optimize_cutoff&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'16'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So the target was to compact the table back down to roughly 16 disk chunks.&lt;/p&gt;

&lt;p&gt;For the benchmark we used &lt;a href="https://dev.to/blog/manticore-load/"&gt;manticore-load&lt;/a&gt;, our load generation and benchmarking tool. It is useful for reproducing scenarios like this, stress-testing ingestion, and comparing configuration changes without building custom scripts every time.&lt;/p&gt;

&lt;p&gt;The data was loaded with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;manticore-load &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--cache-gen-workers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;5 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--drop&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--batch-size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1000 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--threads&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;5 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--total&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;10000000 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--init&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"CREATE TABLE test(id bigint, name text, type int) optimize_cutoff='16'"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--load&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"INSERT INTO test(id,name,type) VALUES(&amp;lt;increment&amp;gt;,'&amp;lt;text/10/100&amp;gt;',&amp;lt;int/1/100&amp;gt;)"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--wait&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Before: one merge job, two chunks at a time
&lt;/h2&gt;

&lt;p&gt;With the old behavior forced explicitly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;mysql &lt;span class="nt"&gt;-P9306&lt;/span&gt; &lt;span class="nt"&gt;-h0&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"set global parallel_chunk_merges=1; set global merge_chunks_per_job=2"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;the run looked like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;merging started at &lt;strong&gt;14 seconds&lt;/strong&gt;, when about &lt;strong&gt;1.8M&lt;/strong&gt; documents had been inserted&lt;/li&gt;
&lt;li&gt;all &lt;strong&gt;10M&lt;/strong&gt; documents were loaded after &lt;strong&gt;1 minute 18 seconds&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;at that point the data was already fully searchable&lt;/li&gt;
&lt;li&gt;compaction kept running in the background until &lt;strong&gt;3 minutes 23 seconds&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At &lt;code&gt;01:18&lt;/code&gt;, the table still had more than 50 chunks (53, in the eighth column below). Near the end of loading, the status looked like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;17:14:50  01:17     98%         133      128.4K   21%     5          53        1         4.22GB      9.9M
17:14:51  01:18     100%        131      310.9K   15%     1          53        1         4.27GB      10.0M
...
17:16:55  03:22     100%        0        49.4K    4%      1          17        1         4.27GB      10.0M
...
Total time:       03:23
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the classic pattern of a healthy ingest pipeline followed by a long merge tail.&lt;/p&gt;

&lt;h2&gt;
  
  
  After: parallel merges plus larger merge jobs
&lt;/h2&gt;

&lt;p&gt;With the new settings:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;mysql &lt;span class="nt"&gt;-P9306&lt;/span&gt; &lt;span class="nt"&gt;-h0&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"set global parallel_chunk_merges=3; set global merge_chunks_per_job=5"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;the same workload finished much faster:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;merging again started at about &lt;strong&gt;14 seconds&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;all &lt;strong&gt;10M&lt;/strong&gt; documents were again loaded after about &lt;strong&gt;1 minute 18 seconds&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;full compaction finished after only &lt;strong&gt;1 minute 31 seconds&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The end of the run looked like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;17:19:22  01:17     99%         127      127.9K   28%     6          26        1         4.22GB      9.9M
17:19:23  01:18     100%        132      1883.8K  17%     1          23        1         4.25GB      10.0M
...
17:19:36  01:31     100%        0        110.2K   3%      1          17        1         4.25GB      10.0M
...
Total time:       01:31
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What changed in practice
&lt;/h2&gt;

&lt;p&gt;The ingest phase itself stayed roughly the same:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;old settings: &lt;strong&gt;1:18&lt;/strong&gt; to load all data&lt;/li&gt;
&lt;li&gt;new settings: &lt;strong&gt;1:18&lt;/strong&gt; to load all data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The big gain came from post-ingest compaction:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;old settings: about &lt;strong&gt;2:05&lt;/strong&gt; of additional merge time after loading finished&lt;/li&gt;
&lt;li&gt;new settings: about &lt;strong&gt;0:13&lt;/strong&gt; of additional merge time after loading finished&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is roughly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;55% lower total time&lt;/strong&gt; overall, from &lt;strong&gt;3:23&lt;/strong&gt; down to &lt;strong&gt;1:31&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;about &lt;strong&gt;90% less merge tail&lt;/strong&gt; after the last document was inserted&lt;/li&gt;
&lt;/ul&gt;
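
&lt;p&gt;For anyone who wants to check those percentages, here is a minimal sketch that recomputes them from the timings quoted above. The helper names &lt;code&gt;to_seconds&lt;/code&gt; and &lt;code&gt;reduction_pct&lt;/code&gt; are made up for this illustration and are not part of any Manticore tooling.&lt;/p&gt;

```python
def to_seconds(mmss):
    """Convert an 'm:ss' or 'mm:ss' string to a number of seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def reduction_pct(before, after):
    """Percentage by which `after` is lower than `before`."""
    return 100 * (before - after) / before


# Timings taken from the two runs described above.
loaded = to_seconds("1:18")      # all 10M documents inserted (both runs)
old_total = to_seconds("3:23")   # parallel_chunk_merges=1, merge_chunks_per_job=2
new_total = to_seconds("1:31")   # parallel_chunk_merges=3, merge_chunks_per_job=5

total_cut = reduction_pct(old_total, new_total)
# The merge tail is the time spent compacting after the last insert.
tail_cut = reduction_pct(old_total - loaded, new_total - loaded)

print(f"total time: {total_cut:.1f}% lower")
print(f"merge tail: {tail_cut:.1f}% lower")
```

&lt;p&gt;The merge tail is simply total time minus load time, which is why it shrinks far more than the total: the load phase is identical in both runs.&lt;/p&gt;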

&lt;p&gt;Chunk pressure during ingest was much lower too. Near the end of loading:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;old settings: &lt;strong&gt;53 chunks&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;new settings: &lt;strong&gt;23 chunks&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So the improvement is not just that compaction finishes sooner. It also keeps the chunk count under control much more aggressively while data is still being inserted.&lt;/p&gt;

&lt;h2&gt;
  
  
  What about the new defaults?
&lt;/h2&gt;

&lt;p&gt;On this server, with the new default settings and no explicit tuning at all, the same workload finished in:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Total time:       01:57
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That already cuts the old &lt;code&gt;03:23&lt;/code&gt; result by about 42%, while still leaving room for additional tuning with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;parallel_chunk_merges&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;merge_chunks_per_job&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In other words, the new defaults already improve the out-of-the-box experience, and systems with enough I/O headroom can push compaction even further by increasing both settings carefully.&lt;/p&gt;

&lt;h2&gt;
  
  
  Broader benchmark results: row-wise and columnar storage
&lt;/h2&gt;

&lt;p&gt;The 10M-document example above shows the mechanics clearly, but the larger picture is even more interesting. In a wider test matrix we measured the combined &lt;strong&gt;load + optimize&lt;/strong&gt; time for both row-wise and columnar storage across multiple values of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;parallel_chunk_merges&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;merge_chunks_per_job&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The headline result is that, in some cases, tuning these settings can cut total load + optimize time by a factor of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;up to &lt;strong&gt;4.6x&lt;/strong&gt; for &lt;strong&gt;row-wise&lt;/strong&gt; storage&lt;/li&gt;
&lt;li&gt;up to &lt;strong&gt;6.8x&lt;/strong&gt; for &lt;strong&gt;columnar&lt;/strong&gt; storage&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here is the best-vs-worst picture from that test set:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Storage&lt;/th&gt;
&lt;th&gt;Best settings&lt;/th&gt;
&lt;th&gt;Best time&lt;/th&gt;
&lt;th&gt;Slowest settings&lt;/th&gt;
&lt;th&gt;Slowest time&lt;/th&gt;
&lt;th&gt;Improvement&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Row-wise&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;parallel_chunk_merges=4&lt;/code&gt;, &lt;code&gt;merge_chunks_per_job=5&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;14:35&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;parallel_chunk_merges=1&lt;/code&gt;, &lt;code&gt;merge_chunks_per_job=2&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;67:15&lt;/td&gt;
&lt;td&gt;4.61x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Columnar&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;parallel_chunk_merges=4&lt;/code&gt;, &lt;code&gt;merge_chunks_per_job=5&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;15:10&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;parallel_chunk_merges=1&lt;/code&gt;, &lt;code&gt;merge_chunks_per_job=2&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;99:14&lt;/td&gt;
&lt;td&gt;6.80x&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;There is also a useful tuning pattern in the full results:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the best runs for both storage modes clustered around &lt;code&gt;parallel_chunk_merges=4..5&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;the best runs also clustered around &lt;code&gt;merge_chunks_per_job=4..5&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;the slowest results were consistently at &lt;code&gt;parallel_chunk_merges=1&lt;/code&gt; with &lt;code&gt;merge_chunks_per_job=2&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In other words, the old serial two-chunk pattern is not just a little slower. On large workloads it can become dramatically slower, especially with columnar storage.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to think about the two settings
&lt;/h2&gt;

&lt;p&gt;The new docs describe two separate levers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;parallel_chunk_merges&lt;/code&gt; increases how many merge jobs can run at once&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;merge_chunks_per_job&lt;/code&gt; increases how many chunks each job can consume&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Lower &lt;code&gt;merge_chunks_per_job&lt;/code&gt; values make it easier to schedule more jobs in parallel because each job consumes fewer chunks from the available pool. If a table has many chunks waiting to be compacted, smaller jobs leave more independent chunks available for other workers, so the scheduler can keep several merges active at once. Higher values reduce the total number of merge steps, but each job becomes heavier and grabs a larger portion of the available chunks, which can leave less room for concurrent jobs.&lt;/p&gt;
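
&lt;p&gt;The trade-off can be made concrete with a deliberately simplified toy model. This is not Manticore's scheduler: the function &lt;code&gt;merge_rounds&lt;/code&gt; and its parameters are hypothetical, every chunk is assumed to be the same size, and every round is assumed to cost the same, none of which holds for real merges. It only counts how many jobs and scheduling rounds it takes to get from a given chunk count down to a cutoff.&lt;/p&gt;

```python
def merge_rounds(chunks, cutoff, parallel_jobs, chunks_per_job):
    """Toy model: return (rounds, jobs) needed to shrink `chunks` to `cutoff`.

    Every chunk is treated as equal in size and every round as equal in cost,
    which real merges are not.
    """
    # Guard against inputs that could never make progress
    # (cutoff below 1, fewer than 2 chunks per job, or no workers).
    if min(cutoff, 1) != 1 or min(chunks_per_job, 2) != 2 or min(parallel_jobs, 1) != 1:
        raise ValueError("need cutoff of 1+, chunks_per_job of 2+, parallel_jobs of 1+")
    excess = max(chunks - cutoff, 0)  # chunks that still have to disappear
    rounds = 0
    jobs = 0
    while excess != 0:
        available = cutoff + excess  # chunks no job has claimed this round
        for _ in range(parallel_jobs):
            # A job claims up to chunks_per_job chunks, but never more than
            # it needs to reach the cutoff or than the pool still offers.
            taken = min(chunks_per_job, excess + 1, available)
            if taken in (0, 1):
                break  # nothing left worth merging in this round
            available -= taken
            excess -= taken - 1  # merging N chunks into one removes N - 1
            jobs += 1
        rounds += 1
    return rounds, jobs


# 53 chunks and a cutoff of 16, as in the 10M-document example.
print(merge_rounds(53, 16, parallel_jobs=1, chunks_per_job=2))
print(merge_rounds(53, 16, parallel_jobs=3, chunks_per_job=5))
```

&lt;p&gt;In this model the serial two-chunk setup needs 37 jobs in 37 rounds, while three workers taking five chunks each need 10 jobs in 4 rounds. Real jobs get heavier as &lt;code&gt;merge_chunks_per_job&lt;/code&gt; grows, so the round count alone overstates the gain. The &lt;code&gt;available&lt;/code&gt; pool also shows the other side of the trade-off: larger jobs drain it faster and can leave later workers with nothing to claim.&lt;/p&gt;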

&lt;p&gt;The right balance depends on your storage and workload, but the benchmarks above show that raising both settings together can dramatically reduce the time spent waiting for RT chunk compaction to finish.&lt;/p&gt;

&lt;h2&gt;
  
  
  Takeaway
&lt;/h2&gt;

&lt;p&gt;If your RT workloads spend too long waiting for chunk compaction after bulk inserts, the new parallel merge model changes that equation significantly.&lt;/p&gt;

&lt;p&gt;On this 10M-document test with &lt;code&gt;optimize_cutoff=16&lt;/code&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Mode&lt;/th&gt;
&lt;th&gt;Searchable at&lt;/th&gt;
&lt;th&gt;Fully optimized at&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Old settings: &lt;code&gt;parallel_chunk_merges=1&lt;/code&gt;, &lt;code&gt;merge_chunks_per_job=2&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;1:18&lt;/td&gt;
&lt;td&gt;3:23&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;New defaults&lt;/td&gt;
&lt;td&gt;1:18&lt;/td&gt;
&lt;td&gt;1:57&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tuned new settings: &lt;code&gt;parallel_chunk_merges=3&lt;/code&gt;, &lt;code&gt;merge_chunks_per_job=5&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;1:18&lt;/td&gt;
&lt;td&gt;1:31&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;ul&gt;
&lt;li&gt;the time until all data became searchable stayed the same&lt;/li&gt;
&lt;li&gt;the time until chunk compaction completed dropped from &lt;strong&gt;3:23&lt;/strong&gt; to &lt;strong&gt;1:31&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;even the new defaults reduced the total time to &lt;strong&gt;1:57&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;
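
&lt;p&gt;The same table can be read as "how long does background compaction keep running after the data is searchable". A small sketch, using only the timings from the table above (the &lt;code&gt;runs&lt;/code&gt; dictionary and &lt;code&gt;to_seconds&lt;/code&gt; helper are illustrative names, not Manticore tooling):&lt;/p&gt;

```python
def to_seconds(mmss):
    """Convert an 'm:ss' string to a number of seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


# (searchable at, fully optimized at) for each mode in the table above.
runs = {
    "old settings": ("1:18", "3:23"),
    "new defaults": ("1:18", "1:57"),
    "tuned settings": ("1:18", "1:31"),
}

tails = {}
for mode, (searchable_at, optimized_at) in runs.items():
    # Background compaction time after the last document was inserted.
    tails[mode] = to_seconds(optimized_at) - to_seconds(searchable_at)
    print(f"{mode}: {tails[mode]} s of compaction after load")
```

&lt;p&gt;That works out to 125, 39, and 13 seconds respectively, so the new defaults alone remove roughly two thirds of the tail on this test.&lt;/p&gt;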

&lt;p&gt;This is exactly the kind of improvement that matters for operational RT indexing. The data is searchable as soon as it is loaded, and that point stayed the same in all three runs. The difference is what happens after that: how long the server keeps compacting chunks in the background before the table returns to its target shape. If your workflow depends on the table becoming compact again before the next heavy ingest, before a maintenance window closes, or before you hand the system over to a search workload that should run with fewer chunks and less background merge pressure, the improvement is substantial.&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>database</category>
      <category>news</category>
      <category>performance</category>
    </item>
  </channel>
</rss>
