The Search Box That Broke the Treasure Hunt Engine

#ai #programming #machinelearning #webdev

The Problem We Were Actually Solving

Every time a community operator pasted a query such as "Where is the Runebound shard?", the Veltrix cluster did a full BM25 sweep across 1.2 M documents, deserialized each payload to JSON, and ran a Python UDF to filter by event=hytale-2026. The UDF was the killer: it added 300 ms of processing time per request and sometimes returned an empty result because the document was still marked as draft in S3. We watched our 95th percentile climb to 900 ms while the stream chat flooded with "It's not working!" messages. The business metric was clear: latency > 500 ms meant a 22 % drop in query volume within five minutes.
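To make the cost concrete, here is a minimal sketch of what a per-document filter of this kind can look like. It is not the original UDF: the function names and the "event" and "status" payload fields are assumptions made for illustration, and the real draft flag lived in S3 rather than necessarily in the payload.

```python
import json

WANTED_EVENT = "hytale-2026"  # filter value quoted in the post


def event_filter_udf(payload: bytes, wanted_event: str = WANTED_EVENT) -> bool:
    """Hypothetical per-document UDF: parse the whole payload to read two fields."""
    doc = json.loads(payload)
    # Assumed field: a document still marked as draft is rejected, which is one
    # way a query with a valid answer can come back empty.
    if doc.get("status") == "draft":
        return False
    return doc.get("event") == wanted_event


def filter_hits(payloads):
    """Apply the UDF to every candidate; each one pays the JSON parse cost."""
    return [p for p in payloads if event_filter_udf(p)]


# Synthetic payloads, for illustration only.
candidates = [
    b'{"id": 1, "event": "hytale-2026", "status": "live"}',
    b'{"id": 2, "event": "hytale-2026", "status": "draft"}',
    b'{"id": 3, "event": "hytale-2025", "status": "live"}',
]
hits = filter_hits(candidates)
print(len(hits), "of", len(candidates), "candidates survive the filter")
```

The work scales with the number of candidates rather than the number of results, which is why a filter like this hurts most on broad queries.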

What We Tried First (And Why It Failed)

First we tried a simple filter pushdown: rewriting the search query to include event=hytale-2026 as a term so Veltrix could prune the shards before touching the UDF. That cut latency to 450 ms but introduced false negatives because the term tokenization in the BM25 index didn't match the JSON field names exactly. Next we patched the ingest pipeline to add a raw JSON field, but the field grew to 4 GB and Veltrix's Rocchio re-ranking stage started OOMing every 30 minutes. Finally we moved the UDF into a separate microservice, but the extra hop added 120 ms of latency and the service itself developed a TCP backlog of 1,400 queries every time the stream dropped a new hint.
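One plausible way a pushdown like this produces false negatives is sketched below. The analyzer here is a stand-in (lowercase, split on non-alphanumerics), not Veltrix's actual tokenizer, so treat it as an illustration of the failure mode rather than a reproduction of it.

```python
import re


def analyze(text: str) -> list[str]:
    """Stand-in index analyzer (an assumption): lowercase, split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok]


# Terms the index would hold for a document carrying the raw filter string.
indexed_terms = set(analyze("event=hytale-2026"))

# Pushing the raw string down as one exact term never matches, because the
# index only contains the pieces the analyzer produced.
exact_term_matches = "event=hytale-2026" in indexed_terms

# Passing the filter value through the same analyzer first does match.
analyzed_matches = set(analyze("event=hytale-2026")) <= indexed_terms

print(sorted(indexed_terms), exact_term_matches, analyzed_matches)
```

Under these assumptions the document is indexed correctly and still never returned, which looks like a false negative from the outside.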

The Architecture Decision

We scrapped the UDF entirely and rebuilt the filter as a custom scorer in Veltrix's C++ layer. The scorer reads a Bloom filter we pre-compute every 60 seconds from the S3 manifest, so it never touches JSON. We also changed the index layout: instead of one giant 900 GB index, we split by event year, giving us six shards of ~150 GB each. During ingest we now run a tiny Lambda that emits an event to SNS when a document transitions from draft to live; the Bloom filter rebuilds in 12 seconds flat and the scorer sees the update within 20 seconds. Total code change: 500 lines in the Veltrix fork, no Python runtime on the hot path. The tradeoff was accepting 4 % higher storage overhead for the Bloom blocks, but our memory budget stayed flat.
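The production scorer is C++ inside the Veltrix fork; the sketch below only illustrates the idea in Python. The manifest layout ("doc_id" and "status" keys), the class and function names, and the 1 % false-positive target are all assumptions, not details from our system.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter using double hashing over a SHA-256 digest."""

    def __init__(self, expected_items: int, false_positive_rate: float = 0.01):
        # Standard sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hashes.
        self.num_bits = max(
            8, math.ceil(-expected_items * math.log(false_positive_rate) / math.log(2) ** 2)
        )
        self.num_hashes = max(1, round(self.num_bits / expected_items * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, key: str):
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step for double hashing
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


def build_live_filter(manifest: list[dict]) -> BloomFilter:
    """Rebuild the filter from a manifest snapshot, keeping only live documents."""
    live_ids = [entry["doc_id"] for entry in manifest if entry["status"] == "live"]
    bloom = BloomFilter(expected_items=max(1, len(live_ids)))
    for doc_id in live_ids:
        bloom.add(doc_id)
    return bloom


def score(doc_id: str, bm25_score: float, live_filter: BloomFilter) -> float:
    """Zero out documents the filter has not seen; no JSON parsing on this path."""
    return bm25_score if doc_id in live_filter else 0.0


# Synthetic manifest, for illustration only.
manifest = [
    {"doc_id": "doc-1", "status": "live"},
    {"doc_id": "doc-2", "status": "draft"},
    {"doc_id": "doc-3", "status": "live"},
]
live_filter = build_live_filter(manifest)
print(score("doc-1", 7.5, live_filter))
```

A Bloom filter never misses a document that was added, but it can report an absent one as present at roughly the configured false-positive rate, so a sketch like this trades a small chance of letting a draft through for a membership check that is a handful of bit tests.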

What The Numbers Said After

At the next community event we measured 110 ms median latency and 320 ms 95th percentile, well below the 500 ms threshold. Query volume rose 18 % within the first hour and stayed up even when the stream peaked at 140 k concurrent viewers. False negatives dropped to 0.3 % (mostly due to misspelled queries), and the cluster CPU utilization fell from 78 % to 34 % on the indexing nodes. We also noticed a side effect: the Bloom scorer reduced disk I/O on the shards by 22 % because fewer terms were being fetched from the posting lists.
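For anyone who wants to run the same check against their own request logs, here is a minimal sketch of computing the median and 95th percentile and comparing against the 500 ms threshold. The helper names are hypothetical and the samples are synthetic, not our measurements.

```python
import statistics

LATENCY_BUDGET_MS = 500  # threshold quoted in the post


def latency_summary(samples_ms: list[float]) -> dict[str, float]:
    """Return p50 and p95 from raw latency samples."""
    # quantiles(n=100) yields 99 cut points: index 49 is p50, index 94 is p95.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94]}


def within_budget(summary: dict[str, float], budget_ms: float = LATENCY_BUDGET_MS) -> bool:
    """True when the 95th percentile stays at or under the budget."""
    return summary["p95"] <= budget_ms


# Synthetic samples (1..100 ms), for illustration only.
samples = [float(ms) for ms in range(1, 101)]
summary = latency_summary(samples)
print(summary, within_budget(summary))
```

Tracking the 95th percentile rather than the mean matters here because the query-volume drop was tied to slow requests, which an average can hide.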

What I Would Do Differently

First, I would not trust the Veltrix maintainers' claim that their built-in filter pushdown is production-grade. We burned two weeks proving it wasn't. Second, I would never let the UDF live on the request path again: data transforms belong at ingest or in a pre-compute step, not at runtime. Finally, I would insist on per-event sharding from day one; 900 GB is too large for a single index to handle gracefully under load spikes. The streaming community operators who inherit this system will thank us when they don't have to explain why the search box is on fire every time Twitch goes live.