I thought vector search was enough.
I'd built Queryra — an AI search plugin for WooCommerce and Shopify. Replaced keyword matching with semantic embeddings. Customers could search "something warm for winter" and find sweaters, fleece jackets, blankets. Zero results became rare. It worked.
Then someone searched: "wireless headphones under $80, not Beats"
The vector search returned wireless headphones. Some were $200. Several were Beats. The price cap and brand exclusion were completely invisible to the embedding model.
That's when I realized: vector search was layer one. I was missing layer two.
## The Problem With Pure Vector Search
Embeddings are brilliant at one thing: encoding semantic similarity. "Sneakers" lands close to "trainers" and "running shoes" in vector space. "Gift for dad" finds garden tools, BBQ sets, and watches — even without those words in the query.
But a query like "laptop under $1000 for video editing, not Chromebook" contains two fundamentally different types of information:
- Semantic intent — what the customer wants (a powerful laptop for video work)
- Structural constraints — how to filter results (price cap, category exclusion)
Embeddings handle #1 well. They have no mechanism for #2.
You can't encode "under $1000" as a direction in vector space. "Not Chromebook" isn't a semantic concept — it's an instruction to the search system. Every vector-only implementation has this blind spot, and it gets worse as queries get more specific.
The customers most affected? Highest-intent buyers. The ones ready to purchase right now.
## The Solution: LLM Parser as Layer Two
I added a query parser that runs before the vector search. Its job: decompose the query into structured components.
Here's the logic (simplified):

Input: `"organic shampoo without sulfates under $25, best rated"`

Parser output:

```json
{
  "semantic_query": "organic shampoo",
  "price_max": 25,
  "attribute_exclude": ["sulfates"],
  "sort_by": "rating"
}
```
Each component then goes to the right system:
- `semantic_query` → vector search (finds semantically relevant products)
- `price_max` → database filter (hard cut at $25)
- `attribute_exclude` → post-filter (removes sulfate-containing products)
- `sort_by` → result reranking (surfaces highest-rated first)
The vector layer finds what the customer means. The parser layer applies what they said.
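A minimal sketch of that dispatch step, assuming products are plain dicts with `title`, `price`, `attributes`, and `rating` fields (the real plugin works against WooCommerce and Shopify catalogs):

```python
def apply_constraints(products, parsed):
    """Apply the parser's structured output on top of vector-search hits."""
    results = products
    # Hard price cut (a database-level filter in the real system).
    if parsed.get("price_max") is not None:
        results = [p for p in results if p["price"] <= parsed["price_max"]]
    # Post-filter: drop products containing any excluded attribute.
    excluded = {a.lower() for a in parsed.get("attribute_exclude", [])}
    if excluded:
        results = [p for p in results
                   if not excluded & {a.lower() for a in p["attributes"]}]
    # Rerank: surface highest-rated first when requested.
    if parsed.get("sort_by") == "rating":
        results = sorted(results, key=lambda p: p["rating"], reverse=True)
    return results

hits = [  # pretend these came back from the vector layer
    {"title": "Herbal Shampoo", "price": 19, "attributes": ["organic"], "rating": 4.6},
    {"title": "Salon Shampoo", "price": 32, "attributes": ["organic"], "rating": 4.9},
    {"title": "Budget Shampoo", "price": 9, "attributes": ["sulfates"], "rating": 4.1},
]
parsed = {"semantic_query": "organic shampoo", "price_max": 25,
          "attribute_exclude": ["sulfates"], "sort_by": "rating"}
print([p["title"] for p in apply_constraints(hits, parsed)])
# → ['Herbal Shampoo']
```

The ordering matters: filters run before the rerank, so a top-rated product that violates a hard constraint never reaches the results.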
## The Bypass Problem (Latency)
The parser adds ~700–800ms latency. For a simple query like "blue t-shirt", that's pure overhead — embeddings handle it fine alone.
So I added a pre-filter that routes queries before hitting the parser:
```python
import re

def should_parse(query: str) -> bool:
    # Price signals
    if re.search(r'under \$|below \$|\$\d+|budget|cheap|premium', query, re.I):
        return True
    # Exclusion signals
    if re.search(r'\bnot\b|\bwithout\b|\bno\b|\bexclude\b', query, re.I):
        return True
    # Sorting signals
    if re.search(r'best rated|top rated|newest|cheapest|most popular', query, re.I):
        return True
    # Brand signals (capitalized words that aren't at sentence start)
    if re.search(r'(?<!^)(?<!\. )[A-Z][a-z]+(?:\s[A-Z][a-z]+)*', query):
        return True
    return False  # Simple query — go straight to vector search
```
Simple queries skip the parser entirely. Complex queries get full intent extraction. The routing is invisible to the user — they just get better results.
## What Changed
Before the parser, queries with constraints returned random results within the right category. After:
| Query | Before | After |
|---|---|---|
| "headphones under $80" | All headphones | Headphones ≤ $80 only |
| "not from BrandX" | Includes BrandX | BrandX excluded |
| "best rated coffee maker" | Random order | Sorted by rating |
| "organic, no sulfates" | All organic shampoos | Sulfate-free filtered |
Plain semantic queries behave identically before and after — the parser only changes outcomes where constraints appear. Every row above shows that gap.
## One Unexpected Benefit: Typo + Constraint Combinations
I expected the parser to help with structured queries. I didn't expect it to also fix a secondary problem: typos combined with constraints.
Vector search handles typos well on their own — "moisturiser" finds "moisturizer". But "moisturiser under $20 without pareban" (parabens misspelled) was tricky. The embedding similarity dropped on the misspelled exclusion.
The LLM parser handles both in one pass: corrects the typo, extracts the price constraint, identifies the exclusion. Combined robustness I didn't plan for.
## The Tradeoff
This approach has a real cost: the parser uses an LLM API call on complex queries. That's not free. I use gpt-4.1-nano (cheapest option, identical quality to gpt-4o-mini for this use case, ~33% cheaper). With the bypass logic, only a fraction of queries hit the parser — but it's still a cost that scales with traffic.
For a self-hosted open-source setup, you'd replace the LLM call with a local model (Ollama + Mistral 7B works reasonably well for intent extraction). For a SaaS product, you build it into pricing.
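For the self-hosted route, the swap could look roughly like this. This assumes an Ollama server running on its default port with a Mistral model already pulled (`ollama pull mistral`); the prompt wording and function names are illustrative:

```python
import json
import urllib.request

# Illustrative prompt; the production parser prompt is not shown here.
PROMPT_TEMPLATE = (
    "Decompose this shopping query into JSON with keys semantic_query, "
    "price_max, attribute_exclude, sort_by. Query: {query}"
)

def build_request_body(query: str, model: str = "mistral") -> dict:
    """Build the payload for Ollama's /api/generate endpoint."""
    return {
        "model": model,
        "prompt": PROMPT_TEMPLATE.format(query=query),
        "stream": False,
        "format": "json",  # ask Ollama to constrain output to valid JSON
    }

def parse_query_local(query: str, model: str = "mistral") -> dict:
    """Run intent extraction against a local Ollama server."""
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps(build_request_body(query, model)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        payload = json.load(resp)
    return json.loads(payload["response"])
```

The latency and quality tradeoff shifts with a local 7B model, so it's worth benchmarking the extraction accuracy on your own query logs before committing.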
## Where This Goes Next
The parser currently extracts: price ranges, brand references, attribute filters, exclusions, sorting preferences, and basic negations.
Next on the list: multi-intent queries. "Something for the office and something for the gym" — two separate semantic searches, merged results. Vector search alone can't split the intent. Parser can.
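The shape of that multi-intent step might look something like this. A naive split on "and" stands in for the LLM here, and `search` is a stub for the vector layer; all names are illustrative:

```python
import re

def split_intents(query: str) -> list[str]:
    """Naive split on coordinating 'and'; the real parser would use the LLM
    to distinguish 'office and gym' (two intents) from 'black and white' (one)."""
    parts = re.split(r"\band\b", query, flags=re.I)
    return [p.strip(" ,") for p in parts if p.strip(" ,")]

def multi_intent_search(query, search, per_intent=3):
    """Run one vector search per sub-intent and merge, deduplicating hits."""
    merged, seen = [], set()
    for intent in split_intents(query):
        for hit in search(intent)[:per_intent]:
            if hit not in seen:
                seen.add(hit)
                merged.append(hit)
    return merged

fake_index = {  # stand-in for the vector layer
    "something for the office": ["desk organizer", "laptop stand"],
    "something for the gym": ["gym bag", "water bottle"],
}
search = lambda q: fake_index.get(q, [])
print(multi_intent_search(
    "something for the office and something for the gym", search))
# → ['desk organizer', 'laptop stand', 'gym bag', 'water bottle']
```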
If you're building ecommerce search and hitting the same wall — vector results that ignore everything after the first two meaningful words — this two-layer approach is worth the added complexity.
I wrote a longer non-technical version for store owners here: Why Vector Search Alone Isn't Enough for Ecommerce Stores
Happy to answer questions about the implementation in the comments.
Queryra is AI search for WooCommerce and Shopify. queryra.com