DEV Community

CatMap


Why Regex Fails at Google Taxonomy: Building a 98% Accurate RAG Agent

The Problem: "Is a 'Hot Dog' a Dog?" 🌭

In Google Merchant Center, categorization is everything: if you misclassify a product, your ads can stop running.

Most feed tools rely on keyword matching (regex rules):

  • Rule: If title contains "Dog" -> Category: Animals > Pets > Dogs
  • Input: "Hot Dog Costume"
  • Result: Animals > Pets > Dogs ❌ (Wrong!)
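A minimal sketch of that rule shows why even a word-boundary regex doesn't save you: "Hot Dog" really does contain the word "Dog". The rule table and `categorizeByRegex` are illustrative names, not taken from any particular feed tool.

```javascript
// A naive keyword rule of the kind many feed tools use (illustrative only).
const rules = [
  { pattern: /\bdog\b/i, category: "Animals > Pets > Dogs" },
];

// Return the category of the first rule whose pattern matches the title.
function categorizeByRegex(title) {
  for (const rule of rules) {
    if (rule.pattern.test(title)) return rule.category;
  }
  return "Uncategorized";
}

console.log(categorizeByRegex("Hot Dog Costume")); // → Animals > Pets > Dogs (wrong!)
console.log(categorizeByRegex("Pallash Casual Women's Kurti")); // → Uncategorized
```

The second call shows the other failure mode: a title with no keyword in the rule table falls through to "Uncategorized" entirely.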

This kind of misfire is why 15-20% of products in large catalogs often sit in "Disapproved" purgatory.

The Solution: Retrieval-Augmented Generation (RAG) 🧠

I built CatMap AI to solve this with vectors instead of keywords.

1. The Architecture

Instead of rules, we convert the entire Google Product Taxonomy (5,500+ nodes) into a vector index using OpenAI's text-embedding-3-small embedding model.
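Before anything can be embedded, the taxonomy has to be turned into a list of texts. A minimal sketch, assuming the `taxonomy-with-ids` text file Google publishes, where each line reads `<id> - Level 1 > Level 2 > ...` after a version header; `parseTaxonomy` and the node shape are our own illustrative choices:

```javascript
// Parse lines of Google's taxonomy-with-ids file ("<id> - <path>") into nodes
// whose full path is the text we would send to the embedding model.
function parseTaxonomy(text) {
  const nodes = [];
  for (const rawLine of text.split("\n")) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue; // skip blanks and the version header
    const sep = line.indexOf(" - ");
    if (sep === -1) continue; // not an "<id> - <path>" line
    const id = Number(line.slice(0, sep));
    const path = line.slice(sep + 3);
    nodes.push({ id, path, depth: path.split(" > ").length });
  }
  return nodes;
}

// A few sample lines in the file's format.
const sample = `# Google_Product_Taxonomy_Version: 2021-09-21
1 - Animals & Pet Supplies
3237 - Animals & Pet Supplies > Live Animals
166 - Apparel & Accessories`;

const nodes = parseTaxonomy(sample);
console.log(nodes.length); // → 3
```

Embedding the full path ("Animals & Pet Supplies > Live Animals") rather than the leaf name gives each vector its parent context. The paths would then be sent in batches to the embeddings endpoint, e.g. `client.embeddings.create({ model: "text-embedding-3-small", input: paths })` in the official `openai` Node package.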

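When a product comes in ("Pallash Casual Women's Kurti"), we don't look for the word "Kurti". Instead, we embed the product title with the same model and look for the taxonomy nodes whose vectors sit closest to it in vector space, typically measured by cosine similarity.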
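The lookup itself can be sketched as a brute-force nearest-neighbour search. This is a minimal illustration, not CatMap's actual vector store: `cosine`, `searchIndex`, and the toy 3-dimensional vectors (standing in for real 1,536-dimensional embeddings) are all assumptions.

```javascript
// Cosine similarity between two equal-length vectors.
function cosine(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every node against the query vector and return the k best matches.
function searchIndex(index, queryVector, k = 3) {
  return index
    .map(node => ({ path: node.path, score: cosine(node.vector, queryVector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Toy entries: hand-picked vectors, not real embeddings.
const index = [
  { path: "Apparel > Clothing > Shirts & Tops", vector: [0.9, 0.1, 0.0] },
  { path: "Animals > Pets > Dogs", vector: [0.0, 0.2, 0.9] },
  { path: "Apparel > Costumes", vector: [0.5, 0.8, 0.1] },
];
const productVector = [0.85, 0.2, 0.05]; // pretend embedding of the Kurti title

console.log(searchIndex(index, productVector, 1)[0].path); // → Apparel > Clothing > Shirts & Tops
```

For ~5,500 nodes a linear scan is cheap enough; an approximate-nearest-neighbour index only starts to pay off at much larger sizes. OpenAI's embeddings are normalized to unit length, so a plain dot product would produce the same ranking.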

2. The "Smart Retry" Pattern 🔄

Here is where it gets interesting: standard vector search struggles with culturally specific terms.

  • Input: Kurti
  • Vector Match: Generic Clothing (Confidence: Low)
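One simple way to turn "Confidence: Low" into a machine-readable failure is a similarity threshold: if the best match scores below it, the row is reported as "Uncategorized". A minimal sketch; the `gate` helper, the `MIN_SCORE` value of 0.75, and the scores in the usage lines are hypothetical and would need tuning against real data.

```javascript
// Hypothetical threshold: a best match below this score counts as a miss.
const MIN_SCORE = 0.75;

// Turn the best vector match (or null) into a categorization result.
function gate(bestMatch) {
  if (!bestMatch || bestMatch.score < MIN_SCORE) {
    return { status: "Uncategorized", best: bestMatch ?? null };
  }
  return { status: "Categorized", category: bestMatch.path, score: bestMatch.score };
}

console.log(gate({ path: "Generic Clothing", score: 0.41 }).status); // → Uncategorized
console.log(gate({ path: "Apparel > Clothing > Shirts & Tops", score: 0.88 }).status); // → Categorized
```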

To fix this, we implemented an agentic loop:

  1. Attempt 1: Standard search. Result: low confidence, so the row is marked Uncategorized.
  2. Trigger: Agent detects failure.
  3. Action: Agent calls an LLM (gpt-5-nano) to "expand" the query.
    • Prompt: "What is a Kurti? Give me synonyms."
    • Response: "Tunic, Blouse, Shirt".
  4. Attempt 2: Vector Search with "Tunic Blouse Shirt".
  5. Result: Apparel > Clothing > Shirts & Tops. ✅
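Steps 3 and 4 hinge on two small pieces of glue: building the expansion prompt and turning the LLM's reply into a second search query. A minimal sketch, assuming the reply comes back as a comma-separated list like the one above; `buildExpansionPrompt` and `synonymsToQuery` are hypothetical names, and the actual LLM call is left out.

```javascript
// Build the expansion prompt (wording taken from the example above).
function buildExpansionPrompt(term) {
  return `What is a ${term}? Give me synonyms.`;
}

// Turn a comma-separated reply such as "Tunic, Blouse, Shirt" into a search query.
function synonymsToQuery(reply) {
  return reply
    .split(",")
    .map(s => s.trim().replace(/\.$/, "")) // drop surrounding spaces and a trailing period
    .filter(s => s.length > 0)
    .join(" ");
}

console.log(buildExpansionPrompt("Kurti")); // → What is a Kurti? Give me synonyms.
console.log(synonymsToQuery("Tunic, Blouse, Shirt.")); // → Tunic Blouse Shirt
```

In practice a free-form prompt can return prose instead of a list, so asking the model explicitly for "a comma-separated list of synonyms" makes this parsing step far more predictable.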

3. The Stress Test πŸ“‰

We ran this system against 2,000 real-world edge cases.

  • Coverage (rows assigned a category): 100%, up from 85%.
  • Accuracy: 98.3%.
  • Time per row: ~200 ms.
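To put those numbers in concrete terms: 98.3% of 2,000 rows is about 1,966 correct categories, and at ~200 ms per row a sequential run over the whole set takes 2,000 × 0.2 s = 400 s, roughly 6.7 minutes. A minimal sketch of how such metrics could be computed; the metric definitions are our assumption (the post doesn't spell them out), and `scoreRun`, `estimateSeconds`, and the toy rows are illustrative.

```javascript
// Assumed metric definitions:
//   coverage = rows given any category / all rows
//   accuracy = rows whose predicted category matches the label / all rows
function scoreRun(rows) {
  const covered = rows.filter(r => r.predicted !== "Uncategorized").length;
  const correct = rows.filter(r => r.predicted === r.expected).length;
  return { coverage: covered / rows.length, accuracy: correct / rows.length };
}

// Sequential wall-clock estimate for a batch at a fixed per-row latency.
function estimateSeconds(rowCount, msPerRow) {
  return (rowCount * msPerRow) / 1000;
}

// Toy labelled rows (invented, not from the stress test).
const toyRows = [
  { expected: "Shirts & Tops", predicted: "Shirts & Tops" },
  { expected: "Dresses", predicted: "Dresses" },
  { expected: "Costumes", predicted: "Dogs" },
  { expected: "Kurtas", predicted: "Uncategorized" },
];

console.log(scoreRun(toyRows)); // → { coverage: 0.75, accuracy: 0.5 }
console.log(estimateSeconds(2000, 200)); // → 400
```

With 100% coverage it doesn't matter whether accuracy is measured over all rows or only categorized rows; at 85% coverage the two definitions would diverge noticeably.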

Code Snippet (The Retry Logic)

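// Simplified logic. expandQuery, VectorStore.search and categorizeWithContext are
// CatMap helpers not shown in this post; the wrapper function name is illustrative.
async function retryIfUncategorized(product, result) {
    if (result.status === "Uncategorized") {
        const synonyms = await expandQuery(product.name); // AI call, e.g. "Tunic Blouse Shirt"
        const newContext = await VectorStore.search(synonyms); // second vector search
        return categorizeWithContext(product, newContext);
    }
    return result; // the first attempt was already confident enough
}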
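The snippet above depends on helpers that aren't shown. As a self-contained alternative, here is a minimal sketch of the whole loop with the search and expansion steps passed in as functions, so it runs with stubs today and with real services later. `categorizeWithRetry`, the `minScore` default of 0.75, and the stub scores are hypothetical.

```javascript
// End-to-end sketch of the "smart retry" loop with injected dependencies.
// search(query) resolves to { path, score } (or null); expandQuery(name) resolves to a string.
async function categorizeWithRetry(productName, { search, expandQuery, minScore = 0.75 }) {
  // Attempt 1: standard vector search on the raw product name.
  let best = await search(productName);
  if (best && best.score >= minScore) {
    return { status: "Categorized", category: best.path, attempts: 1 };
  }
  // Trigger: low confidence, so ask the LLM for synonyms and search again.
  const expanded = await expandQuery(productName);
  best = await search(expanded);
  if (best && best.score >= minScore) {
    return { status: "Categorized", category: best.path, attempts: 2, query: expanded };
  }
  // Still no confident match: leave the row for manual review.
  return { status: "Uncategorized", attempts: 2, query: expanded };
}

// Stubs that replay the Kurti example (scores are invented).
const stubSearch = async query =>
  query.includes("Tunic")
    ? { path: "Apparel > Clothing > Shirts & Tops", score: 0.9 }
    : { path: "Generic Clothing", score: 0.4 };
const stubExpand = async () => "Tunic Blouse Shirt";

categorizeWithRetry("Pallash Casual Women's Kurti", { search: stubSearch, expandQuery: stubExpand })
  .then(result => console.log(result.category, result.attempts));
// → Apparel > Clothing > Shirts & Tops 2
```

Injecting `search` and `expandQuery` keeps the control flow testable without network calls, and capping the loop at one retry bounds the extra LLM cost to rows that actually failed the first pass.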

Conclusion

Regex is dead for categorization. Context-aware AI is the only way to handle the complexity of modern e-commerce catalogs.

If you want to test the API, I'm opening a free beta for developers: Link to CatMap AI

Follow me for more Engineering Deep Dives into AI Agents.
