<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: CatMap</title>
    <description>The latest articles on DEV Community by CatMap (@catmap).</description>
    <link>https://dev.to/catmap</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3646267%2F301d08d2-8551-4c4e-aef2-4ab206b382eb.png</url>
      <title>DEV Community: CatMap</title>
      <link>https://dev.to/catmap</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/catmap"/>
    <language>en</language>
    <item>
      <title>Why Regex Fails at Google Taxonomy: Building a 98% Accurate RAG Agent</title>
      <dc:creator>CatMap</dc:creator>
      <pubDate>Mon, 15 Dec 2025 06:42:49 +0000</pubDate>
      <link>https://dev.to/catmap/why-regex-fails-at-google-taxonomy-building-a-98-accurate-rag-agent-57d9</link>
      <guid>https://dev.to/catmap/why-regex-fails-at-google-taxonomy-building-a-98-accurate-rag-agent-57d9</guid>
      <description>&lt;h2&gt;
  
  
  The Problem: "Is a 'Hot Dog' a Dog?" 🌭
&lt;/h2&gt;

&lt;p&gt;In Google Merchant Center, categorization is everything. If you misclassify a product, your ads stop running.&lt;/p&gt;

&lt;p&gt;Most feed tools use &lt;strong&gt;keyword matching&lt;/strong&gt; (Regex).&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Rule: &lt;code&gt;If title contains "Dog" -&amp;gt; Category: Animals &amp;gt; Pets &amp;gt; Dogs&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;  Input: &lt;code&gt;"Hot Dog Costume"&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;  Result: &lt;code&gt;Animals &amp;gt; Pets &amp;gt; Dogs&lt;/code&gt; ❌ (Wrong!)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is why 15-20% of products in large catalogs often sit in "Disapproved" purgatory.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Solution: Retrieval-Augmented Generation (RAG) 🧠
&lt;/h2&gt;

&lt;p&gt;I built &lt;strong&gt;CatMap AI&lt;/strong&gt; to solve this using Vectors, not Keywords.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. The Architecture
&lt;/h2&gt;

&lt;p&gt;Instead of rules, we convert the entire &lt;strong&gt;Google Product Taxonomy (5,500+ nodes)&lt;/strong&gt; into a Vector Index using OpenAI's &lt;code&gt;text-embedding-3-small&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;When a product comes in (&lt;code&gt;"Pallash Casual Women's Kurti"&lt;/code&gt;), we don't look for the word "Kurti". We look for the &lt;strong&gt;mathematical concept&lt;/strong&gt; of the product in vector space.&lt;/p&gt;
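&lt;p&gt;&lt;em&gt;For illustration, here is a minimal sketch of that lookup. The names (&lt;code&gt;cosine&lt;/code&gt;, &lt;code&gt;nearestCategory&lt;/code&gt;) and the in-memory index are my simplifications, not CatMap's actual implementation; the embedding call itself is left abstract.&lt;/em&gt;&lt;/p&gt;

```javascript
// Sketch (hypothetical names): search a precomputed vector index of
// taxonomy paths. Each entry is { path, vector }, where `vector` came
// from an embedding model such as text-embedding-3-small.

// Cosine similarity between two equal-length vectors.
function cosine(a, b) {
  const dot = a.reduce((sum, v, i) => sum + v * b[i], 0);
  const norm = (v) => Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
  return dot / (norm(a) * norm(b));
}

// Return the taxonomy path whose vector is closest to the query vector.
function nearestCategory(index, queryVector) {
  let best = null;
  for (const node of index) {
    const score = cosine(node.vector, queryVector);
    if (best === null || score > best.score) {
      best = { path: node.path, score };
    }
  }
  return best;
}
```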

&lt;h2&gt;
  
  
  2. The "Smart Retry" Pattern 🔄
&lt;/h2&gt;

&lt;p&gt;Here is where it gets interesting: standard vector search often fails on culture-specific terms.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Input: &lt;code&gt;Kurti&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;  Vector Match: &lt;code&gt;Generic Clothing&lt;/code&gt; (Confidence: Low)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To fix this, we implemented an &lt;strong&gt;Agentic Loop&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Attempt 1:&lt;/strong&gt; Standard Search. Result: &lt;code&gt;Uncategorized&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Trigger:&lt;/strong&gt; Agent detects failure.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Action:&lt;/strong&gt; Agent calls an LLM (&lt;code&gt;gpt-5-nano&lt;/code&gt;) to "expand" the query.

&lt;ul&gt;
&lt;li&gt;  &lt;em&gt;Prompt:&lt;/em&gt; "What is a Kurti? Give me synonyms."&lt;/li&gt;
&lt;li&gt;  &lt;em&gt;Response:&lt;/em&gt; "Tunic, Blouse, Shirt".&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Attempt 2:&lt;/strong&gt; Vector Search with &lt;code&gt;"Tunic Blouse Shirt"&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Result:&lt;/strong&gt; &lt;code&gt;Apparel &amp;gt; Clothing &amp;gt; Shirts &amp;amp; Tops&lt;/code&gt;. ✅&lt;/li&gt;
&lt;/ol&gt;
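&lt;p&gt;&lt;em&gt;A minimal sketch of step 3, the expansion call. The prompt wording and the &lt;code&gt;llm&lt;/code&gt; wrapper are illustrative assumptions; any chat-completion client can fill that role.&lt;/em&gt;&lt;/p&gt;

```javascript
// Hypothetical sketch: turn an unrecognized term into a synonym query.
// `llm` is any async function that takes a prompt string and returns text.
async function expandQuery(title, llm) {
  const prompt = `What is a "${title}"? Reply with three generic product synonyms, comma-separated.`;
  const answer = await llm(prompt);
  // "Tunic, Blouse, Shirt" becomes the new search string "Tunic Blouse Shirt"
  return answer.split(",").map((s) => s.trim()).join(" ");
}
```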

&lt;h2&gt;
  
  
  3. The Stress Test 📉
&lt;/h2&gt;

&lt;p&gt;We ran this system against &lt;strong&gt;2,000 real-world edge cases&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Coverage:&lt;/strong&gt; 100% (Up from 85%).&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Accuracy:&lt;/strong&gt; 98.3%.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Time per Row:&lt;/strong&gt; ~200ms.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Code Snippet (The Retry Logic)
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Simplified Logic&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Uncategorized&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;synonyms&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;expandQuery&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;product&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// AI Call&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;newContext&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;VectorStore&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;synonyms&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;categorizeWithContext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;product&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;newContext&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Regex is dead for categorization. Context-aware AI is the only way to handle the complexity of modern e-commerce catalogs.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;If you want to test the API, I'm opening a Free Beta for developers. &lt;a href="https://catmap.dev/" rel="noopener noreferrer"&gt;Link to CatMap AI&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Follow me for more Engineering Deep Dives into AI Agents.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>javascript</category>
      <category>ai</category>
      <category>vectorsearch</category>
      <category>ecommerce</category>
    </item>
    <item>
      <title>How I processed 2,000 concurrent OpenAI requests using Node.js Streams (Zero 429 Errors)</title>
      <dc:creator>CatMap</dc:creator>
      <pubDate>Thu, 04 Dec 2025 13:19:26 +0000</pubDate>
      <link>https://dev.to/catmap/how-i-processed-2000-concurrent-openai-requests-using-nodejs-streams-zero-429-errors-341a</link>
      <guid>https://dev.to/catmap/how-i-processed-2000-concurrent-openai-requests-using-nodejs-streams-zero-429-errors-341a</guid>
      <description>&lt;p&gt;I recently built a backend engine to solve a boring but massive problem in e-commerce: &lt;strong&gt;Taxonomy Mapping.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Watch the demo test:&lt;/strong&gt;&lt;br&gt;


  &lt;iframe src="https://www.youtube.com/embed/ygqJHb4Z5Xc"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;

&lt;p&gt;The goal was simple: Take a messy CSV of 20,000 products and map them to the official Google Taxonomy IDs using an LLM.&lt;/p&gt;

&lt;p&gt;The problem? &lt;strong&gt;Rate Limits.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you try to &lt;code&gt;Promise.all()&lt;/code&gt; 2,000 requests to OpenAI, three things happen:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Memory Spike:&lt;/strong&gt; Loading a 15MB+ CSV into a variable kills the Node process.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;429 Errors:&lt;/strong&gt; OpenAI starts rejecting requests (HTTP 429) the moment you blow past the Requests Per Minute (RPM) limit.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Error Collapse:&lt;/strong&gt; &lt;code&gt;Promise.all&lt;/code&gt; fails fast if one request fails, ruining the whole batch.&lt;/li&gt;
&lt;/ol&gt;
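&lt;p&gt;&lt;em&gt;Point 3 is easy to demonstrate in isolation. This toy example is mine, not code from the engine:&lt;/em&gt;&lt;/p&gt;

```javascript
// Promise.all rejects as soon as one job fails, discarding every other result.
async function demo() {
  const jobs = [Promise.resolve("ok-1"), Promise.reject(new Error("429")), Promise.resolve("ok-2")];
  try {
    await Promise.all(jobs);
    return "all good";
  } catch (e) {
    return `batch lost: ${e.message}`;
  }
}

// Promise.allSettled keeps the successes instead of failing fast.
async function demoSettled() {
  const jobs = [Promise.resolve("ok-1"), Promise.reject(new Error("429")), Promise.resolve("ok-2")];
  const results = await Promise.allSettled(jobs);
  return results.filter((r) => r.status === "fulfilled").map((r) => r.value);
}
```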

&lt;p&gt;Here is the architecture I built to process &lt;strong&gt;450+ requests per minute&lt;/strong&gt; reliably using Node.js Streams and Bottleneck.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. The Memory Problem (Streams vs. Arrays)
&lt;/h2&gt;

&lt;p&gt;Loading a large CSV into memory is a rookie mistake. I switched to &lt;code&gt;fs.createReadStream&lt;/code&gt; combined with &lt;code&gt;csv-parser&lt;/code&gt;. This allows us to pipe the data row-by-row, keeping memory usage almost flat regardless of file size.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;javascript
const fs = require('fs');
const csv = require('csv-parser');

const stream = fs.createReadStream(inputFilePath)
  .pipe(csv())
  .on("data", (row) =&amp;gt; {
     // Push job to the limiter (see next section)
     // RAM usage stays constant even with 500MB files
     limiter.schedule(() =&amp;gt; processRow(row));
  });
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  2. The Rate Limit Problem (Bottleneck)
&lt;/h2&gt;

&lt;p&gt;This was the hardest part. OpenAI's Tier 1 limits are strict (both Requests Per Minute and Requests Per Day). I needed a queue that is aware of time, not just of job counts.&lt;/p&gt;

&lt;p&gt;I used the &lt;code&gt;bottleneck&lt;/code&gt; library to enforce a strict speed limit while still allowing controlled concurrency.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Target Speed:&lt;/strong&gt; ~450 RPM (Requests Per Minute) to stay safe.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Calculation:&lt;/strong&gt; 60,000ms / 450 ≈ 133ms delay.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Concurrency:&lt;/strong&gt; We allow 10 concurrent requests so we don't lose time waiting for network latency.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;javascript
const Bottleneck = require("bottleneck");

// Configure the limiter
const limiter = new Bottleneck({
  minTime: 133, // Wait 133ms between launching requests
  maxConcurrent: 10 // Allow 10 active connections to handle latency
});

// Wrap the AI call
const task = limiter.schedule(async () =&amp;gt; {
   return await callOpenAI(row);
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  3. Handling "Fatal" vs "Minor" Errors
&lt;/h2&gt;

&lt;p&gt;When processing thousands of rows, you don't want to stop if &lt;em&gt;one&lt;/em&gt; row fails (e.g., bad encoding). But you &lt;em&gt;do&lt;/em&gt; want to stop if you run out of API Credits or hit a hard daily limit.&lt;/p&gt;

&lt;p&gt;We implemented a custom error handling logic where the agent throws specific &lt;code&gt;FATAL_&lt;/code&gt; error codes, which the queue listener catches to &lt;code&gt;stream.destroy()&lt;/code&gt; immediately.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;javascript
// Simplified Logic
limiter.schedule(async () =&amp;gt; {
  try {
     return await agent(row);
  } catch (e) {
     if (e.message.startsWith("FATAL_")) {
        // Kill the queue immediately so we don't waste retries
        limiter.stop({ dropWaitingJobs: true });
        stream.destroy();
        console.error("🛑 Queue Killed: " + e.message);
     } else {
        // Minor error (bad row, transient failure): log it and keep going
        console.warn("Row skipped: " + e.message);
     }
  }
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  4. Context-Aware Prompting
&lt;/h2&gt;

&lt;p&gt;Even with the architecture fixed, LLMs have a habit of hallucinating IDs. If a product description says "100% Cotton," the model might return &lt;code&gt;100&lt;/code&gt; as the ID.&lt;/p&gt;

&lt;p&gt;We solved this using &lt;strong&gt;Negative Constraints&lt;/strong&gt; and &lt;strong&gt;Few-Shot Prompting&lt;/strong&gt; to force strict integer validation against the 2024 Taxonomy standard.&lt;/p&gt;
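&lt;p&gt;&lt;em&gt;A sketch of what that can look like. The exact prompt wording and helper names are my assumptions; the two load-bearing ideas are a negative constraint plus a few-shot example, and hard validation of the reply against the candidate IDs.&lt;/em&gt;&lt;/p&gt;

```javascript
// Hypothetical prompt shape: negative constraint + few-shot example.
function buildMessages(productTitle, candidateRows) {
  return [
    { role: "system",
      content: "You map products to Google Taxonomy IDs. " +
        "Return ONLY a numeric category ID from the candidate list. " +
        "Never return numbers that appear in the product text itself." },
    // Few-shot example pinning the expected output format
    { role: "user", content: "Product: Leather Dog Collar\nCandidates: 5092 Animals > Pet Supplies > Dog Supplies" },
    { role: "assistant", content: "5092" },
    { role: "user", content: `Product: ${productTitle}\nCandidates: ${candidateRows}` },
  ];
}

// Strict integer validation: reject anything not in the candidate ID set,
// so a hallucinated "100" from "100% Cotton" never reaches the feed.
function validateId(reply, validIds) {
  const id = Number.parseInt(reply.trim(), 10);
  return validIds.has(id) ? id : null;
}
```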

&lt;h2&gt;
  
  
  The Result
&lt;/h2&gt;

&lt;p&gt;We ran a stress test yesterday against a raw dataset of unorganized products:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Input:&lt;/strong&gt; 2,000 Unorganized SKUs (15MB CSV).&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Throughput:&lt;/strong&gt; ~450 RPM (Requests Per Minute).&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Errors:&lt;/strong&gt; 0 Rate Limit Errors (429s).&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Time:&lt;/strong&gt; ~4.5 Minutes total.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Accuracy:&lt;/strong&gt; 100% Valid Integer IDs (No text hallucinations).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fccmpatee4z2701li471y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fccmpatee4z2701li471y.png" alt="Result of stress test of 2000 rows"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;By combining Node.js Streams for memory management and Bottleneck for flow control, we turned a script that crashed at 500 rows into an engine that handles 50k rows effortlessly.&lt;/p&gt;

&lt;h2&gt;
  
  
  🚀 We just launched on Product Hunt!
&lt;/h2&gt;

&lt;p&gt;I wrapped this engine into an API called &lt;strong&gt;CatMap&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;It’s live on Product Hunt today. If you want to test the speed yourself (or try to break it with a messy CSV), we just opened the &lt;strong&gt;Public Demo Key&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Check it out here (and I'd love your support!):&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://www.producthunt.com/products/catmap" rel="noopener noreferrer"&gt;CatMap API on Product Hunt 🚀&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let me know in the comments if you have questions about the Node.js implementation or the prompting strategy!&lt;/p&gt;

</description>
      <category>javascript</category>
      <category>node</category>
      <category>openai</category>
      <category>webdev</category>
    </item>
  </channel>
</rss>
