<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Medhansh Pratap Singh</title>
    <description>The latest articles on DEV Community by Medhansh Pratap Singh (@singhmedhansh).</description>
    <link>https://dev.to/singhmedhansh</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3904601%2F1019582d-2605-4256-9998-0a166c06ed7e.jpeg</url>
      <title>DEV Community: Medhansh Pratap Singh</title>
      <link>https://dev.to/singhmedhansh</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/singhmedhansh"/>
    <language>en</language>
    <item>
      <title>I audited my AI tool catalog with Claude — turns out 50% was mis-categorized</title>
      <dc:creator>Medhansh Pratap Singh</dc:creator>
      <pubDate>Wed, 29 Apr 2026 15:21:37 +0000</pubDate>
      <link>https://dev.to/singhmedhansh/i-audited-my-ai-tool-catalog-with-claude-turns-out-50-was-mis-categorized-2414</link>
      <guid>https://dev.to/singhmedhansh/i-audited-my-ai-tool-catalog-with-claude-turns-out-50-was-mis-categorized-2414</guid>
      <description>&lt;p&gt;A week before my college's AI Expo, I was demoing my side project — a curated AI tool finder for students called AI Compass.&lt;/p&gt;

&lt;p&gt;I picked &lt;strong&gt;"Coding"&lt;/strong&gt; as the goal. The wizard recommended &lt;strong&gt;Suno&lt;/strong&gt; in the top 6.&lt;/p&gt;

&lt;p&gt;Suno makes AI music. It has nothing to do with coding.&lt;/p&gt;

&lt;p&gt;I checked the data file. Suno's category? &lt;code&gt;"Coding"&lt;/code&gt;. So was Power BI. So was Loom. So was Discord, somehow.&lt;/p&gt;

&lt;p&gt;The catalog was lying.&lt;/p&gt;

&lt;h2&gt;The shape of the problem&lt;/h2&gt;

&lt;p&gt;AI Compass works simply: students answer 4 questions (goal, use case, budget, platform), and the wizard returns 5-6 tools with a one-line reason for each. The catalog had ~450 tools, each tagged into one category from a fixed list (Coding, Writing &amp;amp; Chat, Research, Productivity, Image Gen, Video Gen).&lt;/p&gt;

&lt;p&gt;The wizard enforced a hard gate: pick goal = coding → only tools tagged "Coding" can surface.&lt;/p&gt;

&lt;p&gt;The gate worked. The data didn't.&lt;/p&gt;

&lt;h2&gt;The midnight band-aid&lt;/h2&gt;

&lt;p&gt;I had hours, not days. So I shipped a defense-in-depth fix instead of touching all 450 entries: a tag-based veto.&lt;/p&gt;

&lt;p&gt;A tool now had to pass &lt;em&gt;two&lt;/em&gt; checks: its category had to match, AND at least one of its tags or use_cases had to contain a category-relevant keyword.&lt;/p&gt;

&lt;p&gt;For Coding, that meant &lt;code&gt;code&lt;/code&gt;, &lt;code&gt;programming&lt;/code&gt;, &lt;code&gt;developer&lt;/code&gt;, &lt;code&gt;github&lt;/code&gt;, &lt;code&gt;debug&lt;/code&gt;, &lt;code&gt;framework&lt;/code&gt;, etc.&lt;/p&gt;

&lt;p&gt;Suno's tags: &lt;code&gt;['music', 'vocals', 'AI generation', 'creative', 'design']&lt;/code&gt;. Zero coding keywords. Vetoed.&lt;/p&gt;
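&lt;p&gt;A minimal sketch of that two-check gate, assuming the data-file fields (&lt;code&gt;category&lt;/code&gt;, &lt;code&gt;tags&lt;/code&gt;, &lt;code&gt;use_cases&lt;/code&gt;) and an illustrative keyword list — the real wizard code and keyword lists aren't shown here:&lt;/p&gt;

```python
# Illustrative keyword list for the "Coding" goal (not the real one).
CODING_KEYWORDS = ["code", "programming", "developer", "github", "debug", "framework"]

def passes_gate(tool, goal_category, keywords):
    # Check 1: the original hard gate — declared category must match the goal.
    if tool["category"] != goal_category:
        return False
    # Check 2: the veto — at least one tag or use_case must contain
    # a category-relevant keyword.
    fields = tool.get("tags", []) + tool.get("use_cases", [])
    return any(kw in field.lower() for field in fields for kw in keywords)

# Suno slips past check 1 (mis-tagged "Coding") but fails check 2.
suno = {"category": "Coding",
        "tags": ["music", "vocals", "AI generation", "creative", "design"],
        "use_cases": ["songwriting"]}
print(passes_gate(suno, "Coding", CODING_KEYWORDS))  # False — vetoed
```

&lt;p&gt;The point of the second check is that it fails closed: a tool with a wrong category label needs positive keyword evidence to surface, instead of riding the bad label.&lt;/p&gt;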

&lt;p&gt;The veto caught &lt;strong&gt;60 mis-tagged tools across all 6 goals.&lt;/strong&gt; Suno, Power BI, Loom, Discord, Khan Academy (all "Coding" for some reason), plus presentation tools tagged "Writing &amp;amp; Chat" and educational courses tagged "Video Gen."&lt;/p&gt;

&lt;p&gt;Shipped. Suno gone. Crisis averted.&lt;/p&gt;

&lt;p&gt;But the band-aid was hiding the real problem: about half of my category labels were just wrong.&lt;/p&gt;

&lt;h2&gt;The actual fix: auditing 450 tools with Claude&lt;/h2&gt;

&lt;p&gt;Cold-prompting an LLM with "categorize my catalog" is a hallucination factory. It'll confidently relabel tools based on training data that's outdated or just wrong.&lt;/p&gt;

&lt;p&gt;I structured the audit around three constraints:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. No web lookups, no training-data inference.&lt;/strong&gt; Categorize each tool &lt;em&gt;only&lt;/em&gt; using fields already in the data file (name, description, tags, use_cases). If those don't make the category obvious, mark &lt;code&gt;confidence: low&lt;/code&gt; and flag for human review. Don't guess.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Apply an explicit taxonomy.&lt;/strong&gt; I wrote the 8-category rulebook upfront — what each category means, what the multi-category tiebreaker is, where edge cases go. The prompt embedded this verbatim.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Output a proposal, not changes.&lt;/strong&gt; The audit wrote a separate JSON file with one entry per tool: current category, proposed category, confidence, reasoning. I reviewed before anything got applied.&lt;/p&gt;
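&lt;p&gt;Under those constraints, one proposal entry and the review step might look roughly like this — a sketch with hypothetical field names, since the real file layout isn't shown here:&lt;/p&gt;

```python
# Hypothetical shape of the proposal file: one entry per tool,
# with current category, proposed category, confidence, reasoning.
proposals = [
    {
        "tool": "Loom",
        "current_category": "Coding",
        "proposed_category": "Video Gen",
        "confidence": "high",
        "reasoning": "Tags and use_cases describe screen recording, not code.",
    },
]

def apply_approved(catalog, proposals, approved):
    # Nothing touches the catalog until a human has approved the change.
    by_tool = {p["tool"]: p["proposed_category"] for p in proposals}
    for tool in catalog:
        name = tool["name"]
        if name in approved and name in by_tool:
            tool["category"] = by_tool[name]
    return catalog
```

&lt;p&gt;The separation matters: the LLM only ever writes to the proposal file, and the catalog is mutated by a dumb, reviewable script.&lt;/p&gt;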

&lt;p&gt;The audit found:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;226 proposed changes&lt;/strong&gt; (50% of the catalog)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;3 missing categories&lt;/strong&gt; the taxonomy didn't have: Audio &amp;amp; Voice, Courses &amp;amp; Tutorials, Design &amp;amp; Graphics&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;~60 tools borderline by inclusion bar&lt;/strong&gt; — vanilla productivity apps without meaningful AI features&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;40 tools with thin metadata&lt;/strong&gt; — even with the rules explicit, the data was too sparse to categorize&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I approved 209 changes, removed 7 vanilla apps (Slack, Zulip, Apple Notes, etc.), and added the 3 new categories.&lt;/p&gt;

&lt;h2&gt;What changed for users&lt;/h2&gt;

&lt;p&gt;Top results for &lt;strong&gt;goal = coding&lt;/strong&gt;, before:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;ChatGPT&lt;/li&gt;
&lt;li&gt;Suno (music AI)&lt;/li&gt;
&lt;li&gt;Power BI (BI tool)&lt;/li&gt;
&lt;li&gt;Assorted educational courses&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;After:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;ChatGPT&lt;/li&gt;
&lt;li&gt;Claude&lt;/li&gt;
&lt;li&gt;GitHub Copilot&lt;/li&gt;
&lt;li&gt;Cursor&lt;/li&gt;
&lt;li&gt;VS Code&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The wizard finally returns the tools you'd actually expect. Claude went from "Writing &amp;amp; Chat" to "Coding" — matching its 2026 reputation as one of the strongest coding models, instead of being buried under chat queries.&lt;/p&gt;

&lt;h2&gt;What I'm taking from this&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Bad data is silent.&lt;/strong&gt; No test caught Suno-for-coding because the schema was valid — the &lt;em&gt;value&lt;/em&gt; was wrong. I need tests that validate "goal X should surface tools like Y," not just that the JSON parses.&lt;/p&gt;
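&lt;p&gt;A data-level "golden query" test could look like this — &lt;code&gt;recommend()&lt;/code&gt; is a stand-in for the real wizard, and the two catalog entries are assumptions for illustration:&lt;/p&gt;

```python
# Golden-query test sketch: assert on values, not just that the JSON parses.
# recommend() stands in for the real wizard, which also applies
# budget/platform filters and ranking.
def recommend(goal, catalog):
    return [t["name"] for t in catalog if t["category"] == goal]

CATALOG = [
    {"name": "GitHub Copilot", "category": "Coding"},
    {"name": "Suno", "category": "Audio and Voice"},
]

def test_coding_surfaces_expected_tools():
    results = recommend("Coding", CATALOG)
    assert "GitHub Copilot" in results  # a tool you'd expect for this goal
    assert "Suno" not in results        # the original failure mode

test_coding_surfaces_expected_tools()
```

&lt;p&gt;A schema validator would pass both the before and after catalogs; only a test like this catches Suno-for-coding.&lt;/p&gt;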

&lt;p&gt;&lt;strong&gt;LLMs are great at applying rules, terrible at inferring them.&lt;/strong&gt; The audit worked because I wrote the taxonomy first. "Categorize this however you think best" would've been 450 hallucinations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Boring infrastructure work is the actual moat.&lt;/strong&gt; TAAFT has 48,000 tools scraped. AI Compass has ~440 hand-curated. The only reason a student picks the smaller one is &lt;em&gt;curation quality&lt;/em&gt; — and that breaks the moment your data lies.&lt;/p&gt;




&lt;p&gt;If you want to try it: &lt;a href="https://ai-compass.in/ai-tool-finder" rel="noopener noreferrer"&gt;ai-compass.in/ai-tool-finder&lt;/a&gt; — 4 questions, 5-6 hand-picked tools, no signup.&lt;/p&gt;

&lt;p&gt;Honest feedback welcome, especially queries where the recommendations &lt;em&gt;still&lt;/em&gt; feel wrong. That's how the catalog actually gets better.&lt;/p&gt;

&lt;p&gt;19, CS at RVCE Bangalore. Building AI Compass as my first indie project.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>sideprojects</category>
      <category>beginners</category>
      <category>showdev</category>
    </item>
  </channel>
</rss>
