<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: mohd zaki</title>
    <description>The latest articles on DEV Community by mohd zaki (@mohd_zaki_0f1543d6bd21ea2).</description>
    <link>https://dev.to/mohd_zaki_0f1543d6bd21ea2</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3658015%2Fcfebb2fc-892e-4d70-8962-9e4fc58dee1f.jpg</url>
      <title>DEV Community: mohd zaki</title>
      <link>https://dev.to/mohd_zaki_0f1543d6bd21ea2</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/mohd_zaki_0f1543d6bd21ea2"/>
    <language>en</language>
    <item>
      <title>Building an AI cost-optimizer (routing + caching + VSCode SDK). Looking for feedback.</title>
      <dc:creator>mohd zaki</dc:creator>
      <pubDate>Fri, 12 Dec 2025 01:50:57 +0000</pubDate>
      <link>https://dev.to/mohd_zaki_0f1543d6bd21ea2/building-an-ai-cost-optimizer-routing-caching-vscode-sdk-looking-for-feedback-10p5</link>
      <guid>https://dev.to/mohd_zaki_0f1543d6bd21ea2/building-an-ai-cost-optimizer-routing-caching-vscode-sdk-looking-for-feedback-10p5</guid>
      <description>&lt;p&gt;👋 Hey devs — Looking for feedback on my AI cost-optimization + “AI Slop Prevention” tool&lt;/p&gt;

&lt;p&gt;I'm Zach, and I’ve been building AI features for a while now.&lt;br&gt;
Like many of you, I started noticing the same painful problems every time I shipped anything that used LLMs.&lt;/p&gt;




&lt;p&gt;💸 The problem (from a developer’s perspective)&lt;/p&gt;

&lt;p&gt;AI bills get out of control fast.&lt;br&gt;
Even if you log usage, you still can't answer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“Which model is burning money?”&lt;/li&gt;
&lt;li&gt;“Why did this prompt suddenly cost 10× more?”&lt;/li&gt;
&lt;li&gt;“Is this output identical to something we already generated?”&lt;/li&gt;
&lt;li&gt;“Should this request even go to GPT-4, or would Groq/Claude suffice?”&lt;/li&gt;
&lt;li&gt;“Why did the LLM produce 3,000 tokens of slop when I asked for 200?”&lt;/li&gt;
&lt;li&gt;“How do I give my team access without accidentally letting them blow the budget?”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And then there’s AI Slop —&lt;br&gt;
unnecessary tokens, verbose responses, hallucinated filler text, or redundant reasoning chains that waste tokens without adding value.&lt;/p&gt;

&lt;p&gt;Most teams have no defense against it.&lt;/p&gt;

&lt;p&gt;I got tired of fighting this manually, so I started building something small…&lt;br&gt;
and it turned into a real product.&lt;/p&gt;




&lt;p&gt;🚀 Introducing PricePrompter Cloud&lt;/p&gt;

&lt;p&gt;A lightweight proxy + devtool that optimizes AI cost, reduces token waste, and prevents AI slop — without changing how you code.&lt;/p&gt;

&lt;p&gt;You keep your existing OpenAI/Anthropic calls.&lt;br&gt;
We handle the optimization layer behind the scenes.&lt;/p&gt;
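&lt;p&gt;A sketch of what “no code changes” could look like with the OpenAI Python SDK’s &lt;code&gt;base_url&lt;/code&gt; option (the proxy endpoint and key name below are my assumptions for illustration, not documented values):&lt;/p&gt;

```python
# Hypothetical: point the official OpenAI SDK at the PricePrompter proxy.
# The base_url and key below are assumptions, not the real endpoint.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_PRICEPROMPTER_KEY",  # issued by the proxy, not OpenAI
    base_url="https://api.priceprompter.example/v1",  # hypothetical endpoint
)

# Existing call sites stay exactly the same:
resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize this in 200 tokens."}],
)
```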




&lt;p&gt;🔧 What it does&lt;/p&gt;




&lt;p&gt;1️⃣ Smart Routing (UCG Engine)&lt;/p&gt;

&lt;p&gt;Send your AI request to PricePrompter, and we route it to the cheapest model that satisfies your quality requirements.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GPT-4 → Claude Sonnet if equivalent&lt;/li&gt;
&lt;li&gt;GPT-3.5-style requests → Groq if faster/cheaper&lt;/li&gt;
&lt;li&gt;Or stay on your preferred model with cost warnings&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Your code stays unchanged.&lt;/p&gt;
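&lt;p&gt;To make the routing idea concrete, here is a minimal sketch of “cheapest model above a quality floor”; every model name, quality score, and price is an illustrative placeholder, not a real benchmark or rate:&lt;/p&gt;

```python
# Minimal sketch of cost-based routing: pick the cheapest model whose
# quality score meets the request's threshold. All numbers are made up.
MODELS = {
    # name: (quality score 0-1, dollars per 1M output tokens)
    "gpt-4o":         (0.95, 10.00),
    "claude-sonnet":  (0.93, 15.00),
    "llama-3.1-groq": (0.85, 0.79),
    "gpt-4o-mini":    (0.82, 0.60),
}

def route(min_quality: float) -> str:
    """Return the cheapest model meeting the quality floor."""
    candidates = [
        (price, name)
        for name, (quality, price) in MODELS.items()
        if quality >= min_quality
    ]
    if not candidates:
        raise ValueError("no model satisfies the quality requirement")
    return min(candidates)[1]

print(route(0.90))  # only the top-tier models qualify
print(route(0.80))  # the cheap tier becomes eligible
```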




&lt;p&gt;2️⃣ FREE Semantic Caching&lt;/p&gt;

&lt;p&gt;We automatically store responses, recognize semantically similar requests, and return cached results when it’s safe to do so.&lt;/p&gt;

&lt;p&gt;You get real observability:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cache hits&lt;/li&gt;
&lt;li&gt;Cache misses&lt;/li&gt;
&lt;li&gt;Percentage matched&lt;/li&gt;
&lt;li&gt;Total savings&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Caching will always remain free.&lt;/p&gt;
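&lt;p&gt;The caching idea fits in a few lines; in this sketch a bag-of-words cosine similarity stands in for real embeddings, and the 0.8 threshold is an arbitrary illustrative choice:&lt;/p&gt;

```python
# Toy semantic cache: bag-of-words cosine similarity stands in for real
# embeddings. Threshold and tokenization are deliberately simplistic.
import math
from collections import Counter

def _vec(text: str) -> Counter:
    return Counter(text.lower().split())

def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold
        self.entries = []  # list of (vector, response) pairs
        self.hits = 0
        self.misses = 0

    def get(self, prompt: str):
        v = _vec(prompt)
        for vec, response in self.entries:
            if _cosine(v, vec) >= self.threshold:
                self.hits += 1
                return response
        self.misses += 1
        return None

    def put(self, prompt: str, response: str):
        self.entries.append((_vec(prompt), response))

cache = SemanticCache()
cache.put("what is the capital of france", "Paris")
print(cache.get("what is the capital of france?"))  # near-identical: hit
print(cache.get("explain quantum entanglement"))    # unrelated: miss
```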




&lt;p&gt;3️⃣ AI Slop Prevention Engine&lt;/p&gt;

&lt;p&gt;This is one of the features I’m most excited about.&lt;/p&gt;

&lt;p&gt;We detect:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Overlong responses&lt;/li&gt;
&lt;li&gt;Repeated sections&lt;/li&gt;
&lt;li&gt;Chain-of-thought that isn’t needed&lt;/li&gt;
&lt;li&gt;Redundant reasoning&lt;/li&gt;
&lt;li&gt;Token inflation&lt;/li&gt;
&lt;li&gt;Hallucinated filler&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And we trim, constrain, or guide the LLM to reduce token waste before the request hits your billing.&lt;/p&gt;

&lt;p&gt;Think of it as:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Linting for LLM calls.”&lt;/p&gt;
&lt;/blockquote&gt;
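&lt;p&gt;In that “linting” spirit, a toy version of a couple of those checks might look like this; the thresholds and filler phrases are my illustrative guesses, not the product’s actual rules:&lt;/p&gt;

```python
# Rough "slop lint" heuristics: flag responses that blow past the requested
# budget, repeat themselves, or contain boilerplate filler. Illustrative only.
def lint_response(text: str, token_budget: int) -> list:
    warnings = []
    tokens = text.split()  # crude stand-in for a real tokenizer
    if len(tokens) > 1.5 * token_budget:
        warnings.append(f"overlong: {len(tokens)} words for a ~{token_budget}-token budget")
    sentences = [s.strip().lower() for s in text.split(".") if s.strip()]
    if len(sentences) > len(set(sentences)):
        warnings.append("repeated sections detected")
    filler = ("as an ai language model", "it is important to note")
    for phrase in filler:
        if phrase in text.lower():
            warnings.append(f"filler phrase: {phrase!r}")
    return warnings

report = lint_response(
    "It is important to note that X. X is true. X is true.", token_budget=200
)
print(report)
```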




&lt;p&gt;4️⃣ Developer Tools (Cursor-style SDK)&lt;/p&gt;

&lt;p&gt;A VS Code extension + SDK that gives you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cost per request (live)&lt;/li&gt;
&lt;li&gt;Alternative model suggestions&lt;/li&gt;
&lt;li&gt;Token breakdown&lt;/li&gt;
&lt;li&gt;“Why this request was expensive” explanation&lt;/li&gt;
&lt;li&gt;Model routing logs&lt;/li&gt;
&lt;li&gt;Usage analytics directly in your editor&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No need to open dashboards unless you want deeper insights.&lt;/p&gt;
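&lt;p&gt;The live cost figure is simple arithmetic over token counts; a sketch, with placeholder prices in dollars per million tokens (not real rates):&lt;/p&gt;

```python
# Sketch of the "cost per request" math an editor extension could surface.
# Prices are illustrative placeholders, not any provider's real rates.
PRICES = {
    "gpt-4o": {"input": 2.50, "output": 10.00},
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request, given token usage and per-1M prices."""
    p = PRICES[model]
    cost = (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000
    return round(cost, 6)

# A 1,200-token prompt that came back with 3,000 tokens of output:
print(request_cost("gpt-4o", 1_200, 3_000))
print(request_cost("gpt-4o-mini", 1_200, 3_000))
```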




&lt;p&gt;5️⃣ Team &amp;amp; Enterprise Governance&lt;/p&gt;

&lt;p&gt;Practical controls for growing teams:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Spending limits&lt;/li&gt;
&lt;li&gt;Model-level permissions&lt;/li&gt;
&lt;li&gt;Approval for high-cost requests&lt;/li&gt;
&lt;li&gt;PII masking&lt;/li&gt;
&lt;li&gt;Key rotation&lt;/li&gt;
&lt;li&gt;Audit logs&lt;/li&gt;
&lt;li&gt;Team-level reporting&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Nothing enterprise-y in a bad way — just the stuff dev teams actually need.&lt;/p&gt;
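&lt;p&gt;A spending limit plus an approval gate for expensive requests can be sketched like this (the limit and threshold values are invented for illustration):&lt;/p&gt;

```python
# Minimal sketch of a per-team spending limit with approval for big requests.
# The monthly limit and approval threshold are invented for illustration.
class BudgetGuard:
    def __init__(self, monthly_limit: float, approval_above: float):
        self.monthly_limit = monthly_limit
        self.approval_above = approval_above
        self.spent = 0.0

    def check(self, estimated_cost: float) -> str:
        """Gate one request: block, escalate for approval, or allow."""
        if self.spent + estimated_cost > self.monthly_limit:
            return "blocked: monthly limit exceeded"
        if estimated_cost > self.approval_above:
            return "needs-approval"
        self.spent += estimated_cost
        return "allowed"

guard = BudgetGuard(monthly_limit=100.0, approval_above=5.0)
print(guard.check(0.03))   # small request passes and is recorded
print(guard.check(12.50))  # pricey request escalates for approval
print(guard.check(120.0))  # would bust the monthly limit
```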




&lt;p&gt;🎯 Who this is for&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Developers building LLM features&lt;/li&gt;
&lt;li&gt;SaaS teams using expensive models&lt;/li&gt;
&lt;li&gt;Startups struggling with unpredictable OpenAI bills&lt;/li&gt;
&lt;li&gt;Agencies running multi-client workloads&lt;/li&gt;
&lt;li&gt;Anyone experimenting with multi-model routing&lt;/li&gt;
&lt;li&gt;Anyone who wants visibility into token usage&lt;/li&gt;
&lt;li&gt;Anyone tired of “AI slop” blowing up their costs&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;💬 What I’m looking for:&lt;/p&gt;

&lt;p&gt;I’d love real feedback from developers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Would you trust a proxy that optimizes your LLM cost?&lt;/li&gt;
&lt;li&gt;Is AI slop prevention actually useful in your workflow?&lt;/li&gt;
&lt;li&gt;Is free semantic caching valuable?&lt;/li&gt;
&lt;li&gt;What would make this a must-have devtool?&lt;/li&gt;
&lt;li&gt;What pricing model makes sense for you?&lt;/li&gt;
&lt;li&gt;Any dealbreakers or concerns?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Still shaping the MVP — so your input directly influences what gets built next.&lt;/p&gt;

&lt;p&gt;Happy to answer questions or share a preview.&lt;/p&gt;

&lt;p&gt;Thanks dev.to! 🙌&lt;br&gt;
— Zach&lt;/p&gt;

</description>
      <category>aiops</category>
      <category>ai</category>
      <category>saas</category>
      <category>startup</category>
    </item>
  </channel>
</rss>
