Building an AI cost-optimizer (routing + caching + VSCode SDK). Looking for feedback.

👋 Hey devs — Looking for feedback on my AI cost-optimization + “AI Slop Prevention” tool

I'm Zach, and I’ve been building AI features for a while now.
Like many of you, I started noticing the same painful problems every time I shipped anything that used LLMs.


💸 The problem (from a developer’s perspective)

AI bills get out of control fast.
Even if you log usage, you still can't answer:

“Which model is burning money?”

“Why did this prompt suddenly cost 10× more?”

“Is this output identical to something we already generated?”

“Should this request even go to GPT-4, or would Groq/Claude suffice?”

“Why did the LLM produce 3,000 tokens of slop when I asked for 200?”

“How do I give my team access without accidentally letting them blow the budget?”

And then there’s AI Slop:
verbose responses, hallucinated filler text, and redundant reasoning chains that burn tokens without adding value.

Most teams have no defense against it.

I got tired of fighting this manually, so I started building something small…
and it turned into a real product.


🚀 Introducing PricePrompter Cloud

A lightweight proxy + devtool that optimizes AI cost, reduces token waste, and prevents AI slop — without changing how you code.

You keep your existing OpenAI/Anthropic calls.
We handle the optimization layer behind the scenes.
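
To give a feel for the integration model, here’s roughly what a drop-in setup looks like with the OpenAI Python SDK. The base URL and key below are placeholders, not our real endpoint:

```python
# Hypothetical drop-in: point the official OpenAI SDK at the proxy.
# The base_url and key below are illustrative placeholders only.
from openai import OpenAI

client = OpenAI(
    base_url="https://proxy.priceprompter.example/v1",  # placeholder URL
    api_key="pp_live_...",  # a PricePrompter key, not your OpenAI key
)

# Your existing call stays exactly the same.
resp = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Summarize this changelog."}],
)
print(resp.choices[0].message.content)
```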


🔧 What it does


1️⃣ Smart Routing (UCG Engine)

Send your AI request to PricePrompter →
we send it to the cheapest model that satisfies your quality requirements.

GPT-4 → Claude-Sonnet if equivalent

GPT-3.5 style → Groq if faster/cheaper

Or stay on your preferred model with cost warnings

Your code stays unchanged.
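
Here’s a toy sketch of the routing idea, just to make it concrete. The prices and quality scores are made-up placeholders, not real rates, benchmarks, or our actual UCG logic:

```python
# Toy cost-aware routing. All numbers are made-up placeholders.
MODELS = [
    # (name, $ per 1M input tokens, rough quality score 0-1)
    ("groq-llama-3.1-8b", 0.05, 0.70),
    ("claude-sonnet",     3.00, 0.92),
    ("gpt-4",            30.00, 0.95),
]

def route(min_quality: float) -> str:
    """Pick the cheapest model that meets the quality bar."""
    candidates = [m for m in MODELS if m[2] >= min_quality]
    if not candidates:
        raise ValueError("no model satisfies the quality requirement")
    return min(candidates, key=lambda m: m[1])[0]

print(route(min_quality=0.9))  # -> "claude-sonnet" (cheaper than gpt-4)
```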


2️⃣ FREE Semantic Caching

We automatically store responses, recognize semantically similar requests, and return cached results when it’s safe to do so.

You get real observability:

Cache hits

Cache misses

Percentage matched

Total savings

Caching will always remain free.
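
If you’re wondering what “semantic” means mechanically: the usual pattern is to embed each prompt and reuse a stored answer when a new prompt lands close enough in embedding space. A toy sketch of that pattern (the 0.95 threshold is arbitrary, and this isn’t our production code):

```python
# Toy semantic cache: reuse answers for near-duplicate prompts.
# Embeddings come from any embedding model; 0.95 is an arbitrary cutoff.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

cache = []  # list of (prompt_embedding, cached_response)

def lookup(prompt_embedding, threshold=0.95):
    """Return a cached response if a similar prompt was already answered."""
    for emb, response in cache:
        if cosine(prompt_embedding, emb) >= threshold:
            return response  # cache hit: no model call, nothing billed
    return None  # cache miss: call the model, then cache.append(...)
```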


3️⃣ AI Slop Prevention Engine

This is one of the features I’m most excited about.

We detect:

Overlong responses

Repeated sections

Chain-of-thought that isn’t needed

Redundant reasoning

Token inflation

Hallucinated filler

And we trim, constrain, or guide the LLM to reduce token waste before it shows up on your bill.

Think of it as:

“Linting for LLM calls.”
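
To make the linting analogy concrete, here’s a toy check for two of the signals above (overlong output and repeated sections). The thresholds are arbitrary illustrations, not our detection engine:

```python
# Toy "slop lint": flag responses that blow past the requested budget
# or repeat themselves. Thresholds are arbitrary illustrations.
def lint_response(text: str, requested_tokens: int) -> list[str]:
    warnings = []
    approx_tokens = len(text.split())  # crude word count as a token proxy
    if approx_tokens > 2 * requested_tokens:
        warnings.append(
            f"overlong: ~{approx_tokens} tokens vs ~{requested_tokens} requested"
        )
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    if len(sentences) > len(set(sentences)):
        warnings.append("repeated sections detected")
    return warnings

print(lint_response("Noted. Noted. " + "filler " * 500, requested_tokens=200))
```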


4️⃣ Developer Tools (Cursor-style SDK)

A VS Code extension + SDK that gives you:

Cost per request (live)

Alternative model suggestions

Token breakdown

“Why this request was expensive” explanation

Model routing logs

Usage analytics directly in your editor

No need to open dashboards unless you want deeper insights.


5️⃣ Team & Enterprise Governance

Practical controls for growing teams:

Spending limits

Model-level permissions

Approval for high-cost requests

PII masking

Key rotation

Audit logs

Team-level reporting

Nothing enterprise-y in a bad way — just the stuff dev teams actually need.
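
For a feel of the policy surface, the shape I’m iterating on looks roughly like this (plain data for illustration, not a final schema):

```python
# Hypothetical team policy, shown as plain data. Not a published schema.
team_policy = {
    "monthly_spend_limit_usd": 500,
    "allowed_models": ["claude-sonnet", "groq-llama-3.1-8b"],
    "require_approval_above_usd": 1.00,  # per-request threshold
    "mask_pii": True,
    "audit_log": True,
}
```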


🎯 Who this is for

Developers building LLM features

SaaS teams using expensive models

Startups struggling with unpredictable OpenAI bills

Agencies running multi-client workloads

Anyone experimenting with multi-model routing

Anyone who wants visibility into token usage

Anyone tired of “AI slop” blowing up their costs


💬 What I’m looking for

I’d love real feedback from developers:

Would you trust a proxy that optimizes your LLM cost?

Is AI slop prevention actually useful in your workflow?

Is free semantic caching valuable?

What would make this a must-have devtool?

What pricing model makes sense for you?

Any dealbreakers or concerns?

Still shaping the MVP — so your input directly influences what gets built next.

Happy to answer questions or share a preview.

Thanks dev.to! 🙌
— Zach
