<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Joao Romao</title>
    <description>The latest articles on DEV Community by Joao Romao (@jrmromao).</description>
    <link>https://dev.to/jrmromao</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F303292%2Fd9e47fc3-6ee8-47ca-9318-6f3a0be98a82.jpeg</url>
      <title>DEV Community: Joao Romao</title>
      <link>https://dev.to/jrmromao</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/jrmromao"/>
    <language>en</language>
    <item>
      <title>We tracked 200K AI requests. Here's where the money actually goes</title>
      <dc:creator>Joao Romao</dc:creator>
      <pubDate>Sat, 16 May 2026 19:32:47 +0000</pubDate>
      <link>https://dev.to/jrmromao/we-tracked-200k-ai-requests-heres-where-the-money-actually-goes-495e</link>
      <guid>https://dev.to/jrmromao/we-tracked-200k-ai-requests-heres-where-the-money-actually-goes-495e</guid>
      <description>&lt;p&gt;Six months ago I posted here about CostLens — a tool to reduce OpenAI costs. Since then, we've completely rebuilt it based on one question a VP asked our team:&lt;/p&gt;

&lt;p&gt;"How much faster are we delivering with AI? What's the number?"&lt;/p&gt;

&lt;p&gt;Nobody could answer. We had cost dashboards, but no way to connect spend to output. So we built that.&lt;/p&gt;

&lt;h2&gt;
  
  
  What CostLens does now
&lt;/h2&gt;

&lt;p&gt;It's not just cost tracking anymore. It's three things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Cost attribution — not "you spent $4,200 on OpenAI" but "code review costs $340/mo, customer support costs $89/mo, and classification costs $12/mo." Per-feature, per-developer.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Smart routing — simple prompts automatically go to cheaper models. No code changes. We're seeing 30-40% savings on teams that were sending everything to GPT-5.4.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Productivity tracking via MCP — works inside Claude Code, Kiro, and Cursor. Tracks sessions, commits, PRs, and time-to-ship. Generates weekly reports your VP can forward to justify the AI budget.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
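
&lt;p&gt;Per-feature attribution comes down to tagging each request with the feature that made it and summing token costs per tag. Here is a minimal sketch of that idea, not CostLens's implementation: the &lt;code&gt;feature&lt;/code&gt; tag, the log shape, and the model names and per-million-token prices are all hypothetical placeholders, not real OpenAI pricing.&lt;/p&gt;

```javascript
// Placeholder prices in USD per 1M tokens (hypothetical, not real pricing).
const PRICES = {
  'big-model': { input: 10, output: 30 },
  'small-model': { input: 0.5, output: 1.5 },
};

// Cost of one logged request, from its token counts and the model's price.
function requestCost(req) {
  const price = PRICES[req.model];
  if (!price) throw new Error('Unknown model: ' + req.model);
  return (req.promptTokens * price.input + req.completionTokens * price.output) / 1e6;
}

// Sum cost per feature tag, so spend reads as "code review: $X" instead of one total.
function costByFeature(requests) {
  const totals = {};
  for (const req of requests) {
    const key = req.feature || 'untagged';
    totals[key] = (totals[key] || 0) + requestCost(req);
  }
  return totals;
}

// Usage with made-up request logs.
const log = [
  { feature: 'code-review', model: 'big-model', promptTokens: 12000, completionTokens: 800 },
  { feature: 'classification', model: 'small-model', promptTokens: 300, completionTokens: 5 },
  { feature: 'code-review', model: 'big-model', promptTokens: 9000, completionTokens: 600 },
];
console.log(costByFeature(log));
```

&lt;p&gt;Per-developer attribution would work the same way with a different tag. Requests without a tag land in an &lt;code&gt;untagged&lt;/code&gt; bucket, so unattributed spend stays visible instead of disappearing.&lt;/p&gt;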

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgfidmv3p7rrghi405l6z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgfidmv3p7rrghi405l6z.png" alt="CostLens weekly engineering report showing AI cost per commit ($4.36), per PR ($12.80), session trends, and deliverables — the numbers your VP needs to justify AI spend." width="800" height="667"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The setup is still one line
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;CostLens&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;costlens&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;costlens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;CostLens&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;cl_...&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;openai&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;costlens&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;wrapOpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
&lt;span class="c1"&gt;// Done. Every request is tracked and optimized.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For coding agents (Claude Code, Kiro, Cursor), add the MCP server to your agent's MCP config:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"costlens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"npx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"-y"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"@costlens/mcp-server"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"env"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"COSTLENS_MCP_KEY"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"your_key"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What we learned from tracking 200K requests
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;52% of requests were simple tasks running on expensive models (classification, extraction, yes/no questions)&lt;/li&gt;
&lt;li&gt;Smart routing cut our own bill from $4,100 to $2,337/month (a 43% reduction)&lt;/li&gt;
&lt;li&gt;The "code review" feature was 3x more expensive than we thought&lt;/li&gt;
&lt;li&gt;Sessions with no output (abandoned/looping) accounted for 15% of spend&lt;/li&gt;
&lt;/ul&gt;
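
&lt;p&gt;The post doesn't describe how CostLens decides a request is "simple." As a rough illustration of the idea, here is a keyword-and-length router that sends short classification, extraction, or yes/no prompts to a cheaper model. The hint patterns, the 2,000-character cutoff, and the model names are hypothetical.&lt;/p&gt;

```javascript
// Hypothetical cheap model; substitute whatever smaller model you actually use.
const CHEAP_MODEL = 'small-model';
// Prompts longer than this keep the model the caller asked for.
const MAX_SIMPLE_CHARS = 2000;

// Keyword hints for simple tasks: classification, extraction, yes/no questions.
const SIMPLE_HINTS = [/\bclassify\b/i, /\bextract\b/i, /\byes or no\b/i];

// Return the model to use: the cheap one for short, simple-looking prompts,
// otherwise whatever the caller originally requested.
function routeModel(messages, requestedModel) {
  const text = messages.map((m) => String(m.content)).join('\n');
  if (!SIMPLE_HINTS.some((re) => re.test(text))) return requestedModel;
  if (text.length > MAX_SIMPLE_CHARS) return requestedModel;
  return CHEAP_MODEL;
}

const ticket = [{ role: 'user', content: 'Classify this ticket as bug or feature: app crashes on login' }];
const refactor = [{ role: 'user', content: 'Refactor this module to use dependency injection.' }];
console.log(routeModel(ticket, 'big-model'));   // small-model
console.log(routeModel(refactor, 'big-model')); // big-model
```

&lt;p&gt;Keyword rules are cheap and predictable but will misclassify some prompts, so it's worth spot-checking the quality of downgraded responses.&lt;/p&gt;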

&lt;h2&gt;
  
  
  Kill switch
&lt;/h2&gt;

&lt;p&gt;One thing we added after a background agent burned $50 in a weekend: a Slack kill switch. When spend velocity spikes, you get a message with a "Pause" button. One click stops that specific agent session, and it automatically resumes after 30 minutes.&lt;/p&gt;
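
&lt;p&gt;The post doesn't say how spend velocity is measured. One simple approach is a sliding-window sum per session: flag the session when spend in the last few minutes crosses a threshold, let a human pause it, and lift the pause automatically after 30 minutes. The sketch below assumes that design; the class name, the 10-minute window, and the $5 threshold are hypothetical, and the Slack message itself is left out.&lt;/p&gt;

```javascript
// Hypothetical per-session spend monitor: sliding-window velocity check plus timed pauses.
class SpendMonitor {
  constructor({ windowMs = 10 * 60_000, maxSpendPerWindow = 5, pauseMs = 30 * 60_000 } = {}) {
    this.windowMs = windowMs;                   // how far back "recent spend" looks
    this.maxSpendPerWindow = maxSpendPerWindow; // USD threshold that counts as a spike
    this.pauseMs = pauseMs;                     // auto-resume delay after a pause
    this.events = new Map();                    // sessionId -> [{ at, cost }]
    this.pausedUntil = new Map();               // sessionId -> timestamp when the pause ends
  }

  // Record one request's cost; returns true when recent spend exceeds the threshold,
  // which is the moment to send the alert with a "Pause" button.
  record(sessionId, cost, now = Date.now()) {
    const cutoff = now - this.windowMs;
    const recent = (this.events.get(sessionId) || []).filter((e) => e.at > cutoff);
    recent.push({ at: now, cost });
    this.events.set(sessionId, recent);
    const spend = recent.reduce((sum, e) => sum + e.cost, 0);
    return spend > this.maxSpendPerWindow;
  }

  // Called when someone clicks "Pause"; the session resumes on its own after pauseMs.
  pause(sessionId, now = Date.now()) {
    this.pausedUntil.set(sessionId, now + this.pauseMs);
  }

  // The agent checks this before each request and stops while it returns true.
  isPaused(sessionId, now = Date.now()) {
    const until = this.pausedUntil.get(sessionId);
    return until === undefined ? false : until > now;
  }
}

// Usage with explicit timestamps (ms) instead of the real clock.
const monitor = new SpendMonitor({ maxSpendPerWindow: 5 });
monitor.record('agent-1', 2, 0);                     // $2 so far: no alert
const spiked = monitor.record('agent-1', 4, 60_000); // $6 within 10 minutes: alert
if (spiked) monitor.pause('agent-1', 60_000);
console.log(monitor.isPaused('agent-1', 5 * 60_000));  // true
console.log(monitor.isPaused('agent-1', 32 * 60_000)); // false (auto-resumed)
```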

&lt;h2&gt;
  
  
  Free for individuals
&lt;/h2&gt;

&lt;p&gt;The SDK and MCP server are free. Team features (reports, routing, budgets) start at $99/month.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Website: &lt;a href="https://costlens.dev" rel="noopener noreferrer"&gt;https://costlens.dev&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;SDK: &lt;code&gt;npm install costlens&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;MCP: &lt;a href="https://www.npmjs.com/package/@costlens/mcp-server" rel="noopener noreferrer"&gt;https://www.npmjs.com/package/@costlens/mcp-server&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Would love feedback — especially from teams spending $1K+/month on AI. What would make you try this?&lt;/p&gt;

</description>
      <category>ai</category>
      <category>productivity</category>
      <category>devtools</category>
      <category>openai</category>
    </item>
    <item>
      <title>Is Your OpenAI Bill Giving You Nightmares? I Built a Tool to Help</title>
      <dc:creator>Joao Romao</dc:creator>
      <pubDate>Wed, 22 Oct 2025 10:19:39 +0000</pubDate>
      <link>https://dev.to/jrmromao/is-your-openai-bill-giving-you-nightmares-i-built-a-tool-to-help-4pnn</link>
      <guid>https://dev.to/jrmromao/is-your-openai-bill-giving-you-nightmares-i-built-a-tool-to-help-4pnn</guid>
      <description>&lt;p&gt;Let's be honest: playing with large language models is amazing, but seeing that OpenAI API bill at the end of the month can be... painful. 😅&lt;/p&gt;

&lt;p&gt;I've been working with the GPT-4 and GPT-3.5 APIs a lot, and I noticed how quickly the costs can spiral out of control. A simple task routed to GPT-4 by mistake, an inefficient prompt, or running the same query over and over—it all adds up.&lt;/p&gt;

&lt;p&gt;I kept thinking there &lt;em&gt;had&lt;/em&gt; to be a smarter, more automated way to manage this without rewriting all my code.&lt;/p&gt;

&lt;p&gt;That's why I built &lt;strong&gt;CostLens&lt;/strong&gt;, a simple SDK I'm hoping can help other developers who are facing the same problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is CostLens?
&lt;/h2&gt;

&lt;p&gt;At its core, CostLens is a drop-in SDK that automatically helps you cut your AI costs. The goal is to make it a "set it and forget it" tool that starts saving you money in minutes.&lt;/p&gt;

&lt;p&gt;It works by wrapping your existing OpenAI client. Once it's installed, it automatically does three key things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Smart Model Routing:&lt;/strong&gt; This is the big one. It analyzes your requests and automatically routes simple tasks to cheaper models (like GPT-3.5) while saving the expensive, powerful models (like GPT-4) for the complex tasks that actually need them.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Prompt Optimization:&lt;/strong&gt; The tool's AI rewrites your prompts to be 40-60% shorter &lt;em&gt;before&lt;/em&gt; they reach the OpenAI API. You get the same (or better) quality results for a fraction of the token cost.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Response Caching:&lt;/strong&gt; If your application sends the same request multiple times, CostLens will catch it and return the cached result instantly, for free.&lt;/li&gt;
&lt;/ol&gt;
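
&lt;p&gt;For the caching step, here is a minimal sketch of one way it could work (an assumption, not a description of CostLens internals): serialize the full request with sorted keys, use that string as the cache key, and store the pending promise so concurrent duplicates share one API call. &lt;code&gt;withCache&lt;/code&gt; and &lt;code&gt;stableStringify&lt;/code&gt; are hypothetical names.&lt;/p&gt;

```javascript
// Serialize with sorted object keys so logically identical requests share a cache key.
function stableStringify(value) {
  if (value === null || typeof value !== 'object') return JSON.stringify(value);
  if (Array.isArray(value)) return '[' + value.map(stableStringify).join(',') + ']';
  const keys = Object.keys(value).sort();
  return '{' + keys.map((k) => JSON.stringify(k) + ':' + stableStringify(value[k])).join(',') + '}';
}

// Wrap any async request function with an in-memory cache keyed on the full request.
function withCache(requestFn) {
  const cache = new Map();
  return function cachedRequest(params) {
    const key = stableStringify(params);
    if (!cache.has(key)) {
      const pending = Promise.resolve(requestFn(params));
      cache.set(key, pending);
      // Drop failed requests so a transient error isn't cached forever.
      pending.catch(() => cache.delete(key));
    }
    return cache.get(key); // repeat requests reuse the stored promise: no new API call
  };
}

// Usage with a fake request function standing in for the real API call.
let apiCalls = 0;
const fakeCreate = async (params) => {
  apiCalls += 1;
  return { model: params.model, text: 'ok' };
};
const create = withCache(fakeCreate);
create({ model: 'm', messages: [{ role: 'user', content: 'hi' }] });
create({ messages: [{ content: 'hi', role: 'user' }], model: 'm' }); // same request, keys reordered
console.log(apiCalls); // 1
```

&lt;p&gt;An in-memory &lt;code&gt;Map&lt;/code&gt; never evicts anything, so a real cache would also need a TTL and a size limit. Exact-match caching is also only safe when a repeated answer is acceptable; with a nonzero &lt;code&gt;temperature&lt;/code&gt;, callers may expect varied output.&lt;/p&gt;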

&lt;h2&gt;
  
  
  How It Works (It's Quick)
&lt;/h2&gt;

&lt;p&gt;I really wanted to make this as easy as possible, with no need to refactor your existing logic.&lt;/p&gt;

&lt;p&gt;You just import the client, wrap your existing OpenAI instance, and... that's it. You can keep using the OpenAI client just like you always have, and the optimizations happen in the background.&lt;/p&gt;

&lt;p&gt;It looks something like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;CostLens&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;costlens&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;OpenAI&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;openai&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// 1. Initialize CostLens&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;costlens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;CostLens&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;COSTLENS_KEY&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// 2. Wrap your existing OpenAI client one time&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;openai&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;costlens&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;wrapOpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;

&lt;span class="c1"&gt;// 3. Use it exactly as you did before!&lt;/span&gt;
&lt;span class="c1"&gt;// Savings happen automatically&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;gpt-4&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[...]&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// ✅ This request is automatically optimized,&lt;/span&gt;
&lt;span class="c1"&gt;// ✅ routed to the best-priced model,&lt;/span&gt;
&lt;span class="c1"&gt;// ✅ and cached for future use.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
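
&lt;p&gt;The post doesn't show what &lt;code&gt;wrapOpenAI&lt;/code&gt; does internally. One plausible way to keep call sites unchanged is a &lt;code&gt;Proxy&lt;/code&gt; that intercepts only &lt;code&gt;chat.completions.create&lt;/code&gt;, adjusts the request before the call, and reads the &lt;code&gt;usage&lt;/code&gt; token counts from the response afterwards, while every other property passes through untouched. The sketch uses a fake client so it runs without an API key; &lt;code&gt;wrapClient&lt;/code&gt;, &lt;code&gt;pickModel&lt;/code&gt;, and &lt;code&gt;onUsage&lt;/code&gt; are hypothetical names, not CostLens APIs.&lt;/p&gt;

```javascript
// Return a property from the real object, binding methods so `this` stays the original.
function passThrough(target, prop) {
  const value = Reflect.get(target, prop);
  return typeof value === 'function' ? value.bind(target) : value;
}

// Hypothetical wrapper: callers keep using client.chat.completions.create as before,
// while pickModel can swap the model and onUsage sees the token counts.
function wrapClient(client, { pickModel, onUsage }) {
  const completions = client.chat.completions;
  const create = async (params, options) => {
    const model = pickModel(params);
    const response = await completions.create({ ...params, model }, options);
    if (response.usage) onUsage(model, response.usage);
    return response;
  };
  const wrappedCompletions = new Proxy(completions, {
    get: (target, prop) => (prop === 'create' ? create : passThrough(target, prop)),
  });
  const wrappedChat = new Proxy(client.chat, {
    get: (target, prop) => (prop === 'completions' ? wrappedCompletions : passThrough(target, prop)),
  });
  return new Proxy(client, {
    get: (target, prop) => (prop === 'chat' ? wrappedChat : passThrough(target, prop)),
  });
}

// Usage with a fake client shaped like the OpenAI SDK, so no API key is needed.
const fakeClient = {
  chat: {
    completions: {
      async create(params) {
        return { model: params.model, usage: { prompt_tokens: 12, completion_tokens: 3, total_tokens: 15 } };
      },
    },
  },
};
const usageLog = [];
const openai = wrapClient(fakeClient, {
  // Hypothetical rule: single-message requests go to a cheaper model.
  pickModel: (params) => (params.messages.length === 1 ? 'cheap-model' : params.model),
  onUsage: (model, usage) => usageLog.push({ model, tokens: usage.total_tokens }),
});
openai.chat.completions
  .create({ model: 'gpt-4', messages: [{ role: 'user', content: 'Say hi' }] })
  .then((res) => console.log(res.model, usageLog[0].tokens)); // cheap-model 15
```

&lt;p&gt;Streaming requests would need extra handling, since a streamed response doesn't carry a &lt;code&gt;usage&lt;/code&gt; field on the returned object the way a regular completion does.&lt;/p&gt;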



&lt;h2&gt;
  
  
  I'm Looking for Testers and Feedback!
&lt;/h2&gt;

&lt;p&gt;This started as a personal project, but I think it could be genuinely useful for other indie devs, startups, or anyone who wants to keep their AI experiments affordable.&lt;/p&gt;

&lt;p&gt;I just launched a &lt;strong&gt;Free plan&lt;/strong&gt; that lets you optimize up to $100/month of AI spend, so you can try it out without any risk.&lt;/p&gt;

&lt;p&gt;If you're a developer using the OpenAI API, I would be incredibly grateful if you'd give it a try and let me know what you think.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Does it actually save you money?&lt;/li&gt;
&lt;li&gt;Is the setup as easy as I think it is?&lt;/li&gt;
&lt;li&gt;What features are missing?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can check it out and get started here: &lt;strong&gt;&lt;a href="https://costlens.dev/" rel="noopener noreferrer"&gt;https://costlens.dev/&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;My goal is just to build something that helps the community, so any and all feedback (good or bad) is super welcome.&lt;/p&gt;

&lt;p&gt;Let me know what you think in the comments!&lt;/p&gt;

</description>
      <category>programming</category>
      <category>ai</category>
      <category>npm</category>
      <category>openai</category>
    </item>
  </channel>
</rss>
