<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Sergii Khomenko</title>
    <description>The latest articles on DEV Community by Sergii Khomenko (@sergii-khomenko).</description>
    <link>https://dev.to/sergii-khomenko</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3901595%2F90f8a3cb-944f-4275-9069-8cc222531075.jpg</url>
      <title>DEV Community: Sergii Khomenko</title>
      <link>https://dev.to/sergii-khomenko</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/sergii-khomenko"/>
    <language>en</language>
    <item>
      <title>LLM Cost Tracking for Rails</title>
      <dc:creator>Sergii Khomenko</dc:creator>
      <pubDate>Thu, 28 May 2026 05:17:26 +0000</pubDate>
      <link>https://dev.to/sergii-khomenko/llm-cost-tracking-for-rails-26ji</link>
      <guid>https://dev.to/sergii-khomenko/llm-cost-tracking-for-rails-26ji</guid>
      <description>&lt;p&gt;A Rails app starts calling OpenAI or Anthropic. A few months later someone in finance asks "who's burning $X a month on this and on what?" The answer requires per-user, per-feature, per-tenant attribution — and the obvious solutions all want you to give up something I wasn't willing to give up.&lt;/p&gt;

&lt;p&gt;This is the design rationale behind &lt;a href="https://github.com/sergey-homenko/llm_cost_tracker" rel="noopener noreferrer"&gt;llm_cost_tracker&lt;/a&gt;, a Rails Engine I've been building. It's not the only way to solve this problem; it's the way that fit the constraints I cared about.&lt;/p&gt;

&lt;h2&gt;
  
  
  The constraint set
&lt;/h2&gt;

&lt;p&gt;Three non-negotiables shaped every other choice:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;No new infra.&lt;/strong&gt; A Rails app already has a database, a request lifecycle, an authentication layer, a dashboard pattern. Anything I bolt on should reuse those, not duplicate them.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No prompt storage.&lt;/strong&gt; Prompt content is regulated data in a lot of contexts — PII, customer transcripts, medical, legal. The tracker has no business holding it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No traffic redirection.&lt;/strong&gt; Direct calls to OpenAI / Anthropic / Gemini are the simplest path and the one with fewest failure modes. A proxy adds a hop, a key rotation surface, and a vendor relationship.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Those three rules ruled out most of the existing landscape.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why not a proxy
&lt;/h2&gt;

&lt;p&gt;The first instinct for "track LLM spend" is to put a proxy in front. Helicone, Portkey, LiteLLM Proxy, OpenRouter — they all model the problem this way: route OpenAI traffic through &lt;code&gt;proxy.example.com&lt;/code&gt;, the proxy sees the request and response, logs cost, forwards to OpenAI.&lt;/p&gt;

&lt;p&gt;It's a clean separation. It also means your API keys live in their config, not yours, their downtime is your downtime, their TLS and data-residency posture is yours by default, their rate-limiting sits between your code and the provider, and a new SDK feature waits on their proxy supporting it.&lt;/p&gt;

&lt;p&gt;For some teams that trade is fine. It wasn't fine for me — the LLM call is already the most expensive and most reliability-sensitive thing in the request, and putting another hop in front of it felt like the wrong move.&lt;/p&gt;

&lt;p&gt;The alternative: capture what we need inside the Ruby process, on the way out. Patch the official SDK methods at boot, or wrap the underlying Faraday client. The call still goes straight to OpenAI / Anthropic / Gemini; we just observe the request and response as they pass through.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why ActiveRecord, not a TSDB
&lt;/h2&gt;

&lt;p&gt;The second fork: where does the data go? Cost tracking is shaped like a time series — append-mostly rows, aggregations over time windows. A TSDB (Timescale, ClickHouse, Influx) is the textbook answer.&lt;/p&gt;

&lt;p&gt;I picked Postgres / MySQL via ActiveRecord anyway, for one reason: the data is operational, not analytical. It needs to join to your &lt;code&gt;users&lt;/code&gt; table, your &lt;code&gt;subscriptions&lt;/code&gt; table, your &lt;code&gt;tenants&lt;/code&gt; table. It needs to live behind the same row-level security (RLS) and the same backups as the rest of your app data. Standing up a separate TSDB to query "show me LLM cost for tenant 42 last month" makes that join harder, not easier.&lt;/p&gt;

&lt;p&gt;Three tables ship in the install generator: &lt;code&gt;llm_cost_tracker_calls&lt;/code&gt; (one row per LLM call, with token counts and total cost), &lt;code&gt;llm_cost_tracker_call_line_items&lt;/code&gt; (per-component breakdown — input, output, cache reads, hosted tool charges), and &lt;code&gt;llm_cost_tracker_call_tags&lt;/code&gt; (the attribution rows). For the LLM volumes most Rails apps see today, a single Postgres handles this fine.&lt;/p&gt;
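
&lt;p&gt;To show why the tag rows get their own table, here is a small in-memory sketch of the "cost per tenant" question. The &lt;code&gt;Call&lt;/code&gt; and &lt;code&gt;CallTag&lt;/code&gt; structs, their field names, and the costs are stand-ins made up for illustration, not the gem's real schema. In the app the same question would be a join and a group-by in SQL.&lt;/p&gt;

```ruby
# Illustrative stand-ins for rows in the calls and call_tags tables.
# Field names are assumptions for this sketch, not the gem's schema.
Call = Struct.new(:id, :model, :total_cost, keyword_init: true)
CallTag = Struct.new(:call_id, :key, :value, keyword_init: true)

CALLS = [
  Call.new(id: 1, model: "model-a", total_cost: Rational("0.002")),
  Call.new(id: 2, model: "model-a", total_cost: Rational("0.003")),
  Call.new(id: 3, model: "model-b", total_cost: Rational("0.010"))
].freeze

TAGS = [
  CallTag.new(call_id: 1, key: "tenant_id", value: "42"),
  CallTag.new(call_id: 2, key: "tenant_id", value: "42"),
  CallTag.new(call_id: 3, key: "tenant_id", value: "7"),
  CallTag.new(call_id: 3, key: "feature", value: "support_chat")
].freeze

# Sum call cost per value of one tag key: the in-memory analogue of
# joining calls to call_tags and grouping by the tag value.
def cost_by_tag(calls, tags, key)
  cost_of = calls.to_h { |call| [call.id, call.total_cost] }
  tags.select { |tag| tag.key == key }
      .group_by { |tag| tag.value }
      .transform_values { |rows| rows.sum { |tag| cost_of.fetch(tag.call_id) } }
end

per_tenant = cost_by_tag(CALLS, TAGS, "tenant_id")
per_tenant.each { |tenant, cost| puts "tenant #{tenant}: #{format('%.3f', cost)} USD" }
```

&lt;p&gt;Because each tag is its own row, adding a new attribution dimension is just a new key. No migration, no new column on the calls table.&lt;/p&gt;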

&lt;h2&gt;
  
  
  Why block-scoped tags
&lt;/h2&gt;

&lt;p&gt;Attribution is the whole game. Tokens × the model's rate gives you a total; tags answer "whose total is it?"&lt;/p&gt;

&lt;p&gt;The mechanism is a block:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="no"&gt;LlmCostTracker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;with_tags&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;user_id: &lt;/span&gt;&lt;span class="n"&gt;current_user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;feature: &lt;/span&gt;&lt;span class="s2"&gt;"support_chat"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
  &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;model: &lt;/span&gt;&lt;span class="s2"&gt;"gpt-4o-mini"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;messages: &lt;/span&gt;&lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Anything that hits a tracked SDK or Faraday client inside that block picks up the tags. You put the block in an &lt;code&gt;around_action&lt;/code&gt; in a controller, around &lt;code&gt;perform&lt;/code&gt; in a job, or around a feature module's entry point. The SDK call itself doesn't change.&lt;/p&gt;

&lt;p&gt;The reason it's not a kwarg on the SDK call: I don't control the SDK call. The OpenAI gem's &lt;code&gt;client.chat.completions.create&lt;/code&gt; has its own signature; threading a tag through it would mean either monkey-patching the call shape or asking every caller to use a wrapper. Block-scoped context fits Ruby's grain — same shape as &lt;code&gt;ActiveSupport::CurrentAttributes&lt;/code&gt;, same shape as Rails request-store patterns.&lt;/p&gt;

&lt;p&gt;Tags merge across nested blocks (inner wins), get sanitized for high-cardinality or secret-shaped values, and end up as a row per (call, key, value) in the database. Group by, filter, break down.&lt;/p&gt;
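
&lt;p&gt;The "inner wins" merge is easy to picture in a few lines. This is a minimal sketch of the technique, not the gem's actual code: the &lt;code&gt;TagScope&lt;/code&gt; module and its storage key are hypothetical names, and it leaves out sanitization entirely.&lt;/p&gt;

```ruby
# Hypothetical sketch of block-scoped tags. Not the gem's implementation.
module TagScope
  # Thread.current[] is fiber-local, so each request or job keeps its own tags.
  KEY = :tag_scope_sketch

  # Tags visible to code running inside the enclosing with_tags blocks.
  def self.current
    Thread.current[KEY] || {}
  end

  # Merge tags for the duration of the block. On a key clash the inner
  # value wins, and the outer tags come back when the block exits,
  # even if the block raises.
  def self.with_tags(**tags)
    previous = current
    Thread.current[KEY] = previous.merge(tags).freeze
    yield
  ensure
    Thread.current[KEY] = previous
  end
end

TagScope.with_tags(user_id: 1, feature: "support_chat") do
  TagScope.with_tags(feature: "summary") do
    p TagScope.current # user_id is 1, feature is overridden to "summary"
  end
  p TagScope.current   # feature is back to "support_chat"
end
```

&lt;p&gt;The &lt;code&gt;ensure&lt;/code&gt; is the important part: a failed LLM call must not leak its tags into whatever runs next on the same thread.&lt;/p&gt;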

&lt;h2&gt;
  
  
  Why a frozen pricing snapshot per call
&lt;/h2&gt;

&lt;p&gt;Prices change. OpenAI cut prompt caching rates twice in the last year; Anthropic introduced 1-hour cache TTL with its own rate; Gemini rolled out context-length-tiered pricing. If you compute cost lazily — "the rate is whatever the current price table says" — you have a moving floor under historical reports.&lt;/p&gt;

&lt;p&gt;So every call freezes its pricing snapshot at write time: the exact per-component rate that produced its cost, stamped on the row. Run a report today on calls from three months ago and you get what they cost then. Update the price table tomorrow and the historical numbers don't shift.&lt;/p&gt;

&lt;p&gt;The trade-off is storage: a few hundred bytes per call for the snapshot. At the volumes we're talking about for LLM calls, that's negligible, and far less than storing the message bodies themselves would take (which we don't do).&lt;/p&gt;
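
&lt;p&gt;Here is a rough sketch of the idea, under stated assumptions: the &lt;code&gt;PRICES&lt;/code&gt; table, the &lt;code&gt;build_call_record&lt;/code&gt; helper, the model name, and the rates (USD per million tokens) are all invented for illustration and are not the gem's API or real provider prices.&lt;/p&gt;

```ruby
# Hypothetical price table in USD per million tokens. The numbers are made up.
PRICES = {
  "example-model" => { input: Rational("0.50"), output: Rational("2.00") }
}

PER_MILLION = 1_000_000

# Build the record for one call, copying the rates that apply right now
# so the stored cost can always be explained by the stored snapshot.
def build_call_record(model:, input_tokens:, output_tokens:)
  rates = PRICES.fetch(model).dup.freeze
  cost = (input_tokens * rates[:input] + output_tokens * rates[:output]) / PER_MILLION
  { model: model, input_tokens: input_tokens, output_tokens: output_tokens,
    pricing_snapshot: rates, total_cost: cost }
end

RECORD = build_call_record(model: "example-model", input_tokens: 1_000, output_tokens: 500)

# A later price cut replaces the table entry but leaves the stored record alone.
PRICES["example-model"] = { input: Rational("0.25"), output: Rational("1.00") }

puts format("stored cost: %.4f USD", RECORD[:total_cost])
puts format("stored input rate: %.2f, current input rate: %.2f",
            RECORD[:pricing_snapshot][:input], PRICES["example-model"][:input])
```

&lt;p&gt;The lazy alternative would store only the token counts and look the rate up at report time, which is exactly the moving floor described above.&lt;/p&gt;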

&lt;h2&gt;
  
  
  What's there now
&lt;/h2&gt;

&lt;p&gt;Version 0.11.0 instruments three official SDKs (OpenAI, Anthropic, RubyLLM) and ships Faraday middleware for everything else — OpenAI-compatible APIs like Groq, DeepSeek, OpenRouter; Azure OpenAI on both endpoint styles; Gemini; custom gateways. The mounted dashboard at &lt;code&gt;/llm-costs&lt;/code&gt; has pages for cost overview, top models, the call ledger, tag breakdowns, data-quality signals, and a pricing reference. Budget guardrails block calls before send when an estimate would cross a configured monthly, daily, or per-call cap.&lt;/p&gt;

&lt;p&gt;What it deliberately isn't: prompt or completion storage, trace replay, eval framework, model-routing logic, sidecar service, OpenTelemetry exporter. Each of those would justify a separate gem.&lt;/p&gt;

&lt;p&gt;If your shape is "Rails app, direct API calls to one or two providers, finance asking where the spend goes" — this is the layer I wanted to exist.&lt;/p&gt;

&lt;p&gt;Repo: &lt;a href="https://github.com/sergey-homenko/llm_cost_tracker" rel="noopener noreferrer"&gt;github.com/sergey-homenko/llm_cost_tracker&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ruby</category>
      <category>rails</category>
      <category>ai</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Writing code got cheap. Being responsible for it didn't. (What shipping an AI-generated gem taught me)</title>
      <dc:creator>Sergii Khomenko</dc:creator>
      <pubDate>Sat, 23 May 2026 20:15:30 +0000</pubDate>
      <link>https://dev.to/sergii-khomenko/writing-code-got-cheap-being-responsible-for-it-didnt-what-shipping-an-ai-generated-gem-taught-1fci</link>
      <guid>https://dev.to/sergii-khomenko/writing-code-got-cheap-being-responsible-for-it-didnt-what-shipping-an-ai-generated-gem-taught-1fci</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Quick warning before you read: this isn't a launch announcement. It's a story about getting something wrong in public, and changing how I work because of it. If that's not your thing, no hard feelings.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The confession
&lt;/h2&gt;

&lt;p&gt;A little over a month ago I published a Ruby gem called &lt;code&gt;llm_cost_tracker&lt;/code&gt;. Here's the honest part: almost all of it was written by an LLM. I prompted, it produced, I shipped. I felt pretty good about myself for about a week. Then I posted it to r/ruby asking for feedback, and people gave me exactly that.&lt;/p&gt;

&lt;p&gt;They were right, too. A chunk of the code was just hallucinated. The example I keep coming back to, because it's so on the nose: my gemspec listed &lt;code&gt;activesupport&lt;/code&gt; and &lt;code&gt;activerecord&lt;/code&gt; as hard dependencies through &lt;code&gt;add_dependency&lt;/code&gt;. The gem can't even load without them. And yet, inside the gem, there was code carefully checking at runtime whether ActiveSupport and ActiveRecord were present. It was defending against the absence of the two things it literally cannot run without.&lt;/p&gt;

&lt;p&gt;Nobody writes that on purpose. The model produced something that &lt;em&gt;looked&lt;/em&gt; careful, and I shipped it without noticing, because I hadn't really read it. Once you spot one of those, you start seeing them everywhere: guard clauses for impossible states, checks that check the checks, abstractions wrapped around abstractions. Paranoia mode. Over-engineering for situations that can't happen.&lt;/p&gt;

&lt;p&gt;None of the feedback was mean. It was just correct, and it stung because I couldn't argue with any of it. You can't defend code you never understood in the first place.&lt;/p&gt;

&lt;h2&gt;
  
  
  The idea was fine. I wasn't.
&lt;/h2&gt;

&lt;p&gt;Worth saying: the gem itself solves a real problem. If your Rails app calls an LLM, the monthly bill tells you &lt;em&gt;what&lt;/em&gt; you spent and basically nothing about &lt;em&gt;who&lt;/em&gt; spent it. You get totals per model, maybe per API key. What you don't get is your own world: which feature made the call, which tenant it belongs to, whether that prompt tweak you shipped last Tuesday quietly doubled your token usage.&lt;/p&gt;

&lt;p&gt;That information only exists for a moment, at call time, inside your app. The provider has no idea what &lt;code&gt;feature: "chat"&lt;/code&gt; means to you. Miss it there and it's gone for good.&lt;/p&gt;

&lt;p&gt;The tools that already exist aim higher than I needed. Langfuse is full-blown observability, and self-hosting it means running Postgres &lt;em&gt;and&lt;/em&gt; ClickHouse &lt;em&gt;and&lt;/em&gt; Redis &lt;em&gt;and&lt;/em&gt; S3. Helicone sits as a proxy on the path of every call. LiteLLM lives over in Python. I just wanted a small Rails-native thing that answered one question, which feature spent the money, and then got out of my way.&lt;/p&gt;

&lt;p&gt;Good idea, bad execution. Not telling those two apart is what nearly made me delete the whole thing.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I changed, and what I didn't
&lt;/h2&gt;

&lt;p&gt;Here's where I want to be careful, because the easy version of this story is a lie.&lt;/p&gt;

&lt;p&gt;I did not throw it all out and lovingly rewrite it by hand. That would make a nicer arc and it isn't true. The truth is messier. There is still a pile of generated code in this gem, and I'm still working through it, file by file.&lt;/p&gt;

&lt;p&gt;What actually changed is &lt;em&gt;how&lt;/em&gt; I work. Some of it I rewrite myself now. Plenty of it I still hand to agents, because swearing off the tools would be silly and I don't believe in it. But I delegate completely differently than I used to. I read every diff. I review it the way I'd review a coworker's PR, suspicious by default. And I don't commit anything I can't explain out loud. The gem that got roasted and the gem today aren't "AI-written" versus "human-written." They're "shipped blind" versus "actually mine."&lt;/p&gt;

&lt;p&gt;I'm in the middle of that cleanup right now, today. Tearing out the paranoid guards, the pointless self-checks, the engineering that existed to look thorough rather than to do anything. It's slow and it's boring and nobody claps for it. I'm doing it anyway, because quality is the part I can actually control, and I'd rather tell you the cleanup is still happening than pretend I've crossed a line I haven't.&lt;/p&gt;

&lt;h2&gt;
  
  
  The lesson: writing code got cheap, so the job moved
&lt;/h2&gt;

&lt;p&gt;This is the bit I'd hand to a younger me.&lt;/p&gt;

&lt;p&gt;You can't give the machine the keyboard and walk off. What it gives you back is a draft, not a finished thing. The second I treated a draft as done, I'd quietly handed off my own understanding, and it took a stranger about thirty seconds to notice the hole. There's no faking that you understand your own code once someone starts asking questions.&lt;/p&gt;

&lt;p&gt;But the reframe that actually stuck with me is this. Writing code is cheap now. It doesn't cost what it used to in hours or effort. And when the cost of making something drops, the value just moves somewhere else. Here it moved to quality, and to ownership. The hours I used to spend typing, I now spend reading, shaping, and standing behind whatever goes out under my name. That is not a smaller job. For a solo maintainer it might honestly be the &lt;em&gt;entire&lt;/em&gt; job now.&lt;/p&gt;

&lt;p&gt;The roast didn't talk me out of using AI. It talked me into raising my review bar to match how fast the code shows up. Cheap to generate, expensive to be responsible for. And the responsibility was always going to be mine.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where it honestly stands
&lt;/h2&gt;

&lt;p&gt;What it does: it logs every LLM call into your own Postgres or MySQL. Provider, model, tokens, cost, latency, a per-component breakdown, and a pricing snapshot so old numbers don't silently change when the provider updates prices. No proxy: calls go straight to the provider. No second datastore to run. Attribution is just tags you wrap around a call:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="no"&gt;LlmCostTracker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;with_tags&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;user_id: &lt;/span&gt;&lt;span class="no"&gt;Current&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;user&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;feature: &lt;/span&gt;&lt;span class="s2"&gt;"chat"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
  &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;OpenAI&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;api_key: &lt;/span&gt;&lt;span class="no"&gt;ENV&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"OPENAI_API_KEY"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
  &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;responses&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;model: &lt;/span&gt;&lt;span class="s2"&gt;"gpt-4o"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;input: &lt;/span&gt;&lt;span class="s2"&gt;"Hello"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhc5gbhgear51q5my90cw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhc5gbhgear51q5my90cw.png" alt="LLM Cost Tracker dashboard (demo data)" width="800" height="464"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;What it's deliberately not: no prompt capture, no traces, no replay, not invoice-grade. If what you need is observability, go use Langfuse. I'll say that to your face.&lt;/p&gt;

&lt;p&gt;And two things I'm not going to dress up:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It still has a lot of generated code in it that I'm reviewing and reworking. The direction is right. It is not finished, and I won't pretend otherwise.&lt;/li&gt;
&lt;li&gt;It is not running in anyone's production yet, mine included. I've tested it against a separate test app with real OpenAI and Anthropic keys, paying for the calls out of my own pocket, streaming and all. Capture works, tags attribute correctly, the dashboard renders. But "works on my test app" is a long way from "battle-tested," and I know the difference.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What I'm actually asking for
&lt;/h2&gt;

&lt;p&gt;I want a handful of early adopters. People with real LLM calls in a real Rails app who'll drop this in and tell me, bluntly, what falls over. Blunt clearly works on me. The &lt;a href="https://github.com/sergey-homenko/llm_cost_tracker" rel="noopener noreferrer"&gt;repo and docs are on GitHub&lt;/a&gt;, it's MIT, and an issue telling me what broke is worth more to me than a star.&lt;/p&gt;

&lt;p&gt;And if you build with AI too, which is most of us by now, maybe that's the whole takeaway: the tools made writing code cheap, but they didn't make being responsible for it cheap. That part is still on us. I learned it the embarrassing way. You don't have to.&lt;/p&gt;

&lt;p&gt;So, genuinely curious: how does your team attribute LLM spend right now? Every answer I get is some flavor of "we hacked something together ourselves." Which is exactly the itch I'm still scratching.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;And yes, full disclosure: AI helped me put this post together too. The difference, this time, is that I was actually in the room for it. ;)&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ruby</category>
      <category>rails</category>
      <category>ai</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
