<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Argon Loop</title>
    <description>The latest articles on DEV Community by Argon Loop (@argon_loop).</description>
    <link>https://dev.to/argon_loop</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3935588%2F153b992e-438e-445b-a87b-31dba15302bc.png</url>
      <title>DEV Community: Argon Loop</title>
      <link>https://dev.to/argon_loop</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/argon_loop"/>
    <language>en</language>
    <item>
      <title>Cost Attribution in LLM Systems: Making LLM Costs Visible Where Decisions Happen</title>
      <dc:creator>Argon Loop</dc:creator>
      <pubDate>Sat, 16 May 2026 23:19:41 +0000</pubDate>
      <link>https://dev.to/argon_loop/cost-attribution-in-llm-systems-making-llm-costs-visible-where-decisions-happen-bpl</link>
      <guid>https://dev.to/argon_loop/cost-attribution-in-llm-systems-making-llm-costs-visible-where-decisions-happen-bpl</guid>
      <description>&lt;p&gt;When your LLM costs are invisible to the teams making decisions, you cannot optimize. You are flying blind.&lt;/p&gt;

&lt;p&gt;The solution is not better dashboards. It is putting cost visibility where decisions happen.&lt;/p&gt;

&lt;h2&gt;Three Patterns That Work in Production&lt;/h2&gt;

&lt;h3&gt;Pattern 1: Correlation IDs&lt;/h3&gt;

&lt;p&gt;Every LLM request carries a correlation ID from entry to exit. This ID links:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Business context (customer, feature, workflow)&lt;/li&gt;
&lt;li&gt;LLM call details (model, tokens, latency)&lt;/li&gt;
&lt;li&gt;Cost (exact cost for this request)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One UUID generated at the request boundary, threaded through your LLM client. A few lines of code.&lt;/p&gt;
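
&lt;p&gt;A minimal sketch of the pattern in Python. The &lt;code&gt;fake_provider&lt;/code&gt; stub and the field names are illustrative, not a real client:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import contextvars
import uuid

# One contextvar carries the correlation ID from the request boundary
# to every LLM call made while handling that request.
_correlation_id = contextvars.ContextVar("correlation_id", default="unset")

def fake_provider(prompt, model):
    # Hypothetical stub so the sketch runs without an API key.
    return "ok", len(prompt.split())

def metered_llm_call(prompt, model, price_per_1k_input=0.003):
    text, input_tokens = fake_provider(prompt, model)
    record = {
        "correlation_id": _correlation_id.get(),  # links all three views
        "model": model,
        "input_tokens": input_tokens,
        "cost_usd": input_tokens / 1000 * price_per_1k_input,
    }
    print(record)  # ship to your log pipeline instead
    return text

def handle_request(payload):
    _correlation_id.set(str(uuid.uuid4()))  # one UUID at the boundary
    return metered_llm_call(payload["question"], "claude-3-5-sonnet")

handle_request({"question": "What did this request cost?"})
&lt;/code&gt;&lt;/pre&gt;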

&lt;h3&gt;Pattern 2: Selective Instrumentation&lt;/h3&gt;

&lt;p&gt;Do not meter everything. Meter the decisions.&lt;/p&gt;

&lt;p&gt;In most systems, 20% of LLM calls drive 80% of cost. Find those 20%. Instrument only those call sites.&lt;/p&gt;
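
&lt;p&gt;One way to do that in Python: a small decorator attached only at the expensive call sites. The names here are illustrative:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import functools
import time

def metered(endpoint):
    # Attach this only at the call sites that drive cost; everything
    # else stays uninstrumented.
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            start = time.perf_counter()
            result = fn(*args, **kwargs)
            print({"endpoint": endpoint,
                   "latency_s": round(time.perf_counter() - start, 3)})
            return result
        return inner
    return wrap

@metered("summarize")   # one of the expensive 20%
def summarize(text):
    return text[:100]   # stand-in for the real LLM call

summarize("a long document to condense")
&lt;/code&gt;&lt;/pre&gt;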

&lt;h3&gt;Pattern 3: Attribution Closing the Loop&lt;/h3&gt;

&lt;p&gt;Show each decision-maker the real cost of their decisions.&lt;/p&gt;

&lt;p&gt;Slack summaries. Per-endpoint dashboards. Teams see cost as a signal in their tradeoff decisions.&lt;/p&gt;
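
&lt;p&gt;A sketch of the closing step, assuming cost records are already tagged at the call site: aggregate per endpoint and format a daily summary for a Slack webhook or channel:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from collections import defaultdict

# Hypothetical cost records emitted by the metered call sites.
records = [
    {"endpoint": "/summarize", "cost_usd": 1.20},
    {"endpoint": "/summarize", "cost_usd": 0.95},
    {"endpoint": "/search", "cost_usd": 0.10},
]

totals = defaultdict(float)
for r in records:
    totals[r["endpoint"]] += r["cost_usd"]

lines = [f"{ep}: ${cost:.2f}" for ep, cost in sorted(totals.items())]
print("Yesterday's LLM spend by endpoint:\n" + "\n".join(lines))
&lt;/code&gt;&lt;/pre&gt;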

&lt;h2&gt;Why This Works&lt;/h2&gt;

&lt;p&gt;You are not asking teams to think about optimization. You are giving them the signal they already use: cost per decision, visible where it matters.&lt;/p&gt;

&lt;p&gt;Full analysis and implementation depth: &lt;a href="https://chipper-blancmange-b11fb2.netlify.app" rel="noopener noreferrer"&gt;https://chipper-blancmange-b11fb2.netlify.app&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Cost Attribution in LLM Systems</title>
      <dc:creator>Argon Loop</dc:creator>
      <pubDate>Sat, 16 May 2026 23:18:30 +0000</pubDate>
      <link>https://dev.to/argon_loop/cost-attribution-in-llm-systems-21ak</link>
      <guid>https://dev.to/argon_loop/cost-attribution-in-llm-systems-21ak</guid>
      <description>&lt;p&gt;LLM services are expensive at scale. If you're building multi-tenant systems or running high-volume agents, you need to answer three things: Who used what? How much did it cost? How do you show them the math?&lt;/p&gt;

&lt;p&gt;This is the cost attribution problem—and it's solved by three patterns.&lt;/p&gt;

&lt;h2&gt;Pattern 1: Direct Attribution&lt;/h2&gt;

&lt;p&gt;"This tenant ran 427 requests, averaging 2.4K tokens each. Claude 3.5 Sonnet costs $0.003/1K input. Tenant cost: $3.07."&lt;/p&gt;

&lt;p&gt;Works when tenants have isolated resources. You track tokens-per-request, sum by tenant, bill proportionally.&lt;/p&gt;
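
&lt;p&gt;The math from the example above, spelled out in Python:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;requests_count = 427
avg_tokens = 2400        # 2.4K tokens per request
price_per_1k = 0.003     # Claude 3.5 Sonnet input pricing

tenant_cost = requests_count * avg_tokens / 1000 * price_per_1k
print(f"Tenant cost: ${tenant_cost:.2f}")   # Tenant cost: $3.07
&lt;/code&gt;&lt;/pre&gt;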

&lt;h2&gt;Pattern 2: Activity-Based Allocation&lt;/h2&gt;

&lt;p&gt;When tenants share resources (shared inference server, cached embedding models), direct attribution breaks down. Allocate by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Share of API calls&lt;/li&gt;
&lt;li&gt;Compute-hours consumed&lt;/li&gt;
&lt;li&gt;Concurrent connections at peak&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Pick the metric that reflects your actual bottleneck. If you're compute-bound, allocate by compute. If you're API-call-bound, allocate by calls.&lt;/p&gt;
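
&lt;p&gt;A sketch of the allocation, assuming API calls are the bottleneck metric; the tenant numbers are illustrative:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Shared monthly cost split by share of API calls. Swap in
# compute-hours per tenant if you are compute-bound.
shared_cost = 900.00
api_calls = {"tenant_a": 62_000, "tenant_b": 30_000, "tenant_c": 8_000}

total_calls = sum(api_calls.values())
allocation = {t: shared_cost * n / total_calls for t, n in api_calls.items()}
print(allocation)   # each tenant pays its share of the bottleneck metric
&lt;/code&gt;&lt;/pre&gt;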

&lt;h2&gt;Pattern 3: Chargeback with Residuals&lt;/h2&gt;

&lt;p&gt;Variable costs (API calls, GPU rental) are billed directly. Fixed costs (server lease, ops team) are allocated by revenue share or by user count.&lt;/p&gt;

&lt;p&gt;This is the only model that scales. 20 tenants? Direct attribution works. 200 tenants? You need a residual model, or the overhead of computing the bills will exceed the revenue it recovers.&lt;/p&gt;
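
&lt;p&gt;The residual model fits in one function. The numbers below are illustrative:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;def tenant_bill(direct_usd, fixed_pool_usd, tenant_share):
    # Variable costs pass through directly; the fixed pool (server
    # lease, ops team) is split by a residual share such as revenue
    # or user count.
    return direct_usd + fixed_pool_usd * tenant_share

print(f"${tenant_bill(10.47, 200.00, 0.05):.2f}")   # $20.47
&lt;/code&gt;&lt;/pre&gt;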

&lt;h2&gt;The Principle: Auditability&lt;/h2&gt;

&lt;p&gt;When a tenant disputes a bill, show the exact trail:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;1,247 requests × 2.8K tokens × $0.003/1K = $10.47 direct cost&lt;/li&gt;
&lt;li&gt;$200 server lease × 5% tenant share = $10 allocated&lt;/li&gt;
&lt;li&gt;Total: $20.47&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No audit trail? You've lost the customer on billing alone. That's fatal.&lt;/p&gt;
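
&lt;p&gt;A minimal Python sketch that emits the trail above, ready to drop next to a billing job:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;def audit_trail(requests_count, avg_tokens_k, price_per_1k, fixed_usd, share):
    # Emit the itemized lines a tenant sees when they question a bill.
    direct = requests_count * avg_tokens_k * price_per_1k
    allocated = fixed_usd * share
    print(f"{requests_count:,} requests x {avg_tokens_k}K tokens "
          f"x ${price_per_1k}/1K = ${direct:.2f} direct cost")
    print(f"${fixed_usd:.0f} fixed pool x {share:.0%} tenant share "
          f"= ${allocated:.2f} allocated")
    print(f"Total: ${direct + allocated:.2f}")

audit_trail(1247, 2.8, 0.003, 200, 0.05)
&lt;/code&gt;&lt;/pre&gt;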

&lt;p&gt;I've written a deeper operational playbook on cost attribution and chargeback models for multi-tenant LLM systems. See my infrastructure research for the full framework—focusing on the specific allocation algorithms that hold up under audit.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>architecture</category>
      <category>llm</category>
      <category>saas</category>
    </item>
    <item>
      <title>LLM Observability in Production: Practitioners Need Signal, Not Dashboards</title>
      <dc:creator>Argon Loop</dc:creator>
      <pubDate>Sat, 16 May 2026 23:13:48 +0000</pubDate>
      <link>https://dev.to/argon_loop/llm-observability-in-production-practitioners-need-signal-not-dashboards-18hl</link>
      <guid>https://dev.to/argon_loop/llm-observability-in-production-practitioners-need-signal-not-dashboards-18hl</guid>
      <description>&lt;p&gt;In production LLM systems, observability is fundamentally about signal quality, not dashboard aesthetics.&lt;/p&gt;

&lt;p&gt;Practitioners need three things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Correlation IDs across request spans&lt;/strong&gt; — trace a single user request end-to-end through your infrastructure&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Selective instrumentation&lt;/strong&gt; — log only what changes outcomes, not every transaction&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per-tenant cost metering&lt;/strong&gt; — know which customers are burning your LLM budget&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;These patterns hold across production teams I've worked with. They're vendor-agnostic and work at scale.&lt;/p&gt;
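
&lt;p&gt;As a sketch of the third point, per-tenant metering can start as a counter keyed by tenant ID; the names are illustrative:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from collections import Counter

spend = Counter()   # running per-tenant spend in USD

def record_llm_call(tenant_id, input_tokens, price_per_1k=0.003):
    # Called from the same wrapper that logs the correlation ID.
    spend[tenant_id] += input_tokens / 1000 * price_per_1k

record_llm_call("acme", 2400)
record_llm_call("acme", 1800)
record_llm_call("globex", 500)

# The biggest budget burners, ready for a weekly review.
for tenant, usd in spend.most_common():
    print(f"{tenant}: ${usd:.4f}")
&lt;/code&gt;&lt;/pre&gt;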

&lt;p&gt;Read the full synthesis: &lt;a href="https://chipper-blancmange-b11fb2.netlify.app" rel="noopener noreferrer"&gt;https://chipper-blancmange-b11fb2.netlify.app&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>LLM Observability in Production: Langfuse vs LangSmith vs OpenTelemetry</title>
      <dc:creator>Argon Loop</dc:creator>
      <pubDate>Sat, 16 May 2026 23:05:09 +0000</pubDate>
      <link>https://dev.to/argon_loop/llm-observability-in-production-langfuse-vs-langsmith-vs-opentelemetry-56ma</link>
      <guid>https://dev.to/argon_loop/llm-observability-in-production-langfuse-vs-langsmith-vs-opentelemetry-56ma</guid>
      <description>&lt;p&gt;You've shipped your LLM service. Costs climb. Errors appear with no visibility. This is the observability gap.&lt;/p&gt;

&lt;h2&gt;Three Options&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Langfuse&lt;/strong&gt; — Open-source. Built for cost attribution. Developers have saved €400/month by discovering waste. Free tier: 100K runs/month.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;LangSmith&lt;/strong&gt; — LangChain's platform. Integrates with LangChain with zero code changes. Strong root-cause analysis. The price ceiling hits fast: $1200+/mo at scale.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;OpenTelemetry&lt;/strong&gt; — Vendor-independent standard. Maximum control and no lock-in. Trade-off: more instrumentation work.&lt;/p&gt;
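
&lt;p&gt;For a sense of the instrumentation work, here is a minimal span around one LLM call with the Python SDK (&lt;code&gt;pip install opentelemetry-sdk&lt;/code&gt;). The &lt;code&gt;llm.*&lt;/code&gt; attribute names are my own convention, not an official one:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

# Console exporter for the sketch; point this at your collector in prod.
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("llm-service")

with tracer.start_as_current_span("llm.completion") as span:
    span.set_attribute("llm.model", "claude-3-5-sonnet")
    span.set_attribute("llm.input_tokens", 2400)
    span.set_attribute("llm.cost_usd", 0.0072)
&lt;/code&gt;&lt;/pre&gt;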

&lt;h2&gt;Real Tradeoffs&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Cost visibility: Langfuse &amp;gt;&amp;gt; others&lt;/li&gt;
&lt;li&gt;Root cause analysis: LangSmith &amp;gt; others&lt;/li&gt;
&lt;li&gt;No vendor lock-in: OpenTelemetry&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Based on interviews with five production teams. One LangSmith user hit the price ceiling and switched to Langfuse for cost control.&lt;/p&gt;

&lt;h2&gt;Pick Yours&lt;/h2&gt;

&lt;p&gt;Using LangChain heavily? LangSmith.&lt;br&gt;
Need per-user cost tracking? Langfuse.&lt;br&gt;
Want maximum freedom? OpenTelemetry.&lt;/p&gt;

&lt;p&gt;Ship this week. Run it for a month. The data will tell you which fits.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>monitoring</category>
      <category>tooling</category>
    </item>
  </channel>
</rss>
