<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: skarL007</title>
    <description>The latest articles on DEV Community by skarL007 (@skarl007).</description>
    <link>https://dev.to/skarl007</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3862944%2Fdc6d5833-1054-4b4d-b606-bfac5e97b0c8.jpeg</url>
      <title>DEV Community: skarL007</title>
      <link>https://dev.to/skarl007</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/skarl007"/>
    <language>en</language>
    <item>
      <title>I built an open-source FinOps layer for GenAI APIs: real-time USD cost per OTel span, multi-tenant isolation, zero prompt logging</title>
      <dc:creator>skarL007</dc:creator>
      <pubDate>Mon, 06 Apr 2026 01:26:46 +0000</pubDate>
      <link>https://dev.to/skarl007/i-built-an-open-source-finops-layer-for-genai-apis-real-time-usd-cost-per-otel-span-multi-tenant-26m5</link>
      <guid>https://dev.to/skarl007/i-built-an-open-source-finops-layer-for-genai-apis-real-time-usd-cost-per-otel-span-multi-tenant-26m5</guid>
      <description>&lt;h1&gt;I opened my LLM invoice and had no idea where the money went. So I built LumenAI.&lt;/h1&gt;

&lt;p&gt;Last month I opened my cloud bill and saw $4,847.23 in LLM API costs. For a single week.&lt;/p&gt;

&lt;p&gt;I had no idea which tenant caused the spike. No idea which agent ran 11,000 times. No idea whether &lt;code&gt;claude-opus-4-6&lt;/code&gt; at $15/M tokens was firing in places where &lt;code&gt;claude-haiku-4-5&lt;/code&gt; at $0.25/M would have worked just as well.&lt;/p&gt;

&lt;p&gt;My logs said "LLM call completed". That was all.&lt;/p&gt;

&lt;p&gt;Standard observability tools were not built for the economics of generative AI. OTel gives you traces but no USD cost. Logging gives you messages but no token accounting. Multi-tenant SaaS means costs are pooled but contracts are per-customer.&lt;/p&gt;

&lt;p&gt;So I built LumenAI to fix this — and today I'm releasing it as my first open source project.&lt;/p&gt;




&lt;h2&gt;What it does&lt;/h2&gt;

&lt;p&gt;Drop it into your existing OpenTelemetry setup, call &lt;code&gt;LumenAI.init()&lt;/code&gt; once, and every AI span automatically gets:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tenant attribution — which customer triggered this call&lt;/li&gt;
&lt;li&gt;Real-time USD cost — down to 8 decimal places, per span&lt;/li&gt;
&lt;li&gt;Canonical event normalization — one schema across every provider&lt;/li&gt;
&lt;li&gt;Redis Streams delivery — queryable by tenant, model, and time&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No prompt logging. No PII. Pure metadata. One function call.&lt;/p&gt;
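
&lt;p&gt;To make that concrete, here is roughly what one enriched span's attributes could look like. The &lt;code&gt;LumenAI.tenant_id&lt;/code&gt; and &lt;code&gt;gen_ai.usage.*&lt;/code&gt; keys appear later in this post; the cost key name and the numbers are illustrative guesses, not the actual schema:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Hypothetical attribute set on one enriched span (names partly assumed)
{
    "LumenAI.tenant_id": "client-abc",
    "gen_ai.request.model": "claude-haiku-4-5",
    "gen_ai.usage.input_tokens": 1200,
    "gen_ai.usage.output_tokens": 350,
    "LumenAI.cost_usd": 0.00073750,  # illustrative; the real key may differ
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;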




&lt;h2&gt;How it works under the hood&lt;/h2&gt;

&lt;p&gt;LumenAI inserts three lightweight processors into your existing OTel &lt;code&gt;TracerProvider&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;TenantSpanProcessor&lt;/strong&gt; reads a Python &lt;code&gt;ContextVar&lt;/code&gt; set by your middleware and stamps every span with &lt;code&gt;LumenAI.tenant_id&lt;/code&gt;. Thread-safe, async-safe, zero allocations on the hot path.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CostComputingSpanProcessor&lt;/strong&gt; fires on every &lt;code&gt;span.end()&lt;/code&gt;. It reads &lt;code&gt;gen_ai.usage.input_tokens&lt;/code&gt;, &lt;code&gt;gen_ai.usage.output_tokens&lt;/code&gt;, and &lt;code&gt;gen_ai.usage.cache_read.input_tokens&lt;/code&gt; from the span, looks up the model in the pricing table, and stores the computed USD cost.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;EventNormalizerProcessor&lt;/strong&gt; assembles a canonical 19-field event dict and calls your configured exporter — default: Redis Streams, keyed as &lt;code&gt;LumenAI:events:{tenant_id}&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Your app code is never touched. Your prompts are never read. The entire pipeline adds roughly 50–200 microseconds per span.&lt;/p&gt;
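
&lt;p&gt;For intuition, here is a minimal sketch of what the first two processors could look like on top of the stock OTel SDK. The class names and &lt;code&gt;gen_ai.*&lt;/code&gt; attribute keys follow this post; the pricing values and the &lt;code&gt;publish_event&lt;/code&gt; stand-in are placeholders, not LumenAI's actual internals:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from contextvars import ContextVar

from opentelemetry.sdk.trace import ReadableSpan, Span, SpanProcessor, TracerProvider

_current_tenant: ContextVar[str] = ContextVar("lumen_tenant", default="anonymous")

# Illustrative USD prices per 1M input/output tokens (not the real table)
PRICING = {"claude-haiku-4-5": (0.25, 1.25)}

def publish_event(span: ReadableSpan, cost: float) -&amp;gt; None:
    print(span.name, cost)  # stand-in for the normalizer/exporter stage

class TenantSpanProcessor(SpanProcessor):
    def on_start(self, span: Span, parent_context=None) -&amp;gt; None:
        # Stamp the tenant while the span is still mutable
        span.set_attribute("LumenAI.tenant_id", _current_tenant.get())

class CostComputingSpanProcessor(SpanProcessor):
    def on_end(self, span: ReadableSpan) -&amp;gt; None:
        attrs = span.attributes or {}
        prices = PRICING.get(attrs.get("gen_ai.request.model"))
        if prices is None:
            return
        in_price, out_price = prices
        cost = (
            attrs.get("gen_ai.usage.input_tokens", 0) * in_price
            + attrs.get("gen_ai.usage.output_tokens", 0) * out_price
        ) / 1_000_000
        # Ended spans are read-only, so forward the cost instead of mutating
        publish_event(span, cost)

provider = TracerProvider()
provider.add_span_processor(TenantSpanProcessor())
provider.add_span_processor(CostComputingSpanProcessor())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;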




&lt;h2&gt;Quickstart in 5 lines&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;lumen_ai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LumenAI&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;lumen_ai.processors.tenant&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;lumen_tenant&lt;/span&gt;

&lt;span class="n"&gt;LumenAI&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;init&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;service_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;my-agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;redis_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;redis://localhost:6379/0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;lumen_tenant&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;client-abc&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;call_your_llm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# span automatically tagged and costed
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;FastAPI middleware pattern&lt;/h2&gt;

&lt;p&gt;The real power is in the middleware — set the tenant once and everything downstream inherits it automatically:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fastapi&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FastAPI&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Request&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;lumen_ai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LumenAI&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;lumen_ai.processors.tenant&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;set_tenant_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_current_tenant&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FastAPI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="nd"&gt;@app.middleware&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;tenant_middleware&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;call_next&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;tenant_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;X-Tenant-ID&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anonymous&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;token&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;set_tenant_id&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;call_next&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;finally&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;_current_tenant&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every OTel span created during the request — including library-level spans from OpenLIT — inherits the tenant. No changes to your route handlers needed.&lt;/p&gt;




&lt;h2&gt;Real use cases&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Per-tenant billing&lt;/strong&gt; — aggregate from Redis Streams at month end and charge each customer accurately:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_tenant_cost&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;days&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_url&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;redis_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;decode_responses&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;entries&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;xrange&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;LumenAI:events:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;min&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;cutoff&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;max&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cost_usd&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;entries&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Budget guardrail&lt;/strong&gt; — automatically block a tenant when they hit their plan limit:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;get_tenant_cost&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;plan_limit&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;HTTPException&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;429&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Monthly AI budget exhausted. Upgrade your plan.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Live cost dashboard&lt;/strong&gt; — stream events to the browser via SSE:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@app.get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/dashboard/live/{tenant_id}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;live_costs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="n"&gt;last_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;$&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;xread&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;LumenAI:events:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;last_id&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;entries&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;entry_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fields&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;entries&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                        &lt;span class="n"&gt;last_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;entry_id&lt;/span&gt;
                        &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;fields&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;data&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;StreamingResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;media_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text/event-stream&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;Why not just use Langfuse or OpenLIT?&lt;/h2&gt;

&lt;p&gt;Both are great tools — OpenLIT is even available as an optional bridge inside LumenAI. But neither was built with FinOps as the primary concern:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Langfuse&lt;/th&gt;
&lt;th&gt;OpenLIT&lt;/th&gt;
&lt;th&gt;LumenAI&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;USD cost per span&lt;/td&gt;
&lt;td&gt;Partial&lt;/td&gt;
&lt;td&gt;Provider-dependent&lt;/td&gt;
&lt;td&gt;Built-in pricing table (~30 models)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-tenant isolation&lt;/td&gt;
&lt;td&gt;Via project&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;ContextVar-native, async-safe&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Celery support&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Signal hooks, zero task changes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Prompt logging&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No — privacy by design&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Custom exporters&lt;/td&gt;
&lt;td&gt;OTLP only&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Pluggable exporter ABC (Postgres, ClickHouse, Kafka)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;LumenAI is designed to sit beside your observability stack, not replace it.&lt;/p&gt;




&lt;h2&gt;Install&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;lumen-ai-core
pip &lt;span class="nb"&gt;install &lt;/span&gt;lumen-ai-celery   &lt;span class="c"&gt;# optional: Celery instrumentation&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;lumen-ai-openlit  &lt;span class="c"&gt;# optional: 60+ LLM providers via OpenLIT&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;What I learned building this&lt;/h2&gt;

&lt;p&gt;This is my first open source project. A few honest things I picked up along the way:&lt;/p&gt;

&lt;p&gt;The cold start problem is the hardest part. Architecture was the easy bit — getting someone to actually install and use it is the real work.&lt;/p&gt;

&lt;p&gt;The decision to never log prompts was both technical and ethical. A lot of teams can't send production data to third-party tools. Privacy by design isn't a feature, it's a requirement.&lt;/p&gt;

&lt;p&gt;Redis Streams was the right default for the exporter. Append-only with XRANGE queryable by timestamp means you can recalculate historical costs with a new pricing table without touching any instrumentation. Someone in my Reddit thread pointed this out as a key differentiator — they'd tried Langfuse and Helicone and ran into exactly this problem when pricing changed mid-month.&lt;/p&gt;
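
&lt;p&gt;Here is a sketch of that replay, assuming hypothetical event fields (&lt;code&gt;model&lt;/code&gt;, &lt;code&gt;input_tokens&lt;/code&gt;, &lt;code&gt;output_tokens&lt;/code&gt;) since the full 19-field schema isn't reproduced in this post:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json

import redis

# Hypothetical revised pricing table: USD per 1M input/output tokens
NEW_PRICES = {"claude-haiku-4-5": (0.30, 1.50)}

def recompute_tenant_cost(r: redis.Redis, tenant_id: str) -&amp;gt; float:
    """Replay the append-only stream against a new pricing table."""
    total = 0.0
    for _, fields in r.xrange(f"LumenAI:events:{tenant_id}", min="-", max="+"):
        event = json.loads(fields["data"])
        prices = NEW_PRICES.get(event.get("model"))
        if prices is None:
            continue
        in_price, out_price = prices
        total += (
            event.get("input_tokens", 0) * in_price
            + event.get("output_tokens", 0) * out_price
        ) / 1_000_000
    return total
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;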




&lt;h2&gt;What's next&lt;/h2&gt;

&lt;p&gt;The v0.2 roadmap includes a &lt;code&gt;billing_unit&lt;/code&gt; field for workspace/user/org granularity, a Prometheus metrics exporter, and a &lt;code&gt;lumen-ai-postgres&lt;/code&gt; package with Alembic migrations.&lt;/p&gt;

&lt;p&gt;If LumenAI saves you money or debugging time, a star on GitHub means a lot when you're just getting started.&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/skarL007/-lumen-ai-sdk" rel="noopener noreferrer"&gt;https://github.com/skarL007/-lumen-ai-sdk&lt;/a&gt;&lt;br&gt;
Interactive demo: &lt;a href="https://skarl007.github.io/-lumen-ai-sdk/lumen-demo.html" rel="noopener noreferrer"&gt;https://skarl007.github.io/-lumen-ai-sdk/lumen-demo.html&lt;/a&gt;&lt;br&gt;
Discord: skar1v9&lt;/p&gt;

</description>
      <category>python</category>
      <category>opentelemetry</category>
      <category>opensource</category>
      <category>llm</category>
    </item>
  </channel>
</rss>
