I opened my LLM invoice and had no idea where the money went. So I built LumenAI.
Last month I opened my cloud bill and saw $4,847.23 in LLM API costs. For a single week.
I had no idea which tenant caused the spike. No idea which agent ran 11,000 times. No idea whether claude-opus-4-6 at $15/M tokens was firing in places where claude-haiku-4-5 at $0.25/M would have worked just as well.
My logs said "LLM call completed". That was all.
Standard observability tools were not built for the economics of generative AI. OTel gives you traces but no USD cost. Logging gives you messages but no token accounting. Multi-tenant SaaS means costs are pooled but contracts are per-customer.
So I built LumenAI to fix this — and today I'm releasing it as my first open source project.
## What it does
Drop it into your existing OpenTelemetry setup, call LumenAI.init() once, and every AI span automatically gets:
- Tenant attribution — which customer triggered this call
- Real-time USD cost — down to 8 decimal places, per span
- Canonical event normalization — one schema across every provider
- Redis Streams delivery — queryable by tenant, model, and time
No prompt logging. No PII. Pure metadata. One function call.
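As a hedged illustration, here is roughly what a normalized event could look like. The field names below are hypothetical, not LumenAI's actual 19-field schema, and the cost follows the haiku pricing quoted above:

```python
# Hypothetical shape of a normalized event -- illustrative field names only;
# the real LumenAI schema has 19 fields and may differ.
example_event = {
    "tenant_id": "client-abc",
    "model": "claude-haiku-4-5",
    "input_tokens": 1200,
    "output_tokens": 350,
    "cost_usd": 0.0007375,  # 1200/1M * $0.25 + 350/1M * $1.25
    "timestamp_ms": 1717000000000,
}

# Metadata only: no prompt text anywhere in the event
assert "prompt" not in example_event
```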
## How it works under the hood
LumenAI inserts three lightweight processors into your existing OTel TracerProvider:
TenantSpanProcessor reads a Python ContextVar set by your middleware and stamps every span with LumenAI.tenant_id. Thread-safe, async-safe, zero allocations on the hot path.
CostComputingSpanProcessor fires on every span.end(). It reads gen_ai.usage.input_tokens, gen_ai.usage.output_tokens, and gen_ai.usage.cache_read.input_tokens from the span, looks up the model in the pricing table, and stores the computed USD cost.
EventNormalizerProcessor assembles a canonical 19-field event dict and calls your configured exporter — default: Redis Streams, keyed as LumenAI:events:{tenant_id}.
Your app code is never touched. Your prompts are never read. The entire pipeline adds roughly 50–200 microseconds per span.
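To make the cost step concrete, here is a minimal sketch of the arithmetic a processor like this performs. The function name and pricing table are illustrative, not LumenAI's internals (the real processor also accounts for cache-read tokens):

```python
# Illustrative per-million-token pricing -- not LumenAI's actual table.
PRICING_PER_M = {
    "claude-haiku-4-5": {"input": 0.25, "output": 1.25},
}

def span_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Compute a span's USD cost from its token counts. Hypothetical sketch."""
    p = PRICING_PER_M.get(model)
    if p is None:
        return 0.0  # unknown model: no pricing data, report zero
    return round(
        input_tokens / 1_000_000 * p["input"]
        + output_tokens / 1_000_000 * p["output"],
        8,  # matches the 8-decimal precision mentioned above
    )
```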
## Quickstart in 5 lines

```python
from lumen_ai import LumenAI
from lumen_ai.processors.tenant import lumen_tenant

LumenAI.init(service_name="my-agent", redis_url="redis://localhost:6379/0")

with lumen_tenant("client-abc"):
    response = call_your_llm(prompt)  # span automatically tagged and costed
```
## FastAPI middleware pattern
The real power is in the middleware — set the tenant once and everything downstream inherits it automatically:
```python
from fastapi import FastAPI, Request
from lumen_ai import LumenAI
from lumen_ai.processors.tenant import set_tenant_id, _current_tenant

app = FastAPI()

@app.middleware("http")
async def tenant_middleware(request: Request, call_next):
    tenant_id = request.headers.get("X-Tenant-ID", "anonymous")
    token = set_tenant_id(tenant_id)
    try:
        return await call_next(request)
    finally:
        _current_tenant.reset(token)
```
Every OTel span created during the request — including library-level spans from OpenLIT — inherits the tenant. No changes to your route handlers needed.
## Real use cases
**Per-tenant billing** — aggregate from Redis Streams at month end and charge each customer accurately:

```python
import json
import time

import redis

def get_tenant_cost(tenant_id: str, days: int = 30) -> float:
    r = redis.Redis.from_url(redis_url, decode_responses=True)  # redis_url from your config
    # Stream entry IDs are millisecond timestamps, so we can range-query by time
    cutoff = str(int((time.time() - days * 86400) * 1000))
    entries = r.xrange(f"LumenAI:events:{tenant_id}", min=cutoff, max="+")
    return sum(
        json.loads(fields["data"]).get("cost_usd", 0)
        for _, fields in entries
    )
```
**Budget guardrail** — automatically block a tenant when they hit their plan limit (`get_tenant_cost` above is synchronous, so no `await`):

```python
if get_tenant_cost(tenant_id) >= plan_limit:
    raise HTTPException(429, "Monthly AI budget exhausted. Upgrade your plan.")
```
**Live cost dashboard** — stream events to the browser via SSE:

```python
from fastapi.responses import StreamingResponse

@app.get("/dashboard/live/{tenant_id}")
async def live_costs(tenant_id: str):
    async def generate():
        last_id = "$"  # only deliver events that arrive after connect
        while True:
            # r is a redis client with decode_responses=True; for production,
            # prefer redis.asyncio so this read doesn't block the event loop
            results = r.xread({f"LumenAI:events:{tenant_id}": last_id}, block=1000)
            if results:
                for _, entries in results:
                    for entry_id, fields in entries:
                        last_id = entry_id
                        yield f"data: {fields['data']}\n\n"

    return StreamingResponse(generate(), media_type="text/event-stream")
```
## Why not just use Langfuse or OpenLIT?
Both are great tools — OpenLIT is even available as an optional bridge inside LumenAI. But neither was built with FinOps as the primary concern:
| Feature | Langfuse | OpenLIT | LumenAI |
|---|---|---|---|
| USD cost per span | Partial | Provider-dependent | Built-in pricing table (~30 models) |
| Multi-tenant isolation | Via project | No | ContextVar-native, async-safe |
| Celery support | No | No | Signal hooks, zero task changes |
| Prompt logging | Yes | Yes | No — privacy by design |
| Custom exporters | OTLP only | No | Exporter ABC (Postgres, ClickHouse, Kafka) |
LumenAI is designed to sit beside your observability stack, not replace it.
## Install

```shell
pip install lumen-ai-core
pip install lumen-ai-celery   # optional: Celery instrumentation
pip install lumen-ai-openlit  # optional: 60+ LLM providers via OpenLIT
```
## What I learned building this
This is my first open source project. A few honest things I picked up along the way:
The cold start problem is the hardest part. Architecture was the easy bit — getting someone to actually install and use it is the real work.
The decision to never log prompts was both technical and ethical. A lot of teams can't send production data to third-party tools. Privacy by design isn't a feature, it's a requirement.
Redis Streams was the right default for the exporter. Append-only with XRANGE queryable by timestamp means you can recalculate historical costs with a new pricing table without touching any instrumentation. Someone in my Reddit thread pointed this out as a key differentiator — they'd tried Langfuse and Helicone and ran into exactly this problem when pricing changed mid-month.
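That replay idea is simple to sketch. The function below re-prices already-captured events against a new table; the event field names and pricing numbers are assumptions for illustration, not LumenAI's actual schema:

```python
# Sketch: re-price a tenant's captured events with an updated pricing table.
# Field names ("model", "input_tokens", "output_tokens") are hypothetical.
import json

NEW_PRICING_PER_M = {"claude-haiku-4-5": {"input": 0.30, "output": 1.50}}

def recompute_cost(raw_events: list[str]) -> float:
    """Total USD cost of JSON-encoded events under the new pricing table."""
    total = 0.0
    for raw in raw_events:
        event = json.loads(raw)
        p = NEW_PRICING_PER_M.get(event.get("model", ""))
        if p is None:
            continue  # no pricing for this model: skip it
        total += event.get("input_tokens", 0) / 1_000_000 * p["input"]
        total += event.get("output_tokens", 0) / 1_000_000 * p["output"]
    return total

# Feed it the "data" field of each XRANGE entry, e.g.:
#   raw_events = [fields["data"] for _, fields in r.xrange(stream_key)]
```

Because the stream is append-only, nothing in the instrumentation has to change when prices do; you just run the replay.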
## What's next
The v0.2 roadmap includes a billing_unit field for workspace/user/org granularity, a Prometheus metrics exporter, and a lumen-ai-postgres package with Alembic migrations.
If LumenAI saves you money or debugging time, a star on GitHub means a lot when you're just getting started.
GitHub: https://github.com/skarL007/-lumen-ai-sdk
Interactive demo: https://skarl007.github.io/-lumen-ai-sdk/lumen-demo.html
Discord: skar1v9