I opened my LLM invoice and had no idea where the money went. So I built LumenAI.
Last month I opened my cloud bill and saw $4,847.23 in LLM API costs. For a single week.
I had no idea which tenant caused the spike. No idea which agent ran 11,000 times. No idea whether claude-opus-4-6 at $15/M tokens was firing in places where claude-haiku-4-5 at $0.25/M would have worked just as well.
My logs said "LLM call completed". That was all.
Standard observability tools were not built for the economics of generative AI. OTel gives you traces but no USD cost. Logging gives you messages but no token accounting. Multi-tenant SaaS means costs are pooled but contracts are per-customer.
So I built LumenAI to fix this — and today I'm releasing it as my first open source project.
## What it does
Drop it into your existing OpenTelemetry setup, call LumenAI.init() once, and every AI span automatically gets:
- Tenant attribution — which customer triggered this call
- Real-time USD cost — down to 8 decimal places, per span
- Canonical event normalization — one schema across every provider
- Redis Streams delivery — queryable by tenant, model, and time
No prompt logging. No PII. Pure metadata. One function call.
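As a hedged illustration, here is roughly what a normalized event could look like. The field names below are hypothetical, not LumenAI's actual 19-field schema, and the cost follows the haiku pricing quoted above:

```python
# Hypothetical shape of a normalized event -- illustrative field names only;
# the real LumenAI schema has 19 fields and may differ.
example_event = {
    "tenant_id": "client-abc",
    "model": "claude-haiku-4-5",
    "input_tokens": 1200,
    "output_tokens": 350,
    "cost_usd": 0.0007375,  # 1200/1M * $0.25 + 350/1M * $1.25
    "timestamp_ms": 1717000000000,
}

# Metadata only: no prompt text anywhere in the event
assert "prompt" not in example_event
```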
## How it works under the hood
LumenAI inserts three lightweight processors into your existing OTel TracerProvider:
TenantSpanProcessor reads a Python ContextVar set by your middleware and stamps every span with LumenAI.tenant_id. Thread-safe, async-safe, zero allocations on the hot path.
CostComputingSpanProcessor fires on every span.end(). It reads gen_ai.usage.input_tokens, gen_ai.usage.output_tokens, and gen_ai.usage.cache_read.input_tokens from the span, looks up the model in the pricing table, and stores the computed USD cost.
EventNormalizerProcessor assembles a canonical 19-field event dict and calls your configured exporter — default: Redis Streams, keyed as LumenAI:events:{tenant_id}.
Your app code is never touched. Your prompts are never read. The entire pipeline adds roughly 50–200 microseconds per span.
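To make the cost step concrete, here is a minimal sketch of the arithmetic a processor like this performs. The function name and pricing table are illustrative, not LumenAI's internals (the real processor also accounts for cache-read tokens):

```python
# Illustrative per-million-token pricing -- not LumenAI's actual table.
PRICING_PER_M = {
    "claude-haiku-4-5": {"input": 0.25, "output": 1.25},
}

def span_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Compute a span's USD cost from its token counts. Hypothetical sketch."""
    p = PRICING_PER_M.get(model)
    if p is None:
        return 0.0  # unknown model: no pricing data, report zero
    return round(
        input_tokens / 1_000_000 * p["input"]
        + output_tokens / 1_000_000 * p["output"],
        8,  # matches the 8-decimal precision mentioned above
    )
```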
## Quickstart in 5 lines

```python
from lumen_ai import LumenAI
from lumen_ai.processors.tenant import lumen_tenant

LumenAI.init(service_name="my-agent", redis_url="redis://localhost:6379/0")

with lumen_tenant("client-abc"):
    response = call_your_llm(prompt)  # span automatically tagged and costed
```
## FastAPI middleware pattern
The real power is in the middleware — set the tenant once and everything downstream inherits it automatically:
```python
from fastapi import FastAPI, Request
from lumen_ai import LumenAI
from lumen_ai.processors.tenant import set_tenant_id, _current_tenant

app = FastAPI()

@app.middleware("http")
async def tenant_middleware(request: Request, call_next):
    tenant_id = request.headers.get("X-Tenant-ID", "anonymous")
    token = set_tenant_id(tenant_id)
    try:
        return await call_next(request)
    finally:
        _current_tenant.reset(token)
```
Every OTel span created during the request — including library-level spans from OpenLIT — inherits the tenant. No changes to your route handlers needed.
## Real use cases
**Per-tenant billing** — aggregate from Redis Streams at month end and charge each customer accurately:

```python
import json
import time

import redis

def get_tenant_cost(tenant_id: str, days: int = 30) -> float:
    r = redis.Redis.from_url(redis_url, decode_responses=True)  # redis_url from your config
    # Stream entry IDs are millisecond timestamps, so we can range-query by time
    cutoff = str(int((time.time() - days * 86400) * 1000))
    entries = r.xrange(f"LumenAI:events:{tenant_id}", min=cutoff, max="+")
    return sum(
        json.loads(fields["data"]).get("cost_usd", 0)
        for _, fields in entries
    )
```
**Budget guardrail** — automatically block a tenant when they hit their plan limit (`get_tenant_cost` above is synchronous, so no `await`):

```python
if get_tenant_cost(tenant_id) >= plan_limit:
    raise HTTPException(429, "Monthly AI budget exhausted. Upgrade your plan.")
```
**Live cost dashboard** — stream events to the browser via SSE:

```python
from fastapi.responses import StreamingResponse

@app.get("/dashboard/live/{tenant_id}")
async def live_costs(tenant_id: str):
    async def generate():
        last_id = "$"  # only deliver events that arrive after connect
        while True:
            # r is a redis client with decode_responses=True; for production,
            # prefer redis.asyncio so this read doesn't block the event loop
            results = r.xread({f"LumenAI:events:{tenant_id}": last_id}, block=1000)
            if results:
                for _, entries in results:
                    for entry_id, fields in entries:
                        last_id = entry_id
                        yield f"data: {fields['data']}\n\n"

    return StreamingResponse(generate(), media_type="text/event-stream")
```
## Why not just use Langfuse or OpenLIT?
Both are great tools — OpenLIT is even available as an optional bridge inside LumenAI. But neither was built with FinOps as the primary concern:
| Feature | Langfuse | OpenLIT | LumenAI |
|---|---|---|---|
| USD cost per span | Partial | Provider-dependent | Built-in pricing table (~30 models) |
| Multi-tenant isolation | Via project | No | ContextVar-native, async-safe |
| Celery support | No | No | Signal hooks, zero task changes |
| Prompt logging | Yes | Yes | No — privacy by design |
| Custom exporters | OTLP only | No | Exporter ABC (Postgres, ClickHouse, Kafka) |
LumenAI is designed to sit beside your observability stack, not replace it.
## Install

```shell
pip install lumen-ai-core
pip install lumen-ai-celery   # optional: Celery instrumentation
pip install lumen-ai-openlit  # optional: 60+ LLM providers via OpenLIT
```
## What I learned building this
This is my first open source project. A few honest things I picked up along the way:
The cold start problem is the hardest part. Architecture was the easy bit — getting someone to actually install and use it is the real work.
The decision to never log prompts was both technical and ethical. A lot of teams can't send production data to third-party tools. Privacy by design isn't a feature, it's a requirement.
Redis Streams was the right default for the exporter. Append-only with XRANGE queryable by timestamp means you can recalculate historical costs with a new pricing table without touching any instrumentation. Someone in my Reddit thread pointed this out as a key differentiator — they'd tried Langfuse and Helicone and ran into exactly this problem when pricing changed mid-month.
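That replay idea is simple to sketch. The function below re-prices already-captured events against a new table; the event field names and pricing numbers are assumptions for illustration, not LumenAI's actual schema:

```python
# Sketch: re-price a tenant's captured events with an updated pricing table.
# Field names ("model", "input_tokens", "output_tokens") are hypothetical.
import json

NEW_PRICING_PER_M = {"claude-haiku-4-5": {"input": 0.30, "output": 1.50}}

def recompute_cost(raw_events: list[str]) -> float:
    """Total USD cost of JSON-encoded events under the new pricing table."""
    total = 0.0
    for raw in raw_events:
        event = json.loads(raw)
        p = NEW_PRICING_PER_M.get(event.get("model", ""))
        if p is None:
            continue  # no pricing for this model: skip it
        total += event.get("input_tokens", 0) / 1_000_000 * p["input"]
        total += event.get("output_tokens", 0) / 1_000_000 * p["output"]
    return total

# Feed it the "data" field of each XRANGE entry, e.g.:
#   raw_events = [fields["data"] for _, fields in r.xrange(stream_key)]
```

Because the stream is append-only, nothing in the instrumentation has to change when prices do; you just run the replay.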
## What's next
The v0.2 roadmap includes a billing_unit field for workspace/user/org granularity, a Prometheus metrics exporter, and a lumen-ai-postgres package with Alembic migrations.
If LumenAI saves you money or debugging time, a star on GitHub means a lot when you're just getting started.
GitHub: https://github.com/skarL007/-lumen-ai-sdk
Interactive demo: https://skarl007.github.io/-lumen-ai-sdk/lumen-demo.html
Discord: skar1v9