Working with Large Language Models (LLMs) in production feels like magic. The honeymoon phase usually lasts about a month, right up until the inevitable API bill arrives.
If you’ve ever accidentally put an LLM generation call inside a deeply nested background loop (don't lie, we've all done it), or if you just want to prevent one heavy user from eating your organization's daily budget, then you probably know the pain.
Current LLM observability platforms are either heavy SaaS products with their own per-event pricing, or they completely lack hard budget enforcement. I wanted something free, OpenTelemetry-native, and focused on hard budget limits for cost-constrained applications in Go.
So, I built Otellix.
What is Otellix?
Otellix is a production-grade LLM observability SDK for Go built entirely on top of OpenTelemetry. It wraps the official SDKs for Anthropic, OpenAI, Gemini, and Ollama, allowing you to intercept and track costs without drastically altering your codebase.
1. Unified Tracing & Metrics
Every LLM generation call emits unified OTel spans and Prometheus metrics. Exactly how many input tokens, output tokens, and cached tokens did that user generate last week? Otellix answers that out of the box by standardizing the response payloads of four completely different vendors.
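To give a feel for what "standardizing the response payloads" means in practice, here is a minimal sketch of the idea: each vendor SDK reports usage under different field names, and a wrapper maps them all into one record. The struct and field names below are illustrative stand-ins, not Otellix's actual types:

```go
package main

import "fmt"

// TokenUsage is a hypothetical unified usage record.
type TokenUsage struct {
	Input, Output, Cached int
}

// Simplified stand-ins for the vendor SDKs' usage payloads.
type openAIUsage struct{ PromptTokens, CompletionTokens, CachedTokens int }
type anthropicUsage struct{ InputTokens, OutputTokens, CacheReadInputTokens int }

// fromOpenAI maps OpenAI-style usage fields onto the unified record.
func fromOpenAI(u openAIUsage) TokenUsage {
	return TokenUsage{Input: u.PromptTokens, Output: u.CompletionTokens, Cached: u.CachedTokens}
}

// fromAnthropic does the same for Anthropic-style usage fields.
func fromAnthropic(u anthropicUsage) TokenUsage {
	return TokenUsage{Input: u.InputTokens, Output: u.OutputTokens, Cached: u.CacheReadInputTokens}
}

func main() {
	a := fromOpenAI(openAIUsage{PromptTokens: 900, CompletionTokens: 150, CachedTokens: 400})
	b := fromAnthropic(anthropicUsage{InputTokens: 900, OutputTokens: 150, CacheReadInputTokens: 400})
	fmt.Println(a == b) // both vendors normalize to the same record
}
```

Once every provider collapses into one shape like this, attaching the counts to a span or a Prometheus counter is a single code path instead of four.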
2. Zero-Latency Cost Engine
Different LLM providers have very different pricing models depending on context-window caching. Otellix ships with an in-memory Cost Engine synced to 2026-era USD pricing models. The cost of a request is immediately available in the returned CallResult struct, meaning you deal with pennies, not abstract tokens.
```go
res, err := otellix.Trace(ctx, cfg, provider, params)
if err != nil {
	// handle the error
}
fmt.Printf("That request cost $%.6f\n", res.CostUSD)
```
3. Hard Budget Guardrails
This is the core feature. What do you do when a user hits their $5/day LLM quota?
Instead of building complex Redis limiters yourself, you can plug in Otellix's BudgetEnforcer.
It ships with three fallback mechanisms:

- `FallbackBlock`: immediately stops the LLM execution and returns an `otellix.ErrBudgetExceeded` error instead of hitting the vendor API.
- `FallbackNotify`: allows the request to go through, but triggers an asynchronous `onLimitReached` callback webhook (perfect for Slack alerts).
- `FallbackDowngrade`: (my personal favorite) if a user runs out of "premium" tokens for `claude-3-opus`, Otellix will automatically intercept the request and execute it via a cheaper provider/model like `gemini-1.5-flash` or a locally hosted `ollama` container instead.
Code Example
Here's how easy it is to wrap your existing OpenAI logic:
```go
store := otellix.NewInMemoryBudgetStore(map[string]float64{
	"user_123": 2.50, // Hard limit of $2.50
})
enforcer := otellix.NewBudgetEnforcer(store, otellix.FallbackBlock, nil)

cfg := otellix.NewConfig(
	otellix.WithBudgetEnforcer(enforcer),
)

// Your existing OpenAI client logic
p, _ := openai.NewClient("sk-...")

// Wrap the request in Otellix!
res, err := otellix.Trace(ctx, cfg, p, params)
if errors.Is(err, otellix.ErrBudgetExceeded) {
	// The user hit their hard limit; no vendor API call was made.
	log.Printf("budget exceeded for user_123")
} else if err != nil {
	log.Fatalf("LLM tracking failed: %v", err)
}
```
Getting Started
Otellix is fully open source (MIT) and available now! Oh, and the repository even includes a full docker-compose environment complete with Prometheus and Grafana dashboards for local testing.
Drop a star if you find it useful:
👉 GitHub: oluwajubelo1/otellix
I’d love to know how you're currently dealing with rogue LLM costs in your Go backends—let me know in the comments below!