
Oluwajubelo

Stop bleeding money on LLMs: Introducing Otellix for Go

Working with Large Language Models (LLMs) in production is magic. The honeymoon phase usually lasts about a month—right until you get the inevitable API bill.

If you’ve ever accidentally put an LLM generation call inside a deeply nested background loop (don't lie, we've all done it), or if you just want to prevent one heavy user from eating your organization's daily budget, then you probably know the pain.

Current LLM observability platforms are either heavy SaaS products with their own per-event pricing, or they completely lack hard budget enforcement. I wanted something free, OpenTelemetry-native, and focused on hard budget limits for cost-constrained applications in Go.

So, I built Otellix.

What is Otellix?

Otellix is a production-grade LLM observability SDK for Go built entirely on top of OpenTelemetry. It wraps the official SDKs for Anthropic, OpenAI, Gemini, and Ollama, allowing you to intercept and track costs without drastically altering your codebase.

1. Unified Tracing & Metrics

Every LLM generation call emits unified OTel spans and Prometheus metrics. Want to know exactly how many input tokens, output tokens, and cached tokens a given user generated last week? Otellix handles that out of the box, standardizing the response payloads of four completely different vendors.
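To make "standardizing the response payloads" concrete, here's a minimal sketch of what a vendor-neutral usage summary can look like. The struct and function names here are illustrative assumptions, not Otellix's actual types:

```go
package main

import "fmt"

// Usage is a hypothetical, vendor-neutral token summary; Otellix's real
// struct fields may differ.
type Usage struct {
	InputTokens  int
	OutputTokens int
	CachedTokens int
}

// normalizeAnthropic sketches folding one vendor's usage payload into the
// unified shape. Each provider adapter would have its own normalizer.
func normalizeAnthropic(in, out, cacheRead int) Usage {
	return Usage{InputTokens: in, OutputTokens: out, CachedTokens: cacheRead}
}

func main() {
	u := normalizeAnthropic(1200, 350, 800)
	fmt.Printf("in=%d out=%d cached=%d\n", u.InputTokens, u.OutputTokens, u.CachedTokens)
}
```

Once every vendor's response is reduced to the same shape, emitting a single set of OTel span attributes and Prometheus counters becomes trivial.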

2. Zero-Latency Cost Engine

Different LLM providers have very different pricing models depending on context-window caching. Otellix ships with an in-memory Cost Engine synced to 2026-era USD pricing models. The cost of a request is immediately available in the returned CallResult struct, meaning you deal in pennies, not abstract tokens.

res, err := otellix.Trace(ctx, cfg, provider, params)
fmt.Printf("That request cost $%.6f\n", res.CostUSD)

3. Hard Budget Guardrails

This is the core feature. What do you do when a user hits their $5/day LLM quota?
Instead of building complex Redis limiters yourself, you can plug in Otellix's BudgetEnforcer.

It ships with three fallback mechanisms:

  • FallbackBlock: Immediately stops the LLM execution and returns an otellix.ErrBudgetExceeded error instead of hitting the vendor API.
  • FallbackNotify: Allows the request to go through, but triggers an asynchronous onLimitReached callback webhook (perfect for Slack alerts).
  • FallbackDowngrade: (My personal favorite) If a user runs out of "premium" tokens for claude-3-opus, Otellix will automatically intercept the request and execute it via a cheaper provider/model like gemini-1.5-flash or a locally hosted ollama container instead.
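To make the three modes concrete, here's a minimal, self-contained sketch of the per-request decision an enforcer like this might run. This is not Otellix's actual implementation; the function signature and model names are illustrative:

```go
package main

import (
	"errors"
	"fmt"
)

type Fallback int

const (
	FallbackBlock Fallback = iota
	FallbackNotify
	FallbackDowngrade
)

var ErrBudgetExceeded = errors.New("budget exceeded")

// enforce sketches the per-request check: if spent + estimated exceeds the
// user's limit, apply the configured fallback mode.
func enforce(mode Fallback, spent, estimated, limit float64, model string, notify func(string)) (string, error) {
	if spent+estimated <= limit {
		return model, nil // under budget: run as requested
	}
	switch mode {
	case FallbackBlock:
		return "", ErrBudgetExceeded // refuse before hitting the vendor API
	case FallbackNotify:
		notify("user over budget") // fire async alert, let the call through
		return model, nil
	case FallbackDowngrade:
		return "gemini-1.5-flash", nil // reroute to a cheaper model
	}
	return model, nil
}

func main() {
	m, err := enforce(FallbackDowngrade, 4.90, 0.30, 5.00, "claude-3-opus", nil)
	fmt.Println(m, err)
}
```

The downgrade path is why the wrapper approach matters: because Otellix sits between your code and the vendor SDKs, it can swap the destination without the caller changing anything.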

Code Example

Here's how easy it is to wrap your existing OpenAI logic:

store := otellix.NewInMemoryBudgetStore(map[string]float64{
    "user_123": 2.50, // Hard limit of $2.50
})
enforcer := otellix.NewBudgetEnforcer(store, otellix.FallbackBlock, nil)

cfg := otellix.NewConfig(
    otellix.WithBudgetEnforcer(enforcer),
)

// Your existing OpenAI client logic
p, _ := openai.NewClient("sk-...") 

// Wrap the request in Otellix!
res, err := otellix.Trace(ctx, cfg, p, params)
if err != nil {
    log.Fatalf("LLM tracking failed: %v", err)
}

Getting Started

Otellix is fully open source (MIT) and available now! Oh, and the repository even includes a full docker-compose environment complete with Prometheus and Grafana dashboards for local testing.

Drop a star if you find it useful:
👉 GitHub: oluwajubelo1/otellix

I’d love to know how you're currently dealing with rogue LLM costs in your Go backends—let me know in the comments below!
