I've spent 13 years as an SAP integration consultant. In the past 18 months, almost every project I touch has an LLM layer somewhere: summarizing documents, routing requests, generating structured outputs from unstructured data. The models work. The infrastructure around them doesn't.
The problem isn't the models. It's that once you deploy an LLM-powered feature, you go blind. You don't know:
- Which model is actually being called in production
- How much each feature is costing per day
- Why a response was slow or wrong
- Whether the same prompt returns consistent results
I started looking for tools to solve this. What I found was mostly cloud-hosted SaaS: Helicone, LangSmith, Langfuse. They're good products. But they have a fundamental problem for my use case: your prompts and responses flow through their servers.
For internal enterprise tools like HR queries, financial summaries, and customer data, that's a non-starter. Legal won't sign off on it. And even for personal projects, I didn't want my prompts indexed somewhere I don't control.
So I built Torrix.
What Torrix Does
Torrix is a self-hosted LLM observability platform. It runs as a single Docker container, stores everything in SQLite, and requires no external services. Your data never leaves your machine (or your server).
The core idea is simple: it's an HTTP proxy. Instead of your app calling OpenAI directly, it calls Torrix, which forwards the request, measures latency, extracts the model name and token counts, calculates cost, and saves the full prompt and response.
# Before
base_url = "https://api.openai.com/v1"
# After
base_url = "http://localhost:8088/proxy"
# Add one header: x-target-url: https://api.openai.com
That's it. No SDK wrapping, no code changes beyond the base URL. Works with any library that supports a custom endpoint: OpenAI Python/JS, LangChain, anything OpenAI-compatible.
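If you're curious what that looks like at the raw HTTP level, without any SDK at all, here's a sketch. The /proxy/chat/completions path follows from the base URL above, and I'm assuming the Authorization header carries your Torrix key the same way it does in the client config further down.

import requests

# A plain HTTP POST through the proxy -- no OpenAI SDK involved.
resp = requests.post(
    "http://localhost:8088/proxy/chat/completions",
    headers={
        "Authorization": "Bearer trxk_your_torrix_api_key",
        "x-target-url": "https://api.openai.com",
    },
    json={
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": "Hello"}],
    },
)
print(resp.json()["choices"][0]["message"]["content"])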
The Dashboard
Once requests are flowing through the proxy, you get a live dashboard:
Runs list: every request with model name, token count, cost, latency, and a snippet of the prompt. You can filter by model, search by content, and click into any run for the full prompt and response.
Analytics: cost over time, token usage, p50/p95/p99 latency. The latency percentiles matter more than averages: a p99 of 8 seconds is a UX problem even if the p50 is 400ms.
Sessions: pass x-torrix-session: conversation-123 on your proxy requests and Torrix groups all turns in a conversation into a single thread view. Invaluable for debugging multi-turn flows.
Traces: pass x-torrix-trace: agent-run-456 and all LLM calls in a single agent run are shown as a timeline. You can see exactly which step took longest and what it was doing.
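Since the session and trace IDs are just headers, you can attach them per request. Here's a minimal sketch using the OpenAI Python SDK's extra_headers option; the header names are the ones above, and the IDs are whatever makes sense for your app.

from openai import OpenAI

client = OpenAI(
    api_key="trxk_your_torrix_api_key",
    base_url="http://localhost:8088/proxy",
    default_headers={"x-target-url": "https://api.openai.com"},
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's our refund policy?"}],
    extra_headers={
        "x-torrix-session": "conversation-123",  # groups multi-turn threads
        "x-torrix-trace": "agent-run-456",       # groups steps of one agent run
    },
)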
The Proxy Approach vs SDK Wrapping
Most observability tools use SDK wrapping: they monkey-patch openai.chat.completions.create() to intercept calls. This works but has downsides:
- Language-specific: separate wrappers for Python, JavaScript, Go, etc.
- Breaks if the SDK updates its internal structure
- Doesn't work for libraries that call the API directly
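To make that concrete, here's an illustrative sketch of the wrapping style (not any particular vendor's code), just enough to show where the fragility comes from:

import time
from openai import OpenAI

def wrap(client: OpenAI) -> OpenAI:
    original_create = client.chat.completions.create

    def logged_create(*args, **kwargs):
        start = time.perf_counter()
        response = original_create(*args, **kwargs)
        latency_ms = (time.perf_counter() - start) * 1000
        # Relies on the SDK's internal response shape; a breaking SDK update
        # or a library that calls the API over raw HTTP bypasses this entirely.
        print(kwargs.get("model"), response.usage.total_tokens, f"{latency_ms:.0f} ms")
        return response

    client.chat.completions.create = logged_create
    return client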
The proxy approach is language-agnostic. It works for any HTTP client. The tradeoff is that streaming responses require buffering to measure tokens, which adds a small overhead, but in practice it's under 5ms.
Torrix also ships a Python SDK for cases where you want direct instrumentation without changing the API endpoint (useful when you're wrapping a client library that doesn't support custom base URLs).
import torrix
from openai import OpenAI
torrix.init(api_key="trxk_...", base_url="http://localhost:8088")
client = torrix.wrap(OpenAI())
# All calls are now logged
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize this document..."}]
)
Things I Didn't Expect to Be Useful
LLM judge. I added auto-scoring almost as an afterthought: let an LLM evaluate each run on quality, correctness, token efficiency, and reasoning depth. I expected it to be a novelty. Instead it became one of the features I use most. Running a batch score after a prompt change immediately shows which version is better.
Dataset export. You can filter runs by score and export them as CSV. That CSV is a labelled fine-tuning dataset. Two months of runs filtered to high-quality responses is a surprisingly clean training set.
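If you want that CSV in OpenAI's fine-tuning format, the conversion is a few lines. A sketch, assuming the export has prompt and response columns (adjust the names to your actual export):

import csv
import json

# Convert an exported runs CSV into chat-format JSONL for fine-tuning.
with open("torrix_export.csv", newline="", encoding="utf-8") as src, \
     open("finetune.jsonl", "w", encoding="utf-8") as dst:
    for row in csv.DictReader(src):
        example = {
            "messages": [
                {"role": "user", "content": row["prompt"]},
                {"role": "assistant", "content": row["response"]},
            ]
        }
        dst.write(json.dumps(example) + "\n")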
Budget alerts. Setting a daily spend limit and getting a Slack alert (or an automatic 429 block) before the billing surprise has saved me twice.
Prometheus export. /metrics returns standard Prometheus format. Plugging this into an existing Grafana stack took about 10 minutes and now LLM cost and latency live next to all my other service metrics.
What I Compared It To
Helicone: excellent product, great UI, cloud-only on the free tier. Self-hosted option exists but requires more infrastructure than I wanted for a side project.
Langfuse: very powerful, great for complex tracing, but requires Postgres and more setup. Overkill for most projects.
LangSmith: deeply integrated with LangChain, less useful if you're not using LangChain.
OpenLIT: open source, OTel-native, but requires more configuration work.
None of them are wrong choices, just different tradeoffs. Torrix targets the "I want this running in 5 minutes on a single machine with no external dependencies" use case.
Getting Started
Docker (simplest):
docker run -d \
  -p 8088:8088 \
  -v torrix-data:/data \
  torrixai/torrix:latest
Open http://localhost:8088, create an account, and grab an API key.
Docker Compose (recommended for persistence):
services:
  torrix:
    image: torrixai/torrix:latest
    ports:
      - "8088:8088"
    volumes:
      - torrix-data:/data
    restart: unless-stopped

volumes:
  torrix-data:
Then in your app:
from openai import OpenAI

client = OpenAI(
    api_key="trxk_your_torrix_api_key",
    base_url="http://localhost:8088/proxy",
    default_headers={
        "x-target-url": "https://api.openai.com",
        "x-torrix-session": "user-123",  # optional: group by conversation
    }
)
Works with Anthropic, Gemini, Ollama, OpenRouter, LiteLLM, and anything that accepts a custom base URL.
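For example, pointing the same client at a local Ollama instance is just a different x-target-url. A sketch, assuming the proxy applies the same path mapping it uses for the OpenAI target; the port and model name here are simply Ollama defaults:

from openai import OpenAI

client = OpenAI(
    api_key="trxk_your_torrix_api_key",
    base_url="http://localhost:8088/proxy",
    default_headers={
        "x-target-url": "http://localhost:11434",  # local Ollama instead of OpenAI
        "x-torrix-session": "local-eval-run",
    },
)

response = client.chat.completions.create(
    model="llama3.1",
    messages=[{"role": "user", "content": "Summarize this document..."}],
)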
What's Next
I'm building Torrix as a product, not just an internal tool. The community edition is free with no time limit. A Pro tier ($19/month) adds unlimited run history, extended retention, and team support.
On the roadmap: MCP tool call tracing (so you can see exactly what tools your Claude/GPT-4 agent called and with what arguments), cost forecasting, and integrations for LangGraph, CrewAI, and the Vercel AI SDK.
If you're building anything LLM-powered and want visibility into what's actually happening, give it a try.
GitHub / install docs: https://github.com/torrix-ai/install
Happy to answer questions in the comments about the implementation or specific use cases.
Adarsh, SAP integration consultant turned LLM tooling builder. Building Torrix in public.
