DEV Community

Adarsh Rao

I loaded 30 days of real LLM traces into a live demo. Here is what they reveal

If you have been building with LLMs, you have probably had one of these moments:

  1. A surprise bill at the end of the month
  2. A model silently returning garbage without an error
  3. No idea which of your services is driving the cost spike

I built Torrix to fix that: a self-hosted LLM observability platform that logs every call, calculates costs token by token, and flags anomalies automatically.

The problem with self-hosted tools: you can't easily try before you install. You need Docker, a server, credentials, 10 minutes of setup. Most people bounce.

So I built a live demo. No signup. No Docker. No installation. Just click and explore.

Here is what's in it.

The setup

The demo loads 30 days of LLM traces across 3 simulated projects:

Production API: GPT-4o and Claude Sonnet handling user requests
Data Pipeline: batch summarisation, GPT-4o-mini doing the heavy lifting
Customer Support Bot: mixed model routing, Haiku for simple queries, Sonnet for complex ones

640 runs. 5 models. Real cost and token data. All read-only.

What you'll find

🔵 The cost spike. On days 14 and 15, call volume tripled overnight: 55 requests per day vs the normal 18. Every anomalous run is flagged with a SPIKE badge automatically. One click shows the exact prompt, model, and token count behind each outlier.
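How Torrix decides what counts as a spike isn't covered in this post. As a rough illustration only, here is a minimal sketch that flags days whose call volume exceeds a multiple of the median daily volume. The function name `flag_spike_days` and the `factor=2.5` threshold are assumptions made for this example, not Torrix's actual rule.

```python
from statistics import median

def flag_spike_days(daily_counts, factor=2.5):
    """Return the 1-based day numbers whose volume exceeds factor x the median.

    Hypothetical rule for illustration; not Torrix's actual detection logic.
    """
    baseline = median(daily_counts)
    return [
        day
        for day, count in enumerate(daily_counts, start=1)
        if count > factor * baseline
    ]

# 30 days at the normal 18 requests per day, with days 14 and 15 at 55
daily_counts = [18] * 30
daily_counts[13] = 55  # day 14
daily_counts[14] = 55  # day 15

print(flag_spike_days(daily_counts))
```

Using the median as the baseline keeps the two spike days from inflating the "normal" level they are compared against.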

🔵 The expensive model hiding in plain sight. claude-3-5-sonnet handles 35% of traffic at $3.00/$15.00 per million input/output tokens and drives the majority of spend. gpt-4o-mini handles 20% of traffic at $0.15/$0.60, which is 20× cheaper on input and 25× cheaper on output. The breakdown is instant in the Analytics tab. No exporting, no SQL needed.
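To make the price gap concrete, here is a small sketch of per-call cost arithmetic using the per-million-token prices quoted above. The `call_cost` helper and the 2,000-in / 500-out token counts are hypothetical, chosen only to show the calculation.

```python
# USD per million tokens as (input, output), taken from the figures above
PRICES = {
    "claude-3-5-sonnet": (3.00, 15.00),
    "gpt-4o-mini": (0.15, 0.60),
}

def call_cost(model, input_tokens, output_tokens):
    """Cost in USD of one call, priced token by token."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000

# Hypothetical call: 2,000 prompt tokens and 500 completion tokens
for model in PRICES:
    print(f"{model}: ${call_cost(model, 2_000, 500):.4f}")
```

For this particular token mix the Sonnet call costs about $0.0135 and the gpt-4o-mini call about $0.0006, a ratio of roughly 22×, which sits between the 20× input and 25× output price gaps.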

🔵 A 5-step agent trace, every step logged. The demo includes a full pipeline: Orchestrator → Researcher → Synthesizer → Formatter → Validator. Every step timed, every prompt logged, the full reasoning chain in one view.
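For a feel of what "every step timed, every prompt logged" involves, here is a minimal sketch of recording a multi-step trace with a context manager. It does not use Torrix's SDK or API. The `step` helper, the record fields, and the placeholder prompts are all assumptions for illustration.

```python
import time
from contextlib import contextmanager

trace = []  # one record per pipeline step, in execution order

@contextmanager
def step(name, prompt):
    """Time one pipeline step and append its record to the trace."""
    start = time.perf_counter()
    record = {"step": name, "prompt": prompt}
    try:
        yield record
    finally:
        record["duration_ms"] = (time.perf_counter() - start) * 1000
        trace.append(record)

PIPELINE = ["Orchestrator", "Researcher", "Synthesizer", "Formatter", "Validator"]

for name in PIPELINE:
    with step(name, prompt=f"(hypothetical prompt for {name})") as record:
        # A real agent would call a model here; the sketch stores a placeholder.
        record["output"] = f"(hypothetical output of {name})"

for record in trace:
    print(f'{record["step"]}: {record["duration_ms"]:.3f} ms')
```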

🔵 Eval results on 3 test datasets.

Capital Cities Quiz: 70% pass rate
Customer FAQ: 87.5% pass rate
Email Classification: 75% pass rate

Exact failing rows, expected vs actual, side by side.
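A pass rate like the ones above is simply passing rows divided by total rows. The sketch below assumes a row passes when the actual output matches the expected output after trimming whitespace and ignoring case. That matching rule, the `pass_rate` helper, and the sample rows are hypothetical, not taken from the demo datasets.

```python
def pass_rate(rows):
    """rows: list of (expected, actual) pairs.

    Returns (pass rate in percent, list of failing rows).
    """
    failures = [
        (expected, actual)
        for expected, actual in rows
        if expected.strip().lower() != actual.strip().lower()
    ]
    rate = 100 * (len(rows) - len(failures)) / len(rows)
    return rate, failures

# Hypothetical eval rows: (expected, actual)
rows = [
    ("Paris", "Paris"),
    ("Canberra", "Sydney"),
    ("Ottawa", "ottawa"),
    ("Tokyo", "Tokyo "),
]

rate, failures = pass_rate(rows)
print(f"{rate}% pass rate")
for expected, actual in failures:
    print(f"expected={expected!r} actual={actual!r}")
```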

🔵 Live SQL against the trace data. Run any SELECT against the underlying SQLite database:

SELECT model, COUNT(*) AS runs, SUM(cost_usd) AS total_cost
FROM runs GROUP BY model ORDER BY total_cost DESC

Export to CSV. Schema browser built in.
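If you want to try the same query and a CSV export outside the demo, here is a sketch using Python's built-in `sqlite3` and `csv` modules. The two-column `runs` table and its rows are hypothetical and include only the columns named in the query. The real Torrix schema may well differ.

```python
import csv
import io
import sqlite3

# Hypothetical minimal schema: only the columns the query refers to
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE runs (model TEXT, cost_usd REAL)")
conn.executemany(
    "INSERT INTO runs (model, cost_usd) VALUES (?, ?)",
    [
        ("claude-3-5-sonnet", 0.0135),
        ("claude-3-5-sonnet", 0.0135),
        ("gpt-4o-mini", 0.0006),
    ],
)

query = """
SELECT model, COUNT(*) AS runs, SUM(cost_usd) AS total_cost
FROM runs GROUP BY model ORDER BY total_cost DESC
"""
cursor = conn.execute(query)
header = [column[0] for column in cursor.description]
rows = cursor.fetchall()

# Write the result set to CSV (in memory here; a file object works the same way)
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(header)
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```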

How demo mode works (for the curious)

Demo mode is switched on with the TORRIX_DEMO=true env flag. On startup, a seeded SQLite database is copied into place. All write endpoints return 403, so nothing can be changed. The demo is deployed on Fly.io and resets on every deploy, so it never drifts from the seed data.
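The pattern described above can be sketched in a few lines. This is not Torrix's source code. The helper names, the set of HTTP methods treated as writes, and the file paths are assumptions. Only the `TORRIX_DEMO` flag and the 403 response come from the description.

```python
import os
import shutil
import tempfile

# Assumption: these HTTP methods are the ones treated as writes
WRITE_METHODS = {"POST", "PUT", "PATCH", "DELETE"}

def demo_enabled(env=os.environ):
    """True when the TORRIX_DEMO flag is set to 'true'."""
    return env.get("TORRIX_DEMO", "").lower() == "true"

def restore_seed(seed_path, live_path):
    """Overwrite the live database with the seeded copy (run on startup)."""
    shutil.copyfile(seed_path, live_path)

def status_for(method, demo):
    """Return 403 for write requests in demo mode, otherwise 200."""
    if demo and method.upper() in WRITE_METHODS:
        return 403
    return 200

# Usage with throwaway files standing in for the seed and live databases
with tempfile.TemporaryDirectory() as tmp:
    seed = os.path.join(tmp, "seed.db")
    live = os.path.join(tmp, "live.db")
    with open(seed, "wb") as f:
        f.write(b"seed data")
    with open(live, "wb") as f:
        f.write(b"drifted data")
    restore_seed(seed, live)
    with open(live, "rb") as f:
        restored = f.read()

print(restored, status_for("POST", demo=True), status_for("GET", demo=True))
```

Copying the seed over the live file on every start is what keeps the demo from drifting: whatever state the previous instance ended in is discarded.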

Try it

→ Live Demo

Self-host your own instance with a single Docker container and zero external dependencies:

docker run -d -p 8088:8088 -v torrix_data:/data torrixai/torrix:latest

→ Website
→ GitHub
