Modern LLMs generate impressive results, but the most interesting part isn’t the final answer — it’s everything that happens before the answer appears.
Token probabilities, confidence shifts, reasoning traces, tool calls, divergences between models… all of this is usually hidden.
I wanted a way to see these internals clearly, locally, and without relying on cloud APIs.
That’s how LLMxRay started.
## 🧠 What LLMxRay does
LLMxRay is a local-first observability tool for LLMs.
It works with Ollama, LM Studio, llama.cpp, and any endpoint that streams tokens.
It gives you a real-time view of:
- Token-by-token generation with confidence heatmaps
- Reasoning traces (when the model exposes them)
- Side-by-side model comparison
- Tool/function call execution
- Latency and cost breakdowns
- Agent behavior introspection
- A built-in Tools Workshop to design and test function-calling flows
Everything runs locally.
No cloud, no telemetry, no accounts.
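To make the confidence-heatmap idea concrete, here is a minimal sketch of how per-token confidence can be bucketed for coloring, assuming the endpoint exposes per-token logprobs (as llama.cpp's OpenAI-compatible server can). The function name and thresholds are illustrative, not LLMxRay's actual API:

```python
import math

def confidence_buckets(token_logprobs, thresholds=(0.9, 0.5)):
    """Map each streamed token's logprob to a coarse confidence bucket.

    token_logprobs: list of (token, logprob) pairs, as an endpoint
    that exposes logprobs might return them.
    Returns (token, probability, bucket) triples, where the bucket
    ('high', 'mid', 'low') would drive a heatmap color.
    """
    hi, mid = thresholds
    out = []
    for token, lp in token_logprobs:
        p = math.exp(lp)  # logprob -> probability
        bucket = "high" if p >= hi else "mid" if p >= mid else "low"
        out.append((token, round(p, 3), bucket))
    return out

# Example: three streamed tokens with decreasing confidence
stream = [("The", -0.01), ("capital", -0.4), ("Lyon", -2.3)]
for tok, p, bucket in confidence_buckets(stream):
    print(f"{tok:10s} p={p:<6} {bucket}")
```

A low-confidence token is often exactly where a quantized model diverges from its higher-precision sibling, which is why the heatmap view pairs well with side-by-side comparison.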
## 🔍 Why I built it
Working with local models, I often needed to answer questions like:
- Why did this model choose this token?
- Where did the reasoning diverge?
- Why does Q4_K_M behave differently from Q6_K?
- What exactly happened during a tool call?
- How do two models respond to the same prompt internally?
Existing UIs focus on the chat experience, not introspection. Debugging meant custom scripts, log-diving, or guesswork.
LLMxRay aims to make these internals transparent.
## 🛠️ How it works
LLMxRay sits between you and your local model:
- It captures the token stream
- It records probabilities and reasoning (if available)
- It visualizes everything in a clean, interactive UI
- It stores traces so you can compare runs
- It supports multiple models and endpoints
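The capture step above can be sketched as a small SSE parser. This is a minimal illustration assuming an OpenAI-compatible streaming format (which Ollama, LM Studio, and llama.cpp all offer); the function name is mine, not LLMxRay's:

```python
import json

def extract_tokens(sse_lines):
    """Pull token text out of an OpenAI-style streaming response.

    sse_lines: iterable of raw SSE lines ("data: {...}") as an
    OpenAI-compatible local server emits them. Yields each token so
    a proxy can record it before forwarding it to the client.
    """
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip comments and keep-alive blanks
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # end-of-stream sentinel
            return
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        token = delta.get("content")
        if token is not None:
            yield token

# Example: a tiny captured stream
raw = [
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    "data: [DONE]",
]
print("".join(extract_tokens(raw)))  # -> Hello
```

Because the proxy sees every chunk anyway, recording timestamps alongside each yielded token is what makes the latency breakdowns essentially free.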
To try it, clone the repo and run it locally.
## 📦 Links
- GitHub: https://github.com/lognebudo/llmxray
- Demo / docs: https://lognebudo.github.io/llmxray/
## 💬 What’s next
I’m working on:
- better comparison tools
- an education pillar with kits for teachers and students
- improved reasoning visualization
- support for more local runtimes
If you work with local models, I’d love to hear how you debug or introspect them — and what features would help you the most.