<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Ivan Stankovic</title>
    <description>The latest articles on DEV Community by Ivan Stankovic (@lognebudo).</description>
    <link>https://dev.to/lognebudo</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3829335%2F909c5a1e-dcd9-4dbd-b691-b976cb0c1bbd.jpg</url>
      <title>DEV Community: Ivan Stankovic</title>
      <link>https://dev.to/lognebudo</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/lognebudo"/>
    <language>en</language>
    <item>
      <title>Stop Guessing, Start Seeing: Multi-Model Observability with LLMxRay 🕵️‍♂️</title>
      <dc:creator>Ivan Stankovic</dc:creator>
      <pubDate>Fri, 03 Apr 2026 20:55:02 +0000</pubDate>
      <link>https://dev.to/lognebudo/stop-guessing-start-seeing-multi-model-observability-with-llmxray-1djh</link>
      <guid>https://dev.to/lognebudo/stop-guessing-start-seeing-multi-model-observability-with-llmxray-1djh</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmu8knu7mylfhykwoe7ty.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmu8knu7mylfhykwoe7ty.png" alt=" " width="800" height="442"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Have you ever wondered why the same prompt costs more in one language than another? Or why a model feels "smarter" in English but struggles with Arabic or Chinese?&lt;/p&gt;

&lt;p&gt;When working with LLMs, we often treat the response as a black box. We see the output, but we don't see the mechanics—the tokenization, the side-by-side comparison of different model families, or how different writing systems affect performance.&lt;/p&gt;

&lt;p&gt;I built LLMxRay to pull back the curtain.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What is LLMxRay?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;LLMxRay is an open-source observability tool designed to help developers inspect how different LLMs handle the exact same prompt in real time. Whether you are using local models via Ollama/LM Studio or cloud-based APIs, LLMxRay gives you a "side-by-side" X-ray view of your prompt's journey.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why use it?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Multi-Model Comparison: Run one prompt against multiple models simultaneously. See how Llama 3 compares to Mistral or GPT-4o in one view.&lt;/p&gt;

&lt;p&gt;Multilingual Deep-Dive: This was a big focus for me. The tool supports 4 languages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;English 🇺🇸&lt;/li&gt;
&lt;li&gt;French 🇫🇷&lt;/li&gt;
&lt;li&gt;Arabic 🇸🇦 (RTL support)&lt;/li&gt;
&lt;li&gt;Chinese 🇨🇳&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Tokenization Transparency: See exactly how your text is being chopped up into tokens. This is crucial for debugging cost, context window limits, and model "reasoning" quality across different writing systems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why 4 Languages?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Tokenization isn't uniform across languages. A single concept might be 1 token in English but 3 tokens in another language. By supporting Latin, RTL (Arabic), and character-based (Chinese) scripts, LLMxRay lets you see the economic and technical differences of running multilingual apps.&lt;/p&gt;
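&lt;p&gt;As a rough feel for why costs differ, here is a small sketch. Real token counts depend on the specific model's tokenizer, but byte-level BPE tokenizers start from UTF-8 bytes, so scripts that need more bytes per character often end up costing more tokens for the same idea. This uses only the byte length as a crude proxy, not any actual tokenizer:&lt;/p&gt;

```python
# Crude illustration of why the same concept can cost different amounts
# across scripts. Byte length is only a proxy: actual token counts come
# from the model's own tokenizer and its learned vocabulary.

samples = {
    "English": "internationalization",
    "French": "internationalisation",
    "Arabic": "تدويل",
    "Chinese": "国际化",
}

def utf8_bytes(text: str) -> int:
    """Number of UTF-8 bytes a byte-level tokenizer stage would see."""
    return len(text.encode("utf-8"))

for lang, word in samples.items():
    print(f"{lang:8} {len(word):2} chars  {utf8_bytes(word):2} bytes")
```

&lt;p&gt;Chinese packs the concept into 3 characters but 9 bytes; English uses 20 ASCII characters and 20 bytes. How a given tokenizer merges those bytes is exactly the kind of thing a side-by-side view makes visible.&lt;/p&gt;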

&lt;p&gt;&lt;strong&gt;Try it out&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The project is early-stage and open for feedback! You can connect it to your local environment or use your API keys to start comparing models immediately.&lt;/p&gt;

&lt;p&gt;👉 Check out the repo here: &lt;br&gt;
&lt;a href="https://github.com/LogneBudo/llmxray" rel="noopener noreferrer"&gt;https://github.com/LogneBudo/llmxray&lt;/a&gt;&lt;br&gt;
or website and docs here:&lt;br&gt;
&lt;a href="https://lognebudo.github.io/llmxray/" rel="noopener noreferrer"&gt;https://lognebudo.github.io/llmxray/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I’d love to hear from the DEV community:&lt;/p&gt;

&lt;p&gt;Which model families do you want to see compared next?&lt;/p&gt;

&lt;p&gt;Are there specific visualizations that would help your LLM workflow?&lt;/p&gt;

&lt;p&gt;Drop a comment below or open an issue on GitHub! 🚀&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>webdev</category>
      <category>productivity</category>
    </item>
    <item>
      <title>LLMxRay: A Local Observatory for Understanding How LLMs Think</title>
      <dc:creator>Ivan Stankovic</dc:creator>
      <pubDate>Tue, 17 Mar 2026 11:47:10 +0000</pubDate>
      <link>https://dev.to/lognebudo/llmxray-a-local-observatory-for-understanding-how-llms-think-4pj8</link>
      <guid>https://dev.to/lognebudo/llmxray-a-local-observatory-for-understanding-how-llms-think-4pj8</guid>
      <description>&lt;p&gt;Modern LLMs generate impressive results, but the most interesting part isn’t the final answer — it’s everything that happens before the answer appears.&lt;br&gt;
Token probabilities, confidence shifts, reasoning traces, tool calls, divergences between models… all of this is usually hidden.&lt;br&gt;
I wanted a way to see these internals clearly, locally, and without relying on cloud APIs.&lt;br&gt;
That’s how LLMxRay started.&lt;/p&gt;

&lt;p&gt;🧠 What LLMxRay does&lt;br&gt;
LLMxRay is a local-first observability tool for LLMs.&lt;br&gt;
It works with Ollama, LM Studio, llama.cpp, and any endpoint that streams tokens.&lt;br&gt;
It gives you a real-time view of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Token-by-token generation with confidence heatmaps&lt;/li&gt;
&lt;li&gt;Reasoning traces (when the model exposes them)&lt;/li&gt;
&lt;li&gt;Side-by-side model comparison&lt;/li&gt;
&lt;li&gt;Tool/function call execution&lt;/li&gt;
&lt;li&gt;Latency and cost breakdowns&lt;/li&gt;
&lt;li&gt;Agent behavior introspection&lt;/li&gt;
&lt;li&gt;A built-in Tools Workshop to design and test function-calling flows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Everything runs locally.&lt;br&gt;
No cloud, no telemetry, no accounts.&lt;/p&gt;

&lt;p&gt;🔍 Why I built it&lt;/p&gt;

&lt;p&gt;Working with local models, I often needed to answer questions like:&lt;br&gt;
• Why did this model choose this token?&lt;br&gt;
• Where did the reasoning diverge?&lt;br&gt;
• Why does Q4_K_M behave differently from Q6_K?&lt;br&gt;
• What exactly happened during a tool call?&lt;br&gt;
• How do two models respond to the same prompt internally?&lt;/p&gt;

&lt;p&gt;Existing UIs focus on the chat experience, not introspection.&lt;br&gt;
Debugging required custom scripts, logs, or guesswork.&lt;br&gt;
LLMxRay tries to make this transparent.&lt;/p&gt;

&lt;p&gt;🛠️ How it works&lt;/p&gt;

&lt;p&gt;LLMxRay sits between you and your local model:&lt;/p&gt;

&lt;p&gt;• It captures the token stream&lt;br&gt;
• It records probabilities and reasoning (if available)&lt;br&gt;
• It visualizes everything in a clean, interactive UI&lt;br&gt;
• It stores traces so you can compare runs&lt;br&gt;
• It supports multiple models and endpoints&lt;/p&gt;

&lt;p&gt;To get started, clone the repo and run it locally.&lt;/p&gt;
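&lt;p&gt;The "captures the token stream" step can be sketched in a few lines. Local runtimes like Ollama stream newline-delimited JSON chunks from their generate endpoint; a real run reads these from an HTTP response against a running server, so the chunks below are hardcoded sample data for illustration:&lt;/p&gt;

```python
import json

# Minimal sketch of capturing a streamed token response. Runtimes like
# Ollama emit newline-delimited JSON, one chunk per generated piece;
# here the stream is hardcoded sample data instead of a live HTTP body.

sample_stream = b"""\
{"model":"llama3","response":"Hel","done":false}
{"model":"llama3","response":"lo","done":false}
{"model":"llama3","response":"!","done":true}
"""

def collect_tokens(raw: bytes):
    """Parse an NDJSON token stream into (tokens, assembled_text)."""
    tokens = []
    for line in raw.splitlines():
        if not line.strip():
            continue
        chunk = json.loads(line)
        tokens.append(chunk["response"])
        if chunk.get("done"):
            break
    return tokens, "".join(tokens)

tokens, text = collect_tokens(sample_stream)
print(tokens)   # each streamed piece
print(text)     # the assembled response
```

&lt;p&gt;Once you have the per-chunk pieces rather than just the final string, everything else (heatmaps, traces, comparisons) becomes a matter of recording and visualizing them.&lt;/p&gt;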

&lt;p&gt;📦 Links&lt;/p&gt;

&lt;p&gt;• GitHub: &lt;a href="https://github.com/lognebudo/llmxray" rel="noopener noreferrer"&gt;https://github.com/lognebudo/llmxray&lt;/a&gt;&lt;br&gt;
• Demo / docs: &lt;a href="https://lognebudo.github.io/llmxray/" rel="noopener noreferrer"&gt;https://lognebudo.github.io/llmxray/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;💬 What’s next&lt;br&gt;
I’m working on:&lt;/p&gt;

&lt;p&gt;• better comparison tools&lt;br&gt;
• an education pillar with kits for teachers and students&lt;br&gt;
• improved reasoning visualization&lt;br&gt;
• support for more local runtimes&lt;/p&gt;

&lt;p&gt;If you work with local models, I’d love to hear how you debug or introspect them — and what features would help you the most.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>learning</category>
      <category>news</category>
    </item>
  </channel>
</rss>
