DEV Community

Ivan Stankovic
Stop Guessing, Start Seeing: Multi-Model Observability with LLMxRay 🕵️‍♂️

Have you ever wondered why the same prompt costs more in one language than another? Or why a model feels "smarter" in English but struggles with Arabic or Chinese?

When working with LLMs, we often treat the response as a black box. We see the output, but we don't see the mechanics: the tokenization, the side-by-side comparison of different model families, or how different writing systems affect performance.

I built LLMxRay to pull back the curtain.

What is LLMxRay?

LLMxRay is an open-source observability tool designed to help developers inspect how different LLMs handle the exact same prompt in real time. Whether you are using local models via Ollama/LM Studio or cloud-based APIs, LLMxRay gives you a "side-by-side" X-ray view of your prompt's journey.

Why use it?

  • Multi-Model Comparison: Run one prompt against multiple models simultaneously. See how Llama 3 compares to Mistral or GPT-4o in one view.

  • Multilingual Deep-Dive: This was a big focus for me. The tool supports 4 languages:
    • English 🇺🇸
    • French 🇫🇷
    • Arabic 🇸🇦 (RTL support)
    • Chinese 🇨🇳

  • Tokenization Transparency: See exactly how your text is being chopped up into tokens. This is crucial for debugging cost, context-window limits, and model "reasoning" quality across different writing systems.
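The multi-model comparison boils down to fanning a single prompt out to several backends and collecting the answers side by side. Here is a minimal sketch of that idea, assuming a local Ollama server on its default port and its `/api/generate` endpoint; the model names and the `compare` helper are illustrative, not LLMxRay's actual code:

```python
import json
from concurrent.futures import ThreadPoolExecutor
from urllib import request

# Assumption: a local Ollama server exposing /api/generate on the default port.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(model: str, prompt: str) -> dict:
    """One request body per model; the prompt stays identical across models."""
    return {"model": model, "prompt": prompt, "stream": False}

def ask_model(model: str, prompt: str) -> str:
    """Send one non-streaming generation request and return the text."""
    body = json.dumps(build_payload(model, prompt)).encode("utf-8")
    req = request.Request(OLLAMA_URL, data=body,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.load(resp)["response"]

def compare(models: list[str], prompt: str) -> dict[str, str]:
    """Run the same prompt against every model concurrently."""
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        futures = {m: pool.submit(ask_model, m, prompt) for m in models}
        return {m: f.result() for m, f in futures.items()}

# Example call (requires a running Ollama instance with these models pulled):
# compare(["llama3", "mistral"], "Explain tokenization in one sentence.")
```

Fanning out with a thread pool keeps the comparison honest: every model receives byte-identical input at roughly the same moment, so differences in the responses come from the models, not the harness.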

Why 4 Languages?

Tokenization isn't equal across languages. A single concept might be 1 token in English but 3 tokens in another language. By supporting Latin, RTL (Arabic), and character-based (Chinese) scripts, LLMxRay lets you see the economic and technical differences of running multilingual apps.
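One way to build intuition for this, using only the standard library: byte-level BPE tokenizers start from UTF-8 bytes, and scripts that need 2–3 bytes per character tend to split into more tokens than Latin text. Byte counts are only a rough proxy (actual token counts depend on the specific tokenizer's vocabulary), but the disparity is already visible in a quick check of the same greeting across the four supported languages:

```python
# Same greeting in each supported language. Byte counts are a rough
# proxy for tokenizer cost: more UTF-8 bytes per character usually
# means more tokens for byte-level BPE vocabularies.
greetings = {
    "English": "Hello",    # 5 characters, 1 byte each  ->  5 bytes
    "French":  "Bonjour",  # 7 characters, 1 byte each  ->  7 bytes
    "Arabic":  "مرحبا",     # 5 letters,    2 bytes each -> 10 bytes
    "Chinese": "你好",      # 2 characters, 3 bytes each ->  6 bytes
}

for lang, text in greetings.items():
    chars = len(text)
    utf8_bytes = len(text.encode("utf-8"))
    print(f"{lang:8} chars={chars:2} utf8_bytes={utf8_bytes:2}")
```

Two characters of Chinese carry as many bytes as six of English, and five Arabic letters carry ten, which is why the "same" prompt can land very differently on your token bill.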

Try it out

The project is early-stage and open for feedback! You can connect it to your local environment or use your API keys to start comparing models immediately.

👉 Check out the repo here:
https://github.com/LogneBudo/llmxray
or website and docs here:
https://lognebudo.github.io/llmxray/

I'd love to hear from the DEV community:

Which model families do you want to see compared next?

Are there specific visualizations that would help your LLM workflow?

Drop a comment below or open an issue on GitHub! 🚀
