<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Ivan Stankovic</title>
    <description>The latest articles on DEV Community by Ivan Stankovic (@lognebudo).</description>
    <link>https://dev.to/lognebudo</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3829335%2F909c5a1e-dcd9-4dbd-b691-b976cb0c1bbd.jpg</url>
      <title>DEV Community: Ivan Stankovic</title>
      <link>https://dev.to/lognebudo</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/lognebudo"/>
    <language>en</language>
    <item>
      <title>Stop Guessing, Start Seeing: Multi-Model Observability with LLMxRay 🕵️‍♂️</title>
      <dc:creator>Ivan Stankovic</dc:creator>
      <pubDate>Fri, 03 Apr 2026 20:55:02 +0000</pubDate>
      <link>https://dev.to/lognebudo/stop-guessing-start-seeing-multi-model-observability-with-llmxray-1djh</link>
      <guid>https://dev.to/lognebudo/stop-guessing-start-seeing-multi-model-observability-with-llmxray-1djh</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmu8knu7mylfhykwoe7ty.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmu8knu7mylfhykwoe7ty.png" alt=" " width="800" height="442"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Have you ever wondered why the same prompt costs more in one language than another? Or why a model feels "smarter" in English but struggles with Arabic or Chinese?&lt;/p&gt;

&lt;p&gt;When working with LLMs, we often treat the response as a black box. We see the output, but we don't see the mechanics—the tokenization, the side-by-side comparison of different model families, or how different writing systems affect performance.&lt;/p&gt;

&lt;p&gt;I built LLMxRay to pull back the curtain.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What is LLMxRay?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;LLMxRay is an open-source observability tool designed to help developers inspect how different LLMs handle the exact same prompt in real time. Whether you are using local models via Ollama/LM Studio or cloud-based APIs, LLMxRay gives you a "side-by-side" X-ray view of your prompt's journey.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why use it?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Multi-Model Comparison: Run one prompt against multiple models simultaneously. See how Llama 3 compares to Mistral or GPT-4o in one view.&lt;/p&gt;

&lt;p&gt;Multilingual Deep-Dive: This was a big focus for me. The tool supports 4 languages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;English 🇺🇸&lt;/li&gt;
&lt;li&gt;French 🇫🇷&lt;/li&gt;
&lt;li&gt;Arabic 🇸🇦 (RTL support)&lt;/li&gt;
&lt;li&gt;Chinese 🇨🇳&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Tokenization Transparency: See exactly how your text is being chopped up into tokens. This is crucial for debugging cost, context window limits, and model "reasoning" quality across different writing systems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why 4 Languages?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Tokenization isn't uniform across languages. A single concept might be 1 token in English but 3 tokens in another language. By supporting Latin, RTL (Arabic), and character-based (Chinese) scripts, LLMxRay lets you see the economic and technical differences of running multilingual apps.&lt;/p&gt;
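&lt;p&gt;As a rough feel for why costs differ, here is a small sketch. Real token counts depend on the specific model's tokenizer, but byte-level BPE tokenizers start from UTF-8 bytes, so scripts that need more bytes per character often end up costing more tokens for the same idea. This uses only the byte length as a crude proxy, not any actual tokenizer:&lt;/p&gt;

```python
# Crude illustration of why the same concept can cost different amounts
# across scripts. Byte length is only a proxy: actual token counts come
# from the model's own tokenizer and its learned vocabulary.

samples = {
    "English": "internationalization",
    "French": "internationalisation",
    "Arabic": "تدويل",
    "Chinese": "国际化",
}

def utf8_bytes(text: str) -> int:
    """Number of UTF-8 bytes a byte-level tokenizer stage would see."""
    return len(text.encode("utf-8"))

for lang, word in samples.items():
    print(f"{lang:8} {len(word):2} chars  {utf8_bytes(word):2} bytes")
```

&lt;p&gt;Chinese packs the concept into 3 characters but 9 bytes; English uses 20 ASCII characters and 20 bytes. How a given tokenizer merges those bytes is exactly the kind of thing a side-by-side view makes visible.&lt;/p&gt;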

&lt;p&gt;&lt;strong&gt;Try it out&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The project is early-stage and open for feedback! You can connect it to your local environment or use your API keys to start comparing models immediately.&lt;/p&gt;

&lt;p&gt;👉 Check out the repo here: &lt;br&gt;
&lt;a href="https://github.com/LogneBudo/llmxray" rel="noopener noreferrer"&gt;https://github.com/LogneBudo/llmxray&lt;/a&gt;&lt;br&gt;
or website and docs here:&lt;br&gt;
&lt;a href="https://lognebudo.github.io/llmxray/" rel="noopener noreferrer"&gt;https://lognebudo.github.io/llmxray/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I’d love to hear from the DEV community:&lt;/p&gt;

&lt;p&gt;Which model families do you want to see compared next?&lt;/p&gt;

&lt;p&gt;Are there specific visualizations that would help your LLM workflow?&lt;/p&gt;

&lt;p&gt;Drop a comment below or open an issue on GitHub! 🚀&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>webdev</category>
      <category>productivity</category>
    </item>
    <item>
      <title>LLMxRay: A Local Observatory for Understanding How LLMs Think</title>
      <dc:creator>Ivan Stankovic</dc:creator>
      <pubDate>Tue, 17 Mar 2026 11:47:10 +0000</pubDate>
      <link>https://dev.to/lognebudo/llmxray-a-local-observatory-for-understanding-how-llms-think-4pj8</link>
      <guid>https://dev.to/lognebudo/llmxray-a-local-observatory-for-understanding-how-llms-think-4pj8</guid>
      <description>&lt;p&gt;Modern LLMs generate impressive results, but the most interesting part isn’t the final answer — it’s everything that happens before the answer appears.&lt;br&gt;
Token probabilities, confidence shifts, reasoning traces, tool calls, divergences between models… all of this is usually hidden.&lt;br&gt;
I wanted a way to see these internals clearly, locally, and without relying on cloud APIs.&lt;br&gt;
That’s how LLMxRay started.&lt;/p&gt;

&lt;p&gt;🧠 What LLMxRay does&lt;br&gt;
LLMxRay is a local-first observability tool for LLMs.&lt;br&gt;
It works with Ollama, LM Studio, llama.cpp, and any endpoint that streams tokens.&lt;br&gt;
It gives you a real-time view of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Token-by-token generation with confidence heatmaps&lt;/li&gt;
&lt;li&gt;Reasoning traces (when the model exposes them)&lt;/li&gt;
&lt;li&gt;Side-by-side model comparison&lt;/li&gt;
&lt;li&gt;Tool/function call execution&lt;/li&gt;
&lt;li&gt;Latency and cost breakdowns&lt;/li&gt;
&lt;li&gt;Agent behavior introspection&lt;/li&gt;
&lt;li&gt;A built-in Tools Workshop to design and test function-calling flows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Everything runs locally.&lt;br&gt;
No cloud, no telemetry, no accounts.&lt;/p&gt;

&lt;p&gt;🔍 Why I built it&lt;/p&gt;

&lt;p&gt;Working with local models, I often needed to answer questions like:&lt;br&gt;
• Why did this model choose this token?&lt;br&gt;
• Where did the reasoning diverge?&lt;br&gt;
• Why does Q4_K_M behave differently from Q6_K?&lt;br&gt;
• What exactly happened during a tool call?&lt;br&gt;
• How do two models respond to the same prompt internally?&lt;/p&gt;

&lt;p&gt;Existing UIs focus on the chat experience, not introspection.&lt;br&gt;
Debugging required custom scripts, logs, or guesswork.&lt;br&gt;
LLMxRay tries to make this transparent.&lt;/p&gt;

&lt;p&gt;🛠️ How it works&lt;/p&gt;

&lt;p&gt;LLMxRay sits between you and your local model:&lt;/p&gt;

&lt;p&gt;• It captures the token stream&lt;br&gt;
• It records probabilities and reasoning (if available)&lt;br&gt;
• It visualizes everything in a clean, interactive UI&lt;br&gt;
• It stores traces so you can compare runs&lt;br&gt;
• It supports multiple models and endpoints&lt;/p&gt;

&lt;p&gt;To get started, clone the repo and run it locally.&lt;/p&gt;
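&lt;p&gt;The "captures the token stream" step can be sketched in a few lines. Local runtimes like Ollama stream newline-delimited JSON chunks from their generate endpoint; a real run reads these from an HTTP response against a running server, so the chunks below are hardcoded sample data for illustration:&lt;/p&gt;

```python
import json

# Minimal sketch of capturing a streamed token response. Runtimes like
# Ollama emit newline-delimited JSON, one chunk per generated piece;
# here the stream is hardcoded sample data instead of a live HTTP body.

sample_stream = b"""\
{"model":"llama3","response":"Hel","done":false}
{"model":"llama3","response":"lo","done":false}
{"model":"llama3","response":"!","done":true}
"""

def collect_tokens(raw: bytes):
    """Parse an NDJSON token stream into (tokens, assembled_text)."""
    tokens = []
    for line in raw.splitlines():
        if not line.strip():
            continue
        chunk = json.loads(line)
        tokens.append(chunk["response"])
        if chunk.get("done"):
            break
    return tokens, "".join(tokens)

tokens, text = collect_tokens(sample_stream)
print(tokens)   # each streamed piece
print(text)     # the assembled response
```

&lt;p&gt;Once you have the per-chunk pieces rather than just the final string, everything else (heatmaps, traces, comparisons) becomes a matter of recording and visualizing them.&lt;/p&gt;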

&lt;p&gt;📦 Links&lt;/p&gt;

&lt;p&gt;• GitHub: &lt;a href="https://github.com/lognebudo/llmxray" rel="noopener noreferrer"&gt;https://github.com/lognebudo/llmxray&lt;/a&gt;&lt;br&gt;
• Demo / docs: &lt;a href="https://lognebudo.github.io/llmxray/" rel="noopener noreferrer"&gt;https://lognebudo.github.io/llmxray/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;💬 What’s next&lt;br&gt;
I’m working on:&lt;/p&gt;

&lt;p&gt;• better comparison tools&lt;br&gt;
• an education pillar with kits for teachers and students&lt;br&gt;
• improved reasoning visualization&lt;br&gt;
• support for more local runtimes&lt;/p&gt;

&lt;p&gt;If you work with local models, I’d love to hear how you debug or introspect them — and what features would help you the most.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>learning</category>
      <category>news</category>
    </item>
  </channel>
</rss>
