<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: greedy</title>
    <description>The latest articles on DEV Community by greedy (@whonixnetworks).</description>
    <link>https://dev.to/whonixnetworks</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3952371%2F55f33fea-6b5e-438b-a85e-6c5d9f75aeb9.png</url>
      <title>DEV Community: greedy</title>
      <link>https://dev.to/whonixnetworks</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/whonixnetworks"/>
    <language>en</language>
    <item>
      <title>How I built a multi-model Ollama comparison tool with zero dependencies</title>
      <dc:creator>greedy</dc:creator>
      <pubDate>Tue, 26 May 2026 10:31:55 +0000</pubDate>
      <link>https://dev.to/whonixnetworks/how-i-built-a-multi-model-ollama-comparison-tool-with-zero-dependencies-3g3o</link>
      <guid>https://dev.to/whonixnetworks/how-i-built-a-multi-model-ollama-comparison-tool-with-zero-dependencies-3g3o</guid>
      <description>&lt;h2&gt;
  
  
  How I built a multi-model Ollama comparison tool with zero dependencies
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The problem
&lt;/h3&gt;

&lt;p&gt;I kept opening multiple terminal tabs running &lt;code&gt;ollama run&lt;/code&gt; to compare&lt;br&gt;
how different local models handled the same prompt. Copy-pasting between&lt;br&gt;
terminals doesn't scale. I needed a proper comparison tool — but everything&lt;br&gt;
I found was either a heavy web UI or required a dozen pip packages.&lt;/p&gt;
&lt;h3&gt;
  
  
  The approach
&lt;/h3&gt;

&lt;p&gt;Build a curses TUI that talks to Ollama directly. Python standard library&lt;br&gt;
only — &lt;code&gt;urllib&lt;/code&gt; for the API, &lt;code&gt;curses&lt;/code&gt; for the interface, &lt;code&gt;json&lt;/code&gt;/&lt;code&gt;re&lt;/code&gt; for&lt;br&gt;
parsing. One file. No pip install.&lt;/p&gt;
&lt;h3&gt;
  
  
  How it works
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Side-by-side streaming.&lt;/strong&gt; The streaming viewer renders tokens from&lt;br&gt;
multiple models simultaneously in a curses grid. Each model gets a panel&lt;br&gt;
with live stats (elapsed time, tokens generated, tokens/second).&lt;/p&gt;
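
&lt;p&gt;A minimal sketch of how the per-panel stats could be derived, assuming Ollama's newline-delimited JSON stream, where each chunk carries a &lt;code&gt;response&lt;/code&gt; fragment and the final chunk reports &lt;code&gt;eval_count&lt;/code&gt; and &lt;code&gt;eval_duration&lt;/code&gt; (in nanoseconds). The name &lt;code&gt;consume_stream&lt;/code&gt; is hypothetical and not taken from prompter, and the sample lines stand in for a live HTTP response.&lt;/p&gt;

```python
import json


def consume_stream(lines):
    """Collect text and stats from an iterable of NDJSON byte lines.

    Hypothetical helper, not prompter's actual code. With urllib, the
    object returned by urlopen() can be iterated line by line the same way.
    """
    text_parts = []
    stats = {"tokens": 0, "tokens_per_second": 0.0}
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines between chunks
        chunk = json.loads(raw)
        text_parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            # The final chunk reports totals; eval_duration is in nanoseconds.
            stats["tokens"] = chunk.get("eval_count", 0)
            seconds = chunk.get("eval_duration", 0) / 1e9
            if seconds:
                stats["tokens_per_second"] = stats["tokens"] / seconds
            break
    return "".join(text_parts), stats


# Sample chunks (made up for illustration) in place of a real response body.
sample = [
    b'{"response": "Hel", "done": false}\n',
    b'{"response": "lo", "done": false}\n',
    b'{"response": "", "done": true, "eval_count": 2, "eval_duration": 500000000}\n',
]
text, stats = consume_stream(sample)
print(text, stats)
```

&lt;p&gt;Reading the totals from the final chunk avoids counting tokens client-side; a live panel would still need its own running estimate while the stream is in flight.&lt;/p&gt;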

&lt;p&gt;&lt;strong&gt;Evaluation modes beyond comparison.&lt;/strong&gt; Simple side-by-side viewing wasn't&lt;br&gt;
enough, so I added structured evaluation modes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;RALPH&lt;/strong&gt; — self-review loop. Model answers, then critiques and improves&lt;br&gt;
its own answer. Cosine similarity between Ollama embeddings of consecutive&lt;br&gt;
answers detects convergence: the loop stops once a revision no longer changes the answer meaningfully.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Council&lt;/strong&gt; — multi-persona debate. Three personas (Domain Expert,&lt;br&gt;
Skeptic, Devil's Advocate) debate a question across rounds, then a&lt;br&gt;
synthesis verdict follows. Each persona can run on a different model.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Tribunal&lt;/strong&gt; — adversarial cross-examination. A defender model makes&lt;br&gt;
claims, a prosecutor model challenges them. An arbiter model rules on&lt;br&gt;
each challenge.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
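
&lt;p&gt;The RALPH convergence check can be sketched in pure Python as below. This is an illustration rather than prompter's code: the function names and the 0.98 threshold are assumptions, and the embedding vectors are expected to come from Ollama's embeddings endpoint.&lt;/p&gt;

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def has_converged(prev_embedding, new_embedding, threshold=0.98):
    """True when two consecutive answers are nearly identical in embedding space.

    The 0.98 threshold is an assumed value for illustration.
    """
    return cosine_similarity(prev_embedding, new_embedding) >= threshold
```

&lt;p&gt;A review loop would embed each revised answer, compare it with the previous one, and stop as soon as &lt;code&gt;has_converged&lt;/code&gt; returns true or a round limit is hit.&lt;/p&gt;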

&lt;p&gt;&lt;strong&gt;Benchmark mode.&lt;/strong&gt; Runs 20 standardised tests against any model:&lt;br&gt;
counting, arithmetic, web search, URL fetching, shell commands, Python&lt;br&gt;
execution, file reading, RSS parsing, embeddings, and more. Each test&lt;br&gt;
checks if the model called the right tool or produced the right output.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tool orchestration.&lt;/strong&gt; Models can be armed with external tools — all&lt;br&gt;
opt-in, checked at startup:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Web search (SearXNG or DuckDuckGo)&lt;/li&gt;
&lt;li&gt;URL fetching with summarisation&lt;/li&gt;
&lt;li&gt;Shell command execution (sandboxed)&lt;/li&gt;
&lt;li&gt;Python code execution&lt;/li&gt;
&lt;li&gt;File reading with configurable root&lt;/li&gt;
&lt;li&gt;Safe calculator (AST-based whitelist evaluation)&lt;/li&gt;
&lt;/ul&gt;
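
&lt;p&gt;As an illustration of the AST-based whitelist idea behind the calculator, here is a small sketch. It is not prompter's implementation: &lt;code&gt;safe_eval&lt;/code&gt; and the chosen operator set are assumptions, and it relies on Python 3.8+, where numeric literals parse to &lt;code&gt;ast.Constant&lt;/code&gt;.&lt;/p&gt;

```python
import ast
import operator

# Whitelisted operators; any other node type is rejected.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_eval(expression):
    """Evaluate basic arithmetic without calling eval()."""

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        # Only plain int/float literals; bool and str constants are refused.
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](walk(node.operand))
        raise ValueError("disallowed expression: " + type(node).__name__)

    return walk(ast.parse(expression, mode="eval"))


print(safe_eval("2 + 3 * (4 - 1)"))
```

&lt;p&gt;Names, attribute access, and function calls never reach an evaluator, because the walker raises on any node type it does not recognise. Exponentiation is left out here on purpose, since a model could otherwise request an enormous power.&lt;/p&gt;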
&lt;h3&gt;
  
  
  Architecture
&lt;/h3&gt;

&lt;p&gt;The source is split into 12 modules under &lt;code&gt;src/prompter/&lt;/code&gt;. A bundler&lt;br&gt;
assembles them into a single &lt;code&gt;prompter.py&lt;/code&gt; for distribution:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;src/prompter/
  header.py       # Shebang + module docstring
  config.py       # Environment variables, constants
  palette.py      # curses colour init + helpers
  tools.py        # Tool definitions, checks, execution
  api.py          # Ollama REST API (urllib)
  utils.py        # Text processing, regex patterns
  tui/
    utils.py      # Safe drawing wrappers
    viewer.py     # Streaming viewer widget
  modes/
    dataclasses.py # CouncilSession, BullySession
    runners.py    # run_*_mode() functions
  markdown.py     # Output file generation
  main.py         # Screen state machine + entry point
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The bundler strips relative imports, deduplicates stdlib imports, and&lt;br&gt;
concatenates the modules in dependency order.&lt;/p&gt;
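
&lt;p&gt;Those three bundling steps could look roughly like the sketch below. It is a simplified stand-in for the real bundler: the &lt;code&gt;bundle&lt;/code&gt; function and both regular expressions are hypothetical, only single-line imports are handled, and the caller is assumed to pass module sources already sorted in dependency order.&lt;/p&gt;

```python
import re

# "from .config import X" or "from . import utils"
RELATIVE_IMPORT = re.compile(r"^\s*from\s+\.[\w.]*\s+import\s+.+$")
# Unindented "import x" or "from x import y" (absolute imports only)
TOP_LEVEL_IMPORT = re.compile(r"^(import\s+\S.*|from\s+\w[\w.]*\s+import\s+.+)$")


def bundle(sources):
    """Join module sources, given as a list of strings in dependency order.

    Hypothetical sketch: handles single-line imports only.
    """
    imports = []  # unique absolute import lines, in first-seen order
    body = []
    for source in sources:
        for line in source.splitlines():
            if RELATIVE_IMPORT.match(line):
                continue  # sibling modules end up in the same file
            if TOP_LEVEL_IMPORT.match(line):
                if line not in imports:
                    imports.append(line)
                continue
            body.append(line)
    return "\n".join(imports + [""] + body) + "\n"
```

&lt;p&gt;Hoisting the deduplicated imports to the top keeps the generated file tidy, while indented imports inside functions are left where they are.&lt;/p&gt;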

&lt;h3&gt;
  
  
  What I learned
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;curses is deceptively complex.&lt;/strong&gt; Safe drawing wrappers are essential —&lt;br&gt;
out-of-bounds writes raise &lt;code&gt;curses.error&lt;/code&gt;, which kills the UI if uncaught. Every &lt;code&gt;addstr&lt;/code&gt; call needs a bounds&lt;br&gt;
check.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Token streaming over urllib is straightforward.&lt;/strong&gt; Ollama's streaming&lt;br&gt;
endpoint sends newline-delimited JSON chunks. You just read lines and&lt;br&gt;
parse.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Context windows are a real constraint.&lt;/strong&gt; In multi-round debate modes,&lt;br&gt;
the message history grows fast. I implemented &lt;code&gt;_truncate_to_context()&lt;/code&gt;&lt;br&gt;
that estimates the token count and, when the history is too long, keeps the beginning and end of the content and drops the middle.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Cosine similarity works for self-review convergence.&lt;/strong&gt; By embedding&lt;br&gt;
consecutive answers and comparing similarity, RALPH mode can detect&lt;br&gt;
when further review rounds won't improve the answer.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
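
&lt;p&gt;The truncation idea from point 3 can be sketched as follows. This is not the actual &lt;code&gt;_truncate_to_context()&lt;/code&gt;: the helper names, the marker text, and the four-characters-per-token estimate are all assumptions made for illustration.&lt;/p&gt;

```python
def estimate_tokens(text):
    """Rough estimate: about four characters per token (an assumption)."""
    return max(1, len(text) // 4)


def truncate_middle(text, max_tokens, marker="\n[... truncated ...]\n"):
    """Keep the head and tail of text, dropping the middle when over budget."""
    over_budget = estimate_tokens(text) > max_tokens
    if not over_budget:
        return text
    budget_chars = max_tokens * 4
    # Split the remaining character budget evenly between head and tail.
    half = max(1, (budget_chars - len(marker)) // 2)
    return text[:half] + marker + text[-half:]
```

&lt;p&gt;Keeping both ends preserves the original question at the start and the most recent turns at the end, which is usually what a debate round needs most.&lt;/p&gt;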

&lt;h3&gt;
  
  
  Try it
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;wget https://raw.githubusercontent.com/whonixnetworks/prompter/main/prompter.py
&lt;span class="nb"&gt;chmod&lt;/span&gt; +x prompter.py
python3 prompter.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Python 3.7+ and Ollama running locally. That's it.&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/whonixnetworks/prompter" rel="noopener noreferrer"&gt;https://github.com/whonixnetworks/prompter&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ollama</category>
      <category>ai</category>
      <category>python</category>
      <category>cli</category>
    </item>
  </channel>
</rss>
