<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Cheryl A</title>
    <description>The latest articles on DEV Community by Cheryl A (@ctechdiva).</description>
    <link>https://dev.to/ctechdiva</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F464182%2Fea126cc5-46ad-41ba-88b9-57408609d8ab.jpg</url>
      <title>DEV Community: Cheryl A</title>
      <link>https://dev.to/ctechdiva</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ctechdiva"/>
    <language>en</language>
    <item>
      <title>Side-by-Side Testing Platform for AI</title>
      <dc:creator>Cheryl A</dc:creator>
      <pubDate>Sun, 25 Jan 2026 20:42:11 +0000</pubDate>
      <link>https://dev.to/ctechdiva/side-by-side-testing-platform-for-ai-3b2h</link>
      <guid>https://dev.to/ctechdiva/side-by-side-testing-platform-for-ai-3b2h</guid>
      <description>&lt;p&gt;Like most developers right now, I'm constantly switching between ChatGPT, Claude, and various other models trying to figure out which one actually gives me the best response. Opening five different tabs, copying the same prompt over and over, trying to remember what each one said it got old fast.&lt;/p&gt;

&lt;p&gt;I wanted one place where I could throw a prompt at multiple models and see the results side by side. So I built it: &lt;a href="https://llmcode.ai" rel="noopener noreferrer"&gt;llmcode.ai&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;What you can do with it&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Standard Lab&lt;/strong&gt; - Pick any combination of up to 5 models from different providers and run the same prompt across all of them. You get:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;8 pre-configured models from Hugging Face (Llama, Qwen, Gemma, DeepSeek) that work immediately with a free API token&lt;/li&gt;
&lt;li&gt;Option to add GPT-5.2, Claude 4.5, and Gemini 3 with your own API keys&lt;/li&gt;
&lt;li&gt;21 test categories specifically designed for testing things like bias, toxicity, safety, hallucinations, and PII handling&lt;/li&gt;
&lt;li&gt;PDF upload if you need to give the models context&lt;/li&gt;
&lt;li&gt;Export to clipboard or PDF&lt;/li&gt;
&lt;/ul&gt;
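
&lt;p&gt;Under the hood, running one prompt against several models is mostly a matter of building one request per model and firing them off in parallel. Here's a minimal TypeScript sketch of that idea; the endpoint shape and model IDs are assumptions based on the public Hugging Face Inference API, not llmcode.ai's actual code:&lt;/p&gt;

```typescript
// Sketch: one prompt, N models, N parallel requests.
// Endpoint shape assumes the public Hugging Face Inference API.
interface ModelRequest {
  url: string;
  body: string;
}

const HF_BASE = "https://api-inference.huggingface.co/models/";

// Build one POST payload per model so they can run side by side.
function buildRequests(prompt: string, models: string[]): ModelRequest[] {
  return models.map((model) => ({
    url: HF_BASE + model,
    body: JSON.stringify({ inputs: prompt }),
  }));
}

const requests = buildRequests("Explain CORS in one paragraph.", [
  "meta-llama/Llama-3.1-8B-Instruct",
  "Qwen/Qwen2.5-7B-Instruct",
]);

// In the browser, each request would then go out with fetch(), e.g.:
// fetch(r.url, { method: "POST", headers: { Authorization: "Bearer " + token }, body: r.body })
```

&lt;p&gt;Using &lt;code&gt;Promise.allSettled&lt;/code&gt; rather than &lt;code&gt;Promise.all&lt;/code&gt; would keep one offline model from failing the whole comparison.&lt;/p&gt;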

&lt;p&gt;&lt;strong&gt;MCP Testing Lab&lt;/strong&gt; - This one's newer. It demonstrates Anthropic's Model Context Protocol (MCP) by connecting Claude to Brave Search in real time. You can see the difference between Claude with and without access to live data, or compare it against other models.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Model Availability&lt;/strong&gt; - Before you waste time debugging why something isn't working, this feature checks whether each model is actually online and validates that your API keys are working correctly.&lt;/p&gt;
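
&lt;p&gt;A check like this usually boils down to probing each endpoint and mapping the HTTP status to a human-readable verdict. The mapping below is an illustrative sketch using common provider conventions, not llmcode.ai's exact logic:&lt;/p&gt;

```typescript
// Sketch: turn a probe's HTTP status into an availability verdict.
// The status semantics are assumed conventions and vary by provider.
type Availability = "online" | "invalid key" | "rate limited" | "offline";

function classifyStatus(status: number): Availability {
  if (status === 200) return "online";
  if (status === 401 || status === 403) return "invalid key"; // key rejected
  if (status === 429) return "rate limited"; // key works, quota exhausted
  return "offline"; // 5xx, network errors, etc.
}
```

&lt;p&gt;Distinguishing a 401 from a 5xx is exactly what saves the debugging time: one means fix your key, the other means wait.&lt;/p&gt;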

&lt;h2&gt;The privacy bit&lt;/h2&gt;

&lt;p&gt;Everything happens in your browser. Your prompts, your API keys, your uploaded files—none of it touches my servers because there are no servers. API keys go straight to browser local storage. No registration, no cookies, no data collection. Hit "Clear Session" when you're done and it's all gone.&lt;/p&gt;
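
&lt;p&gt;The "no servers" model can be sketched in a few lines. The key names and the &lt;code&gt;KeyStore&lt;/code&gt; interface here are hypothetical; in the browser the store would simply be &lt;code&gt;window.localStorage&lt;/code&gt;:&lt;/p&gt;

```typescript
// Sketch: client-side API key handling. Nothing leaves the browser.
// KeyStore mirrors the slice of window.localStorage being used.
interface KeyStore {
  setItem(name: string, value: string): void;
  removeItem(name: string): void;
}

// Illustrative key names, not the app's real ones.
const KEY_NAMES = ["openai_key", "anthropic_key", "google_key"];

function saveKey(store: KeyStore, name: string, value: string): void {
  store.setItem(name, value);
}

// "Clear Session" wipes every stored key in one call.
function clearSession(store: KeyStore): void {
  KEY_NAMES.forEach((name) => store.removeItem(name));
}
```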

&lt;h2&gt;What I built it with&lt;/h2&gt;

&lt;p&gt;The whole thing runs on React/TypeScript and is deployed on Vercel. I used Claude for pair programming on some of the architecture decisions and GitHub Copilot for code suggestions.&lt;/p&gt;

&lt;p&gt;The PR reviews from Copilot turned out to be more useful than I expected. I generally do a line-by-line code review, which can be tedious and time-consuming. Copilot saves time by suggesting cleaner ways to structure things. When you're moving fast on features, having another set of eyes pointing out refactoring opportunities is practical.&lt;/p&gt;

&lt;h2&gt;Why it's free&lt;/h2&gt;

&lt;p&gt;Hugging Face provides free API access (no credit card needed), which covers the majority of use cases. For the premium models, you bring your own keys and connect directly to OpenAI, Anthropic, or Google. I'm just sharing a tool I built to make the comparison easier.&lt;/p&gt;

&lt;p&gt;Check it out: &lt;a href="https://llmcode.ai" rel="noopener noreferrer"&gt;llmcode.ai&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;#LLM #AI #Models&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>productivity</category>
      <category>showdev</category>
    </item>
  </channel>
</rss>
