DEV Community

Cheryl A

Side-by-Side Testing Platform for AI

Like most developers right now, I'm constantly switching between ChatGPT, Claude, and various other models, trying to figure out which one actually gives me the best response. Opening five different tabs, copying the same prompt over and over, and trying to remember what each one said got old fast.

I wanted one place where I could throw a prompt at multiple models and see the results side by side. So I built it: llmcode.ai

What you can do with it

Standard Lab - Pick any combination of up to five models from different providers and run the same prompt across all of them. You get:

  • 8 pre-configured models from Hugging Face (Llama, Qwen, Gemma, DeepSeek) that work immediately with a free API token
  • Option to add GPT-5.2, Claude 4.5, and Gemini 3 with your own API keys
  • 21 test categories specifically designed for testing things like bias, toxicity, safety, hallucinations, and PII handling
  • PDF upload if you need to give the models context
  • Export to clipboard or PDF
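The fan-out pattern behind this is simple: fire every request in parallel and collect whatever comes back, so one slow or failing model doesn't block the rest. Here's a rough TypeScript sketch of the idea. The model IDs, endpoint, and response shape are illustrative assumptions, not llmcode.ai's actual code:

```typescript
// Sketch: run one prompt against several Hugging Face models concurrently.
interface ModelResult {
  model: string;
  ok: boolean;
  text: string; // generated text on success, error message on failure
}

// Pure helper: pair each model with its settled promise so a single
// rejection doesn't hide the results from the other models.
function mergeSettled(
  models: string[],
  settled: PromiseSettledResult<string>[]
): ModelResult[] {
  return settled.map((s, i) =>
    s.status === "fulfilled"
      ? { model: models[i], ok: true, text: s.value }
      : { model: models[i], ok: false, text: String(s.reason) }
  );
}

async function askModel(model: string, prompt: string, token: string): Promise<string> {
  const res = await fetch(`https://api-inference.huggingface.co/models/${model}`, {
    method: "POST",
    headers: { Authorization: `Bearer ${token}`, "Content-Type": "application/json" },
    body: JSON.stringify({ inputs: prompt }),
  });
  if (!res.ok) throw new Error(`HTTP ${res.status} from ${model}`);
  const data = await res.json();
  return data[0]?.generated_text ?? "";
}

async function compare(models: string[], prompt: string, token: string): Promise<ModelResult[]> {
  const settled = await Promise.allSettled(models.map((m) => askModel(m, prompt, token)));
  return mergeSettled(models, settled);
}
```

`Promise.allSettled` (rather than `Promise.all`) is what makes a five-model run resilient: you still get four answers when one provider times out.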

MCP Testing Lab - This one's newer. It demonstrates Anthropic's Model Context Protocol by connecting Claude to Brave Search in real time. You can see the difference between Claude with and without access to live data, or compare it against other models.
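Conceptually, the with/without comparison just changes what context reaches the model: the grounded run injects live search snippets, the bare run doesn't. A minimal sketch, assuming a generic chat-message shape rather than the actual MCP wire protocol:

```typescript
// Sketch: build the two prompts compared in the MCP lab — one bare,
// one grounded in live search results. The message format is a generic
// chat shape, not Anthropic's MCP protocol itself.
interface ChatMessage {
  role: "system" | "user";
  content: string;
}

function buildMessages(prompt: string, searchResults?: string[]): ChatMessage[] {
  const messages: ChatMessage[] = [];
  if (searchResults && searchResults.length > 0) {
    // Grounded run: prepend live snippets so the model can cite fresh data.
    messages.push({
      role: "system",
      content: `Use these live search results when answering:\n${searchResults.join("\n")}`,
    });
  }
  messages.push({ role: "user", content: prompt });
  return messages;
}
```

Sending the same user prompt through both paths is what makes the difference visible side by side.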

Model Availability - Before you waste time debugging why something isn't working, this feature checks whether models are actually online and validates that your API keys work.
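The health-check idea boils down to sending a tiny request and mapping the HTTP status to a human-readable diagnosis. The status mappings below are assumptions about typical provider behavior, not llmcode.ai's exact logic:

```typescript
// Sketch: probe a model with a one-token request and classify the result.
type Health = "online" | "bad-key" | "offline" | "unknown";

function classifyStatus(status: number): Health {
  if (status >= 200 && status < 300) return "online";
  if (status === 401 || status === 403) return "bad-key"; // invalid or unauthorized key
  if (status === 404 || status === 503) return "offline"; // model missing or still loading
  return "unknown";
}

async function checkModel(model: string, token: string): Promise<Health> {
  try {
    const res = await fetch(`https://api-inference.huggingface.co/models/${model}`, {
      method: "POST",
      headers: { Authorization: `Bearer ${token}`, "Content-Type": "application/json" },
      body: JSON.stringify({ inputs: "ping", parameters: { max_new_tokens: 1 } }),
    });
    return classifyStatus(res.status);
  } catch {
    return "offline"; // network error, treat like an unreachable model
  }
}
```

Separating "your key is bad" from "the model is down" is the whole point: they look identical in a failed comparison run but need very different fixes.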

The privacy bit

Everything happens in your browser. Your prompts, your API keys, your uploaded files—none of it touches my servers because there are no servers. API keys go straight to browser local storage. No registration, no cookies, no data collection. Hit "Clear Session" when you're done and it's all gone.
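The browser-only key handling can be sketched like this. The storage interface is injected so the logic is testable outside a browser (in the app you'd pass `window.localStorage`); the key names are illustrative, not the app's real ones:

```typescript
// Sketch: API keys live only in client-side storage and "Clear Session"
// wipes them. Key names are illustrative assumptions.
interface KVStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

const KEY_NAMES = ["openai_key", "anthropic_key", "google_key"];

function saveKey(store: KVStore, provider: string, key: string): void {
  // The key never goes over the network to the app's origin — there is no backend.
  store.setItem(`${provider}_key`, key);
}

function clearSession(store: KVStore): void {
  for (const name of KEY_NAMES) store.removeItem(name);
}
```

Since `localStorage` is scoped to the origin and nothing server-side exists to receive the keys, clearing it really does leave nothing behind.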

What I built it with

The whole thing runs on React/TypeScript and is deployed on Vercel. I used Claude for pair programming on some of the architecture decisions and GitHub Copilot for code suggestions.

The PR reviews from Copilot turned out to be more useful than I expected. I generally do a line-by-line code review, which can be tedious and time-consuming. Copilot saves time by suggesting cleaner ways to structure things. When you're moving fast on features, having another set of eyes pointing out refactoring opportunities is practical.

Why it's free

Hugging Face provides free API access (no credit card needed), which covers the majority of use cases. For the premium models, you bring your own keys and connect directly to OpenAI, Anthropic, or Google. I'm just sharing a method I developed to make the comparison easier.
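Bring-your-own-key routing can be as simple as mapping a model name to its provider's public endpoint, so requests go straight from the browser to OpenAI, Anthropic, or Google. The name prefixes and the mapping below are illustrative assumptions, not the app's actual routing table:

```typescript
// Sketch: pick the provider endpoint for a BYOK request based on model name.
interface Provider {
  name: "openai" | "anthropic" | "google" | "huggingface";
  baseUrl: string;
}

function routeModel(model: string): Provider {
  if (model.startsWith("gpt-")) {
    return { name: "openai", baseUrl: "https://api.openai.com/v1" };
  }
  if (model.startsWith("claude-")) {
    return { name: "anthropic", baseUrl: "https://api.anthropic.com/v1" };
  }
  if (model.startsWith("gemini-")) {
    return { name: "google", baseUrl: "https://generativelanguage.googleapis.com/v1beta" };
  }
  // Everything else goes through the free Hugging Face inference API.
  return { name: "huggingface", baseUrl: "https://api-inference.huggingface.co" };
}
```

Because the browser talks to each provider directly, your keys and your usage billing stay entirely between you and that provider.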

Check it out: llmcode.ai

#LLM #AI #Models
