When we set out to build AI Chat Bot Compare, the goal was straightforward: let users compare AI chatbots side-by-side with real, reproducible results. The execution turned out to be anything but simple.
This post covers the key technical decisions we made, what worked, and what we had to rethink along the way.
The Core Problem
Most AI chatbot comparisons online are opinion pieces. Someone tries ChatGPT and Claude for a few prompts, writes up their subjective take, and calls it a day. We wanted something more structured — a platform where users could see standardized benchmark results, pricing breakdowns, and feature matrices updated in near real-time.
Architecture Overview
The stack is intentionally boring:
- Frontend: Server-side rendered PHP with vanilla JS for interactive comparison tables
- Data Layer: MySQL with a caching layer for API responses
- Crawling: Custom scrapers that pull pricing pages, changelogs, and documentation daily
- Hosting: Shared hosting behind Cloudflare (yes, really — more on this below)
Why Not a SPA Framework?
SEO was a first-class requirement from day one. We needed every comparison page to be indexable, fast-loading, and crawlable without JavaScript execution. SSR with Next.js or Nuxt was an option, but plain PHP with clean HTML output gave us the fastest time-to-first-byte and zero hydration overhead.
For a content-heavy comparison site, this matters more than developer experience.
The Comparison Data Pipeline
Here is where things got interesting. Each AI chatbot vendor structures their information differently:
- OpenAI publishes pricing on a clean page but buries model capabilities across blog posts
- Anthropic has clear documentation but changes model names frequently
- Google scatters Gemini information across multiple subdomains
- Mistral, Cohere, etc. — each with their own quirks
We built a normalized schema:
CREATE TABLE chatbots (
    id INT PRIMARY KEY AUTO_INCREMENT,
    vendor VARCHAR(100),
    model_name VARCHAR(200),
    model_family VARCHAR(100),
    context_window INT,
    input_price_per_1m DECIMAL(10,4),
    output_price_per_1m DECIMAL(10,4),
    supports_vision BOOLEAN,
    supports_function_calling BOOLEAN,
    max_output_tokens INT,
    last_updated TIMESTAMP
);
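Each scraper's job is to map vendor-specific output onto this shared shape. Here is a Python sketch of that normalization step (the raw field names are illustrative, not what the vendors actually publish, and the real scrapers are PHP):

```python
# Sketch: map vendor-specific scraped fields onto the shared schema.
# The raw dict keys are hypothetical examples of per-vendor quirks.

def normalize_openai(raw: dict) -> dict:
    return {
        "vendor": "OpenAI",
        "model_name": raw["model"],
        "context_window": raw["context_length"],
        # Prices assumed to already be per 1M tokens
        "input_price_per_1m": float(raw["input_price"]),
        "output_price_per_1m": float(raw["output_price"]),
    }

def normalize_anthropic(raw: dict) -> dict:
    return {
        "vendor": "Anthropic",
        "model_name": raw["name"],
        "context_window": raw["context_window"],
        "input_price_per_1m": float(raw["input_per_mtok"]),
        "output_price_per_1m": float(raw["output_per_mtok"]),
    }

NORMALIZERS = {"openai": normalize_openai, "anthropic": normalize_anthropic}

def normalize(vendor: str, raw: dict) -> dict:
    """Dispatch to the right per-vendor normalizer."""
    return NORMALIZERS[vendor](raw)
```

The payoff is that everything downstream — comparison tables, scoring, structured data — only ever sees one record shape.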
A daily cron job runs scrapers that populate this table. When a scraper fails (which happens often — vendors love redesigning their pricing pages), it flags the entry for manual review instead of silently serving stale data.
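The fail-loudly policy is simple to express. A Python sketch of the cron job's core loop (the store interface and its method names are illustrative stand-ins, not our actual code):

```python
from datetime import datetime, timezone

def run_scraper(scrape, store):
    """Run one vendor scraper. On failure, flag the vendor's rows for
    manual review rather than overwriting them with stale or empty data."""
    try:
        rows = scrape()
    except Exception as exc:
        # Hypothetical store API: mark entries so a human reviews them
        store.flag_for_review(reason=str(exc))
        return False
    for row in rows:
        row["last_updated"] = datetime.now(timezone.utc).isoformat()
        store.upsert(row)
    return True
```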
Handling Comparison Logic
The naive approach would be a giant if-else tree. Instead, we built a scoring system with weighted dimensions:
function calculateScore($chatbot, $useCase) {
    $weights = $useCase->getWeights();
    $score = 0;
    // Each dimension is normalized to a 0..1 scale, then weighted per use case
    $score += $weights['price']   * normalizeCost($chatbot->input_price_per_1m);
    $score += $weights['context'] * normalizeContext($chatbot->context_window);
    $score += $weights['speed']   * normalizeLatency($chatbot->avg_latency);
    $score += $weights['quality'] * $chatbot->benchmark_score;
    return $score;
}
Users can select a use case (coding, writing, analysis, customer support) and the weights shift accordingly. A coding-focused comparison values function calling support and context window heavily; a customer support comparison prioritizes cost and latency.
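The normalize helpers in the scoring function do the real work: each one maps a raw value onto a 0..1 scale where higher is better, which means inverting cost (and latency). A Python sketch of two of them — the clamp bounds here are illustrative, not the site's real calibration:

```python
def normalize_cost(price_per_1m, cheapest=0.1, priciest=75.0):
    """Map price onto 0..1 where cheaper scores higher.
    Bounds are example clamps, not real calibration values."""
    clamped = min(max(price_per_1m, cheapest), priciest)
    return 1.0 - (clamped - cheapest) / (priciest - cheapest)

def normalize_context(context_window, ceiling=1_000_000):
    """Bigger context is better, capped at a ceiling so one outlier
    model does not flatten the scale for everyone else."""
    return min(context_window, ceiling) / ceiling
```

Clamping matters: without it, a single free model or a single 10M-token context window would compress every other model's score toward zero.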
Performance on Shared Hosting
People raise eyebrows when I mention shared hosting. But with Cloudflare caching aggressively and pages being largely static content regenerated every 24 hours, our TTFB sits under 200ms for cached pages. The monthly hosting cost is under 10 euros.
The key insight: comparison data does not change every second. A 24-hour cache with manual purge capability handles 99% of cases. When OpenAI drops a new model, we purge the relevant cache keys and regenerate.
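The purge step is mechanical: collect every page that mentions the model and purge those URLs. A Python sketch that builds the request body for Cloudflare's purge-by-URL endpoint (`POST /zones/{zone_id}/purge_cache`); the path structure under our domain is illustrative, and actually sending the request needs an API token and zone id:

```python
import json

SITE = "https://aichatbotcompare.com"  # comparison paths below are illustrative

def purge_payload(model_slug, affected_comparisons):
    """Build the JSON body for Cloudflare's purge-by-URL endpoint.
    Only the payload is shown, not the authenticated request."""
    urls = [f"{SITE}/compare/{other}-vs-{model_slug}" for other in affected_comparisons]
    urls.append(f"{SITE}/chatbot/{model_slug}")
    return json.dumps({"files": urls})
```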
Structured Data for SEO
Every comparison page outputs JSON-LD structured data. This was non-negotiable for getting rich results in search:
{
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "ChatGPT Plus",
    "offers": {
        "@type": "Offer",
        "price": "20.00",
        "priceCurrency": "USD"
    },
    "review": {
        "@type": "Review",
        "reviewRating": {
            "@type": "Rating",
            "ratingValue": "4.5"
        }
    }
}
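Because the data lives in one normalized table, emitting this markup is a template away. A Python sketch of the generator (ours is PHP; the function and its parameters are illustrative):

```python
import json

def product_jsonld(name, monthly_price, rating):
    """Render schema.org Product JSON-LD for a chatbot page.
    Mirrors the structure shown above; values come from the database."""
    return json.dumps({
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "offers": {
            "@type": "Offer",
            "price": f"{monthly_price:.2f}",
            "priceCurrency": "USD",
        },
        "review": {
            "@type": "Review",
            "reviewRating": {"@type": "Rating", "ratingValue": str(rating)},
        },
    })
```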
Lessons Learned
- Vendor APIs are not reliable for pricing data. Scraping the public pricing page is often more accurate than what their API returns.
- Users care about recency. Showing a "last updated" timestamp on every data point builds trust.
- Simple tech scales further than you think. We handle thousands of daily visitors with a setup that costs less than a single AI API call.
- Comparison tables need mobile-first design. Our first version was unusable on phones. Horizontal scrolling with sticky first columns solved it.
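That last fix is only a few lines of CSS — a sketch with illustrative class names, assuming the table sits inside a horizontally scrollable wrapper:

```css
.compare-wrap { overflow-x: auto; }

.compare-wrap th:first-child,
.compare-wrap td:first-child {
  position: sticky;
  left: 0;
  background: #fff; /* opaque, so scrolling cells pass underneath */
  z-index: 1;
}
```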
What's Next
We are working on letting users submit their own benchmark prompts and see results across models. The challenge is rate limiting and cost management — running a prompt through 8 different APIs adds up fast.
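The arithmetic behind "adds up fast" is worth making explicit. A Python sketch of the per-prompt fan-out cost, using illustrative prices rather than any vendor's current list prices:

```python
def fanout_cost(prompt_tokens, expected_output_tokens, models):
    """Estimate the cost of running one user benchmark prompt across
    every model. `models` maps name -> (input $/1M tok, output $/1M tok);
    the prices passed in are the caller's, not hardcoded list prices."""
    total = 0.0
    for name, (inp_price, out_price) in models.items():
        total += prompt_tokens / 1e6 * inp_price
        total += expected_output_tokens / 1e6 * out_price
    return total
```

Even at modest prices, one 1,000-token prompt with 1,000-token answers across eight models costs a few cents — which, multiplied by thousands of visitors, is exactly the rate-limiting problem.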
If you are building something similar, the biggest advice I can give: start with the data model. Get your normalization right, and the frontend almost builds itself.
Check out the live comparisons at aichatbotcompare.com