When we set out to build AI Chat Bot Compare, the goal was straightforward: let users compare AI chatbots side-by-side with real, reproducible results. The execution turned out to be anything but simple.
This post covers the key technical decisions we made, what worked, and what we had to rethink along the way.
The Core Problem
Most AI chatbot comparisons online are opinion pieces. Someone tries ChatGPT and Claude for a few prompts, writes up their subjective take, and calls it a day. We wanted something more structured — a platform where users could see standardized benchmark results, pricing breakdowns, and feature matrices updated in near real-time.
Architecture Overview
The stack is intentionally boring:
- Frontend: Server-side rendered PHP with vanilla JS for interactive comparison tables
- Data Layer: MySQL with a caching layer for API responses
- Crawling: Custom scrapers that pull pricing pages, changelogs, and documentation daily
- Hosting: Shared hosting behind Cloudflare (yes, really — more on this below)
Why Not a SPA Framework?
SEO was a first-class requirement from day one. We needed every comparison page to be indexable, fast-loading, and crawlable without JavaScript execution. SSR with Next.js or Nuxt was an option, but plain PHP with clean HTML output gave us the fastest time-to-first-byte and zero hydration overhead.
For a content-heavy comparison site, this matters more than developer experience.
The Comparison Data Pipeline
Here is where things got interesting. Each AI chatbot vendor structures their information differently:
- OpenAI publishes pricing on a clean page but buries model capabilities across blog posts
- Anthropic has clear documentation but changes model names frequently
- Google scatters Gemini information across multiple subdomains
- Mistral, Cohere, etc. — each with their own quirks
We built a normalized schema:
CREATE TABLE chatbots (
    id INT PRIMARY KEY AUTO_INCREMENT,
    vendor VARCHAR(100),
    model_name VARCHAR(200),
    model_family VARCHAR(100),
    context_window INT,
    input_price_per_1m DECIMAL(10,4),
    output_price_per_1m DECIMAL(10,4),
    supports_vision BOOLEAN,
    supports_function_calling BOOLEAN,
    max_output_tokens INT,
    last_updated TIMESTAMP
);
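Each scraper's job is to map vendor-specific output onto this shared shape. Here is a Python sketch of that normalization step (the raw field names are illustrative, not what the vendors actually publish, and the real scrapers are PHP):

```python
# Sketch: map vendor-specific scraped fields onto the shared schema.
# The raw dict keys are hypothetical examples of per-vendor quirks.

def normalize_openai(raw: dict) -> dict:
    return {
        "vendor": "OpenAI",
        "model_name": raw["model"],
        "context_window": raw["context_length"],
        # Prices assumed to already be per 1M tokens
        "input_price_per_1m": float(raw["input_price"]),
        "output_price_per_1m": float(raw["output_price"]),
    }

def normalize_anthropic(raw: dict) -> dict:
    return {
        "vendor": "Anthropic",
        "model_name": raw["name"],
        "context_window": raw["context_window"],
        "input_price_per_1m": float(raw["input_per_mtok"]),
        "output_price_per_1m": float(raw["output_per_mtok"]),
    }

NORMALIZERS = {"openai": normalize_openai, "anthropic": normalize_anthropic}

def normalize(vendor: str, raw: dict) -> dict:
    """Dispatch to the right per-vendor normalizer."""
    return NORMALIZERS[vendor](raw)
```

The payoff is that everything downstream — comparison tables, scoring, structured data — only ever sees one record shape.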
A daily cron job runs scrapers that populate this table. When a scraper fails (which happens often — vendors love redesigning their pricing pages), it flags the entry for manual review instead of silently serving stale data.
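The fail-loudly policy is simple to express. A Python sketch of the cron job's core loop (the store interface and its method names are illustrative stand-ins, not our actual code):

```python
from datetime import datetime, timezone

def run_scraper(scrape, store):
    """Run one vendor scraper. On failure, flag the vendor's rows for
    manual review rather than overwriting them with stale or empty data."""
    try:
        rows = scrape()
    except Exception as exc:
        # Hypothetical store API: mark entries so a human reviews them
        store.flag_for_review(reason=str(exc))
        return False
    for row in rows:
        row["last_updated"] = datetime.now(timezone.utc).isoformat()
        store.upsert(row)
    return True
```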
Handling Comparison Logic
The naive approach would be a giant if-else tree. Instead, we built a scoring system with weighted dimensions:
function calculateScore($chatbot, $useCase) {
    $weights = $useCase->getWeights();
    $score = 0;
    // Each dimension is normalized to a 0..1 scale, then weighted per use case
    $score += $weights['price']   * normalizeCost($chatbot->input_price_per_1m);
    $score += $weights['context'] * normalizeContext($chatbot->context_window);
    $score += $weights['speed']   * normalizeLatency($chatbot->avg_latency);
    $score += $weights['quality'] * $chatbot->benchmark_score;
    return $score;
}
Users can select a use case (coding, writing, analysis, customer support) and the weights shift accordingly. A coding-focused comparison values function calling support and context window heavily; a customer support comparison prioritizes cost and latency.
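The normalize helpers in the scoring function do the real work: each one maps a raw value onto a 0..1 scale where higher is better, which means inverting cost (and latency). A Python sketch of two of them — the clamp bounds here are illustrative, not the site's real calibration:

```python
def normalize_cost(price_per_1m, cheapest=0.1, priciest=75.0):
    """Map price onto 0..1 where cheaper scores higher.
    Bounds are example clamps, not real calibration values."""
    clamped = min(max(price_per_1m, cheapest), priciest)
    return 1.0 - (clamped - cheapest) / (priciest - cheapest)

def normalize_context(context_window, ceiling=1_000_000):
    """Bigger context is better, capped at a ceiling so one outlier
    model does not flatten the scale for everyone else."""
    return min(context_window, ceiling) / ceiling
```

Clamping matters: without it, a single free model or a single 10M-token context window would compress every other model's score toward zero.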
Performance on Shared Hosting
People raise eyebrows when I mention shared hosting. But with Cloudflare caching aggressively and pages being largely static content regenerated every 24 hours, our TTFB sits under 200ms for cached pages. The monthly hosting cost is under 10 euros.
The key insight: comparison data does not change every second. A 24-hour cache with manual purge capability handles 99% of cases. When OpenAI drops a new model, we purge the relevant cache keys and regenerate.
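The purge step is mechanical: collect every page that mentions the model and purge those URLs. A Python sketch that builds the request body for Cloudflare's purge-by-URL endpoint (`POST /zones/{zone_id}/purge_cache`); the path structure under our domain is illustrative, and actually sending the request needs an API token and zone id:

```python
import json

SITE = "https://aichatbotcompare.com"  # comparison paths below are illustrative

def purge_payload(model_slug, affected_comparisons):
    """Build the JSON body for Cloudflare's purge-by-URL endpoint.
    Only the payload is shown, not the authenticated request."""
    urls = [f"{SITE}/compare/{other}-vs-{model_slug}" for other in affected_comparisons]
    urls.append(f"{SITE}/chatbot/{model_slug}")
    return json.dumps({"files": urls})
```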
Structured Data for SEO
Every comparison page outputs JSON-LD structured data. This was non-negotiable for getting rich results in search:
{
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "ChatGPT Plus",
    "offers": {
        "@type": "Offer",
        "price": "20.00",
        "priceCurrency": "USD"
    },
    "review": {
        "@type": "Review",
        "reviewRating": {
            "@type": "Rating",
            "ratingValue": "4.5"
        }
    }
}
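Because the data lives in one normalized table, emitting this markup is a template away. A Python sketch of the generator (ours is PHP; the function and its parameters are illustrative):

```python
import json

def product_jsonld(name, monthly_price, rating):
    """Render schema.org Product JSON-LD for a chatbot page.
    Mirrors the structure shown above; values come from the database."""
    return json.dumps({
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "offers": {
            "@type": "Offer",
            "price": f"{monthly_price:.2f}",
            "priceCurrency": "USD",
        },
        "review": {
            "@type": "Review",
            "reviewRating": {"@type": "Rating", "ratingValue": str(rating)},
        },
    })
```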
Lessons Learned
- Vendor APIs are not reliable for pricing data. Scraping the public pricing page is often more accurate than what their API returns.
- Users care about recency. Showing a "last updated" timestamp on every data point builds trust.
- Simple tech scales further than you think. We handle thousands of daily visitors with a setup that costs less than a single AI API call.
- Comparison tables need mobile-first design. Our first version was unusable on phones. Horizontal scrolling with sticky first columns solved it.
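That last fix is only a few lines of CSS — a sketch with illustrative class names, assuming the table sits inside a horizontally scrollable wrapper:

```css
.compare-wrap { overflow-x: auto; }

.compare-wrap th:first-child,
.compare-wrap td:first-child {
  position: sticky;
  left: 0;
  background: #fff; /* opaque, so scrolling cells pass underneath */
  z-index: 1;
}
```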
What's Next
We are working on letting users submit their own benchmark prompts and see results across models. The challenge is rate limiting and cost management — running a prompt through 8 different APIs adds up fast.
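The arithmetic behind "adds up fast" is worth making explicit. A Python sketch of the per-prompt fan-out cost, using illustrative prices rather than any vendor's current list prices:

```python
def fanout_cost(prompt_tokens, expected_output_tokens, models):
    """Estimate the cost of running one user benchmark prompt across
    every model. `models` maps name -> (input $/1M tok, output $/1M tok);
    the prices passed in are the caller's, not hardcoded list prices."""
    total = 0.0
    for name, (inp_price, out_price) in models.items():
        total += prompt_tokens / 1e6 * inp_price
        total += expected_output_tokens / 1e6 * out_price
    return total
```

Even at modest prices, one 1,000-token prompt with 1,000-token answers across eight models costs a few cents — which, multiplied by thousands of visitors, is exactly the rate-limiting problem.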
If you are building something similar, the biggest advice I can give: start with the data model. Get your normalization right, and the frontend almost builds itself.
Check out the live comparisons at aichatbotcompare.com