kol kol

Posted on May 19

Building a Developer Knowledge Base That Scaled to 2,800+ Articles: Architecture & Lessons Learned

#codcompass #ai #knowledgebase #webdev

When we started building our developer knowledge base, we had a simple question: how do you serve thousands of technical articles with fast load times, great SEO, and maintainable architecture?

Nine months later, the answer covers Next.js, Supabase, smart caching, and some hard-won lessons about scale. Here's what actually worked.

The Numbers That Matter

2,849 published articles across 31 categories
1,815 pages indexed by Google (still growing)
Sub-2-second page loads on all content pages
Zero server costs for content delivery (static-first architecture)

These aren't vanity metrics — they're the result of deliberate architectural choices.

The Tech Stack

Next.js 15 — Static-First, Dynamic When Needed

We use Incremental Static Regeneration (ISR) as the default. Article pages are generated at build time and then kept fresh in two ways: a time-based revalidation window (3600 seconds) and on-demand revalidation when content changes.

/articles/slug → ISR (revalidate: 3600)
/kb → SSG with pagination
/api/* → Edge functions for search

The key insight: 95% of traffic hits content pages. Those should be static. Only search, newsletter signup, and checkout need dynamic rendering.
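A minimal sketch of what the ISR setup for the article route could look like, assuming the Next.js App Router (the post does not say which router is in use). `revalidate` and `generateStaticParams` are standard route segment exports; the file path, the `listPublishedSlugs` helper, and the sample slugs are hypothetical placeholders.

```typescript
// Hypothetical file: app/articles/[slug]/page.tsx (config portion only).

// Time-based ISR: a cached page is treated as fresh for up to one hour.
export const revalidate = 3600;

// Hypothetical stand-in for a metadata lookup. A real version would
// query the articles table for published slugs.
async function listPublishedSlugs(): Promise<string[]> {
  return ["optimize-postgresql-queries", "nextjs-isr-basics"];
}

// Tells Next.js which /articles/[slug] pages to pre-render at build time.
export async function generateStaticParams(): Promise<{ slug: string }[]> {
  const slugs = await listPublishedSlugs();
  return slugs.map((slug) => ({ slug }));
}
```

Search and other dynamic routes would simply not export these values, so they keep rendering per request.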

Supabase + Prisma — The Data Layer

All article metadata lives in PostgreSQL via Supabase. The actual content is stored as Markdown files in Git, with a metadata table for:

Article slugs and categories
SEO titles and descriptions (long-tail keyword optimized)
Publication dates and canonical URLs
Simhash fingerprints for duplicate detection

Why this split? Content in Git = version control, easy editing, free hosting. Metadata in DB = fast queries, faceted search, analytics.

Vercel — The Hosting

Edge network for static assets. Serverless functions for dynamic routes. Built-in ISR support. The combination is hard to beat for content-heavy sites.

Architecture Decisions That Paid Off

1. Pagination at the Database Level

With 2,849 articles, loading every title in a single request was impractical. We implemented:

12 articles per page on knowledge base index
URL-based pagination (?page=2&category=ai-llm)
Category filtering in the same query

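// Supabase query pattern: one page of published articles in a category
const PAGE_SIZE = 12;
const offset = (page - 1) * PAGE_SIZE; // page is 1-based, read from ?page=
const { data, count, error } = await supabase
  .from('articles')
  .select('*', { count: 'exact' }) // count = total matching rows, for page links
  .eq('published', true)
  .eq('category', selectedCategory)
  .order('publishedAt', { ascending: false })
  .range(offset, offset + PAGE_SIZE - 1); // range() is inclusive on both ends
if (error) throw error;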
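The URL-based pagination described above can be sketched as a small pure helper that turns a query string such as `?page=2&category=ai-llm` into the inclusive row range a paginated query needs. The names `parsePageQuery`, `PageQuery`, and `totalPages` are illustrative, not taken from the actual codebase.

```typescript
const PAGE_SIZE = 12; // articles per page on the knowledge base index

interface PageQuery {
  page: number; // 1-based page number
  category: string | null; // null when no category filter is present
  from: number; // inclusive start row
  to: number; // inclusive end row
}

function parsePageQuery(search: string): PageQuery {
  const params = new URLSearchParams(search);
  const parsed = parseInt(params.get("page") ?? "1", 10);
  // Fall back to page 1 for missing, non-numeric, or non-positive values.
  const page = parsed > 0 ? parsed : 1;
  const from = (page - 1) * PAGE_SIZE;
  return {
    page,
    category: params.get("category"),
    from,
    to: from + PAGE_SIZE - 1,
  };
}

// Number of pages needed for `count` rows (always at least one page).
function totalPages(count: number): number {
  return Math.max(1, Math.ceil(count / PAGE_SIZE));
}
```

With 2,849 articles and 12 per page, `totalPages(2849)` gives 238 pages for the unfiltered index.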

2. SEO as a First-Class Feature

Every article has:

Long-tail keyword optimized titles (e.g., "How to Optimize PostgreSQL Queries in Supabase for Large Datasets" instead of "PostgreSQL Tips")
Meta descriptions generated per-article
JSON-LD structured data for rich search results
Canonical URLs to avoid duplicate content issues

Result: 1,815 pages indexed by Google with zero paid promotion.
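As an illustration of the structured-data piece, here is a minimal sketch that builds a JSON-LD payload from per-article metadata. It assumes the schema.org `TechArticle` type; the post does not say which type is actually used, and the `ArticleSeo` shape and `buildArticleJsonLd` name are hypothetical.

```typescript
interface ArticleSeo {
  title: string;
  description: string;
  canonicalUrl: string;
  publishedAt: string; // ISO 8601 date, e.g. "2025-05-19"
}

function buildArticleJsonLd(article: ArticleSeo): string {
  const jsonLd = {
    "@context": "https://schema.org",
    "@type": "TechArticle",
    headline: article.title,
    description: article.description,
    datePublished: article.publishedAt,
    mainEntityOfPage: article.canonicalUrl,
  };
  // Escape "<" so the payload cannot close the surrounding <script> tag.
  return JSON.stringify(jsonLd).replace(/</g, "\\u003c");
}
```

The returned string is meant to be placed inside a `<script type="application/ld+json">` element in the page head.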

3. Newsletter Integration Without Complexity

A compact newsletter signup form on every article page, writing to a Supabase table. Simple, effective, growing organically.

What Didn't Work (And What We Changed)

Mistake 1: Dynamic Rendering Everything

Initially, every page was SSR. This meant every request hit the database. At 100 articles, fine. At 2,849, expensive and slow.

Fix: Moved to ISR for content pages, SSR only for search and user-specific pages.

Mistake 2: No Duplicate Detection

When we batch-generated content, duplicates slipped through — same topic, slightly different angle, indexed separately.

Fix: Implemented simhash-based deduplication in the batch pipeline. New articles are fingerprinted before publishing.
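For readers unfamiliar with simhash, here is a minimal sketch of the idea, not the pipeline's actual code. It makes several simplifying assumptions: a 32-bit fingerprint (production systems commonly use 64 bits), FNV-1a as the token hash, unweighted ASCII word tokens (so Chinese text would need a different tokenizer), and an arbitrary near-duplicate threshold of 3 differing bits.

```typescript
// 32-bit FNV-1a hash of a token, returned as an unsigned integer.
function fnv1a32(token: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < token.length; i++) {
    hash ^= token.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0;
}

// Simhash: each token votes +1 or -1 on every bit position;
// the fingerprint keeps the bits whose vote total is positive.
function simhash32(text: string): number {
  const weights: number[] = [];
  for (let bit = 0; bit < 32; bit++) weights.push(0);
  const tokens = text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  for (const token of tokens) {
    const h = fnv1a32(token);
    for (let bit = 0; bit < 32; bit++) {
      weights[bit] += ((h >>> bit) & 1) === 1 ? 1 : -1;
    }
  }
  let fingerprint = 0;
  for (let bit = 0; bit < 32; bit++) {
    if (weights[bit] > 0) fingerprint |= 1 << bit;
  }
  return fingerprint >>> 0;
}

// Number of bit positions where two fingerprints differ.
function hammingDistance(a: number, b: number): number {
  let diff = (a ^ b) >>> 0;
  let count = 0;
  while (diff !== 0) {
    count += diff & 1;
    diff >>>= 1;
  }
  return count;
}

// maxDistance is a hypothetical threshold; tune it on real data.
function isNearDuplicate(a: string, b: string, maxDistance = 3): boolean {
  return hammingDistance(simhash32(a), simhash32(b)) <= maxDistance;
}
```

Because similar texts share most tokens, their fingerprints tend to differ in only a few bits, so a new article can be compared against stored fingerprints before it is published.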

Mistake 3: Ignoring Category Imbalance

Some categories had 87 articles. Others had 5. This hurt both user experience and SEO.

Fix: Targeted content generation, aiming for 40+ articles per category. We aren't there yet, but all 31 categories now have at least 28 articles.
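Spotting the imbalance is a simple counting exercise. The sketch below assumes you already have the category of every published article as a flat list; `categoriesBelowTarget` and its parameters are illustrative names.

```typescript
interface CategoryGap {
  category: string;
  shortfall: number; // how many more articles are needed to reach the target
}

function categoriesBelowTarget(
  articleCategories: string[], // one entry per published article
  allCategories: string[], // every category, including empty ones
  target: number,
): CategoryGap[] {
  // Object.create(null) avoids collisions with inherited keys.
  const counts: Record<string, number> = Object.create(null);
  for (const category of allCategories) counts[category] = 0;
  for (const category of articleCategories) {
    if (category in counts) counts[category] += 1;
  }
  return allCategories
    .filter((category) => counts[category] < target)
    .map((category) => ({ category, shortfall: target - counts[category] }))
    .sort((a, b) => b.shortfall - a.shortfall); // largest gaps first
}
```

The output is a prioritized list of where new content is most needed.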

The SEO Results (Without Paid Promotion)

1,815 indexed pages on Google
Sitemap auto-generated and submitted to Search Console
AI crawlers (GPTBot, PerplexityBot, ClaudeBot) all allowed via robots.txt
Long-tail keyword strategy: "How to [specific problem] in [specific technology]" titles outperform generic ones by 3-5x in click-through rate
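The crawler policy above can be expressed as a small generator for `robots.txt`. This is a sketch under assumptions: the three user-agent names come from the list above, while the `buildRobotsTxt` name and the sitemap URL passed in are placeholders.

```typescript
const AI_CRAWLERS = ["GPTBot", "PerplexityBot", "ClaudeBot"];

function buildRobotsTxt(sitemapUrl: string): string {
  // Default rule: every crawler may fetch everything.
  const lines: string[] = ["User-agent: *", "Allow: /", ""];
  // Explicit groups make the policy for AI crawlers unambiguous.
  for (const bot of AI_CRAWLERS) {
    lines.push(`User-agent: ${bot}`, "Allow: /", "");
  }
  lines.push(`Sitemap: ${sitemapUrl}`);
  return lines.join("\n") + "\n";
}
```

Calling `buildRobotsTxt("https://example.com/sitemap.xml")` yields one `User-agent` group per crawler followed by the sitemap reference.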

The Cost

Here's the honest breakdown:

Hosting (Vercel): $0 (the free tier covers our traffic)
Database (Supabase): $0 (Free tier, 500MB, plenty for metadata)
Domain: $12/year
Total: about $12/year to serve 2,849 articles

Lessons for Anyone Building Content at Scale

Static > Dynamic for content. ISR is your friend.
Invest in SEO metadata early. Retrofitting 2,000 articles is painful.
Deduplicate before you publish. Simhash is cheap and effective.
Balance your categories. Both users and search engines notice.
Keep the stack boring. Next.js + Supabase + Vercel is not sexy. It works.

What's Next

We're working on:

AI-powered article recommendations based on reading patterns
Community contributions with PR-based workflow
Multi-language support beyond the current en/zh
Analytics dashboard for content performance

Built with Next.js 15, Supabase, and a lot of coffee. If you're building a content platform or knowledge base, happy to share more details — drop a comment.
