
Apex Stack

Posted on • Originally published at Medium

I Built a 287,000-Page Website. Here's What I Learned About Programmatic SEO.

Most SEO advice boils down to the same thing: pick a keyword, write an article, wait three months, repeat. If you want 10x the traffic, you write 10x the content. That math doesn't work if you're one person.

About a year ago I started experimenting with a different approach. Instead of writing articles one by one, I built a system that generates pages programmatically — one template, one data pipeline, thousands of output pages. Each page targets a specific long-tail keyword. The effort goes into building the machine, not feeding it.

The site I built is a stock comparison engine. You type in two tickers and get a side-by-side breakdown: financials, dividends, growth metrics, the works. Simple concept, but at scale it covers every meaningful stock pair — across 12 languages.

287,000 pages. One person. No content team.

Here's what actually happened.

The Stack

Nothing exotic. Astro for static site generation — fast, SEO-friendly, handles thousands of routes without breaking. Supabase-hosted PostgreSQL for the data layer. The yfinance Python library for pulling financial data. A local Llama 3 instance for generating the narrative sections on each page.

Total monthly cost: under $50.

The key insight was separating data from presentation. The database holds structured financial data for 8,000+ tickers. The templates define how that data gets rendered into comparison pages. The AI fills in the gaps — generating human-readable analysis that's unique to each pair.

I won't go deep on the code here, but the architecture looks like this:

Data Layer (Supabase PostgreSQL)
    ↓
ETL Pipeline (Python + yfinance)
    ↓
Content Generation (Llama 3 local)
    ↓
Static Site Generator (Astro)
    ↓
CDN (Cloudflare)
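The ETL layer's transform step can be sketched in a few lines. In the real pipeline the `info` dict comes from yfinance (`yf.Ticker(symbol).info`); the column names here are illustrative, not my actual schema:

```python
# Map a yfinance-style `info` dict onto the comparison table's columns.
# Field names on the right are illustrative stand-ins for the real schema.
FIELD_MAP = {
    "trailingPE": "pe_ratio",
    "dividendYield": "dividend_yield",
    "marketCap": "market_cap",
}

def to_row(symbol: str, info: dict) -> dict:
    """Normalize one ticker's fundamentals into a database row."""
    row = {"symbol": symbol}
    for src, dest in FIELD_MAP.items():
        row[dest] = info.get(src)  # missing fields become NULLs downstream
    return row
```

Keeping the transform pure like this means you can unit-test it without hitting Yahoo Finance or the database.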

Each layer is independent. If I want to swap Astro for Next.js, the data layer doesn't care. If I want to switch from Llama 3 to Claude, the templates don't change. This modularity proved essential when I needed to iterate quickly.
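To make the data/presentation split concrete, here's a stripped-down sketch of the render step — the metric names and HTML are illustrative, not the production template:

```python
def render_comparison(ticker_a: dict, ticker_b: dict, narrative: str) -> str:
    """Combine structured financials with an AI-written narrative section."""
    # Only render metrics present for both tickers.
    rows = "\n".join(
        f"<tr><td>{metric}</td><td>{ticker_a[metric]}</td><td>{ticker_b[metric]}</td></tr>"
        for metric in ("pe_ratio", "dividend_yield", "revenue_growth")
        if metric in ticker_a and metric in ticker_b
    )
    return (
        f"<h1>{ticker_a['symbol']} vs {ticker_b['symbol']}</h1>\n"
        f"<table><tr><th>Metric</th><th>{ticker_a['symbol']}</th>"
        f"<th>{ticker_b['symbol']}</th></tr>\n{rows}</table>\n"
        f"<section>{narrative}</section>"
    )
```

The template never touches the database and never calls the model; it just receives data and text, which is what makes the layers swappable.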

What Worked

Long-tail coverage is insane. When your site has a page for literally every stock pair, you're catching searches nobody else bothers to target. "[Random small-cap] vs [other random small-cap]" has almost zero competition. Volume per page is tiny. Volume across 287,000 pages adds up.

Multilingual was easier than expected. Once you have the template and data pipeline, translating to 12 languages is mostly a matter of translating the template strings and regenerating. The data — numbers, tickers, percentages — stays the same. I used AI translation for the initial pass, then cleaned up the most important pages manually.
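The string-table approach, roughly — the keys and translations here are illustrative, not my actual locale files:

```python
# Only UI strings are localized; the numeric data is language-independent.
STRINGS = {
    "en": {"title": "{a} vs {b}: Which Stock Is Better?", "yield": "Dividend yield"},
    "de": {"title": "{a} vs. {b}: Welche Aktie ist besser?", "yield": "Dividendenrendite"},
    "es": {"title": "{a} vs {b}: ¿Qué acción es mejor?", "yield": "Rentabilidad por dividendo"},
}

def page_title(lang: str, a: str, b: str) -> str:
    """Render a localized page title for a ticker pair."""
    return STRINGS[lang]["title"].format(a=a, b=b)
```

Adding a language is then just adding one dictionary and rerunning the build.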

Schema markup at scale pays off. Every page gets FinancialProduct schema, FAQ schema, and BreadcrumbList markup. Programmatic means you write the schema once, it applies everywhere. This is one of those things that would be insane to do manually across 287k pages but is trivial when it's built into the template.
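Here's roughly what template-level schema generation looks like for the breadcrumb markup — the URL structure is illustrative:

```python
import json

def breadcrumb_jsonld(base_url: str, a: str, b: str) -> str:
    """Emit BreadcrumbList JSON-LD for one comparison page."""
    data = {
        "@context": "https://schema.org",
        "@type": "BreadcrumbList",
        "itemListElement": [
            {"@type": "ListItem", "position": 1, "name": "Home", "item": base_url},
            {"@type": "ListItem", "position": 2, "name": "Compare",
             "item": f"{base_url}/compare"},
            {"@type": "ListItem", "position": 3, "name": f"{a} vs {b}",
             "item": f"{base_url}/compare/{a.lower()}-vs-{b.lower()}"},
        ],
    }
    return f'<script type="application/ld+json">{json.dumps(data)}</script>'
```

Write this once in the template and every generated page — all 287k of them — carries valid structured data.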

The build pipeline is surprisingly stable. I expected constant breakdowns with this many pages. But because everything is static and generated from structured data, there's very little that can go wrong at runtime. The site just serves HTML files. No server-side processing, no database queries on page load, nothing to crash.

What Didn't Work

Here's the thing nobody warns you about with programmatic SEO: Google doesn't want to index 287,000 pages from a brand new domain with zero backlinks.

Out of 287k pages, Google indexed about 2,500. That's a 0.9% index rate. Brutal.

The root causes were obvious in hindsight:

No domain authority. New domain, no backlinks, no brand recognition. Google had no reason to allocate crawl budget to my site when established financial sites exist.

Content similarity. While each page had unique financial data, the narrative sections followed similar patterns. Google's helpful content system flagged some pages as thin — not because they lacked information, but because the structure was too uniform across pages.

Crawl budget is real. Google simply will not crawl 287k pages on a new domain. Watching Search Console, I saw Googlebot visit maybe 200-500 pages per day. At that rate it would take years to crawl everything — and Google's not going to crawl pages it doesn't think are worth indexing anyway.

The Fix (What I'm Doing Now)

The solution is counterintuitive: fewer pages, not more.

1. Cut pages ruthlessly. I'm going from 287k down to 5,000-30,000 pages per language. Only keeping comparison pairs that have real search demand — validated with Search Console data and keyword research tools. If nobody searches for "[obscure penny stock] vs [other obscure penny stock]," that page doesn't need to exist.
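The pruning step is simple once you have demand numbers per comparison slug. A sketch, with an arbitrary threshold (the data shape — slug to monthly impressions from Search Console exports — is an assumption):

```python
def prune_pairs(pairs: dict, min_impressions: int = 10) -> list:
    """Keep only comparison slugs with demonstrated search demand."""
    return sorted(slug for slug, imp in pairs.items() if imp >= min_impressions)
```

Everything below the threshold simply never gets generated in the next build, so there's nothing to deindex later.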

2. Thicken remaining pages. Each surviving page needs genuine unique value beyond plugging different numbers into the same template. I'm adding sector context, historical trend analysis, dividend deep-dives, and custom AI-generated insights that are actually specific to each stock pair. The goal is that every page could stand on its own as a useful resource.

3. Build backlinks. There's no shortcut here. I started with directory submissions (boring but necessary), moved to industry-specific directories, and I'm now doing targeted outreach. The goal is DR 15+ within 6 months.

4. Optimize crawl budget. Better sitemap strategy — instead of one massive sitemap, I broke it into smaller topic-based sitemaps. Improved internal linking so Googlebot can discover important pages through the site structure, not just the sitemap. Removed low-value pages from the index with noindex tags.
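Splitting the sitemap is a few lines. The 10,000-URL chunk size here is my own choice, not a requirement — the sitemap protocol allows up to 50,000 URLs per file:

```python
from itertools import islice

def chunk(urls: list, size: int = 10_000):
    """Yield successive batches of URLs for separate sitemap files."""
    it = iter(urls)
    while batch := list(islice(it, size)):
        yield batch

def sitemap_xml(urls: list) -> str:
    """Render one sitemap file for a batch of URLs."""
    entries = "\n".join(f"  <url><loc>{u}</loc></url>" for u in urls)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}\n</urlset>"
    )
```

In practice I group batches by topic (sector, language) rather than just by count, so Search Console shows per-topic indexing rates.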

The Numbers

Let me be transparent about where things stand:

| Metric | Current |
| --- | --- |
| Total pages | 287,000 |
| Pages indexed | ~2,500 |
| Index rate | 0.9% |
| Domain Rating | 0 |
| Backlinks | 0 |
| Monthly revenue | $0 |
| Monthly cost | ~$50 |

Not pretty. But here's why I'm still bullish on this approach:

The infrastructure works. The data pipeline works. The content generation works. The site loads fast, passes Core Web Vitals, and has proper schema markup on every page. The only thing broken is Google's trust — and that's a solvable problem with time and backlinks.

Once indexing improves, the growth should compound quickly. Each indexed page targets keywords that virtually nobody else is competing for. And with 12 languages, the addressable market is massive.

Lessons If You're Considering This

Start smaller than you think. If I could do it over, I'd launch with 5,000 pages instead of 287,000. Get those indexed, prove the model works, then scale up. Launching with hundreds of thousands of pages on a new domain is just asking Google to ignore you.

Your data source is your moat. Anyone can build a template. The hard part is finding a data source that's both comprehensive and accessible. Financial data via yfinance was a good choice because it's free, structured, and covers thousands of entities. Think about what data you can get that's hard for others to replicate at scale.

The template + AI hybrid is the sweet spot. Pure template-based pages (just plugging in data) get flagged as thin. Pure AI-generated pages are expensive and inconsistent. The hybrid — structured data rendered by templates with AI-generated narrative sections — hits the balance of unique content at scale.
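If you're curious what the hybrid looks like in practice, here's the shape of a narrative prompt — illustrative, not my exact prompt:

```python
def narrative_prompt(a: dict, b: dict) -> str:
    """Build a grounded prompt: the model narrates, the data constrains it."""
    return (
        f"Write a three-paragraph comparison of {a['symbol']} and {b['symbol']}.\n"
        "Ground every claim in this data:\n"
        f"- {a['symbol']}: P/E {a['pe_ratio']}, dividend yield {a['dividend_yield']}%\n"
        f"- {b['symbol']}: P/E {b['pe_ratio']}, dividend yield {b['dividend_yield']}%\n"
        "Avoid boilerplate phrasing; mention one sector-specific consideration."
    )
```

The structured facts come from the database, so the model's job is narrow: explain, not invent.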

Budget for backlink building from day one. I made the mistake of assuming great content would earn links naturally. It doesn't when you have zero domain authority. Build backlink acquisition into your launch plan, not as an afterthought.

Programmatic SEO is a patience game. This isn't a "launch and rank tomorrow" strategy. It's an infrastructure play. You're building a machine that compounds over time. The first 6 months will feel slow. That's normal.

What's Next

I'm documenting everything as I go — the full technical architecture, every prompt I use for content generation, the monetization roadmap, and all the mistakes so you don't repeat them.

I've packaged it all into a guide called the Programmatic SEO Blueprint. It covers niche selection, data architecture, AI content generation, the Astro/Next.js implementation, SEO infrastructure, the indexing problem and how to solve it, and monetization strategy. All code examples are MIT licensed.

If you're thinking about building something like this, I'd say go for it. Just start smaller than I did.


If you found this useful, follow me for more on building programmatic SEO sites. I'll be posting updates as the indexing situation improves.
