Spent the past week running an experiment: can a programmatic SEO directory site survive Google's 2024 Helpful Content Update if every page is AI-generated?
Rather than pick one niche and bet the farm, I built three parallel sites with the same stack — different content categories, identical architecture, single content generation pipeline. They share a monorepo, deploy independently to Vercel, and refresh nightly via one GitHub Actions cron.
This post is the architecture write-up. I'll publish actual revenue/traffic numbers in a follow-up after 6 months.
## The three sites
- 🤖 Top AI Tools — ~500 open-source models from HuggingFace, with Claude-generated summaries, use cases, and FAQ
- 🎮 Find Games Like — "games like X" recommendations for indie titles on Steam, with AI-curated similarity reasoning and "avoid if" caveats
- 🛠 Open Alternative To — open-source replacements for ~80 popular SaaS products, refreshed daily from GitHub stars and last-pushed timestamps
All three are static-generated, all three rebuild nightly, all three share editorial logic from a single TypeScript package.
## Why three sites instead of one
Three reasons:
- Cheap insurance against niche choice failure. I don't know which niche Google will tolerate in 2026. Three uncorrelated bets > one big one.
- Shared ETL infrastructure. The cost of running sites #2 and #3 is mostly the marginal Claude API tokens (~$2/site/month); hosting and code are amortized across all three.
- A/B testing categories. AI-tools is the most saturated PSEO niche on Earth. SaaS-alternative goes head-to-head with alternativeto.net (DA80+, years of accumulated authority). Indie games is the underdog with the cleanest niche fit. After 6 months, the data will tell me which thesis was right.
## Stack at a glance
| Layer | Tool | Why |
|---|---|---|
| Site framework | Astro 5 (SSG) | 100% static output, no runtime cost |
| Styling | Tailwind v4 | Newer engine, faster builds, smaller CSS |
| Content gen | Claude Haiku 4.5 via Anthropic SDK | Cheap, fast, sufficient for directory copy |
| Data store | Turso (libSQL) | ETL state + idempotency tracking |
| Cron | GitHub Actions matrix job | Free, version-controlled, reliable |
| Hosting | Vercel Pro | Fast SSG deploys, image optimization |
| Monorepo | pnpm + Turborepo | Workspace-aware builds, cached output |
## Total monthly cost
| Item | Cost |
|---|---|
| Vercel Pro | $20 |
| Anthropic API (Haiku 4.5, daily refresh) | ~$5 |
| Turso (free tier 500MB) | $0 |
| GitHub Actions (under 2k min/mo free quota) | $0 |
| Domains (Vercel subdomains until validation) | $0 |
| Total | ~$25/month |
## Repo layout
```
seo-farm/
├── apps/
│   ├── ai-tools/             # topaitools.vercel.app
│   ├── indie-games/          # findgameslike.vercel.app
│   ├── oss-alternatives/     # openalternativeto.vercel.app
│   └── dashboard/            # internal status page
├── packages/
│   ├── shared/               # Anthropic client, DB schema, monetization helpers
│   └── publish/              # the script that posted this article
└── .github/workflows/
    └── refresh-content.yml   # nightly cron
```
The three sites are intentionally separate Vercel projects (different roots), but share `@seo-farm/shared` for the Claude client, libSQL helpers, AdSense + Amazon affiliate components, and structured-data builders. Each site has its own ETL — HuggingFace API for ai-tools, Steam Web API + RAWG for indie-games, GitHub repo discovery for oss-alternatives.
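For reference, the workspace wiring implied by this layout is just two globs. A hypothetical `pnpm-workspace.yaml` consistent with the tree above (not the actual repo's file):

```yaml
# Hypothetical pnpm-workspace.yaml matching the apps/ + packages/ layout
packages:
  - "apps/*"
  - "packages/*"
```

Turborepo discovers the workspaces from here and caches each app's build output, so an unchanged app doesn't rebuild.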
## Content generation pipeline
The cron runs daily at 02:00 UTC and processes one app at a time (matrix `max-parallel: 1` to stay below Anthropic burst limits):
```yaml
strategy:
  matrix:
    app: [ai-tools, indie-games, oss-alternatives]
  max-parallel: 1
env:
  ETL_LIMIT: "500"
  GENERATE_LIMIT: "300"
```
Per app, the pipeline is:

1. ETL stage — fetch source data (HuggingFace models / Steam games / GitHub repos), upsert into Turso
2. Detect missing content — find rows where `generated_at` is null or older than 30 days
3. Generate with Haiku 4.5 — batch ~5 entries per call, one prompt per content type
4. Cache & dedupe — write back to Turso with a new `generated_at` timestamp
5. Trigger build — only if content changed, push a commit and let Vercel rebuild
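Steps 2 and 4 reduce to a staleness check plus fixed-size batching. A minimal sketch under assumed names (`ToolRow`, `isStale`, and `batchOf` are illustrative, not the actual pipeline code):

```typescript
// Illustrative types/names -- the real pipeline's schema isn't published.
type ToolRow = { slug: string; generated_at: string | null };

const STALE_AFTER_MS = 30 * 24 * 60 * 60 * 1000; // 30 days

// A row needs regeneration if it has never been generated, or is >30 days old.
function isStale(row: ToolRow, now: Date = new Date()): boolean {
  if (row.generated_at === null) return true;
  return now.getTime() - new Date(row.generated_at).getTime() > STALE_AFTER_MS;
}

// Split stale rows into groups of ~5 so each group becomes one Claude call.
function batchOf<T>(items: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```

Each batch becomes one Haiku call; a successful response writes a fresh `generated_at` back to Turso, which is what makes the nightly run idempotent.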
The hard part is step 3: prompt design. Generic "summarize this tool" prompts produce slop. What worked:
- One prompt per content type (summary / use cases / FAQ / pros-cons), never a single mega-prompt
- Strict format constraints ("3-5 bullet points, max 12 words each, no marketing language")
- Source-grounded context — only use info from the provided model card; refuse to fabricate benchmarks
- An "avoid if" caveat for game recs — most game directories only gush. Find Games Like prompt explicitly asks Claude to be honest about limitations
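To make those constraints concrete, here's what one such prompt could look like. This is a hedged reconstruction from the rules above, not the actual prompt; `similarGamesPrompt` is an invented name:

```typescript
// Invented helper; illustrates the format-constraint + source-grounding rules.
function similarGamesPrompt(game: string, storeDescription: string): string {
  return [
    `You are writing a directory entry for players searching "games like ${game}".`,
    "Rules:",
    "- 3-5 bullet points, max 12 words each, no marketing language.",
    "- Use only facts from the store description below. Never fabricate features.",
    '- End with one honest "avoid if" line about who will not enjoy this game.',
    "",
    "Store description:",
    storeDescription,
  ].join("\n");
}
```

One prompt like this per content type keeps outputs auditable: if a claim isn't in the store description, the prompt gives the model no license to invent it.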
For Find Games Like, the "avoid if" produces lines like:
> Celeste — avoid if you're uncomfortable with themes of anxiety and panic attacks
>
> Hades — avoid if you dislike permadeath roguelikes or need linear story progression
That's one of the few moments AI summaries actually beat human-written game directories, which all default to marketing-speak.
## Ranking strategy (or: things Google may kill anyway)
I'm not delusional about this. Google's March 2024 update specifically targeted "scaled content abuse" and de-indexed thousands of programmatic SEO sites. The bet here:
- Source-grounded content — every detail page links to its canonical, authoritative source (HuggingFace model card, Steam store page, GitHub repo). The reader can verify in one click.
- Real utility — directory + comparison tables that genuinely save time vs reading 30 docs pages
- Honest framing — "AI-generated, here's the source" disclosed in footer, no hiding
- Per-page structured data — `SoftwareApplication`, `VideoGame`, and `Product` JSON-LD on every detail page
- Low quantity, daily freshness — 880 total pages, refreshed daily so star counts and last-modified dates stay current. Not 100k pages of stale garbage.
I genuinely don't know if any of this is enough. The whole point of the experiment is to find out. If two of three sites get deindexed by month 3, that itself is a useful data point.
## What's wired up
- AdSense site-wide (currently in review — sites are <2 weeks old)
- Amazon Associates with category-relevant search links per page (no fake recs, just `amzn.to` search-by-keyword links to actual related books/peripherals)
- GA4 per site with separate properties
- Newsletter via Beehiiv iframe ("Indie Discovery Weekly")
- Sitemaps + robots.txt + llms.txt generated at build time
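Build-time sitemap generation is only a few lines. A simplified sketch (function and parameter names are mine, not the repo's):

```typescript
// Emits a sitemaps.org-format sitemap from a list of page slugs.
function buildSitemap(origin: string, slugs: string[], lastmod: string): string {
  const entries = slugs
    .map((s) => `  <url><loc>${origin}/${s}</loc><lastmod>${lastmod}</lastmod></url>`)
    .join("\n");
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    entries,
    "</urlset>",
  ].join("\n");
}
```

Because the pipeline only rebuilds when content actually changed, `lastmod` tracks content freshness rather than build timestamps.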
## Milestones I'm watching
| Month | Question | Threshold |
|---|---|---|
| 1 | Did Google index the pages? | Search Console impressions >1k/day = yes |
| 3 | Is organic traffic growing? | >50% organic share of GA4 sessions = healthy |
| 6 | Is monetization viable? | AdSense approved, RPM measurable, decision: delete or double down |
| 12 | Is it sellable? | 3 consecutive months of profit > 0 = list on Empire Flippers / similar |
## Open questions for readers
I'd genuinely value feedback on:
- PSEO survivors of March 2024 — what categories are still ranking?
- Schema markup — am I missing anything obvious for directory sites?
- AI content disclosure — how transparent is too transparent? Does the footer "AI-generated" disclosure help or hurt?
- Niche durability — which of the three do you think survives 12 months? Place your bets.
Repo isn't public yet — I might open source it after the 6-month checkpoint depending on how the experiment goes. Happy to share specific snippets in the comments if anyone's curious about a particular piece (Astro content collection layout, the Claude prompts, the structured-data helper, the Vercel Pro deploy config).
Next update in 30 days with actual numbers, regardless of how ugly they look.