How I rebuilt Astro slug pages after AdSense flagged three sites for scaled content abuse

Three weeks ago I described how AdSense rejected my three directory sites because they were deployed on *.vercel.app subdomains. The fix was straightforward: add custom domains, redirect, resubmit. I did that. The domains are live. On May 11th I got the results of the resubmission.

The domain filter cleared. The content filter didn't.

AdSense cited "scaled content abuse / low value content." That's the harder rejection to fight because it requires actually changing how the pages work, not just flipping a setting.

This article documents exactly what I changed and why I think it addresses the filter — with the caveat that I haven't received a re-review decision yet, so this is a hypothesis being tested rather than a confirmed fix.

What "scaled content abuse" actually means

Google's March 2024 spam update formalized what the documentation calls "sites that have been produced at scale (whether through automation or otherwise) primarily to boost search ranking." The key phrase is "primarily to boost search ranking" — scale itself is not the policy target.

The operative question from a reviewer's perspective is: does each page offer something a user couldn't get from any other page on the site, or are they all structural clones with a different noun substituted in?

My slug pages were, if I'm honest, mostly structural clones with different nouns.

For aiappdex.com, every AI model page opened with the same paragraph structure regardless of whether the model was a 70B LLM, a 22M embedding model, or a 1.5B audio classifier. The "How we look at this model" section used identical text with the model name swapped in. The data columns changed; the editorial voice didn't.

For findindiegame.com, the situation was worse. The "About" section on every game page was a verbatim copy of the Steam short_description field. That's not curated content — it's scraped content with a heading on it. I knew this was a risk when I shipped it; I shipped it anyway because writing original prose for hundreds of games wasn't in the initial build plan. That tradeoff came due.

Why template text fails the uniqueness test

"Make content unique" sounds obvious and is harder to act on than it appears.

Template text fails not because it says anything false — a templated intro can be accurate. It fails because it's interchangeable. If you can swap one page's intro onto a different page and it reads equally well, the text is not specific to that page. It's formatting around data, not writing about data.

Google can detect this structurally: if two pages share large n-gram overlap in sentence structure while differing only in named entities, the pattern is a signal even when no individual sentence is duplicated. The more pages the site has, the stronger the statistical evidence.
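
As a hypothetical illustration of the kind of measurement involved (nothing here is Google's actual implementation): mask the named entities, then compare n-gram sets. Two templated pages score near 1.0 even when every sentence mentions a different model.

function ngrams(text: string, n = 5): Set<string> {
  const words = text.toLowerCase().split(/\s+/).filter(Boolean);
  const grams = new Set<string>();
  for (let i = 0; i + n <= words.length; i++) {
    grams.add(words.slice(i, i + n).join(" "));
  }
  return grams;
}

function structuralOverlap(a: string, b: string, entities: string[]): number {
  // Replace entity names with a placeholder so only sentence structure is compared.
  const mask = (t: string) =>
    entities.reduce((s, e) => s.replaceAll(e, "<ENTITY>"), t);
  const ga = ngrams(mask(a));
  const gb = ngrams(mask(b));
  let shared = 0;
  for (const g of ga) if (gb.has(g)) shared++;
  return shared / Math.max(1, Math.min(ga.size, gb.size));
}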

The fix isn't more words. A longer templated paragraph is still a template. The fix is making each page's prose depend on database fields that actually vary between entries in ways that change the editorial meaning of what's being said.

The AI tools rewrite: pipeline type + download tier

The aiappdex.com slug page now does three things it didn't before.

First, the "When does this fit?" section branches by pipeline type. The pipeline tag from HuggingFace, previously used only for filtering, now drives what gets written. An LLM page, an embedding page, and a vision model page each get structurally different advice:

// Classify the model by its HuggingFace pipeline_tag; each flag selects a different prose branch.
const isLLM = /text-generation|conversational|chat/i.test(model.pipeline_tag ?? "");
const isEmbedding = /sentence-similarity|feature-extraction/i.test(model.pipeline_tag ?? "");
const isVision = /image|vision|object-detection|depth-estimation/i.test(model.pipeline_tag ?? "");
const isAudio = /audio|speech|whisper|tts/i.test(model.pipeline_tag ?? "");

Each branch generates structurally different decision paths — not different adjectives for the same advice, but different advice altogether. An LLM page talks about quantization and GGUF build tradeoffs. An embedding page talks about dimensionality and multilingual training data. A vision page talks about ONNX export and edge device constraints. An audio page talks about VAD front-ends and punctuation post-processing. These are different topics, not different words for the same topic.
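
In sketch form, assuming a model record with a name field (the production copy is longer, but the shape is this):

// Minimal sketch of the branching. Each branch addresses a different topic
// entirely; the sentences here are condensed stand-ins for the real copy.
let fitText: string;
if (isLLM) {
  fitText = `${model.name} is a text-generation model, so start with serving cost: check which quantized GGUF builds exist before assuming it fits your hardware.`;
} else if (isEmbedding) {
  fitText = `${model.name} produces embeddings; weigh its output dimensionality and multilingual training coverage against your retrieval stack.`;
} else if (isVision) {
  fitText = `${model.name} is a vision model; confirm ONNX export support if you are targeting edge devices with tight memory budgets.`;
} else if (isAudio) {
  fitText = `${model.name} handles audio; plan for a VAD front-end and punctuation post-processing around the raw output.`;
} else {
  fitText = `${model.name} sits outside the common pipeline classes; read the model card before relying on it.`;
}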

Second, the "How we look at this" section branches by download tier. Ten million-plus downloads gets a paragraph about community knowledge depth and StackOverflow coverage. One hundred thousand to ten million gets a paragraph about tutorial availability vs. source-reading requirements. Under one thousand gets a paragraph about treating the model as research-grade rather than production-ready. The content is editorially different because the context is editorially different.
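
A condensed sketch of the tier selection; the cutoffs are the ones described above, while the band between one thousand and one hundred thousand, plus the variable names, are filled in here for illustration:

const downloads = model.downloads ?? 0;
const maturityTier =
  downloads >= 10_000_000 ? "mainstream"   // community depth, StackOverflow coverage
    : downloads >= 100_000 ? "established" // tutorials exist, some source reading required
    : downloads >= 1_000 ? "early"         // sparse docs, expect to read the code
    : "research";                          // research-grade, not production-ready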

Third, there's a new "Real-world usage signals" section that reports the likes-to-downloads engagement ratio, the tag count as a proxy for model versatility, and the author's organizational scale from HuggingFace metadata. None of those values are the same across models. The section can't be templated because its inputs are never constant.
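
Roughly, with field names following the public HuggingFace model metadata and the thresholds invented for illustration:

// Derived signals: these inputs differ for every model, so the output prose does too.
const engagementRatio = model.downloads > 0 ? model.likes / model.downloads : 0;
const tagCount = (model.tags ?? []).length;
const signalsText =
  `${model.name} converts roughly ${(engagementRatio * 100).toFixed(2)}% of downloads into likes ` +
  `and carries ${tagCount} tags, which we read as a ${tagCount >= 10 ? "versatile" : "narrowly scoped"} release from ${model.author}.`;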

The indie games rewrite: audienceWidth × priceTier

The findindiegame.com rewrite had an extra constraint: I had to remove the Steam description entirely, not just add around it.

Keeping a verbatim Steam description and surrounding it with original text doesn't solve the scraped content problem — you still have a verbatim block from a third party sitting in the middle of the page. I dropped the entire "About" section. Everything that now appears is derived from our own stored data: the good_for list, the avoid_if list, the similar_games connections, and the ETL-generated metadata. The ETL pipeline that ingests Steam and itch.io data collects more fields than I was previously using in the page templates.

The "How we look at this" section now generates from a 4×6 matrix: four audience width categories derived from how many good_for entries exist, crossed with six price tier categories derived from the stored price number:

// Audience width from how many good_for entries the ETL stored for this game.
const audienceWidth =
  game.good_for.length >= 4 ? "broad"
    : game.good_for.length >= 2 ? "moderate"
    : game.good_for.length >= 1 ? "niche"
    : "unspecified";

// Price tier from the stored price; null means the store gave us nothing.
const priceTier =
  priceNum === null ? "unknown"
    : priceNum === 0 ? "free"
    : priceNum < 10 ? "budget"
    : priceNum < 25 ? "mid"
    : priceNum < 60 ? "premium-indie"
    : "aaa-priced";

Each cell in the matrix references game.name and primaryGenre in the middle of the text, so no two pages share the same sentence even when they hit the same cell. Combined with the howWeLookText concatenation, the paragraph differs by construction.
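
One row of that matrix in sketch form; the production matrix has 24 distinct bodies, and these two cells are stand-ins:

const matrixCopy: Record<string, Record<string, string>> = {
  broad: {
    free: `${game.name} is free, and its ${primaryGenre} core appeals to several player types, so the only cost of trying it is time.`,
    budget: `At under $10, ${game.name} asks little up front, and its ${primaryGenre} hooks are wide enough that buyer's remorse is unlikely.`,
    // ...mid, premium-indie, aaa-priced, unknown
  },
  // ...moderate, niche, unspecified rows
};
const howWeLookText = matrixCopy[audienceWidth][priceTier];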

A new "Why pick this game" section fires conditionally based on five data points: audience breadth, price tier, whether the developer self-publishes (developer === publisher), similar_games count, and release year. The section is omitted entirely when none of the conditions are true — which avoids generating filler for entries with sparse data.

What I'd do differently from the start

Three things.

Never paste third-party content verbatim. The Steam description shortcut was a code smell I shipped anyway. The correct ETL behavior was to send each description through the shared Claude Haiku client to generate an original summary in our voice, store that, and never touch the source text on the page. I had the infrastructure. I didn't use it for the initial batch because token cost at launch volume felt high. The actual cost of not doing it was three months of AdSense delays and two weeks of emergency refactoring.
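
The missing step is small. A sketch with the Anthropic TypeScript SDK, with the model id, prompt, and function name as placeholders rather than the actual client code:

import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

// Generate an original one-paragraph summary; the source text never reaches the page.
async function summarizeDescription(name: string, steamDescription: string): Promise<string> {
  const msg = await anthropic.messages.create({
    model: "claude-3-5-haiku-latest", // assumed model id
    max_tokens: 300,
    messages: [{
      role: "user",
      content:
        `Write a two-sentence original summary of the game "${name}" in a neutral editorial voice. ` +
        `Do not reuse phrasing from this source text:\n\n${steamDescription}`,
    }],
  });
  const first = msg.content[0];
  return first.type === "text" ? first.text : "";
}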

Use every stored field in the prose, not just in the UI. I stored pipeline tags, download counts, license information, and price data from day one. What I didn't do was connect those fields to the editorial text. The template generation was separate from the data model and stayed separate longer than it should have. Every field you store is a dimension on which pages can differ; fields you store but ignore in prose are missed uniqueness opportunities.

Budget content uniqueness into launch scope, not post-launch polish. I treated this work as optional cleanup. It was actually blocking monetization. The static SSG architecture means complex derivation logic at build time costs nothing at request time — Astro component scripts run once per page during the build, not on each visitor request. There's no runtime performance reason to keep the derivation logic simple.
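
To make the point concrete, here is the shape of a slug page in sketch form; getGames and deriveEditorialText are placeholder names for the data loader and derivation step:

---
// pages/games/[slug].astro, sketched. Everything in this frontmatter runs
// once at build time, never per request.
import { getGames, deriveEditorialText } from "../../lib/games";

export async function getStaticPaths() {
  const games = await getGames();
  return games.map((game) => ({ params: { slug: game.slug }, props: { game } }));
}

const { game } = Astro.props;
const { howWeLookText } = deriveEditorialText(game); // arbitrarily heavy, zero runtime cost
---
<h2>How we look at this</h2>
<p>{howWeLookText}</p>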

What I'm not sure about

Whether this is enough.

The rewrite makes each page's text depend on database values that differ between entries. That's the correct direction. But I don't know if the ratio of dynamic prose to shared structure now clears Google's threshold, or still falls short.

The page structure — H2 headings, related games sidebar, AdSlot placement, Newsletter component — is still shared across all pages. That's inherent to any maintained directory and not, I think, the target of the scaled content policy. But I'm not certain.

I'm also not publishing interim traffic numbers. I'll have honest data to report in 30 days. What I have now is one resubmission pending review, IndexNow pushing the updated pages to Bing immediately, and the sitemap pipeline surfacing the changes to Google's crawler on the next crawl cycle.
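
The IndexNow push is a single request per the public spec; sketched here with placeholder key values:

// updatedUrls is the list of rebuilt slug pages.
await fetch("https://api.indexnow.org/indexnow", {
  method: "POST",
  headers: { "Content-Type": "application/json; charset=utf-8" },
  body: JSON.stringify({
    host: "findindiegame.com",
    key: "<indexnow-key>",
    keyLocation: "https://findindiegame.com/<indexnow-key>.txt",
    urlList: updatedUrls,
  }),
});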

FAQ

Does removing the Steam description hurt ranking for game-specific queries?

Probably not. The game name, genre tags, good_for caveats, and structured data are still present and specific. Steam descriptions are marketing copy written by the developer — any ranking signal came from the surrounding structured data, not the verbatim description text.

Why not just generate unique summaries with Claude for every entry?

That's the right long-term answer and it's what the Claude Haiku ETL client is for. The constraint was running it over several hundred existing entries at once. With system-prompt caching and batched requests the unit cost is low, but total volume at launch wasn't trivial. I'm backfilling existing entries in priority order by page quality score and routing all new entries through generation from day one.
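
The backfill loop, sketched, reusing the anthropic client from earlier. The cache_control block follows Anthropic's prompt-caching API; the shared style-guide constant and the priority ordering are simplified placeholders:

// The long shared style prompt is cached across calls, so only the
// per-game description is billed at the full input rate.
const system = [{
  type: "text" as const,
  text: EDITORIAL_VOICE_GUIDE,
  cache_control: { type: "ephemeral" as const },
}];

for (const game of entriesByQualityScore) {
  const summary = await anthropic.messages.create({
    model: "claude-3-5-haiku-latest",
    max_tokens: 300,
    system,
    messages: [{ role: "user", content: game.steam_description }],
  });
  // ...store the generated text; the source description never ships.
}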

Why were all three sites reviewed together?

AdSense account structure. When you add multiple sites to one account for review, they're evaluated as a bundle. If any site fails, the whole application is rejected. There's no UI to remove a single site from an in-progress application — you can only remove sites after the account is approved. So all three had to pass before any could proceed.

What's the monetization fallback if AdSense still rejects?

Amazon Associates, RunPod referrals, Vast.ai referrals, and PartnerStack SaaS affiliates are already running. They were running during the AdSense attempt and will keep running if it never succeeds. The affiliate revenue is currently the actual revenue layer; AdSense was additive, not foundational.


Part of an ongoing 6-month experiment running three AI-curated directory sites. The technical claims here are real; this article was AI-assisted.
