GasPriceCheck

Templating got me to 33,620 pages. Indexing them was the hard part.

I had a fantasy.

The fantasy was that programmatic SEO would let me skip the part where you write 200 individual articles and trick Google into ranking them. I had a structured dataset (US ZIP codes, cities, states) and a templated page generator. So I shipped 33,620 pages overnight. Then I sat back and waited for the long-tail traffic to roll in.

That's not what happened.

A few months later, I have a more nuanced view. Here are the five lessons I would have paid actual money for at the start.

The numbers, so we're working from data

Out of 33,620 templated pages on my site:

  • ~6,200 are indexed
  • ~18,000 are in "Crawled, currently not indexed" (CCNI)
  • ~8,500 are "Discovered, currently not indexed"
  • ~700 are flagged as soft 404
  • The rest are recently submitted and still in queue

That's an indexation rate of about 18% (6,200 of 33,620). Sounds low. It kind of is. But here's the catch: the 6,200 indexed pages drive the vast majority of organic traffic. The CCNI pile is mostly ZIPs in low-population areas with little search demand. Google has decided those aren't worth indexing, and honestly it's probably right.

If you came into programmatic SEO expecting "ship N pages, get N pages of traffic," recalibrate. The real math is closer to "ship N pages, get 15-25% indexed, and the indexed pile drives 95% of your traffic."
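A minimal sketch of the arithmetic behind those numbers, in Python. The counts are the approximate figures from the list above; the bucket names and the `share` helper are just labels picked for this example.

```python
# Coverage counts from the breakdown above (approximate, as reported).
TOTAL_PAGES = 33_620
buckets = {
    "indexed": 6_200,
    "crawled_not_indexed": 18_000,
    "discovered_not_indexed": 8_500,
    "soft_404": 700,
}

# Whatever is left over is the recently submitted queue.
buckets["pending"] = TOTAL_PAGES - sum(buckets.values())


def share(count: int, total: int = TOTAL_PAGES) -> float:
    """Return count as a percentage of total, rounded to one decimal."""
    return round(100 * count / total, 1)


shares = {name: share(count) for name, count in buckets.items()}

for name, pct in shares.items():
    print(f"{name:>24}: {buckets[name]:>6} pages ({pct}%)")
```

Running the same breakdown against your own coverage export each week makes it easy to see which bucket is growing.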

Lesson 1: Templating gives you the structure, not the content

This is the biggest one and the one most programmatic SEO tutorials skip.

My page generator spit out pages with this rough structure:

  1. Hero (city, state, ZIP)
  2. Search box
  3. List of nearby gas stations
  4. State average price comparison
  5. "How to save on gas in {city}" section
  6. Footer

Sounds reasonable. The problem: when I put two pages side by side (one for a ZIP in California, one for a ZIP in Maine), the text was about 95% identical. The hero changed. The station list was different. Everything else was the same paragraph with one word swapped.

Google's templated-content detection is real, and this is the pattern it appears to look for. The idea I've seen described as a "boilerplate ratio" is the fraction of a page's text that is shared with other pages on the same site. In my experience, a high boilerplate ratio plus thin unique content equals a soft 404 verdict.
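You can approximate this kind of overlap yourself before Google does it for you. The sketch below compares two pages by overlapping five-word sequences (shingles). To be clear, this is an assumption-heavy stand-in and not Google's actual method: the function names, the shingle size, and the sample template are all made up for illustration.

```python
import re


def shingles(text: str, size: int = 5) -> set[tuple[str, ...]]:
    """Split text into overlapping word n-grams (shingles)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def shared_fraction(page: str, other: str, size: int = 5) -> float:
    """Fraction of page's shingles that also appear in other (0.0 to 1.0)."""
    own = shingles(page, size)
    if not own:
        return 0.0
    return len(own & shingles(other, size)) / len(own)


# Hypothetical template paragraph with one swapped word per page.
TEMPLATE = (
    "Looking for cheap gas in {city}? Compare prices at nearby stations "
    "and save money on every fill-up in {city} today."
)
page_a = TEMPLATE.format(city="Sacramento")
page_b = TEMPLATE.format(city="Portland")

print(f"shared with sibling page: {shared_fraction(page_a, page_b):.0%}")
```

Run it across a sample of sibling pages rather than one pair. If most pairs score high, the pages are effectively the same page with a different name in the hero.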

The fix was to make the templated content actually vary. I built a state-context module with hand-written paragraphs for the 15 highest-traffic states. Each is 80 to 120 words covering the state's gas tax, refinery capacity, regulatory regime, and seasonal pricing patterns. The other 35 states got a parameterized template with state-specific variables (avg price, neighboring states, gas tax rate). After this shipped, indexation rates climbed about 15% over six weeks.
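A minimal sketch of how a state-context module like this could be structured: look up a hand-written paragraph first, and fall back to a parameterized template. The names (`StateFacts`, `HAND_WRITTEN`, `state_context`) are hypothetical, and the price and tax values are placeholders, not real data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StateFacts:
    name: str
    avg_price: float            # dollars per gallon
    gas_tax_cents: float        # state gas tax, cents per gallon
    neighbors: tuple[str, ...]  # neighboring state names


# Hand-written paragraphs for the top-tier states, keyed by state code.
# The text here is a stand-in for the real 80 to 120 word paragraph.
HAND_WRITTEN = {
    "CA": "(hand-written California paragraph goes here)",
}


def state_context(code: str, facts: StateFacts) -> str:
    """Prefer the hand-written paragraph; fall back to the template."""
    if code in HAND_WRITTEN:
        return HAND_WRITTEN[code]
    neighbors = ", ".join(facts.neighbors) or "none"
    return (
        f"Drivers in {facts.name} pay about ${facts.avg_price:.2f} per gallon "
        f"on average. The state gas tax is {facts.gas_tax_cents:.1f} cents per "
        f"gallon. Neighboring states to compare: {neighbors}."
    )


# Placeholder numbers for illustration only.
maine = StateFacts("Maine", avg_price=3.00, gas_tax_cents=30.0,
                   neighbors=("New Hampshire",))
print(state_context("ME", maine))
```

Keeping the hand-written text in a plain lookup means promoting a state from the templated tier to the hand-written tier is a one-line change.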

The 80/20 here: hand-writing the top 15 of 50 states covered about 75% of my traffic. The other 35 got the templated treatment, which is fine because they're not driving meaningful traffic anyway.

If I'd known this at the start, I would have spent the first weekend writing 15 paragraphs by hand and the second weekend generating the rest. Instead I spent four weekends generating everything and then went back to hand-write the top tier. Same end state, slower path.

Lesson 2: Internal URL types compete with each other

I had two URL types covering the same entity. A programmatic city page (/houston-tx) and a blog post about Houston gas prices (/blog/cheapest-gas-houston). Different formats, different content, both about Houston gas.

Google saw these as competitors. For Houston it picked one to index (the blog post) and put the other in CCNI. For some sister cities it went the other way: the city page was indexed and the blog post landed in CCNI. The pattern was inconsistent and frustrating.

My first instinct was wrong. I tried to "fix" this by submitting the CCNI page for re-validation, hoping Google would just index both. That's not how this works. As far as I can tell, Google has a per-site indexation budget and a canonical-decision system that picks one URL per topic per intent. When you ask it to re-evaluate, it just makes the same decision again.

The actual fix was to differentiate the intent of each URL type:

  • Blog post: opinionated guide, current events, "why are gas prices high in Houston this month"
  • City page: factual reference, station list, ZIP grid, state context, "where to find cheap gas in Houston"

Once the content categories were clearly different, Google's canonical decisions became consistent. Blog posts started ranking for question-style searches ("why are gas prices high in Houston"). City pages ranked for lookup-style searches ("cheap gas Houston"). They stopped competing for the same impressions.

If you have two URL types covering similar entities, they're probably competing whether you intended it or not. Audit your URLs and make sure each type has a clear, different job.
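The audit can start as a simple grouping exercise. The sketch below assumes you can tag each URL with its type and the entity it covers (that tagging is the manual part). The `find_overlaps` name and the Austin URL are hypothetical; the two Houston URLs are the ones from above.

```python
from collections import defaultdict


def find_overlaps(pages: list[tuple[str, str, str]]) -> dict[str, list[str]]:
    """Given (url, url_type, entity) rows, return the entities that are
    covered by more than one URL type, with the competing URLs."""
    by_entity: dict[str, dict[str, list[str]]] = defaultdict(dict)
    for url, url_type, entity in pages:
        by_entity[entity].setdefault(url_type, []).append(url)
    return {
        entity: sorted(url for urls in types.values() for url in urls)
        for entity, types in by_entity.items()
        if len(types) > 1
    }


pages = [
    ("/houston-tx", "city", "houston-tx"),
    ("/blog/cheapest-gas-houston", "blog", "houston-tx"),
    ("/austin-tx", "city", "austin-tx"),  # hypothetical, no blog post
]

for entity, urls in find_overlaps(pages).items():
    print(f"{entity}: {', '.join(urls)}")
```

Every entity this prints is a place where two URL types need clearly different jobs, or where one of them should go.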

Lesson 3: Word count is a proxy, not a target

I spent some time obsessing over word counts. I'd seen "1,500 word minimum for ranking" quoted in SEO guides. So I padded pages with filler to hit the number.

Google did not care. The pages that broke through to indexed weren't the ones with 1,500 words. They were the ones with 800 words of actually useful, page-specific content.

Word count is a proxy for depth. If you can get to 800 words by writing things that are genuinely useful and page-specific, you're done. If you're padding to hit a target, Google can tell, and the padding doesn't help. My guess is that the detection isn't a single signal but a combination: sentence-level perplexity, semantic distinctness from other pages on your site, and contextual relevance to the query intent. Modern ranking models are sophisticated enough to recognize stuffing.

I went from a 640-word baseline to an 810-word "after" state by adding things that were genuinely useful: the per-state context, an SSR-rendered nearby ZIP grid, a top-stations summary. None of it was filler. The +170 words mattered because of what was in them, not because of the count.

Lesson 4: GSC verdicts have memory

If a page gets flagged as soft 404, fixing the page doesn't immediately clear the verdict. The recovery flow is:

  1. Fix the page
  2. Submit a re-validation request in GSC
  3. Wait 1 to 4 weeks for the re-crawl
  4. Accept that some URLs won't come back even after you fix them

GSC's verdict is sticky. It's not because Google is malicious. It's because the indexation queue is finite and pages with prior negative verdicts get deprioritized. If a page has been "soft 404" for three months and you fix it, Google doesn't drop everything to re-evaluate. It works through the queue, and your fixed page joins the back.

The implication: when planning a programmatic SEO site, get the page quality right before you submit URLs to GSC. Submitting a thin page early creates a verdict you'll fight for months. Better to wait, polish, and submit clean.
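One way to enforce "polish first" is a gate that decides which URLs are allowed into the sitemap at all. This is a rough sketch under stated assumptions: the `PageReport` fields, the thresholds, and the `/zip/00000` URL are all hypothetical, and the right cutoffs depend on what your own indexed pages look like.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageReport:
    url: str
    page_specific_words: int  # words that are not shared template text
    station_count: int        # stations rendered in the server HTML
    has_placeholder: bool     # True if "loading..." text is still present


# Hypothetical thresholds; tune them against your own indexed pages.
MIN_SPECIFIC_WORDS = 150
MIN_STATIONS = 1


def blockers(report: PageReport) -> list[str]:
    """Return the reasons to hold a page back (empty list means submit)."""
    reasons = []
    if report.page_specific_words < MIN_SPECIFIC_WORDS:
        reasons.append("thin page-specific content")
    if report.station_count < MIN_STATIONS:
        reasons.append("empty station list")
    if report.has_placeholder:
        reasons.append("unresolved placeholder")
    return reasons


def split_for_submission(reports: list[PageReport]):
    """Split pages into URLs ready for the sitemap and URLs held back."""
    ready = [r.url for r in reports if not blockers(r)]
    held = {r.url: blockers(r) for r in reports if blockers(r)}
    return ready, held


reports = [
    PageReport("/houston-tx", 320, 12, False),
    PageReport("/zip/00000", 40, 0, True),  # hypothetical thin page
]
ready, held = split_for_submission(reports)
print("submit:", ready)
print("hold:", held)
```

Held pages stay out of the sitemap until they pass, so no verdict is recorded against them in the first place.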

I did the opposite. I submitted everything immediately because I was excited. About 700 pages got soft-404'd, and clearing those verdicts took about 8 weeks of patient cleanup work even after the underlying content was fixed.

Lesson 5: Branded search beats backlinks for indexation

This one surprised me.

Pages that started getting branded search impressions (users typing "gas price check Houston" into Google) got indexed faster than pages that had inbound backlinks from other domains. The branded queries didn't even have to convert. Just the impressions seemed to register.

Google's signal here seems to be: "people are looking for this specific page on this site, by name." That signal is hard to fake (it requires real users typing branded queries) so Google trusts it heavily. A backlink can be paid for or manufactured. Branded search volume cannot.

The implication: if you want to accelerate indexation on a specific page, drive direct traffic to it from anywhere. Social posts, email, bio links, even paid ads briefly. The traffic seems to send a "this page is real and people want it" signal that Google picks up.

I tested this informally. I posted a link to one of my CCNI city pages on Reddit. Got about 80 visits in a day. Within two weeks the page moved from CCNI to indexed, with a top-50 ranking on its target query. I didn't change the page at all. Just sent traffic.

I'm not claiming this works every time. But the pattern was strong enough that I now bias toward "drive traffic to anything stuck in CCNI" over "build more backlinks to anything stuck in CCNI."

What I'd do differently

If I were starting over:

  1. Start with 50 hand-written pages, not 50,000 templated ones. Get those indexed and ranking before generating the long tail. The long tail benefits from established authority.
  2. Differentiate URL types from day one. Decide what each URL type is for and don't blur the line.
  3. Pre-compute everything. If you're going to template at scale, all unique-per-page content (geocoding, internal links, neighbor lists) should be resolved at build time, not runtime. Runtime fetches show up as "loading..." placeholders in the SSR HTML, and Google reads those as thin content.
  4. Check GSC weekly, not monthly. Soft 404 verdicts compound silently. By the time you have 700 of them, recovery is a multi-month project.
  5. Save the URL submissions for last. Polish first. Submit when the page is actually good. Once a verdict lands, it's sticky.
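For the pre-compute point, a cheap check is to scan the server-rendered HTML for leftover placeholders before a page ships. The sketch below uses only Python's standard library. The placeholder strings and the sample markup are assumptions, so swap in whatever your own templates emit while data is loading.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text chunks, skipping <script> and <style> bodies."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


# Assumed placeholder strings; match these to your own templates.
PLACEHOLDERS = ("loading...", "loading\u2026")


def find_placeholders(html: str) -> list[str]:
    """Return visible text chunks that still contain a loading placeholder."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return [
        chunk for chunk in parser.chunks
        if any(p in chunk.lower() for p in PLACEHOLDERS)
    ]


sample = (
    '<main><h1>Cheap gas in Houston</h1>'
    '<div id="stations">Loading...</div>'
    '<script>var state = "loading...";</script></main>'
)
print(find_placeholders(sample))
```

Run it against the raw HTML response, not the page as seen in a browser, because the browser view has already executed the runtime fetches.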

The bottom line

Programmatic SEO works. But the math isn't "ship 33,620 pages, get 33,620 pages of traffic." It's closer to "ship 33,620 pages, get 6,200 indexed, get traffic on the top 2,000 of those, and the rest are infrastructure." The work isn't generating the pages. The work is earning the right to keep them indexed.

If you've shipped a programmatic site at scale, what's your indexation rate? I'd love to hear what others are seeing. My hypothesis is that 15-25% indexation is typical for first-year programmatic sites, and the rate climbs over time as the site builds authority. But I have a sample size of one, so I'd love more data points.
