Bilal Ahmed

CI/CD SEO Testing: Catch SEO Regressions Before They Hit Production

You push a "safe" refactor. A meta tag moves to a component. Someone renames a route. A redirect gets dropped. Two weeks later, traffic drops 40%.

This happens constantly — not because developers don't care about SEO, but because there's no SEO gate in the deployment pipeline. We lint for code style, run unit tests, check types — but nothing catches when <title> disappears from a page template or a canonical URL starts pointing at localhost.

This article fixes that. We'll build an automated SEO test suite that runs on every PR and blocks the deploy when a change breaks something.


What Kind of SEO Breaks in Deploys?

Before writing tests, it helps to know what actually regresses. Here are the most common culprits:

  • Missing <title> or <meta name="description">: someone extracts a layout component and forgets to keep the meta tags
  • Broken canonicals: canonical pointing to staging URLs, localhost, or the wrong page
  • Noindex leaking to production: <meta name="robots" content="noindex"> left in from dev/staging config
  • Sitemap URL changes: routes renamed but sitemap not updated
  • Broken redirects: 301s removed or changed to 302s during refactors
  • Missing structured data: JSON-LD dropped when templates are restructured
  • robots.txt accidentally blocking everything: happened to Cloudflare's own site once

These aren't exotic edge cases. They're Tuesday afternoon bugs.


The Stack We'll Use

  • Lighthouse CI — automated Core Web Vitals + SEO score audits
  • A custom Node.js SEO assertion script — catches the specific things Lighthouse misses
  • GitHub Actions — runs everything on every PR
  • axe-core (optional) — accessibility checks that overlap with SEO signals

Step 1: Custom SEO Assertions Script

Lighthouse gives you a score, but it won't tell you "this specific page is missing a canonical tag." Write your own assertions.

Install the dependencies:

npm install --save-dev node-fetch cheerio

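Create scripts/seo-check.js. The script uses ES module import syntax, so either set "type": "module" in package.json or name the file seo-check.mjs: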

import fetch from 'node-fetch';
import * as cheerio from 'cheerio';

const BASE_URL = process.env.CHECK_URL || 'http://localhost:3000';

// Pages to test — add all critical routes
const PAGES = [
  { path: '/', name: 'Homepage' },
  { path: '/blog', name: 'Blog index' },
  { path: '/about', name: 'About' },
  // Add your most critical pages here
];

const MAX_TITLE_LENGTH = 60;
const MAX_DESCRIPTION_LENGTH = 160;
const MIN_DESCRIPTION_LENGTH = 50;

async function checkPage(path, name) {
  const url = `${BASE_URL}${path}`;
  const errors = [];
  const warnings = [];

  let html;
  try {
    const res = await fetch(url, { redirect: 'manual' });

    // Check for accidental redirects
    if (res.status >= 300 && res.status < 400) {
      errors.push(`Unexpected redirect (${res.status}) to ${res.headers.get('location')}`);
      return { url, name, errors, warnings };
    }

    if (res.status !== 200) {
      errors.push(`Non-200 status: ${res.status}`);
      return { url, name, errors, warnings };
    }

    html = await res.text();
  } catch (err) {
    errors.push(`Fetch failed: ${err.message}`);
    return { url, name, errors, warnings };
  }

  const $ = cheerio.load(html);

  // --- Title ---
  const title = $('title').text().trim();
  if (!title) {
    errors.push('Missing <title> tag');
  } else if (title.length > MAX_TITLE_LENGTH) {
    warnings.push(`Title too long (${title.length} chars, max ${MAX_TITLE_LENGTH}): "${title}"`);
  }

  // --- Meta description ---
  const description = $('meta[name="description"]').attr('content')?.trim();
  if (!description) {
    errors.push('Missing <meta name="description">');
  } else if (description.length > MAX_DESCRIPTION_LENGTH) {
    warnings.push(`Description too long (${description.length} chars)`);
  } else if (description.length < MIN_DESCRIPTION_LENGTH) {
    warnings.push(`Description too short (${description.length} chars)`);
  }

  // --- Canonical ---
  const canonical = $('link[rel="canonical"]').attr('href');
  if (!canonical) {
    errors.push('Missing <link rel="canonical">');
  } else {
    // Check for localhost/staging URLs leaking to prod
    // Dots are escaped so ".dev" only matches a literal dot, not any character before "dev"
    if (/localhost|127\.0\.0\.1|staging\.|\.dev\b/i.test(canonical)) {
      errors.push(`Canonical points to non-production URL: ${canonical}`);
    }
    const expectedCanonical = `${process.env.PROD_URL || ''}${path}`;
    if (process.env.PROD_URL && canonical !== expectedCanonical) {
      warnings.push(`Canonical mismatch. Expected: ${expectedCanonical}, Got: ${canonical}`);
    }
  }

  // --- Robots meta ---
  const robots = $('meta[name="robots"]').attr('content')?.toLowerCase();
  if (robots && (robots.includes('noindex') || robots.includes('nofollow'))) {
    errors.push(`Page has robots meta: "${robots}" — is this intentional?`);
  }

  // --- Open Graph ---
  const ogTitle = $('meta[property="og:title"]').attr('content');
  const ogDescription = $('meta[property="og:description"]').attr('content');
  const ogImage = $('meta[property="og:image"]').attr('content');
  if (!ogTitle) warnings.push('Missing og:title');
  if (!ogDescription) warnings.push('Missing og:description');
  if (!ogImage) warnings.push('Missing og:image');

  // --- H1 ---
  const h1Count = $('h1').length;
  if (h1Count === 0) {
    errors.push('Missing <h1> tag');
  } else if (h1Count > 1) {
    warnings.push(`Multiple <h1> tags found (${h1Count})`);
  }

  // --- JSON-LD structured data ---
  const jsonLd = $('script[type="application/ld+json"]');
  if (jsonLd.length === 0) {
    warnings.push('No JSON-LD structured data found');
  } else {
    jsonLd.each((_, el) => {
      try {
        JSON.parse($(el).html());
      } catch {
        errors.push('Invalid JSON in JSON-LD block — will be ignored by Google');
      }
    });
  }

  // --- Images without alt ---
  const imagesWithoutAlt = [];
  $('img').each((_, el) => {
    const alt = $(el).attr('alt');
    const src = $(el).attr('src') || $(el).attr('data-src') || '[unknown src]';
    if (alt === undefined || alt === null) {
      imagesWithoutAlt.push(src);
    }
  });
  if (imagesWithoutAlt.length > 0) {
    warnings.push(`${imagesWithoutAlt.length} image(s) missing alt attribute`);
  }

  return { url, name, errors, warnings };
}

async function run() {
  console.log(`\n🔍 Running SEO checks against: ${BASE_URL}\n`);
  console.log('='.repeat(60));

  const results = await Promise.all(
    PAGES.map(({ path, name }) => checkPage(path, name))
  );

  let totalErrors = 0;
  let totalWarnings = 0;

  for (const result of results) {
    const hasErrors = result.errors.length > 0;
    const hasWarnings = result.warnings.length > 0;
    const status = hasErrors ? '❌' : hasWarnings ? '⚠️ ' : '✅';

    console.log(`\n${status} ${result.name} (${result.url})`);

    for (const err of result.errors) {
      console.log(`   ✗ [ERROR] ${err}`);
      totalErrors++;
    }
    for (const warn of result.warnings) {
      console.log(`   ⚠ [WARN]  ${warn}`);
      totalWarnings++;
    }
    if (!hasErrors && !hasWarnings) {
      console.log('   All checks passed');
    }
  }

  console.log('\n' + '='.repeat(60));
  console.log(`\nSummary: ${results.length} pages checked`);
  console.log(`  Errors:   ${totalErrors}`);
  console.log(`  Warnings: ${totalWarnings}`);

  if (totalErrors > 0) {
    console.log('\n💥 SEO check FAILED — fix errors before deploying\n');
    process.exit(1);
  } else {
    console.log('\n✅ SEO check PASSED\n');
  }
}

run();
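The canonical check above pattern-matches the raw href, which can flag a legitimate path and miss a wrong host that happens not to match the pattern. A minimal sketch of a stricter comparison, assuming you know the production origin: it parses both URLs and compares origin and path. The helper name checkCanonical and its return shape are illustrative, not part of the original script.

```javascript
// Hypothetical helper: compares a canonical href against the expected
// production origin and path instead of pattern-matching the raw string.
function checkCanonical(canonical, prodOrigin, path) {
  let actual;
  try {
    // Relative canonicals resolve against the production origin.
    actual = new URL(canonical, prodOrigin);
  } catch {
    return { ok: false, reason: `Canonical is not a valid URL: ${canonical}` };
  }

  const expected = new URL(path, prodOrigin);

  if (actual.origin !== expected.origin) {
    return { ok: false, reason: `Canonical origin is ${actual.origin}, expected ${expected.origin}` };
  }

  // Treat "/blog" and "/blog/" as the same page; adjust if your site does not.
  const stripSlash = (p) => p.replace(/\/+$/, '') || '/';
  if (stripSlash(actual.pathname) !== stripSlash(expected.pathname)) {
    return { ok: false, reason: `Canonical path is ${actual.pathname}, expected ${expected.pathname}` };
  }

  return { ok: true };
}

console.log(checkCanonical('http://localhost:3000/blog', 'https://yoursite.com', '/blog'));
```

Inside checkPage, this could replace the regex test whenever PROD_URL is set, pushing the returned reason onto errors.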

Add this to package.json:

{
  "scripts": {
    "seo:check": "node scripts/seo-check.js",
    "seo:check:prod": "CHECK_URL=https://yoursite.com node scripts/seo-check.js"
  }
}

Step 2: Add Lighthouse CI

Lighthouse CI runs Lighthouse against your build on every PR. It reports lab measurements of Core Web Vitals metrics alongside category scores, including a dedicated SEO category.

Install:

npm install --save-dev @lhci/cli

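Create lighthouserc.cjs in your project root. Lighthouse CI loads JavaScript config files as CommonJS, so the config uses module.exports, and the .cjs extension keeps it working if package.json sets "type": "module":

module.exports = {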
  ci: {
    collect: {
      startServerCommand: 'npm run start',
      startServerReadyPattern: 'ready on',
      url: [
        'http://localhost:3000/',
        'http://localhost:3000/blog',
        'http://localhost:3000/about',
      ],
      numberOfRuns: 3,
    },
    assert: {
      assertions: {
        'categories:seo': ['error', { minScore: 0.9 }],
        'categories:performance': ['warn', { minScore: 0.8 }],
        'meta-description': 'error',
        'document-title': 'error',
        'html-has-lang': 'error',
        'canonical': 'error',
        'robots-txt': 'warn',
        'largest-contentful-paint': ['warn', { maxNumericValue: 2500 }],
        'total-blocking-time': ['warn', { maxNumericValue: 300 }],
        'cumulative-layout-shift': ['warn', { maxNumericValue: 0.1 }],
      },
    },
    upload: {
      target: 'temporary-public-storage',
    },
  },
};
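The route list now lives in two places: PAGES in the assertion script and url in the Lighthouse config. One way to keep them from drifting is to derive the Lighthouse URLs from the same list. The sketch below assumes a shared list named SEO_PAGES and a helper named toLighthouseUrls, both hypothetical.

```javascript
// Hypothetical single source of truth for the routes under test.
const SEO_PAGES = [
  { path: '/', name: 'Homepage' },
  { path: '/blog', name: 'Blog index' },
  { path: '/about', name: 'About' },
];

// Turn the shared page list into the absolute URLs that go into
// ci.collect.url in the Lighthouse CI config.
function toLighthouseUrls(pages, baseUrl = 'http://localhost:3000') {
  return pages.map(({ path }) => new URL(path, baseUrl).href);
}

console.log(toLighthouseUrls(SEO_PAGES));
```

Storing the list in a JSON file is one option that both the assertion script and the Lighthouse config can read.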

Step 3: GitHub Actions Workflow

Create .github/workflows/seo.yml:

name: SEO Tests

on:
  pull_request:
    branches: [main, develop]
  push:
    branches: [main]

jobs:
  seo-assertions:
    name: SEO Assertions
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'

      - name: Install dependencies
        run: npm ci

      - name: Build app
        run: npm run build

      - name: Start app
        run: npm run start &
        env:
          NODE_ENV: production
          PORT: 3000

      - name: Wait for app to be ready
        run: npx wait-on http://localhost:3000 --timeout 60000

      - name: Run SEO assertion checks
        run: npm run seo:check
        env:
          CHECK_URL: http://localhost:3000
          PROD_URL: https://yoursite.com

  lighthouse:
    name: Lighthouse CI
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'

      - name: Install dependencies
        run: npm ci

      - name: Build app
        run: npm run build

      - name: Run Lighthouse CI
        run: npx lhci autorun
        env:
          LHCI_GITHUB_APP_TOKEN: ${{ secrets.LHCI_GITHUB_APP_TOKEN }}

Install the Lighthouse CI GitHub App to get a status check on each PR that links to the uploaded reports. Comparing the base branch report with the PR report gives you a picture like this:

Category    Before  After   Delta
SEO         95      72      -23 ❌
Performance 88      91      +3  ✅

Step 4: Test Your Sitemap and Robots.txt

// scripts/seo-infrastructure.js
import fetch from 'node-fetch';

const BASE_URL = process.env.CHECK_URL || 'http://localhost:3000';

async function checkRobotsTxt() {
  const res = await fetch(`${BASE_URL}/robots.txt`);
  const text = await res.text();

  if (res.status !== 200) {
    return { error: `robots.txt returned ${res.status}` };
  }

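  // The "m" flag makes ^ and $ match per line, so a bare "Disallow: /" is found anywhere in the file
  if (/^[ \t]*Disallow:[ \t]*\/\s*$/mi.test(text) && !/^[ \t]*Allow:/mi.test(text)) {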
    return { error: 'robots.txt is blocking ALL crawlers (Disallow: /) — check this immediately!' };
  }

  if (/User-agent:\s*Googlebot[\s\S]*?Disallow:\s*\//i.test(text)) {
    return { warning: 'Googlebot appears to be blocked in robots.txt' };
  }

  return { ok: true };
}

async function checkSitemap() {
  const robotsRes = await fetch(`${BASE_URL}/robots.txt`);
  const robotsText = await robotsRes.text();

  const sitemapMatch = robotsText.match(/Sitemap:\s*(.+)/i);
  const sitemapUrl = sitemapMatch ? sitemapMatch[1].trim() : `${BASE_URL}/sitemap.xml`;

  const res = await fetch(sitemapUrl);
  if (res.status !== 200) {
    return { error: `Sitemap not found at ${sitemapUrl} (${res.status})` };
  }

  const text = await res.text();
  if (!text.includes('<urlset') && !text.includes('<sitemapindex')) {
    return { error: 'Sitemap response does not look like valid XML' };
  }

  // Count <sitemap> entries too, so a sitemap index file is not reported as empty
  const urlCount = (text.match(/<(?:url|sitemap)>/g) || []).length;
  if (urlCount === 0) {
    return { warning: 'Sitemap appears to have 0 URLs' };
  }

  return { ok: true, urlCount };
}

async function run() {
  console.log('\n🗺️  Checking SEO infrastructure...\n');

  const [robots, sitemap] = await Promise.all([checkRobotsTxt(), checkSitemap()]);

  let failed = false;

  if (robots.error) { console.log(`❌ robots.txt: ${robots.error}`); failed = true; }
  else if (robots.warning) { console.log(`⚠️  robots.txt: ${robots.warning}`); }
  else { console.log('✅ robots.txt looks good'); }

  if (sitemap.error) { console.log(`❌ sitemap: ${sitemap.error}`); failed = true; }
  else if (sitemap.warning) { console.log(`⚠️  sitemap: ${sitemap.warning}`); }
  else { console.log(`✅ Sitemap valid (${sitemap.urlCount} URLs)`); }

  if (failed) process.exit(1);
}

run();
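The script above confirms that a sitemap exists, but not that the URLs inside it still resolve, which is the "routes renamed but sitemap not updated" case. A rough sketch of that extra check follows: it pulls each <loc> out of the XML, keeps only the path, and requests it on the build under test. The function names are hypothetical, the regex extraction assumes a simple sitemap without CDATA sections, and for a sitemap index the <loc> values would be child sitemaps rather than pages.

```javascript
// Pull every <loc> value out of a sitemap and reduce it to path + query,
// so production URLs can be replayed against the local build.
function extractSitemapPaths(xml) {
  const paths = [];
  for (const match of xml.matchAll(/<loc>\s*([^<]+?)\s*<\/loc>/g)) {
    // Undo the XML escaping of "&" that sitemaps require.
    const loc = match[1].replace(/&amp;/g, '&');
    try {
      const { pathname, search } = new URL(loc);
      paths.push(pathname + search);
    } catch {
      // Skip entries that are not absolute URLs.
    }
  }
  return paths;
}

// Request each path on the build under test and collect the ones that
// do not answer 200. Uses the global fetch available in Node 18+.
async function findBrokenSitemapUrls(xml, baseUrl) {
  const broken = [];
  for (const path of extractSitemapPaths(xml)) {
    const res = await fetch(`${baseUrl}${path}`, { redirect: 'manual' });
    if (res.status !== 200) broken.push({ path, status: res.status });
  }
  return broken;
}
```

In checkSitemap, a non-empty result from findBrokenSitemapUrls(text, BASE_URL) could be returned as an error listing the affected paths.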

Step 5: Redirect Testing (The Silent Killer)

A 301 that silently becomes a 302 can cost you link equity, because it tells search engines the move is only temporary. A redirect chain added by accident wastes crawl budget.

// scripts/seo-redirects.js
import fetch from 'node-fetch';

const BASE_URL = process.env.CHECK_URL || 'http://localhost:3000';

// Format: [from, to, expectedStatus]
const REDIRECT_MAP = [
  ['/old-blog', '/blog', 301],
  ['/home', '/', 301],
  ['/products/old-slug', '/products/new-slug', 301],
];

async function checkRedirect([from, to, expectedStatus]) {
  const res = await fetch(`${BASE_URL}${from}`, { redirect: 'manual' });
  const actualStatus = res.status;
  const location = res.headers.get('location');

  if (actualStatus !== expectedStatus) {
    return { from, error: `Expected ${expectedStatus}, got ${actualStatus}` };
  }

  const normalizedLocation = location?.replace(BASE_URL, '') || '';
  if (normalizedLocation !== to) {
    return { from, error: `Expected redirect to ${to}, got ${normalizedLocation}` };
  }

  return { from, ok: true };
}

async function run() {
  console.log('\n🔀 Checking redirects...\n');

  const results = await Promise.all(REDIRECT_MAP.map(checkRedirect));
  let failed = false;

  for (const result of results) {
    if (result.error) {
      console.log(`❌ ${result.from}: ${result.error}`);
      failed = true;
    } else {
      console.log(`✅ ${result.from} → OK`);
    }
  }

  if (failed) process.exit(1);
}

run();
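The redirect map verifies the first hop only, so it would not notice a chain such as /old-blog → /blog-archive → /blog. A minimal sketch of chain detection, assuming a hypothetical helper named traceRedirects: it follows Location headers one hop at a time and records each step. The fetchFn option is an addition for testability and defaults to the global fetch in Node 18+.

```javascript
// Follow redirects one hop at a time and record each step.
// fetchFn is injectable so the function can be exercised without a server.
async function traceRedirects(startUrl, { maxHops = 5, fetchFn = fetch } = {}) {
  const hops = [];
  let url = startUrl;

  for (let i = 0; i < maxHops; i++) {
    const res = await fetchFn(url, { redirect: 'manual' });
    const location = res.headers.get('location');
    const isRedirect = res.status >= 300 && res.status < 400 && location;
    if (!isRedirect) {
      return { hops, finalUrl: url, finalStatus: res.status };
    }
    hops.push({ from: url, status: res.status });
    // Location may be relative, so resolve it against the current URL.
    url = new URL(location, url).href;
  }

  // Gave up after maxHops redirects: likely a loop or a very long chain.
  return { hops, finalUrl: url, finalStatus: null, tooManyHops: true };
}
```

A result with hops.length greater than 1 indicates a chain, and any hop whose status is not 301 can be flagged in the same pass.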

Putting It All Together

{
  "scripts": {
    "seo:check": "node scripts/seo-check.js",
    "seo:infra": "node scripts/seo-infrastructure.js",
    "seo:redirects": "node scripts/seo-redirects.js",
    "seo:all": "npm run seo:infra && npm run seo:redirects && npm run seo:check",
    "seo:lighthouse": "lhci autorun"
  }
}

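Have your CI workflow run npm run seo:all before any deploy step. The workflow in Step 3 runs only seo:check, so swap that command for seo:all once the other scripts exist.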
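The && chain in seo:all stops at the first failing script, so a broken redirect hides any page-level errors queued behind it. If you would rather see every failure in one CI run, a small runner can execute all scripts and fail at the end. This is a sketch under that assumption; runAll and runNpmScript are hypothetical names.

```javascript
// Run one npm script and return its exit code (1 if it was killed by a signal).
async function runNpmScript(name) {
  const { spawnSync } = await import('node:child_process');
  const result = spawnSync('npm', ['run', name], { stdio: 'inherit' });
  return result.status ?? 1;
}

// Run every script even if an earlier one fails, and return the names
// of the scripts that exited non-zero.
async function runAll(scripts, runScript = runNpmScript) {
  const failed = [];
  for (const name of scripts) {
    const code = await runScript(name);
    if (code !== 0) failed.push(name);
  }
  return failed;
}

// Usage, for example in a scripts/seo-all.js file:
// runAll(['seo:infra', 'seo:redirects', 'seo:check']).then((failed) => {
//   if (failed.length > 0) {
//     console.log(`Failed: ${failed.join(', ')}`);
//     process.exit(1);
//   }
// });
```

The runScript parameter exists so the aggregation logic can be tested without spawning processes.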


What This Catches (That Your Current Pipeline Doesn't)

Regression                            Caught by
Missing title/description             Custom assertions
Noindex leaking to prod               Custom assertions
Canonical pointing to localhost       Custom assertions
robots.txt blocking all crawlers      Infrastructure check
Broken sitemap                        Infrastructure check
301 changed to 302                    Redirect tests
Missing redirect after route rename   Redirect tests
SEO score drop from JS change         Lighthouse CI
LCP regression from new image         Lighthouse CI
Missing JSON-LD                       Custom assertions

Next Steps

Once this is running, consider:

  • Adding your most critical pages to the assertion script — especially any page with structured data
  • Setting up a LHCI server (it's open source) for a full historical score dashboard
  • Running seo:check:prod after every production deploy as a smoke test
  • Slack/Discord notifications when SEO tests fail in CI
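For the notification idea, here is a minimal sketch that turns the result objects returned by checkPage into a Slack incoming-webhook message. The function names and the SLACK_WEBHOOK_URL variable are assumptions; the only Slack-specific detail relied on is that an incoming webhook accepts a JSON body with a text field.

```javascript
// Build the message body from the { name, url, errors } objects that
// checkPage() returns. Returns null when nothing failed.
function buildSlackPayload(results) {
  const failing = results.filter((r) => r.errors.length > 0);
  if (failing.length === 0) return null;

  const lines = failing.map(
    (r) => `• ${r.name} (${r.url}): ${r.errors.join('; ')}`
  );
  return { text: `SEO checks failed on ${failing.length} page(s)\n${lines.join('\n')}` };
}

// Post the payload. SLACK_WEBHOOK_URL is an assumed secret name.
// Uses the global fetch available in Node 18+.
async function notifySlack(results, webhookUrl = process.env.SLACK_WEBHOOK_URL) {
  const payload = buildSlackPayload(results);
  if (!payload || !webhookUrl) return false;

  const res = await fetch(webhookUrl, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(payload),
  });
  return res.ok;
}
```

Calling notifySlack(results) just before process.exit(1) in the assertion script would send the message only when a check has failed.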

The goal isn't to replace an SEO audit tool — it's to stop shipping SEO regressions by accident. Once this pipeline is in place, the "my refactor broke our rankings" conversation stops happening.


What SEO regressions have you been bitten by in production? Drop them in the comments — some of the best edge cases I know came from war stories.