GrimLabs
Adding SEO Checks to CI/CD Without Slowing Down Your Pipeline

We had a deploy last quarter that removed the canonical tags from about 200 pages. Nobody noticed for three weeks. By the time we caught it, Google had indexed duplicate versions of every page, and our organic traffic dipped 15%.

The fix took 10 minutes. The recovery took 6 weeks.

This is why I think SEO checks belong in CI/CD. But every time I bring this up, the reaction from other devs is the same: "We tried running Lighthouse in CI and it added 4 minutes to every build."

Yeah. Don't do that.

Why Lighthouse in CI Is the Wrong Approach

Lighthouse is a browser-based audit tool. Running it in CI means spinning up a headless Chrome instance, loading every page, running a full performance audit, accessibility checks, SEO checks, and generating reports. It is comprehensive and also incredibly slow.

For a CI pipeline that runs on every PR, you don't need comprehensive. You need fast and focused.

According to web.dev's Lighthouse documentation, a single Lighthouse run takes 15-45 seconds per page. If you're checking 10 pages, that's 3-7 minutes added to your pipeline. Most teams will just skip it.

What to Actually Check in CI

Here's the thing. Most SEO disasters from code changes fall into a small number of categories:

  1. Missing or changed title tags
  2. Missing or changed meta descriptions
  3. Broken canonical tags
  4. Noindex tags accidentally added
  5. Broken internal links
  6. Missing alt text on images
  7. Changed URL slugs without redirects
  8. Removed structured data

You don't need Lighthouse for any of these. You need a simple HTML parser that checks specific elements. And that runs in seconds, not minutes.

// SEO linting for CI/CD - runs in under 10 seconds
import { JSDOM } from 'jsdom';
import * as fs from 'fs';
import * as path from 'path';

interface SEOIssue {
  file: string;
  severity: 'error' | 'warning';
  message: string;
}

function lintHTMLForSEO(filePath: string): SEOIssue[] {
  const issues: SEOIssue[] = [];
  const html = fs.readFileSync(filePath, 'utf-8');
  const dom = new JSDOM(html);
  const doc = dom.window.document;

  // Check title tag
  const title = doc.querySelector('title');
  if (!title || !title.textContent?.trim()) {
    issues.push({
      file: filePath,
      severity: 'error',
      message: 'Missing or empty title tag',
    });
  } else if (title.textContent.length > 60) {
    issues.push({
      file: filePath,
      severity: 'warning',
      message: `Title too long (${title.textContent.length} chars, max 60)`,
    });
  }

  // Check meta description
  const metaDesc = doc.querySelector('meta[name="description"]');
  if (!metaDesc || !metaDesc.getAttribute('content')?.trim()) {
    issues.push({
      file: filePath,
      severity: 'error',
      message: 'Missing meta description',
    });
  }

  // Check for accidental noindex
  const robots = doc.querySelector('meta[name="robots"]');
  if (robots?.getAttribute('content')?.includes('noindex')) {
    issues.push({
      file: filePath,
      severity: 'error',
      message: 'Page has noindex directive',
    });
  }

  // Check canonical
  const canonical = doc.querySelector('link[rel="canonical"]');
  if (!canonical) {
    issues.push({
      file: filePath,
      severity: 'error',
      message: 'Missing canonical tag',
    });
  }

  // Check images for alt text
  const images = doc.querySelectorAll('img');
  images.forEach((img, i) => {
    if (!img.getAttribute('alt')) {
      issues.push({
        file: filePath,
        severity: 'warning',
        message: `Image ${i + 1} missing alt text`,
      });
    }
  });

  return issues;
}

That runs in milliseconds per file. Even for a site with 500 pages, you're looking at maybe 5-10 seconds total.
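To wire the linter into CI you also need an entry point that walks the build output and sets the exit code. Here's a minimal sketch of the file-walk half, assuming a directory of static HTML like `./out`; the `walkHTMLFiles` helper is something you'd write yourself, not a library call:

```typescript
// Sketch of the CI entry point: recursively collect .html files from
// the build output so each one can be passed to lintHTMLForSEO.
import * as fs from 'fs';
import * as path from 'path';

function walkHTMLFiles(dir: string): string[] {
  const results: string[] = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      // Recurse into subdirectories (nested routes, locale folders, etc.)
      results.push(...walkHTMLFiles(full));
    } else if (entry.name.endsWith('.html')) {
      results.push(full);
    }
  }
  return results;
}

// In the real script you would then do something like:
//   const issues = walkHTMLFiles(process.argv[2] ?? './out').flatMap(lintHTMLForSEO);
//   if (issues.some(i => i.severity === 'error')) process.exit(1);
```

Failing the process with a non-zero exit code is what makes the GitHub Action step below actually block the PR.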

The GitHub Action Setup

Here's a minimal GitHub Action that catches the most common SEO regressions:

# .github/workflows/seo-lint.yml
name: SEO Lint
on:
  pull_request:
    paths:
      - 'src/**/*.html'
      - 'src/**/*.tsx'
      - 'src/**/*.jsx'
      - 'content/**/*.md'
      - 'public/**'

jobs:
  seo-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # full history, so origin/main exists for the sitemap diff

      - uses: actions/setup-node@v4
        with:
          node-version: 20

      - name: Install dependencies
        run: npm ci

      - name: Build site
        run: npm run build

      - name: Run SEO linter
        run: npx ts-node scripts/seo-lint.ts ./out

      - name: Check for URL changes
        run: |
          # Compare sitemap against main branch
          git diff origin/main -- public/sitemap.xml > /tmp/sitemap-diff.txt
          if [ -s /tmp/sitemap-diff.txt ]; then
            echo "::warning::Sitemap has changed - verify redirects for removed URLs"
          fi

The key is the paths filter. This only runs when files that could affect SEO are changed. No point linting SEO on a backend API change.

Catching URL Changes

This is the one people always miss. Someone renames a route or changes a slug, and the old URL returns a 404. No redirect. And if that old URL had backlinks or was ranking for anything, that's just gone.

// Compare URLs between builds to catch missing redirects
interface URLDiff {
  removed: string[];
  added: string[];
  changed: string[];
}

function compareURLSets(
  previousURLs: string[],
  currentURLs: string[]
): URLDiff {
  const prevSet = new Set(previousURLs);
  const currSet = new Set(currentURLs);

  const removed = previousURLs.filter(url => !currSet.has(url));
  const added = currentURLs.filter(url => !prevSet.has(url));

  return { removed, added, changed: [] };
}

function validateRedirects(
  removedURLs: string[],
  redirectRules: Map<string, string>
): string[] {
  const missing: string[] = [];

  for (const url of removedURLs) {
    if (!redirectRules.has(url)) {
      missing.push(url);
    }
  }

  return missing; // These need redirects before deploy
}

You can store your previous build's URL list as an artifact and compare against the current build. If URLs were removed without redirects, fail the build. Simple as that.
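One way to sketch that comparison, assuming your URL snapshots come from plain `sitemap.xml` files: pull out the `<loc>` entries and flag anything removed without a redirect rule. The regex-based extraction and the in-memory redirect map are simplifications of a real setup, not a specific tool's API:

```typescript
// Naive <loc> extraction; adequate for well-formed sitemaps where
// each URL sits in its own <loc>...</loc> element.
function extractSitemapURLs(xml: string): string[] {
  return [...xml.matchAll(/<loc>(.*?)<\/loc>/g)].map(m => m[1]);
}

// Return removed URLs that have no redirect rule - the ones that
// should fail the build until a redirect is added.
function findUnredirectedRemovals(
  previousXML: string,
  currentXML: string,
  redirects: Map<string, string>
): string[] {
  const current = new Set(extractSitemapURLs(currentXML));
  return extractSitemapURLs(previousXML).filter(
    url => !current.has(url) && !redirects.has(url)
  );
}
```

Feed it the artifact from the last successful main-branch build and the sitemap from the current build, and fail the step if the returned list is non-empty.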

The Structured Data Check

If you're using JSON-LD structured data (and you should be), validate it in CI. Broken structured data means losing rich snippets in search results.

// Basic JSON-LD validation
function validateStructuredData(html: string): SEOIssue[] {
  const issues: SEOIssue[] = [];
  const dom = new JSDOM(html);
  const scripts = dom.window.document.querySelectorAll(
    'script[type="application/ld+json"]'
  );

  if (scripts.length === 0) {
    issues.push({
      file: '',
      severity: 'warning',
      message: 'No structured data found',
    });
    return issues;
  }

  scripts.forEach((script, i) => {
    try {
      const data = JSON.parse(script.textContent || '');
      if (!data['@context'] || !data['@type']) {
        issues.push({
          file: '',
          severity: 'error',
          message: `Structured data block ${i + 1}: missing @context or @type`,
        });
      }
    } catch (e) {
      issues.push({
        file: '',
        severity: 'error',
        message: `Structured data block ${i + 1}: invalid JSON`,
      });
    }
  });

  return issues;
}

Google's structured data guidelines are strict about valid JSON-LD. A single syntax error means the entire block is ignored.

Performance Budget (the Smart Way)

Instead of running full Lighthouse in CI, set a performance budget based on file size and resource count:

// Performance budget checker - instant, no browser needed
interface Budget {
  maxHTMLSize: number;     // bytes
  maxTotalJSSize: number;   // bytes
  maxImageCount: number;
  maxThirdPartyScripts: number;
}

const DEFAULT_BUDGET: Budget = {
  maxHTMLSize: 100_000,      // 100KB
  maxTotalJSSize: 500_000,   // 500KB
  maxImageCount: 20,
  maxThirdPartyScripts: 5,
};

This isn't as thorough as a real Lighthouse audit, but it catches the biggest performance regressions (someone added a 2MB library, someone embedded 50 images) without the overhead.
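The `Budget` interface above needs a checker to do anything in CI. A rough sketch of one, where `PageStats` is a made-up shape you'd populate from your own build output (bundle sizes, counted `<img>` and external `<script>` tags), not a real API:

```typescript
// Budget and per-page stats; PageStats is a hypothetical shape you
// would fill in from your build output or a parsed HTML file.
interface Budget {
  maxHTMLSize: number;          // bytes
  maxTotalJSSize: number;       // bytes
  maxImageCount: number;
  maxThirdPartyScripts: number;
}

interface PageStats {
  htmlSize: number;
  totalJSSize: number;
  imageCount: number;
  thirdPartyScripts: number;
}

// Return a human-readable violation per exceeded limit; an empty
// array means the page is within budget.
function checkBudget(stats: PageStats, budget: Budget): string[] {
  const violations: string[] = [];
  if (stats.htmlSize > budget.maxHTMLSize)
    violations.push(`HTML ${stats.htmlSize}B exceeds budget of ${budget.maxHTMLSize}B`);
  if (stats.totalJSSize > budget.maxTotalJSSize)
    violations.push(`JS ${stats.totalJSSize}B exceeds budget of ${budget.maxTotalJSSize}B`);
  if (stats.imageCount > budget.maxImageCount)
    violations.push(`${stats.imageCount} images exceeds budget of ${budget.maxImageCount}`);
  if (stats.thirdPartyScripts > budget.maxThirdPartyScripts)
    violations.push(`${stats.thirdPartyScripts} third-party scripts exceeds budget of ${budget.maxThirdPartyScripts}`);
  return violations;
}
```

Because it's just arithmetic on numbers you already have from the build, it adds effectively zero time to the pipeline.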

The Point

SEO checks in CI shouldn't be comprehensive. They should be fast and catch the things that will actually hurt you. A 5-second lint that catches missing titles and broken canonicals is worth way more than a 7-minute Lighthouse run that nobody waits for.

Start small. Add the HTML linter. Add the URL comparison. Run it only on relevant file changes. Your pipeline stays fast, and you stop deploying SEO regressions.

The canonical tag incident I mentioned at the start? It would have been caught by a three-line check in CI. Three lines of code versus six weeks of recovery. Seems like a good trade.
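For the record, roughly what that check looks like: a string-level scan for a canonical link, no DOM parser needed. (This assumes the conventional `rel="canonical"` attribute order; it's a sketch, not a robust parser.)

```typescript
// Three-line canonical check: true if the HTML contains a
// <link ... rel="canonical" ...> tag anywhere.
function hasCanonical(html: string): boolean {
  return /<link[^>]+rel=["']canonical["']/i.test(html);
}
```

Run it over every built page and fail the deploy on the first `false`.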
