Maintaining good SEO health for your website or documentation is an ongoing task. Broken links, missing meta tags, or declining performance can quietly hurt your search rankings and user experience if left unchecked. In this post, we’ll walk through creating a zero-maintenance SEO health check using GitHub Actions. Our workflow will run on a weekly schedule to automatically audit our site for common SEO and technical issues, including:
- Meta tag presence and correctness – ensuring each page has essential tags like `<title>` and meta description (and that they’re of reasonable length).
- Broken link detection – catching any 404s or dead URLs in our content (since broken links frustrate users and hurt your site’s credibility).
- Lighthouse audits – getting scores for performance, accessibility, and SEO using Google’s Lighthouse tool (which reports metrics for these categories).
- SERP result verification – using the serpnode.com API (as one of our tools) to confirm that our site appears as expected in Google results for selected keywords.
We’ll use Node.js to script the checks, along with a few handy libraries. The GitHub Actions workflow will run these scripts on a schedule (e.g. every week) so you can “set it and forget it.” Let’s dive in!
GitHub Actions Workflow Setup
First, create a new workflow file in your repository (for example, .github/workflows/seo-check.yml). This YAML configuration will tell GitHub Actions to run our SEO audit on a schedule. We can also allow manual triggers or triggers on content changes, but for a zero-maintenance approach a scheduled run is key.
Here’s a sample workflow configuration:
name: SEO Health Check

# Run weekly on Sunday at 00:00 (adjust as needed)
on:
  schedule:
    - cron: '0 0 * * 0'
  workflow_dispatch: # allow manual trigger from GitHub UI (optional)

jobs:
  seo_audit:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout site code (if needed)
        uses: actions/checkout@v3
      - name: Setup Node.js
        uses: actions/setup-node@v3
        with:
          node-version: '18' # Use Node 18+ (has built-in fetch API)
      - name: Install dependencies
        run: npm install
      - name: Run SEO checks (Meta tags & Broken links & SERP)
        run: node scripts/seo-check.js
        env:
          BASE_URL: "https://your-site.com" # URL of site to check
          SERPNODE_API_KEY: ${{ secrets.SERPNODE_API_KEY }} # API key for serpnode (stored in repo secrets)
      - name: Run Lighthouse audit
        run: npx lighthouse https://your-site.com --only-categories=performance,accessibility,seo --quiet --chrome-flags="--headless"
Let’s break down what this does:
- Trigger (on:) – We use a cron schedule to run the workflow once a week. We also include workflow_dispatch so that it can be triggered manually if needed.
- Environment – The job runs on the latest Ubuntu runner. We set up Node.js (using Node 18 here for convenience, since it includes the Fetch API natively) and install the npm dependencies our scripts need (a minimal package.json sketch follows this list).
- Steps – We then run our Node script seo-check.js to perform meta tag checks, link checks, and SERP verification. We pass in the base URL of the site via an environment variable. We also provide our serpnode API key via a secret (make sure to add SERPNODE_API_KEY in your repository’s Secrets settings).
- Lighthouse – Finally, we run Lighthouse using its CLI via npx. We specify only the categories we care about (performance, accessibility, SEO) and run Chrome in headless mode. This will output a report summary to the console. (You could also output results to a file or use Lighthouse CI for more advanced use cases.)
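For the npm install step to pick up the libraries used later in this post, the repository needs a package.json that declares them. A minimal sketch might look like this (the package name and version ranges are placeholders; pin whatever versions you prefer):

{
  "name": "seo-health-check",
  "private": true,
  "scripts": {
    "seo-check": "node scripts/seo-check.js"
  },
  "dependencies": {
    "axios": "^1.6.0",
    "cheerio": "^1.0.0-rc.12"
  }
}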
With the workflow in place, let’s create the seo-check.js script that will handle meta tag validation, link checking, and SERP API calls.
Checking Meta Tags with Node.js
One of the simplest yet most important SEO checks is verifying that each page has a `<title>` tag and a meta description, and that they’re of appropriate length. Search engines typically display up to about 50–60 characters of a title tag and ~155–160 characters of a meta description in results. Anything too long will be truncated, and missing or identical tags across pages can hurt SEO.

We can automate this check with Node.js by fetching the page HTML and parsing it. We’ll use axios for HTTP requests and cheerio (a jQuery-like HTML parser) to easily query the DOM:
const axios = require('axios');
const cheerio = require('cheerio');

// Fetch a page and check for title and meta description
async function checkMetaTags(pageUrl) {
  try {
    const { data: html } = await axios.get(pageUrl);
    const $ = cheerio.load(html);
    const titleText = $('title').text() || "";
    const metaDesc = $('meta[name="description"]').attr('content') || "";

    if (!titleText) {
      console.error(`❌ [Meta] Missing <title> tag on ${pageUrl}`);
    } else if (titleText.length > 60) {
      console.warn(`⚠️ [Meta] Title is too long (${titleText.length} chars) on ${pageUrl}`);
    } else {
      console.log(`✅ [Meta] Title tag looks good (${titleText.length} chars)`);
    }

    if (!metaDesc) {
      console.error(`❌ [Meta] Missing meta description on ${pageUrl}`);
    } else if (metaDesc.length < 50 || metaDesc.length > 160) {
      console.warn(`⚠️ [Meta] Meta description length (${metaDesc.length} chars) might be suboptimal on ${pageUrl}`);
    } else {
      console.log(`✅ [Meta] Meta description looks good (${metaDesc.length} chars)`);
    }
  } catch (err) {
    console.error(`Error fetching ${pageUrl}: ${err.message}`);
  }
}
In the code above, we load the page and then use CSS selectors to grab the `<title>` text and the content of `<meta name="description">`. We print a success message if they meet our criteria, or warnings/errors if something is missing or out of bounds. You can adjust the length thresholds based on current SEO best practices (common recommendations are roughly 50–160 characters for descriptions, matching the check above).
Usage: if you call checkMetaTags("https://your-site.com"), it will output to the console whether the page has a valid title and description. In a real project, you might want to aggregate these results or fail the action if a crucial tag is missing.
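One lightweight way to do that (just a sketch; the snippets in this post only log to the console) is to count problems in a shared variable and use it to set the process exit code when the script finishes:

// Hypothetical module-level counter; the check functions would call reportIssue()
// instead of console.error()/console.warn() whenever they find a problem.
let issueCount = 0;

function reportIssue(message) {
  issueCount += 1;
  console.error(message);
}

// At the very end of seo-check.js:
// process.exitCode = issueCount > 0 ? 1 : 0; // a non-zero exit makes the Actions step fail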
Detecting Broken Links (404s)
Broken links can negatively impact both user experience and SEO. Next, we’ll scan our pages for any hyperlinks that lead to a 404 error. To keep things simple, we’ll start from a base URL (like the homepage or docs index) and check all the links on that page. You could extend this to crawl the entire site, but be cautious with very large sites to avoid overly long runs.
We’ll reuse axios to attempt HTTP HEAD requests on each link we find:
// Check all links on a given page for broken URLs
async function checkBrokenLinks(pageUrl) {
  try {
    const { data: html } = await axios.get(pageUrl);
    const $ = cheerio.load(html);
    const links = $('a[href]').map((_, a) => $(a).attr('href')).get();

    for (const link of links) {
      // Skip page anchors and mailto links
      if (!link || link.startsWith('#') || link.startsWith('mailto:')) continue;

      let fullLink = link;
      if (link.startsWith('/')) {
        // Convert a relative link to an absolute one using the base URL
        const base = new URL(pageUrl);
        fullLink = base.origin + link;
      }

      try {
        // Use a HEAD request for efficiency. validateStatus: () => true stops axios
        // from throwing on non-2xx responses, so we can inspect the status ourselves.
        const { status } = await axios.head(fullLink, { validateStatus: () => true });
        if (status >= 400) {
          console.error(`❌ [Link] Broken link found: ${fullLink} (status ${status})`);
        }
      } catch (err) {
        console.error(`❌ [Link] Error checking link ${fullLink}: ${err.message}`);
      }
    }
    console.log(`✅ [Link] Link check completed for ${pageUrl}`);
  } catch (err) {
    console.error(`Error fetching ${pageUrl}: ${err.message}`);
  }
}
In this snippet, we:
- Fetch the HTML and collect all href values from `<a>` tags.
- Filter out irrelevant links (like page anchors # or email links).
- Convert relative URLs to absolute (so that /docs/page becomes https://your-site.com/docs/page for example).
- Perform a HEAD request for each link. We use validateStatus: () => true to prevent axios from throwing on 404s, so we can handle them ourselves. If we get a status code >= 400, we log it as a broken link.
(In practice, you might want to skip external domains or handle them separately, depending on your use case. For a small site, checking external links is fine; for a larger site with many external links, you may want to limit or parallelize checks. There are also dedicated packages like linkinator that can crawl recursively and find broken links for you.)
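If you’d rather lean on one of those packages, a recursive crawl with linkinator might look roughly like this (a sketch based on its programmatic API; check the package’s docs for the exact options you need):

// npm install linkinator
const { LinkChecker } = require('linkinator');

async function crawlForBrokenLinks(startUrl) {
  const checker = new LinkChecker();
  // recurse: true follows same-site links; concurrency keeps the crawl polite
  const results = await checker.check({ path: startUrl, recurse: true, concurrency: 10 });
  const broken = results.links.filter(link => link.state === 'BROKEN');
  broken.forEach(link => console.error(`❌ [Link] ${link.url} (status ${link.status})`));
  console.log(`Scanned ${results.links.length} links, ${broken.length} broken.`);
  return broken.length === 0;
}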
Auditing with Lighthouse for Performance & SEO
No SEO health check would be complete without measuring performance and other best practices. Google’s Lighthouse is an automated tool that audits a page’s performance, accessibility, SEO, and more by running a headless Chrome instance. We can integrate Lighthouse into our GitHub Action to get scores and catch regressions over time.
For simplicity, we’ll use the Lighthouse CLI directly in our workflow (as shown in the YAML). The command was:
npx lighthouse https://your-site.com --only-categories=performance,accessibility,seo --quiet --chrome-flags="--headless"
This runs Lighthouse on the URL, focusing only on the performance, accessibility, and SEO categories to keep output concise. It uses --quiet to reduce log verbosity and runs Chrome in headless mode (required in CI). The result will be printed to the action log, including a numeric score out of 100 for each category and some suggestions.
For example, you might see output like:
Performance: 85
Accessibility: 92
SEO: 100
You can fine-tune Lighthouse usage by adding flags (for example, --output=json --output-path=lighthouse.json to save a detailed JSON report, or set thresholds to fail the action if scores drop below a certain value). For a simple scheduled check, reviewing the scores in the logs or downloading the artifact can be enough to spot when something goes wrong.
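For instance, if you save the JSON report as shown above, a small follow-up script can compare each category score (stored on a 0–1 scale in the report) against minimums you choose and fail the step when they drop. This is a hedged sketch assuming the report was written to lighthouse.json; the thresholds.js helper is hypothetical and something you would add yourself:

// thresholds.js – hypothetical helper run after the Lighthouse step
const fs = require('fs');

const MIN_SCORES = { performance: 0.8, accessibility: 0.9, seo: 0.9 }; // adjust to taste

const report = JSON.parse(fs.readFileSync('lighthouse.json', 'utf8'));
let failed = false;

for (const [category, minScore] of Object.entries(MIN_SCORES)) {
  const score = report.categories[category]?.score; // 0–1 in the Lighthouse JSON output
  console.log(`${category}: ${Math.round((score ?? 0) * 100)}`);
  if (score == null || score < minScore) {
    console.error(`❌ ${category} score is below the minimum of ${minScore * 100}`);
    failed = true;
  }
}

if (failed) process.exit(1); // non-zero exit fails the workflow step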
(Note: The first run might be slower as it downloads Chrome dependencies. The GitHub Actions Ubuntu runners typically have Chrome available, but if not, you can install it or use a setup action. In our case, the setup-node and npx lighthouse steps should suffice since a recent Chrome is pre-installed on runners.)
Verifying Google Search Results with Serpnode API
Finally, let’s verify our site’s presence on Google for some target keywords. This is a bit tricky to do manually or without an API, but serpnode.com provides a simple SERP (Search Engine Results Page) API. We can use it to automatically query Google and check if our website appears in the top results for specific keywords.
For example, suppose we have a documentation site and we want to ensure it ranks for the query “MyProject Docs”. We can use serpnode’s API to perform a Google search and retrieve the results as JSON. Here’s a sample API call using curl:
# Sample serpnode API call (replace YOUR-API-KEY accordingly)
curl -G 'https://api.serpnode.com/v1/search' \
  --data-urlencode 'q=MyProject Docs' \
  -H 'apikey: YOUR-API-KEY'
This GET request hits the serpnode /v1/search endpoint with our query, using an API key for authentication. The response comes back as JSON containing the search results. It includes sections like organic_results (the main Google results), along with any paid_results, local_results, etc. For example, part of the JSON response might look like:
{
  "organic_results": [
    {
      "position": 1,
      "title": "MyProject Documentation – Overview",
      "url": "https://myproject.org/docs/overview",
      "description": "Welcome to the MyProject documentation..."
    },
    {
      "position": 2,
      "title": "MyProject Docs - Installation",
      "url": "https://myproject.org/docs/install",
      "description": "How to install MyProject..."
    }
    // ...
  ],
  "paid_results": [ ... ],
  "local_results": [ ... ],
  "metadata": { ... }
}
(Above is an example structure demonstrating positions, titles, URLs, etc., similar to the format documented by serpnode.)
We can integrate this into our Node script to automatically check if a certain domain or URL appears in the top results. For instance:
// Verify that our site appears in Google results for a given keyword
async function checkSerpResult(keyword, expectedDomain) {
  try {
    const response = await axios.get('https://api.serpnode.com/v1/search', {
      params: { q: keyword },
      headers: { apikey: process.env.SERPNODE_API_KEY }
    });
    // The example response above shows organic_results at the top level;
    // fall back to a nested `result` object in case the API wraps it.
    const results = response.data.organic_results || response.data.result?.organic_results || [];
    const found = results.find(r => r.url && r.url.includes(expectedDomain));
    if (found) {
      console.log(`✅ [SERP] "${keyword}" - found our site in results (position ${found.position})`);
    } else {
      console.warn(`⚠️ [SERP] "${keyword}" - our site is NOT in the top results`);
    }
  } catch (err) {
    console.error(`Error querying serpnode for "${keyword}": ${err.message}`);
  }
}
In this function, we query the API for a keyword, then scan the organic results to see if any result URL contains our domain (you might use a stricter check or a specific URL if you expect a certain page). We log a success if found, or a warning if not. This can alert you if your SEO standing for important keywords drops. Just be mindful of the API usage limits – serpnode offers 100 free requests per month, which is plenty for a weekly check on a few keywords.
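As a hedged illustration of that stricter check, you could require a specific page to appear within the first few positions. The helper below is hypothetical and assumes the same result shape as the example JSON above:

// Hypothetical helper: true only if `expectedUrl` shows up within the first `maxPosition` organic results
function isRankedHighEnough(organicResults, expectedUrl, maxPosition = 3) {
  return organicResults.some(r => r.position <= maxPosition && r.url === expectedUrl);
}

// Example usage inside checkSerpResult, after `results` has been retrieved:
// if (!isRankedHighEnough(results, 'https://myproject.org/docs/overview')) {
//   console.warn('⚠️ [SERP] Expected page is not in the top 3 results');
// }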
Security tip: We pass the API key via an environment variable (SERPNODE_API_KEY) set in the workflow. Never hardcode secrets in your scripts or repository.
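If you prefer the command line over the web UI, the GitHub CLI can store the secret for you (assuming gh is installed and authenticated for the repository):

# Store the serpnode API key as a repository secret
gh secret set SERPNODE_API_KEY --body "your-api-key-here"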
Putting It All Together
We have outlined individual checks for meta tags, links, performance, and SERP. In practice, you would combine the meta and link checks (and possibly the SERP checks) in one script (like seo-check.js) that might look like:
// Example main function that runs the checks
(async () => {
  const baseUrl = process.env.BASE_URL;
  if (!baseUrl) {
    console.error("BASE_URL not specified");
    process.exit(1);
  }
  console.log(`Starting SEO health check for ${baseUrl} ...`);

  // 1. Meta tags and broken links on the homepage (you can add more pages as needed)
  await checkMetaTags(baseUrl);
  await checkBrokenLinks(baseUrl);

  // 2. (Optional) Check additional important pages
  // const docsPage = baseUrl + "/docs/";
  // await checkMetaTags(docsPage);
  // await checkBrokenLinks(docsPage);

  // 3. SERP verification for target keywords
  await checkSerpResult("MyProject Docs", "myproject.org");
  await checkSerpResult("MyProject install", "myproject.org");
})();
You can customize which pages to scan and which keywords to verify based on your project. The workflow will run this script weekly and output any issues to the Actions log. If everything is okay, you’ll see a bunch of green checkmarks in the log; if not, you’ll see the warnings/errors we printed, and you can take action (fix content, add redirects, etc.). You could even make the action fail on certain conditions (by exiting with a non-zero code) to get an email alert, or configure it to create GitHub issues when problems are found, but that’s beyond our scope here.
Conclusion
By leveraging GitHub Actions, we set up an automated SEO health check that requires virtually no maintenance. Every week (or on-demand), our workflow will flag issues like missing meta tags, broken links, slow pages, or SEO regressions. This proactive approach helps catch problems early, ensuring our site remains in good SEO shape without manual audits.