Your site has broken links right now. You just don't know which ones.
It could be a blog post from 2019 linking to a vendor that went under. A docs page pointing to a GitHub repo you archived. A partner integration page whose link rotted when they redesigned their site. None of these show up in your analytics. Users quietly bounce, Google quietly downgrades you, and you find out six months later when your search traffic drops.
Here's how to build a broken link detector in under 50 lines of Node.js that actually works — no browser automation, no Puppeteer, no scraping logic to maintain.
The Approach
The standard approach to link checking is messy: spin up a headless browser, navigate to the page, wait for JS to render, extract hrefs from the DOM, then check each one with fetch. That's 80+ lines before you've handled relative URLs, redirects, or rate limiting.
There's a faster way. SnapAPI's /v1/metadata endpoint returns all links from a page as a clean array in a single API call — no browser to manage on your end. You send a URL, get back every outbound link already extracted. Then you just loop and check each one with a HEAD request.
Result: ~45 lines of Node.js, zero infrastructure.
The Code
```bash
npm install snapapi-sdk
```
```javascript
import SnapAPI from 'snapapi-sdk';

const snap = new SnapAPI({ apiKey: process.env.SNAPAPI_KEY });

async function checkLinks(pageUrl) {
  console.log(`\nChecking: ${pageUrl}\n`);

  // Extract all links from the page in one API call
  const { links } = await snap.metadata({ url: pageUrl });

  if (!links || links.length === 0) {
    console.log('No outbound links found.');
    return;
  }

  // Resolve relative links against the base URL
  const base = new URL(pageUrl);
  const resolved = links.map(href => {
    try {
      return new URL(href, base).href;
    } catch {
      return null;
    }
  }).filter(Boolean);

  // Deduplicate
  const unique = [...new Set(resolved)];
  console.log(`Found ${unique.length} unique links. Checking each...\n`);

  const broken = [];
  for (const url of unique) {
    try {
      const res = await fetch(url, {
        method: 'HEAD',
        redirect: 'follow',
        signal: AbortSignal.timeout(8000),
      });
      const status = res.status;
      const label = status >= 400 ? '❌' : '✅';
      console.log(`${label} [${status}] ${url}`);
      if (status >= 400) broken.push({ url, status });
    } catch (err) {
      console.log(`💀 [ERR] ${url} — ${err.message}`);
      broken.push({ url, status: 'ERROR', error: err.message });
    }
  }

  console.log(`\n--- Summary ---`);
  console.log(`Total links checked: ${unique.length}`);
  console.log(`Broken: ${broken.length}`);
  if (broken.length > 0) {
    console.log('\nBroken links:');
    broken.forEach(b => console.log(`  [${b.status}] ${b.url}`));
  }
}

// Run it
checkLinks(process.argv[2] || 'https://example.com');
```
Save that as check-links.mjs and run it:
```bash
SNAPAPI_KEY=your_key node check-links.mjs https://yoursite.com/blog
```
What the Output Looks Like
```
Checking: https://yoursite.com/blog

Found 23 unique links. Checking each...

✅ [200] https://twitter.com/yourhandle
✅ [200] https://github.com/yourorg/repo
❌ [404] https://oldvendor.com/integration-docs
✅ [200] https://docs.stripe.com/api
💀 [ERR] https://defunct-partner.io — fetch failed (ECONNREFUSED)
✅ [200] https://en.wikipedia.org/wiki/REST_API

--- Summary ---
Total links checked: 23
Broken: 2

Broken links:
  [404] https://oldvendor.com/integration-docs
  [ERROR] https://defunct-partner.io
```
Clean, actionable. Pipe it to a file or a Slack webhook if you want alerts.
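For the Slack route, here is one possible shape, assuming you have an incoming-webhook URL in a `SLACK_WEBHOOK_URL` environment variable (that variable name, and the helper names, are my choices, not part of the script above):

```javascript
// Build a plain-text report from the `broken` array the script collects.
function formatBrokenReport(broken) {
  const lines = broken.map(b => `[${b.status}] ${b.url}`).join('\n');
  return `${broken.length} broken link(s):\n${lines}`;
}

// Post the report to a Slack incoming webhook; skips the call when nothing is broken.
async function notifySlack(broken) {
  if (broken.length === 0) return;
  await fetch(process.env.SLACK_WEBHOOK_URL, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ text: formatBrokenReport(broken) }),
  });
}
```

Call `notifySlack(broken)` at the end of `checkLinks` and quiet runs cost you nothing.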
Gotchas to Know
Relative vs absolute URLs. The metadata endpoint returns links as they appear in the HTML — so href="/about" comes back as /about. The new URL(href, base) call handles this, but make sure your base URL includes the full origin (https://yoursite.com, not just yoursite.com).
Redirect chains. The redirect: 'follow' option in fetch means a 301 → 200 chain registers as 200. That's usually what you want — a redirect that resolves is not a broken link. But if you want to catch destination changes (e.g., your partner's page now redirects to their homepage instead of the actual docs), you'd need to check the final URL against the expected destination.
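If you do want to catch that drift, `fetch` exposes the post-redirect URL as `res.url`. A minimal sketch; the `expectedFinal` value is data you maintain yourself, not something the API provides:

```javascript
// Compare a final landing URL against the expected one, ignoring a trailing slash.
function destinationMatches(finalUrl, expectedUrl) {
  const norm = u => new URL(u).href.replace(/\/$/, '');
  return norm(finalUrl) === norm(expectedUrl);
}

// Follow redirects, then check where the request actually landed.
async function checkDestination(url, expectedFinal) {
  const res = await fetch(url, { method: 'HEAD', redirect: 'follow' });
  // res.url holds the URL after all redirects were followed
  return { ok: res.ok, drifted: !destinationMatches(res.url, expectedFinal), landedOn: res.url };
}
```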
Rate limiting. The for loop above checks links one at a time. That's safe for most pages (under 100 links), but if you're scanning a site with heavy outbound linking, add a delay between requests or batch them with Promise.allSettled in chunks of 5–10. Hitting 30 sites simultaneously looks like a bot to their CDN.
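One way to do that batching, sketched with a plain chunking helper and a fixed pause between groups (the chunk size and delay here are arbitrary starting points, not tuned values):

```javascript
const sleep = ms => new Promise(r => setTimeout(r, ms));

// Split an array into groups of at most `size`.
function chunk(arr, size) {
  const out = [];
  for (let i = 0; i < arr.length; i += size) out.push(arr.slice(i, i + size));
  return out;
}

// Check links `size` at a time, pausing between groups to stay polite.
async function checkInChunks(urls, size = 5, pauseMs = 500) {
  const broken = [];
  for (const group of chunk(urls, size)) {
    const results = await Promise.allSettled(
      group.map(url => fetch(url, {
        method: 'HEAD',
        redirect: 'follow',
        signal: AbortSignal.timeout(8000),
      }))
    );
    results.forEach((r, i) => {
      if (r.status === 'rejected') broken.push({ url: group[i], status: 'ERROR' });
      else if (r.value.status >= 400) broken.push({ url: group[i], status: r.value.status });
    });
    await sleep(pauseMs);
  }
  return broken;
}
```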
Some servers reject HEAD. Most return proper status codes for HEAD requests, but occasionally a server returns 405 Method Not Allowed for HEAD while allowing GET. If you see a lot of unexpected 405s, swap to method: 'GET' — slower, but more reliable.
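A small fallback wrapper for that case, assuming you only retry when the server answers with an explicit 405:

```javascript
// Retry with GET only when the server explicitly rejects HEAD.
const shouldRetryWithGet = status => status === 405;

async function checkUrl(url) {
  const opts = { redirect: 'follow', signal: AbortSignal.timeout(8000) };
  let res = await fetch(url, { ...opts, method: 'HEAD' });
  if (shouldRetryWithGet(res.status)) {
    // GET transfers the response body, so it is slower, but nearly every server supports it.
    res = await fetch(url, { ...opts, method: 'GET' });
  }
  return res.status;
}
```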
JavaScript-rendered links. SnapAPI runs a real Chromium browser, so the metadata endpoint captures links rendered by JavaScript too, not just static HTML. This matters for SPAs where nav items and footers are injected after load.
Run It on a Schedule
Wrap this in a cron job and you've got continuous link monitoring with zero infrastructure:
```bash
# Check your /docs every day at 9am
0 9 * * * SNAPAPI_KEY=your_key node check-links.mjs https://yoursite.com/docs >> /var/log/linkcheck.log 2>&1
```
Or push broken link counts to a monitoring service and alert when the number goes above zero.
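A sketch of that idea: build a simple gauge payload and set the process exit code so cron mail or CI can flag a failing run (`METRICS_URL` is a hypothetical placeholder for your service's ingest endpoint, and the metric name is arbitrary):

```javascript
// Shape a minimal gauge payload for a generic metrics endpoint.
function buildMetric(count) {
  return { metric: 'broken_links', value: count, ts: Date.now() };
}

async function reportCount(count) {
  await fetch(process.env.METRICS_URL, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(buildMetric(count)),
  });
  // A non-zero exit code lets cron or CI alert without any extra tooling.
  process.exitCode = count > 0 ? 1 : 0;
}
```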
Get Started
Free API key at snapapi.tech/start — 100 calls/month, no credit card. Each metadata call counts as one API call regardless of how many links are on the page.
The full link checker above uses exactly one thing from SnapAPI: the metadata endpoint, to get all the links. The actual HTTP checking is plain Node.js fetch. You own the logic; SnapAPI just handles the page scraping part.
Go find those broken links.