Dead links degrade user experience and hurt your SEO: search engines treat pages full of broken outbound links as poorly maintained, and users bounce when they hit 404s.
Most link-checking tools are either expensive SaaS products or slow crawlers that hammer servers. In this tutorial, we'll build a fast broken link checker using JavaScript and a free web scraping API: no Puppeteer, no headless browsers, no dependencies.
What We're Building
A CLI tool that:
- Scrapes all links from any webpage
- Tests each link for HTTP errors (404, 500, timeouts)
- Takes screenshots of broken destinations (proof for reports)
- Outputs a clean report with actionable results
Prerequisites
- Node.js 18+
- A free API key from Frostbyte (200 free credits, no card required)
Step 1: Extract All Links from a Page
First, let's use the web scraper API to pull every link from a target page:
const API_KEY = process.env.FROSTBYTE_KEY || 'your-api-key';
const BASE = 'https://api.frostbyte.dev';
async function extractLinks(url) {
const res = await fetch(`${BASE}/v1/scraper/scrape`, {
method: 'POST',
headers: {
'Content-Type': 'application/json',
'X-API-Key': API_KEY
},
body: JSON.stringify({ url, extract: ['links'] })
});
const data = await res.json();
if (!data.success) throw new Error(data.error || 'Scrape failed');
// Filter to HTTP/HTTPS links only, deduplicate
const links = [...new Set(
(data.data?.links || [])
.filter(link => link.startsWith('http'))
)];
console.log(`Found ${links.length} unique links on ${url}`);
return links;
}
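The extraction above keeps any `http(s)` link as-is, so `#fragment` variants of the same page get checked separately. A small optional refinement (a sketch; `normalizeLinks` is our own helper name, not part of the API) strips fragments before deduplicating:

```javascript
// Strip URL fragments and non-HTTP schemes, then deduplicate.
// #section anchors point at the same document, so checking them
// separately just wastes requests.
function normalizeLinks(rawLinks) {
  const cleaned = rawLinks
    .filter(link => /^https?:\/\//i.test(link))
    .map(link => {
      const u = new URL(link);
      u.hash = ''; // drop the fragment
      return u.toString();
    });
  return [...new Set(cleaned)];
}
```

Call it on `data.data?.links || []` in place of the inline filter if you want this behavior.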
Step 2: Check Each Link
Now we test each link with a HEAD request (fast, minimal bandwidth). If HEAD fails, we fall back to GET:
async function checkLink(url, timeout = 10000) {
const controller = new AbortController();
const timer = setTimeout(() => controller.abort(), timeout);
try {
// Try HEAD first (faster)
let res = await fetch(url, {
method: 'HEAD',
signal: controller.signal,
redirect: 'follow'
});
// Some servers reject HEAD, so fall back to GET
if (res.status === 405 || res.status === 403) {
res = await fetch(url, {
method: 'GET',
signal: controller.signal,
redirect: 'follow'
});
}
clearTimeout(timer);
return {
url,
status: res.status,
ok: res.ok,
redirected: res.redirected,
finalUrl: res.url
};
} catch (err) {
clearTimeout(timer);
return {
url,
status: 0,
ok: false,
error: err.name === 'AbortError' ? 'TIMEOUT' : err.message
};
}
}
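One caveat: a single failed request isn't proof a link is dead. DNS hiccups, rate limiting (429), and transient 5xx errors all produce false positives. A hedged wrapper around `checkLink` from above can retry once before reporting a failure (the retryable status list and delay here are judgment calls, not part of any API):

```javascript
// Statuses worth retrying: 0 is our own "network error" marker from
// checkLink's catch block; the rest are commonly transient.
const RETRYABLE = new Set([0, 429, 500, 502, 503, 504]);

// Retry a failed check once after a short delay. The `check` parameter
// defaults to the checkLink function defined above.
async function checkLinkWithRetry(url, check = checkLink, delayMs = 2000) {
  const first = await check(url);
  if (first.ok || !RETRYABLE.has(first.status)) return first;
  await new Promise(resolve => setTimeout(resolve, delayMs));
  return check(url);
}
```

Swap it in wherever `checkLink` is called if false positives become a problem.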
Step 3: Screenshot Broken Pages
When we find a broken link, let's capture what users actually see; screenshots are invaluable for reports:
async function screenshotUrl(url) {
const res = await fetch(`${BASE}/v1/screenshot/take`, {
method: 'POST',
headers: {
'Content-Type': 'application/json',
'X-API-Key': API_KEY
},
body: JSON.stringify({
url,
width: 1280,
height: 720,
format: 'png'
})
});
const data = await res.json();
return data.success ? data.data?.screenshot_url : null;
}
Step 4: Put It All Together
Here's the complete broken link checker:
async function checkPage(targetUrl) {
console.log(`\n🔍 Scanning: ${targetUrl}\n`);
// 1. Extract all links
const links = await extractLinks(targetUrl);
// 2. Check links in parallel batches (5 at a time to be polite)
const results = [];
const batchSize = 5;
for (let i = 0; i < links.length; i += batchSize) {
const batch = links.slice(i, i + batchSize);
// Wrap checkLink so map's index argument isn't passed as the timeout
const checked = await Promise.all(batch.map(link => checkLink(link)));
results.push(...checked);
// Progress indicator
const done = Math.min(i + batchSize, links.length);
process.stdout.write(`\rChecked ${done}/${links.length} links`);
}
console.log('\n');
// 3. Categorize results
const broken = results.filter(r => !r.ok);
const redirected = results.filter(r => r.ok && r.redirected);
const healthy = results.filter(r => r.ok && !r.redirected);
// 4. Report
console.log(`✅ Healthy: ${healthy.length}`);
console.log(`↪️ Redirected: ${redirected.length}`);
console.log(`❌ Broken: ${broken.length}\n`);
if (broken.length > 0) {
console.log('--- BROKEN LINKS ---\n');
for (const link of broken) {
const status = link.error || `HTTP ${link.status}`;
console.log(` ${status}: ${link.url}`);
// Screenshot the broken page
const screenshot = await screenshotUrl(link.url);
if (screenshot) {
console.log(`    📸 ${screenshot}\n`);
}
}
}
if (redirected.length > 0) {
console.log('\n--- REDIRECTS ---\n');
for (const link of redirected) {
console.log(` ${link.url}`);
console.log(`    → ${link.finalUrl}\n`);
}
}
return { healthy, broken, redirected };
}
// Run it
const url = process.argv[2] || 'https://example.com';
checkPage(url).catch(console.error);
Run it:
export FROSTBYTE_KEY=your-api-key
node link-checker.js https://your-site.com/blog
Sample Output
🔍 Scanning: https://your-site.com/blog
Found 47 unique links on https://your-site.com/blog
Checked 47/47 links
✅ Healthy: 41
↪️ Redirected: 3
❌ Broken: 3
--- BROKEN LINKS ---
HTTP 404: https://old-service.com/deprecated-page
📸 https://screenshots.frostbyte.dev/abc123.png
TIMEOUT: https://dead-domain.xyz
📸 https://screenshots.frostbyte.dev/def456.png
HTTP 500: https://api.broken-service.io/docs
📸 https://screenshots.frostbyte.dev/ghi789.png
--- REDIRECTS ---
https://http-site.com/page
→ https://https-site.com/page
Making It Production-Ready
Scan an Entire Sitemap
Check every page on your site by parsing the sitemap:
async function checkSitemap(sitemapUrl) {
const res = await fetch(sitemapUrl);
const xml = await res.text();
// Extract URLs from sitemap XML
const urls = [...xml.matchAll(/<loc>(.+?)<\/loc>/g)]
.map(m => m[1]);
console.log(`Found ${urls.length} pages in sitemap\n`);
let totalBroken = 0;
for (const url of urls) {
const { broken } = await checkPage(url);
totalBroken += broken.length;
}
console.log(`\n=== TOTAL BROKEN LINKS: ${totalBroken} ===`);
}
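One wrinkle: `checkSitemap` assumes a flat `<urlset>`, but larger sites often publish a sitemap *index* whose `<loc>` entries point at child sitemaps rather than pages. A hedged helper (our own name; real sitemaps may also be gzipped, which this sketch ignores) tells the two apart so you can recurse:

```javascript
// Split a sitemap XML document into child sitemaps vs page URLs.
// A <sitemapindex> root means every <loc> is another sitemap to fetch;
// a <urlset> root means every <loc> is a page to check.
function parseSitemap(xml) {
  const locs = [...xml.matchAll(/<loc>\s*(.+?)\s*<\/loc>/g)].map(m => m[1]);
  const isIndex = /<sitemapindex[\s>]/.test(xml);
  return isIndex
    ? { sitemaps: locs, pages: [] }
    : { sitemaps: [], pages: locs };
}
```

In `checkSitemap`, fetch any `sitemaps` entries recursively before checking `pages`.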
Schedule Weekly Checks
Run it on a cron schedule with PM2:
# ecosystem.config.js
module.exports = {
apps: [{
name: 'link-checker',
script: 'link-checker.js',
args: 'https://your-site.com/sitemap.xml',
cron_restart: '0 9 * * 1', // Every Monday 9am
autorestart: false
}]
};
JSON Output for CI/CD
Add a --json flag to integrate into your build pipeline. This snippet assumes it runs where results, healthy, broken, and redirected are in scope (e.g. at the end of checkPage):
if (process.argv.includes('--json')) {
const report = {
scanned: url,
timestamp: new Date().toISOString(),
total: results.length,
healthy: healthy.length,
broken: broken.map(b => ({ url: b.url, status: b.status, error: b.error })),
redirects: redirected.map(r => ({ from: r.url, to: r.finalUrl }))
};
console.log(JSON.stringify(report, null, 2));
}
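Since the snippet reads arrays computed inside `checkPage`, factoring the report into a pure function makes it unit-testable without any network access (a sketch; `buildReport` is our own name, but the output shape matches the report above):

```javascript
// Build the CI report from raw check results. Pure function: no
// network, no globals, so it can be tested with fixture data.
function buildReport(scannedUrl, results) {
  const broken = results.filter(r => !r.ok);
  const redirected = results.filter(r => r.ok && r.redirected);
  const healthy = results.filter(r => r.ok && !r.redirected);
  return {
    scanned: scannedUrl,
    timestamp: new Date().toISOString(),
    total: results.length,
    healthy: healthy.length,
    broken: broken.map(b => ({ url: b.url, status: b.status, error: b.error })),
    redirects: redirected.map(r => ({ from: r.url, to: r.finalUrl }))
  };
}
```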
Use it in GitHub Actions to catch broken links before they go live:
- name: Check for broken links
run: |
node link-checker.js https://staging.your-site.com --json > report.json
BROKEN=$(jq '.broken | length' report.json)
if [ "$BROKEN" -gt "0" ]; then
echo "Found $BROKEN broken links!"
jq '.broken' report.json
exit 1
fi
API Credits Used
| Operation | Credits each | Typical per page |
|---|---|---|
| Scrape page for links | 1 | 1 |
| Screenshot (broken links only) | 1 | ~3 |
| Total for a 50-link page | | ~4 credits |
With 200 free credits, you can check ~50 pages (2,500 links) without paying anything.
Why Not Just Use curl?
You could curl every link manually, but:
- JavaScript-rendered pages won't return proper status codes via curl. The scraper API renders pages in a real browser.
- Screenshot proof makes reports actionable: stakeholders can see exactly what's broken.
- Link extraction from JavaScript-heavy SPAs requires a real browser engine, which the scraper handles.
What's Next?
- Add Slack/Discord notifications when broken links are found
- Track historical data to catch links as they break
- Build a web dashboard with trend charts
Get your free API key at frostbyte.dev: 200 credits, no credit card, instant access. The scraper, screenshot, and 40+ other API tools are all included.