DEV Community

Ozor
How to Build a Broken Link Checker in JavaScript (Free API)

Dead links hurt user experience and your SEO. Search engines treat broken outbound links as a sign of a poorly maintained page, and users bounce when they hit 404s.

Most link-checking tools are either expensive SaaS products or slow crawlers that hammer servers. In this tutorial, we'll build a fast broken link checker using JavaScript and a free web scraping API — no Puppeteer, no headless browsers, no dependencies.

What We're Building

A CLI tool that:

  1. Scrapes all links from any webpage
  2. Tests each link for HTTP errors (404, 500, timeouts)
  3. Takes screenshots of broken destinations (proof for reports)
  4. Outputs a clean report with actionable results

Prerequisites

  • Node.js 18+
  • A free API key from Frostbyte (200 free credits, no card required)

Step 1: Extract All Links from a Page

First, let's use the web scraper API to pull every link from a target page:

const API_KEY = process.env.FROSTBYTE_KEY || 'your-api-key';
const BASE = 'https://api.frostbyte.dev';

async function extractLinks(url) {
  const res = await fetch(`${BASE}/v1/scraper/scrape`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'X-API-Key': API_KEY
    },
    body: JSON.stringify({ url, extract: ['links'] })
  });

  const data = await res.json();
  if (!data.success) throw new Error(data.error || 'Scrape failed');

  // Filter to HTTP/HTTPS links only, deduplicate
  const links = [...new Set(
    (data.data?.links || [])
      .filter(link => link.startsWith('http'))
  )];

  console.log(`Found ${links.length} unique links on ${url}`);
  return links;
}

Step 2: Check Each Link

Now we test each link with a HEAD request (fast, minimal bandwidth). If the server rejects HEAD (405 or 403), we fall back to GET:

async function checkLink(url, timeout = 10000) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeout);

  try {
    // Try HEAD first (faster)
    let res = await fetch(url, {
      method: 'HEAD',
      signal: controller.signal,
      redirect: 'follow'
    });

    // Some servers reject HEAD — fall back to GET
    if (res.status === 405 || res.status === 403) {
      res = await fetch(url, {
        method: 'GET',
        signal: controller.signal,
        redirect: 'follow'
      });
    }

    clearTimeout(timer);

    return {
      url,
      status: res.status,
      ok: res.ok,
      redirected: res.redirected,
      finalUrl: res.url
    };
  } catch (err) {
    clearTimeout(timer);
    return {
      url,
      status: 0,
      ok: false,
      error: err.name === 'AbortError' ? 'TIMEOUT' : err.message
    };
  }
}

Step 3: Screenshot Broken Pages

When we find a broken link, let's capture what users actually see — this is invaluable for reports:

async function screenshotUrl(url) {
  const res = await fetch(`${BASE}/v1/screenshot/take`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'X-API-Key': API_KEY
    },
    body: JSON.stringify({
      url,
      width: 1280,
      height: 720,
      format: 'png'
    })
  });

  const data = await res.json();
  return data.success ? data.data?.screenshot_url : null;
}

Step 4: Put It All Together

Here's the complete broken link checker:

async function checkPage(targetUrl) {
  console.log(`\nšŸ” Scanning: ${targetUrl}\n`);

  // 1. Extract all links
  const links = await extractLinks(targetUrl);

  // 2. Check links in parallel batches (5 at a time to be polite)
  const results = [];
  const batchSize = 5;

  for (let i = 0; i < links.length; i += batchSize) {
    const batch = links.slice(i, i + batchSize);
    // Wrap in an arrow function: passing checkLink directly would let
    // map's index argument clobber checkLink's timeout parameter
    const checked = await Promise.all(batch.map(url => checkLink(url)));
    results.push(...checked);

    // Progress indicator
    const done = Math.min(i + batchSize, links.length);
    process.stdout.write(`\rChecked ${done}/${links.length} links`);
  }

  console.log('\n');

  // 3. Categorize results
  const broken = results.filter(r => !r.ok);
  const redirected = results.filter(r => r.ok && r.redirected);
  const healthy = results.filter(r => r.ok && !r.redirected);

  // 4. Report
  console.log(`āœ… Healthy: ${healthy.length}`);
  console.log(`ā†©ļø  Redirected: ${redirected.length}`);
  console.log(`āŒ Broken: ${broken.length}\n`);

  if (broken.length > 0) {
    console.log('--- BROKEN LINKS ---\n');
    for (const link of broken) {
      const status = link.error || `HTTP ${link.status}`;
      console.log(`  ${status}: ${link.url}`);

      // Screenshot the broken page
      const screenshot = await screenshotUrl(link.url);
      if (screenshot) {
        console.log(`  šŸ“ø ${screenshot}\n`);
      }
    }
  }

  if (redirected.length > 0) {
    console.log('\n--- REDIRECTS ---\n');
    for (const link of redirected) {
      console.log(`  ${link.url}`);
      console.log(`  → ${link.finalUrl}\n`);
    }
  }

  return { healthy, broken, redirected };
}

// Run it
const url = process.argv[2] || 'https://example.com';
checkPage(url).catch(console.error);

Run it:

export FROSTBYTE_KEY=your-api-key
node link-checker.js https://your-site.com/blog

Sample Output

šŸ” Scanning: https://your-site.com/blog

Found 47 unique links on https://your-site.com/blog
Checked 47/47 links

āœ… Healthy: 41
ā†©ļø  Redirected: 3
āŒ Broken: 3

--- BROKEN LINKS ---

  HTTP 404: https://old-service.com/deprecated-page
  šŸ“ø https://screenshots.frostbyte.dev/abc123.png

  TIMEOUT: https://dead-domain.xyz
  šŸ“ø https://screenshots.frostbyte.dev/def456.png

  HTTP 500: https://api.broken-service.io/docs
  šŸ“ø https://screenshots.frostbyte.dev/ghi789.png

--- REDIRECTS ---

  https://http-site.com/page
  → https://https-site.com/page

Making It Production-Ready

Scan an Entire Sitemap

Check every page on your site by parsing the sitemap:

async function checkSitemap(sitemapUrl) {
  const res = await fetch(sitemapUrl);
  const xml = await res.text();

  // Extract URLs from sitemap XML
  const urls = [...xml.matchAll(/<loc>(.+?)<\/loc>/g)]
    .map(m => m[1]);

  console.log(`Found ${urls.length} pages in sitemap\n`);

  let totalBroken = 0;
  for (const url of urls) {
    const { broken } = await checkPage(url);
    totalBroken += broken.length;
  }

  console.log(`\n=== TOTAL BROKEN LINKS: ${totalBroken} ===`);
}

Schedule Weekly Checks

Run it on a cron schedule with PM2:

// ecosystem.config.js
module.exports = {
  apps: [{
    name: 'link-checker',
    script: 'link-checker.js',
    args: 'https://your-site.com/sitemap.xml',
    cron_restart: '0 9 * * 1',  // Every Monday 9am
    autorestart: false
  }]
};

JSON Output for CI/CD

Add a --json flag to integrate into your build pipeline. This snippet goes inside checkPage, after the results are categorized:

if (process.argv.includes('--json')) {
  const report = {
    scanned: url,
    timestamp: new Date().toISOString(),
    total: results.length,
    healthy: healthy.length,
    broken: broken.map(b => ({ url: b.url, status: b.status, error: b.error })),
    redirects: redirected.map(r => ({ from: r.url, to: r.finalUrl }))
  };
  console.log(JSON.stringify(report, null, 2));
}
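One caveat: the progress logs earlier in the script also go to stdout, so in --json mode you may want to suppress them, or skip jq entirely and fail the build from the script itself. A minimal sketch of the latter (exitCodeFor is a hypothetical helper, not part of the tutorial's code; returning a value instead of calling process.exit directly keeps it testable):

```javascript
// Hypothetical helper: decide the process exit code from the broken-link list.
function exitCodeFor(broken) {
  return broken.length > 0 ? 1 : 0;
}

// At the end of the script, after checkPage resolves:
// process.exitCode = exitCodeFor(broken);

console.log(exitCodeFor([]));                                   // 0 — healthy run
console.log(exitCodeFor([{ url: 'https://x', status: 404 }]));  // 1 — fails the CI job
```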

Use it in GitHub Actions to catch broken links before they go live:

- name: Check for broken links
  run: |
    node link-checker.js https://staging.your-site.com --json > report.json
    BROKEN=$(jq '.broken | length' report.json)
    if [ "$BROKEN" -gt "0" ]; then
      echo "Found $BROKEN broken links!"
      jq '.broken' report.json
      exit 1
    fi

API Credits Used

  Operation                  Credits   Per Check
  Scrape page for links      1         1 per page
  Screenshot (broken only)   1         ~3 per page
  Total for a 50-link page   —         ~4 credits

With 200 free credits, you can check ~50 pages (2,500 links) without paying anything.
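To budget for your own site, the arithmetic above can be wrapped in a tiny estimator. This is a sketch based on the table: one credit per scrape and one per screenshot are assumptions, and the HEAD/GET checks hit target sites directly, so they cost no API credits:

```javascript
// Credits per page = 1 scrape + 1 screenshot per broken link.
function estimateCredits(brokenCount) {
  return 1 + brokenCount;
}

console.log(estimateCredits(3));                    // 4 credits for a page with 3 broken links
console.log(Math.floor(200 / estimateCredits(3)));  // ~50 such pages on the free tier
```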

Why Not Just Use curl?

You could curl every link manually, but:

  • curl only sees the initial HTML, so links injected by client-side JavaScript never appear. Extracting links from JavaScript-heavy SPAs requires a real browser engine, which the scraper API provides.
  • Screenshot proof makes reports actionable — stakeholders can see exactly what's broken.
  • The script already batches requests, deduplicates links, and handles timeouts and redirects, all of which is tedious to replicate in shell.

What's Next?

  • Add Slack/Discord notifications when broken links are found
  • Track historical data to catch links as they break
  • Build a web dashboard with trend charts
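Slack notifications, for example, are a small addition. A hedged sketch: buildSlackMessage and notifySlack are hypothetical helpers, and it assumes a standard Slack incoming webhook, which accepts a JSON body with a text field:

```javascript
// Build a Slack incoming-webhook payload from the broken-link results.
function buildSlackMessage(pageUrl, broken) {
  const lines = broken.map(b => `• ${b.error || `HTTP ${b.status}`}: ${b.url}`);
  return {
    text: `Found ${broken.length} broken link(s) on ${pageUrl}\n${lines.join('\n')}`
  };
}

// Post the payload to a Slack webhook (set SLACK_WEBHOOK in your environment).
async function notifySlack(pageUrl, broken) {
  if (broken.length === 0 || !process.env.SLACK_WEBHOOK) return;
  await fetch(process.env.SLACK_WEBHOOK, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(buildSlackMessage(pageUrl, broken))
  });
}
```

Call notifySlack(targetUrl, broken) at the end of checkPage to get pinged only when something breaks.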

Get your free API key at frostbyte.dev — 200 credits, no credit card, instant access. The scraper, screenshot, and 40+ other API tools are all included.
