Dead links degrade user experience and hurt your SEO: search engines treat pages full of broken outbound links as poorly maintained, and users bounce when they hit 404s.
Most link-checking tools are either expensive SaaS products or slow crawlers that hammer servers. In this tutorial, we'll build a fast broken link checker using JavaScript and a free web scraping API: no Puppeteer, no headless browsers, no dependencies.
What We're Building
A CLI tool that:
- Scrapes all links from any webpage
- Tests each link for HTTP errors (404, 500, timeouts)
- Takes screenshots of broken destinations (proof for reports)
- Outputs a clean report with actionable results
Prerequisites
- Node.js 18+
- A free API key from Frostbyte (200 free credits, no card required)
Step 1: Extract All Links from a Page
First, let's use the web scraper API to pull every link from a target page:
const API_KEY = process.env.FROSTBYTE_KEY || 'your-api-key';
const BASE = 'https://api.frostbyte.dev';
async function extractLinks(url) {
const res = await fetch(`${BASE}/v1/scraper/scrape`, {
method: 'POST',
headers: {
'Content-Type': 'application/json',
'X-API-Key': API_KEY
},
body: JSON.stringify({ url, extract: ['links'] })
});
const data = await res.json();
if (!data.success) throw new Error(data.error || 'Scrape failed');
// Filter to HTTP/HTTPS links only, deduplicate
const links = [...new Set(
(data.data?.links || [])
.filter(link => link.startsWith('http'))
)];
console.log(`Found ${links.length} unique links on ${url}`);
return links;
}
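The extraction above keeps any `http(s)` link as-is, so `#fragment` variants of the same page get checked separately. A small optional refinement (a sketch; `normalizeLinks` is our own helper name, not part of the API) strips fragments before deduplicating:

```javascript
// Strip URL fragments and non-HTTP schemes, then deduplicate.
// #section anchors point at the same document, so checking them
// separately just wastes requests.
function normalizeLinks(rawLinks) {
  const cleaned = rawLinks
    .filter(link => /^https?:\/\//i.test(link))
    .map(link => {
      const u = new URL(link);
      u.hash = ''; // drop the fragment
      return u.toString();
    });
  return [...new Set(cleaned)];
}
```

Call it on `data.data?.links || []` in place of the inline filter if you want this behavior.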
Step 2: Check Each Link
Now we test each link with a HEAD request (fast, minimal bandwidth). If HEAD fails, we fall back to GET:
async function checkLink(url, timeout = 10000) {
const controller = new AbortController();
const timer = setTimeout(() => controller.abort(), timeout);
try {
// Try HEAD first (faster)
let res = await fetch(url, {
method: 'HEAD',
signal: controller.signal,
redirect: 'follow'
});
// Some servers reject HEAD, so fall back to GET
if (res.status === 405 || res.status === 403) {
res = await fetch(url, {
method: 'GET',
signal: controller.signal,
redirect: 'follow'
});
}
clearTimeout(timer);
return {
url,
status: res.status,
ok: res.ok,
redirected: res.redirected,
finalUrl: res.url
};
} catch (err) {
clearTimeout(timer);
return {
url,
status: 0,
ok: false,
error: err.name === 'AbortError' ? 'TIMEOUT' : err.message
};
}
}
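One caveat: a single failed request isn't proof a link is dead. DNS hiccups, rate limiting (429), and transient 5xx errors all produce false positives. A hedged wrapper around `checkLink` from above can retry once before reporting a failure (the retryable status list and delay here are judgment calls, not part of any API):

```javascript
// Statuses worth retrying: 0 is our own "network error" marker from
// checkLink's catch block; the rest are commonly transient.
const RETRYABLE = new Set([0, 429, 500, 502, 503, 504]);

// Retry a failed check once after a short delay. The `check` parameter
// defaults to the checkLink function defined above.
async function checkLinkWithRetry(url, check = checkLink, delayMs = 2000) {
  const first = await check(url);
  if (first.ok || !RETRYABLE.has(first.status)) return first;
  await new Promise(resolve => setTimeout(resolve, delayMs));
  return check(url);
}
```

Swap it in wherever `checkLink` is called if false positives become a problem.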
Step 3: Screenshot Broken Pages
When we find a broken link, let's capture what users actually see; screenshots are invaluable for reports:
async function screenshotUrl(url) {
const res = await fetch(`${BASE}/v1/screenshot/take`, {
method: 'POST',
headers: {
'Content-Type': 'application/json',
'X-API-Key': API_KEY
},
body: JSON.stringify({
url,
width: 1280,
height: 720,
format: 'png'
})
});
const data = await res.json();
return data.success ? data.data?.screenshot_url : null;
}
Step 4: Put It All Together
Here's the complete broken link checker:
async function checkPage(targetUrl) {
console.log(`\n🔍 Scanning: ${targetUrl}\n`);
// 1. Extract all links
const links = await extractLinks(targetUrl);
// 2. Check links in parallel batches (5 at a time to be polite)
const results = [];
const batchSize = 5;
for (let i = 0; i < links.length; i += batchSize) {
const batch = links.slice(i, i + batchSize);
// Wrap checkLink so map's index argument isn't passed as the timeout
const checked = await Promise.all(batch.map(link => checkLink(link)));
results.push(...checked);
// Progress indicator
const done = Math.min(i + batchSize, links.length);
process.stdout.write(`\rChecked ${done}/${links.length} links`);
}
console.log('\n');
// 3. Categorize results
const broken = results.filter(r => !r.ok);
const redirected = results.filter(r => r.ok && r.redirected);
const healthy = results.filter(r => r.ok && !r.redirected);
// 4. Report
console.log(`✅ Healthy: ${healthy.length}`);
console.log(`↪️ Redirected: ${redirected.length}`);
console.log(`❌ Broken: ${broken.length}\n`);
if (broken.length > 0) {
console.log('--- BROKEN LINKS ---\n');
for (const link of broken) {
const status = link.error || `HTTP ${link.status}`;
console.log(` ${status}: ${link.url}`);
// Screenshot the broken page
const screenshot = await screenshotUrl(link.url);
if (screenshot) {
console.log(`    📸 ${screenshot}\n`);
}
}
}
if (redirected.length > 0) {
console.log('\n--- REDIRECTS ---\n');
for (const link of redirected) {
console.log(` ${link.url}`);
console.log(`    → ${link.finalUrl}\n`);
}
}
return { healthy, broken, redirected };
}
// Run it
const url = process.argv[2] || 'https://example.com';
checkPage(url).catch(console.error);
Run it:
export FROSTBYTE_KEY=your-api-key
node link-checker.js https://your-site.com/blog
Sample Output
🔍 Scanning: https://your-site.com/blog
Found 47 unique links on https://your-site.com/blog
Checked 47/47 links
✅ Healthy: 41
↪️ Redirected: 3
❌ Broken: 3
--- BROKEN LINKS ---
HTTP 404: https://old-service.com/deprecated-page
📸 https://screenshots.frostbyte.dev/abc123.png
TIMEOUT: https://dead-domain.xyz
📸 https://screenshots.frostbyte.dev/def456.png
HTTP 500: https://api.broken-service.io/docs
📸 https://screenshots.frostbyte.dev/ghi789.png
--- REDIRECTS ---
https://http-site.com/page
→ https://https-site.com/page
Making It Production-Ready
Scan an Entire Sitemap
Check every page on your site by parsing the sitemap:
async function checkSitemap(sitemapUrl) {
const res = await fetch(sitemapUrl);
const xml = await res.text();
// Extract URLs from sitemap XML
const urls = [...xml.matchAll(/<loc>(.+?)<\/loc>/g)]
.map(m => m[1]);
console.log(`Found ${urls.length} pages in sitemap\n`);
let totalBroken = 0;
for (const url of urls) {
const { broken } = await checkPage(url);
totalBroken += broken.length;
}
console.log(`\n=== TOTAL BROKEN LINKS: ${totalBroken} ===`);
}
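One wrinkle: `checkSitemap` assumes a flat `<urlset>`, but larger sites often publish a sitemap *index* whose `<loc>` entries point at child sitemaps rather than pages. A hedged helper (our own name; real sitemaps may also be gzipped, which this sketch ignores) tells the two apart so you can recurse:

```javascript
// Split a sitemap XML document into child sitemaps vs page URLs.
// A <sitemapindex> root means every <loc> is another sitemap to fetch;
// a <urlset> root means every <loc> is a page to check.
function parseSitemap(xml) {
  const locs = [...xml.matchAll(/<loc>\s*(.+?)\s*<\/loc>/g)].map(m => m[1]);
  const isIndex = /<sitemapindex[\s>]/.test(xml);
  return isIndex
    ? { sitemaps: locs, pages: [] }
    : { sitemaps: [], pages: locs };
}
```

In `checkSitemap`, fetch any `sitemaps` entries recursively before checking `pages`.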
Schedule Weekly Checks
Run it on a cron schedule with PM2:
# ecosystem.config.js
module.exports = {
apps: [{
name: 'link-checker',
script: 'link-checker.js',
args: 'https://your-site.com/sitemap.xml',
cron_restart: '0 9 * * 1', // Every Monday 9am
autorestart: false
}]
};
JSON Output for CI/CD
Add a --json flag to integrate into your build pipeline. This snippet assumes it runs where results, healthy, broken, and redirected are in scope (e.g. at the end of checkPage):
if (process.argv.includes('--json')) {
const report = {
scanned: url,
timestamp: new Date().toISOString(),
total: results.length,
healthy: healthy.length,
broken: broken.map(b => ({ url: b.url, status: b.status, error: b.error })),
redirects: redirected.map(r => ({ from: r.url, to: r.finalUrl }))
};
console.log(JSON.stringify(report, null, 2));
}
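Since the snippet reads arrays computed inside `checkPage`, factoring the report into a pure function makes it unit-testable without any network access (a sketch; `buildReport` is our own name, but the output shape matches the report above):

```javascript
// Build the CI report from raw check results. Pure function: no
// network, no globals, so it can be tested with fixture data.
function buildReport(scannedUrl, results) {
  const broken = results.filter(r => !r.ok);
  const redirected = results.filter(r => r.ok && r.redirected);
  const healthy = results.filter(r => r.ok && !r.redirected);
  return {
    scanned: scannedUrl,
    timestamp: new Date().toISOString(),
    total: results.length,
    healthy: healthy.length,
    broken: broken.map(b => ({ url: b.url, status: b.status, error: b.error })),
    redirects: redirected.map(r => ({ from: r.url, to: r.finalUrl }))
  };
}
```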
Use it in GitHub Actions to catch broken links before they go live:
- name: Check for broken links
run: |
node link-checker.js https://staging.your-site.com --json > report.json
BROKEN=$(jq '.broken | length' report.json)
if [ "$BROKEN" -gt "0" ]; then
echo "Found $BROKEN broken links!"
jq '.broken' report.json
exit 1
fi
API Credits Used
| Operation | Credits each | Typical per page |
|---|---|---|
| Scrape page for links | 1 | 1 |
| Screenshot (broken links only) | 1 | ~3 |
| Total for a 50-link page | | ~4 credits |
With 200 free credits, you can check ~50 pages (2,500 links) without paying anything.
Why Not Just Use curl?
You could curl every link manually, but:
- JavaScript-rendered pages won't return proper status codes via curl. The scraper API renders pages in a real browser.
- Screenshot proof makes reports actionable: stakeholders can see exactly what's broken.
- Link extraction from JavaScript-heavy SPAs requires a real browser engine, which the scraper handles.
What's Next?
- Add Slack/Discord notifications when broken links are found
- Track historical data to catch links as they break
- Build a web dashboard with trend charts
Get your free API key at frostbyte.dev: 200 credits, no credit card, instant access. The scraper, screenshot, and 40+ other API tools are all included.