Broken links hurt SEO and user experience. Here's a minimal broken-link checker built with Crawlee's CheerioCrawler.
## The Approach
```js
import { CheerioCrawler } from "crawlee";

const broken = [];

const crawler = new CheerioCrawler({
  maxRequestRetries: 1,
  async requestHandler({ request, response, enqueueLinks }) {
    // Record pages that respond with an HTTP error status
    if (response.statusCode >= 400) {
      broken.push({
        url: request.url,
        status: response.statusCode,
        foundOn: request.userData.parentUrl,
      });
      return;
    }
    // Follow internal links, remembering which page linked to them
    await enqueueLinks({
      strategy: "same-domain",
      userData: { parentUrl: request.url },
    });
  },
  // Requests that exhaust their retries (DNS failures, timeouts, etc.)
  failedRequestHandler({ request }) {
    broken.push({ url: request.url, error: "Connection failed" });
  },
});

await crawler.addRequests([{ url: "https://your-site.com" }]);
await crawler.run();

console.log(`Found ${broken.length} broken links`);
```
## Why CheerioCrawler?
- 50MB RAM vs 500MB for Playwright
- 10x faster than full-browser crawlers
- No JS rendering needed for link checking
- Built-in retry, concurrency, and request queue
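The retry and concurrency behavior is configurable through the crawler's options. A sketch of the knobs I'd reach for on a larger site (option names come from Crawlee's CheerioCrawler options; the values are illustrative, not recommendations):

```js
const crawler = new CheerioCrawler({
  maxConcurrency: 20,            // upper bound on parallel requests
  maxRequestRetries: 1,          // retry flaky responses once
  maxRequestsPerCrawl: 5000,     // safety cap for very large sites
  requestHandlerTimeoutSecs: 30, // give up on slow pages
  // ...plus the requestHandler / failedRequestHandler shown earlier
});
```

Crawlee auto-scales concurrency below `maxConcurrency` based on system load, so the cap is a ceiling rather than a fixed worker count.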
I built a full version with internal/external link checking, depth control, and summary — free on Apify (search knotless_cadence broken-links).