DEV Community

Алексей Спинов


Broken Link Checker in 50 Lines of Node.js — CheerioCrawler Approach

Broken links hurt both SEO and user experience. Here's a minimal broken link checker built on Crawlee's CheerioCrawler.

The Approach

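```javascript
// Top-level await needs an ES module ("type": "module" in package.json, or a .mjs file).
import { CheerioCrawler } from "crawlee";

const broken = [];
const crawler = new CheerioCrawler({
  // Retry each failed request once before giving up.
  maxRequestRetries: 1,
  async requestHandler({ request, response, enqueueLinks }) {
    // Error pages that Crawlee passes through to the handler (e.g. 404) land here.
    if (response.statusCode >= 400) {
      broken.push({
        url: request.url,
        status: response.statusCode,
        foundOn: request.userData.parentUrl,
      });
      return; // don't follow links on error pages
    }
    // Follow internal links, remembering which page each one came from
    await enqueueLinks({
      strategy: "same-domain",
      userData: { parentUrl: request.url },
    });
  },
  // Runs once retries are used up. Crawlee typically throws on 5xx responses
  // (and, with its session pool, on "blocked" codes like 403/429), so these
  // aren't always connection failures: record the actual error message.
  failedRequestHandler({ request }, error) {
    broken.push({
      url: request.url,
      error: error.message,
      foundOn: request.userData.parentUrl,
    });
  },
});

await crawler.addRequests([{ url: "https://your-site.com" }]);
await crawler.run();
console.log(`Found ${broken.length} broken links`);
console.table(broken);
```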
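A bare count doesn't say which pages need editing. One option is to group the `broken` entries by the page that linked to them. The sketch below is a hypothetical helper (`summarizeByPage` is not part of Crawlee), run on made-up sample entries shaped like the ones the crawler pushes; entries with no `foundOn` are filed under `(unknown page)`.

```javascript
// Hypothetical helper: group broken-link entries (shaped like the crawler's
// `broken` array) by the page that linked to them.
function summarizeByPage(entries) {
  const byPage = new Map();
  for (const entry of entries) {
    // The start URL and some failed requests carry no referring page.
    const page = entry.foundOn ?? "(unknown page)";
    if (!byPage.has(page)) byPage.set(page, []);
    // Prefer the HTTP status; fall back to the recorded error message.
    byPage.get(page).push(`${entry.url} (${entry.status ?? entry.error})`);
  }
  return byPage;
}

// Sample entries for illustration only (not real crawl output).
const sample = [
  { url: "https://your-site.com/old-page", status: 404, foundOn: "https://your-site.com/" },
  { url: "https://your-site.com/gone", status: 410, foundOn: "https://your-site.com/blog" },
  { url: "https://your-site.com/down", error: "Connection failed" },
];

// Print each page followed by the broken links found on it.
for (const [page, links] of summarizeByPage(sample)) {
  console.log(page);
  for (const link of links) console.log(`  -> ${link}`);
}
```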

Why CheerioCrawler?

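  • Roughly 50MB of RAM vs. about 500MB for Playwright
  • Around 10x faster than full-browser crawlers, because it sends plain HTTP requests and parses the HTML with Cheerio instead of launching a browser
  • No JS rendering needed for link checking on server-rendered pages (links injected by client-side JavaScript won't be seen)
  • Built-in retry, concurrency, and request queue (with URL de-duplication)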
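The request queue is what keeps the crawl from checking the same URL twice. Here's a simplified, dependency-free model of that behavior (not Crawlee's actual implementation): a FIFO queue plus a `seen` set over a hypothetical in-memory `site` object that stands in for real HTTP responses, with unknown URLs treated as 404.

```javascript
// Simplified model of the crawl: a FIFO queue plus a "seen" set, so each URL
// is checked once. `site` is a hypothetical stand-in for HTTP responses:
// URL -> { status, links }. URLs missing from it are treated as 404.
function findBrokenLinks(site, startUrl) {
  const broken = [];
  const seen = new Set([startUrl]);
  const queue = [{ url: startUrl, parentUrl: undefined }];

  while (queue.length > 0) {
    // shift() is O(n), which is fine for a small sketch like this.
    const { url, parentUrl } = queue.shift();
    const page = site[url] ?? { status: 404, links: [] };

    if (page.status >= 400) {
      broken.push({ url, status: page.status, foundOn: parentUrl });
      continue; // don't follow links on error pages
    }
    for (const link of page.links) {
      if (seen.has(link)) continue; // de-duplication: first referrer wins
      seen.add(link);
      queue.push({ url: link, parentUrl: url });
    }
  }
  return broken;
}

const site = {
  "https://example.com/": { status: 200, links: ["https://example.com/a", "https://example.com/missing"] },
  "https://example.com/a": { status: 200, links: ["https://example.com/missing", "https://example.com/"] },
};

console.log(findBrokenLinks(site, "https://example.com/"));
```

Here `/missing` is linked from two pages but reported once, attributed to the homepage that queued it first. The Crawlee version behaves similarly: a URL that is already queued isn't added again, so `parentUrl` reflects whichever page enqueued it first.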

I built a fuller version with internal and external link checking, depth control, and a summary. It's free on Apify (search for knotless_cadence broken-links).
