Broken links hurt your SEO ranking, frustrate users, and make your site look unmaintained. But how do you actually find them?
Here are five practical approaches, from simple to automated, with real code you can use today.
## 1. Manual Browser Extensions
Browser extensions like "Check My Links" or "Broken Link Checker" highlight dead links on a single page.
**Pros:** Zero setup, visual feedback
**Cons:** One page at a time, no automation, no CI/CD integration
**Best for:** Quick spot-checks on a single page.
## 2. Command-Line Tools (wget)
```bash
# Crawl up to three levels deep without downloading anything,
# then search the log for wget's broken-link summary.
wget --spider --recursive --level=3 \
  --no-verbose --output-file=links.log \
  https://example.com

grep -i "broken" links.log
```
**Pros:** Already installed on most systems, recursive crawling
**Cons:** Noisy output, no structured data, slow on large sites, hard to parse results
**Best for:** One-off checks on small sites.
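wget's exact log wording varies by version, so parsing it is inherently brittle. As a rough stdlib-only sketch (the keyword list below is a guess, not wget's documented log format), you can at least pull the suspicious lines out of the log:

```python
def failing_lines(log_text):
    """Return log lines that look like failures.

    The keywords are an assumption, not a documented wget format;
    adjust them after inspecting your own links.log.
    """
    keywords = ("broken link", "404", "ERROR")
    return [line for line in log_text.splitlines()
            if any(k in line for k in keywords)]

# Usage, after running the wget command above:
#   for line in failing_lines(open("links.log").read()):
#       print(line)
```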
## 3. Python Script (requests + BeautifulSoup)
```python
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

def check_links(url):
    resp = requests.get(url, timeout=10)
    soup = BeautifulSoup(resp.text, 'html.parser')
    broken = []
    for link in soup.find_all('a', href=True):
        # Resolve relative hrefs against the page URL.
        href = urljoin(url, link['href'])
        if not href.startswith('http'):
            continue  # skip mailto:, tel:, javascript:, etc.
        try:
            r = requests.head(href, timeout=5, allow_redirects=True)
            if r.status_code >= 400:
                broken.append({'url': href, 'status': r.status_code})
        except requests.RequestException:
            # Connection failure or timeout — the link is unreachable.
            broken.append({'url': href, 'status': 'error'})
    return broken

results = check_links('https://example.com')
for link in results:
    print(f"BROKEN: {link['url']} ({link['status']})")
```

**Pros:** Customizable, can be extended with logging
**Cons:** Single page only (no crawling or following of internal links), no concurrency, you maintain the code
**Best for:** Developers who want control and don't mind writing/maintaining code.
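One easy upgrade: the script above checks links one at a time, which gets slow on link-heavy pages. Here's a minimal sketch of adding concurrency with `ThreadPoolExecutor` (the `check_urls` helper is my own illustration, not part of the original script):

```python
from concurrent.futures import ThreadPoolExecutor
import requests

def check_urls(urls, max_workers=10):
    """Check many URLs in parallel; return the broken ones in input order."""
    def check(href):
        try:
            r = requests.head(href, timeout=5, allow_redirects=True)
            if r.status_code >= 400:
                return {'url': href, 'status': r.status_code}
        except requests.RequestException:
            return {'url': href, 'status': 'error'}
        return None  # link is fine

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so reports stay deterministic.
        return [result for result in pool.map(check, urls) if result]
```

You could wire this into `check_links` by collecting all the resolved hrefs first, then passing the whole list to `check_urls` instead of checking inside the loop.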
## 4. Dead Link Checker API (One API Call)
Instead of building your own crawler, you can use an API that handles the crawling, link extraction, and status checking for you:
```bash
curl "https://dead-link-checker.p.rapidapi.com/api/deadlinks?url=https://example.com&max_pages=10" \
  -H "x-rapidapi-key: YOUR_KEY" \
  -H "x-rapidapi-host: dead-link-checker.p.rapidapi.com"
```
Response:

```json
{
  "target": "https://example.com",
  "pages_crawled": 10,
  "total_links_checked": 142,
  "working_links": 139,
  "broken_count": 3,
  "broken_links": [
    {
      "url": "https://example.com/old-page",
      "status_code": 404,
      "found_on": "https://example.com/blog",
      "anchor_text": "Read more",
      "link_type": "internal"
    }
  ],
  "summary": {
    "internal_links": 98,
    "external_links": 44,
    "health_score": 97.9
  }
}
```
**Pros:** Multi-page crawling, structured JSON response, health score, internal/external categorization, no infrastructure to maintain
**Cons:** Requires an API key (free tier available)
This is the Dead Link Checker API on RapidAPI — it crawls multiple pages, categorizes links as internal/external, and gives you a health score.
### Node.js Example
```javascript
// Node 18+ (built-in fetch); top-level await requires an ES module.
const response = await fetch(
  'https://dead-link-checker.p.rapidapi.com/api/deadlinks?url=https://example.com&max_pages=10',
  {
    headers: {
      'x-rapidapi-key': process.env.RAPIDAPI_KEY,
      'x-rapidapi-host': 'dead-link-checker.p.rapidapi.com'
    }
  }
);

const data = await response.json();
console.log(`Health score: ${data.summary.health_score}%`);
console.log(`Broken links: ${data.broken_count}`);
data.broken_links.forEach(link => {
  console.log(`  ${link.status_code} — ${link.url} (found on ${link.found_on})`);
});
```
### Python Example
```python
import requests

response = requests.get(
    "https://dead-link-checker.p.rapidapi.com/api/deadlinks",
    params={"url": "https://example.com", "max_pages": 10},
    headers={
        "x-rapidapi-key": "YOUR_KEY",
        "x-rapidapi-host": "dead-link-checker.p.rapidapi.com"
    }
)
data = response.json()

print(f"Health: {data['summary']['health_score']}%")
for link in data["broken_links"]:
    print(f"  {link['status_code']}: {link['url']}")
```
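The structured response also makes reporting trivial. As one example, here's a small stdlib-only sketch (field names taken from the sample response above) that flattens `broken_links` into a CSV you can hand to whoever owns the content:

```python
import csv

def write_report(data, path="broken_links.csv"):
    """Write the API response's broken_links array to a CSV file."""
    fields = ["status_code", "url", "found_on", "anchor_text", "link_type"]
    with open(path, "w", newline="") as f:
        # extrasaction="ignore" tolerates any extra fields the API may add.
        writer = csv.DictWriter(f, fieldnames=fields, extrasaction="ignore")
        writer.writeheader()
        for link in data.get("broken_links", []):
            writer.writerow(link)
```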
## 5. CI/CD Integration
The most powerful approach: check for broken links on every deployment.
```yaml
# .github/workflows/link-check.yml
name: Check Links

on:
  push:
    branches: [main]

jobs:
  links:
    runs-on: ubuntu-latest
    steps:
      - name: Check for broken links
        run: |
          RESULT=$(curl -s "https://dead-link-checker.p.rapidapi.com/api/deadlinks?url=${{ secrets.SITE_URL }}&max_pages=20" \
            -H "x-rapidapi-key: ${{ secrets.RAPIDAPI_KEY }}" \
            -H "x-rapidapi-host: dead-link-checker.p.rapidapi.com")
          BROKEN=$(echo "$RESULT" | jq '.broken_count')
          SCORE=$(echo "$RESULT" | jq '.summary.health_score')
          echo "Health score: ${SCORE}%"
          echo "Broken links: ${BROKEN}"
          if [ "$BROKEN" -gt 0 ]; then
            echo "$RESULT" | jq -r '.broken_links[] | "\(.status_code) \(.url) (on \(.found_on))"'
            exit 1
          fi
```
**Pros:** Catches broken links before users do, automated, blocks deploys with broken links
**Cons:** Needs API key in CI secrets
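If `jq` isn't available on your runner, the same gate works as a short Python step. This is a sketch only, operating on the response shape shown earlier; the 99.0 score threshold is an arbitrary example, not something the API prescribes:

```python
def gate(report, min_score=99.0):
    """Return a CI exit code: 0 when the site is healthy, 1 otherwise."""
    score = report["summary"]["health_score"]
    broken = report["broken_count"]
    print(f"Health score: {score}%  broken links: {broken}")
    for link in report.get("broken_links", []):
        print(f"  {link['status_code']} {link['url']} (on {link['found_on']})")
    return 0 if broken == 0 and score >= min_score else 1

# In CI, pipe the curl output in and exit with the returned code, e.g.:
#   import json, sys
#   sys.exit(gate(json.load(sys.stdin)))
```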
## Which Approach Should You Use?
| Method | Best For | Multi-Page | Automation | Structured Data |
|---|---|---|---|---|
| Browser extension | Quick visual checks | No | No | No |
| wget | One-off CLI checks | Yes | Partial | No |
| Python script | Custom single-page checks | No | Yes | Partial |
| Dead Link API | Production monitoring | Yes | Yes | Yes |
| CI/CD integration | Deployment gates | Yes | Yes | Yes |
For most teams, the API approach (option 4 or 5) gives you the best balance of simplicity and power. You get structured data, multi-page crawling, and health scores — without maintaining your own crawler.
The Dead Link Checker API is free to start with on RapidAPI. PRO and ULTRA tiers available for higher volumes.