Most websites have hidden JSON APIs that return cleaner data than their HTML pages. I built a simple Node.js script that discovers them automatically.
## The Problem
Traditional web scraping is fragile:
- CSS selectors break when sites redesign
- HTML parsing is slow and error-prone
- Anti-bot systems block headless browsers
But many sites expose internal JSON APIs that are faster, more stable, and return structured data.
## The Discovery Script
```javascript
// Uses the built-in fetch API, so this needs Node.js 18+ (no https module required).
async function findAPIs(domain) {
  const commonPaths = [
    "/api/v1",
    "/api/v2",
    "/api/graphql",
    "/_next/data",
    "/wp-json/wp/v2/posts",
    "/feed.json",
    "/sitemap.xml",
    "/.well-known/openid-configuration",
    "/robots.txt",
    "/manifest.json"
  ];

  const results = [];
  for (const path of commonPaths) {
    try {
      const res = await fetch(`https://${domain}${path}`);
      if (res.ok) {
        const contentType = res.headers.get("content-type") || "";
        results.push({
          path,
          status: res.status,
          type: contentType.split(";")[0],
          size: res.headers.get("content-length") || "unknown"
        });
      }
    } catch (e) {
      // Skip unreachable paths (DNS failures, TLS errors, timeouts)
    }
  }
  return results;
}

// Usage
findAPIs("dev.to").then(apis => {
  console.log(`Found ${apis.length} endpoints:\n`);
  apis.forEach(api => {
    console.log(`  ${api.path} → ${api.type} (${api.size} bytes)`);
  });
});
```
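The loop above probes each path one at a time, so a long candidate list runs slowly. Here's a concurrent variant with a per-request timeout; this is a sketch under my own helper names (`probe`, `looksLikeAPI`, `findAPIsConcurrently`), assuming Node.js 18+ for the built-in `fetch` and `AbortSignal.timeout`:

```javascript
const CANDIDATE_PATHS = ["/api/v1", "/api/v2", "/feed.json", "/robots.txt"];

function looksLikeAPI(contentType) {
  // Treat JSON (and JSON-like, e.g. application/problem+json) responses
  // as probable API endpoints.
  const type = (contentType || "").split(";")[0].trim();
  return type === "application/json" || type.endsWith("+json");
}

async function probe(domain, path) {
  try {
    const res = await fetch(`https://${domain}${path}`, {
      signal: AbortSignal.timeout(5000) // don't hang on slow hosts
    });
    if (!res.ok) return null;
    const type = (res.headers.get("content-type") || "").split(";")[0];
    return { path, status: res.status, type, api: looksLikeAPI(type) };
  } catch {
    return null; // DNS failures, TLS errors, timeouts
  }
}

async function findAPIsConcurrently(domain, paths = CANDIDATE_PATHS) {
  // Fire all probes at once; failed ones resolve to null and get filtered out.
  const settled = await Promise.all(paths.map((p) => probe(domain, p)));
  return settled.filter(Boolean);
}

// Usage:
// findAPIsConcurrently("dev.to").then(console.log);
```

With this shape the total run takes roughly as long as the slowest single request, rather than the sum of all of them.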
## Real Examples I Found

| Website | Hidden API | What It Returns |
|---|---|---|
| dev.to | `/api/articles` | Full article data as JSON |
| npm | `registry.npmjs.org/{pkg}` | Complete package metadata |
| PyPI | `/pypi/{pkg}/json` | Package info, versions, downloads |
| Wikipedia | `/api/rest_v1/page/summary/{title}` | Clean article summaries |
| GitHub | `api.github.com/repos/{owner}/{repo}` | Repository data |
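Each of these takes only a few lines to use. As one example, here's a minimal sketch against the PyPI endpoint from the table (the `pypiUrl`/`pypiInfo` helper names are mine; assumes Node.js 18+ for the built-in `fetch`):

```javascript
// Build the PyPI JSON endpoint for a package (row from the table above).
function pypiUrl(pkg) {
  return `https://pypi.org/pypi/${encodeURIComponent(pkg)}/json`;
}

// Fetch package metadata and pull out a few useful fields.
async function pypiInfo(pkg) {
  const res = await fetch(pypiUrl(pkg));
  if (!res.ok) throw new Error(`PyPI returned ${res.status} for ${pkg}`);
  const data = await res.json();
  return {
    name: data.info.name,
    version: data.info.version, // latest release
    summary: data.info.summary
  };
}

// Usage:
// pypiInfo("requests").then(console.log);
```

No browser, no HTML parsing, no selectors: just one GET and a few property lookups.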
## Why This Matters

For one of my projects, switching from HTML scraping to the hidden JSON API:

- Reduced the scraper from 120 lines to 15
- Made it roughly 10x faster (no headless browser needed)
- Eliminated breakage, since JSON APIs rarely change structure
## Pro Tips

- Check the Network tab in DevTools; filter by XHR/Fetch to see which APIs the frontend calls
- Look for `/api/`, `/_next/data/`, and `/wp-json/`; these are the most common patterns
- Check `robots.txt`; it often reveals API paths the site doesn't want crawled (but which are technically public)
- Try adding `.json` to any URL; many frameworks return JSON when you do
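The `robots.txt` tip is easy to automate, since `Disallow` and `Allow` lines frequently point straight at API prefixes. A small sketch (the `apiPathsFromRobots` helper name is my own):

```javascript
// Mine robots.txt text for API-looking paths.
// Pure string parsing, so it works on any robots.txt you've fetched.
function apiPathsFromRobots(robotsTxt) {
  const paths = [];
  for (const line of robotsTxt.split("\n")) {
    // Match both "Disallow:" and "Allow:" rules and capture the path.
    const match = line.match(/^\s*(?:Dis)?allow:\s*(\S+)/i);
    if (match && /api|json|graphql/i.test(match[1])) {
      paths.push(match[1]);
    }
  }
  return [...new Set(paths)]; // dedupe while preserving order
}

// Example:
const sample = [
  "User-agent: *",
  "Disallow: /admin/",
  "Disallow: /api/internal/",
  "Allow: /api/public/"
].join("\n");
console.log(apiPathsFromRobots(sample)); // → ["/api/internal/", "/api/public/"]
```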
## Want More?
I maintain a curated list of 60+ free APIs and scraping tools that never break. Star it if you find it useful!
Have you found any interesting hidden APIs? Share in the comments!