I spent an entire afternoon wondering why my client's Next.js portfolio wasn't showing up on Google. Clean code, fast load times, good content. Turned out the meta tags were rendering after the crawler gave up. One audit. That's all it took to find it. If you're shipping sites without running a structured audit first, you're guessing. This article walks through how to audit a website properly using free tools, including how to build a lightweight audit script yourself, so you stop guessing and start shipping with confidence.
Why Most Developers Skip Audits (And Pay for It Later)
Auditing feels like QA work: unglamorous, easy to defer. But consider: Google's crawler doesn't execute JavaScript the same way your browser does. Lighthouse scores don't catch missing canonical tags. And broken internal links are practically invisible until a real user hits a dead end.
The good news is that free website audit tools are genuinely powerful now, and you don't have to leave your terminal to use them.
Here's what a minimal but effective audit should check:
- Meta tags (title, description, og:image, canonical)
- Robots.txt and sitemap validity
- Broken links (internal + external)
- Core Web Vitals baseline
- Structured data / JSON-LD presence
Let's build this layer by layer.
Step 1: Scrape and Validate Meta Tags with Node.js
The fastest free audit you can run is checking whether your pages are even telling Google what they're about.
// audit-meta.js
import * as cheerio from 'cheerio';
import fetch from 'node-fetch';

const url = 'https://your-site.com';

async function auditMeta(pageUrl) {
  const res = await fetch(pageUrl);
  const html = await res.text();
  const $ = cheerio.load(html);

  const results = {
    title: $('title').text() || '❌ Missing',
    description: $('meta[name="description"]').attr('content') || '❌ Missing',
    canonical: $('link[rel="canonical"]').attr('href') || '❌ Missing',
    ogTitle: $('meta[property="og:title"]').attr('content') || '❌ Missing',
    ogImage: $('meta[property="og:image"]').attr('content') || '❌ Missing',
  };

  console.table(results);

  const missing = Object.entries(results)
    .filter(([_, v]) => v.startsWith('❌'))
    .map(([k]) => k);

  if (missing.length) {
    console.warn(`\n⚠️ Missing fields: ${missing.join(', ')}`);
  } else {
    console.log('\n✅ All meta tags present.');
  }
}

auditMeta(url);
Run it:
npm install cheerio node-fetch
node audit-meta.js
Result: A clean table printed in your terminal showing exactly what's missing. No browser extensions, no paid dashboards. If canonical is missing on a site with query parameters, that alone could be causing indexing issues.
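The checklist above also calls for structured data, and the same cheerio approach covers it. Here's a minimal sketch that only flags whether any JSON-LD blocks exist and which schema.org types they declare; the file name and the auditStructuredData helper are just illustrative, and this doesn't validate the schema itself:
// audit-jsonld.js: does the page declare any structured data at all?
import * as cheerio from 'cheerio';
import fetch from 'node-fetch';

async function auditStructuredData(pageUrl) {
  const html = await (await fetch(pageUrl)).text();
  const $ = cheerio.load(html);

  // Collect every JSON-LD block and try to parse it
  const blocks = $('script[type="application/ld+json"]')
    .map((_, el) => {
      try {
        return JSON.parse($(el).html() || '');
      } catch {
        return { '@type': '⚠️ INVALID JSON' };
      }
    })
    .get();

  if (blocks.length === 0) {
    console.warn('❌ No JSON-LD found');
    return;
  }
  // Report declared schema.org types (Article, Product, FAQPage, ...)
  console.log('✅ JSON-LD types:', blocks.map((b) => b['@type']).flat().join(', '));
}

auditStructuredData('https://your-site.com');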
Step 2: Validate Robots.txt and Sitemap Programmatically
A misconfigured robots.txt can silently block your entire site from being indexed, and it happens more often than you'd think, usually after a deployment pipeline misconfiguration.
// audit-crawlability.js
import fetch from 'node-fetch';

const BASE = 'https://your-site.com';

async function checkFile(path) {
  const url = `${BASE}${path}`;
  try {
    const res = await fetch(url);
    if (!res.ok) return `❌ ${url} returned ${res.status}`;
    const text = await res.text();
    // A line that is exactly "Disallow: /" blocks the whole site;
    // the multiline regex avoids false positives like "Disallow: /admin"
    if (path === '/robots.txt' && /^\s*Disallow:\s*\/\s*$/m.test(text)) {
      return `⚠️ robots.txt may be blocking everything:\n${text.slice(0, 200)}`;
    }
    return `✅ ${url} is accessible (${text.length} bytes)`;
  } catch (e) {
    return `❌ ${url} unreachable: ${e.message}`;
  }
}

(async () => {
  console.log(await checkFile('/robots.txt'));
  console.log(await checkFile('/sitemap.xml'));
})();
Result: You'll immediately see if your sitemap is returning a 404 (common after CMS migrations) or if robots.txt contains a blanket Disallow: /, a mistake that's devastatingly easy to make and hard to notice.
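If the sitemap does resolve, it's worth going one step further and spot-checking that the URLs listed inside it still return 200, since stale entries are common after migrations. A rough sketch, using a naive regex for <loc> tags instead of a real XML parser (fine for a flat sitemap, not a sitemap index; the file name is just illustrative):
// audit-sitemap-urls.js: spot-check that sitemap entries still resolve
import fetch from 'node-fetch';

const SITEMAP = 'https://your-site.com/sitemap.xml';

(async () => {
  const xml = await (await fetch(SITEMAP)).text();
  // Naive <loc> extraction, good enough for a flat sitemap
  const urls = [...xml.matchAll(/<loc>(.*?)<\/loc>/g)].map((m) => m[1]);
  console.log(`Found ${urls.length} URLs, checking the first 10...`);

  for (const url of urls.slice(0, 10)) {
    // Swap HEAD for GET if your server rejects HEAD requests
    const res = await fetch(url, { method: 'HEAD' });
    console.log(`${res.ok ? '✅' : '❌ ' + res.status} ${url}`);
  }
})();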
Step 3: Crawl for Broken Internal Links
Broken internal links hurt both UX and crawl budget. Here's a lightweight crawler that stays within your own domain:
// audit-links.js
import fetch from 'node-fetch';
import * as cheerio from 'cheerio';

const BASE = 'https://your-site.com';
const visited = new Set();
const broken = [];

async function crawl(url) {
  if (visited.has(url)) return;
  visited.add(url);
  try {
    const res = await fetch(url);
    if (!res.ok) {
      broken.push({ url, status: res.status });
      return;
    }
    const html = await res.text();
    const $ = cheerio.load(html);
    const links = [];
    $('a[href]').each((_, el) => {
      // Strip fragments so /about and /about#team count as one page
      const href = $(el).attr('href').split('#')[0];
      if (!href) return;
      // Skip protocol-relative URLs (//other-domain.com) so we stay on our own domain
      if (href.startsWith('/') && !href.startsWith('//')) links.push(BASE + href);
      else if (href.startsWith(BASE)) links.push(href);
    });
    for (const link of links) {
      await crawl(link);
    }
  } catch (e) {
    broken.push({ url, status: 'NETWORK_ERROR' });
  }
}

(async () => {
  await crawl(BASE);
  if (broken.length === 0) {
    console.log('✅ No broken links found.');
  } else {
    console.log('❌ Broken links:');
    console.table(broken);
  }
})();
For large sites, cap this with a depth limit or a visited.size < 50 guard; a minimal sketch of both is below. For production-scale crawling, this is where a dedicated tool helps.
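A minimal version of that guard, keeping the rest of the crawl function exactly as written above; MAX_PAGES and the depth cap are arbitrary values you'd tune per site:
// In audit-links.js
const MAX_PAGES = 50;
const MAX_DEPTH = 3;

async function crawl(url, depth = 0) {
  if (visited.has(url) || visited.size >= MAX_PAGES || depth > MAX_DEPTH) return;
  visited.add(url);
  // ...same fetch/parse logic as above, but recurse with the depth incremented:
  // for (const link of links) await crawl(link, depth + 1);
}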
Step 4: Where power-seo Fits In
Once you've validated your meta tags, crawlability, and link health manually, you might want to automate this into a CI check or run it across multiple URLs at once. The power-seo npm package bundles many of these checks into a single CLI call, useful when you're auditing client sites regularly or want a structured JSON report you can diff between deployments.
npx power-seo audit https://your-site.com --output report.json
It won't replace understanding what you're looking for (which the steps above give you), but it's a solid time-saver once you know your audit checklist. The CyberCraft team also wrote a deeper dive on free JavaScript SEO audit approaches that's worth reading alongside this one.
What I Learned
- Audit before you launch, not after you notice a traffic drop. Most SEO issues are invisible until they've already cost you weeks of ranking.
- Robots.txt and canonical tags cause more invisible damage than any other two things combined. Check them first, every time.
- You don't need a $99/month SaaS to do a solid audit. Node.js, cheerio, and 100 lines of code will catch 80% of real-world issues.
- Automate it. A pre-deploy audit script in your CI pipeline pays for itself the first time it catches a misconfiguration before it goes live (see the sketch right after this list).
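To illustrate that last point, here's one way (certainly not the only one) to turn the Step 1 checks into a hard gate: the script exits non-zero when something is missing, which is all most CI systems need to fail a build. The file name, URL list, and choice of blocking tags are assumptions to adapt:
// audit-gate.js: exits non-zero so a CI pipeline fails the build
import * as cheerio from 'cheerio';
import fetch from 'node-fetch';

// Pages you consider launch-blocking if their meta tags are missing
const URLS = ['https://your-site.com', 'https://your-site.com/about'];

(async () => {
  let failed = false;

  for (const url of URLS) {
    const $ = cheerio.load(await (await fetch(url)).text());
    const checks = {
      title: $('title').text(),
      description: $('meta[name="description"]').attr('content'),
      canonical: $('link[rel="canonical"]').attr('href'),
    };
    const missing = Object.entries(checks)
      .filter(([, v]) => !v)
      .map(([k]) => k);

    if (missing.length) {
      console.error(`❌ ${url} is missing: ${missing.join(', ')}`);
      failed = true;
    } else {
      console.log(`✅ ${url} passed`);
    }
  }

  process.exit(failed ? 1 : 0);
})();
Run it as node audit-gate.js right before your deploy step and the build stops before a missing canonical ever ships.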
If you want to try this approach, here's the repo: https://github.com/CyberCraftBD/power-seo
What's Your Audit Workflow?
Do you run any automated checks before deploying? Have you been burned by a meta tag issue or a blocked robots.txt? Drop your war story in the comments, and if you found the scripts above useful, a ❤️ helps other developers find this. What would you add to this audit checklist?