Mitu Das

Posted on • Originally published at ccbd.dev

I Built a JavaScript SEO Content Analyzer in an Afternoon. Here's Everything I Learned

I Spent 3 Hours Debugging Why Google Couldn't See My React App

The Lighthouse score was 98. The site felt fast. Users loved it. But Google Search Console was reporting crawl errors, and organic traffic was flatlining. It turned out my beautifully rendered React app was serving a nearly empty HTML shell to bots: no meta description, no canonical, broken Open Graph tags. The content existed, just not when crawlers looked. That's exactly the kind of issue an AI SEO tool would have caught early by simulating how search engines actually render and index your pages.

This article walks you through building a JavaScript SEO content analyzer that audits your pages programmatically so you catch these issues before Google does.

Why JavaScript Apps Break SEO (And Why Linters Won't Save You)

Most SEO problems in JS-heavy apps aren't syntax errors. They're runtime timing issues. Your <title> tag gets injected after the initial HTML is served. A crawler with a short render timeout sees nothing.

The classic mistakes:

  • Meta tags set inside useEffect or lifecycle hooks (too late for some crawlers)
  • Dynamic Open Graph tags that depend on API responses
  • Missing rel="canonical" on paginated or filtered routes
  • robots meta tags accidentally blocking indexing in staging configs that leaked to production
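The first mistake is easy to demonstrate without any framework. Here's a minimal sketch (hypothetical HTML strings, framework-agnostic) of why tags injected after the initial response are invisible to a crawler that doesn't execute JavaScript:

```javascript
// What the server actually sends for a client-rendered app: an empty shell.
const serverHtml = '<html><head></head><body><div id="root"></div></body></html>';

// Simulates what the browser's DOM looks like after a useEffect-style
// hook injects the tags client-side.
function afterHydration(html) {
  return html.replace(
    '<head></head>',
    '<head><title>My Post</title><meta name="description" content="..."></head>'
  );
}

const crawlerSees = serverHtml;              // no-JS crawler: raw HTML only
const userSees = afterHydration(serverHtml); // browser: after JS runs

console.log(crawlerSees.includes('<title>')); // false, nothing to index
console.log(userSees.includes('<title>'));    // true
```

Googlebot does eventually render JavaScript, but the render queue adds delay, and many other crawlers (social preview bots especially) never run JS at all.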

A manual audit catches maybe 30% of these. Automated analysis catches the rest, consistently, on every build.

Step 1: Fetch and Parse Page Metadata Programmatically

Start simple. Here's a Node.js script that fetches a URL and extracts SEO-critical tags:

import * as cheerio from 'cheerio';

async function analyzeSEO(url) {
  const res = await fetch(url);
  if (!res.ok) throw new Error(`Fetch failed with ${res.status} for ${url}`);
  const html = await res.text();
  const $ = cheerio.load(html);

  return {
    title: $('title').text() || null,
    description: $('meta[name="description"]').attr('content') || null,
    canonical: $('link[rel="canonical"]').attr('href') || null,
    robots: $('meta[name="robots"]').attr('content') || null,
    ogTitle: $('meta[property="og:title"]').attr('content') || null,
    ogImage: $('meta[property="og:image"]').attr('content') || null,
    h1Count: $('h1').length,
    h1Text: $('h1').first().text() || null,
  };
}

// Usage
const result = await analyzeSEO('https://yoursite.com/blog/my-post');
console.log(result);

Install cheerio with npm install cheerio. The script uses the global fetch, so you'll need Node 18 or newer.

Result: You now have a structured object representing the SEO state of any page. Run this across your sitemap and you've got a full audit.


Step 2: Score and Flag Issues Automatically

Raw data is useful. Scored data is actionable. Add a simple scoring layer:

function scoreResult(data) {
  const issues = [];
  let score = 100;

  if (!data.title) {
    issues.push({ severity: 'critical', message: 'Missing <title> tag' });
    score -= 30;
  } else if (data.title.length < 30 || data.title.length > 60) {
    issues.push({ severity: 'warning', message: `Title length (${data.title.length} chars); optimal is 30–60` });
    score -= 10;
  }

  if (!data.description) {
    issues.push({ severity: 'critical', message: 'Missing meta description' });
    score -= 20;
  } else if (data.description.length > 160) {
    issues.push({ severity: 'warning', message: 'Meta description exceeds 160 chars, may be truncated' });
    score -= 5;
  }

  if (!data.canonical) {
    issues.push({ severity: 'warning', message: 'No canonical URL specified' });
    score -= 10;
  }

  if (data.robots && data.robots.includes('noindex')) {
    issues.push({ severity: 'critical', message: 'Page is set to noindex and will be excluded from search results!' });
    score -= 40;
  }

  if (data.h1Count === 0) {
    issues.push({ severity: 'warning', message: 'No H1 found on page' });
    score -= 10;
  } else if (data.h1Count > 1) {
    issues.push({ severity: 'info', message: `Multiple H1s found (${data.h1Count}), consider consolidating` });
    score -= 5;
  }

  return { score: Math.max(0, score), issues };
}

Combine both functions and you have a working CLI analyzer. Pipe it into your CI pipeline and fail builds when critical issues are found.
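Here's a minimal sketch of that CI gate. The report shape matches what scoreResult returns per page; the sample data and wiring are hypothetical:

```javascript
// Fail the build if any page in the audit report has a critical issue.
function hasCriticalIssues(report) {
  return report.some(page =>
    page.issues.some(issue => issue.severity === 'critical')
  );
}

// Example report, shaped like { url, ...scoreResult(data) } per page:
const sampleReport = [
  { url: '/', score: 90, issues: [{ severity: 'warning', message: 'No canonical URL specified' }] },
  { url: '/blog', score: 50, issues: [{ severity: 'critical', message: 'Missing <title> tag' }] },
];

// In a real CI script you would set process.exitCode = 1 here
// so the job fails and the deploy never ships the broken page.
console.log(hasCriticalIssues(sampleReport)); // true
```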

Step 3: Scale It Across Your Entire Sitemap

Single-page analysis is fine for debugging. For real coverage, parse your sitemap and batch-analyze every URL:

import { XMLParser } from 'fast-xml-parser';

async function analyzeSitemap(sitemapUrl) {
  const res = await fetch(sitemapUrl);
  const xml = await res.text();
  const parser = new XMLParser();
  const parsed = parser.parse(xml);

  // fast-xml-parser returns a single object (not an array) when the
  // sitemap contains only one <url> entry, so normalize first.
  const entries = Array.isArray(parsed.urlset.url) ? parsed.urlset.url : [parsed.urlset.url];
  const urls = entries.map(u => u.loc);

  const results = await Promise.allSettled(
    urls.map(async (url) => {
      const data = await analyzeSEO(url);
      const scored = scoreResult(data);
      return { url, ...scored, ...data };
    })
  );

  return results
    .filter(r => r.status === 'fulfilled')
    .map(r => r.value)
    .sort((a, b) => a.score - b.score); // worst first
}

// Usage
const report = await analyzeSitemap('https://yoursite.com/sitemap.xml');
console.table(report.map(r => ({ url: r.url, score: r.score, issues: r.issues.length })));

Install: npm install fast-xml-parser

Running this on a 200-page site takes about 15-20 seconds and gives you a prioritized list of pages that need attention, sorted by SEO health score, worst first.
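One caveat: Promise.allSettled above fires every fetch at once, which can hammer your own server on large sitemaps. A small concurrency limiter (plain JS, no dependencies; the limit of 10 is an arbitrary choice) keeps things polite:

```javascript
// Run fn over items with at most `limit` in flight at a time,
// preserving result order.
async function mapWithLimit(items, limit, fn) {
  const results = new Array(items.length);
  let next = 0;
  async function worker() {
    while (next < items.length) {
      const i = next++; // claim the next index before awaiting
      results[i] = await fn(items[i], i);
    }
  }
  await Promise.all(
    Array.from({ length: Math.min(limit, items.length) }, worker)
  );
  return results;
}

// Usage: analyze at most 10 pages at a time.
// const reports = await mapWithLimit(urls, 10, url => analyzeSEO(url));
```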

Step 4: Where a Library Like power-seo Fits In

Once you've built this yourself, you understand what a good SEO analysis library needs to do. I've been experimenting with @power-seo as a drop-in that handles a lot of this boilerplate, including structured data validation, heading hierarchy analysis, and image alt-text checks that the script above doesn't cover.

Here's the same audit using it:

import { analyzePage } from '@power-seo/core';

const report = await analyzePage('https://yoursite.com/blog/my-post');

console.log(report.score);       // 0-100
console.log(report.issues);      // categorized by severity
console.log(report.structured);  // JSON-LD / schema.org validation

It's worth trying if you want to skip the boilerplate. There's also a deeper write-up on auditing modern JS apps over at ccbd.dev/blog/seo-analysis-for-javascript-auditing-and-ranking-modern-web-apps if you want to go further.

What I Learned

  • Lighthouse scores and SEO health are not the same thing. A perfect performance score can coexist with completely broken meta tags.
  • The noindex bug is more common than you think. Staging configs get merged. Always add a CI check for it.
  • Batch sitemap auditing finds the long tail. Your homepage is probably fine. It's page 47 of your blog archive that's broken.
  • Build the tool first, then evaluate libraries. Understanding what cheerio-based parsing actually does makes you a better consumer of any abstraction on top of it.

If you want to try the pre-built approach, here's the repo: https://github.com/CyberCraftBD/power-seo

What's Your SEO Debugging Story?

Have you been burned by a JS rendering issue that killed your search rankings? Drop it in the comments. I'm curious how common the noindex leak is out in the wild. And if you extend this analyzer with keyword density checks or structured data validation, share what you built.
