DEV Community

Mitu Das

Posted on • Originally published at ccbd.dev

React App is Invisible to Google: How to Fix It With Open-Source SEO Tools

I spent three hours debugging why my freshly deployed React app wasn't showing up in Google Search Console. No crawl errors. No manual penalties. Just... nothing indexed. Turns out, Googlebot was fetching an empty <div id="root"></div> and giving up before the client-side render ever happened. The fix was four lines of Next.js config, but I only found it because I stopped trusting dashboards and started reading raw HTTP responses myself.

If you've ever shipped a JS-heavy app and watched your organic traffic flatline, this article is for you. We'll build a real, scriptable SEO audit pipeline using open-source tools. No $299/month Ahrefs subscription required.

Why Paid SEO Platforms Hide the Details That Actually Matter

Commercial SEO tools are optimized for marketers, not engineers. They surface scores and suggestions, but they abstract away the raw signals that tell you why something is broken. You get a "3/10 (Low)" on Technical SEO with a recommendation to "improve page speed" and zero stack trace.

Open-source tools give you the opposite: raw data, full programmatic control, and the ability to integrate audits into your CI pipeline. Let's look at three areas where this matters most.

1. Auditing Rendered HTML vs. Server Response

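The single biggest SEO trap for React/Vue/Angular apps is the gap between what your browser sees and what Googlebot fetches. Your browser executes JavaScript immediately. Googlebot's first pass only fetches the raw HTML. Rendering happens later, in a separate queue, and Google documents no guarantee on how long it waits for your content to appear (the ~5-second figure you'll often see quoted is a rule of thumb, not an official limit).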

The tool: lighthouse (CLI version, npm i -g lighthouse)

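# What Googlebot's first fetch receives (raw HTML, no JS execution)
# -s hides the progress meter, -L follows redirects
curl -sL -A "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" https://yoursite.com \
  | grep -iE '<title>|<meta name="description"|<h1'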

# What a real browser renders
lighthouse https://yoursite.com \
  --only-categories=seo \
  --output=json \
  --output-path=./seo-report.json \
  --chrome-flags="--headless"

Then pull the failing audits out of the Lighthouse report so you can compare them with the curl output:

// parse-seo-report.js
const fs = require("fs");
const report = JSON.parse(fs.readFileSync("./seo-report.json", "utf8"));

const audits = report.categories.seo.auditRefs
  .map((ref) => report.audits[ref.id])
  .filter((a) => a.score !== null && a.score < 1);

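audits.forEach((a) => {
  console.log(`❌ [${a.score}] ${a.title}: ${a.description}`);
});

// Exit non-zero so a CI job fails when any SEO audit fails
if (audits.length > 0) process.exitCode = 1;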

Run this in CI on every deploy. If your title tag is missing from the curl output but present in Lighthouse, you have a server-side rendering gap. That's your indexing problem, right there.
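The comparison described above can also be scripted. What follows is a minimal sketch, not a definitive implementation: `extractSeoSignals` and `findRenderGaps` are hypothetical helper names, the regexes are a rough stand-in for a real HTML parser (they assume `name` comes before `content` in the meta tag), and the two HTML strings are inline samples. In practice the raw HTML would come from the curl call and the rendered HTML from a headless browser, for example Puppeteer's `page.content()`.

```javascript
// render-gap.js
// Hypothetical helpers for illustration; regexes are a smoke test, not a parser.

// Pull the SEO-critical tags out of an HTML string.
function extractSeoSignals(html) {
  const pick = (re, group = 1) => {
    const m = html.match(re);
    if (!m) return null;
    // Drop nested tags and collapse whitespace; treat empty text as missing.
    const text = m[group].replace(/<[^>]+>/g, "").replace(/\s+/g, " ").trim();
    return text || null;
  };
  return {
    title: pick(/<title[^>]*>([\s\S]*?)<\/title>/i),
    // Assumes name="description" appears before content="..."
    description: pick(
      /<meta\s+name=["']description["']\s+content=(["'])([\s\S]*?)\1/i,
      2
    ),
    h1: pick(/<h1[^>]*>([\s\S]*?)<\/h1>/i),
  };
}

// List the tags that exist in the rendered DOM but not in the raw response.
function findRenderGaps(rawHtml, renderedHtml) {
  const raw = extractSeoSignals(rawHtml);
  const rendered = extractSeoSignals(renderedHtml);
  return Object.keys(rendered).filter(
    (key) => rendered[key] !== null && raw[key] === null
  );
}

// Example: an empty client-side shell versus the DOM after JavaScript ran.
const rawHtml = '<html><head></head><body><div id="root"></div></body></html>';
const renderedHtml =
  "<html><head><title>My Post</title>" +
  '<meta name="description" content="A post about SEO"></head>' +
  '<body><div id="root"><h1>My Post</h1></div></body></html>';

const gaps = findRenderGaps(rawHtml, renderedHtml);
console.log(gaps); // tags that only exist after JavaScript runs
```

Any key in `gaps` is content that depends on JavaScript execution, which is exactly the server-side rendering gap to close.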

2. Structured Data Validation Without the GUI

Google's Rich Results Test is a great UI tool, but it doesn't scale. If you have 800 product pages, you need programmatic validation.

The tool: schema-dts (TypeScript types for schema.org) + ajv for runtime validation.

npm install schema-dts ajv ajv-formats

// validate-schema.ts
import Ajv from "ajv";
import addFormats from "ajv-formats";

// allErrors: report every violation, not just the first one Ajv finds
const ajv = new Ajv({ strict: false, allErrors: true });
addFormats(ajv);

// Minimal Article schema (extend as needed)
const articleSchema = {
  type: "object",
  required: ["@context", "@type", "headline", "author", "datePublished"],
  properties: {
    "@context": { type: "string", const: "https://schema.org" },
    "@type": { type: "string", const: "Article" },
    headline: { type: "string", maxLength: 110 },
    author: {
      type: "object",
      required: ["@type", "name"],
    },
    datePublished: { type: "string", format: "date-time" },
  },
};

const validate = ajv.compile(articleSchema);

export function auditStructuredData(jsonLd: object): string[] {
  const valid = validate(jsonLd);
  if (valid) return [];
  return (validate.errors ?? []).map(
    (e) => `${e.instancePath || "root"}: ${e.message}`
  );
}

// Usage
const pageData = {
  "@context": "https://schema.org",
  "@type": "Article",
  headline: "My Post",
  // Missing author and datePublished (will error)
};

const errors = auditStructuredData(pageData);
if (errors.length) {
  console.error("Schema errors:", errors);
  process.exit(1); // Fail the build
}

Wire this into your build step and you'll never accidentally ship a product page with broken structured data again.
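To run the validator against real pages, the JSON-LD first has to be pulled out of the HTML. Here is a rough sketch of that step, assuming the structured data lives in `<script type="application/ld+json">` tags. `extractJsonLd` is a hypothetical helper name, and the regex is a simplification that a proper HTML parser would handle more robustly.

```javascript
// extract-jsonld.js
// Hypothetical helper: collects every JSON-LD object embedded in an HTML string.
function extractJsonLd(html) {
  const re =
    /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  const found = [];
  const problems = [];
  for (const match of html.matchAll(re)) {
    try {
      const parsed = JSON.parse(match[1]);
      // A single script tag may hold one object or an array of them.
      found.push(...(Array.isArray(parsed) ? parsed : [parsed]));
    } catch (err) {
      // Malformed JSON is itself a finding worth reporting.
      problems.push(`Invalid JSON-LD block: ${err.message}`);
    }
  }
  return { found, problems };
}

// Example: one valid block and one broken block.
const html = `
<head>
  <script type="application/ld+json">
    {"@context": "https://schema.org", "@type": "Article", "headline": "My Post"}
  </script>
  <script type="application/ld+json">{ not valid json }</script>
</head>`;

const { found, problems } = extractJsonLd(html);
console.log(found.length, problems.length); // prints: 1 1
```

Each object in `found` can then be passed to `auditStructuredData`, and anything in `problems` can be treated as a build failure in its own right.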

3. Automated Crawl Audits With power-seo

For a higher-level audit that ties everything together (meta tags, canonical URLs, Open Graph, robots directives, sitemap coverage), I've been using @power-seo. It's an open-source npm package that runs a structured SEO audit on any URL and returns a machine-readable result.

npm install @power-seo

// audit.js
const { auditUrl } = require("@power-seo");

async function runAudit(url) {
  const result = await auditUrl(url);

  const failures = Object.entries(result.checks)
    .filter(([, check]) => check.status === "fail")
    .map(([key, check]) => `  ✗ ${key}: ${check.message}`);

  if (failures.length === 0) {
    console.log(`✅ ${url} passed all checks`);
  } else {
    console.log(`\n❌ ${url} failed ${failures.length} check(s):\n`);
    console.log(failures.join("\n"));
    process.exit(1);
  }
}

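runAudit(process.argv[2] || "https://yoursite.com").catch((err) => {
  // Network or audit errors should also fail the build
  console.error(err);
  process.exit(1);
});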

Run it as a post-deploy smoke test:

node audit.js https://yoursite.com/blog/my-latest-post

What I like about this approach (and why I write about it over at ccbd.dev) is that it fits into existing engineering workflows. It's not a dashboard you check once a quarter. It's a check that runs on every PR.
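For a feel of what checks like these involve under the hood, here is a dependency-free sketch covering three of the items mentioned earlier: canonical URL, Open Graph title, and robots directives. This is not the @power-seo API. `checkHead` is a hypothetical function, and the regexes assume a fixed attribute order, so treat it as an illustration rather than a replacement for a real audit.

```javascript
// head-checks.js
// Hypothetical, dependency-free checks; this is NOT the @power-seo API.
function checkHead(html, expectedCanonical) {
  // Return the quoted attribute value captured in group 2, or null.
  const attr = (tagRe) => {
    const m = html.match(tagRe);
    return m ? m[2] : null;
  };
  const canonical = attr(/<link\s+rel=["']canonical["']\s+href=(["'])(.*?)\1/i);
  const ogTitle = attr(/<meta\s+property=["']og:title["']\s+content=(["'])(.*?)\1/i);
  const robots = attr(/<meta\s+name=["']robots["']\s+content=(["'])(.*?)\1/i);

  const failures = [];
  if (canonical === null) {
    failures.push("canonical: missing");
  } else if (canonical !== expectedCanonical) {
    failures.push(`canonical: expected ${expectedCanonical}, got ${canonical}`);
  }
  if (!ogTitle) failures.push("og:title: missing");
  if (robots && /noindex/i.test(robots)) {
    failures.push("robots: page is set to noindex");
  }
  return failures;
}

// Example: correct canonical, but no og:title and an accidental noindex.
const page =
  '<head><link rel="canonical" href="https://yoursite.com/blog/my-latest-post">' +
  '<meta name="robots" content="noindex, follow"></head>';

const failures = checkHead(page, "https://yoursite.com/blog/my-latest-post");
failures.forEach((f) => console.log(`  ✗ ${f}`));
```

An accidental `noindex` left over from a staging config is the kind of regression this catches on the PR instead of weeks later in Search Console.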

What I Learned

  • The crawl gap is real. If your app doesn't render critical meta content server-side, Googlebot may never see it, regardless of how your site looks in a browser.
  • Fail fast in CI. SEO regressions are easy to introduce and hard to notice. Treating structured data and meta tags as testable assets (with exit codes) catches them before they ship.
  • Raw signals beat dashboards. A curl with a Googlebot user-agent tells you more about your indexability than a score out of 100.
  • Open-source composability wins. Combining lighthouse, ajv, and a lightweight audit package like @power-seo gives you a full audit pipeline for exactly $0/month.
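The "fail fast" point can be sketched as a score gate on the Lighthouse report. This is a minimal example under stated assumptions: `seoGate` is a hypothetical function name, the 0.9 default threshold is arbitrary, and the report object is an inline stand-in for the parsed `./seo-report.json`. The only thing it relies on from Lighthouse is that category scores are stored as numbers from 0 to 1.

```javascript
// seo-gate.js
// Hypothetical gate: decide pass/fail from a parsed Lighthouse JSON report.
// Lighthouse stores category scores as numbers from 0 to 1 (or null on error).
function seoGate(report, minScore = 0.9) {
  const score = report?.categories?.seo?.score;
  if (typeof score !== "number") {
    return { ok: false, reason: "SEO category score missing from report" };
  }
  if (score < minScore) {
    return {
      ok: false,
      reason: `SEO score ${score} is below threshold ${minScore}`,
    };
  }
  return { ok: true, reason: `SEO score ${score} meets threshold ${minScore}` };
}

// Example with an inline stand-in for the parsed ./seo-report.json
const sampleReport = { categories: { seo: { score: 0.82 } } };
const verdict = seoGate(sampleReport, 0.9);
console.log(verdict.reason);

// In a CI script, turn the verdict into an exit code:
//   if (!verdict.ok) process.exitCode = 1;
```

Returning a verdict object instead of exiting inside the function keeps the gate easy to unit test and lets the calling script decide how to fail.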

What's Your Setup?

Are you running any SEO checks in your CI pipeline, or is auditing still a manual quarterly thing at your company? I'm especially curious whether anyone has wired Lighthouse scores into GitHub Actions with a hard failure threshold. I've been experimenting with that and it's surprisingly effective.

Drop your setup in the comments.
