shadow88sky

Posted on Mar 12

I Built an Open-Source GEO Audit Tool — Because I Couldn't Find One

#ai #opensource #webdev #showdev

65% of websites unknowingly block AI crawlers. Your site might be invisible to ChatGPT, Gemini, and Claude right now — and you'd never know.

I went looking for an open-source tool to audit websites for AI search readiness. Couldn't find one. So I built it.

What is GEO?

GEO (Generative Engine Optimization) is like SEO, but for AI search engines.

When someone asks ChatGPT "What's the best CRM for small teams?", the AI doesn't return a list of links. It generates an answer by pulling from crawled content, structured data, and knowledge graphs.

If your site blocks AI crawlers or lacks structured data, your brand doesn't exist in that answer.

What the tool checks

rankweave-geo-audit evaluates 4 dimensions:

1. AI Crawler Access (30%)

Checks your robots.txt against 9 AI crawlers:

GPTBot · ChatGPT-User · OAI-SearchBot · ClaudeBot · Claude-Web · Google-Extended · PerplexityBot · Bytespider · CCBot

Fun fact: websites blocking GPTBot get cited 73% less often in ChatGPT responses.
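To make the crawler check concrete, here is a minimal sketch of how a robots.txt file could be scanned for blocked AI crawlers. This is not the tool's actual code: `findBlockedCrawlers` is a hypothetical helper, and it only treats a bare `Disallow: /` as a block, ignoring `Allow` overrides and path-specific rules.

```typescript
// Hypothetical helper for illustration; not rankweave-geo-audit's implementation.
const AI_CRAWLERS: string[] = [
  "GPTBot", "ChatGPT-User", "OAI-SearchBot", "ClaudeBot", "Claude-Web",
  "Google-Extended", "PerplexityBot", "Bytespider", "CCBot",
];

// Returns the crawlers that are disallowed from the site root, either by a
// group that names them or by the "*" group when no named group exists.
function findBlockedCrawlers(robotsTxt: string, crawlers: string[] = AI_CRAWLERS): string[] {
  // Maps each lowercased user-agent token to whether its group has "Disallow: /".
  const blocksRoot = new Map<string, boolean>();
  let currentAgents: string[] = [];
  let inRules = false;

  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, "").trim(); // strip comments
    const sep = line.indexOf(":");
    if (sep === -1) continue;
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();

    if (field === "user-agent") {
      // A user-agent line that follows rules starts a new group.
      if (inRules) {
        currentAgents = [];
        inRules = false;
      }
      const agent = value.toLowerCase();
      currentAgents.push(agent);
      if (!blocksRoot.has(agent)) blocksRoot.set(agent, false);
    } else if (field === "disallow" || field === "allow") {
      inRules = true;
      if (field === "disallow" && value === "/") {
        for (const agent of currentAgents) blocksRoot.set(agent, true);
      }
    }
  }

  return crawlers.filter((name) => {
    const named = blocksRoot.get(name.toLowerCase());
    // A group that names the crawler takes precedence over the "*" group.
    return named !== undefined ? named : blocksRoot.get("*") === true;
  });
}
```

Given a file that blocks only GPTBot at the root, the function returns just that name. A production check would also need to handle `Allow` rules and path patterns.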

2. Structured Data (25%)

Detects JSON-LD and checks Schema.org type coverage (Organization, WebSite, Product, FAQPage, Article, BreadcrumbList).

GPT-4's accuracy jumps from 16% to 54% when content includes structured data.
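As a rough illustration of the type-coverage idea, the sketch below pulls JSON-LD blocks out of an HTML string and reports which of the six Schema.org types are present. It uses a regular expression instead of cheerio so it runs without dependencies, which is a simplification on my part; `schemaTypeCoverage` and `collectTypes` are hypothetical names, not the package's API.

```typescript
const EXPECTED_TYPES: string[] = [
  "Organization", "WebSite", "Product", "FAQPage", "Article", "BreadcrumbList",
];

// Walks a parsed JSON-LD value and records every "@type" it finds,
// including types nested under keys such as "@graph".
function collectTypes(node: unknown, found: string[]): void {
  if (Array.isArray(node)) {
    for (const item of node) collectTypes(item, found);
  } else if (node !== null && typeof node === "object") {
    const record = node as Record<string, unknown>;
    const type = record["@type"];
    if (typeof type === "string") {
      found.push(type);
    } else if (Array.isArray(type)) {
      for (const t of type) if (typeof t === "string") found.push(t);
    }
    for (const key of Object.keys(record)) collectTypes(record[key], found);
  }
}

// Assumption: JSON-LD lives in <script type="application/ld+json"> tags.
function schemaTypeCoverage(html: string): { found: string[]; missing: string[] } {
  const seen: string[] = [];
  const scriptPattern =
    /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  let match: RegExpExecArray | null;
  while ((match = scriptPattern.exec(html)) !== null) {
    try {
      collectTypes(JSON.parse(match[1] ?? ""), seen);
    } catch {
      // Malformed JSON-LD blocks are skipped rather than failing the audit.
    }
  }
  return {
    found: EXPECTED_TYPES.filter((t) => seen.indexOf(t) !== -1),
    missing: EXPECTED_TYPES.filter((t) => seen.indexOf(t) === -1),
  };
}
```

The `missing` list is the useful part in practice: it tells you which schema types to add next.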

3. Knowledge Graph (20%)

Checks if your brand exists in Wikidata, Wikipedia (EN/ZH), and Baidu Baike — the sources AI models use as ground truth.
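For a feel of what a knowledge graph lookup involves, here is a small sketch against Wikidata's public `wbsearchentities` API. The exact-label matching rule and the names `findWikidataEntity`, `buildWikidataSearchUrl`, and `JsonFetcher` are my assumptions for illustration; the tool may match brands differently. The HTTP call is injected so the logic can be exercised without a network.

```typescript
// Only the part of the wbsearchentities response this sketch reads.
interface WikidataSearchResponse {
  search?: Array<{ id: string; label?: string }>;
}

// Hypothetical abstraction: any function that GETs a URL and returns parsed JSON.
type JsonFetcher = (url: string) => Promise<unknown>;

function buildWikidataSearchUrl(brand: string, language: string = "en"): string {
  const query = [
    "action=wbsearchentities",
    "search=" + encodeURIComponent(brand),
    "language=" + encodeURIComponent(language),
    "format=json",
    "limit=5",
  ].join("&");
  return "https://www.wikidata.org/w/api.php?" + query;
}

// Returns the Wikidata entity ID whose label equals the brand name
// (case-insensitive), or null when no candidate matches.
async function findWikidataEntity(brand: string, fetchJson: JsonFetcher): Promise<string | null> {
  const data = (await fetchJson(buildWikidataSearchUrl(brand))) as WikidataSearchResponse;
  const wanted = brand.trim().toLowerCase();
  const hit = (data.search ?? []).find((entry) => (entry.label ?? "").toLowerCase() === wanted);
  return hit ? hit.id : null;
}
```

On Node 18+ you could pass `(url) => fetch(url).then((res) => res.json())` as the fetcher. Exact label matching is strict, so a real check would probably also consider aliases.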

4. Content Basics (25%)

9 checks, including HTTPS, <title>, meta description, OG tags, H1, content length, blog link, and FAQ link.
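The checks named above are simple enough to sketch in a few lines. The version below is an approximation using regular expressions on raw HTML; the `runContentBasics` name, the pass/fail shape, and the 300-character content threshold are all assumptions, not values taken from the package.

```typescript
interface BasicCheck {
  name: string;
  passed: boolean;
}

// Hypothetical sketch: evaluates the checks named in the post against a URL and its HTML.
function runContentBasics(url: string, html: string, minTextLength: number = 300): BasicCheck[] {
  const has = (pattern: RegExp): boolean => pattern.test(html);

  // Approximate the visible text by dropping scripts, styles, and tags.
  const text = html
    .replace(/<script[\s\S]*?<\/script>/gi, " ")
    .replace(/<style[\s\S]*?<\/style>/gi, " ")
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ")
    .trim();

  return [
    { name: "HTTPS", passed: /^https:\/\//i.test(url) },
    { name: "<title>", passed: has(/<title[^>]*>\s*\S[\s\S]*?<\/title>/i) },
    { name: "meta description", passed: has(/<meta[^>]+name=["']description["'][^>]*>/i) },
    { name: "OG tags", passed: has(/<meta[^>]+property=["']og:[^"']+["'][^>]*>/i) },
    { name: "H1", passed: has(/<h1[\s>]/i) },
    { name: "content length", passed: text.length >= minTextLength },
    { name: "blog link", passed: has(/<a[^>]+href=["'][^"']*blog[^"']*["']/i) },
    { name: "FAQ link", passed: has(/<a[^>]+href=["'][^"']*faq[^"']*["']/i) },
  ];
}
```

Filtering the result for `passed === false` gives a quick to-do list for a page. A DOM parser such as cheerio would be more robust than these patterns against unusual markup.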

How to use it

```bash
npm install rankweave-geo-audit
```

```js
import { audit } from 'rankweave-geo-audit';

const result = await audit({
  domain: 'example.com',
  companyName: 'Example Inc',
});

console.log(result.overallScore);       // 0-100
console.log(result.dimensions);         // 4 scores with details
console.log(result.recommendations);    // bilingual suggestions (EN/ZH)
```

The result gives you an overall score (0-100), per-dimension breakdowns, and actionable recommendations in both English and Chinese.

Scoring formula

```
Overall = Crawler Access × 30%
        + Structured Data × 25%
        + Knowledge Graph × 20%
        + Content Basics  × 25%
```

Everything is transparent — the weights, the checks, the algorithm. No black boxes.
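Here is the formula as a worked example. The weights come straight from the formula above; the dimension key names and the final rounding step are assumptions on my part, since the exact result shape may differ.

```typescript
// Weights from the scoring formula; key names are hypothetical.
const WEIGHTS = {
  crawlerAccess: 0.3,
  structuredData: 0.25,
  knowledgeGraph: 0.2,
  contentBasics: 0.25,
} as const;

type Dimension = keyof typeof WEIGHTS;

// Each dimension score is expected to be in the range 0-100.
function overallScore(scores: Record<Dimension, number>): number {
  let total = 0;
  for (const key of Object.keys(WEIGHTS) as Dimension[]) {
    total += scores[key] * WEIGHTS[key];
  }
  return Math.round(total); // rounding is an assumption
}

// 100 × 0.30 + 60 × 0.25 + 0 × 0.20 + 80 × 0.25 = 30 + 15 + 0 + 20
const example = overallScore({
  crawlerAccess: 100,
  structuredData: 60,
  knowledgeGraph: 0,
  contentBasics: 80,
});
console.log(example); // 65
```

Note how a site with no knowledge graph presence tops out at 80 even if everything else is perfect.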

Design decisions

Only 1 dependency — cheerio for HTML parsing. HTTP uses Node 18+ built-in fetch.
Parallel execution — Crawler access, knowledge graph, and homepage fetch run concurrently.
Pure functions — No database, no framework. Just audit(options) → result object.
ESM + TypeScript — Full type definitions included.
Progress callback — onProgress for real-time updates in your UI.
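The parallel execution and progress callback ideas fit together naturally. Below is a minimal sketch of the pattern using `Promise.all`; the `runSteps` helper, the `Step` type, and the callback signature `(name, done, total)` are hypothetical, so check the package's type definitions for the real `onProgress` shape.

```typescript
// Hypothetical callback shape; the package's onProgress may differ.
type ProgressCallback = (name: string, done: number, total: number) => void;

interface Step<T> {
  name: string;
  run: () => Promise<T>;
}

// Starts every step at once and reports progress as each one settles.
async function runSteps<T>(
  steps: Step<T>[],
  onProgress?: ProgressCallback,
): Promise<Record<string, T>> {
  const results: Record<string, T> = {};
  let done = 0;
  await Promise.all(
    steps.map(async (step) => {
      results[step.name] = await step.run();
      done += 1;
      if (onProgress) onProgress(step.name, done, steps.length);
    }),
  );
  return results;
}
```

Because the steps are independent network calls, total time is roughly that of the slowest step rather than the sum of all three. If any step rejects, `Promise.all` rejects too, so a real audit would likely catch errors per step and score that dimension as zero instead.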

What it doesn't do

This covers the technical foundation — can AI crawl and understand your site?

It doesn't tell you whether AI engines are actually mentioning your brand in their answers. That requires querying live AI engines, which is a different problem.

DEV Community