Juan Camilo Auriti
Your site ranks #1 on Google but AI never cites it. Here's how to fix it (open source tool)

GEO (Generative Engine Optimization) is the new discipline that makes your site visible to ChatGPT, Perplexity, Claude, and Gemini. I built an open source Python toolkit to audit and fix it in about 15 minutes.

You worked hard to get that first page ranking on Google.

But the other day, someone asked ChatGPT about your niche — and your competitor got cited. Not you.

That's not a coincidence. That's a structural problem, and it has a name: your site isn't optimized for AI search engines.

This post explains why, what GEO (Generative Engine Optimization) actually is, and how I built an open source Python tool — geo-optimizer-skill — to audit and fix it in about 15 minutes.


The problem nobody is talking about

AI search engines don't work like Google. They don't give you a list of ten blue links. They give a direct answer and cite their sources inline.

```
User: "What's the best way to calculate compound interest?"

Perplexity: "According to [Competitor.com], the standard formula is..."
                        ↑ They appear. You don't.
```

This means two sites can cover the exact same topic at the same quality level — but only one gets cited, consistently. The difference isn't content quality. It's structural optimization.

If your site isn't telling AI bots what it's about, in the format they understand, you're invisible. Even if you rank #1 on Google.


What is GEO?

GEO (Generative Engine Optimization) is the practice of optimizing web content to be cited by AI-powered search engines — not just crawled and ranked by traditional algorithms.

The term comes from a 2024 research paper by Princeton University (published at KDD 2024), which ran over 10,000 real queries against Perplexity.ai and measured which content optimization techniques actually increased citation frequency.

The results were striking:

| Method | Visibility increase |
|--------|---------------------|
| Cite authoritative sources | up to +115% |
| Add statistics and data | +40% average |
| Fluency optimization | +15–30% |
| Keyword stuffing | ~0% (ineffective) |

The paper is here if you want to read it: arxiv.org/abs/2311.09735

These aren't SEO myths. This is empirical data from controlled experiments on a live AI search engine.


Why I built this tool

I've been a full-stack developer for 15 years, mostly working on web projects for private clients — e-commerce, SaaS, content sites.

About a year ago I started noticing a pattern: clients kept asking, "Why doesn't our site appear in ChatGPT answers?" At first I brushed it off. Then I started digging into the Princeton research and realized this was a real, fixable, technical problem.

The issue was that there was no reliable way to audit a site for GEO compliance. SEO tools didn't check for it. No CLI existed. You had to manually go through a checklist.

So I built geo-optimizer-skill — a Python toolkit that:

  1. Audits any site and gives it a GEO score from 0 to 100
  2. Generates the /llms.txt file AI engines use to understand your site
  3. Generates and injects JSON-LD structured data (WebSite, FAQPage, WebApplication)

What the tool checks

The audit covers 5 categories, each with a weighted score:

1. robots.txt — AI Bot Access (20 pts)
Are the major AI crawlers allowed? GPTBot, ClaudeBot, PerplexityBot, Google-Extended, OAI-SearchBot... The tool checks all 13 critical bots and tells you exactly which ones are missing.
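For reference, a minimal robots.txt fragment allowing a few of these crawlers might look like the following (the bot names are real AI user agents, but check the repo's ai-bots-list.md for the complete list):

```
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /
```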

2. /llms.txt — The AI index file (20 pts)
This is a machine-readable file at your site's root that tells AI engines what your site is about, how it's structured, and which pages matter. Think of it as a sitemap designed for LLMs. The tool checks if it exists, if it's properly structured, and if it has enough content.
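To make that concrete, here's a hypothetical skeleton of an /llms.txt following the llmstxt.org convention (the site name, sections, and URLs below are all placeholders):

```markdown
# MySite

> One-paragraph description of what the site offers and who it's for.

## Tools

- [Compound Interest Calculator](https://yoursite.com/tools/compound-interest/): formula, worked examples
- [Savings Planner](https://yoursite.com/tools/savings/): goal-based projections

## Guides

- [Investing basics](https://yoursite.com/guides/investing-basics/): beginner overview
```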

3. JSON-LD Schema (25 pts)
Structured data is how you tell AI engines your content type, author, topic, and relationships. The tool checks for WebSite, WebApplication, and FAQPage schemas — the three most impactful for AI citation.
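As an illustration, a minimal WebSite schema embedded in a page looks roughly like this (all values are placeholders):

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "WebSite",
  "name": "MySite",
  "url": "https://yoursite.com",
  "description": "Short description of what the site does"
}
</script>
```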

4. Meta tags (20 pts)
Title, description, canonical, Open Graph. These affect how AI engines parse and excerpt your content.
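For example, a head section covering these basics might look like this (placeholder values):

```html
<head>
  <title>Compound Interest Calculator | MySite</title>
  <meta name="description" content="Free compound interest calculator with formulas, worked examples, and charts.">
  <link rel="canonical" href="https://yoursite.com/tools/compound-interest/">
  <meta property="og:title" content="Compound Interest Calculator">
  <meta property="og:description" content="Free compound interest calculator with formulas and examples.">
  <meta property="og:image" content="https://yoursite.com/og/compound-interest.png">
</head>
```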

5. Content quality (15 pts)
Heading structure, presence of statistics, external citations, word count. Directly mapped to Princeton GEO methods.
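The weighting across these five categories can be sketched in a few lines of Python. This is a simplified illustration, not the tool's actual implementation; the category keys and the idea of feeding each category a 0.0–1.0 pass ratio are my assumptions:

```python
# Weights mirror the five audit categories described above (sum = 100).
WEIGHTS = {
    "robots_txt": 20,
    "llms_txt": 20,
    "json_ld": 25,
    "meta_tags": 20,
    "content_quality": 15,
}

def geo_score(results: dict) -> int:
    """Combine per-category pass ratios (0.0-1.0) into a 0-100 score."""
    total = sum(
        WEIGHTS[category] * ratio
        for category, ratio in results.items()
        if category in WEIGHTS
    )
    return round(total)

# Example: perfect robots.txt and llms.txt, partial schema and content coverage
score = geo_score({
    "robots_txt": 1.0,
    "llms_txt": 1.0,
    "json_ld": 0.6,       # WebSite present, FAQPage missing
    "meta_tags": 1.0,
    "content_quality": 0.8,
})
print(score)  # → 87
```

A weighted sum like this is why a missing FAQPage schema costs more than a missing Open Graph tag: schema carries the single largest weight.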


Quick start

Install with pip:

```shell
pip install geo-optimizer-skill
```

Run an audit on any public URL:

```shell
geo audit --url https://yoursite.com
```

Here's what the output looks like:

```
============================================================
  1. ROBOTS.TXT — AI Bot Access
============================================================
  ✅ GPTBot allowed ✓
  ✅ ClaudeBot allowed ✓
  ✅ PerplexityBot allowed ✓
  ⚠️  meta-externalagent not configured (Meta AI)

============================================================
  2. LLMS.TXT — AI Index File
============================================================
  ✅ llms.txt found (200, 6517 bytes)
  ✅ H1 present, blockquote description present
  ✅ 6 H2 sections, 46 links to site pages

============================================================
  3. SCHEMA JSON-LD — Structured Data
============================================================
  ✅ Found 2 JSON-LD blocks
  ✅ WebSite schema ✓
  ⚠️  FAQPage schema missing

============================================================
  5. CONTENT QUALITY — GEO Best Practices
============================================================
  ✅ Good heading structure: 31 headings (H1–H4)
  ✅ Numerical data present: 15 numbers/statistics ✓
  ⚠️  No external source links detected

============================================================
  📊 FINAL GEO SCORE
============================================================

  [█████████████████░░░] 85/100
  ✅ GOOD — Core optimizations in place, fine-tune content and schema

  📋 NEXT PRIORITY STEPS:
  1. Add FAQPage schema with frequently asked questions
  2. Cite authoritative sources with external links
```

Clear, actionable, no guessing.


Generate your /llms.txt automatically

If your site has a sitemap, the tool can auto-generate a properly structured /llms.txt file:

```shell
geo llms \
  --base-url https://yoursite.com \
  --site-name "MySite" \
  --description "Short description of what your site does" \
  --output ./public/llms.txt
```

It auto-detects your sitemap, groups URLs by category, and produces a structured Markdown file that follows the llmstxt.org spec.


Generate and inject JSON-LD schema

```shell
# Analyze an existing HTML file
geo schema --file index.html --analyze

# Generate a WebSite schema
geo schema --type website --name "MySite" --url https://yoursite.com

# Inject FAQPage schema directly into a file
geo schema --file page.html --type faq --inject

# Generate snippet for Astro BaseLayout
geo schema --astro --name "MySite" --url https://yoursite.com
```

Supported schema types: website, webapp, faq, article, organization, breadcrumb.


CI/CD integration

The audit supports JSON output, so you can gate deployments on your GEO score:

```shell
geo audit --url https://yoursite.com --format json --output report.json

SCORE=$(jq '.score' report.json)
if [ "$SCORE" -lt 70 ]; then
  echo "GEO score too low: $SCORE/100"
  exit 1
fi
```

A GEO regression check in your pipeline. Same concept as a Lighthouse performance budget.
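In a GitHub Actions workflow, that gate could look like the following step. This is a hypothetical sketch: it assumes the JSON report exposes a top-level score field, as in the shell snippet above, and that the runner has jq available.

```yaml
# Hypothetical CI step; requires jq and network access to the deployed site
- name: GEO audit gate
  run: |
    pip install geo-optimizer-skill
    geo audit --url https://yoursite.com --format json --output report.json
    SCORE=$(jq '.score' report.json)
    if [ "$SCORE" -lt 70 ]; then
      echo "GEO score too low: $SCORE/100"
      exit 1
    fi
```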


The 9 Princeton GEO methods

Based on the research paper, these are the techniques that actually move the needle on AI citation rates, in priority order:

| Priority | Method | Measured impact |
|----------|--------|-----------------|
| 🔴 1 | Cite authoritative sources — link to external sources with credibility | +30–115% |
| 🔴 2 | Add statistics — specific numbers, percentages, dates, measurements | +40% |
| 🟠 3 | Quote experts — direct quotes with attribution | +30–40% |
| 🟠 4 | Authoritative tone — precise, expert language | +6–12% |
| 🟡 5 | Fluency optimization — clear sentences, logical flow | +15–30% |
| 🟡 6 | Accessibility — define terms, use analogies | +8–15% |
| 🟢 7 | Technical terms — correct industry terminology | +5–10% |
| 🟢 8 | Vocabulary variety — avoid repetition | +5–8% |
| ❌ 9 | Keyword stuffing — proven ineffective for GEO | ~0% |

Notice what's at the bottom: keyword density. The technique that still dominates a lot of SEO thinking has zero measurable impact on AI citation. Meanwhile, citing sources — something many sites skip — can more than double your visibility.


GEO checklist before publishing any page

```
☐ robots.txt — all major AI bots have Allow: /
☐ /llms.txt — present at site root, structured, updated
☐ WebSite schema — in global <head> on all pages
☐ WebApplication schema — on every tool or product page
☐ FAQPage schema — on every page with Q&A content
☐ At least 3 external citations to authoritative sources
☐ At least 5 concrete numerical data points
☐ Meta description — accurate, 120–160 chars
☐ Canonical URL — on every page
☐ Open Graph tags — og:title, og:description, og:image
```

This checklist is also included in the repo as part of the AI context files.


Use it with your AI coding assistant

The repo includes context files for every major AI coding platform:

| Platform | File |
|----------|------|
| Claude Projects | `ai-context/claude-project.md` |
| ChatGPT Custom GPT | `ai-context/chatgpt-custom-gpt.md` |
| Cursor | `ai-context/cursor.mdc` |
| Windsurf | `ai-context/windsurf.md` |
| Kiro | `ai-context/kiro-steering.md` |

Once loaded in your assistant, you can just say: "audit my site" or "generate the llms.txt for this sitemap" and it knows exactly what to do.


What's inside the repo

```
geo-optimizer/
├── 📄 SKILL.md                     ← Index of AI context files
├── 🧠 ai-context/                  ← Platform-specific context files
├── 🐍 scripts/
│   ├── geo_audit.py                ← Score 0–100, find what's missing
│   ├── generate_llms_txt.py        ← Auto-generate /llms.txt
│   └── schema_injector.py          ← Generate & inject JSON-LD
├── 📚 references/
│   ├── princeton-geo-methods.md    ← The 9 research-backed methods
│   ├── ai-bots-list.md             ← 25+ AI crawlers robots.txt block
│   └── schema-templates.md         ← 8 JSON-LD templates
└── 📁 docs/                        ← Full documentation (9 pages)
```

800+ tests, MIT license, security-hardened (SSRF prevention, XSS protection, path traversal validation).


The bigger picture

Traditional SEO optimizes for ranking. You try to be relevant to a query so a crawler surfaces you in a list.

GEO optimizes for citation. You try to be the source an AI chooses when constructing an answer. The signal isn't position — it's trust, structure, and explicit authority.

These are different games with different rules. The good news is that GEO-optimized content also tends to be better SEO content: it's more structured, better cited, and more precise. Fixing one often improves the other.

The bad news is that most sites are starting from zero on GEO. The robots.txt is missing critical bots. There's no /llms.txt. The JSON-LD covers only WebSite and nothing else. The content has no external citations.

That's fixable. It just needs a tool.


Try it

```shell
pip install geo-optimizer-skill
geo audit --url https://yoursite.com
```

GitHub: github.com/Auriti-Labs/geo-optimizer-skill

If you run an audit on your site, I'd love to hear what score you get and what the biggest gaps were. Drop it in the comments.

And if this saved you time — a ⭐ on GitHub helps others find it.


Built by Juan Camilo Auriti — full-stack developer and creator of Auriti Labs. Based on Princeton KDD 2024 research.
