Juan Camilo Auriti

Posted on Apr 2

GEO Optimizer v4.0.0 is Stable — What We Fixed, What We Built, What's Next

#ai #webdev #python #opensource

v4.0.0 is out.

Not beta. Not release candidate. Stable.

This post is the honest story of how it got there: four beta releases, a full architectural rewrite, 13 security fixes, and a few new capabilities we are genuinely excited about.

pip install geo-optimizer-skill
# or upgrade from any previous version
pip install --upgrade geo-optimizer-skill

Why a Major Version?

GEO Optimizer started as a focused CLI tool in early 2025. By v3.x, the core audit engine — audit.py — had grown to 2270 lines. A single file handling HTTP fetching, scoring logic, content parsing, schema validation, bot detection, and output formatting. It worked. It was also increasingly difficult to test, extend, or reason about.

v4.0.0 fixes that, and along the way adds several new detection capabilities that the original architecture could not have supported cleanly.

Here is what changed.

Architecture: From Monolith to 12 Focused Modules

The 2270-line audit.py is gone. Its responsibilities now live in 12 separate modules, each with a single job:

Module	Responsibility
`audit.py`	Orchestration only — calls the others, assembles results
`citability.py`	42 citability signals (3200 lines, now properly isolated)
`scoring.py`	Score calculation, thresholds, grade assignment
`fixer.py`	Fix file generation (robots, llms, schema, meta, ai_discovery)
`injection_detector.py`	Prompt injection detection
`registry.py`	Plugin system via entry points

This matters for contributors. Before v4, adding a new audit check meant navigating a 2270-line file and hoping you did not break something adjacent. Now you add a module, register it via entry_points, and the plugin system handles the rest.

The public API is unchanged:

from geo_optimizer.core.audit import run_full_audit, run_full_audit_async

result = run_full_audit("https://yoursite.com")
print(result.score.total)   # same as before

13 Security Fixes

The beta cycle surfaced a cluster of security issues. All thirteen are patched in v4.0.0 stable.

The most important ones:

DNS Pinning — fetch_url() now validates the resolved IP against a block list before making the request. This prevents SSRF attacks where a user-supplied URL resolves to an internal network address (169.254.x.x, 10.x.x.x, 172.16-31.x.x, ::1, etc.). The check happens after DNS resolution, not before, so it handles DNS rebinding.

XSS Sanitization — The HTML report formatter now escapes all user-controlled content before rendering. Previously, a page title containing <script> would execute in the generated report.

SSRF Prevention — Every URL accepted from user input — CLI --url, API POST /api/audit, MCP tool arguments — goes through validators.resolve_and_validate_url() before any HTTP call is made. There is no code path that calls requests.get() directly on user input.

Auth Bypass — The /report/{id} endpoint now correctly enforces GEO_API_TOKEN when the token is set. Previously, it skipped authentication on the report retrieval endpoint even when the audit endpoint required it.

CSP Hardening — The web app now sets Content-Security-Policy, X-Content-Type-Options, X-Frame-Options, and Referrer-Policy headers on all responses.

If you are running the web demo in a semi-public context — internal tooling, shared staging, anything other than localhost — upgrading is not optional.

27 AI Bots (Was 24)

The tracked bot list grew from 24 to 27. Three additions: Applebot-Extended (Apple Intelligence), Kangaroo Bot (AI2 / Allen Institute), and Meta-ExternalFetcher (Meta AI).

The robots.txt audit checks for all 27. The fixer generates entries for all 27.

geo audit --url https://yoursite.com --format json | jq '.robots.missing_bots'
# ["Applebot-Extended", "Meta-ExternalFetcher", ...]

10 MCP Tools (Was 8)

The MCP server, which lets Claude Code, Cursor, and Windsurf query GEO Optimizer directly, gains two new tools in v4.0.0:

geo_trust_stack — returns the Trust Stack Score breakdown (see below)
geo_injection_scan — runs prompt injection detection on a page (see below)

Setup is unchanged:

# Claude Code
claude mcp add geo-optimizer -- geo-mcp

# Cursor — .cursor/mcp.json
{
  "mcpServers": {
    "geo-optimizer": {
      "command": "geo-mcp",
      "args": []
    }
  }
}

New: Trust Stack Score

One of the more interesting additions in v4.0.0 is the Trust Stack Score — a five-layer aggregation that grades site authority on an A–F scale.

The five layers:

Schema Completeness — Are Organization, Article, FAQPage present and valid?
Brand Coherence — Does the brand name appear consistently across pages, meta tags, and schema?
Contact Signals — Are About and Contact pages reachable and linked from the homepage?
Knowledge Graph Readiness — Does the Organization schema include sameAs links to verified profiles (Wikidata, LinkedIn, Crunchbase, etc.)?
Freshness — Are publication and modification dates present and recent?

Each layer scores 0–20. The aggregate produces a letter grade:

Score	Grade	Meaning
90–100	A	Strong entity identity, high trust signal
75–89	B	Solid foundation, minor gaps
55–74	C	Mixed signals, some layers weak
35–54	D	Significant gaps, low KG readiness
0–34	F	AI engines cannot establish trust

Access it via CLI:

geo audit --url https://yoursite.com --format json | jq '.trust_stack'

Or via the MCP tool directly from your AI IDE:

"Run geo_trust_stack on my-client.com and tell me which layers are weakest."

New: Prompt Injection Detection

This one is genuinely novel for a GEO toolkit.

Some sites — accidentally or intentionally — include content in their pages that attempts to manipulate AI crawlers: hidden text instructing bots to treat the site as authoritative, <meta> tags with unusual directives, or llms.txt files containing override instructions.

GEO Optimizer v4.0.0 ships an eight-category detector for these patterns:

Hidden text (display:none, opacity:0, font-size:0, color matching background)
Off-screen positioning (position:absolute with large negative offsets)
Instruction injection in meta tags (unusual name attributes with imperative text)
llms.txt override attempts (phrases like "ignore previous instructions", "disregard", "as an AI")
Schema manipulation (unexpected description fields containing directives)
Cloaking signals (content served differently based on User-Agent)
Invisible iframe injection
Comment-based injection ()

For legitimate site owners, the detector helps you catch accidental patterns — a third-party widget injecting hidden text, a CMS generating unusual meta tags — before they trigger AI crawler penalties.

geo audit --url https://yoursite.com --format json | jq '.injection_detection'

Scoring Accuracy: 6 Regex False Positives Eliminated

The v3.x scoring engine had six known false positive patterns — cases where a site would score points for a signal that was technically present but semantically empty (e.g., a statistics count matching a phone number format, a heading "hierarchy" score awarded to a page with only <h2> tags and no <h1>).

v4.0.0 fixes all six:

Statistics count now uses context-aware regex that excludes phone numbers, ZIP codes, and ISBN patterns
Heading hierarchy requires an <h1> before awarding hierarchy points
Brand coherence requires a minimum string length to avoid matching single-character artifacts
RSS detection validates feed MIME type, not just URL pattern
Freshness detection now requires a parseable date, not just the presence of datetime attribute
FAQ score now validates that FAQPage schema contains at least one Question entity with non-empty text

If your site scored unusually high on any of these categories in v3.x, expect a small correction in v4.0.0. That is the correct behavior.

New: WebMCP Readiness

Chrome 146 ships with a built-in AI agent capability (the "Web MCP" standard) that allows browser AI agents to interact with sites programmatically. GEO Optimizer v4.0.0 audits for compatibility with this emerging standard.

The check looks for:

A /.well-known/mcp.json manifest
A declared mcp-endpoint in HTTP headers or meta tags
Valid MCP capability declarations (tools, resources, prompts)

This is early days. Most sites score zero here because the standard is brand new. But as Chrome 146+ rolls out and AI agents become a real traffic source, having this in place ahead of the curve is the same bet as llms.txt was in early 2025.

Test Suite: 1120 Tests, 85% Coverage

The beta cycle added 90 tests beyond the 1030 in v3.x. All tests remain fully mocked — no real HTTP calls, no external dependencies.

pytest tests/ -v --cov=geo_optimizer
# 1120 passed, 0 failed
# coverage: 85%

The coverage target for v4.1.0 is 90%. The gap is mostly edge cases in the injection detector and schema validation paths.

Full English Codebase

Previous versions had a mix of Italian and English in code comments, docstrings, and error messages. v4.0.0 standardizes on English throughout the codebase — consistent with being an open-source project accepting contributions from a global audience.

The GEO_LANG environment variable still controls CLI output language (it or en). That is user-facing and unchanged.

Upgrading from v3.x

No breaking changes in the public API. The run_full_audit() and run_full_audit_async() function signatures are unchanged, and the result dataclasses are backward compatible.

The only thing to be aware of: the six scoring corrections above. If you have CI thresholds tuned against v3.x scores, run an audit on v4.0.0 before tightening those thresholds.

pip install --upgrade geo-optimizer-skill
geo audit --url https://yoursite.com

What's Next (v4.1.0)

The roadmap for v4.1.0 has three items:

Async batch audits — audit up to 50 URLs concurrently via geo audit --urls-file sites.txt
Historical tracking — geo audit will optionally write results to a local SQLite database, enabling trend charts via geo history
Playwright rendering mode — optional JS rendering pass for SPAs (opt-in, not default)

If any of these is a priority for your use case, open an issue and we will prioritize accordingly.

Try It

Web demo (no install):

geo-optimizer-web.onrender.com

CLI:

pip install geo-optimizer-skill
geo audit --url https://yoursite.com

GitHub (star if you find it useful):

github.com/Auriti-Labs/geo-optimizer-skill

Questions about a specific audit result or the new detection capabilities? Drop them in the comments — I read everything.

Top comments (3)

Jonathan Murray • Apr 4

"Four betas, a full architectural rewrite, and 13 security fixes" is the honest version of a stable release announcement. Most changelogs skip the archaeology and present v1.0.0 as if it arrived fully formed. The actual story — what broke, what you redesigned, what you found when you took security seriously — is more useful to people evaluating whether to depend on the project.

The architectural rewrite detail is particularly interesting from a maintainability standpoint. Was the rewrite driven by a performance problem, a design decision that didn't scale to new use cases, or was it primarily the security findings requiring structural changes? The answer tells you a lot about the project's likely trajectory.

Juan Camilo Auriti • Apr 28

Great question, the rewrite was not driven by a single cause.

The honest answer is: maintainability came first, new use cases came second, and the security findings turned the rewrite into a hard requirement.

By v3.x, the project still worked, but the core audit flow had become too dense. audit.py was doing too many things at once: fetching, parsing, scoring, schema checks, bot detection, reporting, and parts of the web/API behavior. That made every new feature more expensive than it should have been.

The main architectural problem was not raw performance. It was coupling.

Adding a new audit signal meant touching code that was too close to unrelated responsibilities. That is dangerous in an audit tool, because a small change in one area can silently affect scoring, output structure, or validation behavior somewhere else.

Security made that even more obvious.

Once I started treating user-supplied URLs, generated reports, API endpoints, and MCP tool inputs as serious attack surfaces, the old structure was no longer acceptable. Fixing individual issues would have patched the symptoms, but it would not have given the project a safer long-term shape.

So v4 separates the system around clearer boundaries:

URL validation and SSRF prevention are centralized
scoring is isolated from fetching and parsing
prompt injection detection lives in its own module
report generation has its own sanitization path
MCP/API/CLI flows reuse the same validation logic
the audit orchestrator coordinates instead of owning everything

That makes the project easier to extend, but also easier to reason about when something security-sensitive changes.

The trajectory from here is probably less about “big rewrites” and more about adding capabilities on top of this structure: batch audits, historical tracking, rendering support for SPAs, stronger citation-quality checks, and better developer-facing integrations.

So, in short: the rewrite started because the design was reaching its limit. Security review confirmed that the old structure had to go. Performance was a benefit, but not the main driver.

Saul Fleischman • May 31 • Edited

GEO and SEO are converging.

Buyers asking ChatGPT "is X any good?" often verify on Google with reviews right after. If you're winning GEO but losing SEO (broken schema, thin pages, missing canonical), the citation doesn't convert. And the inverse — strong SEO with no AI presence — leaves you out of the first-answer conversation. Most teams I see solve only one side. The same content pipeline can fuel both if you instrument them together.

We need to regard the framing of LLM output as "ephemeral" and "non-deterministic." That's the part founders new to this space tend to miss when they assume GEO behaves like SEO.

A few additions from building in this space (I run MentionFox.com, which sits at the GEO + SEO + outreach intersection):

The tracking-vs-fixing gap is the real story for indie / B2B teams. Most tools on this list tell you whether you got cited. They don't change the underlying signal. That's like a rank tracker that can't suggest what to write. Useful telemetry, but the buyer's actual question is: once I know I'm invisible for "best payment processing API" — what specific action gets me cited next month?

It is worth checking how many LLMs a tool actually measures against on each measurement day vs. just claims to support. Coverage varies more than the marketing pages suggest. For a defensible study you want at least the 8 consumer flagships — ChatGPT, Claude, Gemini, Mistral, Grok, Perplexity, DeepSeek, Cohere — running real buyer queries. Fewer than that and individual-model quirks dominate the data.