Gregory Shevchenko

Posted on Jun 2 • Originally published at gregshevchenko.com

The open-source AI Search visibility audit stack I’m building

#ai #opensource #seo #testing

Most AI Search visibility work becomes vague too early.

A team asks, “How do we show up in ChatGPT?”

Then the work jumps straight to prompts, dashboards, content ideas, competitor checks, and brand-mention screenshots.

Those things can matter.

But if the page layer is messy, the rest of the system becomes hard to interpret. A weak citation rate may be a content problem, a crawl problem, a schema problem, an entity problem, or a sign that the page is not represented clearly enough for machines to reuse.

So I am building geo-audit as the open-source, inspectable layer of my AI Search visibility workflow. The first public slice is intentionally boring: crawl the site, inspect the head tags, parse JSON-LD, check canonical URLs, and turn that into a repeatable proof packet before asking an LLM to judge anything. [1]

Why start with deterministic gates?

Because deterministic checks remove preventable noise.

Before a team asks an LLM whether a page is “good for AI Search,” I want to know simpler things:

Can the route be fetched?
Is the final URL stable?
Is there one clear title?
Is there a meta description?
Does the canonical URL match the intended source of record?
Is there a visible H1?
Does JSON-LD exist?
Does JSON-LD parse?
Are obvious noindex or schema gaps present?

None of that requires a model.

That is the point.

AI Search visibility work needs LLM checks later, but the base layer should be code-level, repeatable, and boring enough to rerun after every change.
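To make the checklist above concrete, here is a minimal sketch of those checks using only the Python standard library. It is not geo-audit's actual code: the class name `HeadFacts`, the function `audit_html`, and the result keys are all hypothetical, and it works on HTML that has already been fetched, so the "can the route be fetched" and "is the final URL stable" questions are left out.

```python
import json
from html.parser import HTMLParser


class HeadFacts(HTMLParser):
    """Collects title, description, canonical, H1, JSON-LD, and noindex facts."""

    def __init__(self):
        super().__init__()
        self.titles, self.h1s, self.jsonld_blocks = [], [], []
        self.description = None
        self.canonical = None
        self.noindex = False
        self._capture = None  # "title", "h1", or "jsonld" while inside that tag
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        a = {k: (v or "") for k, v in attrs}
        if tag in ("title", "h1"):
            self._capture, self._buffer = tag, []
        elif tag == "script" and a.get("type", "").lower() == "application/ld+json":
            self._capture, self._buffer = "jsonld", []
        elif tag == "meta":
            name = a.get("name", "").lower()
            if name == "description":
                self.description = a.get("content", "").strip()
            elif name == "robots" and "noindex" in a.get("content", "").lower():
                self.noindex = True
        elif tag == "link" and "canonical" in a.get("rel", "").lower().split():
            self.canonical = a.get("href", "").strip()

    def handle_data(self, data):
        if self._capture:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        text = "".join(self._buffer).strip()
        if self._capture == tag == "title":
            self.titles.append(text)
        elif self._capture == tag == "h1":
            self.h1s.append(text)
        elif self._capture == "jsonld" and tag == "script":
            self.jsonld_blocks.append(text)
        else:
            return  # an unrelated closing tag; keep capturing
        self._capture = None


def audit_html(html, expected_canonical):
    """Return one boolean per checklist question for already-fetched HTML."""
    facts = HeadFacts()
    facts.feed(html)
    facts.close()
    parse_errors = 0
    for block in facts.jsonld_blocks:
        try:
            json.loads(block)
        except json.JSONDecodeError:
            parse_errors += 1
    return {
        "one_title": len(facts.titles) == 1 and bool(facts.titles[0]),
        "has_description": bool(facts.description),
        "canonical_matches": facts.canonical == expected_canonical,
        "has_h1": any(facts.h1s),
        "jsonld_present": bool(facts.jsonld_blocks),
        "jsonld_parses": bool(facts.jsonld_blocks) and parse_errors == 0,
        "noindex": facts.noindex,
    }


SAMPLE = """<html><head><title>Example</title>
<meta name="description" content="A page.">
<link rel="canonical" href="https://example.com/">
<script type="application/ld+json">{"@type": "WebSite"}</script>
</head><body><h1>Hello</h1></body></html>"""

print(audit_html(SAMPLE, "https://example.com/"))
```

Every answer is a plain boolean derived from the markup, so the same input always produces the same result. That is what makes the check safe to rerun after each change.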

What went public first?

The first public release adds two modules to geo-audit: site-crawl-lite and head-schema-gate. [2]

site-crawl-lite gives a small-site route inventory. It checks status, final URL, title, meta description, canonical, H1, word count, JSON-LD count and types, link counts, image alt counts, and noindex state.
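As an illustration of what one row in such a route inventory could hold, here is a rough sketch of the body-level counts (word count, link counts, image alt counts). The names `BodyCounts` and `route_row` and the field names are hypothetical, not the module's real output format. The sketch also assumes that an empty `alt` counts as missing, which may be too strict for decorative images.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

NON_PROSE = ("script", "style", "title")  # text here is not body copy


class BodyCounts(HTMLParser):
    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.host = urlparse(page_url).netloc
        self.words = 0
        self.internal_links = 0
        self.external_links = 0
        self.images = 0
        self.images_missing_alt = 0
        self._skip = 0  # depth inside a NON_PROSE element

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag in NON_PROSE:
            self._skip += 1
        elif tag == "a" and a.get("href"):
            # Resolve relative links against the page URL before comparing hosts.
            target = urlparse(urljoin(self.page_url, a["href"]))
            if target.scheme in ("http", "https"):
                if target.netloc == self.host:
                    self.internal_links += 1
                else:
                    self.external_links += 1
        elif tag == "img":
            self.images += 1
            if not (a.get("alt") or "").strip():
                self.images_missing_alt += 1

    def handle_endtag(self, tag):
        if tag in NON_PROSE and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.words += len(data.split())


def route_row(page_url, status, html):
    """Build one inventory row for a route that has already been fetched."""
    counts = BodyCounts(page_url)
    counts.feed(html)
    counts.close()
    return {
        "url": page_url,
        "status": status,
        "word_count": counts.words,
        "internal_links": counts.internal_links,
        "external_links": counts.external_links,
        "images": counts.images,
        "images_missing_alt": counts.images_missing_alt,
    }
```

Calling `route_row` once per crawled URL and collecting the rows in a list gives a small table that can be diffed between runs.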

head-schema-gate checks the homepage or target route more directly: title, description, canonical, H1, Open Graph, JSON-LD parse errors, Article author sameAs, BreadcrumbList, and FAQPage signals.
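The JSON-LD part of that list can be sketched as a small walk over the parsed data. This is an assumption-laden illustration, not the gate's real logic: the function names and result keys are hypothetical, and treating `BlogPosting` and `NewsArticle` as article types is my own simplification.

```python
import json

ARTICLE_TYPES = ("Article", "BlogPosting", "NewsArticle")  # assumed set


def iter_nodes(value):
    """Yield every dict nested in parsed JSON-LD, including @graph members."""
    if isinstance(value, dict):
        yield value
        for child in value.values():
            yield from iter_nodes(child)
    elif isinstance(value, list):
        for item in value:
            yield from iter_nodes(item)


def type_names(node):
    """Return @type as a list of strings, whether it was a string or a list."""
    t = node.get("@type")
    if isinstance(t, str):
        return [t]
    if isinstance(t, list):
        return [x for x in t if isinstance(x, str)]
    return []


def schema_signals(jsonld_texts):
    """Summarize parse errors and a few schema signals across JSON-LD blocks."""
    signals = {
        "parse_errors": 0,
        "article_author_sameas": False,
        "breadcrumb_list": False,
        "faq_page": False,
    }
    for text in jsonld_texts:
        try:
            doc = json.loads(text)
        except json.JSONDecodeError:
            signals["parse_errors"] += 1
            continue
        for node in iter_nodes(doc):
            types = type_names(node)
            if "BreadcrumbList" in types:
                signals["breadcrumb_list"] = True
            if "FAQPage" in types:
                signals["faq_page"] = True
            if any(t in ARTICLE_TYPES for t in types):
                authors = node.get("author", [])
                authors = authors if isinstance(authors, list) else [authors]
                if any(isinstance(a, dict) and a.get("sameAs") for a in authors):
                    signals["article_author_sameas"] = True
    return signals
```

Walking nested values rather than only the top-level object matters because many sites wrap their entities in an `@graph` array.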

These modules are informational gates for now. They produce scores and action items, but they do not silently change the existing composite methodology.

That was deliberate.

A public audit tool should not move scoring goalposts in the same change that adds new checks.
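One way to picture an informational gate is a result that carries a score and action items but is excluded from the composite. The sketch below is hypothetical throughout: `GateResult`, the `informational` flag, and the plain mean used as a stand-in composite are assumptions, not the existing methodology.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class GateResult:
    name: str
    score: int  # 0-100
    action_items: list = field(default_factory=list)
    informational: bool = False  # reported, but kept out of the composite


def composite(results):
    """Stand-in composite: mean of the non-informational scores only."""
    scored = [r.score for r in results if not r.informational]
    return round(mean(scored)) if scored else None


existing = [GateResult("gate-a", 80), GateResult("gate-b", 90)]
with_new = existing + [
    GateResult("head-schema-gate", 94, ["Add BreadcrumbList"], informational=True),
]

# Adding an informational gate leaves the composite where it was.
print(composite(existing), composite(with_new))
```

Promoting a gate into the composite later then becomes a single, visible change to one flag rather than a side effect of shipping the check.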

How does the secrets boundary work?

The public repository must never contain real API keys, private credentials, internal hostnames, or personal secrets.

Users who clone the repo bring their own keys.

The deterministic modules run without paid APIs. If someone wants richer provider checks, they can copy .env.example into a local, gitignored .env file and configure their own credentials. [3] [4]
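A minimal sketch of that behavior, assuming a simple `KEY=VALUE` file format: read the local `.env` if it exists, and only schedule provider checks when a key is present. The variable name `PAGESPEED_API_KEY` and the functions `load_local_env` and `plan_checks` are hypothetical; the real repository may name and load things differently.

```python
import os
from pathlib import Path


def load_local_env(path=".env"):
    """Read KEY=VALUE lines from a local, gitignored file if it exists."""
    values = {}
    env_file = Path(path)
    if not env_file.is_file():
        return values  # no file is fine: deterministic checks still run
    for line in env_file.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def plan_checks(env):
    """Deterministic gates always run; provider checks need a configured key."""
    checks = ["site-crawl-lite", "head-schema-gate"]
    if env.get("PAGESPEED_API_KEY"):  # hypothetical variable name
        checks.append("pagespeed")
    return checks


if __name__ == "__main__":
    # Real environment variables take precedence over the local file.
    env = {**load_local_env(), **os.environ}
    print(plan_checks(env))
```

The important property is the default: with no file and no variables set, the plan still contains the two deterministic gates and nothing fails.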

The boundary is simple:

Public repo = safe code, docs, placeholders, tests, and trust checks.

Private workspace = local keys, team credentials, configured providers, and deployment-specific proof.

That boundary matters more when agents are involved.

If an agent can safely improve public code without touching secrets, the tool can evolve in public. If the same stack runs in a private environment with configured keys, it can do richer audits without leaking the private layer.
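A trust check on the public side could be as simple as scanning files for values that look like real credentials before they are committed. The sketch below is a hypothetical heuristic, not the repository's actual check: the patterns and the placeholder list are assumptions and will miss many secret formats.

```python
import re

# Matches PEM-style private key headers.
PRIVATE_KEY = re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----")

# Matches NAME=value lines where NAME suggests a credential.
ASSIGNMENT = re.compile(
    r"^[ \t]*([A-Z0-9_]*(?:KEY|TOKEN|SECRET|PASSWORD)[A-Z0-9_]*)[ \t]*=[ \t]*(\S+)",
    re.MULTILINE,
)

# Values containing these fragments are treated as documentation placeholders.
PLACEHOLDERS = ("your", "changeme", "example", "placeholder", "xxx", "<")


def find_suspect_lines(text):
    """Return human-readable findings for text that may contain secrets."""
    findings = []
    if PRIVATE_KEY.search(text):
        findings.append("private key block")
    for name, value in ASSIGNMENT.findall(text):
        value = value.strip("\"'")
        if value and not any(p in value.lower() for p in PLACEHOLDERS):
            findings.append(f"{name} has a non-placeholder value")
    return findings


print(find_suspect_lines("PROVIDER_API_KEY=your-key-here\n"))
print(find_suspect_lines("PROVIDER_API_KEY=abc123def456\n"))
```

A heuristic like this is only a backstop. The stronger guarantee is structural: keys live in a gitignored file that the public repository never sees.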

Where does this fit in the broader stack?

I do not think one tool replaces everything.

The operating stack has layers.

First, crawl and head/schema gates answer the technical baseline question: can the site be fetched and represented cleanly enough for search engines, answer engines, and social surfaces?

Second, ContentOS readiness answers the publishing question: does the page have a source pack, claims, evidence, answer units, FAQs, and human review before publication? [6]

Third, distribution checks answer the authority question: do Medium, LinkedIn, Habr, VC.ru, X, Substack, GitHub, and profile pages route authority back to the canonical URL?
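The distribution question reduces to a comparison: for each syndicated copy, does its declared canonical URL equal the source of record? Here is a small sketch under stated assumptions. The function names are hypothetical, the canonical values are assumed to have been extracted already, and treating a trailing slash and host casing as insignificant is my own normalization choice that may not match how a given server behaves.

```python
from urllib.parse import urlsplit


def normalize(url):
    """Reduce a URL to the parts compared here; fragment and port are dropped."""
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").lower()
    path = parts.path.rstrip("/") or "/"
    return (parts.scheme.lower(), host, path, parts.query)


def routes_back(copies, source_of_record):
    """Map each platform to True if its canonical points at the source of record."""
    target = normalize(source_of_record)
    return {
        platform: (normalize(canonical) == target) if canonical else False
        for platform, canonical in copies.items()
    }


copies = {
    "medium": "https://Example.com/post/",   # same page, different casing and slash
    "linkedin": None,                         # no canonical declared
    "devto": "https://dev.to/user/post",      # points at itself
}
print(routes_back(copies, "https://example.com/post"))
```

A platform that reports `False` is either missing a canonical or declaring itself as the source, which are the two cases worth fixing by hand.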

Fourth, measurement answers the business question: do prompt sets, citations, source context, competitors, and downstream traffic change after the work ships? [5]

geo-audit is strongest at the first layer today.

That is okay. A good open-source base should be small enough to inspect and useful enough to run.

What did the first proof find?

I ran the two new gates against my own site before publishing the canonical note.

The result was not a dramatic failure, which is exactly what a baseline gate should show after recent technical cleanup: site-crawl-lite returned 99/100 across 19 checked routes, and head-schema-gate returned 94/100 on the homepage. [2]

The remaining notes were small follow-ups: one route without JSON-LD and a BreadcrumbList recommendation where breadcrumb-like markup already exists.

That is useful signal.

It says the next improvement is a schema consistency pass, not a panic rewrite.

What does this not replace?

This does not replace Screaming Frog, Sitebulb, Oncrawl, log-file analysis, enterprise crawls, full keyword suites, or paid brand-monitoring products.

Those tools are still useful.

The point is different.

I want an install-first, agent-friendly, testable stack that can run inside a repo workflow, produce proof artifacts, keep secrets local, and explain exactly what it checked.

That makes it easier to improve the process in public and then write about the improvement with the source code attached.

What I want to build next

The next modules are not glamorous.

That is a feature.

I want an internal-link graph, route-readiness runner, image-alt gate, sitemap/feed/llms consistency checker, and a stronger bridge from ContentOS source packs into publish-readiness scoring.

The pattern I want to keep is simple:

Build a deterministic layer.

Prove it on my own site.

Publish the code.

Write the canonical note.

Then distribute the idea only after the first-party page is the source of record.

The canonical version of this article lives on my site, where I keep the related pages and distribution links updated. [7]

FAQ

Is geo-audit a replacement for Screaming Frog or Sitebulb?

No. It is an inspectable AI Search visibility audit layer. Enterprise crawlers still matter for large-scale crawling, log-file analysis, and advanced technical SEO workflows.

Does the public repo contain API keys?

No. Public users bring their own keys through local environment variables or a gitignored .env file. The public repository should contain placeholders and documentation, not real credentials. [3]

Can the tool run without paid APIs?

Yes. The deterministic modules run without paid API keys. Optional keys unlock richer brand-mention, PageSpeed, and provider-specific checks. [4]

Why start with crawl and head/schema gates?

Because LLM scoring is less useful when a page is missing a canonical tag, title, description, or JSON-LD, or when the route cannot be crawled at all. Deterministic checks remove that preventable noise first.