Alexandre Caramaschi

I Built an Entity Consistency Audit Pipeline for GEO — Here's What I Found

The Problem Nobody Talks About: AI Engines Fragment Your Identity

You spend months building your personal brand. You publish on LinkedIn, Medium, DEV.to, GitHub, Crunchbase. You set up your company website with proper meta tags. Everything looks fine — until you ask ChatGPT, Gemini, or Perplexity about yourself.

The response is a Frankenstein's monster. Your job title from LinkedIn. A bio fragment from a 2023 GitHub profile you forgot to update. A company name spelled differently on your Crunchbase listing. An old role from a platform you abandoned.

This is entity fragmentation — and it is the single biggest problem in Generative Engine Optimization (GEO) that most developers ignore.

When AI models synthesize information about a person or brand, they pull from every indexed surface. If those surfaces contradict each other, the model either averages them (producing inaccurate output) or hedges with qualifiers like "reportedly" and "claims to be." Neither outcome is good for you.

I decided to fix this systematically. Here is the engineering approach, the code, and everything I found.

The Audit Methodology: 12 Platforms, 9 Data Points Each

I built a spreadsheet-turned-script that checks entity consistency across every platform where AI crawlers harvest training and retrieval data. The target: zero drift between what each platform says about the same entity.

Platforms audited

| # | Platform | Why it matters for GEO |
|---|----------|------------------------|
| 1 | Primary website (JSON-LD) | Ground truth for structured data |
| 2 | llms.txt | Direct LLM instruction file |
| 3 | ai-agents.json | Machine-readable service manifest |
| 4 | LinkedIn | Highest-authority person entity for GPTBot |
| 5 | GitHub profile | Developer identity, crawled by multiple bots |
| 6 | GitHub repos (README) | sameAs links, contributor identity |
| 7 | Crunchbase | Business entity, frequently cited by Perplexity |
| 8 | Medium | Author bio, publication metadata |
| 9 | DEV.to | Developer community profile |
| 10 | Substack | Newsletter author metadata |
| 11 | YouTube | Channel description, about section |
| 12 | Wikidata | Structured knowledge graph entry |

What to check on each platform

For every platform, I extract and compare these data points:

  1. Full name — exact spelling, no abbreviations
  2. Job title — must be identical everywhere
  3. Company name — watch for "Brasil GEO" vs "BrasilGEO" vs "Brazil GEO"
  4. Bio/description — canonical one-liner
  5. Profile URL — must resolve, no redirects
  6. sameAs links — cross-references to other platforms
  7. Profile image — same headshot everywhere
  8. Location — city, country format
  9. Contact method — consistent email/link

The rule is simple: if any two platforms disagree on any of these 9 fields, you have entity drift.
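That rule translates directly into code. Here is a minimal sketch of the comparison step, assuming each platform's profile has already been scraped into a flat record (field names and data shape are illustrative, not the exact pipeline):

```javascript
// The nine audited fields, flattened into one record per platform.
const FIELDS = [
  "name", "jobTitle", "company", "bio", "url",
  "sameAs", "image", "location", "contact",
];

// Compare every platform record against the canonical record and
// return one issue per mismatched field.
function findDrift(canonical, platforms) {
  const issues = [];
  for (const [platform, record] of Object.entries(platforms)) {
    for (const field of FIELDS) {
      const expected = JSON.stringify(canonical[field] ?? null);
      const found = JSON.stringify(record[field] ?? null);
      if (expected !== found) {
        issues.push({ platform, field, found: record[field] ?? null });
      }
    }
  }
  return issues;
}
```

Feed it the website's JSON-LD as `canonical`; an empty result means zero drift.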

Code: The Technical Implementation

1. JSON-LD Person Schema

Your website is the ground truth. Here is the Organization schema, with the founder as a nested Person entity, served by our production Cloudflare Worker:

```json
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Brasil GEO",
  "url": "https://brasilgeo.ai",
  "founder": {
    "@type": "Person",
    "name": "Alexandre Caramaschi",
    "jobTitle": "CEO",
    "url": "https://alexandrecaramaschi.com",
    "sameAs": [
      "https://www.linkedin.com/in/alexandre-caramaschi/",
      "https://github.com/alexandrebrt14-sys",
      "https://medium.com/@alexandre.brt14"
    ]
  }
}
```

The sameAs array is your cross-linking backbone. Every URL tells AI models "these are all the same entity."
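The claim only sticks if it is bidirectional: each profile should also link back to the canonical site. A quick sketch of that check (the data shape is illustrative):

```javascript
// Given a map of platform -> sameAs links found on that profile,
// list the platforms that fail to link back to the canonical URL.
function missingBacklinks(canonicalUrl, profiles) {
  return Object.entries(profiles)
    .filter(([, sameAs]) => !sameAs.includes(canonicalUrl))
    .map(([platform]) => platform);
}
```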

2. llms.txt — Direct Instructions for Language Models

The llms.txt specification lets you provide a Markdown file that LLMs can parse directly:

```markdown
# Brasil GEO

> Primeira consultoria brasileira de Generative Engine Optimization (GEO).
> Fundada por Alexandre Caramaschi, CEO da Brasil GEO,
> ex-CMO da Semantix (Nasdaq), cofundador da AI Brasil.

## Products
- Diagnostico GEO (gratuito)
- Sprint GEO de 20h (R$ 5.000)
```

The blockquote after the H1 acts as an executive summary that models frequently extract verbatim.
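You can verify the file exposes those two pieces the way you intend with a few lines (a hypothetical parser for this layout, not part of the llms.txt spec):

```javascript
// Pull out the H1 title and the blockquote summary, the two pieces
// models most often quote verbatim.
function parseLlmsTxt(text) {
  const lines = text.split("\n");
  const h1 = lines.find((l) => l.startsWith("# ")) ?? "";
  const summary = lines
    .filter((l) => l.startsWith("> "))
    .map((l) => l.slice(2).trim())
    .join(" ");
  return { title: h1.slice(2).trim(), summary };
}
```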

3. Cloudflare Worker HTMLRewriter — Injecting at the Edge

This is the part most developers skip. You can inject OG tags and JSON-LD at the edge:

```javascript
// CANONICAL_DOMAIN and ORG_JSONLD are module-level constants defined
// elsewhere in the Worker.
class HeadInjector {
  constructor(path) { this.path = path; }
  element(element) {
    const p = this.path;
    // Only decorate public routes.
    const isPublic = p.startsWith("/v1") || p.startsWith("/v2")
      || p.startsWith("/sobre") || p === "/";
    if (!isPublic) return;
    // Special-case the root so "/" does not become "//".
    const canonical = p === "/" ? CANONICAL_DOMAIN + "/" : CANONICAL_DOMAIN + p + "/";
    element.append('<link rel="canonical" href="' + canonical + '" />', { html: true });
    element.append('<meta property="og:url" content="' + canonical + '" />', { html: true });
    element.append('<meta property="og:title" content="Brasil GEO" />', { html: true });
    element.append('<meta name="twitter:card" content="summary_large_image" />', { html: true });
    element.append('<script type="application/ld+json">' + ORG_JSONLD + '</script>', { html: true });
  }
}
```

The HTMLRewriter runs on Cloudflare's edge. Your static HTML never needs to contain this metadata. Deploy in seconds, every page gets consistent structured data.
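The path-to-canonical mapping is worth factoring into a pure helper you can test outside the Workers runtime; note that the root path needs special-casing so `/` does not become `//`. A sketch (the helper name is mine, not from the Worker):

```javascript
const CANONICAL_DOMAIN = "https://brasilgeo.ai";

// Map a request path to its canonical URL: exactly one trailing slash,
// no double slash at the root.
function canonicalUrl(path) {
  const clean = path.replace(/\/+$/, ""); // strip trailing slashes
  return clean === "" ? CANONICAL_DOMAIN + "/" : CANONICAL_DOMAIN + clean + "/";
}
```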

The Results

| Platform | Canonical Match | Issue |
|----------|-----------------|-------|
| LinkedIn | Yes | None |
| GitHub profile | Yes | None |
| DEV.to | Yes | None |
| Medium | Yes | None |
| Website JSON-LD | Yes (ground truth) | N/A |
| llms.txt | Yes | None |
| Substack | Partial | Missing credentials in bio |
| YouTube | No | Stale bio with prohibited term |
| AI Brasil | No | Outdated role, missing company |
| Wikidata | N/A | Entity does not exist yet |

Key finding: 4 of the 10 platforms in the table had drift or gaps. The most common issues were stale bios and missing cross-references.

The Fix Pipeline

Step 1: Define one canonical bio string

Every platform bio must start with this exact string:

Alexandre Caramaschi — CEO da Brasil GEO, ex-CMO da Semantix (Nasdaq), cofundador da AI Brasil
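This is mechanically checkable with a one-liner per platform (an illustrative sketch; the data shape is mine):

```javascript
const CANONICAL_BIO =
  "Alexandre Caramaschi — CEO da Brasil GEO, ex-CMO da Semantix (Nasdaq), cofundador da AI Brasil";

// List platforms whose bio does not begin with the canonical string.
function nonCanonicalBios(bios) {
  return Object.entries(bios)
    .filter(([, bio]) => !bio.startsWith(CANONICAL_BIO))
    .map(([platform]) => platform);
}
```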

Step 2: Cross-link repos with sameAs

Every README should contain a consistent author block with Website, LinkedIn, and GitHub links.
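For example, an author block built from the sameAs URLs in the JSON-LD above (illustrative):

```markdown
## Author

**Alexandre Caramaschi**, CEO of Brasil GEO

- Website: https://alexandrecaramaschi.com
- LinkedIn: https://www.linkedin.com/in/alexandre-caramaschi/
- GitHub: https://github.com/alexandrebrt14-sys
```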

Step 3: Submit via IndexNow

```bash
curl -X POST "https://api.indexnow.org/indexnow" \
  -H "Content-Type: application/json" \
  -d '{"host":"brasilgeo.ai","key":"YOUR_KEY","urlList":["https://brasilgeo.ai/llms.txt"]}'
```

In practice, Bing (and therefore Copilot) typically picks up changes within hours.

Step 4: Validate robots.txt

Explicitly allow AI crawlers:

```txt
User-agent: GPTBot
Allow: /
Allow: /llms.txt

User-agent: ClaudeBot
Allow: /
Allow: /llms.txt
```

Open-Source Tools

I open-sourced the entire pipeline:

  1. geo-checklist — Step-by-step GEO audit checklist
  2. llms-txt-templates — Production-ready llms.txt templates
  3. entity-consistency-playbook — Full audit methodology
  4. geo-taxonomy — Semantic vocabulary for GEO

All MIT-licensed. PRs welcome.

Takeaways

  1. Entity consistency is a technical problem, not a marketing problem. Treat identity data like a distributed system.
  2. Edge-injected structured data beats build-time generation. One deploy updates every page.
  3. llms.txt and ai-agents.json are the new robots.txt. If you are not serving these, AI models are guessing.
  4. Audit quarterly at minimum. Platform bios drift silently.
  5. Cross-link aggressively. Every sameAs URL is a vote for entity unification.

Alexandre Caramaschi is CEO of Brasil GEO, the first Brazilian consultancy for Generative Engine Optimization. Previously CMO at Semantix (Nasdaq) and co-founder of AI Brasil.

