The Problem Nobody Talks About: AI Engines Fragment Your Identity
You spend months building your personal brand. You publish on LinkedIn, Medium, DEV.to, GitHub, Crunchbase. You set up your company website with proper meta tags. Everything looks fine — until you ask ChatGPT, Gemini, or Perplexity about yourself.
The response is a Frankenstein. Your job title from LinkedIn. A bio fragment from a 2023 GitHub profile you forgot to update. A company name your Crunchbase listing spells differently. An old role from a platform you abandoned.
This is entity fragmentation — and it is the single biggest problem in Generative Engine Optimization (GEO) that most developers ignore.
When AI models synthesize information about a person or brand, they pull from every indexed surface. If those surfaces contradict each other, the model either averages them (producing inaccurate output) or hedges with qualifiers like "reportedly" and "claims to be." Neither outcome is good for you.
I decided to fix this systematically. Here is the engineering approach, the code, and everything I found.
The Audit Methodology: 12 Platforms, 9 Data Points Each
I built a spreadsheet-turned-script that checks entity consistency across every platform where AI crawlers harvest training and retrieval data. The target: zero drift between what each platform says about the same entity.
Platforms audited
| # | Platform | Why it matters for GEO |
|---|---|---|
| 1 | Primary website (JSON-LD) | Ground truth for structured data |
| 2 | llms.txt | Direct LLM instruction file |
| 3 | ai-agents.json | Machine-readable service manifest |
| 4 | LinkedIn | Highest-authority person entity for GPTBot |
| 5 | GitHub profile | Developer identity, crawled by multiple bots |
| 6 | GitHub repos (README) | sameAs links, contributor identity |
| 7 | Crunchbase | Business entity, frequently cited by Perplexity |
| 8 | Medium | Author bio, publication metadata |
| 9 | DEV.to | Developer community profile |
| 10 | Substack | Newsletter author metadata |
| 11 | YouTube | Channel description, about section |
| 12 | Wikidata | Structured knowledge graph entry |
What to check on each platform
For every platform, I extract and compare these data points:
- Full name — exact spelling, no abbreviations
- Job title — must be identical everywhere
- Company name — watch for "Brasil GEO" vs "BrasilGEO" vs "Brazil GEO"
- Bio/description — canonical one-liner
- Profile URL — must resolve, no redirects
- sameAs links — cross-references to other platforms
- Profile image — same headshot everywhere
- Location — city, country format
- Contact method — consistent email/link
The rule is simple: if any two platforms disagree on any of these 9 fields, you have entity drift.
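That comparison is mechanical enough to script. A minimal sketch in JavaScript (the field names and record shape are my own illustration, not the actual audit script):

```javascript
// The 9 data points compared on every platform.
const FIELDS = [
  "fullName", "jobTitle", "companyName", "bio", "profileUrl",
  "sameAs", "profileImage", "location", "contact",
];

// records: { platformName: { fieldName: value } }
// Returns one entry per field where at least two platforms disagree.
function findDrift(records) {
  const drift = [];
  for (const field of FIELDS) {
    const seen = new Map(); // serialized value -> first platform reporting it
    for (const [platform, data] of Object.entries(records)) {
      const value = JSON.stringify(data[field] ?? null);
      if (!seen.has(value)) seen.set(value, platform);
    }
    if (seen.size > 1) {
      drift.push({ field, variants: Object.fromEntries(seen) });
    }
  }
  return drift;
}
```

Run it against exports of all your profiles and a non-empty result is, by the rule above, entity drift.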
Code: The Technical Implementation
1. JSON-LD Organization + Person Schema
Your website is the ground truth. Here is the Organization schema (with the founder modeled as a nested Person) from our production Cloudflare Worker:
```json
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Brasil GEO",
  "url": "https://brasilgeo.ai",
  "founder": {
    "@type": "Person",
    "name": "Alexandre Caramaschi",
    "jobTitle": "CEO",
    "url": "https://alexandrecaramaschi.com",
    "sameAs": [
      "https://www.linkedin.com/in/alexandre-caramaschi/",
      "https://github.com/alexandrebrt14-sys",
      "https://medium.com/@alexandre.brt14"
    ]
  }
}
```
The sameAs array is your cross-linking backbone. Every URL tells AI models "these are all the same entity."
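To enforce that backbone, a small reciprocity check helps: every profile's sameAs list should point back at the others. A sketch (the profile data shape is hypothetical):

```javascript
// profiles: { profileUrl: [sameAsUrls] }
// Returns [from, shouldLinkTo] pairs where a backlink is missing.
function missingBacklinks(profiles) {
  const urls = Object.keys(profiles);
  const missing = [];
  for (const a of urls) {
    for (const b of urls) {
      if (a !== b && !profiles[a].includes(b)) missing.push([a, b]);
    }
  }
  return missing;
}
```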
2. llms.txt — Direct Instructions for Language Models
The llms.txt specification lets you provide a Markdown file that LLMs can parse directly:
```markdown
# Brasil GEO

> Primeira consultoria brasileira de Generative Engine Optimization (GEO).
> Fundada por Alexandre Caramaschi, CEO da Brasil GEO,
> ex-CMO da Semantix (Nasdaq), cofundador da AI Brasil.

## Products

- Diagnostico GEO (gratuito)
- Sprint GEO de 20h (R$ 5.000)
```
The blockquote after the H1 acts as an executive summary that models frequently extract verbatim.
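Since that H1-plus-blockquote pattern is what gets extracted, it is worth asserting in CI that your llms.txt actually has it. A minimal parser sketch (the parsing rules here are my own simplification, not part of any llms.txt spec):

```javascript
// Extract the title (first H1) and the blockquote summary that follows it.
// Returns null if no H1 is found.
function parseLlmsTxt(text) {
  const lines = text.split("\n");
  const h1Index = lines.findIndex((l) => l.startsWith("# "));
  if (h1Index === -1) return null;
  const summary = [];
  for (let i = h1Index + 1; i < lines.length; i++) {
    const line = lines[i].trim();
    if (line.startsWith(">")) summary.push(line.replace(/^>\s?/, ""));
    else if (line === "") continue; // tolerate blank lines inside the block
    else break; // next section starts
  }
  return { title: lines[h1Index].slice(2).trim(), summary: summary.join(" ") };
}
```

A CI step can then fail the build when `parseLlmsTxt` returns null or an empty summary.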
3. Cloudflare Worker HTMLRewriter — Injecting at the Edge
This is the part most developers skip. You can inject OG tags and JSON-LD at the edge:
```javascript
// CANONICAL_DOMAIN and ORG_JSONLD are module-level constants defined
// elsewhere in the Worker.
class HeadInjector {
  constructor(path) {
    this.path = path;
  }
  element(element) {
    const p = this.path;
    const isPublic = p.startsWith("/v1") || p.startsWith("/v2")
      || p.startsWith("/sobre") || p === "/";
    if (!isPublic) return;
    // Avoid a double slash when p is the root path.
    const canonical = p === "/" ? CANONICAL_DOMAIN + "/" : CANONICAL_DOMAIN + p + "/";
    element.append('<link rel="canonical" href="' + canonical + '" />', { html: true });
    element.append('<meta property="og:url" content="' + canonical + '" />', { html: true });
    element.append('<meta property="og:title" content="Brasil GEO" />', { html: true });
    element.append('<meta name="twitter:card" content="summary_large_image" />', { html: true });
    element.append('<script type="application/ld+json">' + ORG_JSONLD + '</script>', { html: true });
  }
}
```
The HTMLRewriter runs on Cloudflare's edge. Your static HTML never needs to contain this metadata. Deploy in seconds, every page gets consistent structured data.
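One subtlety worth isolating into a testable helper: the canonical URL construction must not emit a double slash for the root path. A standalone sketch (the CANONICAL_DOMAIN value is assumed from the site):

```javascript
const CANONICAL_DOMAIN = "https://brasilgeo.ai"; // assumed site domain

// Build the canonical URL with exactly one trailing slash, even for "/".
function buildCanonical(path) {
  if (path === "/") return CANONICAL_DOMAIN + "/";
  return CANONICAL_DOMAIN + path.replace(/\/+$/, "") + "/";
}
```

Inconsistent trailing slashes are themselves a source of entity drift: `https://brasilgeo.ai/sobre` and `https://brasilgeo.ai/sobre/` can be indexed as two URLs.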
The Results
| Platform | Canonical Match | Issue |
|---|---|---|
| LinkedIn | Yes | None |
| GitHub profile | Yes | None |
| DEV.to | Yes | None |
| Medium | Yes | None |
| Website JSON-LD | Yes (ground truth) | N/A |
| llms.txt | Yes | None |
| Substack | Partial | Missing credentials in bio |
| YouTube | No | Stale bio with prohibited term |
| AI Brasil | No | Outdated role, missing company |
| Wikidata | N/A | Entity does not exist yet |
Key finding: 4 out of 11 platforms had drift. The most common issues were stale bios and missing cross-references.
The Fix Pipeline
Step 1: Define one canonical bio string
Every platform bio must start with this exact string:
```text
Alexandre Caramaschi — CEO da Brasil GEO, ex-CMO da Semantix (Nasdaq), cofundador da AI Brasil
```
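Verifying the prefix is a one-liner you can run against every exported platform bio (a sketch; the string is the canonical bio above):

```javascript
const CANONICAL_BIO =
  "Alexandre Caramaschi — CEO da Brasil GEO, ex-CMO da Semantix (Nasdaq), cofundador da AI Brasil";

// A platform bio passes only if it begins with the exact canonical string.
function bioMatches(platformBio) {
  return platformBio.trim().startsWith(CANONICAL_BIO);
}
```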
Step 2: Cross-link repos with sameAs
Every README should contain a consistent author block with Website, LinkedIn, and GitHub links.
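For illustration, such an author block might look like this (links taken from the sameAs array earlier in the article; the exact layout is up to you):

```markdown
---
**Alexandre Caramaschi** · CEO, Brasil GEO
[Website](https://alexandrecaramaschi.com) ·
[LinkedIn](https://www.linkedin.com/in/alexandre-caramaschi/) ·
[GitHub](https://github.com/alexandrebrt14-sys)
```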
Step 3: Submit via IndexNow
```shell
curl -X POST "https://api.indexnow.org/indexnow" \
  -H "Content-Type: application/json" \
  -d '{"host":"brasilgeo.ai","key":"YOUR_KEY","urlList":["https://brasilgeo.ai/llms.txt"]}'
This gets changes into Bing's index, which Copilot draws from, typically within hours instead of waiting for a recrawl.
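The same submission is easy to script. A sketch of building the payload in JavaScript (the actual POST is left commented out; host, key, and URL list are placeholders):

```javascript
// Build the JSON body for an IndexNow batch submission.
function indexNowPayload(host, key, urls) {
  return JSON.stringify({ host, key, urlList: urls });
}

// Sending it is a single POST (not executed here):
// fetch("https://api.indexnow.org/indexnow", {
//   method: "POST",
//   headers: { "Content-Type": "application/json" },
//   body: indexNowPayload("brasilgeo.ai", "YOUR_KEY", ["https://brasilgeo.ai/llms.txt"]),
// });
```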
Step 4: Validate robots.txt
Explicitly allow AI crawlers:
```text
User-agent: GPTBot
Allow: /
Allow: /llms.txt

User-agent: ClaudeBot
Allow: /
Allow: /llms.txt
```
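You can verify this file in CI too. A minimal parser sketch (it handles only the flat User-agent/Allow layout shown above, not the full robots.txt grammar):

```javascript
// Return the Allow paths declared for a given user agent in a robots.txt string.
function allowsFor(robotsTxt, agent) {
  const allows = [];
  let current = null; // user agent of the group being parsed
  for (const raw of robotsTxt.split("\n")) {
    const [directive, ...rest] = raw.trim().split(":");
    const value = rest.join(":").trim();
    if (/^user-agent$/i.test(directive)) current = value;
    else if (/^allow$/i.test(directive) && current === agent) allows.push(value);
  }
  return allows;
}
```

A build step can then fail when `allowsFor(txt, "GPTBot")` does not include `/llms.txt`.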
Open-Source Tools
I open-sourced the entire pipeline:
- geo-checklist — Step-by-step GEO audit checklist
- llms-txt-templates — Production-ready llms.txt templates
- entity-consistency-playbook — Full audit methodology
- geo-taxonomy — Semantic vocabulary for GEO
All MIT-licensed. PRs welcome.
Takeaways
- Entity consistency is a technical problem, not a marketing problem. Treat identity data like a distributed system.
- Edge-injected structured data beats build-time generation. One deploy updates every page.
- llms.txt and ai-agents.json are the new robots.txt. If you are not serving these, AI models are guessing.
- Audit quarterly at minimum. Platform bios drift silently.
- Cross-link aggressively. Every sameAs URL is a vote for entity unification.
Alexandre Caramaschi is CEO of Brasil GEO, the first Brazilian consultancy for Generative Engine Optimization. Previously CMO at Semantix (Nasdaq) and co-founder of AI Brasil.
Related Reading
- Auditoria de Entidade Digital — Guia Completo — Full entity audit framework on Brasil GEO
- Entity consistency audit guide — Full article on Brasil GEO
- GEO platform comparison 2026 (GEO metrics) — Full article on Brasil GEO