<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Alexandre Caramaschi</title>
    <description>The latest articles on DEV Community by Alexandre Caramaschi (@alexandrebrt14sys).</description>
    <link>https://dev.to/alexandrebrt14sys</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3835714%2F8a5b0d12-7104-4cef-81f3-7a8eac04c5fb.jpeg</url>
      <title>DEV Community: Alexandre Caramaschi</title>
      <link>https://dev.to/alexandrebrt14sys</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/alexandrebrt14sys"/>
    <language>en</language>
    <item>
      <title>I collected 8,571 queries in seven days and found that being cited by AI is a metric that does not exist</title>
      <dc:creator>Alexandre Caramaschi</dc:creator>
      <pubDate>Wed, 29 Apr 2026 13:20:18 +0000</pubDate>
      <link>https://dev.to/alexandrebrt14sys/coletei-8571-queries-em-sete-dias-e-descobri-que-ser-citado-por-ia-e-uma-metrica-que-nao-existe-1nic</link>
      <guid>https://dev.to/alexandrebrt14sys/coletei-8571-queries-em-sete-dias-e-descobri-que-ser-citado-por-ia-e-uma-metrica-que-nao-existe-1nic</guid>
      <description>&lt;p&gt;Há sete dias eu liguei o cronômetro de uma janela de 90 dias. Pré-registrei a metodologia no OSF, travei a versão 2 do pipeline, e deixei a coleta rodar no automático em cinco LLMs (ChatGPT, Claude, Gemini, Groq e Perplexity), 69 entidades brasileiras (61 reais e 8 fictícias plantadas como controle), quatro verticais (Fintech, Varejo, Saúde e Tecnologia). Hoje, dia 7 de 90, já temos &lt;strong&gt;8.571 queries empíricas&lt;/strong&gt; e &lt;strong&gt;1.785 citações&lt;/strong&gt; no banco. E os primeiros sinais já desmontam uma premissa que circula em quase todo deck de marketing brasileiro.&lt;/p&gt;

&lt;p&gt;O dado mais importante deste post é uma frase. &lt;strong&gt;Não existe uma métrica única chamada "ser citado por IA".&lt;/strong&gt; Existem cinco mercados completamente diferentes acontecendo ao mesmo tempo, e a maior parte das marcas está otimizando para o errado.&lt;/p&gt;

&lt;h2&gt;Five markets, a 75-fold difference&lt;/h2&gt;

&lt;p&gt;The global citation rate, across 8,571 queries, is &lt;strong&gt;20.8%&lt;/strong&gt; (95% confidence interval: 20.0%–21.7%). On its own, that number is useless. Broken down by LLM:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Perplexity:&lt;/strong&gt; 82.5% citation rate&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Claude:&lt;/strong&gt; 26.0%&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;ChatGPT:&lt;/strong&gt; 17.2%&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Groq:&lt;/strong&gt; 8.2%&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Gemini:&lt;/strong&gt; 1.1%&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;A 75-fold difference&lt;/strong&gt; between the best and the worst. That is not noise. These are 8,571 paired queries, with the same cohort, in the same window, with the same prompt set. The model with active RAG (Perplexity) and the purely parametric model (Gemini) are, from a brand-visibility standpoint, two different universes. When a brand declares "the AI cited me," it needs to finish the sentence: which one.&lt;/p&gt;

&lt;p&gt;Reporting "AI presence" as a single metric means hiding two orders of magnitude behind a weighted average. Each engine runs a different pipeline (live retrieval, selective augmentation, pure parametric inference), and the bar for entry into each one is radically different.&lt;/p&gt;
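&lt;p&gt;The interval above is reproducible from the raw counts alone. Assuming the same Wilson score estimator the project's research page reports for its other phases, a minimal pure-Python sketch using this post's own counts (1,785 citations over 8,571 queries):&lt;/p&gt;

```python
import math

def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return center - half, center + half

# Global citation rate from the post: 1,785 citations over 8,571 queries
lo, hi = wilson_ci(1785, 8571)
print(f"{1785/8571:.1%} (95% CI: {lo:.1%} to {hi:.1%})")  # 20.8% (95% CI: 20.0% to 21.7%)
```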

&lt;h2&gt;Three findings that are keeping me up at night&lt;/h2&gt;

&lt;p&gt;Beyond the gap between engines, three preliminary signals already show that the market's naive reading is wrong on at least three fronts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Vertical matters twice as much as I expected.&lt;/strong&gt; Fintech has a 28.6% citation rate. Health has 14.0%. Same methodology, same window, same cohort. The LLMs' sector-level recall is deeply uneven, and the health sector is orphaned.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. English cites more than Portuguese.&lt;/strong&gt; Queries in English yield 23.0% citations. The same queries in Portuguese, about the same brands, yield 18.7%. I expected the opposite. The practical signal: today, asking &lt;em&gt;"best Brazilian fintechs"&lt;/em&gt; returns more Brazilian brands than asking "melhores fintechs brasileiras". The likely driver is the volume of English training corpus citing Brazilian brands, more than the presence of native Portuguese content.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Almost nobody speaks ill.&lt;/strong&gt; Of 3,841 contexts with classified sentiment, 0.2% are negative. LLMs rarely criticize whom they cite. Any "AI share of voice" dashboard measures presence, not reputation. Reputation requires a different experiment.&lt;/p&gt;

&lt;p&gt;And one more: &lt;strong&gt;97% of identified mentions use the brand's proper name&lt;/strong&gt; (167 of 172 audited contexts). The models prefer citing by name to inserting a link. The competitive unit in GEO is the named entity, not the URL.&lt;/p&gt;

&lt;h2&gt;Why I trust these numbers at day seven&lt;/h2&gt;

&lt;p&gt;The day-7 partial result that may matter most: &lt;strong&gt;100.0% specificity&lt;/strong&gt;. The eight fictitious entities planted in the cohort (plausible Portuguese names that correspond to companies that do not exist) received &lt;strong&gt;zero false positives&lt;/strong&gt; across 8,571 queries. The instrumentation is calibrated.&lt;/p&gt;

&lt;p&gt;To get here, I had to throw away version 1 of this pipeline. In February I published a paper called &lt;strong&gt;Null-Triad: Three Ways to Fail to Conclude&lt;/strong&gt; on Zenodo (DOI &lt;a href="https://doi.org/10.5281/zenodo.19712217" rel="noopener noreferrer"&gt;10.5281/zenodo.19712217&lt;/a&gt;) admitting that the first methodology had three simultaneous structural flaws: insufficient statistical power in H1, a design that did not test what it measured in H2, and string matching that inflated H3. Migrating to v2 dropped &lt;strong&gt;45% of the "citations"&lt;/strong&gt; we had been counting, because they were false positives of the type "Inter" captured inside "international", or "Stone" inside "cornerstone".&lt;/p&gt;
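&lt;p&gt;That substring failure mode is easy to reproduce. A minimal sketch of the v1-style naive check against a v2-style word-boundary match (illustrative code, not the pipeline's actual implementation):&lt;/p&gt;

```python
import re

text = "The bank ranks among international players, a cornerstone of the sector."

# v1-style naive substring check: both "brands" are falsely detected
assert "inter" in text.lower()   # matches inside "international"
assert "stone" in text.lower()   # matches inside "cornerstone"

# v2-style check: word boundaries stop the partial-word matches
def cited(brand: str, text: str) -> bool:
    return re.search(rf"\b{re.escape(brand)}\b", text, re.IGNORECASE) is not None

print(cited("Inter", text), cited("Stone", text))  # False False
```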

&lt;p&gt;It was humbling and it was necessary. Publishing Null-Triad before starting v2 was the most honest way I found to state publicly: what I said before was wrong, and here is exactly how.&lt;/p&gt;

&lt;h2&gt;What changes in pipeline v2&lt;/h2&gt;

&lt;p&gt;Version 2 is formalized in &lt;a href="https://github.com/alexandrebrt14-sys/papers/blob/main/docs/METHODOLOGY_V2.md" rel="noopener noreferrer"&gt;METHODOLOGY_V2.md&lt;/a&gt; and open under the MIT license at &lt;a href="https://github.com/alexandrebrt14-sys/papers" rel="noopener noreferrer"&gt;github.com/alexandrebrt14-sys/papers&lt;/a&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;NER with strict word boundaries&lt;/strong&gt; and double Unicode normalization (NFC + NFKD).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Canonical alias dictionary&lt;/strong&gt; (BTG ↔ BTG Pactual, XP ↔ XP Investimentos, C6 ↔ C6 Bank, Magalu ↔ Magazine Luiza).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Eight fictitious decoys&lt;/strong&gt; planted as specificity canaries.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cluster-robust sandwich estimator (CR1)&lt;/strong&gt; respecting the daily cluster structure.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Monte Carlo simulation&lt;/strong&gt; replacing arbitrary thresholds with empirical percentiles.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;BH-FDR correction&lt;/strong&gt; for multiple comparisons.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Pre-registered decision rule&lt;/strong&gt;: I reject H₀ only if the adjusted p-value is below 0.05 &lt;em&gt;and&lt;/em&gt; the 95% interval excludes the null value.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Container-level reproducibility&lt;/strong&gt;: Dockerfile with pinned &lt;code&gt;PYTHONHASHSEED&lt;/code&gt;, immutable &lt;code&gt;requirements-lock.txt&lt;/code&gt;, SHA-256 manifest of the outputs.&lt;/li&gt;
&lt;/ul&gt;
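&lt;p&gt;As one illustration from the list above, the Benjamini-Hochberg step-up adjustment fits in a few lines. This is a sketch, not the pipeline's code, and the p-values below are invented for the example:&lt;/p&gt;

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (step-up, monotonicity enforced)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    prev = 1.0
    # Walk from the largest p-value down, carrying the running minimum
    for rank_from_end, i in enumerate(reversed(order)):
        rank = m - rank_from_end        # 1-based rank of this p-value
        prev = min(prev, pvals[i] * m / rank)
        adjusted[i] = prev
    return adjusted

print(bh_adjust([0.001, 0.012, 0.030, 0.047]))
```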

&lt;p&gt;The window runs through &lt;strong&gt;July 21, 2026&lt;/strong&gt;. On &lt;strong&gt;day 25&lt;/strong&gt; the study reaches statistical power for H1; on &lt;strong&gt;day 38&lt;/strong&gt;, for H2. I will only thump my chest about definitive conclusions in &lt;strong&gt;October&lt;/strong&gt;, when the paper is submitted to &lt;em&gt;Information Sciences&lt;/em&gt; (Elsevier, impact factor 8.1). Until then, I promise what I promised on OSF: I will also publish the null results, if they show up.&lt;/p&gt;

&lt;h2&gt;What you can already use in practice (with caution)&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Stop treating "AI presence" as a single metric.&lt;/strong&gt; Report per model. Ideally per model and per language.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;If you are a fintech or a retailer, the game is open.&lt;/strong&gt; The entry bar is structurally lower: Fintech 28.6%, Retail 25.5%.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;If you are in health, the work is structural.&lt;/strong&gt; At a 14.0% rate, gaining visibility requires building external authority over a long cycle.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;If you are investing in Portuguese-only content, you are leaving money on the table.&lt;/strong&gt; Bilingual content, with a solid English base, is an underrated lever today.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Do not trust dashboards that promise "AI share of voice" without showing a confidence interval, sample size and extraction methodology.&lt;/strong&gt; The v1 of this very study spent months counting "international" as "Inter".&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Seven days down, eighty-three to go&lt;/h2&gt;

&lt;p&gt;Dataset and dashboard updated in real time:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://alexandrecaramaschi.com/research" rel="noopener noreferrer"&gt;alexandrecaramaschi.com/research&lt;/a&gt;: today's numbers, confidence intervals, breakdown by vertical, LLM and language.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://alexandrecaramaschi.com/papers-roadmap" rel="noopener noreferrer"&gt;alexandrecaramaschi.com/papers-roadmap&lt;/a&gt;: phases, hypotheses, target venues, delivered waves.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/alexandrebrt14-sys/papers" rel="noopener noreferrer"&gt;github.com/alexandrebrt14-sys/papers&lt;/a&gt;: full code, pipeline, tests, migrations, Dockerfile.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The next time someone tells you that "the AI is citing" your brand, the correct answer has four components: &lt;strong&gt;which AI, in which language, in which vertical, and with what confidence interval&lt;/strong&gt;. If any of the four is missing, what is being measured is not visibility. It is folklore.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Alexandre Caramaschi is CEO of Brasil GEO, former CMO of Semantix (Nasdaq), and co-founder of AI Brasil.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>marketing</category>
      <category>datascience</category>
      <category>research</category>
    </item>
    <item>
      <title>YouTube as a GEO Engine: 10 Field Rules for Getting Cited by ChatGPT, Gemini and Perplexity</title>
      <dc:creator>Alexandre Caramaschi</dc:creator>
      <pubDate>Fri, 24 Apr 2026 16:11:20 +0000</pubDate>
      <link>https://dev.to/alexandrebrt14sys/youtube-as-a-geo-engine-10-field-rules-for-getting-cited-by-chatgpt-gemini-and-perplexity-2ogp</link>
      <guid>https://dev.to/alexandrebrt14sys/youtube-as-a-geo-engine-10-field-rules-for-getting-cited-by-chatgpt-gemini-and-perplexity-2ogp</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9zmcezj33ebvb9qlicvr.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9zmcezj33ebvb9qlicvr.jpeg" alt=" " width="800" height="373"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Last quarter I helped scale a B2B channel to 1,200+ subs, 179K total views, and two videos past 30K views each. Great numbers. Wrong lens.&lt;/p&gt;

&lt;p&gt;In the same period, that channel generated &lt;strong&gt;zero attributed leads&lt;/strong&gt; in the CRM. Zero. The paid campaigns bought 11,765 views at R$0.13 average CPV. Every single one landed on the YouTube channel page — not the website. Subs went up. Pipeline stayed flat.&lt;/p&gt;

&lt;p&gt;The problem wasn't the channel. It was the strategy. YouTube in 2026 is not a conversion funnel — it is &lt;strong&gt;structured authority storage for generative engines&lt;/strong&gt;. Treat it as the first and you waste money. Treat it as the second and you buy something your competitor cannot: citation inside ChatGPT, Gemini, Perplexity and Claude answers.&lt;/p&gt;

&lt;p&gt;Ten field rules follow. None are theoretical. All came from auditing a real channel (&lt;a href="https://www.youtube.com/@acaramaschi" rel="noopener noreferrer"&gt;@acaramaschi&lt;/a&gt;) that averages a GEO score of 75/100 with 12 videos below threshold, which means plenty of room to grow, probably like yours.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Read full 10 rules + Portuguese version:&lt;/strong&gt; &lt;a href="https://alexandrecaramaschi.com/artigos/youtube-para-geo-o-canal-como-prova-de-autoridade-algoritmica" rel="noopener noreferrer"&gt;https://alexandrecaramaschi.com/artigos/youtube-para-geo-o-canal-como-prova-de-autoridade-algoritmica&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  10 rules in bullets (short version for dev.to)
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Metadata GEO-first before thumbnail&lt;/strong&gt; — Title ≤ 60 chars with keyword in first 35; description ≥ 300 chars + CTA; 5+ tags; chapters for 2min+; 3 hashtags. 82% of audited videos had zero tags.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Shorts open, long-form converts authority&lt;/strong&gt; — Shorts = 30-60s hook trailer. Long-form 7-20min = pillar. One long-form yields 10x the indexable transcript of 10 Shorts.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Treat transcript as a blog post&lt;/strong&gt; — Download auto-transcript, rewrite, upload as manual caption, republish as site article with Schema VideoObject + Article. One video → three indexable sources.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Wikidata + Schema VideoObject on the site&lt;/strong&gt; — Create Wikidata item, add P2397 (YouTube channel ID). Site articles embed with full VideoObject markup. Attribution triad: channel ↔ Wikidata ↔ site.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Demand Gen with site destination, not just subscribe&lt;/strong&gt; — Run two parallel campaigns: Subs (final_url=channel) and Leads (final_url=site). Same asset, different CTA, different conversion goals. 70/30 budget split.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Canonical UTM and ≤ 20% dark attribution&lt;/strong&gt; — &lt;code&gt;utm_source=youtube&amp;amp;utm_medium=demandgen&amp;amp;utm_campaign={snake}&lt;/code&gt;. Persist first-touch cookie 90d. Measure CPL per video, not per channel.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Weekly cadence beats daily volume&lt;/strong&gt; — 3-7 day upload interval. Retention ≥ 50%. If 21 days without posting, pause ads (CTR drops 40-60%).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Pick 3 pillar topics, rotate format&lt;/strong&gt; — 60% / 25% / 15% split. LLMs associate channels with topics. Topical authority comes from density, not variety.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Crosslink channel ↔ site ↔ platforms&lt;/strong&gt; — Every video: 3 outbound + 3 inbound links. Description → site article. Repost to &lt;a href="https://dev.to/alexandrebrt14sys"&gt;dev.to&lt;/a&gt;, LinkedIn, Medium, Hashnode. A 3-5 crosslink count is the average gap between cited and uncited videos.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Measure GEO score, not views&lt;/strong&gt; — 0-100 formula based on title + desc + tags + chapters + hashtags + manual transcript + site link + retention. Target ≥ 85 for every 2026 video.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
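&lt;p&gt;Rule 6's canonical tag can be sketched as a small helper. The landing URL and campaign name below are hypothetical; the helper assumes a clean URL with no existing query string:&lt;/p&gt;

```python
from urllib.parse import urlencode

def tag_url(url: str, campaign: str) -> str:
    """Append the canonical YouTube/Demand Gen UTM triplet to a clean landing URL."""
    utm = urlencode({
        "utm_source": "youtube",
        "utm_medium": "demandgen",
        "utm_campaign": campaign,   # snake_case campaign name, per rule 6
    })
    return url + "?" + utm

print(tag_url("https://example.com/artigo", "geo_pillar_q2"))
```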




&lt;p&gt;&lt;strong&gt;Full 2000-word article with detailed examples, formulas and 5 FAQs:&lt;/strong&gt;&lt;br&gt;
→ &lt;a href="https://alexandrecaramaschi.com/artigos/youtube-para-geo-o-canal-como-prova-de-autoridade-algoritmica" rel="noopener noreferrer"&gt;alexandrecaramaschi.com/artigos/youtube-para-geo-o-canal-como-prova-de-autoridade-algoritmica&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Other articles in the Generative Engine Optimization series:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://alexandrecaramaschi.com/artigos/share-of-voice-em-ia-como-medir-visibilidade" rel="noopener noreferrer"&gt;Share of Voice in AI: how to measure if your brand exists for the machine&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://alexandrecaramaschi.com/artigos/economia-zero-clique-e-o-fim-do-funil" rel="noopener noreferrer"&gt;The zero-click economy and the end of the funnel as you know it&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://alexandrecaramaschi.com/artigos/geo-vs-seo-vs-aeo-o-que-muda-na-pratica" rel="noopener noreferrer"&gt;GEO vs SEO vs AEO: what actually changes in practice&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Alexandre Caramaschi is CEO of Brasil GEO, former CMO of Semantix (Nasdaq: STIX), co-founder of AI Brasil. Pioneer in Generative Engine Optimization and Business-to-Agent (B2A) in Brazil. Watch the channel: &lt;a href="https://www.youtube.com/@acaramaschi" rel="noopener noreferrer"&gt;@acaramaschi&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Why Brazil should be researching GEO before the rest of the world, and what I found in 1,004 LLM queries</title>
      <dc:creator>Alexandre Caramaschi</dc:creator>
      <pubDate>Thu, 16 Apr 2026 19:33:49 +0000</pubDate>
      <link>https://dev.to/alexandrebrt14sys/por-que-o-brasil-deveria-estar-pesquisando-geo-antes-do-resto-do-mundo-e-o-que-encontrei-em-1004-58dh</link>
      <guid>https://dev.to/alexandrebrt14sys/por-que-o-brasil-deveria-estar-pesquisando-geo-antes-do-resto-do-mundo-e-o-que-encontrei-em-1004-58dh</guid>
      <description>&lt;h1&gt;
  
  
  Por que o Brasil deveria estar pesquisando GEO antes do resto do mundo
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;Na semana passada, colei dois prompts idênticos no ChatGPT, Claude, Gemini e Perplexity.&lt;/strong&gt; A única diferença: um estava em português, outro em inglês. A pergunta era a mesma — "quais os melhores bancos digitais do Brasil?". A resposta em português citou Nubank, Inter, C6 Bank nove vezes em dez. A versão em inglês citou os mesmos nomes em cinco de dez respostas, e nas outras cinco apareceram Revolut, N26, Monzo — marcas europeias que nem operam aqui.&lt;/p&gt;

&lt;p&gt;A diferença não foi opinião. Foi &lt;strong&gt;+29 pontos percentuais de visibilidade&lt;/strong&gt; — 79,4% de citação em português contra 50,4% em inglês, medido sobre um dataset empírico de 1.004 consultas estruturadas rodando há 24 dias em paralelo nos quatro principais motores generativos.&lt;/p&gt;

&lt;p&gt;Esse número sozinho inverte a lógica que domina agências brasileiras: não faz sentido escrever conteúdo em inglês para "ampliar alcance" quando o motor que decide quem aparece na resposta já aprendeu a citar marcas brasileiras em português melhor do que em inglês. E esse é apenas um dos achados.&lt;/p&gt;

&lt;h2&gt;The central thesis: whoever does not show up in AI over the next 18 months will disappear&lt;/h2&gt;

&lt;p&gt;Brazil has a short window to turn Generative Engine Optimization into a discipline before the rest of the world. This is not a marketing claim; it is an empirical observation anchored in three asymmetries rarely discussed together:&lt;/p&gt;

&lt;p&gt;The first is &lt;strong&gt;linguistic&lt;/strong&gt;. LLMs cite Brazilian brands in Portuguese with 29pp higher density than in English. That creates a domain where content published in pt-BR has specific value in the training and grounding retrieval of these models, something competitors in Spanish, German or French do not enjoy with the same intensity.&lt;/p&gt;

&lt;p&gt;The second is &lt;strong&gt;institutional&lt;/strong&gt;. There is no public empirical framework in Brazil today continuously measuring how LLMs treat local brands in a longitudinal time series. No ABRADi, ABComm, Endeavor or public university has published an open dataset for the Brazilian market comparable to CC-GSEO-Bench (China) or SAGEO Arena (US). Whoever measures first defines the methodology.&lt;/p&gt;

&lt;p&gt;The third is &lt;strong&gt;commercial&lt;/strong&gt;. Agentic commerce, meaning AI agents making purchases on behalf of humans, will reach Brazilian retail within the next 18 months. Whoever appears in those agents' recommendations sells. Whoever does not appear does not exist.&lt;/p&gt;

&lt;p&gt;I am &lt;strong&gt;Alexandre Caramaschi, CEO of Brasil GEO, former CMO of Semantix (Nasdaq), co-founder of AI Brasil&lt;/strong&gt;. Six months ago I dropped one-off consulting to build the first scientific GEO infrastructure in Brazil. What follows is a technical account of what we are measuring, why we are measuring it, and what the first results suggest.&lt;/p&gt;

&lt;h2&gt;The research architecture: four verticals, five LLMs, 69 entities&lt;/h2&gt;

&lt;p&gt;The protocol has four methodological components. Each one answers an objection a rigorous reviewer would raise.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Four independent verticals&lt;/strong&gt;. Fintech, health, technology and retail. Each vertical has its cohort of monitored entities: 21 in fintech, 16 in health, 16 in technology, 16 in retail. Choosing different verticals tests whether the findings are generalizable or sector-specific (Proposal 8 of the design doc adds international competitors for cross-market comparison).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Five LLMs queried in parallel&lt;/strong&gt;. ChatGPT 4o-mini, Claude Haiku 4.5, Gemini 2.5 Pro, Perplexity Sonar and, as of today, Groq Llama 3.3 70B. The diversity is intentional: three closed commercial models and two open-weight. That isolates the "OpenAI model" effect from the "LLMs in general" effect when a brand appears in all of them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Eight fictitious entities for calibration&lt;/strong&gt;. Here lives what I consider the strongest methodological contribution. We inserted eight invented brands (Banco Floresta Digital, FinPay Solutions, MegaStore Brasil, ShopNova Digital, HealthTech Brasil, Clínica Horizonte Digital, TechNova Solutions, DataBridge Brasil), distributed two per vertical. If any LLM cites one of these, we know it is hallucinating. Zero tolerance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1,004 empirical queries in 24 days&lt;/strong&gt;. Each query is structured, has a category (discovery, comparison, trust, product, B2B, investment, alternatives) and a language (PT or EN), and is executed against all the LLMs simultaneously. The SQLite database (papers.db) is versioned in git, public, auditable. Any researcher can download the dataset and reproduce our numbers in thirty minutes.&lt;/p&gt;
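&lt;p&gt;A hypothetical reproduction sketch: the real papers.db schema may differ, so this builds a tiny in-memory stand-in and computes the headline metric, citation rate per LLM:&lt;/p&gt;

```python
import sqlite3

# In-memory stand-in for papers.db; the real schema and table names may differ.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE queries (id INTEGER, llm TEXT, cited INTEGER)")
conn.executemany(
    "INSERT INTO queries VALUES (?, ?, ?)",
    [(1, "claude", 1), (2, "claude", 0), (3, "gemini", 0), (4, "gemini", 0)],
)

# Citation rate per LLM: cited is 0/1, so AVG gives the rate directly
for llm, rate, n in conn.execute(
    "SELECT llm, AVG(cited), COUNT(*) FROM queries GROUP BY llm ORDER BY 2 DESC"
):
    print(f"{llm}: {rate:.1%} of {n}")
```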

&lt;h2&gt;The publishable finding: specificity = 100%&lt;/h2&gt;

&lt;p&gt;If I could pick a single result for the abstract of the paper we are writing for submission in July 2026, it would be this:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Zero mentions of the eight fictitious entities across 1,004 answers.&lt;/strong&gt; False positive rate = 0.00%. Specificity = 100%.&lt;/p&gt;

&lt;p&gt;That means two things. First, that our research's denominator is reliable: when we count citations, we are counting real citations, not model hallucinations. Second, that the LLMs we tested do not invent Brazilian brands under discovery, comparison and trust prompts. That is a non-trivial finding. There are sectors and languages where LLMs hallucinate companies at volume; our calibration shows that, in Portuguese, about real Brazilian brands, hallucination is residual.&lt;/p&gt;

&lt;p&gt;That validation underpins every other number I am about to cite.&lt;/p&gt;

&lt;h2&gt;The numbers live right now at alexandrecaramaschi.com/research&lt;/h2&gt;

&lt;p&gt;The research page is dynamic: it pulls the consolidated snapshot from the collection repository every hour. As the dataset grows, the numbers update. As I write, this is the state:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Global citation rate: 62.4% (Wilson 95% CI: 59.3%–65.3%).&lt;/strong&gt; In other words, when a relevant query is put to an LLM, there is nearly a two-in-three chance that a Brazilian brand appears in the answer. That is high: higher than the 15-30% international benchmarks report for brands in other emerging markets.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;LLM ranking by citation rate:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;LLM&lt;/th&gt;
&lt;th&gt;Taxa&lt;/th&gt;
&lt;th&gt;IC 95%&lt;/th&gt;
&lt;th&gt;n&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Claude Haiku 4.5&lt;/td&gt;
&lt;td&gt;67,8%&lt;/td&gt;
&lt;td&gt;62,2% — 72,9%&lt;/td&gt;
&lt;td&gt;298&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Perplexity Sonar&lt;/td&gt;
&lt;td&gt;65,3%&lt;/td&gt;
&lt;td&gt;58,6% — 71,4%&lt;/td&gt;
&lt;td&gt;213&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ChatGPT 4o-mini&lt;/td&gt;
&lt;td&gt;63,0%&lt;/td&gt;
&lt;td&gt;57,5% — 68,2%&lt;/td&gt;
&lt;td&gt;316&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemini 2.5 Pro&lt;/td&gt;
&lt;td&gt;48,6%&lt;/td&gt;
&lt;td&gt;41,3% — 56,0%&lt;/td&gt;
&lt;td&gt;177&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Claude leads. Gemini trails due to a combination of two factors: shorter answers (300 tokens on average against 800 for the others) and latency 13 times higher than Claude's, which suggests the model is thinking longer before answering and, paradoxically, citing less.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rate per vertical:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Vertical&lt;/th&gt;
&lt;th&gt;Taxa&lt;/th&gt;
&lt;th&gt;n&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Fintech&lt;/td&gt;
&lt;td&gt;68,5%&lt;/td&gt;
&lt;td&gt;336&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tecnologia&lt;/td&gt;
&lt;td&gt;65,5%&lt;/td&gt;
&lt;td&gt;252&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Varejo&lt;/td&gt;
&lt;td&gt;63,4%&lt;/td&gt;
&lt;td&gt;191&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Saúde&lt;/td&gt;
&lt;td&gt;48,9%&lt;/td&gt;
&lt;td&gt;225&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Brazilian fintech is the most cited sector. Health is the least. The gap suggests that brands with a strong digital history, fintechs born online, have disproportionate presence in the training corpus. Health brands, even large ones like Dasa, Fleury and Rede D'Or, compete with generic technical terminology ("hospital em São Paulo") that dissolves the brand signal.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rate per prompt category:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We found that the question's category matters more than the model chosen. Trust questions (&lt;code&gt;Nubank é seguro?&lt;/code&gt;) yield 100% citation because the name is already embedded. Discovery questions (&lt;code&gt;quais os melhores bancos?&lt;/code&gt;) yield 87%: high density, but with variance across LLMs. Open reputation questions (&lt;code&gt;quais as marcas mais inovadoras do Brasil?&lt;/code&gt;) drop to 61%.&lt;/p&gt;

&lt;p&gt;The practical implication is direct: &lt;strong&gt;brand prompt engineering accounts for more variance than the choice of LLM&lt;/strong&gt;. A brand that appears in 80% of discovery queries in Portuguese but in 30% of queries in English has a language-presence problem, not a traditional SEO problem.&lt;/p&gt;

&lt;h2&gt;The technical mechanism: why Portuguese works&lt;/h2&gt;

&lt;p&gt;The 29-percentage-point gap between Portuguese and English deserves a technical explanation, because it can sound like magic.&lt;/p&gt;

&lt;p&gt;Three hypotheses support the finding. The first is &lt;strong&gt;corpus density&lt;/strong&gt;. LLMs trained on large volumes of Brazilian Portuguese text (sites, news, social media, tax documentation) have dense embeddings for local brands. When the prompt is in pt-BR, retrieval pulls exactly those embeddings, with high cosine similarity to the monitored brands.&lt;/p&gt;

&lt;p&gt;The second is a &lt;strong&gt;context effect&lt;/strong&gt;. A query in English activates a global latent space. "Best digital banks" has Revolut, Monzo, N26 and Chime as strong neighbors in the embedding; the Brazilian brand competes against an international pool. A query in Portuguese activates the Brazilian latent space, where Nubank and Inter are the strong neighbors.&lt;/p&gt;
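&lt;p&gt;The neighbor argument can be made concrete with toy vectors. These are 3-dimensional stand-ins for real embeddings and the numbers are invented; only the geometry is the point:&lt;/p&gt;

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy embeddings: a pt-BR query vector sits closer to the local brand
query_pt = [0.9, 0.1, 0.2]
nubank   = [0.8, 0.2, 0.3]
revolut  = [0.1, 0.9, 0.4]
print(cosine(query_pt, nubank) > cosine(query_pt, revolut))  # True
```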

&lt;p&gt;The third is &lt;strong&gt;grounding retrieval&lt;/strong&gt;. Perplexity and recent versions of ChatGPT query the web in real time. When the query is in Portuguese, retrieval brings back Brazilian sites. In English, it brings back Forbes, Bloomberg and The Economist: outlets that rarely write about Brazilian digital banks outside the Nubank hype.&lt;/p&gt;

&lt;p&gt;The mechanism is compatible with what Karpathy called the "LLM as a compression of the internet". The Brazilian internet, in Portuguese, has local brand density. The internet in English does not.&lt;/p&gt;

&lt;h2&gt;What we decided to measure next&lt;/h2&gt;

&lt;p&gt;This dataset is 24 days old. The goal is to reach 10,000 queries within 7 days (with the expansion applied today: 35 queries per vertical, two daily collections at 6am and 6pm BRT, five LLMs in parallel). From here to July 15, 2026, we have 90 days of continuous collection to submit the first peer-reviewed academic publication on GEO in the Brazilian market.&lt;/p&gt;

&lt;p&gt;The next questions are already queued:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prompt sensitivity (Proposal 6).&lt;/strong&gt; We will run 30 paraphrases of the same query. A brand that appears in 80% of the variations is strong. One that appears in 20% depends on specific phrasings, a fragility that an autonomous AI agent will expose when it paraphrases the user's question.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Intervention effect (Module 4).&lt;/strong&gt; When a brand publishes a specific piece of content, say a structured llms.txt or a post with schema.org ItemList, does the citation rate change in 7 days? 14? 30? Measuring that with a control group and Welch's t-test on matched groups gives us causality, not correlation.&lt;/p&gt;
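&lt;p&gt;A sketch of the planned test statistic, Welch's t for two groups with unequal variances (pure Python; the citation-rate samples below are hypothetical, not study data):&lt;/p&gt;

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)   # sample variances (n - 1 denominator)
    se2 = va / na + vb / nb
    t = (mean(a) - mean(b)) / math.sqrt(se2)
    df = se2**2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df

# Hypothetical post-intervention citation rates, treated vs control brands
treated = [0.31, 0.28, 0.35, 0.30, 0.33]
control = [0.24, 0.22, 0.27, 0.25, 0.23]
t, df = welch_t(treated, control)
print(f"t = {t:.2f}, df = {df:.1f}")
```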

&lt;p&gt;&lt;strong&gt;Cross-LLM agreement.&lt;/strong&gt; Quando Claude, ChatGPT e Perplexity concordam em citar a mesma marca para a mesma query, a probabilidade de um quarto LLM também citar é 91%. Quando apenas um LLM cita, a probabilidade de um segundo concordar é 23%. Isso cria um sinal de robustez: marcas que aparecem em múltiplos LLMs têm presença estrutural no corpus. Marcas que aparecem em apenas um podem estar num viés idiossincrático.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Temporal stability.&lt;/strong&gt; With a 90-day series, we will be able to run Mann-Kendall to detect trends and seasonal decomposition to isolate cyclical effects. The hypothesis I want to test is that citation rates for mid-sized brands fluctuate more than those of large brands, a sign that LLMs are learning and forgetting within short windows.&lt;/p&gt;
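
&lt;p&gt;Mann-Kendall itself is small enough to sketch inline. The version below is the textbook statistic without tie correction (a simplification), not the implementation we will actually use:&lt;/p&gt;

```python
import itertools
import math

def mann_kendall(series):
    """Mann-Kendall S statistic and normal-approximation z-score."""
    s = sum(
        (b - a > 0) - (0 > b - a)   # sign of each pairwise difference
        for a, b in itertools.combinations(series, 2)
    )
    n = len(series)
    var = n * (n - 1) * (2 * n + 5) / 18
    if s > 0:
        z = (s - 1) / math.sqrt(var)
    elif 0 > s:
        z = (s + 1) / math.sqrt(var)
    else:
        z = 0.0
    return s, z
```

&lt;p&gt;A strictly increasing 5-point series yields S = 10 and a z-score above 1.96, i.e. a significant upward trend at the usual 5% level.&lt;/p&gt;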

&lt;h2&gt;
  
  
  Why this matters for Brazil before any other country
&lt;/h2&gt;

&lt;p&gt;The opportunity is specific and has an expiration date.&lt;/p&gt;

&lt;p&gt;In the West, GEO has already become a department at enterprise agencies. In China, universities have published CC-GSEO-Bench. In Brazil, there are only scattered case studies with no comparable framework. That creates an 18-month window, through the end of 2027, in which whoever measures with scientific rigor defines the literature, the standards and the canonical cases.&lt;/p&gt;

&lt;p&gt;Three movements are accelerating this window:&lt;/p&gt;

&lt;p&gt;First, &lt;strong&gt;agentic commerce&lt;/strong&gt; will reach Brazil in 2027, maybe sooner. AI agents buying on behalf of consumers. OpenAI Operator, Google Mariner and Anthropic's Claude computer use are in public beta. When those agents choose where to buy your Pix, your card, your health plan, the answer will depend on the ranking inside the model, not the ranking on Google. Brands that only did traditional SEO are blind to what will decide the sale.&lt;/p&gt;

&lt;p&gt;Second, &lt;strong&gt;the cost of collection is trivial&lt;/strong&gt;. The entire infrastructure behind this research (five LLMs, four verticals, two daily runs, 70 observations per cell per day, an automated pipeline on GitHub Actions) costs 27 dollars a month. That fits inside five days of paid-traffic budget at any mid-sized company. The bottleneck is not capital; it is conviction.&lt;/p&gt;

&lt;p&gt;Third, &lt;strong&gt;Brazil has unique assets&lt;/strong&gt;. Portuguese is the fifth most spoken language in the world and the third most present in LLMs. We have brands that won in the mobile era (Nubank) and the social era (iFood) and now need to win in the agentic era. And we have researchers, engineers and operators with AI track records, from AI Brasil to Semantix and dozens of startups in between.&lt;/p&gt;

&lt;p&gt;What is missing is someone putting the infrastructure on the ground. We are putting it there.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I am offering
&lt;/h2&gt;

&lt;p&gt;Over the next 90 days, the research dataset will triple in size. I will publish the preprint on arXiv by May 24. Submission to a peer-reviewed venue (CSCW, CHI or ACL) is scheduled for July 15. All data stays open at &lt;a href="https://alexandrecaramaschi.com/research" rel="noopener noreferrer"&gt;alexandrecaramaschi.com/research&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In parallel, I run &lt;strong&gt;20-hour GEO Sprints&lt;/strong&gt; for brands that want to enter the dataset as treated cases: baseline measurement, structural diagnosis, content intervention, post-intervention measurement with a control group. Five brands per cycle. The next cohort opens in May.&lt;/p&gt;

&lt;p&gt;If you run marketing or product at a brand that does not yet have an AI-visibility audit, you have two paths:&lt;/p&gt;

&lt;p&gt;The first is to run the free diagnostic I maintain at &lt;a href="https://brasilgeo.ai" rel="noopener noreferrer"&gt;brasilgeo.ai&lt;/a&gt;. It applies the same methodology as the paper to a specific brand, in under 10 minutes.&lt;/p&gt;

&lt;p&gt;The second is to email me directly to schedule a conversation about where your brand sits in the current corpus (which LLMs cite you, in which language, in which query category) before agentic commerce makes that conversation too late.&lt;/p&gt;

&lt;p&gt;Brazil can become a power in GEO and agentic commerce ahead of the rest of the world. Not because we are better than other markets, but because the window is open, the language works in our favor, and no one has yet taken the seat of scientific reference. Whoever takes it now will define the rest of the decade.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Alexandre Caramaschi&lt;/strong&gt; is CEO of Brasil GEO, former CMO of Semantix (Nasdaq), and co-founder of AI Brasil. He writes about empirical research in Generative Engine Optimization at &lt;a href="https://alexandrecaramaschi.com" rel="noopener noreferrer"&gt;alexandrecaramaschi.com&lt;/a&gt;. The full dataset for this research is at &lt;a href="https://alexandrecaramaschi.com/research" rel="noopener noreferrer"&gt;alexandrecaramaschi.com/research&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>seo</category>
      <category>brazil</category>
      <category>marketing</category>
    </item>
    <item>
      <title>I Built a Deterministic Crosslink Engine for 117 Pages Using Jaccard Similarity</title>
      <dc:creator>Alexandre Caramaschi</dc:creator>
      <pubDate>Fri, 10 Apr 2026 02:30:11 +0000</pubDate>
      <link>https://dev.to/alexandrebrt14sys/i-built-a-deterministic-crosslink-engine-for-117-pages-using-jaccard-similarity-3mkn</link>
      <guid>https://dev.to/alexandrebrt14sys/i-built-a-deterministic-crosslink-engine-for-117-pages-using-jaccard-similarity-3mkn</guid>
      <description>&lt;p&gt;A content site with 117 pages and zero internal linking strategy is a site where visitors bounce after reading one page. That was my site two weeks ago.&lt;/p&gt;

&lt;p&gt;Today, every page on &lt;a href="https://alexandrecaramaschi.com" rel="noopener noreferrer"&gt;alexandrecaramaschi.com&lt;/a&gt; has 6 contextual crosslinks generated by a deterministic engine that runs in 200ms, costs nothing, and lives in a single Node.js script — no embeddings, no vector databases, no API calls.&lt;/p&gt;

&lt;p&gt;Here is exactly how I built it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem: 117 Pages, Manual Linking
&lt;/h2&gt;

&lt;p&gt;The site has 41 long-form articles, 38 courses (388 modules), 26 strategic insights, and 14 service/tool pages. All built with Next.js 16 App Router.&lt;/p&gt;

&lt;p&gt;The existing &lt;code&gt;relatedArticles&lt;/code&gt; field in my CMS was manually curated — and covered maybe 15% of pages. Course pages had zero outbound links to articles. Articles never pointed to courses. The result: visitors arrived via search, consumed one page, and left.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Architecture: Faceted Taxonomy + Weighted Scoring
&lt;/h2&gt;

&lt;p&gt;Instead of reaching for OpenAI embeddings, I designed a controlled vocabulary with 4 semantic facets:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Topics&lt;/strong&gt; — 26 canonical terms with synonym normalization:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;TOPICS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;geo&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;geo&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;generative engine optimization&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;motor generativo&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="na"&gt;seo&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;seo&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;search engine optimization&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ia-generativa&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ia generativa&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;llm&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;chatgpt&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;claude&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;gemini&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="na"&gt;vscode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;vscode&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;vs code&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;visual studio code&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;editor&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ide&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="c1"&gt;// ... 22 more&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each piece of content is annotated by scanning its title, description, and keywords against this vocabulary. Normalization strips accents and lowercases before matching (critical for Portuguese content).&lt;/p&gt;
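
&lt;p&gt;The normalization step can be sketched in a few lines. Shown in Python for brevity (the engine itself is a Node.js script), with &lt;code&gt;normalize&lt;/code&gt; as a hypothetical name:&lt;/p&gt;

```python
import unicodedata

def normalize(term):
    """Lowercase and strip combining accents so that, e.g.,
    'Saúde' and 'saude' match (sketch of the idea, not the real code)."""
    decomposed = unicodedata.normalize("NFKD", term.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```

&lt;p&gt;Without this step, accented Portuguese terms like "saúde" or "código" silently fail to match their unaccented variants in titles and keywords.&lt;/p&gt;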

&lt;p&gt;&lt;strong&gt;2. Audience&lt;/strong&gt; — 7 profiles (beginner, dev, marketing-pro, executive, etc.)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Intent&lt;/strong&gt; — 4 journey stages: &lt;code&gt;discover → learn → apply → decide&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Vertical&lt;/strong&gt; — 12 industry sectors (healthcare, legal, tourism, etc.)&lt;/p&gt;

&lt;h2&gt;
  
  
  The Scoring Function
&lt;/h2&gt;

&lt;p&gt;For each pair of content items (A, B), the score is a weighted sum across facets:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nf"&gt;score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;A&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;B&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nf"&gt;jaccard&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;topics_A&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;topics_B&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nf"&gt;audienceOverlap&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;A&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;B&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mf"&gt;0.8&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nf"&gt;intentFlow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;A&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;B&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mf"&gt;1.2&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nf"&gt;verticalBridge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;A&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;B&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mf"&gt;1.3&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nf"&gt;crossDomainBonus&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;A&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;B&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mf"&gt;0.6&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nf"&gt;trackAffinity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;A&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;B&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Jaccard similarity&lt;/strong&gt; handles topic matching: intersection over union of the two topic sets. Two items sharing 3 topics out of 5 combined score 3/5 = 0.6, high enough to be relevant, low enough to avoid duplicates.&lt;/p&gt;
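
&lt;p&gt;The metric itself is a one-liner over sets. A dependency-free sketch, in Python for brevity (the engine is Node.js):&lt;/p&gt;

```python
def jaccard(a, b):
    """Jaccard similarity of two topic sets: intersection over union."""
    a, b = set(a), set(b)
    union = a.union(b)
    if not union:
        return 0.0   # two empty sets: define similarity as zero
    return len(a.intersection(b)) / len(union)
```

&lt;p&gt;With topic sets {"geo", "seo", "llm", "serp", "schema"} and {"geo", "seo", "llm"}, the intersection has 3 elements and the union has 5, giving 0.6.&lt;/p&gt;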

&lt;p&gt;&lt;strong&gt;Intent flow&lt;/strong&gt; rewards linking from discovery content (articles) to learning content (courses) to action pages (tools) — guiding visitors deeper.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cross-domain bonus&lt;/strong&gt; is the key retention driver: an article about "zero-click economy" linking to the "SEO + GEO Fundamentals" course is more valuable than linking to another article about zero-click. Different content &lt;em&gt;types&lt;/em&gt; with shared topics get a 1.3x boost.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Track affinity&lt;/strong&gt; ensures courses in the same learning path (e.g., Python → Data Science → Deploy) link to each other even without keyword overlap.&lt;/p&gt;

&lt;h2&gt;
  
  
  Anti-Bubble Mixing
&lt;/h2&gt;

&lt;p&gt;Raw scoring produces homogeneous results — a course page would only suggest other courses. The mixer enforces quotas:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;content (articles + insights): min 1
learning (courses):             min 1
action (guides + tools):        min 1
any single group:               max 50%
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three phases:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Fill mandatory quotas from each group&lt;/li&gt;
&lt;li&gt;Complete by score, respecting group caps&lt;/li&gt;
&lt;li&gt;Fallback by supercategory for edge cases&lt;/li&gt;
&lt;/ol&gt;
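
&lt;p&gt;The first two phases can be sketched as follows, in Python for brevity and with hypothetical names (the fallback phase is omitted):&lt;/p&gt;

```python
def mix(candidates, groups, slots=6, max_share=0.5):
    """Quota mixer sketch (illustrative, not the real engine).
    candidates: (id, group) pairs already sorted by score, descending."""
    cap = int(slots * max_share)            # max picks from any single group
    picked, per_group = [], {}
    # phase 1: fill the mandatory quota, one pick per group
    for g in groups:
        for cid, grp in candidates:
            if grp == g and cid not in picked:
                picked.append(cid)
                per_group[grp] = per_group.get(grp, 0) + 1
                break
    # phase 2: complete by score, respecting the per-group cap
    for cid, grp in candidates:
        if len(picked) >= slots:
            break
        if cid in picked or per_group.get(grp, 0) >= cap:
            continue
        picked.append(cid)
        per_group[grp] = per_group.get(grp, 0) + 1
    return picked
```

&lt;p&gt;Given six candidates of which four are articles, the cap (50% of 6 slots = 3) forces the fourth article out even though it outscores nothing else, which is exactly the anti-bubble behavior described above.&lt;/p&gt;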

&lt;h2&gt;
  
  
  Injection Without Editing 63 Static Pages
&lt;/h2&gt;

&lt;p&gt;The site has 38 static course pages and 26 static insight pages — all individual &lt;code&gt;page.tsx&lt;/code&gt; files. Editing each one was not viable.&lt;/p&gt;

&lt;p&gt;Solution: &lt;strong&gt;middleware + headers + layout injection&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The middleware sets an &lt;code&gt;x-pathname&lt;/code&gt; header:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// middleware.ts&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;requestHeaders&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Headers&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nx"&gt;requestHeaders&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;x-pathname&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;pathname&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;NextResponse&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;next&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;request&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;requestHeaders&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A server component reads it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// SmartRelated.tsx&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;h&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;h&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;x-pathname&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;items&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;getCrosslinksFor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Injected via &lt;code&gt;educacao/layout.tsx&lt;/code&gt; and &lt;code&gt;insights/layout.tsx&lt;/code&gt;, the component automatically appears below every course and insight page. For articles (the dynamic &lt;code&gt;[slug]&lt;/code&gt; route), the pathname is passed explicitly as a prop.&lt;/p&gt;

&lt;h2&gt;
  
  
  Results
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Before&lt;/th&gt;
&lt;th&gt;After&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Pages with crosslinks&lt;/td&gt;
&lt;td&gt;~15%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;100%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Total crosslinks&lt;/td&gt;
&lt;td&gt;~40 manual&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;700 generated&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cross-type links&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;116 of 117 pages&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Badge types per page&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;2.3 average&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Build time delta&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;+200ms&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API costs&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$0&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The generator runs as part of &lt;code&gt;prebuild&lt;/code&gt; and outputs a static JSON map consumed at render time.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Not Embeddings?
&lt;/h2&gt;

&lt;p&gt;At 117 pages, embeddings are overkill. The controlled vocabulary approach is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Deterministic&lt;/strong&gt; — same input, same output, every time&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auditable&lt;/strong&gt; — grep the vocabulary file to understand any link&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Free&lt;/strong&gt; — no API calls, no vector DB&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fast&lt;/strong&gt; — 200ms to generate the entire map&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Versionable&lt;/strong&gt; — the JSON map is committed to git&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When the site crosses ~500 pages, I will migrate to pgvector. The architecture was designed for this: consumers only read &lt;code&gt;crosslink-map.json&lt;/code&gt; — they do not care how it was generated.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;

&lt;p&gt;The full source is at &lt;a href="https://alexandrecaramaschi.com" rel="noopener noreferrer"&gt;alexandrecaramaschi.com&lt;/a&gt;. Navigate any course, scroll to the bottom, and you will see the crosslinks in action.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Alexandre Caramaschi — CEO at Brasil GEO, former CMO at Semantix (Nasdaq), co-founder of AI Brasil. Building the practice of Generative Engine Optimization in Latin America.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>seo</category>
      <category>webdev</category>
      <category>nextjs</category>
    </item>
    <item>
      <title>12 days of 'success' collecting zero data: the silent bug that killed my 90-day research run</title>
      <dc:creator>Alexandre Caramaschi</dc:creator>
      <pubDate>Wed, 08 Apr 2026 00:24:00 +0000</pubDate>
      <link>https://dev.to/alexandrebrt14sys/12-dias-de-success-coletando-zero-dados-o-bug-silencioso-que-matou-minha-pesquisa-de-90-dias-3pfn</link>
      <guid>https://dev.to/alexandrebrt14sys/12-dias-de-success-coletando-zero-dados-o-bug-silencioso-que-matou-minha-pesquisa-de-90-dias-3pfn</guid>
      <description>&lt;p&gt;&lt;strong&gt;8 dias. 0 observações. 12 workflows GitHub Actions marcados como verde.&lt;/strong&gt; Foi isso que descobri há seis horas, em 7 de abril de 2026, ao olhar meu dashboard de pesquisa em alexandrecaramaschi.com/research e ver &lt;code&gt;overall_rate: 0&lt;/code&gt;, &lt;code&gt;total_observations: 0&lt;/code&gt;, &lt;code&gt;days_collecting: 0&lt;/code&gt; em todas as quatro verticais.&lt;/p&gt;

&lt;p&gt;O GitHub Actions me dizia que tinha rodado com sucesso desde 30 de março. Os commits estavam lá, datados, com mensagens automáticas perfeitas: &lt;code&gt;data: daily collection 4 verticals 2026-04-07&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Eu tinha um workflow chamando &lt;code&gt;python -m src.cli collect citation&lt;/code&gt; para 4 verticais (fintech, varejo, saúde, tecnologia), 4 LLMs (ChatGPT, Claude, Gemini, Perplexity), todo dia às 06:00 BRT.&lt;/p&gt;

&lt;p&gt;A pasta &lt;code&gt;output/&lt;/code&gt; tinha checkpoints atualizados. O dashboard estava no ar. E não havia uma única linha de dado real desde 30 de março.&lt;/p&gt;

&lt;h2&gt;
  
  
  The counterintuitive thesis
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Green CI workflows lie.&lt;/strong&gt; Especially when your code has &lt;code&gt;continue-on-error: true&lt;/code&gt; scattered everywhere and your only success criterion is "the process did not throw".&lt;/p&gt;

&lt;p&gt;The case I am about to describe is a combination of three mutually reinforcing failures: API keys rotated externally without being propagated to the repository, silent HTTP 401 responses because the collector caught the exception and moved on, and a workflow YAML that treated "finished without crashing" as "ran fine". The result is the worst kind of pipeline bug: the kind that keeps every indicator green while the database goes stale.&lt;/p&gt;

&lt;h2&gt;
  
  
  The context: a 90-day empirical study
&lt;/h2&gt;

&lt;p&gt;I am running a longitudinal study of how LLMs cite Brazilian companies. The design covers 4 verticals, 69 entities (61 real + 8 fictitious for false-positive calibration), 4 version-pinned models (&lt;code&gt;gpt-4o-mini-2024-07-18&lt;/code&gt;, &lt;code&gt;claude-haiku-4-5-20251001&lt;/code&gt;, &lt;code&gt;sonar&lt;/code&gt;, &lt;code&gt;gemini-2.5-pro&lt;/code&gt;) and ~288 observations per day. The target was 90 continuous days, ~25,920 observations, and three papers planned for arXiv + SIGIR/WWW + a Q1 journal in Information Sciences.&lt;/p&gt;

&lt;p&gt;Collection started on March 24. Everything worked on day 1. On the 25th and 26th, a &lt;code&gt;SyntaxError&lt;/code&gt; under the CI's Python 3.11 (valid under my local 3.12) killed the collection: an incident already documented, fixed, and written up as a post-mortem.&lt;/p&gt;

&lt;p&gt;On March 29, everything was working again: 256 real observations and a healthy distribution across the 4 LLMs.&lt;/p&gt;

&lt;p&gt;On March 30, something broke.&lt;/p&gt;

&lt;h2&gt;
  
  
  The root cause
&lt;/h2&gt;

&lt;p&gt;At some point between March 29 and 30, I rotated the 5 API keys in my local workspace, probably during a FinOps audit I was running on the multi-LLM orchestrator. I updated the main repository's &lt;code&gt;.env&lt;/code&gt;. I ran a smoke test and confirmed everything returned HTTP 200. I moved on.&lt;/p&gt;

&lt;p&gt;What I did not do: propagate the new keys to the GitHub Secrets of the &lt;code&gt;papers&lt;/code&gt; repository. The keys there were still dated March 24, pointing at the old, now-invalid set.&lt;/p&gt;

&lt;p&gt;From March 30 on, every day at 06:00 BRT, the workflow ran. Every call to OpenAI returned &lt;code&gt;HTTP 401 invalid_api_key&lt;/code&gt;. Every call to Anthropic returned &lt;code&gt;HTTP 401 invalid x-api-key&lt;/code&gt;. Every call to Perplexity, the same. Gemini returned &lt;code&gt;HTTP 400&lt;/code&gt; for a different reason (the 2.5 Pro response structure with thinking mode was incompatible with my parser: another bug I will cover below).&lt;/p&gt;

&lt;p&gt;And the collector kept going. Because the &lt;code&gt;collect()&lt;/code&gt; function caught the exceptions, logged them to stderr, and returned an empty list. The CLI checked &lt;code&gt;if results:&lt;/code&gt; before inserting into the database, so an empty list simply meant "nothing to insert, fine, next vertical". No non-zero exit code. No &lt;code&gt;raise&lt;/code&gt;. No alert.&lt;/p&gt;
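
&lt;p&gt;The pattern is worth seeing in miniature. This is a reconstructed sketch of the failure mode, not the actual repository code:&lt;/p&gt;

```python
import sys

def call_llm(query):
    """Stand-in for the real API call; simulates the rotated-key failure."""
    raise RuntimeError("HTTP 401: invalid_api_key")

def collect(queries):
    """Silent-failure pattern: every exception is logged and swallowed,
    so a 100% failure looks identical to 'nothing new to insert'."""
    results = []
    for q in queries:
        try:
            results.append(call_llm(q))
        except Exception as exc:
            print(f"ERROR: {exc}", file=sys.stderr)   # logged... and ignored
    return results   # empty list when every single call failed
```

&lt;p&gt;The caller has no way to distinguish "the market had nothing to say" from "every API call failed": both return &lt;code&gt;[]&lt;/code&gt;.&lt;/p&gt;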

&lt;p&gt;The &lt;code&gt;finalize&lt;/code&gt; job downloaded the previous day's &lt;code&gt;papers-db-latest&lt;/code&gt; artifact, ran the &lt;code&gt;sync_to_supabase.py&lt;/code&gt; aggregation (zero rows, so every KPI zeroed out), updated the &lt;code&gt;papers_dashboard_data&lt;/code&gt; snapshot in Supabase with &lt;code&gt;total_observations: 0&lt;/code&gt;, &lt;code&gt;overall_rate: 0&lt;/code&gt;, &lt;code&gt;days_collecting: 0&lt;/code&gt;, uploaded the same unchanged artifact, committed &lt;code&gt;data/daily_*.csv&lt;/code&gt; (empty), &lt;code&gt;data/finops_checkpoint.json&lt;/code&gt; and &lt;code&gt;docs/&lt;/code&gt;. And it exited with code 0.&lt;/p&gt;

&lt;p&gt;12 days like that. Workflow status: &lt;code&gt;completed/success&lt;/code&gt;. Actual database: 186 observations frozen at March 24. Live dashboard: zeros across every vertical.&lt;/p&gt;

&lt;h2&gt;
  
  
  How I found out
&lt;/h2&gt;

&lt;p&gt;It was not an alert. It should have been. There was none.&lt;/p&gt;

&lt;p&gt;It was a question. "Is collection running consistently enough to give us critical mass within 90 days?"&lt;/p&gt;

&lt;p&gt;Five minutes later, pulling the logs of the latest run via &lt;code&gt;gh run view --log&lt;/code&gt; and filtering for &lt;code&gt;ERROR&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ERROR: [ChatGPT] HTTP 401: invalid_api_key
ERROR: [Claude] HTTP 401: invalid x-api-key
ERROR: [Gemini] HTTP 400: ...
ERROR: [Perplexity] HTTP 401: Invalid API key provided
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Repeated in a loop for all 18 queries in each of the 4 verticals. More than 200 lines of errors. And at the top of the page the workflow said &lt;code&gt;success&lt;/code&gt; in green.&lt;/p&gt;

&lt;h2&gt;
  
  
  The decision to regress
&lt;/h2&gt;

&lt;p&gt;I had a choice: do a manual backfill with altered dates to preserve the streak (but with every timestamp stamped on the current day, contaminating temporal analyses), or accept that I had lost 8 days and reset the counter.&lt;/p&gt;

&lt;p&gt;I reset it. The temporal integrity of a longitudinal study is worth more than the vanity of a "continuous days" number. 90 days with real timestamps is evidence. 90 days with 8 of them invented is methodological fraud.&lt;/p&gt;

&lt;p&gt;Day 1 of the new window is April 8. Day 90 will be July 6, 2026. ~256 observations per day × 90 days ≈ 23,000 total observations with temporal integrity preserved.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 5 fixes that will make sure this never happens again
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Fail-loud in the collection command
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;src/cli.py::collect_citation&lt;/code&gt; now sums the total number of citations collected across all verticals. If that total is zero while at least one vertical was attempted, the command raises &lt;code&gt;SystemExit(1)&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;total_attempted&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;total_collected&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;FAIL-LOUD: 0 citacoes em &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;total_attempted&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; verticais. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Provavel causa: API keys invalidas/expiradas, rate limiting, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ou erro de configuracao.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;SystemExit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This guarantees the workflow genuinely fails when 100% of the calls error out. No &lt;code&gt;|| true&lt;/code&gt; layered on top. No &lt;code&gt;continue-on-error&lt;/code&gt;. The job ends red.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Granular retry policy in the collector
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;src/collectors/base.py&lt;/code&gt; used to handle only HTTP 429. It now handles five distinct categories:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Error&lt;/th&gt;
&lt;th&gt;Behavior&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;HTTP 401/403&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Immediate circuit break. No retries. Logs "rotate the key in GitHub Secrets".&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;HTTP 429&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Retry with exponential backoff. After max retries, circuit break.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;HTTP 5xx&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Retry with exponential backoff.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;ConnectError&lt;/code&gt;, &lt;code&gt;ReadTimeout&lt;/code&gt;, &lt;code&gt;WriteTimeout&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Retry with backoff.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;HTTP 4xx&lt;/code&gt; fatal (400, 404, 422)&lt;/td&gt;
&lt;td&gt;Log and move on to the next query.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The separation matters: a 401 is not transient, it is configuration. Retrying will not fix it; rotating the key will. Logging that explicitly surfaces the failure in diagnostics instead of burying it under useless retries.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. A 14-dimension health check with WhatsApp + email alerts
&lt;/h3&gt;

&lt;p&gt;I created &lt;code&gt;scripts/health_check.py&lt;/code&gt;, modeled on the &lt;code&gt;geo-finops/health_check.py&lt;/code&gt; that already exists in my ecosystem. The script runs 14 end-to-end checks:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;papers.db&lt;/code&gt; exists&lt;/li&gt;
&lt;li&gt;Schema contains the 21 required tables&lt;/li&gt;
&lt;li&gt;All 4 API keys are loaded in the environment&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real smoke test of the 4 keys&lt;/strong&gt; (makes a minimal call to each provider)&lt;/li&gt;
&lt;li&gt;At least 200 observations in the last 24h&lt;/li&gt;
&lt;li&gt;All 4 verticals collected in the last 24h&lt;/li&gt;
&lt;li&gt;All 4 LLMs responded in the last 24h&lt;/li&gt;
&lt;li&gt;No gap larger than 1 day between collections (warning)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;papers_dashboard_data&lt;/code&gt; in Supabase with &lt;code&gt;total_observations &amp;gt; 0&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;FinOps spend &amp;lt; 90% of the monthly budget&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/research&lt;/code&gt; endpoint returning HTTP 200&lt;/li&gt;
&lt;li&gt;Models pinned in the database (specific versions)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;raw_text&lt;/code&gt; preserved for reprocessing&lt;/li&gt;
&lt;li&gt;Fictitious entities present in the cohort (false-positive calibration)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Exit code 1 if any critical check fails. On failure, &lt;code&gt;send_alert()&lt;/code&gt; fires two channels in parallel: WhatsApp Business API to &lt;code&gt;+5562998141505&lt;/code&gt; and email via Resend to &lt;code&gt;caramaschiai@caramaschiai.io&lt;/code&gt;. The message body includes a summary of the failures, relevant metrics, and a basic recovery runbook.&lt;/p&gt;

&lt;p&gt;Smoke test run: &lt;code&gt;whatsapp: OK&lt;/code&gt;. A real message arrived on my phone.&lt;/p&gt;
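&lt;p&gt;The overall shape of such a script is simple: run every check, collect failures, alert, exit non-zero. A minimal sketch, where the check functions and &lt;code&gt;send_alert()&lt;/code&gt; are illustrative stand-ins for what &lt;code&gt;scripts/health_check.py&lt;/code&gt; actually does:&lt;/p&gt;

```python
# Minimal multi-check health runner: every check returns (ok, message);
# any failure triggers an alert and a non-zero exit code.
def check_db_exists() -> tuple[bool, str]:
    return True, "papers.db exists"


def check_min_observations() -> tuple[bool, str]:
    # Illustrative failing check for demonstration purposes.
    return False, "only 120 observations in the last 24h (min 200)"


CHECKS = [check_db_exists, check_min_observations]


def send_alert(failures: list[str]) -> None:
    # Real version: WhatsApp Business API + email via Resend, in parallel.
    print("ALERT:", "; ".join(failures))


def main() -> int:
    failures = []
    for check in CHECKS:
        ok, msg = check()
        if not ok:
            failures.append(msg)
    if failures:
        send_alert(failures)
        return 1
    return 0


print("exit code:", main())
```

&lt;p&gt;In CI this is all the gating needs: a non-zero return turns the step, and therefore the workflow, red.&lt;/p&gt;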

&lt;h3&gt;
  
  
  4. Health check as a gate in daily-collect
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;daily-collect.yml&lt;/code&gt; gained a new step at the end of the &lt;code&gt;finalize&lt;/code&gt; job:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Health check (gating)&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;python scripts/health_check.py --min-obs-per-day &lt;/span&gt;&lt;span class="m"&gt;200&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No &lt;code&gt;continue-on-error&lt;/code&gt;. If the health check fails, the workflow fails. If the workflow fails, &lt;code&gt;daily-collect-alert.yml&lt;/code&gt; (a separate workflow that listens for &lt;code&gt;workflow_run.failure&lt;/code&gt;) fires WhatsApp + email.&lt;/p&gt;

&lt;p&gt;One more scheduled workflow (&lt;code&gt;health-check-daily.yml&lt;/code&gt;) runs 4 hours later, at 13:00 UTC, as a redundant layer in case daily-collect failed in some way the gate did not catch. Defense in depth.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Tighter FinOps budgets
&lt;/h3&gt;

&lt;p&gt;The default budgets were far too loose ($35/month global) for the observed real cost (~$1/month). If some bug made queries explode for hours before I noticed, the damage could run two orders of magnitude above what would make sense to pay.&lt;/p&gt;

&lt;p&gt;I tightened everything to a 5x margin over the observed average cost:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Provider&lt;/th&gt;
&lt;th&gt;Before&lt;/th&gt;
&lt;th&gt;After&lt;/th&gt;
&lt;th&gt;Hard stop&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;openai&lt;/td&gt;
&lt;td&gt;$10/mo&lt;/td&gt;
&lt;td&gt;$3/mo&lt;/td&gt;
&lt;td&gt;95%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;anthropic&lt;/td&gt;
&lt;td&gt;$10/mo&lt;/td&gt;
&lt;td&gt;$3/mo&lt;/td&gt;
&lt;td&gt;95%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;google&lt;/td&gt;
&lt;td&gt;$5/mo&lt;/td&gt;
&lt;td&gt;$2/mo&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;perplexity&lt;/td&gt;
&lt;td&gt;$10/mo&lt;/td&gt;
&lt;td&gt;$3/mo&lt;/td&gt;
&lt;td&gt;95%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;groq&lt;/td&gt;
&lt;td&gt;$5/mo&lt;/td&gt;
&lt;td&gt;$1/mo&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;global&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$35/mo&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$10/mo&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;95%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A hard stop at 95% per provider means that when spend hits that mark, the tracker blocks new calls to that provider until the daily/monthly reset. Bill shock is prevented with a cap, not with trust.&lt;/p&gt;
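&lt;p&gt;The hard-stop logic fits in a few lines. A sketch with two providers from the table; the tracker API here is illustrative, not the actual FinOps package:&lt;/p&gt;

```python
# Per-provider hard stop: block new calls once spend crosses
# budget * hard-stop fraction. Values mirror the table above.
BUDGETS = {
    "openai": (3.00, 0.95),  # ($/month, hard-stop fraction)
    "groq": (1.00, 1.00),
}


class BudgetTracker:
    def __init__(self) -> None:
        self.spent: dict[str, float] = {p: 0.0 for p in BUDGETS}

    def allow(self, provider: str) -> bool:
        budget, stop_at = BUDGETS[provider]
        return self.spent[provider] < budget * stop_at

    def record(self, provider: str, cost: float) -> None:
        self.spent[provider] += cost


tracker = BudgetTracker()
tracker.record("openai", 2.90)
print(tracker.allow("openai"))  # False: 2.90 >= 3.00 * 0.95 (= 2.85)
print(tracker.allow("groq"))    # True: nothing spent yet
```

&lt;p&gt;The key design choice is checking &lt;code&gt;allow()&lt;/code&gt; before the call, not after: a runaway loop spends at most one call past the cap.&lt;/p&gt;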

&lt;h2&gt;
  
  
  The bugs I discovered by accident along the way
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Gemini 2.5 Pro thinking mode
&lt;/h3&gt;

&lt;p&gt;While debugging the collection, I discovered that even with fresh keys, Gemini was returning empty data. The &lt;code&gt;gemini-2.5-pro&lt;/code&gt; model spends internal thinking tokens before generating output. With &lt;code&gt;max_output_tokens = 300&lt;/code&gt;, the thinking budget exhausted the tokens and the response came back with &lt;code&gt;candidates[0].content&lt;/code&gt; missing the &lt;code&gt;parts&lt;/code&gt; field. The parser did &lt;code&gt;data["candidates"][0]["content"]["parts"][0]["text"]&lt;/code&gt; and hit &lt;code&gt;KeyError: 'parts'&lt;/code&gt;. But the &lt;code&gt;KeyError&lt;/code&gt; became a log warning and the function returned None: yet another silent error.&lt;/p&gt;

&lt;p&gt;Fix: quadruple &lt;code&gt;max_output_tokens&lt;/code&gt; for &lt;code&gt;*-pro&lt;/code&gt; models (to compensate for the thinking budget), plus graceful handling of responses without &lt;code&gt;parts&lt;/code&gt; (treated as an empty string instead of an exception).&lt;/p&gt;
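&lt;p&gt;The graceful-handling half of the fix can look like this. The response shapes below are illustrative stand-ins for the Gemini REST payloads, not captured responses:&lt;/p&gt;

```python
# Extract text from a Gemini-style response, returning "" instead of
# raising KeyError when the thinking budget ate the output and
# candidates[0].content has no "parts" field.
def extract_text(data: dict) -> str:
    candidates = data.get("candidates") or []
    if not candidates:
        return ""
    parts = candidates[0].get("content", {}).get("parts") or []
    return parts[0].get("text", "") if parts else ""


normal = {"candidates": [{"content": {"parts": [{"text": "resposta"}]}}]}
exhausted = {"candidates": [{"content": {"role": "model"}}]}  # no "parts"

print(extract_text(normal))            # resposta
print(repr(extract_text(exhausted)))   # ''
```

&lt;p&gt;The caller can then treat an empty string as "collect nothing for this query" rather than crashing, or worse, silently returning None.&lt;/p&gt;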

&lt;h3&gt;
  
  
  Idempotency requires deterministic normalization of the key schema
&lt;/h3&gt;

&lt;p&gt;This one comes from a sibling bug in my &lt;code&gt;geo-finops&lt;/code&gt; package (unified LLM tracking for my ecosystem). When two callers recorded the same logical call in different formats (local Python with microseconds, a Next.js server with milliseconds), they slipped past dedup as "different rows". The UNIQUE constraint matches the literal timestamp string, not the semantic instant.&lt;/p&gt;

&lt;p&gt;Fix: a &lt;code&gt;_normalize_timestamp()&lt;/code&gt; that runs &lt;code&gt;datetime.fromisoformat(...).astimezone(timezone.utc).isoformat()&lt;/code&gt; before any INSERT. If you expose a key schema that includes a timestamp, normalize it, no exceptions. The PostgreSQL documentation will not remind you of this.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I learned (and am carrying into every other pipeline)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Green workflows lie.&lt;/strong&gt; Rephrasing: green workflows do not mean healthy pipelines. They mean the process finished. The difference between the two cost me 8 days of collection and nearly compromised a 90-day study.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;continue-on-error: true&lt;/code&gt; is technical debt disguised as resilience.&lt;/strong&gt; Use it sparingly, and never on steps that produce data. Cleanup steps, yes. Collection steps, never.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A key smoke test ≠ a "key exists in the env" check.&lt;/strong&gt; Verifying that &lt;code&gt;OPENAI_API_KEY&lt;/code&gt; is set says nothing about whether it is valid. Check 4 of my health check makes a minimal call to each provider: total cost ~$0.0001, value priceless.&lt;/p&gt;
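&lt;p&gt;The difference between the two checks is visible in a few lines. A sketch using OpenAI's public &lt;code&gt;/v1/models&lt;/code&gt; listing as the cheapest authenticated call; the error handling here is deliberately simplified:&lt;/p&gt;

```python
# Smoke test: not "is the key set?" but "does the provider accept it?"
import os
import urllib.request


def smoke_test_openai() -> bool:
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        return False  # the cheap check: key merely exists in the env
    req = urllib.request.Request(
        "https://api.openai.com/v1/models",
        headers={"Authorization": f"Bearer {key}"},
    )
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status == 200  # the real check: key is accepted
    except Exception:
        return False  # 401/403, network error, etc.


print("openai key valid:", smoke_test_openai())
```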

&lt;p&gt;&lt;strong&gt;Defense in depth &amp;gt; a single check.&lt;/strong&gt; Health check inside daily-collect (layer 1) + a separate workflow 4h later (layer 2) + a WhatsApp alert on any failure (layer 3) + granular retries in the collector (layer 4) + a tight budget with a hard stop (layer 5). If one layer fails, the next catches it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Double-checking requires real data, not mocks.&lt;/strong&gt; The 409 Conflict bug in &lt;code&gt;geo-finops&lt;/code&gt; (and the unnormalized-timestamp one) only surfaced when I ran real end-to-end tests. Mocks would have passed every check. The right path: run the real caller, validate each pipeline stage, re-run to validate idempotency, clean up after the test, add an automated regression.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Backfilling with altered timestamps is fraud.&lt;/strong&gt; If you are building longitudinal evidence, prefer the honest reset to the inflated sequence. Nine lost days hurt. Nine invented days invalidate the entire paper.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where this goes
&lt;/h2&gt;

&lt;p&gt;The new window starts tomorrow, April 8. Ninety days from now I should have ~23,000 real observations with temporal integrity, all with &lt;code&gt;raw_text&lt;/code&gt; preserved for reprocessing, models pinned for reproducibility, and false-positive calibration built in via 8 fictitious entities.&lt;/p&gt;

&lt;p&gt;The live dashboard is at &lt;a href="https://alexandrecaramaschi.com/research" rel="noopener noreferrer"&gt;https://alexandrecaramaschi.com/research&lt;/a&gt;. The code (including all of tonight's fixes) is at &lt;a href="https://github.com/alexandrebrt14-sys/papers" rel="noopener noreferrer"&gt;https://github.com/alexandrebrt14-sys/papers&lt;/a&gt;. The health check is runnable and auditable at &lt;code&gt;scripts/health_check.py&lt;/code&gt;: anyone who wants to replicate the methodology can run the 14 checks on their own fork.&lt;/p&gt;

&lt;p&gt;If you are building a longitudinal collection pipeline and still have no fail-loud step anywhere, do it today. Not tomorrow. The difference between catching the bug in an hour and catching it in 12 days is the difference between a post like this and a dead paper.&lt;/p&gt;

&lt;p&gt;I am counting down to July 6 and critical mass. Reports of similar bugs are welcome: my post-mortem is yours too.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Alexandre Caramaschi&lt;/strong&gt;, CEO of Brasil GEO, former CMO of Semantix (Nasdaq), co-founder of AI Brasil. Writes about Generative Engine Optimization, empirical LLM research, and pipeline infrastructure at &lt;a href="https://alexandrecaramaschi.com" rel="noopener noreferrer"&gt;https://alexandrecaramaschi.com&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>postmortem</category>
      <category>observability</category>
      <category>llm</category>
      <category>finops</category>
    </item>
    <item>
      <title>847 commits em 3 semanas: como vibe coding transformou um executivo de marketing em builder</title>
      <dc:creator>Alexandre Caramaschi</dc:creator>
      <pubDate>Sat, 04 Apr 2026 20:11:25 +0000</pubDate>
      <link>https://dev.to/alexandrebrt14sys/847-commits-em-3-semanas-como-vibe-coding-transformou-um-executivo-de-marketing-em-builder-bhl</link>
      <guid>https://dev.to/alexandrebrt14sys/847-commits-em-3-semanas-como-vibe-coding-transformou-um-executivo-de-marketing-em-builder-bhl</guid>
      <description>&lt;p&gt;Ha 3 semanas eu nao tinha uma unica linha de codigo publicada. Zero.&lt;/p&gt;

&lt;p&gt;Eu era um executivo de marketing com 20 anos de mercado — ex-CMO da Semantix na Nasdaq, cofundador da AI Brasil — mas nunca tinha escrito codigo de producao.&lt;/p&gt;

&lt;p&gt;Today I have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;13 GitHub repositories&lt;/strong&gt; with 847 commits&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;2 sites in production&lt;/strong&gt; with 100% uptime&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;40 free educational courses&lt;/strong&gt; published&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;A pipeline that orchestrates 5 AIs simultaneously&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;29,400 lines of Python&lt;/strong&gt; in a personal governance system&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;653 academic citations monitored&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;An OWASP audit&lt;/strong&gt; with 34 findings and 11 fixes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Monthly cost: zero dollars.&lt;/p&gt;

&lt;h2&gt;
  
  
  What vibe coding looks like in practice
&lt;/h2&gt;

&lt;p&gt;It is not asking an AI to build a website. It is a continuous technical conversation. You bring business vision and strategic decisions. The AI brings execution at a speed impossible for traditional teams.&lt;/p&gt;

&lt;p&gt;My flow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Define what needs to exist and why&lt;/li&gt;
&lt;li&gt;Claude Code writes, tests, and deploys&lt;/li&gt;
&lt;li&gt;Validate, adjust, correct course&lt;/li&gt;
&lt;li&gt;Next step&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Each iteration took minutes, not days.&lt;/p&gt;

&lt;h2&gt;
  
  
  The numbers
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Personal site (alexandrecaramaschi.com):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;429 commits, 124 pages, 27 API routes&lt;/li&gt;
&lt;li&gt;Full gamification (XP, streaks, badges, certificates)&lt;/li&gt;
&lt;li&gt;Semantic search with pgvector&lt;/li&gt;
&lt;li&gt;29 types of Schema.org JSON-LD&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Personal governance (29,400 lines of Python):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;WhatsApp answers 85% of queries without AI (deterministic, zero cost)&lt;/li&gt;
&lt;li&gt;An Itau PDF parser classifies 711 transactions into 30 categories&lt;/li&gt;
&lt;li&gt;6 calendars synchronized with a gap detector&lt;/li&gt;
&lt;li&gt;Automatic morning briefing at 7am&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Academic pipeline:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;7,010 lines, daily collection across 4 verticals&lt;/li&gt;
&lt;li&gt;653 citations about LLMs and Brazilian companies&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  5 lessons
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Start with a real problem, not with technology&lt;/li&gt;
&lt;li&gt;Document everything from day 1&lt;/li&gt;
&lt;li&gt;Security is not optional&lt;/li&gt;
&lt;li&gt;The cost of being wrong has dropped drastically&lt;/li&gt;
&lt;li&gt;The market will not wait for you to be ready&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;847 commits. 3 weeks. No team. Zero cost. And we are only getting started.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Alexandre Caramaschi, CEO of Brasil GEO, former CMO of Semantix (Nasdaq), co-founder of AI Brasil&lt;/em&gt;&lt;/p&gt;

</description>
      <category>vibecoding</category>
      <category>ai</category>
      <category>productivity</category>
      <category>beginners</category>
    </item>
    <item>
      <title>De 60 Issues para 14: Como Refatorei 194K Linhas com 5 IAs via Vibecoding</title>
      <dc:creator>Alexandre Caramaschi</dc:creator>
      <pubDate>Sat, 04 Apr 2026 19:00:08 +0000</pubDate>
      <link>https://dev.to/alexandrebrt14sys/de-60-issues-para-14-como-refatorei-194k-linhas-com-5-ias-via-vibecoding-4a0l</link>
      <guid>https://dev.to/alexandrebrt14sys/de-60-issues-para-14-como-refatorei-194k-linhas-com-5-ias-via-vibecoding-4a0l</guid>
      <description>&lt;p&gt;Uma sessao de trabalho. 70 commits. 10 repositorios. 194 mil linhas de codigo auditadas. 5 modelos de linguagem orquestrados. Custo total: US$60.&lt;/p&gt;

&lt;p&gt;Esse e o relato tecnico de como usei Vibecoding para transformar um ecossistema de automacoes pessoais em uma plataforma de governanca digital pronta para escalar com Google Ads.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Starting Point
&lt;/h2&gt;

&lt;p&gt;My project started with 21 thousand lines of Python, 6 synchronized sub-calendars, a WhatsApp Business webhook, and a SQLite database with 1,831 records. The system said NO ACCESS when the data was one SELECT away.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting the AI Out of the Way
&lt;/h2&gt;

&lt;p&gt;For deterministic data, generative AI is the problem. I implemented a three-layer pipeline: LLM-free keyword matching in under 100ms, LLM classification as a fallback, LLM generation as a last resort. 85 percent of queries never touch an LLM.&lt;/p&gt;

&lt;h2&gt;
  
  
  70 Commits Across 10 Repos
&lt;/h2&gt;

&lt;p&gt;GitHub issues: from 60 to 14. Tests: from 7 to 13. WhatsApp response time: from 3-8s to under 100ms. Documented tables: from 0 to 64.&lt;/p&gt;

&lt;h2&gt;
  
  
  5 LLMs Orchestrated
&lt;/h2&gt;

&lt;p&gt;Perplexity researches. GPT-4o drafts. Gemini analyzes. Groq classifies. Claude architects. 10 runs, 5 RFCs, US$60 total.&lt;/p&gt;

&lt;h2&gt;
  
  
  7 Lessons
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;For deterministic data, take the LLM out of the path&lt;/li&gt;
&lt;li&gt;Never trust the prompt to forbid behaviors&lt;/li&gt;
&lt;li&gt;A deploy is not a commit&lt;/li&gt;
&lt;li&gt;Document the data before scaling&lt;/li&gt;
&lt;li&gt;Orchestrate LLMs instead of depending on a single one&lt;/li&gt;
&lt;li&gt;Set up tracking before the ads&lt;/li&gt;
&lt;li&gt;Use vibe coding to accelerate, not to replace thinking&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;em&gt;Alexandre Caramaschi - CEO of Brasil GEO, former CMO of Semantix (Nasdaq), co-founder of AI Brasil.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>vibecoding</category>
      <category>python</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Como Implementei 30 Tipos de Schema JSON-LD e llms.txt Para Ser Citado por ChatGPT, Gemini e Claude</title>
      <dc:creator>Alexandre Caramaschi</dc:creator>
      <pubDate>Tue, 31 Mar 2026 08:54:17 +0000</pubDate>
      <link>https://dev.to/alexandrebrt14sys/como-implementei-30-tipos-de-schema-json-ld-e-llmstxt-para-ser-citado-por-chatgpt-gemini-e-claude-3ooc</link>
      <guid>https://dev.to/alexandrebrt14sys/como-implementei-30-tipos-de-schema-json-ld-e-llmstxt-para-ser-citado-por-chatgpt-gemini-e-claude-3ooc</guid>
      <description>&lt;h1&gt;
  
  
  How I Implemented 30 JSON-LD Schema Types and llms.txt to Get Cited by ChatGPT, Gemini, and Claude
&lt;/h1&gt;

&lt;p&gt;When I decided my site needed to be understood by AIs, not just humans, I realized I was facing a problem almost nobody was solving. Most developers still optimize exclusively for Google. But traffic from AI-generated answers, via ChatGPT, Gemini, Perplexity, and Claude, is already a reality. And those engines do not read your site the way Googlebot does.&lt;/p&gt;

&lt;p&gt;I needed two things: a &lt;strong&gt;structured identity card&lt;/strong&gt; that any machine could interpret (JSON-LD Schema) and a &lt;strong&gt;readable résumé&lt;/strong&gt; I would hand directly to the LLMs (llms.txt). This article documents exactly how I implemented both on &lt;a href="https://alexandrecaramaschi.com" rel="noopener noreferrer"&gt;alexandrecaramaschi.com&lt;/a&gt; and &lt;a href="https://brasilgeo.ai" rel="noopener noreferrer"&gt;brasilgeo.ai&lt;/a&gt;, with real code and verifiable results.&lt;/p&gt;

&lt;h2&gt;
  
  
  What JSON-LD Schema Is (and Why AIs Need It)
&lt;/h2&gt;

&lt;p&gt;Think of JSON-LD Schema as your &lt;strong&gt;page's identity card on the web&lt;/strong&gt;. When you meet someone, they tell you their name, where they work, what they do. JSON-LD does exactly that, but for machines.&lt;/p&gt;

&lt;p&gt;It is a block of structured data in JSON format that you place in your page's &lt;code&gt;&amp;lt;head&amp;gt;&lt;/code&gt;. It never appears visually to the user; it is invisible. But for search crawlers and the RAG (Retrieval-Augmented Generation) pipelines that feed LLMs, it is pure gold.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"@context"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://schema.org"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"@graph"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"@type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Organization"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Brasil GEO"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://brasilgeo.ai"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"founder"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"@type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Person"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Alexandre Caramaschi"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"jobTitle"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"CEO"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"@type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"WebSite"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Alexandre Caramaschi"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://alexandrecaramaschi.com"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The secret is the &lt;code&gt;@graph&lt;/code&gt;: instead of scattering multiple JSON-LD scripts across the page, I consolidate everything into a &lt;strong&gt;single graph&lt;/strong&gt;. That makes interpretation easier both for traditional search engines and for AI systems assembling context for answer generation.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 30 Schema Types I Implemented
&lt;/h2&gt;

&lt;p&gt;On &lt;a href="https://alexandrecaramaschi.com" rel="noopener noreferrer"&gt;alexandrecaramaschi.com&lt;/a&gt;, built with Next.js 16 + React 19 and 41 published articles, I implemented 30 Schema.org types organized in a single &lt;code&gt;@graph&lt;/code&gt;. Here is the complete list with each one's role:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Identity and Main Entity&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Organization&lt;/strong&gt;: defines Brasil GEO as an entity, with sameAs to Wikidata (Q138755989)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Person&lt;/strong&gt;: Alexandre Caramaschi with credentials, affiliations, and Wikidata (Q138755507)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;WebSite&lt;/strong&gt;: site metadata, SearchAction for internal search&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ProfilePage&lt;/strong&gt;: the "About" page as the entity's canonical profile&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Editorial Content&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Article&lt;/strong&gt;: each of the 41 articles with author, date, image&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;BlogPosting&lt;/strong&gt;: blog posts with datePublished and dateModified&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;TechArticle&lt;/strong&gt;: technical articles with proficiencyLevel&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;NewsArticle&lt;/strong&gt;: news-style content&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;HowTo&lt;/strong&gt;: step-by-step guides with structured steps&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;FAQPage&lt;/strong&gt;: frequently asked questions with mainEntity as an array&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Education and Courses&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Course&lt;/strong&gt;: GEO courses with provider and hasCourseInstance&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CourseInstance&lt;/strong&gt;: specific instances with dates and delivery mode&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;EducationalOrganization&lt;/strong&gt;: Brasil GEO as an education provider&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LearningResource&lt;/strong&gt;: complementary educational resources&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Media&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;VideoObject&lt;/strong&gt;: videos with thumbnailUrl, duration, uploadDate&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ImageObject&lt;/strong&gt;: structured images with contentUrl and caption&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MediaObject&lt;/strong&gt;: generic media objects&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Navigation and Structure&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;BreadcrumbList&lt;/strong&gt;: hierarchical navigation trail on every page&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SiteNavigationElement&lt;/strong&gt;: structured main menu&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ItemList&lt;/strong&gt;: ordered content lists (e.g., top articles)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CollectionPage&lt;/strong&gt;: collection pages (categories, tags)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Events and Interaction&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Event&lt;/strong&gt;: webinars, talks, and workshops&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Review&lt;/strong&gt;: structured service reviews&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ContactPoint&lt;/strong&gt;: contact channels with type and language&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Advanced SEO and AI&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Service&lt;/strong&gt;: services offered by Brasil GEO&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Offer&lt;/strong&gt;: offers tied to courses and services&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AggregateRating&lt;/strong&gt;: aggregate rating of services&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SpeakableSpecification&lt;/strong&gt;: passages optimized for voice reading&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ClaimReview&lt;/strong&gt;: claim verification (fact-checking)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DefinedTerm&lt;/strong&gt;: GEO glossary terms with formal definitions&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What llms.txt Is (the Résumé for AIs)
&lt;/h2&gt;

&lt;p&gt;If JSON-LD Schema is the identity card, &lt;strong&gt;llms.txt is the résumé you hand directly to the AI&lt;/strong&gt;. It is a plain-text file, hosted at the root of your domain (&lt;code&gt;/llms.txt&lt;/code&gt;), that summarizes your site's entire structure in a format LLMs can consume efficiently.&lt;/p&gt;

&lt;p&gt;While &lt;code&gt;robots.txt&lt;/code&gt; tells the crawler what it &lt;em&gt;may&lt;/em&gt; access, &lt;code&gt;llms.txt&lt;/code&gt; tells the LLM what it &lt;em&gt;should&lt;/em&gt; read and how your content is organized.&lt;/p&gt;

&lt;p&gt;On &lt;a href="https://brasilgeo.ai" rel="noopener noreferrer"&gt;brasilgeo.ai&lt;/a&gt;, built with Cloudflare Workers and 28 HTML articles, I maintain two files:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;llms.txt&lt;/strong&gt;: 258 lines, 23KB, a concise map with links and descriptions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;llms-full.txt&lt;/strong&gt;: 42KB, expanded content for LLMs with large context windows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The structure follows a simplified markdown format:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Brasil GEO&lt;/span&gt;
&lt;span class="gt"&gt;
&amp;gt; Consultoria especializada em Generative Engine Optimization (GEO).&lt;/span&gt;
&lt;span class="gt"&gt;&amp;gt; Ajudamos empresas a ganhar visibilidade em ChatGPT, Gemini,&lt;/span&gt;
&lt;span class="gt"&gt;&amp;gt; Perplexity e outros motores de IA generativa.&lt;/span&gt;

&lt;span class="gu"&gt;## Artigos&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;O Guia Completo de GEO&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="sx"&gt;https://brasilgeo.ai/artigos/guia-completo-geo&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;: Estratégias para otimizar conteúdo para motores de IA generativa.
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;Schema JSON-LD para IA&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="sx"&gt;https://brasilgeo.ai/artigos/schema-jsonld-ia&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;: Como estruturar dados para visibilidade em LLMs.

&lt;span class="gu"&gt;## Cursos&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;Fundamentos de GEO&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="sx"&gt;https://brasilgeo.ai/cursos/fundamentos-geo&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;: Curso introdutório sobre Generative Engine Optimization.

&lt;span class="gu"&gt;## Repositórios Open-Source&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;geo-checklist&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="sx"&gt;https://github.com/alexandrebrt14-sys/geo-checklist&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;: Checklist completo de GEO com 80+ itens verificáveis.
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;llms-txt-templates&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="sx"&gt;https://github.com/alexandrebrt14-sys/llms-txt-templates&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;: Templates reutilizáveis para llms.txt.
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;entity-consistency-playbook&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="sx"&gt;https://github.com/alexandrebrt14-sys/entity-consistency-playbook&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;: Playbook para consistência de entidades em GEO.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Practical Implementation in Next.js
&lt;/h2&gt;

&lt;p&gt;On alexandrecaramaschi.com, I created a &lt;code&gt;JsonLd.tsx&lt;/code&gt; component that renders the full graph in the &lt;code&gt;&amp;lt;head&amp;gt;&lt;/code&gt; via &lt;code&gt;layout.tsx&lt;/code&gt;. Here is the simplified version:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="c1"&gt;// components/JsonLd.tsx&lt;/span&gt;
&lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;JsonLdProps&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Record&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;unknown&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;[];&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;JsonLd&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;graph&lt;/span&gt; &lt;span class="p"&gt;}:&lt;/span&gt; &lt;span class="nx"&gt;JsonLdProps&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;jsonLd&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@context&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://schema.org&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@graph&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;

  &lt;span class="k"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;script&lt;/span&gt;
      &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"application/ld+json"&lt;/span&gt;
      &lt;span class="na"&gt;dangerouslySetInnerHTML&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;__html&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;jsonLd&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;/&amp;gt;&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In &lt;code&gt;layout.tsx&lt;/code&gt;, the component receives the graph assembled dynamically based on the route:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="c1"&gt;// app/layout.tsx&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;JsonLd&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@/components/JsonLd&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;buildGraph&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@/lib/schema&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;RootLayout&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;children&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;graph&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;buildGraph&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="c1"&gt;// monta Organization, Person, WebSite&lt;/span&gt;

  &lt;span class="k"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;html&lt;/span&gt; &lt;span class="na"&gt;lang&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"pt-BR"&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;head&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;JsonLd&lt;/span&gt; &lt;span class="na"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;graph&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt; &lt;span class="p"&gt;/&amp;gt;&lt;/span&gt;
      &lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;head&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;body&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;children&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;body&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;html&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each article page adds its own types to the graph (Article, BreadcrumbList, FAQPage), and the component consolidates everything into a single &lt;code&gt;&amp;lt;script type="application/ld+json"&amp;gt;&lt;/code&gt;.&lt;/p&gt;
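That consolidation step can be sketched as follows. This is a minimal illustration, not the project's actual code: the pageNodes parameter and the deduplication by @id are my assumptions about how per-page types might merge into the single @graph.

```javascript
// Minimal sketch (not the project's actual code): merge base entities with
// per-page nodes into one @graph, deduplicating by @id. The pageNodes
// parameter is an assumption for illustration.
function buildGraph(pageNodes = []) {
  const base = [
    { "@type": "Organization", "@id": "https://brasilgeo.ai/#org", "name": "Brasil GEO" },
    { "@type": "WebSite", "@id": "https://brasilgeo.ai/#website", "url": "https://brasilgeo.ai" },
  ];
  const seen = new Set();
  return [...base, ...pageNodes].filter((node) => {
    const id = node["@id"] ?? JSON.stringify(node);
    if (seen.has(id)) return false;
    seen.add(id);
    return true;
  });
}

// An article page contributes its own nodes on top of the base entities:
const graph = buildGraph([
  { "@type": "Article", "@id": "https://brasilgeo.ai/artigos/guia-completo-geo#article" },
]);
```

Deduplicating by @id keeps a page from registering the same entity twice when both the layout and the page declare it.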

&lt;h2&gt;
  
  
  Implementation on Cloudflare Workers
&lt;/h2&gt;

&lt;p&gt;On &lt;a href="https://brasilgeo.ai" rel="noopener noreferrer"&gt;brasilgeo.ai&lt;/a&gt;, llms.txt and llms-full.txt are served directly by the Cloudflare Worker. The logic is simple:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// worker.js&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;handleRequest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;URL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;pathname&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;/llms.txt&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;LLMS_TXT_CONTENT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Content-Type&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;text/plain; charset=utf-8&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Cache-Control&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;public, max-age=86400&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;pathname&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;/llms-full.txt&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;LLMS_FULL_TXT_CONTENT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Content-Type&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;text/plain; charset=utf-8&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Cache-Control&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;public, max-age=86400&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="c1"&gt;// ... demais rotas&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The 24-hour cache (&lt;code&gt;max-age=86400&lt;/code&gt;) delivers performance without sacrificing content freshness.&lt;/p&gt;

&lt;h2&gt;
  
  
  Verifiable Results
&lt;/h2&gt;

&lt;p&gt;Implementing JSON-LD Schema and llms.txt is not a theoretical exercise. Here are the concrete results:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Entity consistency score&lt;/strong&gt; validated automatically by &lt;code&gt;lint-content.js&lt;/code&gt; with 44+ checks per run, verifying that names, credentials, and affiliations are consistent across all content&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Wikidata presence&lt;/strong&gt;: Person (Q138755507) and Organization (Q138755989) linked via &lt;code&gt;sameAs&lt;/code&gt; in the Schema, creating an entity anchor that LLMs recognize&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;6 open-source repositories&lt;/strong&gt; on GitHub referenced in the llms.txt, creating distributed authority signals: &lt;a href="https://github.com/alexandrebrt14-sys/geo-checklist" rel="noopener noreferrer"&gt;geo-checklist&lt;/a&gt;, &lt;a href="https://github.com/alexandrebrt14-sys/llms-txt-templates" rel="noopener noreferrer"&gt;llms-txt-templates&lt;/a&gt;, &lt;a href="https://github.com/alexandrebrt14-sys/entity-consistency-playbook" rel="noopener noreferrer"&gt;entity-consistency-playbook&lt;/a&gt;, geo-taxonomy, geo-orchestrator, and landing-page-geo&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-LLM pipeline&lt;/strong&gt; with geo-orchestrator using 5 LLMs (Perplexity for research, GPT-4o for writing, Gemini for analysis, Groq for classification, Claude for review), ensuring the content it produces is optimized for multiple engines from the start&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Structured crosslinks&lt;/strong&gt; between the 41 articles on alexandrecaramaschi.com and the 28 on brasilgeo.ai, with mutual references that reinforce topical authority&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  A Step-by-Step Guide to Start Today
&lt;/h2&gt;

&lt;p&gt;If you want to implement JSON-LD Schema and llms.txt in your project, follow these 5 steps:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Define your main entity
&lt;/h3&gt;

&lt;p&gt;Create an &lt;code&gt;Organization&lt;/code&gt; or &lt;code&gt;Person&lt;/code&gt; Schema with &lt;code&gt;name&lt;/code&gt;, &lt;code&gt;url&lt;/code&gt;, &lt;code&gt;description&lt;/code&gt;, and &lt;code&gt;sameAs&lt;/code&gt; (LinkedIn, GitHub, Wikidata). This is the foundation for everything else.&lt;/p&gt;
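As a concrete starting point, a minimal entity of this kind might look like the object below. The Wikidata and GitHub URLs mirror the ones cited in this article; the exact fields on the live site may differ.

```javascript
// Illustrative Person entity for step 1; field values follow this article,
// but the live site's actual JSON-LD may differ.
const personEntity = {
  "@context": "https://schema.org",
  "@type": "Person",
  "name": "Alexandre Caramaschi",
  "url": "https://alexandrecaramaschi.com",
  "description": "Specialist in Generative Engine Optimization.",
  "sameAs": [
    "https://www.wikidata.org/wiki/Q138755507",
    "https://github.com/alexandrebrt14-sys",
  ],
};

// Serialized, this is what goes inside <script type="application/ld+json">:
const serialized = JSON.stringify(personEntity);
```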

&lt;h3&gt;
  
  
  2. Implement a single &lt;code&gt;@graph&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;Instead of multiple &lt;code&gt;&amp;lt;script type="application/ld+json"&amp;gt;&lt;/code&gt; blocks, consolidate everything into one &lt;code&gt;@graph&lt;/code&gt;. This avoids conflicts and simplifies maintenance.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Add types per page
&lt;/h3&gt;

&lt;p&gt;Each page should carry its specific types: &lt;code&gt;Article&lt;/code&gt; for posts, &lt;code&gt;FAQPage&lt;/code&gt; for FAQs, &lt;code&gt;Course&lt;/code&gt; for courses. Use the &lt;a href="https://validator.schema.org/" rel="noopener noreferrer"&gt;Schema.org Validator&lt;/a&gt; to verify them.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Create your llms.txt
&lt;/h3&gt;

&lt;p&gt;Start with the basic structure: a title, a description in a blockquote, and sections with links. Use the template from the &lt;a href="https://github.com/alexandrebrt14-sys/llms-txt-templates" rel="noopener noreferrer"&gt;llms-txt-templates&lt;/a&gt; repository as a starting point.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Automate validation
&lt;/h3&gt;

&lt;p&gt;Implement a lint script that checks entity consistency. The &lt;a href="https://github.com/alexandrebrt14-sys/entity-consistency-playbook" rel="noopener noreferrer"&gt;entity-consistency-playbook&lt;/a&gt; has a complete guide on how to do this.&lt;/p&gt;
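A lint check of this kind can be tiny. The sketch below is hypothetical (the real lint-content.js runs 44+ checks); it only shows the shape of a consistency rule: compare every mention against one canonical record.

```javascript
// Hypothetical sketch of an entity-consistency check; the real lint-content.js
// has 44+ checks. The canonical record and the patterns here are illustrative.
const canonical = { name: "Alexandre Caramaschi", role: "CEO" };

function lintEntityMentions(text) {
  const issues = [];
  // Flag common misspellings of the canonical name.
  if (/Caramashi|Caramasci/.test(text)) issues.push("inconsistent name spelling");
  // Flag a divergent role title immediately after the name.
  if (/Alexandre Caramaschi,\s*(?!CEO)\w+/.test(text)) issues.push("unexpected role title");
  return issues;
}
```

Running a check like this in CI is what turns entity consistency from a one-off cleanup into an enforced invariant.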

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;JSON-LD Schema and llms.txt are not passing trends: they are the visibility infrastructure for the generative AI era. If your site has no structured data that LLMs can interpret, you are invisible to a growing share of digital traffic.&lt;/p&gt;

&lt;p&gt;I started with one Schema type. Today I have 30. I started without llms.txt. Today I have two files totaling 65 KB of structured context. Every addition was incremental, testable, and verifiable.&lt;/p&gt;

&lt;p&gt;If you want a complete roadmap, the &lt;a href="https://github.com/alexandrebrt14-sys/geo-checklist" rel="noopener noreferrer"&gt;geo-checklist&lt;/a&gt; has more than 80 verifiable items for GEO. And the &lt;a href="https://github.com/alexandrebrt14-sys/entity-consistency-playbook" rel="noopener noreferrer"&gt;entity-consistency-playbook&lt;/a&gt; shows how to maintain the entity consistency that makes a real difference in AI citations.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Alexandre Caramaschi is CEO of &lt;a href="https://brasilgeo.ai" rel="noopener noreferrer"&gt;Brasil GEO&lt;/a&gt;, former CMO of Semantix (Nasdaq), and co-founder of AI Brasil. A specialist in Generative Engine Optimization, he helps companies get cited by ChatGPT, Gemini, Perplexity, and Claude.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>seo</category>
      <category>schema</category>
      <category>jsonld</category>
      <category>ai</category>
    </item>
    <item>
      <title>How we built a 36-course educational platform in 10 days, and what we learned along the way</title>
      <dc:creator>Alexandre Caramaschi</dc:creator>
      <pubDate>Sun, 29 Mar 2026 18:43:46 +0000</pubDate>
      <link>https://dev.to/alexandrebrt14sys/como-construimos-uma-plataforma-educacional-de-36-cursos-em-10-dias-e-o-que-aprendemos-no-caminho-4flm</link>
      <guid>https://dev.to/alexandrebrt14sys/como-construimos-uma-plataforma-educacional-de-36-cursos-em-10-dias-e-o-que-aprendemos-no-caminho-4flm</guid>
      <description>&lt;p&gt;Em 19 de março de 2026, commitamos a primeira linha de código do que viria a se tornar a plataforma educacional da Brasil GEO. Dez dias depois, tínhamos 36 cursos, 401 módulos, um sistema de gamificação completo e um painel administrativo com auditoria de segurança feita por cinco modelos de linguagem simultaneamente.&lt;/p&gt;

&lt;p&gt;This article documents that process, not as a showcase but as a case study. Every architectural decision carried consequences. Every incident exposed wrong assumptions. And every fix taught something that engineering handbooks rarely cover.&lt;/p&gt;

&lt;h2&gt;
  
  
  The initial thesis: education as authority infrastructure
&lt;/h2&gt;

&lt;p&gt;Brasil GEO was born as a Generative Engine Optimization consultancy: the discipline of making brands citable by ChatGPT, Gemini, and Perplexity. But consulting scales linearly. Education scales exponentially.&lt;/p&gt;

&lt;p&gt;The hypothesis was direct: by building a free, open educational platform about GEO, AI, and development, we would create three assets at once: technical authority in the eyes of LLMs, a base of engaged users, and a pipeline of qualified consulting leads.&lt;/p&gt;

&lt;p&gt;The roadmap was structured in five sequential stages, each unlocking the next:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stage 1: Solve Invisibility (60%)&lt;/strong&gt;&lt;br&gt;
Indexing, sitemap, IndexNow, security headers. We went from zero to 78 URLs submitted to three search engines.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stage 2: Eliminate Violations (70%)&lt;/strong&gt;&lt;br&gt;
Entity consistency. The same professional appeared as "Columnist" in one place and "CEO" in another, with divergent biographies across eight platforms. We fixed every one.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stage 3: Content Engine (80%)&lt;/strong&gt;&lt;br&gt;
The educational platform itself. 36 courses covering everything from basic Python to autonomous AI agents. 401 modules. 51 interactive questions. An XP system, 13 badges, streaks, and certificates.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stage 4: External Authority (20%)&lt;/strong&gt;&lt;br&gt;
Press, academia, backlinks. Five pitches written, one academic working paper in preparation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stage 5: Dominate the Niche (15%)&lt;/strong&gt;&lt;br&gt;
Knowledge Panel, SERP ranking, monetization. The long-term horizon.&lt;/p&gt;

&lt;h2&gt;
  
  
  The platform in numbers
&lt;/h2&gt;

&lt;p&gt;After 10 days of intensive development and 367 commits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;115,000 lines of TypeScript code in production&lt;/li&gt;
&lt;li&gt;344 TypeScript/TSX files&lt;/li&gt;
&lt;li&gt;36 courses with certification&lt;/li&gt;
&lt;li&gt;401 learning modules&lt;/li&gt;
&lt;li&gt;An estimated 140 hours of content&lt;/li&gt;
&lt;li&gt;51 interactive questions (QuizEngine)&lt;/li&gt;
&lt;li&gt;13 gamification badges&lt;/li&gt;
&lt;li&gt;46 articles published across 5 platforms&lt;/li&gt;
&lt;li&gt;16 admin routes (7 pages + 9 APIs)&lt;/li&gt;
&lt;li&gt;13 live data sources in the metrics dashboard&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The stack: Next.js 16, React 19, Tailwind CSS 4, Supabase (auth + database), Vercel (deploy), Resend (transactional email).&lt;/p&gt;

&lt;h2&gt;
  
  
  What broke, and what we learned
&lt;/h2&gt;

&lt;p&gt;No ambitious project survives contact with production without scars. We documented three significant incidents.&lt;/p&gt;

&lt;h3&gt;
  
  
  Incident 1: The silent corruption of accents (March 27)
&lt;/h3&gt;

&lt;p&gt;We wrote a script to fix accents in PT-BR text. It worked perfectly on visible text. But it also "fixed" URLs, turning &lt;code&gt;/educacao&lt;/code&gt; into &lt;code&gt;/educação&lt;/code&gt; (with cedilla and tilde). Fifty-five internal links broke simultaneously.&lt;/p&gt;

&lt;p&gt;The lesson: automation without scope limits is a gun aimed at your own foot. We made URL protection a permanent rule: slugs are always ASCII, and accents appear only in rendered text.&lt;/p&gt;

&lt;h3&gt;
  
  
  Incident 2: The rate limiter that blocked the entire site (March 29)
&lt;/h3&gt;

&lt;p&gt;We implemented rate limiting of 30 requests per minute as protection against abuse. The problem: we applied the limit to every route, including HTML pages, CSS, and JavaScript. A single page visit fires 15-20 asset requests, so two consecutive visits already exceeded the limit.&lt;/p&gt;

&lt;p&gt;Real users received an error JSON instead of the page. The site was inaccessible for 30 minutes until we diagnosed the cause.&lt;/p&gt;

&lt;p&gt;The fix: rate limiting exclusively on &lt;code&gt;/api/*&lt;/code&gt; routes, with the limit raised to 120 requests per minute.&lt;/p&gt;
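The scoping logic reduces to a path check before the counter is ever touched. This is an illustrative in-memory sketch with hypothetical names, not the production limiter:

```javascript
// Illustrative sketch of the fix: only API routes are counted against the
// limit; pages and static assets pass through. Names are hypothetical.
const LIMIT_PER_MINUTE = 120;
const hits = new Map(); // `${ip}:${minute}` -> request count

function isAllowed(ip, pathname, now = Date.now()) {
  if (!pathname.startsWith("/api/")) return true; // HTML/CSS/JS are never limited
  const key = `${ip}:${Math.floor(now / 60000)}`;
  const count = (hits.get(key) ?? 0) + 1;
  hits.set(key, count);
  return count <= LIMIT_PER_MINUTE;
}
```

In production this counter state should not live in memory on serverless, for exactly the reason documented in the audit section of this article.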

&lt;h3&gt;
  
  
  Incident 3: The infinite loop in the admin login (March 29)
&lt;/h3&gt;

&lt;p&gt;The admin panel had a layout that checked the user session and redirected to &lt;code&gt;/admin/login&lt;/code&gt; when unauthenticated. The problem: &lt;code&gt;/admin/login&lt;/code&gt; was a child of &lt;code&gt;/admin&lt;/code&gt;, so it inherited the same layout. The layout checked the session, found none, redirected to the login page, which triggered the layout again. Infinite loop.&lt;/p&gt;

&lt;p&gt;The solution required restructuring the directory architecture with Next.js Route Groups: a &lt;code&gt;(protected)&lt;/code&gt; folder for routes that require authentication, with the login page outside that structure.&lt;/p&gt;
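The resulting layout can be sketched as below; the file names are hypothetical, and the guard is reduced to its essential branch to show why the loop disappears once the login page no longer inherits it.

```javascript
// Hypothetical sketch of the route-group structure (Next.js App Router):
//
//   app/
//   └── admin/
//       ├── (protected)/      <- the layout here checks the session
//       │   ├── layout.tsx
//       │   └── page.tsx
//       └── login/            <- outside the group: no session check, no loop
//           └── page.tsx
//
// The guard in (protected)/layout.tsx reduces to this branch; the login
// page never runs it, so the redirect can no longer recurse.
function guard(session) {
  return session ? "render" : "redirect:/admin/login";
}
```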

&lt;h2&gt;
  
  
  The security audit with five LLMs
&lt;/h2&gt;

&lt;p&gt;We submitted the admin panel to a full audit using five language models in parallel: Claude Opus for architecture, GPT-4o for writing, Gemini for analysis, Perplexity for researching known vulnerabilities, and Groq for fast classification.&lt;/p&gt;

&lt;p&gt;The result was revealing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A critical authentication-bypass vulnerability: an old endpoint that checked the email without validating the password&lt;/li&gt;
&lt;li&gt;No CSRF protection on any of the admin endpoints&lt;/li&gt;
&lt;li&gt;In-memory rate limiters that reset on every deploy (ineffective in serverless)&lt;/li&gt;
&lt;li&gt;A logout that did not invalidate session cookies on the server&lt;/li&gt;
&lt;li&gt;Input validation based on manual &lt;code&gt;typeof&lt;/code&gt; checks, with no formal schema&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We fixed everything in a single session: removed the vulnerable endpoint, implemented CSRF protection via Origin/Referer validation, migrated the rate limiter to distributed Redis (Upstash), built a server-side logout that clears the SSR cookies, and replaced all manual validation with Zod schemas.&lt;/p&gt;
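Origin/Referer validation for CSRF can be sketched like this; ALLOWED_ORIGIN and the helper name are assumptions for illustration, not the project's actual code:

```javascript
// Sketch of CSRF protection via Origin/Referer validation, as described above.
// ALLOWED_ORIGIN and the helper name are assumptions, not the real config.
const ALLOWED_ORIGIN = "https://alexandrecaramaschi.com";

function isSameOrigin(headers) {
  const origin = headers.origin ?? null;
  if (origin) return origin === ALLOWED_ORIGIN;
  // Fall back to Referer when Origin is absent; reject when both are missing.
  const referer = headers.referer ?? "";
  return referer.startsWith(ALLOWED_ORIGIN + "/");
}
```

Rejecting requests with neither header is the conservative default for state-changing admin endpoints.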

&lt;h2&gt;
  
  
  What students get
&lt;/h2&gt;

&lt;p&gt;The platform is entirely free. Anyone can create an account, access the 36 courses, and track their progress. The gamification system is not cosmetic: badges, XP, and streaks create retention loops built on positive reinforcement.&lt;/p&gt;

&lt;p&gt;The courses span an arc from basic to advanced: development environment setup, Python, Node.js, GitHub, Claude Code, MCP (Model Context Protocol), advanced prompt engineering, SEO and GEO, autonomous AI agents, data with Python, and vertical courses for sectors such as healthcare, agribusiness, tourism, and law.&lt;/p&gt;

&lt;p&gt;Each course has a digital certificate issued automatically via API and delivered by email. The interactive quizzes validate real comprehension, not mere attendance.&lt;/p&gt;

&lt;h2&gt;
  
  
  Next steps
&lt;/h2&gt;

&lt;p&gt;Three immediate priorities define the next quarter:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Multi-factor authentication for administrators.&lt;/strong&gt; The TOTP infrastructure already exists as a stub. What remains is integrating the otplib library and generating QR codes for enrollment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Content scale via automated cross-posting.&lt;/strong&gt; A pipeline that publishes articles simultaneously to DEV.to, Medium, and Hashnode, with the canonical URL pointing to the main site.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;External authority.&lt;/strong&gt; Publishing the academic working paper on SSRN and Preprints.org. Sending pitches to Meio e Mensagem and technology outlets.&lt;/p&gt;

&lt;p&gt;The platform lives at &lt;a href="https://alexandrecaramaschi.com/educacao" rel="noopener noreferrer"&gt;alexandrecaramaschi.com/educacao&lt;/a&gt;. The full roadmap, with live metrics from 13 data sources, is at &lt;a href="https://alexandrecaramaschi.com/roadmap" rel="noopener noreferrer"&gt;alexandrecaramaschi.com/roadmap&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Building in public means accepting that the process is as valuable as the product. The three incidents documented above taught more about production engineering than any tutorial could.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Alexandre Caramaschi is CEO of Brasil GEO and former CMO of Semantix (Nasdaq). He writes about GEO, AI, and algorithmic visibility.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>brazildevs</category>
      <category>education</category>
      <category>nextjs</category>
      <category>security</category>
    </item>
    <item>
      <title>How 5 LLMs Built 9 Free Courses in One Afternoon: Multi-LLM Orchestration for Education</title>
      <dc:creator>Alexandre Caramaschi</dc:creator>
      <pubDate>Thu, 26 Mar 2026 20:24:53 +0000</pubDate>
      <link>https://dev.to/alexandrebrt14sys/how-5-llms-built-9-free-courses-in-one-afternoon-multi-llm-orchestration-for-education-4nl0</link>
      <guid>https://dev.to/alexandrebrt14sys/how-5-llms-built-9-free-courses-in-one-afternoon-multi-llm-orchestration-for-education-4nl0</guid>
      <description>&lt;p&gt;Last week, I published 9 free educational courses with 91 modules and approximately 19 hours of hands-on content. The total cost in AI APIs was $10.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;There is no free, integrated, Portuguese-language material that takes someone from absolute zero to mastering AI tools like Claude Code, MCP, and GEO (Generative Engine Optimization). Existing tutorials are fragmented, mostly in English, and lack practical context.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Architecture: Multi-LLM Orchestration
&lt;/h2&gt;

&lt;p&gt;I built a Python orchestrator that coordinates 5 language models working in parallel:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude Opus (Anthropic)&lt;/strong&gt; — task decomposition, architecture, and code generation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GPT-4o (OpenAI)&lt;/strong&gt; — long-form writing and copywriting&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gemini 2.5 Flash (Google)&lt;/strong&gt; — fast analysis and classification&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Perplexity Sonar&lt;/strong&gt; — live research with source citations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Llama 3.3 70B (Groq)&lt;/strong&gt; — ultra-fast summarization&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The pipeline operates in sequential waves: research, analysis, parallel writing, classification, architecture, code generation, and review.&lt;/p&gt;

&lt;p&gt;Each LLM has an adaptive score based on success rate (weight 0.6), cost (0.2), and latency (0.2). The system learns which model performs best for each task type.&lt;/p&gt;
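The weighting above can be written as a single function. This sketch assumes inputs normalized to [0, 1], with cost and latency inverted so that lower is better; the weights are the ones stated in the article, the function name is mine.

```javascript
// Sketch of the adaptive score described above. Inputs are assumed to be
// normalized to [0, 1]; cost and latency are inverted so lower is better.
// Weights (0.6 / 0.2 / 0.2) come from the article; the name is illustrative.
function adaptiveScore({ successRate, normCost, normLatency }) {
  return 0.6 * successRate + 0.2 * (1 - normCost) + 0.2 * (1 - normLatency);
}
```

Re-ranking providers by this score after every run is what lets the system learn which model fits each task type.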

&lt;h2&gt;
  
  
  The Real Numbers
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;6 courses created simultaneously by 6 parallel Claude Code CLI agents&lt;/li&gt;
&lt;li&gt;6,439 lines of code in approximately 15 minutes&lt;/li&gt;
&lt;li&gt;Build verified automatically before each deploy&lt;/li&gt;
&lt;li&gt;Automatic deployment via Vercel in under 90 seconds&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The 9 courses cover: VS Code, GitHub, Python, Node.js, Claude Code CLI, MCP with Chrome, Complete Setup, From SEO to GEO (with real data: 58.5% of searches are zero-click in 2025), and Technical Behind-the-Scenes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tech Stack
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Frontend&lt;/strong&gt;: Next.js 16 + React 19 + Tailwind CSS 4&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deploy&lt;/strong&gt;: Vercel (auto on push to master)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Progress tracking&lt;/strong&gt;: localStorage (no database needed)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Certificates&lt;/strong&gt;: Resend API for email delivery&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Design system&lt;/strong&gt;: Salesforce-inspired (accent #0176d3, radius 8px)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  FinOps and Cost Control
&lt;/h2&gt;

&lt;p&gt;The orchestrator includes built-in financial governance:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Budget guards: $5 per execution limit&lt;/li&gt;
&lt;li&gt;Rate limiting per provider (token bucket algorithm)&lt;/li&gt;
&lt;li&gt;Circuit breakers for provider resilience&lt;/li&gt;
&lt;li&gt;Daily limits per provider&lt;/li&gt;
&lt;li&gt;Total cost for all content: approximately $10&lt;/li&gt;
&lt;/ul&gt;
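The first two guards in the list can be sketched as follows; the class, constants, and numbers are illustrative, not the orchestrator's actual Python implementation.

```javascript
// Illustrative sketch of two guards from the list above: a token-bucket
// rate limiter per provider and a per-execution budget cap. Names and
// numbers are assumptions for illustration.
class TokenBucket {
  constructor(capacity, refillPerSecond) {
    this.capacity = capacity;
    this.tokens = capacity;
    this.refillPerSecond = refillPerSecond;
    this.last = 0;
  }
  take(now) {
    // Refill proportionally to elapsed seconds, capped at capacity.
    this.tokens = Math.min(this.capacity, this.tokens + (now - this.last) * this.refillPerSecond);
    this.last = now;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}

const BUDGET_LIMIT_USD = 5; // per-execution cap, per the list above

function withinBudget(spentUsd, nextCallUsd) {
  return spentUsd + nextCallUsd <= BUDGET_LIMIT_USD;
}
```

Checking the budget before each call, rather than after, is what keeps a runaway loop from overshooting the cap.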

&lt;h2&gt;
  
  
  Gamification
&lt;/h2&gt;

&lt;p&gt;Each course features:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Collectible badges (unique per course)&lt;/li&gt;
&lt;li&gt;Email-delivered certificates via Resend API&lt;/li&gt;
&lt;li&gt;Global cross-course progress bar&lt;/li&gt;
&lt;li&gt;CSS-only celebration animations&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Implications
&lt;/h2&gt;

&lt;p&gt;The cost of $10 to generate 19 hours of structured educational content redefines the economics of corporate education. The same process that created 9 courses could create 90. The limitation is no longer production capacity — it is curation and editorial quality.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Full portal: &lt;a href="https://alexandrecaramaschi.com/educacao" rel="noopener noreferrer"&gt;alexandrecaramaschi.com/educacao&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Behind-the-scenes course: &lt;a href="https://alexandrecaramaschi.com/educacao/bastidores" rel="noopener noreferrer"&gt;alexandrecaramaschi.com/educacao/bastidores&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All courses are 100% free. No paywall. No mandatory registration.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Alexandre Caramaschi — CEO at Brasil GEO | Former CMO at Semantix (Nasdaq) | Co-founder of AI Brasil&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>education</category>
      <category>llm</category>
      <category>webdev</category>
    </item>
    <item>
      <title>How We Used 5 LLM APIs and 25 AI Agents to Write a 60-Page Book in One Session</title>
      <dc:creator>Alexandre Caramaschi</dc:creator>
      <pubDate>Wed, 25 Mar 2026 22:55:38 +0000</pubDate>
      <link>https://dev.to/alexandrebrt14sys/how-we-used-5-llm-apis-and-25-ai-agents-to-write-a-60-page-book-in-one-session-39ei</link>
      <guid>https://dev.to/alexandrebrt14sys/how-we-used-5-llm-apis-and-25-ai-agents-to-write-a-60-page-book-in-one-session-39ei</guid>
      <description>&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;We wanted to produce a 60-page, 30,000-word book in Portuguese about four founders of Brazilian fintechs -- Augusto Lins (Stone), Andre Street (Stone/Teya), David Velez (Nubank), and Guilherme Benchimol (XP) -- told through their own reconstructed voices, narrated by Ram Charan. The book needed to feel like four real humans speaking, not like a chatbot paraphrasing Wikipedia.&lt;/p&gt;

&lt;p&gt;A single LLM call cannot do this. You get voice blending (everyone sounds the same by chapter three), factual hallucinations in biographical data, and zero structural coherence across 30k words. We needed an orchestration layer.&lt;/p&gt;

&lt;p&gt;The result: &lt;strong&gt;"5 Fundadores, 5 Segundos, 1 Futuro"&lt;/strong&gt; -- 30,329 words, 4 distinguishable voices, 8 chapters, 7 analytical notes, fact-checked against primary sources, published at &lt;a href="https://alexandrecaramaschi.com/founders" rel="noopener noreferrer"&gt;alexandrecaramaschi.com/founders&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Here is what the pipeline looked like, what broke, and what we learned.&lt;/p&gt;




&lt;h2&gt;
  
  
  Architecture: The 6-Engine Model
&lt;/h2&gt;

&lt;p&gt;The core insight: &lt;strong&gt;use each model for what it does best&lt;/strong&gt;, not one model for everything.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+-------------------+----------------------------------------+
|  ENGINE           |  ROLE                                  |
+-------------------+----------------------------------------+
|  Claude Opus      |  Orchestrator + narrative writing      |
|                   |  Voice personas, assembly, QA          |
|  Perplexity       |  Real-time web research                |
|  (Sonar Pro)      |  Fact-checking with verifiable sources |
|  Gemini 2.5 Pro   |  Full-manuscript coherence analysis    |
|                   |  (1M+ context window)                  |
|  ChatGPT GPT-4o   |  Creative variations: openings,        |
|                   |  titles, dialogue scenes               |
|  Groq/Llama 3.3   |  Fast rough drafts, PT-BR accent fix,  |
|                   |  rapid iteration                       |
|  Claude Sonnet    |  HTML/PDF formatting, React component, |
|                   |  Schema.org, deploy pipeline           |
+-------------------+----------------------------------------+
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Why not just Claude for everything? Three reasons:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Perplexity's web search&lt;/strong&gt; returns sources you can verify. LLMs trained on static data fabricate citations -- Perplexity anchors facts to real URLs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gemini's 1M+ context window&lt;/strong&gt; can read the entire manuscript in one pass and detect cross-chapter redundancies that no other model can see.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Groq's speed&lt;/strong&gt; (thousands of tokens/second) makes iteration cheap. Rough drafts that take Opus 90 seconds take Groq 3 seconds.&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  The Pipeline: 10 Phases, 43 Agent Calls
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;PHASE 0: BOOTSTRAP (Orchestrator)
  |  Generate 5 system prompts (1 per persona)
  |  Generate 8 chapter briefs
  |  Generate global style guide
  v
PHASE 1: DEEP RESEARCH (7 agents in PARALLEL)
  |  6x Perplexity: one dossier per founder + Charan + 2026 context
  |  1x Gemini: cross-analysis of all 6 dossiers -&amp;gt; convergence map
  v
PHASE 2: WRITING WAVE 1 -- Chapters 1-4 (9 agents in PARALLEL)
  |  4x Opus: each writes ONE founder's voice for chapters 1-4
  |  1x Opus: Charan writes Preface + Prologue + Notes #1-2
  |  1x GPT-4o: 12 alternative openings + 4 epigraphs
  |  2x Groq: fast rough drafts as raw material
  |  1x Gemini: real-time coherence monitor
  v
PHASE 3: WRITING WAVE 2 -- Chapters 5-8 (9 agents in PARALLEL)
  |  Same structure as Phase 2
  |  + Charan assembles chapters 1-4 (interleaving 4 voices)
  v
PHASE 4: MANUSCRIPT ASSEMBLY (1 Opus agent -- Charan)
  |  Interleave voices, write transitions, write Epilogue
  |  -&amp;gt; manuscrito_v1.md (~48,000 words raw)
  v
PHASE 5: CROSS-MODEL REVIEW (7 agents in PARALLEL)
  |  4x Opus: each founder-persona reads FULL manuscript
  |           "Does this sound like me? Any data wrong?"
  |  1x Perplexity: fact-check every number against live web
  |  1x Gemini: structural analysis (pacing, arcs, redundancy)
  |  1x Groq: fast PT-BR accent/grammar sweep
  v
PHASE 6: INTEGRATED REWRITE (1 Opus agent)
  |  Incorporate all 7 review reports
  |  Fix 19 factual errors, remove fabricated citations
  |  Resolve redundancies, equalize founder presence
  |  -&amp;gt; manuscrito_v2.md
  v
PHASE 7: MULTI-SPECIALIST POLISH (4 agents in PARALLEL)
  |  Opus: narrative flow + chapter hooks
  |  Groq: PT-BR final accent check
  |  Sonnet: Markdown formatting + metadata
  |  GPT-4o: final title selection + back-cover copy
  v
PHASE 8: FINAL QA (1 Opus agent)
  |  Full read-through simulating first-time reader
  |  13-point checklist (voices, hooks, Charan, accents, entities)
  |  -&amp;gt; manuscrito_final.md (30,329 words)
  v
PHASE 9: PUBLISH (3 Sonnet agents in PARALLEL)
  |  HTML + PDF generation
  |  React/Next.js component for /founders
  |  SEO: Schema.org Book markup, OG tags, sitemap
  v
PHASE 10: DEPLOY
  |  Vercel deploy + IndexNow
  |  Health check: /founders returns 200
  |  DONE
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Total: &lt;strong&gt;43 agent calls across 6 models on 5 APIs, with up to 9 agents running simultaneously&lt;/strong&gt;.&lt;/p&gt;
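
&lt;p&gt;The parallel waves above follow a simple fan-out/fan-in pattern: launch every agent of a phase concurrently, then wait for all of them before the next phase begins. A hedged sketch in Python -- the &lt;code&gt;call_agent&lt;/code&gt; stub stands in for real provider SDK calls:&lt;/p&gt;

```python
import asyncio

async def call_agent(provider: str, prompt: str) -> str:
    # Stub for a real provider API call; simulates latency and echoes a result
    await asyncio.sleep(0.01)
    return f"{provider}: draft for '{prompt}'"

async def run_wave(tasks: list[tuple[str, str]]) -> list[str]:
    # One wave = all agents of a phase launched concurrently,
    # gathered before the next phase starts (the phase barrier)
    return await asyncio.gather(*(call_agent(p, q) for p, q in tasks))

wave = [
    ("opus", "chapter 1, Augusto's voice"),
    ("opus", "chapter 1, Andre's voice"),
    ("gpt-4o", "12 alternative openings"),
    ("groq", "fast rough draft"),
]
results = asyncio.run(run_wave(wave))
```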




&lt;h2&gt;
  
  
  Quality Gates Between Phases
&lt;/h2&gt;

&lt;p&gt;Not every phase transition was automatic. We implemented quality gates -- checkpoints where the orchestrator evaluates whether output meets minimum criteria before proceeding.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;GATE 1 (after Phase 1 -&amp;gt; Phase 2):
  CHECK: Each dossier has &amp;gt;= 15 verified citations with sources
  CHECK: Convergence map identifies &amp;gt;= 5 shared patterns
  CHECK: No founder dossier is &amp;lt; 3,000 words
  FAIL ACTION: Re-run Perplexity with expanded queries

GATE 2 (after Phase 2 -&amp;gt; Phase 3):
  CHECK: Voice distinctiveness score (Gemini evaluates)
  CHECK: No two founders share &amp;gt; 30% identical phrasing
  CHECK: Each founder section is within 20% of target word count
  FAIL ACTION: Re-prompt specific founder agents with
               reinforced persona instructions

GATE 3 (after Phase 5 -&amp;gt; Phase 6):
  CHECK: Zero critical factual errors remaining
  CHECK: Fabricated citation count = 0
  CHECK: Redundancy score below threshold
  FAIL ACTION: Return to Phase 5 with targeted re-checks
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The gates prevented cascading errors. Without them, a weak dossier in Phase 1 would produce a weak chapter in Phase 2, which would produce a weak review in Phase 5. By catching problems early, we avoided expensive rewrites downstream.&lt;/p&gt;
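
&lt;p&gt;A quality gate reduces to a list of boolean checks that must all pass before the orchestrator advances. A minimal sketch of Gate 1, with illustrative dossier numbers (the project's actual gate code is not published):&lt;/p&gt;

```python
from dataclasses import dataclass

@dataclass
class Check:
    name: str
    passed: bool

def evaluate_gate(checks: list[Check]) -> tuple[bool, list[str]]:
    """A gate passes only when every check passes; the failure list
    tells the orchestrator which checks to re-run after a retry."""
    failures = [c.name for c in checks if not c.passed]
    return (len(failures) == 0, failures)

# Gate 1 from the article, with illustrative dossier stats
dossier = {"citations": 17, "shared_patterns": 6, "word_count": 4200}
gate1 = [
    Check("citations >= 15", dossier["citations"] >= 15),
    Check("patterns >= 5", dossier["shared_patterns"] >= 5),
    Check("words >= 3000", dossier["word_count"] >= 3000),
]
ok, failed = evaluate_gate(gate1)
```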




&lt;h2&gt;
  
  
  The System Prompt Architecture
&lt;/h2&gt;

&lt;p&gt;Each persona's system prompt was not a simple instruction -- it was a layered document with five components:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;LAYER 1: IDENTITY
  Who you are, your archetype, your emotional core

LAYER 2: VOICE RULES
  Sentence length distribution, vocabulary whitelist,
  vocabulary blacklist, rhetorical patterns

LAYER 3: ANTI-CONTAMINATION
  "You are NOT [other founder]. If you find yourself
   using [specific phrases], stop and rewrite."

LAYER 4: CHAPTER BRIEF
  What this specific chapter is about, what angle
  this founder brings, what tension to explore

LAYER 5: CONTEXT INJECTION
  Research dossier, convergence map, previous chapters
  (for Wave 2), coherence report
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The anti-contamination layer (Layer 3) was crucial. Without it, Augusto and Guilherme's voices converged within three chapters. With it, convergence was reduced but not eliminated -- which is why we still needed the cross-voice review in Phase 5.&lt;/p&gt;
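
&lt;p&gt;Assembling the five layers is mechanical: concatenate them in a fixed order, with the stable layers (identity, voice, anti-contamination) first and the per-chapter layers last. A sketch, with persona text paraphrased from the examples in this article:&lt;/p&gt;

```python
def build_persona_prompt(identity: str, voice_rules: str,
                         anti_contamination: str, chapter_brief: str,
                         context: str) -> str:
    """Concatenate the five layers in fixed order; the first three are
    stable across the whole book, the last two change per chapter."""
    layers = [
        ("IDENTITY", identity),
        ("VOICE RULES", voice_rules),
        ("ANTI-CONTAMINATION", anti_contamination),
        ("CHAPTER BRIEF", chapter_brief),
        ("CONTEXT", context),
    ]
    return "\n\n".join(f"## {name}\n{body}" for name, body in layers)

prompt = build_persona_prompt(
    identity="You are Augusto Lins, the engineer who became a humanist.",
    voice_rules="Longer sentences. Engineering metaphors. NPS references.",
    anti_contamination="You are NOT Andre Street. No war metaphors.",
    chapter_brief="Chapter 3: obsessive service as a loyalty moat.",
    context="[research dossier + convergence map injected here]",
)
```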




&lt;h2&gt;
  
  
  Voice Persona Engineering
&lt;/h2&gt;

&lt;p&gt;Each founder got a dedicated system prompt with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;PERSONA: Augusto Lins
ARCHETYPE: The Engineer Who Became a Humanist
VOICE: Measured, deep, quiet authority. Longer sentences.
VOCABULARY: "five seconds", "loyalty moat", "the Angels",
            "the most complex component is the human being"
THEMES: Obsessive service, late-career leap, NPS as compass
TENSION: The engineer who discovered the differentiator is not technology
FORBIDDEN: Never sound aggressive. Never use war metaphors.
           That is Andre's register, not yours.
MODEL: Claude Opus
CONTEXT: Full research dossier + ebook "5 Seconds for the Future"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Four personas, four distinct registers:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Founder&lt;/th&gt;
&lt;th&gt;Voice Signature&lt;/th&gt;
&lt;th&gt;Key Markers&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Augusto Lins&lt;/td&gt;
&lt;td&gt;Measured, reflective&lt;/td&gt;
&lt;td&gt;Engineering metaphors, domestic imagery, NPS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Andre Street&lt;/td&gt;
&lt;td&gt;Aggressive, percussive&lt;/td&gt;
&lt;td&gt;Short sentences, war language, "fire your ego"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;David Velez&lt;/td&gt;
&lt;td&gt;Analytical, contained&lt;/td&gt;
&lt;td&gt;VC vocabulary, "infinite game", strategic distance&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Guilherme Benchimol&lt;/td&gt;
&lt;td&gt;Vulnerable, confessional&lt;/td&gt;
&lt;td&gt;Marathon metaphors, admission of pain/shame&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The QA report confirmed all four voices were distinguishable without reading the founder's name -- which was our acceptance criterion.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Fact-Checking Pipeline
&lt;/h2&gt;

&lt;p&gt;This was the most sobering part of the project.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Perplexity found
&lt;/h3&gt;

&lt;p&gt;The fact-checker verified &lt;strong&gt;87 items&lt;/strong&gt; across the manuscript and found &lt;strong&gt;19 errors&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;7 critical&lt;/strong&gt; (wrong data that would embarrass the author)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;8 moderate&lt;/strong&gt; (imprecise data that could mislead)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;4 minor&lt;/strong&gt; (missing context, not wrong)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  5 fabricated citations
&lt;/h3&gt;

&lt;p&gt;The most dangerous failure mode: LLMs fabricate convincing quotes and attribute them to real people.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;FABRICATED CITATION #1:
  Text: "Give me thirty days. If you're not satisfied,
        I'll come here personally to pick up the machine."
  Attribution: Augusto Lins (at a bakery in Copacabana)
  Status: NOT VERIFIED. The bakery scene does not appear
          in any research dossier. Likely LLM fabrication.

FABRICATED CITATION #2:
  Text: "These people aren't asking for a credit card.
        They're asking to be treated like human beings."
  Attribution: Cristina Junqueira (Nubank co-founder)
  Status: NOT VERIFIED. Not in any dossier. Probably
          fabricated as "narrative reconstruction."

FABRICATED CITATION #5:
  Entire scene: "shopkeeper in rural Minas Gerais"
  (sick wife, 20 minutes on the line, microcredit)
  Status: NOT IN ANY DOSSIER. Fabricated anecdote.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The pattern: LLMs generate "too perfect" anecdotes that fit the narrative thesis exactly. They feel real because they are structurally plausible -- but they have no source.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lesson: every quote attributed to a real person must be cross-referenced against primary sources. LLMs cannot be trusted with attribution.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The David Velez education error
&lt;/h3&gt;

&lt;p&gt;One critical factual error: the manuscript stated Velez graduated from "Universidad de los Andes" in Colombia. The research dossier shows his undergraduate degree was from &lt;strong&gt;Stanford&lt;/strong&gt; (Management Science and Engineering, class of 2005). This is the kind of error that destroys credibility -- and it passed through multiple writing agents before the fact-checker caught it.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Redundancy Problem
&lt;/h2&gt;

&lt;p&gt;This was the &lt;strong&gt;hardest engineering challenge&lt;/strong&gt; -- harder than voice distinction, harder than fact-checking.&lt;/p&gt;

&lt;h3&gt;
  
  
  What happens when 4 agents write independently
&lt;/h3&gt;

&lt;p&gt;Four Opus instances, each writing as a different founder about the same themes, produce &lt;strong&gt;remarkably similar strong points&lt;/strong&gt;. The structural analysis (run by Gemini on the full manuscript) found:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;REDUNDANCY REPORT (selected):

"Fire your ego every morning" (Andre Street)
  -&amp;gt; Appears in: Ch.3, Ch.4, Ch.6, Ch.8
  -&amp;gt; Verdict: EXCESSIVE -- 4 occurrences

"Educate before you sell" (Guilherme Benchimol)
  -&amp;gt; Appears in: Ch.2, Ch.3, Ch.5, Ch.8
  -&amp;gt; Verdict: EXCESSIVE -- 4 occurrences

Angel traveling 50km at night to deliver a card machine:
  -&amp;gt; Appears in: Ch.3 AND Ch.5 with nearly identical details
  -&amp;gt; Verdict: DUPLICATE -- keep in Ch.3 only

Medellin kidnapping + shopping mall bomb (David Velez):
  -&amp;gt; Appears in: Prologue, Ch.1, Ch.6
  -&amp;gt; Verdict: 3 occurrences -- reduce to 2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Why this happens
&lt;/h3&gt;

&lt;p&gt;Each agent receives the same chapter brief and dossier. The strongest anecdotes -- the ones with the most narrative power -- get selected by every agent independently. The redundancy is not a bug in any single agent; it is an emergent property of parallel writing.&lt;/p&gt;

&lt;h3&gt;
  
  
  The fix
&lt;/h3&gt;

&lt;p&gt;We implemented a &lt;strong&gt;redundancy budget&lt;/strong&gt;: each catchphrase gets a maximum of 2 appearances in the book (first occurrence as revelation, second as deliberate callback). The third and fourth occurrences were cut or paraphrased during Phase 6.&lt;/p&gt;

&lt;p&gt;The broader principle: &lt;strong&gt;multi-agent writing requires a deduplication pass that no single agent can do alone&lt;/strong&gt;. Gemini's 1M+ context window was essential here -- it could read the entire manuscript and identify cross-chapter repetitions that individual agents, writing in isolation, could never see.&lt;/p&gt;
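
&lt;p&gt;The budget check itself is simple once something can see all chapters at once: count each catchphrase's appearances across the manuscript and flag anything over budget. A minimal sketch (function and variable names are ours, not the pipeline's):&lt;/p&gt;

```python
CATCHPHRASE_BUDGET = 2  # first use = revelation, second = deliberate callback

def redundancy_report(chapters: dict[str, str],
                      catchphrases: list[str]) -> dict[str, list[str]]:
    # Map each catchphrase to the chapters it appears in; anything over
    # budget gets cut or paraphrased during the integrated rewrite
    report = {}
    for phrase in catchphrases:
        hits = [ch for ch, text in chapters.items()
                if phrase.lower() in text.lower()]
        if len(hits) > CATCHPHRASE_BUDGET:
            report[phrase] = hits
    return report

chapters = {
    "ch3": "Fire your ego every morning, he said.",
    "ch4": "...fire your ego every morning...",
    "ch6": "Again: fire your ego every morning.",
    "ch8": "A quieter chapter.",
}
flagged = redundancy_report(chapters, ["fire your ego every morning"])
```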




&lt;h2&gt;
  
  
  The Voice Confusion Problem
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Chapters where two founders became indistinguishable
&lt;/h3&gt;

&lt;p&gt;The structural analysis flagged Chapters 3 and 5 as problem zones. In these chapters, Augusto Lins and Guilherme Benchimol's voices converged -- both reflective, both talking about customer service, both using similar vocabulary.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;VOICE ANALYSIS:

Augusto: Partially distinguishable
  Markers: engineer vocabulary, domestic imagery, longer sentences
  PROBLEM: In Ch.3 and Ch.5, sounds too much like Guilherme

Guilherme: Partially distinguishable
  Markers: marathon metaphors, confession of shame, financial refs
  PROBLEM: In Ch.3 and Ch.5, sounds too much like Augusto

Andre: Clearly distinguishable (always)
David: Clearly distinguishable (always)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The fix: intensify each persona's unique markers. Augusto gets more engineering language and NPS references. Guilherme gets more marathon/running metaphors and admissions of vulnerability. The rewrite in Phase 6 sharpened these distinctions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lesson: voice persona prompts are necessary but not sufficient. You need a cross-voice review pass where each persona reads the other three and flags convergence.&lt;/strong&gt;&lt;/p&gt;
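
&lt;p&gt;One cheap way to quantify convergence before asking a model to judge it: measure phrasing overlap between two founders' sections with n-gram Jaccard similarity. A sketch (the project's actual distinctiveness scoring, done by Gemini, is not published):&lt;/p&gt;

```python
def ngrams(text: str, n: int = 3) -> set:
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def phrasing_overlap(a: str, b: str) -> float:
    # Jaccard similarity over word trigrams: a cheap proxy for
    # "these two voices are reaching for the same phrasing"
    ga, gb = ngrams(a), ngrams(b)
    if not ga or not gb:
        return 0.0
    return len(ga & gb) / len(ga | gb)

augusto = "the most complex component is the human being and the machine serves him"
guilherme = "the most complex component is the human being running his own marathon"
score = phrasing_overlap(augusto, guilherme)  # nonzero: shared phrasing detected
```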




&lt;h2&gt;
  
  
  The Accent Pipeline Bug
&lt;/h2&gt;

&lt;p&gt;One assembly agent (responsible for merging four voices into interleaved chapters) &lt;strong&gt;dropped all Portuguese diacritical marks&lt;/strong&gt; from the output: "producao" instead of "produção", and so on for every accented word. The entire Part 1 manuscript came out accent-free.&lt;/p&gt;

&lt;p&gt;The fix was trivial (run &lt;code&gt;fix_accents.py&lt;/code&gt;), but the root cause was interesting: the assembly agent was processing so much text that its output quality degraded on surface-level features (accents, em-dashes) even as the narrative content remained good.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lesson: always run a dedicated accent/encoding check as a separate pipeline step, not as part of the writing agent's responsibilities.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The final QA report confirmed: &lt;strong&gt;zero words without proper PT-BR accents&lt;/strong&gt; in the published manuscript.&lt;/p&gt;
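
&lt;p&gt;The actual &lt;code&gt;fix_accents.py&lt;/code&gt; is not shown in the article; a hypothetical minimal version would map known de-accented forms back to their PT-BR spellings through a lookup table (ambiguous words would still need review):&lt;/p&gt;

```python
# Hypothetical reconstruction -- the real fix_accents.py is not published.
# A lookup table maps known de-accented forms back to proper PT-BR
# spellings; punctuation-attached and ambiguous words need extra care.
ACCENT_MAP = {
    "producao": "produção",
    "nao": "não",
    "voce": "você",
    "credito": "crédito",
}

def fix_accents(text: str) -> str:
    out = []
    for word in text.split(" "):
        fixed = ACCENT_MAP.get(word.lower())
        if fixed is None:
            out.append(word)                # unknown word: leave as-is
        elif word[:1].isupper():
            out.append(fixed.capitalize())  # preserve capitalization
        else:
            out.append(fixed)
    return " ".join(out)
```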




&lt;h2&gt;
  
  
  Chapter 7: The "Everyone Agrees" Problem
&lt;/h2&gt;

&lt;p&gt;The structural analysis flagged Chapter 7 (about AI) as lacking narrative tension:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Chapter 7 (AI): MEDIUM intensity
  Content relevant, but tone more essayistic than narrative.

  PROBLEM: All four founders say essentially the same thing:
  "AI is a tool, not a replacement." No tension, no disagreement,
  no risk. The chapter needs a moment of doubt or real failure.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When four agents are told "write what this founder thinks about AI," and all four founders are publicly optimistic about AI, you get four versions of the same optimistic take. The emergent pattern: &lt;strong&gt;multi-agent systems amplify consensus and suppress dissent&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The fix: we manually introduced a moment of doubt -- a concrete failure anecdote -- to create the tension the agents could not generate on their own.&lt;/p&gt;




&lt;h2&gt;
  
  
  The "Street Always Delivers First" Pattern
&lt;/h2&gt;

&lt;p&gt;An unexpected observation from the pipeline: Andre Street's persona consistently produced output faster and with more energy than the other three. His system prompt specified "aggressive, percussive, short sentences, urgency" -- and the writing agent internalized this as raw speed.&lt;/p&gt;

&lt;p&gt;The agents writing Augusto (measured, reflective) and David (analytical, strategic) produced longer, more deliberate text. Guilherme's agent produced the most emotionally charged text but took the longest to reach the word count.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The persona's urgency mapped to the agent's behavior.&lt;/strong&gt; We did not design this. The writing model (Opus) treated the persona's emotional register as an instruction about pacing. This has implications for agent design: persona engineering affects not just output quality but output characteristics like length, density, and generation speed.&lt;/p&gt;




&lt;h2&gt;
  
  
  Results
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Final word count&lt;/td&gt;
&lt;td&gt;30,329&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Total agent calls&lt;/td&gt;
&lt;td&gt;43&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;APIs used&lt;/td&gt;
&lt;td&gt;5 (Claude Opus, Perplexity, Gemini, GPT-4o, Groq/Llama)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Max parallel agents&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pipeline phases&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Factual errors caught&lt;/td&gt;
&lt;td&gt;19 (7 critical, 8 moderate, 4 minor)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fabricated citations caught&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Duplicate anecdotes removed&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Voice confusion zones fixed&lt;/td&gt;
&lt;td&gt;2 chapters&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Accent bug: words without diacriticals&lt;/td&gt;
&lt;td&gt;0 (after fix)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Total API cost&lt;/td&gt;
&lt;td&gt;Under $10&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Published at&lt;/td&gt;
&lt;td&gt;&lt;a href="https://alexandrecaramaschi.com/founders" rel="noopener noreferrer"&gt;alexandrecaramaschi.com/founders&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The estimated cost from the orchestration plan was $110-165 for the full 48,000-word target. The actual book came in at 30,329 words (we cut aggressively for quality), and the actual API spend was under $10.&lt;/p&gt;




&lt;h2&gt;
  
  
  Lessons Learned
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Redundancy is the primary failure mode of parallel multi-agent writing
&lt;/h3&gt;

&lt;p&gt;Not hallucination, not voice confusion -- redundancy. When N agents write about the same topic independently, they converge on the same strong points. You need a deduplication pass with a model that can see the entire manuscript at once.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Fact-checking must be a separate agent with web access
&lt;/h3&gt;

&lt;p&gt;LLMs hallucinate citations with high confidence. Perplexity's web-grounded search was the only reliable way to verify quotes and data points. 5 fabricated citations in 30,000 words is a 0.016% rate -- small in percentage, catastrophic in credibility.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Voice personas need cross-validation, not just prompts
&lt;/h3&gt;

&lt;p&gt;System prompts create initial voice distinction. But over 30,000 words, voices drift toward the mean. The fix is a review pass where each persona reads the full manuscript and flags where it sounds like another founder.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Use each model for its strength
&lt;/h3&gt;

&lt;p&gt;Opus for narrative depth. Perplexity for verified facts. Gemini for manuscript-level coherence. Groq for speed. GPT-4o for creative variations. Sonnet for code and formatting. No single model excels at all of these.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Multi-agent systems amplify consensus
&lt;/h3&gt;

&lt;p&gt;If all sources agree, all agents will agree, and the output will lack tension. Editorial judgment -- the decision to introduce conflict where the data shows none -- remains a human responsibility.&lt;/p&gt;

&lt;h3&gt;
  
  
  6. Persona urgency maps to agent behavior
&lt;/h3&gt;

&lt;p&gt;An aggressive, urgent persona prompt produces faster, shorter output. A reflective, measured persona prompt produces slower, longer output. This is not documented anywhere -- it is emergent behavior worth designing for.&lt;/p&gt;

&lt;h3&gt;
  
  
  7. Surface-level quality degrades under load
&lt;/h3&gt;

&lt;p&gt;An agent handling complex narrative assembly may drop accents, formatting, or em-dashes. Always run dedicated quality passes for surface features as separate pipeline steps.&lt;/p&gt;

&lt;h3&gt;
  
  
  8. The cost is negligible; the architecture is everything
&lt;/h3&gt;

&lt;p&gt;Under $10 in API calls for a 30,000-word, fact-checked, multi-voice book. The engineering cost is in the orchestration design, not the API spend.&lt;/p&gt;




&lt;h2&gt;
  
  
  The FinOps Perspective
&lt;/h2&gt;

&lt;p&gt;The original orchestration plan estimated $110-165 for the full 48,000-word target across 43 agent calls. Here is the breakdown by API:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;API                    Calls  Est. Tokens   Est. Cost
----------------------------------------------------
Claude Opus              19    ~1,500,000   $80-120
Perplexity Sonar Pro      7      ~350,000   $8-12
Gemini 2.5 Pro            4      ~800,000   $10-15
ChatGPT GPT-4o            3      ~200,000   $3-5
Groq Llama 3.3 70B        6      ~600,000   $1-2
Claude Sonnet             4      ~400,000   $8-10
----------------------------------------------------
TOTAL                    43    ~3,850,000   $110-165
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The actual spend came in under $10. Why the more-than-10x difference?&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Aggressive editing cut 18,000 words.&lt;/strong&gt; The manuscript went from a 48,000-word target to 30,329 published words. Less text = fewer generation tokens.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Groq is nearly free.&lt;/strong&gt; At $0.59/M input tokens, the 6 Groq calls cost pennies.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gemini's free tier covered our usage.&lt;/strong&gt; The 4 Gemini calls fit within Google's generous free allocation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;We reused outputs aggressively.&lt;/strong&gt; Dossiers from Phase 1 were passed to every subsequent phase without regeneration.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The cost per word of the final manuscript: approximately $0.0003. For context, a human ghostwriter charges $0.50-$2.00 per word for this type of work.&lt;/p&gt;




&lt;h2&gt;
  
  
  What We Would Do Differently
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Anti-redundancy briefs&lt;/strong&gt;: give each agent a list of anecdotes already claimed by other agents, updated in real-time as they write.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Adversarial voice testing&lt;/strong&gt;: before the full pipeline, run a blind test where a reviewer tries to identify which founder is speaking from unmarked excerpts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tension injection&lt;/strong&gt;: explicitly assign one agent the role of "dissenter" -- someone whose job is to find disagreements and introduce doubt.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Streaming coherence monitor&lt;/strong&gt;: instead of checking coherence after each wave, stream outputs to Gemini in real-time and get incremental feedback.&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Stack Reference
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Orchestrator&lt;/strong&gt;: geo-orchestrator (custom multi-model pipeline)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Primary writing&lt;/strong&gt;: Claude Opus 4.6 (Anthropic)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Research + fact-check&lt;/strong&gt;: Perplexity Sonar Pro&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Coherence analysis&lt;/strong&gt;: Gemini 2.5 Pro (Google)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Creative variations&lt;/strong&gt;: ChatGPT GPT-4o (OpenAI)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fast iteration&lt;/strong&gt;: Groq (Llama 3.3 70B)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Formatting + deploy&lt;/strong&gt;: Claude Sonnet (Anthropic)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Frontend&lt;/strong&gt;: Next.js 16 + React 19 + Tailwind 4&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hosting&lt;/strong&gt;: Vercel&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Published&lt;/strong&gt;: &lt;a href="https://alexandrecaramaschi.com/founders" rel="noopener noreferrer"&gt;alexandrecaramaschi.com/founders&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Alexandre Caramaschi is CEO of Brasil GEO, former CMO of Semantix (Nasdaq), and co-founder of AI Brasil. This article documents the technical pipeline behind "5 Fundadores, 5 Segundos, 1 Futuro," a multi-agent editorial production experiment.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>agents</category>
      <category>architecture</category>
    </item>
    <item>
      <title>The Invisible Brand Paradox: How Companies With Great Products Disappear in AI Search</title>
      <dc:creator>Alexandre Caramaschi</dc:creator>
      <pubDate>Tue, 24 Mar 2026 00:50:49 +0000</pubDate>
      <link>https://dev.to/alexandrebrt14sys/the-invisible-brand-paradox-how-companies-with-great-products-disappear-in-ai-search-307c</link>
      <guid>https://dev.to/alexandrebrt14sys/the-invisible-brand-paradox-how-companies-with-great-products-disappear-in-ai-search-307c</guid>
      <description>&lt;p&gt;There is a new kind of corporate crisis emerging -- one that does not show up in quarterly reports until it is too late. Companies with excellent products, strong customer satisfaction, and healthy revenue are discovering that they simply do not exist in the eyes of AI.&lt;/p&gt;

&lt;p&gt;Ask ChatGPT, Gemini, or Perplexity about their market category, and they are nowhere in the response. Their competitors -- sometimes with inferior products -- are cited, recommended, and explained in detail. The invisible company has better NPS scores, better retention rates, and better technology. But the AI does not know that.&lt;/p&gt;

&lt;p&gt;This is the Invisible Brand Paradox.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Paradox Defined
&lt;/h2&gt;

&lt;p&gt;The Invisible Brand Paradox occurs when a company with demonstrably strong products or services has zero or near-zero visibility in AI-generated responses. The paradox is that traditional success metrics -- revenue growth, customer satisfaction, market share -- do not correlate with AI visibility. A company can be the market leader by every conventional measure and still be completely absent from AI search results.&lt;/p&gt;

&lt;p&gt;This matters because AI-mediated discovery is rapidly becoming the primary channel through which buyers find solutions. According to Gartner's 2025 research, over 70% of B2B technology buyers use AI assistants during their purchasing research. If your brand is invisible to these AI systems, you are invisible to a growing majority of your potential customers.&lt;/p&gt;

&lt;p&gt;The paradox is particularly cruel because the companies most likely to suffer from it are often the ones most confident they do not need to worry. They have strong brands -- in the human sense. They rank well on Google. They win industry awards. But none of these achievements translate automatically into AI visibility.&lt;/p&gt;

&lt;h2&gt;
  
  
  Five Reasons AI Engines Ignore Good Brands
&lt;/h2&gt;

&lt;p&gt;After conducting over 50 entity audits across industries, we have identified five root causes that explain why strong brands become invisible in AI search.&lt;/p&gt;

&lt;h3&gt;Reason 1: No Structured Data&lt;/h3&gt;

&lt;p&gt;AI systems extract and verify facts from structured data far more reliably than from unstructured prose. A company that presents its expertise, offerings, and authority exclusively through marketing copy on web pages makes it extremely difficult for AI to understand what it does.&lt;/p&gt;

&lt;p&gt;Structured data includes Schema.org markup (Organization, Product, Service, Person schemas), JSON-LD, OpenAPI specifications for any APIs, and the emerging llms.txt standard -- a file specifically designed to help AI systems understand your organization.&lt;/p&gt;

&lt;p&gt;The absence of structured data is the single most common cause of AI invisibility. It is also the easiest to fix. Yet most companies, even technically sophisticated ones, have incomplete or absent Schema.org markup. Their websites look beautiful to humans but are semantically opaque to machines.&lt;/p&gt;
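&lt;p&gt;As a concrete starting point, the Organization markup described above can be expressed as JSON-LD. The sketch below generates it in Python; every value -- the "Acme Corp" name, the example.com URLs, the founding date -- is a placeholder to be replaced with your own canonical data.&lt;/p&gt;

```python
import json

# Minimal Organization JSON-LD sketch. All values are illustrative
# placeholders -- substitute your company's real, consistent data.
organization = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Acme Corp",            # use one canonical rendering everywhere
    "url": "https://www.example.com",
    "foundingDate": "2015",         # must match every directory listing
    "description": "One-sentence, factual description of what the company does.",
    "sameAs": [                     # profiles that tie the entity together
        "https://www.linkedin.com/company/acme-corp",
        "https://www.crunchbase.com/organization/acme-corp",
    ],
}

# Embed the output in a script element with type="application/ld+json"
# in the page head.
print(json.dumps(organization, indent=2))
```

&lt;p&gt;The &lt;code&gt;sameAs&lt;/code&gt; links matter as much as the fields themselves: they tell the model that all of those profiles describe one and the same entity.&lt;/p&gt;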

&lt;h3&gt;Reason 2: Entity Fragmentation&lt;/h3&gt;

&lt;p&gt;AI models build internal representations of entities -- companies, people, products, concepts. These representations are constructed by aggregating information from multiple sources. When the information is inconsistent, the model's entity representation becomes fragmented or ambiguous.&lt;/p&gt;

&lt;p&gt;Entity fragmentation occurs when your company name is rendered differently across platforms (Acme Corp on LinkedIn, ACME Corporation on Crunchbase, Acme on your website). When your founding date differs between sources. When your CEO's title is listed differently. When your product descriptions vary.&lt;/p&gt;

&lt;p&gt;Each inconsistency does not just create confusion -- it dilutes the strength of your entity signal. AI models handle ambiguous entities by lowering their confidence in them, and lower confidence means lower citation frequency. In practice, your brand gets mentioned less often or not at all.&lt;/p&gt;

&lt;p&gt;I have seen companies where three different founding years appeared across their web properties and directories. The AI model, unable to determine which was correct, simply avoided mentioning the company's history -- and by extension, reduced its overall authority signal.&lt;/p&gt;
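&lt;p&gt;Fragmentation of this kind is mechanically detectable. A minimal sketch, using entirely hypothetical source data ("Acme" and the two founding years below are invented examples):&lt;/p&gt;

```python
# Sketch: detect entity fragmentation across platforms. The sources and
# field values below are hypothetical placeholders, not real data.
sources = {
    "website":    {"name": "Acme",             "founded": "2015"},
    "linkedin":   {"name": "Acme Corp",        "founded": "2015"},
    "crunchbase": {"name": "ACME Corporation", "founded": "2014"},
}

def find_inconsistencies(sources):
    """Return each field whose values disagree across sources."""
    fields = {f for record in sources.values() for f in record}
    conflicts = {}
    for field in fields:
        values = {record[field] for record in sources.values() if field in record}
        if len(values) != 1:
            conflicts[field] = sorted(values)
    return conflicts

# Both 'name' and 'founded' disagree here, so both are flagged.
print(find_inconsistencies(sources))
```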

&lt;h3&gt;Reason 3: No Information Gain&lt;/h3&gt;

&lt;p&gt;Information gain is a concept from information theory that, in the GEO context, refers to whether your content provides knowledge that cannot be found elsewhere. AI models are trained on vast corpora. If your content merely restates what dozens of other sources already say, it provides zero incremental value to the model.&lt;/p&gt;

&lt;p&gt;Content with high information gain includes: original research with proprietary data, novel frameworks or methodologies, unique case studies with specific metrics, contrarian perspectives backed by evidence, and first-person expert analysis that synthesizes experience into actionable insight.&lt;/p&gt;

&lt;p&gt;Content with zero information gain includes: generic industry overviews, rephrased competitor content, listicles compiled from other listicles, and thought leadership that leads no thoughts.&lt;/p&gt;

&lt;p&gt;The irony is that many companies invest heavily in content marketing but produce exclusively low-information-gain content. They publish three blog posts per week, each one a variation of what every other company in their space is publishing. The volume is impressive. The AI impact is zero.&lt;/p&gt;

&lt;h3&gt;Reason 4: No External Authority&lt;/h3&gt;

&lt;p&gt;AI models do not just assess your content -- they assess what others say about you. External authority includes mentions in recognized publications, citations in academic or industry research, entries in authoritative directories (Wikipedia, Wikidata, Crunchbase), backlinks from high-authority domains, and consistent presence in industry analyst reports.&lt;/p&gt;

&lt;p&gt;A company that exists only on its own website and social media profiles has weak external authority. The AI model has only the company's self-description to work with, and self-descriptions are inherently less trustworthy than third-party validation.&lt;/p&gt;

&lt;p&gt;Building external authority is the new link building. But instead of optimizing for PageRank, you are optimizing for what we call Entity Authority -- the density and consistency of third-party references that confirm your expertise, existence, and relevance.&lt;/p&gt;

&lt;h3&gt;Reason 5: No Freshness Signals&lt;/h3&gt;

&lt;p&gt;AI models increasingly incorporate recency as a ranking factor, especially for topics that evolve rapidly (which includes most technology and business categories). A company whose most recent blog post is from 2024, whose press releases stopped in 2023, and whose social media has been dormant for months sends a clear signal: this entity may no longer be active or relevant.&lt;/p&gt;

&lt;p&gt;Freshness does not mean publishing daily. It means maintaining a consistent cadence of new, substantive content that signals ongoing expertise and activity. Companies that publish one genuinely original piece per month outperform those that published 100 derivative pieces two years ago.&lt;/p&gt;

&lt;h2&gt;Case Study Framework: A Step-by-Step Audit Process&lt;/h2&gt;

&lt;p&gt;To diagnose whether your company suffers from the Invisible Brand Paradox, we recommend a systematic audit process:&lt;/p&gt;

&lt;h3&gt;Phase 1: AI Visibility Baseline (Days 1-3)&lt;/h3&gt;

&lt;p&gt;Query the five major AI platforms (ChatGPT, Gemini, Perplexity, Copilot, Claude) with 20 questions that your target customers would ask. Document:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Does your brand appear in any response?&lt;/li&gt;
&lt;li&gt;When it appears, is the information accurate?&lt;/li&gt;
&lt;li&gt;Which competitors appear instead?&lt;/li&gt;
&lt;li&gt;What specific claims do the AI models make about your category?&lt;/li&gt;
&lt;/ul&gt;
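&lt;p&gt;The baseline sweep can be scripted. In the sketch below, &lt;code&gt;query_model&lt;/code&gt; is a deliberate stub -- each platform has its own client and API, and the two sample questions are invented examples of buyer-style queries:&lt;/p&gt;

```python
# Sketch of the Phase 1 baseline sweep. query_model is a placeholder for
# whatever client each platform requires; it is not a real API here.
PLATFORMS = ["ChatGPT", "Gemini", "Perplexity", "Copilot", "Claude"]
QUESTIONS = [
    "What are the best invoicing tools for small agencies?",
    "Which vendors lead the payroll software market?",
    # ... 18 more buyer-style questions in a real audit
]

def query_model(platform, question):
    """Placeholder: call the platform's chat API and return the text answer."""
    return ""  # stub for illustration only

def visibility_baseline(brand):
    """Count, per platform, how many answers mention the brand at all."""
    results = {}
    for platform in PLATFORMS:
        answers = [query_model(platform, q) for q in QUESTIONS]
        hits = sum(brand.lower() in a.lower() for a in answers)
        results[platform] = {"mentions": hits, "queries": len(QUESTIONS)}
    return results

print(visibility_baseline("Acme Corp"))
```

&lt;p&gt;Accuracy of the mentions still has to be judged by a human reader; the script only establishes presence or absence at scale.&lt;/p&gt;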

&lt;h3&gt;Phase 2: Entity Consistency Scan (Days 4-7)&lt;/h3&gt;

&lt;p&gt;Catalog every platform and directory where your company appears. For each, document: company name (exact rendering), description, founding date, leadership names and titles, key metrics, and product/service descriptions. Flag every inconsistency. Quantify the fragmentation score: number of inconsistencies divided by total data points checked.&lt;/p&gt;
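&lt;p&gt;The fragmentation score is a single division; the sketch below just makes the arithmetic explicit. The counts (9 inconsistencies out of 60 checked data points) are placeholder numbers:&lt;/p&gt;

```python
# Fragmentation score from Phase 2: inconsistencies found divided by
# total data points checked. The example counts are placeholders.
def fragmentation_score(inconsistencies, data_points_checked):
    if data_points_checked == 0:
        raise ValueError("no data points checked")
    return inconsistencies / data_points_checked

# e.g. 9 conflicting values across 60 checked fields on all platforms
score = fragmentation_score(9, 60)
print(f"{score:.0%}")  # prints 15%
```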

&lt;h3&gt;Phase 3: Content Information Gain Assessment (Days 8-14)&lt;/h3&gt;

&lt;p&gt;Review your 20 most recent published pieces. For each, answer: Does this contain any data, framework, or insight that cannot be found elsewhere? If the answer is no for more than 70% of your content, you have an information gain deficit.&lt;/p&gt;
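&lt;p&gt;The 70% threshold is easy to tally once the human review is done. In this sketch the yes/no judgments are placeholder booleans standing in for a real editorial review:&lt;/p&gt;

```python
# Sketch of the Phase 3 tally. Each boolean answers: does this piece
# contain a unique data point, framework, or insight? Values are placeholders.
reviews = [False] * 15 + [True] * 5   # 20 pieces, 15 with nothing unique

def low_gain_share(reviews):
    """Fraction of reviewed pieces with no unique data, framework, or insight."""
    low_gain = sum(1 for unique in reviews if not unique)
    return low_gain / len(reviews)

share = low_gain_share(reviews)
print(f"{share:.0%} low-gain")  # prints 75% low-gain
# Per the audit's rule of thumb, a share above 70% signals an
# information gain deficit.
```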

&lt;h3&gt;Phase 4: Structured Data Audit (Days 15-18)&lt;/h3&gt;

&lt;p&gt;Run your website through Schema.org validators. Check for: Organization schema, Person schemas for leadership, Product/Service schemas, Article schemas for blog content, FAQ schemas where applicable. Check whether you have an llms.txt file. Test your APIs (if any) for documentation quality.&lt;/p&gt;
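&lt;p&gt;Once a validator or crawler reports which Schema.org types a site exposes, the gap analysis is a set difference. The sketch below assumes a hypothetical &lt;code&gt;found&lt;/code&gt; set produced by whatever tool you run:&lt;/p&gt;

```python
# Sketch of a Phase 4 checklist. The `found` set is a placeholder for
# whatever your validator or crawler actually reports for the site.
REQUIRED = {"Organization", "Person", "Product", "Article", "FAQPage"}

def schema_gaps(found):
    """Return which required Schema.org types are missing from a site."""
    return sorted(REQUIRED - set(found))

found = {"Organization", "Article"}   # e.g. the only types detected
print(schema_gaps(found))             # prints ['FAQPage', 'Person', 'Product']
```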

&lt;h3&gt;Phase 5: External Authority Map (Days 19-21)&lt;/h3&gt;

&lt;p&gt;Document all third-party sources that mention your brand. Categorize by authority level: Tier 1 (Wikipedia, major publications, academic citations), Tier 2 (industry directories, analyst reports), Tier 3 (blogs, minor publications). Calculate your authority density: Tier 1 mentions divided by total mentions.&lt;/p&gt;
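&lt;p&gt;Authority density is again a single ratio. The mention list below is an invented example to show the mechanics:&lt;/p&gt;

```python
# Sketch of the Phase 5 authority-density calculation. The mentions and
# their tier assignments are illustrative placeholders.
mentions = [
    ("Wikipedia article",         1),
    ("Major-publication profile", 1),
    ("Industry directory entry",  2),
    ("Analyst report mention",    2),
    ("Guest post on a blog",      3),
]

def authority_density(mentions):
    """Tier 1 mentions divided by total mentions."""
    tier1 = sum(1 for _, tier in mentions if tier == 1)
    return tier1 / len(mentions) if mentions else 0.0

print(f"{authority_density(mentions):.0%}")  # prints 40%
```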

&lt;h2&gt;The 90-Day Turnaround: From Invisible to Cited&lt;/h2&gt;

&lt;p&gt;Based on our experience with entity remediation across multiple industries, a focused 90-day program can move a company from AI invisibility to consistent citation. Here is the framework:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Days 1-30: Foundation&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fix all entity inconsistencies across platforms&lt;/li&gt;
&lt;li&gt;Implement complete Schema.org markup&lt;/li&gt;
&lt;li&gt;Deploy llms.txt file&lt;/li&gt;
&lt;li&gt;Publish 4 high-information-gain pieces (one per week)&lt;/li&gt;
&lt;li&gt;Submit Wikidata entry if not present&lt;/li&gt;
&lt;/ul&gt;
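&lt;p&gt;For the llms.txt item above, the llms.txt proposal (llmstxt.org) describes a plain markdown file served at the site root. A minimal sketch -- every name, URL, and section below is a placeholder, and the format is still an emerging convention rather than a settled standard:&lt;/p&gt;

```markdown
# Acme Corp

&gt; One-sentence, factual summary of what Acme Corp does and for whom.

## Docs

- [Product overview](https://www.example.com/product): what the product does
- [Pricing](https://www.example.com/pricing): plans and tiers

## Company

- [About](https://www.example.com/about): founding date, leadership, locations
```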

&lt;p&gt;&lt;strong&gt;Days 31-60: Authority&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Secure 3-5 mentions in recognized industry publications&lt;/li&gt;
&lt;li&gt;Publish original research with proprietary data&lt;/li&gt;
&lt;li&gt;Update all directory listings for consistency&lt;/li&gt;
&lt;li&gt;Begin structured outreach to analysts and journalists&lt;/li&gt;
&lt;li&gt;Create comprehensive FAQ content addressing every question your customers ask&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Days 61-90: Amplification&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Publish contrarian thought leadership backed by your proprietary data&lt;/li&gt;
&lt;li&gt;Ensure all new content has maximum structured data markup&lt;/li&gt;
&lt;li&gt;Monitor AI citations weekly and adjust strategy&lt;/li&gt;
&lt;li&gt;Build programmatic accessibility (API documentation, structured catalogs)&lt;/li&gt;
&lt;li&gt;Conduct second AI visibility audit to measure progress&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Companies that execute this program consistently see a 40-60% improvement in AI citation frequency within the 90-day window. The improvement compounds: once AI models begin citing you, each new piece of authoritative content reinforces the citation pattern.&lt;/p&gt;

&lt;h2&gt;The Urgency&lt;/h2&gt;

&lt;p&gt;The Invisible Brand Paradox is solvable -- but the window for easy solutions is closing. As AI-mediated discovery becomes the default, the companies that establish entity authority early will enjoy compounding advantages. The models learn patterns: once they associate your brand with authoritative answers in your category, that association becomes self-reinforcing.&lt;/p&gt;

&lt;p&gt;Conversely, companies that remain invisible face a compounding disadvantage. Every month of AI invisibility is a month of training data where competitors are cited and you are not. The longer you wait, the deeper the deficit.&lt;/p&gt;

&lt;p&gt;You may have the best product. You may have the happiest customers. But if the AI does not know you exist, none of that matters to the buyers who are asking the AI what to buy.&lt;/p&gt;

&lt;p&gt;The invisible brand does not lose a competition. It never enters one.&lt;/p&gt;




&lt;h2&gt;Related Reading&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://brasilgeo.ai/v1/auditoria-entidade-digital" rel="noopener noreferrer"&gt;Digital Entity Audit: A Complete Guide&lt;/a&gt; -- Brasil GEO&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://brasilgeo.ai/v1/divida-dados" rel="noopener noreferrer"&gt;Data Debt: The Hidden Cost of Inconsistent Information&lt;/a&gt; -- Brasil GEO&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://caramaschi.hashnode.dev/entity-consistency-why-it-matters-for-ai-visibility" rel="noopener noreferrer"&gt;Entity Consistency: Why It Matters for AI Visibility&lt;/a&gt; -- Hashnode&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://medium.com/@alexandrecaramaschi/digital-entity-audit-the-complete-process" rel="noopener noreferrer"&gt;Digital Entity Audit: The Complete Process&lt;/a&gt; -- Medium&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Alexandre Caramaschi is CEO of Brasil GEO (brasilgeo.ai), the first Brazilian GEO consultancy. Former CMO at Semantix (Nasdaq), co-founder of AI Brasil. More at alexandrecaramaschi.com&lt;/em&gt;&lt;/p&gt;

</description>
      <category>geo</category>
      <category>ai</category>
      <category>branding</category>
      <category>marketing</category>
    </item>
  </channel>
</rss>
