<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Code Pocket</title>
    <description>The latest articles on DEV Community by Code Pocket (@code_pocket_99fdbc771).</description>
    <link>https://dev.to/code_pocket_99fdbc771</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3926038%2F28917d79-dfc7-4fc8-ab68-22c385fdccde.png</url>
      <title>DEV Community: Code Pocket</title>
      <link>https://dev.to/code_pocket_99fdbc771</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/code_pocket_99fdbc771"/>
    <language>en</language>
    <item>
      <title>Tracking podcast transcripts through 4 AI engines over 6 months</title>
      <dc:creator>Code Pocket</dc:creator>
      <pubDate>Tue, 12 May 2026 02:46:46 +0000</pubDate>
      <link>https://dev.to/code_pocket_99fdbc771/tracking-podcast-transcripts-through-4-ai-engines-over-6-months-14cb</link>
      <guid>https://dev.to/code_pocket_99fdbc771/tracking-podcast-transcripts-through-4-ai-engines-over-6-months-14cb</guid>
      <description>&lt;p&gt;The idea of using podcast transcripts as a GEO asset is older than GEO itself; transcripts have always been an SEO play. What's new, or newer, is whether transcripts function as a meaningful citation source for AI engines specifically. Over the last six months we've been quietly running a side experiment on this with a handful of clients, and the results have been split enough that I want to write them up before I forget the texture.&lt;/p&gt;

&lt;p&gt;The short version: transcripts work, sometimes, and the conditions under which they work are narrower than the marketing copy on transcript-as-a-service tools suggests.&lt;/p&gt;

&lt;h3&gt;
  
  
  The setup
&lt;/h3&gt;

&lt;p&gt;Three clients in our 12-client portfolio had podcasts of their own (founder-led, weekly to bi-weekly, established for at least 18 months pre-experiment). For each, we did the following over a six-month window starting in Q4 2025:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cleaned the transcripts (timestamps, speaker labels, punctuation, paragraph breaks) into a format we'd judge readable as a standalone article.&lt;/li&gt;
&lt;li&gt;Added introductory framing — a one-paragraph summary of each episode's topic and the named entities involved, written by us.&lt;/li&gt;
&lt;li&gt;Published the cleaned transcripts on each client's own domain under a transcripts subfolder.&lt;/li&gt;
&lt;li&gt;Added speaker-level entity markup where appropriate (a sketch of what this looks like follows the list).&lt;/li&gt;
&lt;li&gt;Did not republish on third-party platforms, partly to keep the experiment scoped, partly because of canonical concerns.&lt;/li&gt;
&lt;/ul&gt;
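
&lt;p&gt;For concreteness, here's a minimal sketch of the kind of speaker-level markup we mean, shown as a Python dict that serializes to JSON-LD. Every name, URL, and property choice below is hypothetical; whether any engine actually reads these specific properties is exactly the open question this experiment was poking at.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import json

# Hypothetical speaker-level markup for one cleaned transcript page.
# Names, URLs, and property choices are illustrative, not a documented
# requirement of any AI engine.
episode = {
    "@context": "https://schema.org",
    "@type": "PodcastEpisode",
    "name": "Episode 42: Pricing experiments that backfired",
    "url": "https://example.com/podcast/transcripts/episode-42",
    "author": {
        "@type": "Person",
        "name": "Jane Host",
        "sameAs": ["https://www.linkedin.com/in/jane-host-example"],
    },
    "contributor": [
        {
            "@type": "Person",
            "name": "Sam Guest",
            "jobTitle": "VP of Product",
            "worksFor": {"@type": "Organization", "name": "Example Corp"},
            "sameAs": ["https://www.linkedin.com/in/sam-guest-example"],
        }
    ],
}

print(json.dumps(episode, indent=2))
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The serialized output goes into a JSON-LD script tag on the transcript page. That's the whole intervention; it's deliberately boring.&lt;/p&gt;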

&lt;p&gt;We then tracked citation appearances of the transcript URLs across our four-engine test set over the following months.&lt;/p&gt;

&lt;h3&gt;
  
  
  What happened
&lt;/h3&gt;

&lt;p&gt;Across roughly 70 episodes covered in the experiment, transcript URLs appeared in citation rails on maybe 11% of queries where they were topically relevant. That's a hard number to compare cleanly because we didn't have a control set of comparable non-transcript content for the same clients in the same topics. It's directionally interesting, not statistically clean.&lt;/p&gt;

&lt;p&gt;The citations clustered heavily in two engines: Perplexity and Gemini. Both seemed willing to surface transcripts as primary sources for queries about specific people (the podcast guests) or specific phrases that appeared in the transcripts. ChatGPT (web on) cited transcripts much less often, and Google AIO almost never, in our test set.&lt;/p&gt;

&lt;p&gt;The pattern that seemed to predict whether a transcript got cited was, roughly: did the episode include a named expert making a specific, quotable claim that the AI engine could attribute? Episodes that were two co-hosts having a meandering conversation almost never got cited regardless of topic quality. Episodes with a guest making a clear, paraphrasable point cited well.&lt;/p&gt;

&lt;h3&gt;
  
  
  One thing that didn't work
&lt;/h3&gt;

&lt;p&gt;We tried generating "episode summaries" that pulled key claims out of each episode and listed them with bullet points and named-entity links. The hypothesis was that this would give engines an easier path to citing specific claims. It backfired modestly: in two of the three clients, the summaries themselves started getting cited instead of the transcripts they summarized. The transcripts dropped in surface rate; the summaries rose. The total citation rate per episode didn't change much; we'd just shifted which URL got picked.&lt;/p&gt;

&lt;p&gt;This is fine if you don't care which URL gets picked. It's less fine if your goal was to drive traffic to the transcript page specifically (which has the audio embed and the SEO history). We've since gone back to lighter framing paragraphs without bullet-point summaries on most clients.&lt;/p&gt;

&lt;h3&gt;
  
  
  The transcript-quality threshold
&lt;/h3&gt;

&lt;p&gt;Raw automated transcripts (the kind that come out of most podcast hosting platforms) didn't perform as well as cleaned transcripts. We don't have a clean A/B on this, but we have one client where we tested both formats on different episodes in the same series, and the cleaned versions cited at roughly twice the rate of the raw versions over the test window.&lt;/p&gt;

&lt;p&gt;The cleaning isn't elaborate. Punctuation, speaker labels, paragraph breaks, light copy-edit for filler words ("um," "you know," repeated phrases). Maybe 60-90 minutes per hour of audio when done by a person who knows the show. AI-assisted cleaning works for the mechanical parts but doesn't reliably catch where a paragraph break belongs based on conversational rhythm.&lt;/p&gt;

&lt;p&gt;I don't know whether the citation lift from cleaning is about engine parsing or about human-readable content reading better to whatever automated readability heuristic the engines use. Both are plausible. The agency I work with has defaulted to cleaning transcripts when the underlying podcast has enough audience to justify the cost. For shows below maybe 1,000 listens per episode the math gets harder.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why I'm holding the claim loosely
&lt;/h3&gt;

&lt;p&gt;Three clients with their own podcasts is not a sample size that supports strong claims. The 11% citation rate is the kind of number that could be a function of the specific topics those clients work in, or the specific guests they had on, or the freshness of the transcripts hitting Perplexity at the right moment.&lt;/p&gt;

&lt;p&gt;I'd want to see this tested across 15-20 clients with podcasts in different verticals before I'd recommend the strategy generally. As a thing to try in a portfolio where the audio content already exists, the cost-to-test ratio is decent. As a reason to start a podcast solely for GEO, I don't think the data supports that yet.&lt;/p&gt;

&lt;h3&gt;
  
  
  A surprise: chapter markers seemed to help
&lt;/h3&gt;

&lt;p&gt;One thing we'd added almost as an afterthought turned out to matter. We embedded chapter markers (with timestamps and short titles) at logical breakpoints in each transcript page. These were primarily for human readability and accessibility. Two of the three clients showed improved citation surfacing on the chapters with the most descriptive titles, where the engines appeared to use the chapter title as a hook for the surrounding text.&lt;/p&gt;

&lt;p&gt;We don't know whether this was the chapters specifically or whether the act of breaking a long transcript into labeled sections improved general parseability. Either way, the cost of adding chapter markers is small (15-30 minutes per episode after a transcript is cleaned) and the apparent return was non-trivial in our sample.&lt;/p&gt;

&lt;p&gt;A caution: I've seen agencies start to recommend "AI-friendly chapter markers" as a productized service, and I want to be careful about that framing here. Two clients, n equals a handful of measurable lifts, is interesting. It's not a service offering. If you try it on your own content and it works for you, please share what you find.&lt;/p&gt;

&lt;h3&gt;
  
  
  The unanswered question of canonical confusion
&lt;/h3&gt;

&lt;p&gt;One reason we kept the experiment scoped to first-party hosting (not syndicating transcripts to third-party platforms) is that we weren't confident about how engines handle canonical questions for the same content appearing in multiple places. If a transcript is on the podcast's site, on a third-party transcript service, on YouTube auto-captions, and on a guest's personal blog, which one gets cited?&lt;/p&gt;

&lt;p&gt;Anecdotally, we've seen engines pick non-canonical sources surprisingly often. The "official" hosted transcript on the client's domain isn't always the citation winner; sometimes a YouTube auto-caption page or a third-party transcript site shows up instead. We don't have a clean explanation for when this happens. The hypothesis is that older domains or domains with higher topical authority can win citations for content that semantically lives elsewhere.&lt;/p&gt;

&lt;p&gt;This complicates the strategic question. If your transcripts are going to get cited but the citations are going to a third-party site you don't control, the GEO win for your brand is partial at best. We've been considering a follow-up experiment that explicitly publishes the same transcript to multiple destinations and tracks where the citations land. It's on the roadmap. It's not done.&lt;/p&gt;

&lt;h3&gt;
  
  
  What I'd ask before doing this
&lt;/h3&gt;

&lt;p&gt;Four questions:&lt;/p&gt;

&lt;p&gt;Does the show already produce content that has named experts making clear claims, or is it mostly co-host conversation? The latter doesn't seem to cite well.&lt;/p&gt;

&lt;p&gt;Is there an existing audience for the show, or is this purely a content-asset play? Transcripts of podcasts that nobody listens to seem to still cite occasionally, but I'm less sure that the engines are stable about surfacing them long-term.&lt;/p&gt;

&lt;p&gt;Is the cleaning labor available? Raw transcripts underperform consistently in our small sample.&lt;/p&gt;

&lt;p&gt;Are you prepared for the canonical question? If multiple versions of the same content exist on the open web, the citation may not go to the one you want.&lt;/p&gt;

&lt;h3&gt;
  
  
  What I keep telling clients about this
&lt;/h3&gt;

&lt;p&gt;Podcast transcripts are not a GEO silver bullet. They're a moderately useful content asset, in narrow conditions, with a cost-to-test ratio that makes sense if you already have the audio. If you're starting from zero and considering whether to launch a podcast for GEO reasons, my honest answer is: launch a podcast if you have something to say and someone you'd want to interview. The GEO citations may or may not follow. Do it for the show first, the transcripts second. That's the order that has correlated with results in our small sample.&lt;/p&gt;

&lt;p&gt;If you've published transcripts and tracked citations, what cleaning level did you find was the practical minimum? I'm curious whether the 60-90 minute number generalizes or whether we're over-cleaning.&lt;/p&gt;

</description>
      <category>podcast</category>
      <category>transcripts</category>
      <category>geo</category>
      <category>aisearch</category>
    </item>
    <item>
      <title>The AI audit rep-curve: why 1 run gives you 67 percent reliability</title>
      <dc:creator>Code Pocket</dc:creator>
      <pubDate>Tue, 12 May 2026 02:41:13 +0000</pubDate>
      <link>https://dev.to/code_pocket_99fdbc771/the-ai-audit-rep-curve-why-1-run-gives-you-67-percent-reliability-2g5a</link>
      <guid>https://dev.to/code_pocket_99fdbc771/the-ai-audit-rep-curve-why-1-run-gives-you-67-percent-reliability-2g5a</guid>
      <description>&lt;p&gt;For most of 2025, the standard AI-search audit I saw from peer agencies looked the same: run a list of prompts once each, screenshot the outputs, code the citations, write the report. Sometimes the prompt list was thoughtful. Sometimes the engines were comprehensive. The methodology, though, almost always assumed that one run per prompt was enough.&lt;/p&gt;

&lt;p&gt;It isn't. We learned this slowly, then quickly, then expensively.&lt;/p&gt;

&lt;h3&gt;
  
  
  The pilot that broke our methodology
&lt;/h3&gt;

&lt;p&gt;Our first GEO audit, back in mid-2025, ran 30 prompts once each on four engines and shipped the report. The client made a budget decision based on it. A month later, doing a follow-up before any work had actually been implemented, we re-ran the same prompts and got materially different citation results on a notable share of them.&lt;/p&gt;

&lt;p&gt;The variance was bigger than the trend we'd been claiming. The report we'd shipped was, in retrospect, an artifact of a single-day snapshot of these engines' behavior. We hadn't lied; we'd just overstated our certainty.&lt;/p&gt;

&lt;p&gt;So we ran the structured experiment that produced the 800-run baseline. The point of the baseline wasn't to find a tier rate. It was to find out how many reps you needed before the tier rate stabilized.&lt;/p&gt;

&lt;h3&gt;
  
  
  What the rep curve looked like
&lt;/h3&gt;

&lt;p&gt;We ran each of our 40 baseline prompts on each of 4 engines, 5 times each (the 800 runs). For each prompt-engine pair, we asked: how does the modal tier code change as we add more reps? A sketch of the computation follows the list.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;After 1 rep: tier code "agrees with the 5-rep mode" about 67% of the time.&lt;/li&gt;
&lt;li&gt;After 2 reps (modal of two): about 78%.&lt;/li&gt;
&lt;li&gt;After 3 reps: about 88%.&lt;/li&gt;
&lt;li&gt;After 4 reps: about 95%.&lt;/li&gt;
&lt;li&gt;After 5 reps: 100% by definition, since the 5-rep mode is the reference.&lt;/li&gt;
&lt;/ul&gt;
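
&lt;p&gt;The computation behind those numbers is small enough to reproduce on your own data. Here's a sketch of one way to do it in Python, with made-up run data; the exact bookkeeping (how ties are broken, whether you average over subsets of reps or just take them in collection order) matters less than running it at all.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from collections import Counter
from itertools import combinations

# Made-up tier codes per rep, keyed by (prompt, engine).
# The real data set has 160 prompt-engine pairs; three shown for shape.
runs = {
    ("best crm for smb", "perplexity"): ["B", "B", "C", "B", "B"],
    ("best crm for smb", "chatgpt"):    ["C", "C", "C", "C", "C"],
    ("pricing tools 2026", "gemini"):   ["A", "C", "B", "B", "D"],
}

def mode(codes):
    # Most common tier code; ties go to the code seen first.
    return Counter(codes).most_common(1)[0][0]

def agreement_at_k(runs, k):
    # Share of prompt-engine pairs where the mode of k reps matches the
    # 5-rep mode, averaged over every k-sized subset of the five reps.
    hits, total = 0, 0
    for codes in runs.values():
        reference = mode(codes)
        for subset in combinations(codes, k):
            hits += int(mode(list(subset)) == reference)
            total += 1
    return hits / total

for k in (1, 2, 3, 4, 5):
    print(k, round(agreement_at_k(runs, k), 2))
&lt;/code&gt;&lt;/pre&gt;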

&lt;p&gt;A third of single-run audits, by this measure, return a tier code that doesn't match the underlying signal once you sample more deeply. That's the noise floor. Audits that don't account for it are presenting noise as if it were signal.&lt;/p&gt;

&lt;p&gt;We've since pre-registered 5 reps as our minimum for client-facing audits. The agency I work with has burned the report templates that used 1-shot data, partly to remove the temptation to fall back to them under deadline pressure.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why engines are this volatile
&lt;/h3&gt;

&lt;p&gt;A few reasons, none of them surprising once you see them:&lt;/p&gt;

&lt;p&gt;First, the engines are non-deterministic by design. Temperature, sampling, and routing decisions vary run to run. Even if the underlying retrieval is stable, the synthesized answer isn't.&lt;/p&gt;

&lt;p&gt;Second, the retrieval surface itself is volatile. Perplexity in particular re-queries the live web, and what gets surfaced on a Tuesday morning may not be what gets surfaced Thursday afternoon. Crawl freshness, server response times, and CDN caching all influence what's available to cite.&lt;/p&gt;

&lt;p&gt;Third, prompt phrasing has subtle effects. The same intent expressed two days apart by the same human can end up phrased slightly differently, and small phrasing changes can route to different sub-systems inside an engine. We've tried to control for this by holding prompt phrasing constant across reps; even doing that, output variance is meaningful.&lt;/p&gt;

&lt;h3&gt;
  
  
  The cost of more reps
&lt;/h3&gt;

&lt;p&gt;Running 5 reps instead of 1 is 5x the data collection effort. That's real. In our process, we've automated screenshot capture and citation extraction enough that the marginal cost per rep is mostly engine response time, not human time. Coding is still human. We've added a second coder on a subset of runs to measure inter-rater reliability, which adds further overhead.&lt;/p&gt;

&lt;p&gt;For clients, this affects pricing and timelines. A "fast audit" that promises results in three days using single-rep methodology is, in our view, selling a partial product. We've lost some prospective engagements where speed was the deciding factor. We've kept the engagements where the buyer cared about whether the audit told them something true.&lt;/p&gt;

&lt;h3&gt;
  
  
  What we'd do differently next time
&lt;/h3&gt;

&lt;p&gt;We'd start with a small replication study before any client work. Even a 10-prompt rep-curve study takes maybe a day and would have saved us the credibility cost of the early single-run reports. We didn't do that. We assumed the engines were more stable than they are.&lt;/p&gt;

&lt;p&gt;We'd also be more aggressive about reporting confidence ranges, not point estimates. The "23% A+B tier" number from our baseline has a meaningful confidence interval around it. We've started reporting that interval in client work. It's harder to communicate than a clean point estimate. It's more honest.&lt;/p&gt;
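
&lt;p&gt;The interval itself isn't exotic. Here's a sketch of the kind of calculation behind a range like 19-27%, using a standard Wilson score interval; the counts below are illustrative, not our exact sample.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import math

def wilson_interval(successes, n, z=1.96):
    # 95% Wilson score interval for a proportion.
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    margin = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return center - margin, center + margin

# Illustrative counts only: 92 A+B codes out of 400 coded prompt-engine runs.
low, high = wilson_interval(92, 400)
print(round(low, 3), round(high, 3))  # roughly 0.19 to 0.27
&lt;/code&gt;&lt;/pre&gt;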

&lt;h3&gt;
  
  
  What an honest audit deliverable looks like now
&lt;/h3&gt;

&lt;p&gt;Our standard audit deliverable has changed in three ways since we adopted the 5-rep minimum.&lt;/p&gt;

&lt;p&gt;First, every tier-rate number comes with a confidence range. "23% A+B tier, with a 95% confidence interval of roughly 19-27% given our sample size" is what we write now. The interval is wider than clients sometimes expect. We've found that the clients who push back on the interval are usually the ones we end up disappointing later; the clients who accept the interval as honest tend to be the ones we work with productively over the long run.&lt;/p&gt;

&lt;p&gt;Second, we explicitly call out tier shifts that occurred between reps. "On 14 of 40 prompts we observed at least one tier shift across the 5 reps, which means a single-run audit would have given a misleading code on those prompts" is the kind of sentence we now include. This makes the report longer and the reader's job harder. We think it's worth it.&lt;/p&gt;

&lt;p&gt;Third, we include a methodology section that names what we did and didn't control for. Pre-registration status. Whether the coder was blind. Whether prompts were paraphrased between reps. Whether the audit was run across time of day, time zone, account state, or other variables that might affect engine routing. Most of those answers are still "no, we didn't fully control for that," but writing them down keeps us honest about what we know.&lt;/p&gt;

&lt;h3&gt;
  
  
  A note on automation
&lt;/h3&gt;

&lt;p&gt;Five reps means more data to capture and code. We've leaned on automation for the capture side (screenshots, citation rail extraction, prompt logging) and kept humans on the coding side. We've experimented with using an LLM to do first-pass tier coding, and the results have been promising-but-not-yet-reliable: the LLM agrees with human coders on about 84% of records in our internal tests, which is good enough to be useful as a first pass but not good enough to ship unchecked.&lt;/p&gt;

&lt;p&gt;Our current workflow is: automated capture, LLM first-pass coding, human review with the LLM's coding visible as a suggestion, second human coder on a 20% sample for inter-rater reliability. This roughly doubles per-audit throughput compared to all-human coding without measurably degrading reliability in our spot checks. The agency I work with is still iterating on this stack and we've ruled out fully automated reporting for the foreseeable future. The cost of a confident-sounding wrong audit is too high.&lt;/p&gt;
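
&lt;p&gt;If you're building a similar pipeline, the inter-rater check doesn't need anything fancier than percent agreement plus Cohen's kappa. A minimal sketch with invented codes:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from collections import Counter

def cohens_kappa(coder_a, coder_b):
    # Cohen's kappa for two coders assigning categorical tier codes.
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Invented tier codes for a double-coded sample of ten records.
human_1 = ["A", "B", "B", "C", "E", "B", "A", "D", "C", "B"]
human_2 = ["A", "B", "C", "C", "E", "B", "A", "D", "C", "A"]
print(round(cohens_kappa(human_1, human_2), 2))
&lt;/code&gt;&lt;/pre&gt;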

&lt;h3&gt;
  
  
  Small n caveats and one open question
&lt;/h3&gt;

&lt;p&gt;Five reps is the minimum that worked in our setup. It's not necessarily the right minimum for everyone. If your prompt set has higher intrinsic variance (very ambiguous prompts, very volatile topics, very fresh news cycles), you may need more. If your prompts are tightly scoped factual questions about stable topics, you might get away with fewer, but I'd want to measure that before claiming it.&lt;/p&gt;

&lt;p&gt;The open question I haven't answered yet: does the rep-curve shape vary by engine? My intuition is that Perplexity needs more reps than ChatGPT, but I haven't seen the breakdown cleanly in our data. If anyone has run that comparison rigorously, I'd want to read the methodology.&lt;/p&gt;

&lt;p&gt;There's also a meta-question I keep coming back to. Five reps stabilizes the modal tier code, but the variance itself is information. A prompt where five reps return five different tiers tells you something different from a prompt where five reps all return the same tier. We've started reporting both the modal tier and a stability score per prompt. Whether clients find that useful is still an open question; some have, some haven't.&lt;/p&gt;
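
&lt;p&gt;If you want to try the same thing, a stability score can be as simple as the share of reps that match the modal tier. A sketch:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from collections import Counter

def stability(codes):
    # Share of reps that match the modal tier; 1.0 means perfectly stable.
    counts = Counter(codes)
    return counts.most_common(1)[0][1] / len(codes)

print(stability(["B", "B", "B", "B", "B"]))  # 1.0
print(stability(["A", "C", "B", "B", "D"]))  # 0.4
&lt;/code&gt;&lt;/pre&gt;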

&lt;p&gt;If you're auditing AI search performance for a client right now using single-run data, what would it take to get you to add a second pass? In our experience the answer was an embarrassing client follow-up. There's a cheaper way to learn this lesson.&lt;/p&gt;

</description>
      <category>aisearch</category>
      <category>auditmethodology</category>
      <category>statistics</category>
      <category>geo</category>
    </item>
    <item>
      <title>Entity disambiguation versus schema: which moved citations more</title>
      <dc:creator>Code Pocket</dc:creator>
      <pubDate>Tue, 12 May 2026 02:35:40 +0000</pubDate>
      <link>https://dev.to/code_pocket_99fdbc771/entity-disambiguation-versus-schema-which-moved-citations-more-1h3i</link>
      <guid>https://dev.to/code_pocket_99fdbc771/entity-disambiguation-versus-schema-which-moved-citations-more-1h3i</guid>
      <description>&lt;p&gt;The first time we tried to systematically disambiguate a client's entity references across their website, I expected it to be a polishing exercise. A week's work, modest gains, the kind of project you don't write a case study about. The data surprised me. In the same portfolio where we measured a 9-10% schema-attributable citation lift, the entity disambiguation work appears to have moved more tier-shifts than the schema work did.&lt;/p&gt;

&lt;p&gt;I want to be careful with this claim, because I'm not sure how reproducible it is. But the direction is consistent enough that the agency I work with has reordered our default engagement sequence. Entity work now happens before schema work in most engagements. Six months ago it was the other way around.&lt;/p&gt;

&lt;h3&gt;
  
  
  What "entity disambiguation" means in this context
&lt;/h3&gt;

&lt;p&gt;Two pages on the same site referring to a CEO by three different name spellings. A product whose internal docs called it "Atlas," whose marketing site called it "the Atlas Platform," and whose support docs called it "Atlas Suite." A founder who shared a name with an unrelated person who happened to be more famous in a different industry. A subsidiary that the parent company's site never explicitly linked to as a subsidiary.&lt;/p&gt;

&lt;p&gt;These are not exotic problems. We see them in every audit. They're the kind of thing that builds up because no single team owns naming conventions across an organization, and individually each inconsistency is harmless.&lt;/p&gt;

&lt;p&gt;The hypothesis behind disambiguation work is that AI engines, when parsing a page or pulling an answer, need to resolve "is this entity X or entity Y?" and the cost of that resolution is paid in confidence. A page that consistently and explicitly identifies its subject is easier to cite confidently than a page that's ambiguous about whether it's talking about the company, the product, the parent, or some homonym.&lt;/p&gt;

&lt;p&gt;That's the hypothesis. Here's what the data showed.&lt;/p&gt;

&lt;h3&gt;
  
  
  The before/after
&lt;/h3&gt;

&lt;p&gt;Across a subset of 8 clients where we did focused entity disambiguation work in Q4 2025 and held other variables roughly constant, we tracked citation tier on a stable set of 20 prompts per client (160 total prompts) for 4 weeks before the work and 8 weeks after.&lt;/p&gt;

&lt;p&gt;The aggregate A+B tier rate moved from 21% to 29%, a relative lift of about 38%. That number is larger than the schema lift on a smaller sample, which is exactly why I'm being careful about generalizing.&lt;/p&gt;

&lt;p&gt;Per-client variation was wide: one client showed no measurable lift, one showed a 60%+ relative improvement, the rest clustered in the 20-40% range. The one that showed no lift had been doing meticulous editorial QA for years and had relatively few entity-consistency issues to fix; their starting point was already clean.&lt;/p&gt;

&lt;h3&gt;
  
  
  What "disambiguation work" actually looked like
&lt;/h3&gt;

&lt;p&gt;Concretely, in this client subset:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Standardized name spellings across all pages (CEO, founders, product names, locations).&lt;/li&gt;
&lt;li&gt;Added structured organization, person, and product schema with consistent identifiers.&lt;/li&gt;
&lt;li&gt;Linked sameAs references to external authoritative profiles (LinkedIn, Crunchbase, official social, where appropriate); there's a sketch of this markup after the list.&lt;/li&gt;
&lt;li&gt;Disambiguated against known homonyms by adding clarifying context in the first paragraph of pages where confusion was plausible.&lt;/li&gt;
&lt;li&gt;Cleaned up internal anchor text so that links to a product page used consistent phrasing.&lt;/li&gt;
&lt;/ul&gt;
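
&lt;p&gt;For readers who haven't done this work before, here's a sketch of the Organization, Person, and sameAs layer, shown as a Python dict that serializes to JSON-LD. Every name and URL is invented; the only point is that each entity gets one canonical name plus external identifiers an engine can resolve against.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import json

# Invented entity markup for a company page after disambiguation work.
# One canonical name per entity, plus sameAs links to external profiles
# an engine can use to resolve which "Atlas" or which founder this is.
org = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example Corp",
    "url": "https://www.example.com",
    "sameAs": [
        "https://www.linkedin.com/company/example-corp",
        "https://www.crunchbase.com/organization/example-corp",
    ],
    "founder": {
        "@type": "Person",
        "name": "Jordan Founder",
        "jobTitle": "CEO",
        "sameAs": ["https://www.linkedin.com/in/jordan-founder-example"],
    },
    "owns": {
        "@type": "Product",
        "name": "Atlas Platform",
        "url": "https://www.example.com/atlas",
    },
}

print(json.dumps(org, indent=2))
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The serialized output goes into a JSON-LD script tag on the relevant pages; the harder part, as described below, is the body-text consistency.&lt;/p&gt;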

&lt;p&gt;None of this was creative work. It was inventory and cleanup. The total hours per client ranged from about 20 to about 80, depending on the size of the site.&lt;/p&gt;

&lt;h3&gt;
  
  
  The thing I was wrong about
&lt;/h3&gt;

&lt;p&gt;I'd assumed the biggest lift would come from sameAs and structured data. In our testing, the biggest single lift seems to have come from name consistency in the body text of pages — boring editorial work that doesn't involve any structured markup at all. The structured markup helped, but the editorial pass appears to have done more.&lt;/p&gt;

&lt;p&gt;This is uncomfortable because it means the highest-impact GEO work, for some clients, is just editing. Not strategy, not technical implementation, not content generation. Editing. The agency I work with has had to adjust how we talk about this work because clients sometimes recoil from paying for "editing" the way they don't recoil from paying for "schema implementation." Same hours, different framing, similar lift.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why I'm not fully confident yet
&lt;/h3&gt;

&lt;p&gt;The 38% relative lift is from a small sample with non-randomized assignment. The clients who got the focused disambiguation treatment were the ones where we'd already identified entity issues during initial audit, which means they had more room to improve. A randomized study would give cleaner numbers.&lt;/p&gt;

&lt;p&gt;The 8-week tracking window may also be too short to know whether the lift persists. Some of our schema lifts compressed over a longer window. Disambiguation might do the same.&lt;/p&gt;

&lt;p&gt;And the line between "disambiguation" and "general content cleanup" is fuzzier than I'd like. Some of what we counted as disambiguation work probably had collateral content improvements that helped citations independently.&lt;/p&gt;

&lt;h3&gt;
  
  
  A concrete pattern: the "named expert" lift
&lt;/h3&gt;

&lt;p&gt;One specific sub-pattern that showed up across multiple clients was about author and expert attribution. Pages that named the author with a verified profile (linked to a real LinkedIn, a real organization page, a real public bio) seemed to cite better than pages with no author or with vague "by the team" attribution.&lt;/p&gt;

&lt;p&gt;The relative lift on this specific change was on the order of 15-20% in the clients where we made the change. It's a small intervention. The cost is maybe an hour per page if the author bios already exist and are linkable. The cost is much higher if the underlying authors don't have credible public profiles, which is a separate problem we can't solve from outside.&lt;/p&gt;

&lt;p&gt;For B2B SaaS clients, this often means committing to an authorship strategy: who on the team has earned the right to be cited, what does their public profile look like, and how do we make their work findable. Some of our clients have been excited about this. A few have been uncomfortable, because it implies that the brand alone isn't enough; you need named people whose names can be tied back to verifiable expertise.&lt;/p&gt;

&lt;h3&gt;
  
  
  What this implies for engagement sequencing
&lt;/h3&gt;

&lt;p&gt;If you're scoping a GEO engagement, our updated default is to start with an entity audit before any schema work. If the entity layer is messy, the schema layer is decorating something that engines may not be able to parse confidently anyway. If the entity layer is clean, schema sits on top of it usefully.&lt;/p&gt;

&lt;p&gt;This is not the order I would have recommended a year ago. Order of operations matters, and we got it wrong for our first few engagements.&lt;/p&gt;

&lt;h3&gt;
  
  
  The relationship between entity work and brand
&lt;/h3&gt;

&lt;p&gt;There's a softer point underneath the technical work. Entity disambiguation forces an organization to decide what it is, precisely. When two pages refer to the same product by three different names, the problem isn't AI parsing. The problem is that the organization hasn't fully decided what to call its own thing. The disambiguation work is, in some ways, an excuse to have the conversation that should have happened during product naming and never quite did.&lt;/p&gt;

&lt;p&gt;That makes some of this work uncomfortable for clients. Marketing teams don't always have the authority to rename a product. Engineering teams may have technical reasons for the legacy names. Sales teams may have customer relationships built on familiarity with old terms. Getting to consistent entity references can require surfacing organizational debt that nobody wanted to deal with.&lt;/p&gt;

&lt;p&gt;We try to be honest with clients about this when we scope the work. "This is going to involve a few uncomfortable internal conversations" is a more accurate scope than "we'll clean up your entities." The first version sets expectations correctly. The second version sounds easier and ends up taking three times longer because nobody had warned the client about the political layer.&lt;/p&gt;

&lt;h3&gt;
  
  
  What we can't yet do
&lt;/h3&gt;

&lt;p&gt;We can't predict which clients will get the biggest lift from disambiguation before doing the audit. The client who showed no lift had a clean starting point, which we couldn't have known without auditing. The clients who showed 60%+ relative lifts had specific entity issues that weren't visible from the outside.&lt;/p&gt;

&lt;p&gt;We also can't promise the lift will hold over years. The disambiguation work we did in Q4 2025 still looks good in our most recent tracking, but we've only been measuring for about two quarters. The longer-run question is open.&lt;/p&gt;

&lt;p&gt;If you've done structured entity disambiguation work in your own GEO practice, did you see the same disproportionate lift? Or are we looking at a portfolio effect that won't generalize?&lt;/p&gt;

</description>
      <category>entity</category>
      <category>disambiguation</category>
      <category>aisearch</category>
      <category>citations</category>
    </item>
    <item>
      <title>B Corp certification at an agency: signal, friction, or both</title>
      <dc:creator>Code Pocket</dc:creator>
      <pubDate>Tue, 12 May 2026 02:30:03 +0000</pubDate>
      <link>https://dev.to/code_pocket_99fdbc771/b-corp-certification-at-an-agency-signal-friction-or-both-19d8</link>
      <guid>https://dev.to/code_pocket_99fdbc771/b-corp-certification-at-an-agency-signal-friction-or-both-19d8</guid>
      <description>&lt;p&gt;The most honest sentence I can write about our B Corp certification is that it has been useful in ways I didn't predict and friction-y in ways I didn't predict, and I am still not sure how to weight them against each other twelve months in.&lt;/p&gt;

&lt;p&gt;This isn't a defense of B Corp. It also isn't a takedown. It's an attempt to write down what the certification has actually done, operationally, in a small marketing agency that does GEO and AI-search work for B2B SaaS. If you're considering certification for your own agency, I'd rather you read this than the marketing copy on the official site.&lt;/p&gt;

&lt;h3&gt;
  
  
  What the certification actually changed about how we work
&lt;/h3&gt;

&lt;p&gt;The audit forced us to document things we'd been doing informally for years. Our hiring process, our supplier choices, our energy use, the way we structure equity, the way we handle client off-boarding. The B Impact Assessment is a long instrument, and you can't half-fill it without it being obvious. The act of filling it in cost about 90 hours of senior-team time spread over two months. It surfaced four things we were doing badly that we hadn't named: vague off-boarding contracts, no formal supplier diversity tracking, a weak parental leave policy that hadn't been updated since founding, and a vendor we'd been using whose data practices we'd stopped being comfortable with but hadn't acted on.&lt;/p&gt;

&lt;p&gt;We fixed three of those during the audit. The fourth (the vendor) we replaced over the following quarter. None of those changes were dramatic. All of them were overdue.&lt;/p&gt;

&lt;h3&gt;
  
  
  What clients have done with it
&lt;/h3&gt;

&lt;p&gt;Mixed.&lt;/p&gt;

&lt;p&gt;Some clients (a minority, maybe 20% of new business conversations over the last 12 months) bring up the B Corp status in their initial outreach, and a smaller subset (maybe 8%) cite it as a meaningful factor in choosing us. These are mostly procurement-driven processes at larger companies with their own sustainability commitments, or founder-led companies whose founders care about it personally.&lt;/p&gt;

&lt;p&gt;Most clients don't bring it up at all. Our certification has not been a significant lead generator in raw volume terms. It has, however, been a quiet de-frictioner in procurement conversations at enterprise-adjacent companies, where having third-party-verified governance documentation makes the legal and compliance teams' lives easier. That's a hard ROI to put a number on.&lt;/p&gt;

&lt;p&gt;A few clients have asked us pointed questions about whether the certification is meaningful or whether it's pay-to-play. That's a fair question. Our answer is that the audit is real, the standards are public, the questions are detailed, and reasonable people can disagree about how high the bar is. The certification doesn't make us a better agency. It documents that we meet a particular bar on a particular set of dimensions. That's all.&lt;/p&gt;

&lt;h3&gt;
  
  
  The friction
&lt;/h3&gt;

&lt;p&gt;A few things have been harder.&lt;/p&gt;

&lt;p&gt;Certain client conversations have been longer because we've ended up explaining the certification, often to people who'd heard of B Corp but conflated it with B2B, or with something else entirely. That's not the certification's fault, but it's a real time cost.&lt;/p&gt;

&lt;p&gt;We've also walked away from at least one engagement (the team voted on it) where the client's product touched a category that conflicted with our stated values. Pre-certification, I think we would have taken that work and rationalized it. Post-certification, we couldn't, and we lost the revenue. I'm at peace with that decision now. I wasn't at the time.&lt;/p&gt;

&lt;p&gt;Renewal is also non-trivial. The recertification cycle is real work, and the standards evolve. We're approaching our first renewal and the prep is consuming team time that could be billed.&lt;/p&gt;

&lt;h3&gt;
  
  
  What we got wrong about the launch
&lt;/h3&gt;

&lt;p&gt;When the certification came through we did the predictable thing: announcement post, badge in the email signature, line in the website footer. About three weeks of moderate internal celebration. Looking back, the marketing of the certification was probably more performative than was useful. The certification itself, in our experience, has been most valuable as an internal operating discipline. The external badge has been a smaller deal than we expected.&lt;/p&gt;

&lt;p&gt;The agency I work with would, I think, do the launch differently in retrospect: less about announcing, more about quietly building the practices and letting clients ask. We didn't get that right the first time.&lt;/p&gt;

&lt;h3&gt;
  
  
  Hiring effects
&lt;/h3&gt;

&lt;p&gt;The certification has had effects on hiring that I didn't predict. A noticeable share of our last six hires (I'd put it at four out of six, though I'd want to be careful claiming a clean cause) cited the B Corp status as a factor in their decision to apply or accept. None said it was the only factor. Several said it was one signal among several that we were "the kind of place" they wanted to work.&lt;/p&gt;

&lt;p&gt;This effect was strongest for mid-career hires (5-10 years experience) considering a move from a larger agency or in-house role. It was less of a factor for entry-level hires and roughly neutral for senior leadership candidates, where compensation and scope mattered more than mission framing.&lt;/p&gt;

&lt;p&gt;I'm cautious about over-claiming this. We can't run the counterfactual; we don't know who didn't apply because of the certification, or who would have applied either way. The honest read is that the certification probably tilts the candidate pool slightly toward people who'd self-select into mission-aligned work, which is a mixed blessing depending on the role.&lt;/p&gt;

&lt;h3&gt;
  
  
  Vendor and supplier effects
&lt;/h3&gt;

&lt;p&gt;The certification also influenced our vendor choices in ways that have been mostly positive and occasionally inconvenient.&lt;/p&gt;

&lt;p&gt;We replaced a longtime hosting vendor partway through the audit cycle when we couldn't satisfy ourselves that their data practices met the standard we wanted to hold ourselves to. The replacement cost us about 30 hours of migration work and a small monthly cost increase. It also cost us a casual professional relationship with the previous vendor, who took the switch personally even though we tried to explain it without blame.&lt;/p&gt;

&lt;p&gt;We've also been pickier about which freelancers we engage on subcontract work. The pickiness has narrowed the pool. The pool that remains is, on average, more reliable. Whether the reliability is a function of the values screen or just the smaller pool being self-selected, I can't fully separate.&lt;/p&gt;

&lt;h3&gt;
  
  
  What I'd tell another agency considering it
&lt;/h3&gt;

&lt;p&gt;If you're hoping the certification will be a marketing lever, our experience says it will be a small one. Maybe 8% of new business conversations weighted in our favor by certification status, in our specific niche, with our specific positioning.&lt;/p&gt;

&lt;p&gt;If you're hoping it will force you to document and improve internal operations, our experience says yes, it does, and the improvements outlast the audit cycle.&lt;/p&gt;

&lt;p&gt;If you're hoping it will protect you from making short-term decisions that conflict with your stated values, our experience says yes, but the cost is that you'll occasionally turn down revenue that another agency will take.&lt;/p&gt;

&lt;p&gt;If you're hoping it will help you hire mission-aligned people, our experience says modestly yes, mostly at mid-career levels.&lt;/p&gt;

&lt;p&gt;If you're hoping it will improve your supplier relationships, our experience says it'll change them in ways that are mostly net-positive but occasionally awkward.&lt;/p&gt;

&lt;h3&gt;
  
  
  The thing nobody talks about
&lt;/h3&gt;

&lt;p&gt;A B Corp certification is a public commitment, and public commitments interact with human nature in interesting ways. The first time we caught ourselves in a situation where the certification framework would say "no" and short-term commercial logic would say "yes," I didn't enjoy the conversation. It was easier to make the right call than I expected, partly because the framework existed and partly because the team had collectively committed to it. Without the framework, I think we would have rationalized a different answer.&lt;/p&gt;

&lt;p&gt;I keep returning to the question of whether we'd do it again. The honest answer is yes, but for different reasons than the ones we listed in the original pitch deck. The certification has been less of a flag and more of a fence: it's drawn a line we can't cross without noticing.&lt;/p&gt;

&lt;p&gt;If you've gone through the certification yourself, what surprised you the most? For us, it was how much of the value showed up internally and how little of it showed up in inbound leads.&lt;/p&gt;

</description>
      <category>bcorp</category>
      <category>certification</category>
      <category>agencyoperations</category>
      <category>marketing</category>
    </item>
    <item>
      <title>12 client portfolios, 12 months post-AIO: the traffic data</title>
      <dc:creator>Code Pocket</dc:creator>
      <pubDate>Tue, 12 May 2026 02:24:29 +0000</pubDate>
      <link>https://dev.to/code_pocket_99fdbc771/12-client-portfolios-12-months-post-aio-the-traffic-data-1ba7</link>
      <guid>https://dev.to/code_pocket_99fdbc771/12-client-portfolios-12-months-post-aio-the-traffic-data-1ba7</guid>
      <description>&lt;p&gt;Every few weeks someone forwards me a LinkedIn post that says AI Overviews killed SEO. The post usually has a screenshot of one site's traffic chart and a caption that reads like a eulogy. I have a folder of them now. I save them because the data underneath the claim, when I've been able to see it, almost never supports the eulogy.&lt;/p&gt;

&lt;p&gt;This isn't a defense of SEO as it was. It's a request for more precise language about what's actually happening, because the cost of the imprecise version is that marketing teams are making budget decisions based on vibes.&lt;/p&gt;

&lt;h3&gt;
  
  
  What we see in the data we can see
&lt;/h3&gt;

&lt;p&gt;Across the 12-client portfolio we track, organic traffic from Google in the 12 months following AIO's general rollout looks like this, in rough terms:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;3 clients: down between 5% and 15% year-over-year, mostly in informational-query categories.&lt;/li&gt;
&lt;li&gt;5 clients: roughly flat (within +/- 5%).&lt;/li&gt;
&lt;li&gt;4 clients: up between 8% and 30%, mostly in transactional and product-focused query categories.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The aggregate, weighted by traffic volume, is approximately flat to slightly down (about -3%). That is not a dead channel. That is a channel that's redistributing.&lt;/p&gt;
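
&lt;p&gt;One methodological note for anyone replicating this on their own portfolio: weighting by traffic volume means summing clicks across clients before computing the change, not averaging the per-client percentages. A sketch with invented numbers:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Invented per-client click totals, for shape only.
clients = [
    # (clicks prior year, clicks past year)
    (120_000, 104_000),   # down about 13%
    (80_000, 82_000),     # roughly flat
    (45_000, 58_000),     # up about 29%
]

before = sum(b for b, _ in clients)
after = sum(a for _, a in clients)
print(round((after - before) / before * 100, 1))  # portfolio-level change, percent
&lt;/code&gt;&lt;/pre&gt;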

&lt;p&gt;The informational-query traffic loss is real, and it tracks with what you'd expect: queries that get fully answered in the AIO box don't generate clicks. We've watched specific pages lose 40-60% of their click-through from positions where they used to draw consistent traffic, even when their average position didn't change. Position 1 in a world with AIO is not the same artifact it was in a world without it.&lt;/p&gt;

&lt;p&gt;But the inverse is also true: pages that are cited within the AIO box (linked sources) sometimes show higher click-through than they did at their old rankings, because the citation acts as an endorsement. We don't have enough cited-vs-not data to make that claim strongly across the portfolio yet, but we've seen it on individual pages clearly enough that I'm willing to say it in print.&lt;/p&gt;

&lt;h3&gt;
  
  
  What the per-page picture looks like
&lt;/h3&gt;

&lt;p&gt;To make the redistribution real, here's what we typically see when we pull a client's organic traffic and segment it by page intent over the 12-month window post-AIO.&lt;/p&gt;
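
&lt;p&gt;The segmentation itself is unglamorous. Here's a sketch of how you might bucket a Search Console export by page intent in pandas; the URL patterns are hypothetical and crude, and a real classification needs a human pass.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import pandas as pd

# Hypothetical GSC export with one row per page, clicks for two periods.
df = pd.DataFrame({
    "page": [
        "https://example.com/blog/what-is-revops",
        "https://example.com/pricing",
        "https://example.com/about",
        "https://example.com/blog/crm-vs-spreadsheet",
    ],
    "clicks_prior_year": [5200, 3100, 400, 2600],
    "clicks_past_year": [3100, 3400, 560, 2500],
})

def intent(url):
    # Crude path-based buckets; real classification needs a human pass.
    if "/pricing" in url or "/product" in url or "/demo" in url:
        return "transactional"
    if "/about" in url or "/careers" in url or "/customers" in url:
        return "brand"
    return "informational"

df["intent"] = df["page"].map(intent)
summary = df.groupby("intent")[["clicks_prior_year", "clicks_past_year"]].sum()
summary["change_pct"] = (
    (summary["clicks_past_year"] - summary["clicks_prior_year"])
    / summary["clicks_prior_year"] * 100
).round(1)
print(summary)
&lt;/code&gt;&lt;/pre&gt;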

&lt;p&gt;Informational pages (the "what is X" and "how does Y work" type) are down somewhere between 15% and 40% in click-through traffic, with the wider losses on pages that target queries where AIO produces a clean direct answer. Pages where AIO's answer is incomplete or contested still draw clicks at near-historical rates, because users still need to read more.&lt;/p&gt;

&lt;p&gt;Comparison pages ("X vs Y") are mixed: down modestly on the queries where AIO has confidently picked a winner, flat to up on queries where AIO presents both options and lets the user choose.&lt;/p&gt;

&lt;p&gt;Product, pricing, and demo pages are mostly flat to up. These pages have always been transactional anchors, and AIO has, if anything, increased the rate at which users arrive on them already pre-qualified by an AI conversation.&lt;/p&gt;

&lt;p&gt;Brand pages (about, careers, leadership) are quietly up across most of our portfolio, which we tentatively attribute to increased brand-query volume driven by AI surfaces.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why the narrative is so dramatic
&lt;/h3&gt;

&lt;p&gt;Two reasons, I think.&lt;/p&gt;

&lt;p&gt;First, the loss is concentrated in a specific kind of page (informational, long-tail, FAQ-style content that used to win on volume) and that kind of page is over-represented in the dashboards marketing teams check. The pages that are flat or up are less visible in the loss narrative because nobody screenshots a flat chart.&lt;/p&gt;

&lt;p&gt;Second, the timing coincided with a few unrelated Google updates that compressed organic visibility independently of AIO. Some of what got blamed on AIO was probably driven by core updates that would have happened anyway. Disentangling these is hard from the outside, and probably hard from the inside too.&lt;/p&gt;

&lt;h3&gt;
  
  
  One thing we got wrong in our own writing
&lt;/h3&gt;

&lt;p&gt;In a piece we wrote in mid-2025, we used the phrase "AI Overviews compress click-through across the board." Looking back at the data twelve months later, that claim doesn't survive. Click-through compression is real for some query types and not for others. Saying "across the board" was sloppy. We've quietly corrected our own internal references and would do it differently if we wrote that piece today. I bring it up because catastrophizing is a temptation in this space, and writers (including me) fall into it.&lt;/p&gt;

&lt;h3&gt;
  
  
  What teams should actually be measuring
&lt;/h3&gt;

&lt;p&gt;In our testing, the metrics that have replaced "rank tracking" as the useful indicators are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Citation tier on key queries (the A/B/C/D/E framework I keep mentioning).&lt;/li&gt;
&lt;li&gt;Click-through from AIO appearances when cited (when you can isolate this in GSC).&lt;/li&gt;
&lt;li&gt;Branded-query growth as a proxy for awareness gains driven by AI surfaces.&lt;/li&gt;
&lt;li&gt;Direct and referral traffic shifts on pages that have started showing up in AI citations.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of these are as easy as rank tracking. All of them are more informative.&lt;/p&gt;

&lt;h3&gt;
  
  
  Small n caveats
&lt;/h3&gt;

&lt;p&gt;12 clients is not a representative sample of the internet. Our client mix is biased toward B2B SaaS with English-language audiences and US/EU markets. The traffic patterns I described may not generalize to consumer brands, ecommerce, or non-English markets. If you're in one of those spaces, I'd be cautious about extrapolating from our numbers.&lt;/p&gt;

&lt;p&gt;The agency I work with has been pretty stubborn about not declaring SEO dead, and I'd be lying if I said that was purely an analytical position. We have clients whose SEO budgets pay our bills. We try to be honest about that bias and to let the data lead. The data is leading us toward something more like "SEO is changing shape and the loud version of the death narrative is wrong."&lt;/p&gt;

&lt;h3&gt;
  
  
  What "redistribution" looks like at the page level
&lt;/h3&gt;

&lt;p&gt;I want to make the redistribution concrete with one anonymized example, because aggregate numbers can hide where the action actually is.&lt;/p&gt;

&lt;p&gt;Pick a hypothetical B2B SaaS client with about 400 indexed pages. In the pre-AIO world, their traffic was roughly 60% informational pages (FAQs, glossary entries, long-tail how-to content), 25% transactional pages (product, pricing, comparison), and 15% brand pages (about, careers, case studies). Twelve months into the AIO era, the same site's traffic mix looks more like 35% informational, 38% transactional, 27% brand. Total volume is roughly flat. Informational lost roughly 40% of its absolute traffic. Transactional and brand both grew.&lt;/p&gt;

&lt;p&gt;That's redistribution, not death. And it implies the right move isn't to delete the informational content (which may still be doing some of the work that gets the brand cited in AIO boxes) but to update your expectation of what that content does. It's not a top-of-funnel traffic engine the way it used to be. It might be an AI-citation feeder. Those are different jobs. The page can sometimes do both.&lt;/p&gt;

&lt;h3&gt;
  
  
  What's actually fixable
&lt;/h3&gt;

&lt;p&gt;If your traffic is down and you're convinced AIO is the cause, the first question I'd ask is whether the loss is concentrated in informational query pages or distributed across all page types. The former is the AIO effect. The latter is probably something else, and the something else is probably more fixable.&lt;/p&gt;

&lt;p&gt;The "something else" we keep finding in audits is some combination of: technical issues that compounded during the past year while the team was distracted by AI, content cannibalization between pages targeting overlapping intent, link equity that drifted because of internal site restructures, or category-specific Google updates that the team missed because they were watching their AIO appearance rate.&lt;/p&gt;

&lt;p&gt;None of those are AIO. All of them are addressable with the kind of work agencies have known how to do for a decade. The dramatic narrative is hiding the boring fixes, which is the worst form of distraction.&lt;/p&gt;

&lt;h3&gt;
  
  
  What I want clients to ask us
&lt;/h3&gt;

&lt;p&gt;If you're a marketing leader hearing pitches from agencies about AI search, the question I'd want you to ask is: "show me the channel redistribution for a comparable client of yours, broken out by page type." If the answer is a hand-wave or a single screenshot of one chart, the agency hasn't done the work. If the answer is segmented and includes pages where traffic went up as well as pages where it went down, the agency probably has.&lt;/p&gt;

&lt;p&gt;That's not a magic question. It's just a question that's hard to answer with vibes.&lt;/p&gt;

&lt;p&gt;The honest path forward isn't a eulogy. It's an audit.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This field report was published by westOeast, a B Corp certified marketing agency working on generative engine optimization for B2B SaaS. The methodology, framework, and data described here come from internal audits at westOeast across our client portfolio in 2025-2026. More field notes at &lt;a href="https://www.westoeast.com" rel="noopener noreferrer"&gt;westoeast.com&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>seo</category>
      <category>aioverviews</category>
      <category>organictraffic</category>
      <category>b2b</category>
    </item>
    <item>
      <title>FAQ schema and AI citation lift: measuring, then attacking, a positive finding</title>
      <dc:creator>Code Pocket</dc:creator>
      <pubDate>Tue, 12 May 2026 02:18:45 +0000</pubDate>
      <link>https://dev.to/code_pocket_99fdbc771/faq-schema-and-ai-citation-lift-measuring-then-attacking-a-positive-finding-4p91</link>
      <guid>https://dev.to/code_pocket_99fdbc771/faq-schema-and-ai-citation-lift-measuring-then-attacking-a-positive-finding-4p91</guid>
      <description>&lt;p&gt;The first time we measured a citation lift from FAQ schema, my reaction was something like "great, write it up." That instinct is exactly how teams ship findings that don't hold. We waited, then we tried to break the finding. Part of it broke. Part of it didn't.&lt;/p&gt;

&lt;p&gt;This is the report.&lt;/p&gt;

&lt;h3&gt;
  
  
  The initial finding
&lt;/h3&gt;

&lt;p&gt;In a 12-client portfolio, across roughly 180 pages where we added FAQ schema to existing pages that already had FAQ-style content in the visible HTML, we measured a 14% relative lift in A+B tier citations over an 8-week window after deployment. The control was an internal A/B-style split where roughly half of comparable pages on the same domains got the schema and half didn't, with the assignment based on publication date (older half got it, newer didn't) to avoid biasing toward fresher content.&lt;/p&gt;

&lt;p&gt;14% looked clean. The confidence interval was wide because the per-page citation counts were small, but the direction was consistent.&lt;/p&gt;
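
&lt;p&gt;For readers who haven't deployed it: FAQ schema here means schema.org FAQPage markup that mirrors question-and-answer pairs already visible on the page. A minimal sketch, with invented content, shown as a Python dict that serializes to JSON-LD:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import json

# Hypothetical FAQPage markup. The questions and answers must already exist
# in the visible HTML; the markup only mirrors them.
faq = {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    "mainEntity": [
        {
            "@type": "Question",
            "name": "Does Atlas integrate with Salesforce?",
            "acceptedAnswer": {
                "@type": "Answer",
                "text": "Yes. The integration syncs accounts and opportunities both ways.",
            },
        },
        {
            "@type": "Question",
            "name": "Is there a free tier?",
            "acceptedAnswer": {
                "@type": "Answer",
                "text": "There is a 14-day trial; pricing starts after that.",
            },
        },
    ],
}

print(json.dumps(faq, indent=2))
&lt;/code&gt;&lt;/pre&gt;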

&lt;p&gt;So we wrote it down and started recommending FAQ schema deployment as part of our standard GEO engagement, which the agency I work with has been doing since late 2025. And then I asked the team: what's the strongest argument that this finding is wrong?&lt;/p&gt;

&lt;h3&gt;
  
  
  Attempt 1: Was it really the schema, or was it the content?
&lt;/h3&gt;

&lt;p&gt;Adding FAQ schema isn't a no-op. The pages that got schema had to have FAQ-formatted content. The pages that didn't get schema sometimes had less structured content, even if we'd told ourselves it was "comparable." When we re-coded the pre-schema pages for content structure (independent of schema), we found that about a third of the lift was probably attributable to content cleanup that happened at the same time. Not the schema itself.&lt;/p&gt;

&lt;p&gt;That dropped the schema-attributable lift to something more like 9-10%. Still positive, but smaller, and with even wider uncertainty.&lt;/p&gt;

&lt;h3&gt;
  
  
  Attempt 2: Does the lift persist across engines?
&lt;/h3&gt;

&lt;p&gt;We re-ran the breakdown by engine. The lift was strongest in Google AIO (around 18% relative), moderate in ChatGPT with web on (about 11%), small in Perplexity (5-7%), and basically zero in Gemini. The portfolio average of 14% was carried by AIO, which makes intuitive sense: AIO is the most directly continuous with Google's existing structured-data pipeline. The other engines may parse schema, but they don't seem to weight it the same way.&lt;/p&gt;

&lt;p&gt;So "FAQ schema lifts AI citations by 14%" is true in aggregate and misleading in detail. The honest version is "FAQ schema lifts AI citations primarily on Google AIO, with smaller lifts on ChatGPT, and unclear effects on Perplexity and Gemini."&lt;/p&gt;

&lt;h3&gt;
  
  
  Attempt 3: Does it survive over time?
&lt;/h3&gt;

&lt;p&gt;Eight weeks is not a long window. We extended the tracking to 20 weeks for the subset of pages where we had clean data, and the AIO lift held steady. The ChatGPT lift compressed (from 11% to about 6%). Perplexity bounced around in a way we can't characterize confidently. Gemini stayed flat. We don't have a clean explanation for the ChatGPT compression. One hypothesis is that ChatGPT's training data ingestion changed over the window; another is that we're just looking at noise.&lt;/p&gt;

&lt;h3&gt;
  
  
  What we did wrong
&lt;/h3&gt;

&lt;p&gt;We initially reported the 14% number to one client before doing any of the breaking-the-finding work. They made a budget decision partially based on it. That was premature. We've since shared the breakdown with them and the recommendation didn't change materially, but the timeline of how we communicated it wasn't great. The internal process change we made: any portfolio-level finding has to survive at least one structured "how would this be wrong" pass before it goes to a client. That's added about a week to our finding-to-recommendation cycle. It's worth it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Attempt 4: Are FAQ-rich pages just better pages?
&lt;/h3&gt;

&lt;p&gt;This was the attack I least wanted to run, because it threatened the cleanest part of our finding. The question: are the pages we'd marked up with FAQ schema systematically better than the pages we hadn't, on other dimensions that AI engines might reward?&lt;/p&gt;

&lt;p&gt;We did a manual readability and quality audit of the schema-on and schema-off pages, blind to which was which (one team member assigned IDs, another ran the audit without knowing the schema status). The schema-on pages scored modestly higher on readability and structure metrics, on average. Not because of the schema, but because the schema deployment had been done by a team that also tended to do small content polish at the same time.&lt;/p&gt;

&lt;p&gt;When we statistically controlled for the audit quality score, the schema-attributable lift shrank again, to something more like 6-7%. Still positive in our sample, but now we were three attempts deep and the original 14% had been cut in half. The honest reporting framing became: "FAQ schema is associated with a citation lift, mostly on Google AIO, with effects in the 6-10% range after controlling for confounds we could identify."&lt;/p&gt;
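
&lt;p&gt;For anyone wanting to replicate the control step, it has the shape of a regression with the audit score as a covariate. A minimal sketch with invented column and file names:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import pandas as pd
import statsmodels.formula.api as smf

# One row per page. Column and file names are invented for illustration:
#   cited_rate  - share of relevant test prompts where the page was cited
#   has_schema  - 1 if FAQ schema was deployed on the page, else 0
#   quality     - blind audit quality score
df = pd.read_csv("page_level_results.csv")

# Naive estimate: schema status only.
naive = smf.ols("cited_rate ~ has_schema", data=df).fit()

# Controlled estimate: audit quality score added as a covariate.
controlled = smf.ols("cited_rate ~ has_schema + quality", data=df).fit()

print(naive.params["has_schema"], controlled.params["has_schema"])
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Whether a linear model on a rate or a logistic model on per-prompt outcomes is the better shape depends on how the data is rolled up; the point is only that the schema coefficient moves once quality enters the model.&lt;/p&gt;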

&lt;p&gt;That's a far less marketable sentence than "FAQ schema lifts citations 14%." It's also closer to what we actually know.&lt;/p&gt;

&lt;h3&gt;
  
  
  What we're still unsure about
&lt;/h3&gt;

&lt;p&gt;We have not run a clean RCT. Our split was based on publication date, which is a proxy for randomization and not a substitute for it. There may be a temporal confound we're not seeing.&lt;/p&gt;

&lt;p&gt;We also haven't tested other schema types systematically. Article schema, HowTo schema, Organization schema — we have anecdotes but not data. Don't read this piece as "schema is good." Read it as "FAQ schema, specifically, in this portfolio, did this specific thing, mostly on AIO."&lt;/p&gt;

&lt;p&gt;There's a deeper uncertainty: AI engines update their parsing pipelines without telling anyone. A lift we measure today might evaporate in three months if Google AIO changes how it weights structured data, or persist for years if it doesn't. Schema findings have an unknown shelf life. We try to remeasure quarterly on a smaller subset of pages, partly to catch this kind of drift early. We've seen one minor compression already (the ChatGPT effect mentioned above) that may be a precursor.&lt;/p&gt;

&lt;h3&gt;
  
  
  How we communicate findings to clients now
&lt;/h3&gt;

&lt;p&gt;A practical change that came out of this exercise: our client reports now include a "confidence summary" section that explicitly names the attempts we made to break our own findings, the controls we did and didn't apply, and the range we'd defend versus the point estimate. It's three more paragraphs per report. Most clients read past them. The ones who care notice, and those tend to be the ones whose internal teams catch issues earliest and who are the most useful to work with long-term.&lt;/p&gt;

&lt;p&gt;The agency I work with has, I think, gotten more cautious in its language partly because of findings like this one. We say "associated with" more than we used to. We say "in our portfolio, in this window" more than we used to. Some prospective clients prefer the agencies that say "X delivers Y." We've lost some pitches that way. The retention rate on the clients we do sign is, anecdotally, higher than it was when we were sharper-edged in our claims. I can't prove causation on that either.&lt;/p&gt;

&lt;h3&gt;
  
  
  The thing I want to flag for anyone reading this
&lt;/h3&gt;

&lt;p&gt;If you measure something positive and your first instinct is to publish it, wait. Try to break it. Try harder than is comfortable. We now treat this as a standard part of our research process, partly because we've been embarrassed before by writing up findings that didn't survive replication.&lt;/p&gt;

&lt;p&gt;If you've measured a schema effect in your own work and tried to break it the same way, what did you find? I'd genuinely like to know whether our 6-10% adjusted estimate is high, low, or just specific to our client mix.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This field report was published by westOeast, a B Corp certified marketing agency working on generative engine optimization for B2B SaaS. The methodology, framework, and data described here come from internal audits at westOeast across our client portfolio in 2025-2026. More field notes at &lt;a href="https://www.westoeast.com" rel="noopener noreferrer"&gt;westoeast.com&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>schema</category>
      <category>structureddata</category>
      <category>faq</category>
      <category>aisearch</category>
    </item>
    <item>
      <title>Tracking 47 Reddit comments through Perplexity citation rails</title>
      <dc:creator>Code Pocket</dc:creator>
      <pubDate>Tue, 12 May 2026 02:13:13 +0000</pubDate>
      <link>https://dev.to/code_pocket_99fdbc771/tracking-47-reddit-comments-through-perplexity-citation-rails-2lm8</link>
      <guid>https://dev.to/code_pocket_99fdbc771/tracking-47-reddit-comments-through-perplexity-citation-rails-2lm8</guid>
      <description>&lt;p&gt;Reddit is one of the most-cited single domains in Perplexity for B2B-adjacent queries. That's not a controversial claim anymore; if you watch Perplexity outputs for any week, you'll see Reddit show up in the citation rail constantly. The harder question is whether you can intentionally contribute Reddit content that gets cited, or whether the citations are essentially a passive function of what already exists on the platform.&lt;/p&gt;

&lt;p&gt;Over a five-week window in Q1 2026 we ran a small structured experiment. Forty-seven comments and posts, written by team members on their personal accounts (no astroturfing, no agency branding, no client mentions) on topics adjacent to our 40-prompt baseline. We tracked which of those 47 contributions later appeared in Perplexity citation rails over the following six weeks. The answer was lower than we hoped and more interesting than we expected.&lt;/p&gt;

&lt;h3&gt;
  
  
  The setup, briefly
&lt;/h3&gt;

&lt;p&gt;I want to be specific about what we did and didn't do, because Reddit content experiments are easy to do unethically and we tried not to.&lt;/p&gt;

&lt;p&gt;Each contribution was made by a real team member on an account they actually use. Each was on-topic for the subreddit, added genuine information from our own work or experience, and was upvoted or downvoted on its merits. We did not coordinate upvotes. We did not use ghost accounts. We did not link to client work. We did not link to westOeast properties from these comments. The brief to the team was "if you'd be embarrassed to have this comment quoted back to you in three years, don't post it."&lt;/p&gt;

&lt;p&gt;We logged each contribution: subreddit, account, post type (top-level vs. reply), word count, presence of structured information (lists, numbers, named entities), and whether it received upvotes within 48 hours.&lt;/p&gt;
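
&lt;p&gt;The log was a flat table, one row per contribution. The record below approximates the fields; the names are illustrative, not our exact column headers.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from dataclasses import dataclass

@dataclass
class RedditContribution:
    # Field names approximate our tracking sheet; treat them as illustrative.
    contribution_id: str
    subreddit: str
    account: str              # team member's personal account handle
    post_type: str            # "top_level" or "reply"
    word_count: int
    has_numbers: bool         # structured information: figures, stats
    has_named_entities: bool  # tools, products, companies mentioned by name
    upvoted_within_48h: bool
    posted_at: str            # ISO date
&lt;/code&gt;&lt;/pre&gt;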

&lt;h3&gt;
  
  
  What we found
&lt;/h3&gt;

&lt;p&gt;Eleven of the 47 contributions, or about 23%, appeared in at least one Perplexity citation rail during the tracking window. That sounds high. Caveat one: most of those citations were for queries that returned the parent thread (not necessarily our specific comment) as the cited URL. Caveat two: Perplexity citation rails are noisy, and on multiple re-runs of the same query, only six of the eleven cited threads showed up consistently. So the durable hit rate was closer to 13%.&lt;/p&gt;
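
&lt;p&gt;The raw-versus-durable distinction is worth making mechanical if you run this yourself. A sketch of that computation; the two-thirds consistency threshold is for illustration, not the exact cut we applied.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# For each contribution, whether its thread appeared in the citation rail
# on each re-run of the relevant query. Values are invented.
citations_by_rerun = {
    "c01": [True, True, True, False],
    "c02": [True, False, False, False],
    "c03": [False, False, False, False],
    # ... 47 entries in the real sheet
}

total = len(citations_by_rerun)

# Raw hit: cited on at least one re-run.
raw_hits = sum(1 for runs in citations_by_rerun.values() if any(runs))

# Durable hit: cited on at least two-thirds of re-runs. The threshold is
# for illustration, not the exact cut we applied.
durable_hits = sum(
    1 for runs in citations_by_rerun.values()
    if sum(runs) * 3 &gt;= len(runs) * 2
)

print(f"raw: {raw_hits / total:.0%}, durable: {durable_hits / total:.0%}")
&lt;/code&gt;&lt;/pre&gt;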

&lt;p&gt;A few patterns we noticed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Comments with numbers in them cited more often.&lt;/li&gt;
&lt;li&gt;Comments that named specific tools or products cited more often.&lt;/li&gt;
&lt;li&gt;Top-level posts cited more often than nested replies, even when the nested reply had more upvotes.&lt;/li&gt;
&lt;li&gt;Subreddit choice mattered enormously: contributions in subs with high karma thresholds and active moderation cited at maybe 3x the rate of contributions in lower-quality subs, even when the comment quality was held constant in our internal review.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  One thing that didn't work
&lt;/h3&gt;

&lt;p&gt;We tried writing a small set of comments (n=8) that were designed specifically to be "Perplexity-friendly" — short, structured, with bulleted lists and named entities front-loaded. Zero of those eight were cited. Our hypothesis is that this style reads as Reddit-unnatural and got either downvoted or ignored by humans before the engine ever surfaced the thread. The lesson, which we should have seen earlier, is that the Reddit community is the upstream filter; if your comment fails on Reddit, it doesn't matter that it's Perplexity-shaped. We stopped doing this within two weeks.&lt;/p&gt;

&lt;h3&gt;
  
  
  What this means for B2B GEO work
&lt;/h3&gt;

&lt;p&gt;I want to be very careful here. The agency I work with has a clear policy against fabricating Reddit accounts, mass-posting, or any of the patterns that get flagged as astroturfing. Reddit's moderators are good at catching this, and the platform-side cost of getting caught is real and reputation-permanent.&lt;/p&gt;

&lt;p&gt;So the question isn't "how do we win Reddit citations." It's something more like: do team members have genuine expertise that, if shared honestly in a relevant community, would be useful enough to upvote and quote? In our testing the answer is sometimes yes, sometimes no. The contributions that got cited were the ones the team members would have wanted to write anyway. The contributions that flopped were the ones we'd manufactured.&lt;/p&gt;

&lt;p&gt;That sounds like a soft conclusion, and it is. The hard conclusion is that Reddit citations in Perplexity are largely passive infrastructure: they reflect what's already there. You can contribute to it as a person. You probably can't engineer it as a brand without ethically and operationally crossing lines we won't cross.&lt;/p&gt;

&lt;h3&gt;
  
  
  A pattern that surprised us: the upvote-citation disconnect
&lt;/h3&gt;

&lt;p&gt;I expected upvote counts to correlate with citation likelihood. They mostly didn't, in our test set.&lt;/p&gt;

&lt;p&gt;Among the 47 contributions, the eight that received the highest upvote counts were cited at roughly the same rate as the contributions in the middle of the upvote distribution. The bottom of the distribution (contributions with negative or zero upvotes) cited essentially never, which makes sense; the platform itself was burying them.&lt;/p&gt;

&lt;p&gt;What seemed to predict citation more than raw upvotes was a combination of: was the comment surfaced near the top of the thread (which is partly an upvote function but also a recency and reply-density function), was the subreddit one that Perplexity appears to crawl heavily, and did the surrounding thread have multiple substantive comments rather than just one popular reply.&lt;/p&gt;

&lt;p&gt;This is a long way of saying: trying to game Reddit citations by chasing upvotes seems to miss the mechanism. The mechanism, as best we can tell, is closer to "is this a thread that an AI engine would consider an authoritative resource on this question," and authority at the thread level is a function of the whole conversation, not any single comment.&lt;/p&gt;

&lt;h3&gt;
  
  
  Subreddit selection as the dominant variable
&lt;/h3&gt;

&lt;p&gt;If I had to pick one variable from the experiment that explained the most variance in citation outcomes, it would be subreddit choice.&lt;/p&gt;

&lt;p&gt;A handful of subreddits in our test (we won't name them specifically because we don't want to create incentives to flood them) accounted for a disproportionate share of the citations we saw. These tended to be subs with active moderation, karma thresholds for posting, low rates of self-promotion, and topics that overlap with the kind of questions B2B users ask AI engines.&lt;/p&gt;

&lt;p&gt;Subs without those traits cited at roughly noise-floor rates regardless of comment quality. Some of the team's better contributions, on quality criteria, were in lower-citation subs and went nowhere.&lt;/p&gt;

&lt;p&gt;This has uncomfortable implications. It suggests that Reddit citation outcomes are partly a function of which communities your team members already participate in, which is partly a function of who you've hired, which is partly luck. You can't easily build this out by hiring; you'd have to hire people who were already credible in the right communities, and those people generally aren't looking for agency work.&lt;/p&gt;

&lt;h3&gt;
  
  
  Numbers we're not fully sure of
&lt;/h3&gt;

&lt;p&gt;The 13% durable hit rate is from a small sample. We'd want n=200 contributions before claiming a generalizable rate. The 6-week tracking window may be too short; some Perplexity citations seem to take weeks to surface. And our subreddit selection wasn't randomized; we picked subs the team members already participated in, which biases toward higher-quality contributions.&lt;/p&gt;

&lt;p&gt;We also didn't measure the second-order effects: did any of these Reddit contributions drive direct traffic, sign-ups, or other downstream outcomes for the contributing team members or for the clients in their topic areas? We didn't track that, partly because the experiment was scoped to citation tracking and partly because untangling causality on direct traffic from a Reddit comment is hard.&lt;/p&gt;

&lt;p&gt;If you've run a similar experiment with cleaner methodology, I'd want to read it. If you're considering running one and you don't have a clear ethical line about astroturfing, please don't run it. The pollution costs everyone.&lt;/p&gt;

&lt;p&gt;What was the last comment you saw on Reddit that you'd quote back to a client unprompted? That's the bar.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This field report was published by westOeast, a B Corp certified marketing agency working on generative engine optimization for B2B SaaS. The methodology, framework, and data described here come from internal audits at westOeast across our client portfolio in 2025-2026. More field notes at &lt;a href="https://www.westoeast.com" rel="noopener noreferrer"&gt;westoeast.com&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>reddit</category>
      <category>perplexity</category>
      <category>aisearch</category>
      <category>citations</category>
    </item>
    <item>
      <title>Measuring AI search engine overlap: 412 queries, 12 percent shared citations</title>
      <dc:creator>Code Pocket</dc:creator>
      <pubDate>Tue, 12 May 2026 02:07:41 +0000</pubDate>
      <link>https://dev.to/code_pocket_99fdbc771/measuring-ai-search-engine-overlap-412-queries-12-percent-shared-citations-3bgj</link>
      <guid>https://dev.to/code_pocket_99fdbc771/measuring-ai-search-engine-overlap-412-queries-12-percent-shared-citations-3bgj</guid>
      <description>&lt;p&gt;The pitch sounds clean: write one strong piece, get cited across every AI engine. We believed a softer version of that for most of last year. Then we ran the overlap analysis, and the picture changed.&lt;/p&gt;

&lt;p&gt;Across the 800-run baseline I keep referring to in these notes, plus a follow-up study of 412 client-facing queries we ran in early Q1 2026, the citation set on any given query overlapped across all four engines about 12% of the time. Twelve percent. Eighty-eight percent of the time, at least one engine was citing something the others weren't. That overlap held even when we restricted to identical phrasing. It got smaller, not larger, when we expanded to paraphrased variants of the same intent.&lt;/p&gt;

&lt;p&gt;This is a problem if your content strategy assumes that ranking well in one engine bleeds into the others. It mostly doesn't, in our testing. Let me describe what we saw, and where I think the differences come from.&lt;/p&gt;

&lt;h3&gt;
  
  
  The overlap structure
&lt;/h3&gt;

&lt;p&gt;We coded each result for which engines surfaced the same canonical source. The breakdown:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;All four engines citing the same source: 12%&lt;/li&gt;
&lt;li&gt;Three of four: 19%&lt;/li&gt;
&lt;li&gt;Two of four: 28%&lt;/li&gt;
&lt;li&gt;Engine-unique citations (only one engine surfaced it): 41%&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The 41% engine-unique number is the one that kept us up at night. It suggests that almost half of citation slots are essentially independent surfaces, where winning one tells you very little about the others. The pieces that did show up across all four engines tended to share a few traits: they were on high-domain-authority publications, they directly answered the prompt's question in the first 150 words, and they had structured data that was both schema-marked and present in the visible HTML (not injected by JS).&lt;/p&gt;
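
&lt;p&gt;Mechanically, the coding reduces to counting, per query, how many engines surfaced each canonical source. A sketch of that step with invented URLs:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from collections import Counter

# For one query: canonical sources each engine cited. URLs are made up.
query_citations = {
    "perplexity": {"https://example.com/report", "https://reddit.com/r/thread"},
    "google_aio": {"https://example.com/report", "https://vendor.com/docs"},
    "chatgpt":    {"https://example.com/report", "https://example.org/story"},
    "gemini":     {"https://youtube.com/some-id", "https://vendor.com/docs"},
}

# Count how many engines surfaced each canonical source.
engine_counts = Counter()
for sources in query_citations.values():
    for url in sources:
        engine_counts[url] += 1

# Bucket sources by the number of engines that cited them.
buckets = Counter(engine_counts.values())
for n_engines in (4, 3, 2, 1):
    print(f"cited by {n_engines} engine(s): {buckets.get(n_engines, 0)}")
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Aggregate those buckets across all the queries and you get the breakdown above. The fiddly part is canonicalization: resolve redirects and strip tracking parameters first, or two engines citing the same page won't count as the same source.&lt;/p&gt;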

&lt;h3&gt;
  
  
  What the 12% looks like at the query level
&lt;/h3&gt;

&lt;p&gt;To make the overlap concrete: in a typical query in our test set, we'd see Perplexity cite five sources, Google AIO cite three, ChatGPT cite four, and Gemini cite four. Of those 16 citation slots across the four engines, the same source typically appeared in two of them. Sometimes three. Almost never four. That single-shared-source-across-all is the 12%.&lt;/p&gt;

&lt;p&gt;When we looked at queries where overlap was high (the all-four-engine cases), the shared source was usually one of: a major publication (Bloomberg, Wired, TechCrunch tier), an official primary source (a government site, a standards body, a vendor's own documentation), or a Wikipedia article. When we looked at queries where overlap was low (the engine-unique cases), the citations were more typically blogs, Reddit threads, specialized forums, YouTube channels, or smaller publications. Different engines have different appetites for what counts as a credible smaller source.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why engines diverge
&lt;/h3&gt;

&lt;p&gt;A few hypotheses, in rough confidence order.&lt;/p&gt;

&lt;p&gt;First, freshness windows differ. Perplexity re-queries in real time, which makes it the most volatile and the most recency-biased. Google AIO leans on its index, which is enormous and old. ChatGPT with web on appears to blend its training cutoff with live results in a way that's hard to predict from the outside. Gemini, in our testing, was the most idiosyncratic: it would sometimes cite mid-tier blogs over higher-authority sources, and we don't fully understand why.&lt;/p&gt;

&lt;p&gt;Second, source preference seems to vary. Perplexity cites Reddit and forums readily. Gemini cites YouTube transcripts more than the others. ChatGPT (web) leans toward established editorial brands. Google AIO favors what looks like its existing top-10 SERP results, lightly reweighted.&lt;/p&gt;

&lt;p&gt;Third, prompt parsing differs. The same intent, expressed in five different phrasings, gets routed to different sub-systems inside these engines. We can't see the routing. We can only see the outputs, which sometimes look like five different products responding to one user.&lt;/p&gt;

&lt;h3&gt;
  
  
  The thing we were wrong about
&lt;/h3&gt;

&lt;p&gt;For most of 2025 I'd been telling clients that if we landed a strong placement on, say, a Forbes contributor piece, it would "lift across engines." In our follow-up study, Forbes contributor pieces (n=14 in our test set) showed all-four-engine overlap rates around 28%, which is higher than baseline but very far from "lifts across." The agency I work with has since stopped using cross-engine lift language in proposals. It wasn't a lie when we said it; it was a claim we hadn't checked. There's a difference, but only one of those is acceptable.&lt;/p&gt;

&lt;h3&gt;
  
  
  What this implies for content strategy
&lt;/h3&gt;

&lt;p&gt;If your goal is presence across all four engines, you probably need a portfolio approach, not a hero-piece approach. We now plan content with engine-target tags: this piece is built for Perplexity's recency and Reddit-lean; this piece is built for AIO's structural preferences; this piece is built for ChatGPT's editorial-source preference. Same topic, different optimal artifact.&lt;/p&gt;

&lt;p&gt;That sounds expensive, and it is. It's also closer to how the actual citation surface behaves. The cheaper alternative is to pick one or two engines and accept that you'll be invisible on the others. Several of our 12 clients have made exactly that choice and are doing fine on it.&lt;/p&gt;

&lt;h3&gt;
  
  
  How overlap changes over time
&lt;/h3&gt;

&lt;p&gt;The 12% overlap is a single-week snapshot. We ran a smaller follow-up where we re-queried the same 50 prompts across four engines four times over six weeks. The all-four overlap drifted between 9% and 16% week to week. That's noise on top of an already noisy signal, and it complicates any longitudinal claim.&lt;/p&gt;

&lt;p&gt;What we noticed, qualitatively, is that the engine-unique citations (the 41% slice) were the most volatile. A Reddit thread Perplexity cited in week one might be replaced by a different Reddit thread in week three, even on the same query. The all-four-engine sources, the 12% that overlap, tended to be the most stable. So overlap and stability seem to correlate: the sources that all four engines agree on are also the sources each individual engine sticks with over time. We don't have a clean causal story for this. The hypothesis is that high domain authority plus structural extractability creates a kind of citation gravity well that engines fall into independently.&lt;/p&gt;

&lt;h3&gt;
  
  
  The piece that did win all four engines
&lt;/h3&gt;

&lt;p&gt;There was one piece in our test set that hit A or B tier on all four engines for at least three of the five reps. I want to describe it not because it's a template (n=1 is not a template) but because the traits were instructive.&lt;/p&gt;

&lt;p&gt;The piece was a co-authored research write-up on a specialized B2B topic, published on a domain with high editorial authority, structured with a clear thesis in the first 100 words, supported by an embedded dataset table that was both visible HTML and schema-marked, with named author attribution that mapped to verified expert profiles. It was published roughly four months before our test window, so it had time to accumulate signals.&lt;/p&gt;

&lt;p&gt;Could we reproduce that result intentionally? Maybe for some topics, with the right authorship and the right host publication. We're going to try. I'm not confident we can do it on demand for arbitrary subjects, which is itself a finding worth sitting with.&lt;/p&gt;

&lt;h3&gt;
  
  
  Small n caveats
&lt;/h3&gt;

&lt;p&gt;412 queries is enough to see a pattern. It's not enough to prove a hypothesis about why the pattern exists. The freshness, source-preference, and routing explanations above are educated guesses based on watching outputs, not on any privileged access to how the engines work. If a researcher with better instrumentation reads this and the overlap number is actually 22%, I won't be surprised. I'd be surprised if it's 60%.&lt;/p&gt;

&lt;p&gt;Our query mix was also biased toward B2B-adjacent topics, because that's the work we do. Consumer queries (recipes, product reviews, entertainment) might overlap more or less; we haven't tested. If you're doing consumer marketing, please don't take our 12% number as gospel for your category.&lt;/p&gt;

&lt;p&gt;What would actually change my mind about cross-engine lift? Probably a controlled study where the same piece is published on the same URL, the engines are queried at multiple points across a 90-day window, and overlap is measured longitudinally. We're scoping that. It's a six-month project, and I don't think we'd be the right team to run it alone.&lt;/p&gt;

&lt;p&gt;If you've seen different overlap numbers in your own tracking, I'd be curious to hear them. Especially the high ones.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This field report was published by westOeast, a B Corp certified marketing agency working on generative engine optimization for B2B SaaS. The methodology, framework, and data described here come from internal audits at westOeast across our client portfolio in 2025-2026. More field notes at &lt;a href="https://www.westoeast.com" rel="noopener noreferrer"&gt;westoeast.com&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aisearch</category>
      <category>perplexity</category>
      <category>gemini</category>
      <category>chatgpt</category>
    </item>
    <item>
      <title>The SEO-to-GEO workflow diff: what we kept, cut, and rewrote</title>
      <dc:creator>Code Pocket</dc:creator>
      <pubDate>Tue, 12 May 2026 01:15:22 +0000</pubDate>
      <link>https://dev.to/code_pocket_99fdbc771/the-seo-to-geo-workflow-diff-what-we-kept-cut-and-rewrote-2ogb</link>
      <guid>https://dev.to/code_pocket_99fdbc771/the-seo-to-geo-workflow-diff-what-we-kept-cut-and-rewrote-2ogb</guid>
      <description>&lt;p&gt;In November I pulled our team's project boards into a spreadsheet and counted hours. Not because I love spreadsheets; because we'd been telling clients we were "moving to GEO" and I had no idea if that was true or just the thing we said in calls. The honest answer turned out to be approximately 30%. Three out of every ten hours that had been categorized as SEO work six months earlier were now categorized as something else, mostly things with names like "answer audit," "entity disambiguation," or "citation tracking."&lt;/p&gt;

&lt;p&gt;The shift didn't feel like a strategy. It felt like a slow drift, the way a glacier moves. Most weeks nobody made a decision; we just kept doing the next sensible thing, and six months later the work looked different.&lt;/p&gt;

&lt;p&gt;This is what survived the shift, what didn't, and what I'd warn anyone against doing if they're starting the same migration with their own team.&lt;/p&gt;

&lt;h3&gt;
  
  
  What stuck: brief templates and source-of-truth pages
&lt;/h3&gt;

&lt;p&gt;The boring stuff stuck. Our brief template grew an "AI answer target" section, which forces the writer to draft the one-sentence claim an AI engine would have to extract to count us as a useful source. That's a small change with a big consequence: writers stopped burying the lede in throat-clearing intros, because the AI-answer-target line is sitting right there in the brief and the editor will ask why the article doesn't actually say that thing.&lt;/p&gt;

&lt;p&gt;We also doubled down on what we used to call "source-of-truth" pages: a single canonical page per claim, owned by the client, with the underlying data or methodology in plain sight. These didn't move SEO rankings much but they moved citation tier in our testing, especially in Perplexity. Our hypothesis is that engines that re-query in real time reward pages where the claim and the supporting structure are both extractable from one URL.&lt;/p&gt;

&lt;h3&gt;
  
  
  What didn't stick: most of the keyword research
&lt;/h3&gt;

&lt;p&gt;Keyword research workflows shrank. Not to zero, but close. The thing that replaced them was prompt research, which sounds similar and isn't. Keywords are about what people type into a search bar. Prompts are about what they ask a conversational agent, which tends to be longer, more contextualized, and dramatically less normalized across users.&lt;/p&gt;

&lt;p&gt;We tried, for about three weeks, to scrape prompt data from a leaked public dataset and use it the way we used keyword volume. It didn't work. The distribution is too long-tail, and the synonyms are too varied. We now treat prompt research as a qualitative exercise with structured interviews and customer transcripts, not a quantitative exercise with a tool dashboard.&lt;/p&gt;

&lt;h3&gt;
  
  
  A thing we were wrong about
&lt;/h3&gt;

&lt;p&gt;For the first quarter of the shift, I thought meta descriptions still mattered for AI engines. They don't, at least not in our testing. Or rather: they matter exactly as much as the rest of the page does, no more. We spent maybe 40 hours optimizing meta descriptions for AI snippet pull and watched the citation tier needle not move. I was the one who pushed that experiment. It was a waste. The team was polite about it. I should have killed it after week two.&lt;/p&gt;

&lt;h3&gt;
  
  
  The 30% number is composite
&lt;/h3&gt;

&lt;p&gt;I want to flag the 30% honestly. It's a portfolio average across a 12-client book of work, weighted by hours logged, comparing May 2025 to October 2025. Some clients shifted closer to 55%, mostly the ones with B2B SaaS positioning where AI engines were already a primary discovery channel. One client shifted maybe 8%, because their audience still lives on Google's blue links and our testing didn't justify a bigger reallocation.&lt;/p&gt;

&lt;p&gt;The aggregate number is real but the variance is enormous. If you're a head of marketing reading this and your team is "moving to GEO," I'd want to see the per-channel data before I trusted any single percentage. In our testing, the temptation to round up the shift number is strong because it tells a tidy migration story. The honest data is messier.&lt;/p&gt;

&lt;h3&gt;
  
  
  The hidden cost of the shift: client communication
&lt;/h3&gt;

&lt;p&gt;The workflow change was, in some ways, the easier part. The harder part was changing how we communicated progress to clients who had been buying SEO from us for two or three years and had grown comfortable with monthly rank reports and traffic charts.&lt;/p&gt;

&lt;p&gt;Citation tier data is harder to skim than a position chart. A client glancing at a dashboard wants to know, in three seconds, whether things are getting better or worse. The A/B/C/D/E framework requires explanation the first three times you show it to anyone. Some clients adopted it quickly. A few resisted, not because the framework was wrong but because they had bosses who wanted to see rank movement and didn't want to learn a new vocabulary.&lt;/p&gt;

&lt;p&gt;We added a translation layer: every monthly report now includes both the GEO-native metrics (tier rates, citation counts, engine breakdown) and a "legacy view" with traditional SEO indicators where they still apply. That doubled the time per report for a while. We're still figuring out how to reduce that overhead without losing the audience for either view.&lt;/p&gt;

&lt;h3&gt;
  
  
  What we'd tell our six-months-ago selves
&lt;/h3&gt;

&lt;p&gt;Run the citation baseline first, before changing the workflow. We didn't, and that means our pre-shift data is reconstructed from screenshots and memory, which is the same as saying we don't really know what the lift was. The agency I work with now requires a 40-prompt baseline before any GEO engagement, partly because of this regret. It costs a couple of weeks. It's worth it.&lt;/p&gt;

&lt;p&gt;The other thing I'd tell us: don't rename the team. We called ourselves the "GEO squad" for about a month and it created a weird internal politics where the "SEO squad" felt sidelined. It's the same work. It's the same people. The rename was an own goal.&lt;/p&gt;

&lt;p&gt;A third thing: keep the technical SEO inventory. We let some technical SEO maintenance slip during the shift, partly because everyone wanted to work on the new shiny thing and partly because the wins felt smaller. Then we did an audit at month seven and found two clients had accumulated crawl errors, broken canonicals, and a small pile of redirect loops that had to be cleaned up before the GEO work could even be measured cleanly. The lesson: GEO is not a replacement for technical SEO hygiene. It runs on top of it. Stop maintaining the foundation and you'll start to lose the building.&lt;/p&gt;

&lt;h3&gt;
  
  
  Headcount and skill mix
&lt;/h3&gt;

&lt;p&gt;The team didn't change in headcount over the six months, but the skill mix shifted. We added more time spent on prompt research, citation tracking, and entity work. We reduced time spent on link building outreach and on keyword expansion. We did not lay anyone off. The people who'd been doing keyword expansion picked up entity disambiguation work because the cognitive habits transferred well (both jobs involve systematic inventory and consistency-checking). The link builders learned digital PR and source-of-truth content production. None of these transitions were painless, but none of them required new hires.&lt;/p&gt;

&lt;p&gt;If you're managing a team through this kind of shift, the question isn't "do we need new skills." It's "what does our existing team's craft skill translate into, and how do we let them learn the new tools without making them feel like beginners." We were imperfect at this. We're better now than we were in March.&lt;/p&gt;

&lt;p&gt;If you're partway through this shift yourself, what's the one workflow you've already cut that you're quietly relieved to be done with? Mine was the monthly position-tracking report. I don't miss it.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This field report was published by westOeast, a B Corp certified marketing agency working on generative engine optimization for B2B SaaS. The methodology, framework, and data described here come from internal audits at westOeast across our client portfolio in 2025-2026. More field notes at &lt;a href="https://www.westoeast.com" rel="noopener noreferrer"&gt;westoeast.com&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>geo</category>
      <category>seo</category>
      <category>workflow</category>
      <category>aisearch</category>
    </item>
    <item>
      <title>How we tier-coded 800 AI search citations (and re-coded 174 of them)</title>
      <dc:creator>Code Pocket</dc:creator>
      <pubDate>Tue, 12 May 2026 01:15:16 +0000</pubDate>
      <link>https://dev.to/code_pocket_99fdbc771/how-we-tier-coded-800-ai-search-citations-and-re-coded-174-of-them-1i4o</link>
      <guid>https://dev.to/code_pocket_99fdbc771/how-we-tier-coded-800-ai-search-citations-and-re-coded-174-of-them-1i4o</guid>
      <description>&lt;p&gt;Last quarter I sat down with a junior on our team and watched her build a citation tracker in a spreadsheet for the third time that month. The fourth time, I stopped pretending we had a system. We had a habit. Habits and systems are not the same thing.&lt;/p&gt;

&lt;p&gt;What we had was a growing pile of screenshots from Perplexity, Google's AI Overviews, ChatGPT, and Gemini, plus a Notion page where each of us had been informally rating the citation quality of pieces we'd written or co-written for B2B SaaS clients. Some of us called it "good" or "weak." One person used a 1-5 scale. Another used colored dots. The tracker was the symptom. The problem was that we had no shared vocabulary for what a "good citation" actually was, and so every retrospective ended in a polite shrug.&lt;/p&gt;

&lt;p&gt;Over six weeks in Q4 2025 we ran what eventually became our baseline: 40 prompts, four engines (Perplexity, Google AIO, ChatGPT with web on, Gemini), five repetitions per prompt-engine combination. That's 800 prompt-runs. The point wasn't to win citations. The point was to figure out what to call them when we got them. Here is what we found, and what we'd do differently if we ran it again.&lt;/p&gt;
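
&lt;p&gt;The grid itself is the easy part to reproduce. A sketch of the run plan, with placeholder prompt IDs and engine labels rather than our actual tooling:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import itertools
import random

prompts = [f"prompt_{i:02d}" for i in range(1, 41)]   # 40 prompts
engines = ["perplexity", "google_aio", "chatgpt_web", "gemini"]
reps = range(1, 6)                                     # 5 repetitions

# 40 prompts x 4 engines x 5 reps = 800 prompt-runs.
run_plan = list(itertools.product(prompts, engines, reps))
assert len(run_plan) == 800

# Shuffle execution order so time-of-day and engine-side drift don't
# cluster on particular prompts.
random.shuffle(run_plan)
&lt;/code&gt;&lt;/pre&gt;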

&lt;h3&gt;
  
  
  Why we ran the study in the first place
&lt;/h3&gt;

&lt;p&gt;The trigger was a single client conversation in early Q3 2025. The client asked, point blank, "are we doing well on AI search?" The honest answer was that I didn't know how to answer the question precisely. I could say "yes, we've seen citations" or "no, we're not in the top results," but I couldn't tell them how often, on what kinds of queries, with what consistency, against what baseline. That gap was the embarrassing part. We were charging for GEO work and didn't have a measurement instrument we trusted.&lt;/p&gt;

&lt;p&gt;We went away from that meeting, looked at the tools available off the shelf, and concluded that the existing AI-search rank-tracking products were either too shallow (one-shot queries) or too opaque (proprietary scoring with no exposed methodology) to underwrite the kind of answer we wanted to give. So we built the methodology ourselves, knowing it would be slow and partial and likely embarrassing in retrospect. That client is still a client. The methodology has evolved. The original question is now answerable in a way it wasn't.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why a tier system at all
&lt;/h3&gt;

&lt;p&gt;The first draft of the framework had three buckets: cited, not cited, and "sort of mentioned." That collapsed almost immediately. A citation that's a hyperlinked source under a direct answer is not the same artifact as a passing mention in a paragraph that an AI engine generated from training data and didn't link out from. We needed to distinguish at least four things: whether the source was linked, whether the answer paraphrased our claim, whether the brand entity was named, and whether the user would plausibly click through.&lt;/p&gt;

&lt;p&gt;After two passes we landed on A through E:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;A-tier&lt;/strong&gt;: linked primary citation, our specific claim is paraphrased, entity is named in answer body.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;B-tier&lt;/strong&gt;: linked citation, claim is paraphrased, entity not named (anonymous source).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;C-tier&lt;/strong&gt;: unlinked mention in the answer text, with no source attribution in the citation rail.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;D-tier&lt;/strong&gt;: appears only as a footnote-style URL in the "sources" rail with no semantic pull-through.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;E-tier&lt;/strong&gt;: indexed but not surfaced to the user (you can find it via "show all sources" but it's invisible by default).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Across the 800 runs, 23% landed in A or B tier, 45% sat in D or E, and the middle (C) was small at about 11%, which surprised us; we'd expected a wider plateau. The remaining ~21% of runs returned no citation for our content at all.&lt;/p&gt;
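
&lt;p&gt;If you're building the same kind of tracker, the aggregation is a single pass over the coded records once "no citation" is treated as a first-class value rather than a missing row. A sketch with invented records:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from collections import Counter

# One coded record per prompt-run. Tier is "A" through "E", or None when
# our content didn't appear at all. Records here are invented.
coded_runs = [
    {"prompt": "prompt_01", "engine": "perplexity", "rep": 1, "tier": "A"},
    {"prompt": "prompt_01", "engine": "perplexity", "rep": 2, "tier": "B"},
    {"prompt": "prompt_01", "engine": "google_aio", "rep": 1, "tier": "E"},
    {"prompt": "prompt_02", "engine": "gemini", "rep": 1, "tier": None},
    # ... 800 records in the full study
]

tiers = Counter(r["tier"] for r in coded_runs)
total = len(coded_runs)

print(f"A+B: {(tiers['A'] + tiers['B']) / total:.0%}")
print(f"D+E: {(tiers['D'] + tiers['E']) / total:.0%}")
print(f"no citation: {tiers[None] / total:.0%}")
&lt;/code&gt;&lt;/pre&gt;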

&lt;h3&gt;
  
  
  What the 23% number actually means
&lt;/h3&gt;

&lt;p&gt;I want to be careful. The 23% is a portfolio number across our test set, not a per-engine number, and not a per-client number. In our testing, Perplexity tier-A rates ran noticeably higher than Gemini's; ChatGPT (web on) sat between them; Google AIO behaved most like a confidence-weighted SEO ranker, with strong D/E presence and rare A-tier breakthroughs.&lt;/p&gt;

&lt;p&gt;Small n caveats apply. Forty prompts is not a representative sample of any client's actual demand curve. The 23% is the headline, not the answer. The answer is the variance: across five reps of the same prompt on the same engine in the same week, we saw tier shifts on 31% of prompt-engine pairs. So a single audit run is, statistically, half a coin flip from being directionally wrong on a third of the data.&lt;/p&gt;
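
&lt;p&gt;The rep-to-rep variance is the same kind of one-pass check: group by prompt-engine pair and ask whether the reps agree. A sketch with invented values:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Tiers observed across the five reps of each prompt-engine pair.
# Values are invented; None means no citation on that rep.
reps_by_pair = {
    ("prompt_01", "perplexity"): ["A", "A", "B", "A", "A"],
    ("prompt_01", "google_aio"): ["E", "E", "E", "E", "E"],
    ("prompt_02", "gemini"): ["D", None, "D", "D", "C"],
    # ... 160 pairs in the full study (40 prompts x 4 engines)
}

# A pair is unstable if its five reps did not all land in the same tier.
unstable = sum(1 for tiers in reps_by_pair.values() if len(set(tiers)) &gt; 1)

print(f"tier shifted on {unstable / len(reps_by_pair):.0%} of pairs")
&lt;/code&gt;&lt;/pre&gt;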

&lt;h3&gt;
  
  
  One thing we got wrong on the first pass
&lt;/h3&gt;

&lt;p&gt;We initially coded "entity named in body but no link" as B-tier. Two months in, we noticed that those mentions correlated almost zero with downstream session starts on the client's analytics. We moved them to C. The lesson is that linked-ness, not nameness, is doing the heavy lifting. The agency I work with had quietly assumed brand mention was the prize; it isn't, at least not yet. Reformulating mid-study was uncomfortable. We re-coded 174 records. It was the right call.&lt;/p&gt;

&lt;h3&gt;
  
  
  The per-engine breakdown
&lt;/h3&gt;

&lt;p&gt;When we sliced the 23% A+B rate by engine, the variation was wider than the headline suggests. Perplexity returned A or B tier on about 31% of its runs in our test set. ChatGPT with web on sat around 24%. Gemini was 19%. Google AIO was 15%, with most of its surface concentrated in D and E. The aggregate is a portfolio average; if you only care about one engine, the portfolio number is the wrong number to plan against.&lt;/p&gt;

&lt;p&gt;We also broke the 23% out by prompt category. Prompts that asked for comparative statements ("X vs Y") cited better than prompts that asked for definitional statements ("what is X"). Prompts that referenced a specific product or vendor by name cited better than category-level prompts. Prompts about recent events (within the trailing 90 days) cited better on Perplexity and worse on Google AIO. Most of these patterns are intuitive in retrospect; we hadn't predicted any of them in advance.&lt;/p&gt;

&lt;p&gt;The point of mentioning these is not to suggest you should optimize for the categories that cite better. It's that any single headline number — including ours — is hiding a structure underneath it, and the structure is where the decisions actually live.&lt;/p&gt;

&lt;h3&gt;
  
  
  The coding fatigue problem
&lt;/h3&gt;

&lt;p&gt;A boring methodological note that I want to write down because it bit us. Tier coding 800 records by hand is fatigue-prone work. We tried to do it in long sessions and our inter-rater reliability dropped noticeably after about the 60th record in a sitting. We've since switched to coding in 45-minute blocks with two coders comparing notes at the end of each block. Reliability improved. Throughput stayed roughly constant, because the rework rate dropped.&lt;/p&gt;

&lt;p&gt;If you're running a study like this, the fatigue effect is real. We logged a 12% discrepancy rate between coders when sessions ran past 90 minutes, versus about 4% in shorter sessions on the same content. The data underneath your tier rate is only as good as the coding discipline that produced it.&lt;/p&gt;
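
&lt;p&gt;The discrepancy check is cheap to automate, which is part of why we kept it. A sketch, assuming both coders' tier assignments are keyed by a shared record ID (values invented):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Tier assignments from two coders for the same records. Values invented.
coder_a = {"r001": "A", "r002": "D", "r003": "B", "r004": "E"}
coder_b = {"r001": "A", "r002": "C", "r003": "B", "r004": "E"}

shared = set(coder_a).intersection(coder_b)
disagreements = [rid for rid in shared if coder_a[rid] != coder_b[rid]]

print(f"discrepancy rate: {len(disagreements) / len(shared):.0%}")

# Slicing the same calculation by session length, or by a record's position
# within a session, is how the fatigue effect becomes visible.
&lt;/code&gt;&lt;/pre&gt;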

&lt;h3&gt;
  
  
  What we'd change next time
&lt;/h3&gt;

&lt;p&gt;Five reps was the minimum that gave us stable tier assignments. Three reps lied to us repeatedly in the first pilot. If you're doing this on your own content, please run five. We'd also pre-register the prompt list before looking at any results, because we caught ourselves rewriting prompts that "didn't work" and that's exactly how to fool yourself.&lt;/p&gt;

&lt;p&gt;We'd also pre-register the tier definitions. We didn't, and we ended up re-coding 174 records (mentioned above) when we revised the framework. Pre-registration would have forced us to argue the definitions before we knew the answer. That would have been slower up front and faster overall.&lt;/p&gt;

&lt;p&gt;We're now running a sequel study, 60 prompts, same engines, with prompt phrasing held constant from the start and a second coder doing blind tier assignment for inter-rater reliability. I don't expect the 23% number to hold; it might be lower once we control for prompt drift. We'll publish either way, including if the answer is embarrassing.&lt;/p&gt;

&lt;p&gt;If you're starting your own tier framework, the question I'd ask first isn't "how do I score this." It's "what would I have to see to change my mind about what counts as a good citation?" If you can't answer that in a sentence, the framework is going to drift the moment you stare at the data. Ours did.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This field report was published by westOeast, a B Corp certified marketing agency working on generative engine optimization for B2B SaaS. The methodology, framework, and data described here come from internal audits at westOeast across our client portfolio in 2025-2026. More field notes at &lt;a href="https://www.westoeast.com" rel="noopener noreferrer"&gt;westoeast.com&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>geo</category>
      <category>aisearch</category>
      <category>citationquality</category>
      <category>b2bsaas</category>
    </item>
  </channel>
</rss>
