Code Pocket

Originally published at westoeast.com

Tracking podcast transcripts through 4 AI engines over 6 months

The idea of using podcast transcripts as a GEO asset is older than GEO itself; transcripts have always been an SEO play. What's new, or newer, is whether transcripts function as a meaningful citation source for AI engines specifically. Over the last six months we've been quietly running a side experiment on this with a handful of clients, and the results have been split enough that I want to write them up before I forget the texture.

The short version: transcripts work, sometimes, and the conditions under which they work are narrower than the marketing copy on transcript-as-a-service tools suggests.

The setup

Three clients in our 12-client portfolio had podcasts of their own (founder-led, weekly to bi-weekly, established for at least 18 months pre-experiment). For each, we did the following over a six-month window starting in Q4 2025:

  • Cleaned the transcripts (timestamps, speaker labels, punctuation, paragraph breaks) into a format we'd judge readable as a standalone article.
  • Added introductory framing — a one-paragraph summary of each episode's topic and the named entities involved, written by us.
  • Published the cleaned transcripts on each client's own domain under a transcripts subfolder.
  • Added speaker-level entity markup where appropriate (a rough sketch of what that means follows this list).
  • Did not republish on third-party platforms, partly to keep the experiment scoped, partly because of canonical concerns.
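
For anyone who wants a concrete picture of the entity markup step: below is a minimal sketch of the kind of schema.org JSON-LD a transcript page can carry, with each named speaker listed as a Person. This is not the exact markup we shipped, and every name, URL, and episode detail here is a placeholder.

```python
# Minimal sketch of speaker-level entity markup, rendered as schema.org JSON-LD.
# All names, URLs, and episode details below are placeholders, not client data.
import json

def episode_jsonld(title: str, page_url: str, audio_url: str,
                   speakers: list[dict]) -> str:
    """Build a PodcastEpisode object whose transcript page names each speaker as a Person."""
    data = {
        "@context": "https://schema.org",
        "@type": "PodcastEpisode",
        "name": title,
        "url": page_url,
        "associatedMedia": {"@type": "MediaObject", "contentUrl": audio_url},
        # Speaker-level entities: one Person per named participant.
        "actor": [
            {"@type": "Person", "name": s["name"], "sameAs": s["profile"]}
            for s in speakers
        ],
    }
    # Drop the result into a <script type="application/ld+json"> tag on the page.
    return json.dumps(data, indent=2)

print(episode_jsonld(
    title="Episode 42: Pricing in a downturn",
    page_url="https://example.com/podcast/transcripts/ep-42",
    audio_url="https://example.com/audio/ep-42.mp3",
    speakers=[
        {"name": "Host Name", "profile": "https://example.com/about"},
        {"name": "Guest Name", "profile": "https://www.linkedin.com/in/guest-placeholder"},
    ],
))
```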

We then tracked citation appearances of the transcript URLs across our four-engine test set over the following months.
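
To make "tracked citation appearances" concrete, here's roughly the shape of the record we kept per query per engine. The field names and engine labels are mine for illustration; the real process involved a lot of manual checking of citation rails.

```python
# Rough shape of the per-query citation check. Engine labels, field names, and
# the relevance judgment are illustrative; cited URLs were collected by hand.
from dataclasses import dataclass
from datetime import date

ENGINES = ["perplexity", "gemini", "chatgpt_web", "google_aio"]

@dataclass
class CitationCheck:
    query: str
    engine: str                  # one of ENGINES
    checked_on: date
    transcript_url: str          # the transcript page we're watching
    cited_urls: list[str]        # URLs shown in the engine's citation rail

    @property
    def transcript_cited(self) -> bool:
        return any(u.startswith(self.transcript_url) for u in self.cited_urls)

def citation_rate(checks: list[CitationCheck]) -> float:
    """Share of topically relevant checks where the transcript URL surfaced."""
    return sum(c.transcript_cited for c in checks) / len(checks) if checks else 0.0
```

The ~11% figure in the next section is just this rate computed over the queries we judged topically relevant.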

What happened

Across roughly 70 episodes covered in the experiment, transcript URLs appeared in citation rails on maybe 11% of queries where they were topically relevant. That's a hard number to compare cleanly because we didn't have a control set of comparable non-transcript content for the same clients in the same topics. It's directionally interesting, not statistically clean.

The citations clustered heavily in two engines: Perplexity and Gemini. Both seemed willing to surface transcripts as primary sources for queries about specific people (the podcast guests) or specific phrases that appeared in the transcripts. ChatGPT (with web search enabled) cited transcripts much less often, and Google AIO almost never, in our test set.

The pattern that seemed to predict whether a transcript got cited was, roughly: did the episode include a named expert making a specific, quotable claim that the AI engine could attribute? Episodes that were two co-hosts having a meandering conversation almost never got cited, regardless of topic quality. Episodes with a guest making a clear, paraphrasable point got cited consistently.

One thing that didn't work

We tried generating "episode summaries" that pulled key claims out of each episode and listed them with bullet points and named-entity links. The hypothesis was that this would give engines an easier path to citing specific claims. It backfired modestly: in two of the three clients, the summaries themselves started getting cited instead of the transcripts they summarized. The transcripts dropped in surface rate; the summaries rose. The total citation rate per episode didn't change much; we'd just shifted which URL got picked.

This is fine if you don't care which URL gets picked. It's less fine if your goal was to drive traffic to the transcript page specifically (which has the audio embed and the SEO history). We've since gone back to lighter framing paragraphs without bullet-point summaries on most clients.

The transcript-quality threshold

Raw automated transcripts (the kind that come out of most podcast hosting platforms) didn't perform as well as cleaned transcripts. We don't have a clean A/B on this, but we have one client where we tested both formats on different episodes in the same series, and the cleaned versions cited at roughly twice the rate of the raw versions over the test window.

The cleaning isn't elaborate. Punctuation, speaker labels, paragraph breaks, light copy-edit for filler words ("um," "you know," repeated phrases). Maybe 60-90 minutes per hour of audio when done by a person who knows the show. AI-assisted cleaning works for the mechanical parts but doesn't reliably catch where a paragraph break belongs based on conversational rhythm.
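
If it helps, the mechanical half of that cleaning pass is the part a script handles fine. Here's a hedged sketch of the filler-word step; the filler list and patterns are illustrative, and the paragraph-break judgment still goes to a person who knows the show.

```python
# Mechanical part of transcript cleaning: strip obvious fillers and stutter
# repeats. The filler list is illustrative; paragraph breaks, speaker labels,
# and anything that could change meaning stay with a human editor.
import re

FILLERS = r"\b(?:um+|uh+|you know|sort of|kind of)\b,?\s*"

def light_clean(text: str) -> str:
    text = re.sub(FILLERS, "", text, flags=re.IGNORECASE)
    # Collapse immediate word repetitions ("we we", "the the") left by speech.
    text = re.sub(r"\b(\w+)(\s+\1\b)+", r"\1", text, flags=re.IGNORECASE)
    # Tidy stray spaces before punctuation, then doubled spaces.
    text = re.sub(r"\s+([,.!?])", r"\1", text)
    return re.sub(r"[ \t]{2,}", " ", text).strip()

print(light_clean("So, um, we we kind of saw that, you know, citations rose ."))
# -> "So, we saw that, citations rose."
```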

I don't know whether the citation lift from cleaning is about engine parsing or about human-readable content reading better to whatever automated readability heuristic the engines use. Both are plausible. The agency I work with has defaulted to cleaning transcripts when the underlying podcast has enough audience to justify the cost. For shows below maybe 1,000 listens per episode the math gets harder.

Why I'm holding the claim loosely

Three clients with their own podcasts is not a sample size that supports strong claims. The 11% citation rate is the kind of number that could be a function of the specific topics those clients work in, or the specific guests they had on, or the freshness of the transcripts hitting Perplexity at the right moment.

I'd want to see this tested across 15-20 clients with podcasts in different verticals before I'd recommend the strategy generally. As a thing to try in a portfolio where the audio content already exists, the cost-to-test ratio is decent. As a reason to start a podcast solely for GEO, I don't think the data supports that yet.

A surprise: chapter markers seemed to help

One thing we'd added almost as an afterthought turned out to matter. We embedded chapter markers (with timestamps and short titles) at logical breakpoints in each transcript page. These were primarily for human readability and accessibility. Two of the three clients showed improved citation surfacing on the chapters with the most descriptive titles, where the engines appeared to use the chapter title as a hook for the surrounding text.

We don't know whether this was the chapters specifically or whether the act of breaking a long transcript into labeled sections improved general parseability. Either way, the cost of adding chapter markers is small (15-30 minutes per episode after a transcript is cleaned) and the apparent return was non-trivial in our sample.
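
For what it's worth, the chapter markers themselves are nothing exotic: a timestamp plus a short, descriptive heading at each logical breakpoint. A sketch of how we lay them out on the page, with placeholder titles and helper names of my own:

```python
# Sketch of chapter markers on a transcript page. Titles and helpers here are
# placeholders, not a client's real chapters.
from dataclasses import dataclass

@dataclass
class Chapter:
    start_seconds: int
    title: str  # short and descriptive; this is the part that seemed to matter

def as_timestamp(seconds: int) -> str:
    minutes, secs = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"

def chapter_heading(ch: Chapter) -> str:
    """One labeled heading per chapter, with an anchor id for deep links."""
    slug = ch.title.lower().replace(" ", "-")
    return f'<h3 id="{slug}">[{as_timestamp(ch.start_seconds)}] {ch.title}</h3>'

for ch in [Chapter(0, "Why usage-based pricing stalled"),
           Chapter(14 * 60 + 32, "What the guest changed about onboarding")]:
    print(chapter_heading(ch))
```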

A caution: I've seen agencies start to recommend "AI-friendly chapter markers" as a productized service, and I want to be careful about that framing here. Two clients and a handful of measurable lifts is interesting; it's not a service offering. If you try it on your own content and it works for you, please share what you find.

The unanswered question of canonical confusion

One reason we kept the experiment scoped to first-party hosting (not syndicating transcripts to third-party platforms) is that we weren't confident about how engines handle canonical questions for the same content appearing in multiple places. If a transcript is on the podcast's site, on a third-party transcript service, on YouTube auto-captions, and on a guest's personal blog, which one gets cited?

Anecdotally, we've seen engines pick non-canonical sources surprisingly often. The "official" hosted transcript on the client's domain isn't always the citation winner; sometimes a YouTube auto-caption page or a third-party transcript site shows up instead. We don't have a clean explanation for when this happens. The hypothesis is that older domains or domains with higher topical authority can win citations for content that semantically lives elsewhere.

This complicates the strategic question. If your transcripts are going to get cited but the citations are going to a third-party site you don't control, the GEO win for your brand is partial at best. We've been considering a follow-up experiment that explicitly publishes the same transcript to multiple destinations and tracks where the citations land. It's on the roadmap. It's not done.
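
When we do run it, the bookkeeping side is simple; the open question is what the engines decide. A sketch of the comparison we'd make, with every destination URL a placeholder:

```python
# Sketch for the planned follow-up: one transcript, several known copies on
# different domains, check which copy (if any) an engine actually cites.
# All URLs are placeholders.
KNOWN_COPIES = {
    "first_party": "https://example.com/podcast/transcripts/ep-42",
    "youtube_captions": "https://www.youtube.com/watch?v=PLACEHOLDER",
    "third_party_host": "https://transcripts.example.net/show/ep-42",
    "guest_blog": "https://guest-blog.example.org/my-podcast-appearance",
}

def winning_copy(cited_urls: list[str]) -> str | None:
    """Return which known copy showed up in an engine's citation rail, if any."""
    for label, url in KNOWN_COPIES.items():
        if any(cited.startswith(url) for cited in cited_urls):
            return label
    return None
```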

What I'd ask before doing this

Four questions:

Does the show already produce content with named experts making clear claims, or is it mostly co-host conversation? The latter doesn't seem to get cited.

Is there an existing audience for the show, or is this purely a content-asset play? Transcripts of podcasts that nobody listens to still seem to get cited occasionally, but I'm less sure the engines are stable about surfacing them long-term.

Is the cleaning labor available? Raw transcripts underperform consistently in our small sample.

Are you prepared for the canonical question? If multiple versions of the same content exist on the open web, the citation may not go to the one you want.

What I keep telling clients about this

Podcast transcripts are not a GEO silver bullet. They're a moderately useful content asset, in narrow conditions, with a cost-to-test ratio that makes sense if you already have the audio. If you're starting from zero and considering whether to launch a podcast for GEO reasons, my honest answer is: launch a podcast if you have something to say and someone you'd want to interview. The GEO citations may or may not follow. Do it for the show first, the transcripts second. That's the order that has correlated with results in our small sample.

If you've published transcripts and tracked citations, what cleaning level did you find was the practical minimum? I'm curious whether the 60-90 minute number generalizes or whether we're over-cleaning.
