<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Paul Reaney</title>
    <description>The latest articles on DEV Community by Paul Reaney (@paul_reaney_c6c80b43a48b2).</description>
    <link>https://dev.to/paul_reaney_c6c80b43a48b2</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3974537%2F1379e3e9-83eb-49dd-8f20-990ea36c656e.png</url>
      <title>DEV Community: Paul Reaney</title>
      <link>https://dev.to/paul_reaney_c6c80b43a48b2</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/paul_reaney_c6c80b43a48b2"/>
    <language>en</language>
    <item>
      <title>The 'John Smith' problem: detecting podcast guest appearances without false positives</title>
      <dc:creator>Paul Reaney</dc:creator>
      <pubDate>Mon, 08 Jun 2026 17:13:30 +0000</pubDate>
      <link>https://dev.to/paul_reaney_c6c80b43a48b2/the-john-smith-problem-detecting-podcast-guest-appearances-without-false-positives-125p</link>
      <guid>https://dev.to/paul_reaney_c6c80b43a48b2/the-john-smith-problem-detecting-podcast-guest-appearances-without-false-positives-125p</guid>
      <description>&lt;p&gt;I listen to podcasts because of people, not shows. When a researcher or founder I like goes on someone's podcast, I want that one episode — but I don't want to subscribe to all 400 episodes of every show they might ever appear on.&lt;/p&gt;

&lt;p&gt;There's no button for that anywhere. So I built one: &lt;a href="https://guestvine.fm" rel="noopener noreferrer"&gt;GuestVine&lt;/a&gt;. You follow people; whenever one of them shows up as a guest on any podcast, that single episode lands in a custom RSS feed you subscribe to once, in whatever player you already use.&lt;/p&gt;

&lt;p&gt;The fun part wasn't the web app — it was the detection. "Did this person appear as a guest on this episode?" sounds trivial and absolutely is not. Here's how I built it.&lt;/p&gt;

&lt;h2&gt;The shape of the system&lt;/h2&gt;

&lt;p&gt;No new player, no re-hosting audio. The whole thing is RSS in, RSS out:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[Podcast Index] --&amp;gt; [Detection Pipeline] --&amp;gt; [Postgres] --&amp;gt; [Feed Service] --&amp;gt; your RSS URL
                                                  ^
                                         [Control Panel] &amp;lt;--&amp;gt; [you]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The feed items we emit point at the &lt;em&gt;original publisher's&lt;/em&gt; audio file. You can play episodes right there — inline on the site, or in whatever podcast app you subscribe the feed into — but we never &lt;strong&gt;re-host&lt;/strong&gt; the audio: every enclosure is the publisher's own file, served from their CDN. We just decide what goes in the feed. Which means everything hinges on one question being answered correctly, at scale, with no human in the loop.&lt;/p&gt;

&lt;h2&gt;The actual hard problem: "did they appear, or were they just mentioned?"&lt;/h2&gt;

&lt;p&gt;Say you follow &lt;strong&gt;John Smith&lt;/strong&gt;. I pull candidate episodes from &lt;a href="https://podcastindex.org/" rel="noopener noreferrer"&gt;Podcast Index&lt;/a&gt; and now have to classify each one. The failure modes are everywhere:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;His name is in the title because he's the guest. ✅&lt;/li&gt;
&lt;li&gt;His name is in the description because the host &lt;em&gt;mentions&lt;/em&gt; him in passing. ❌&lt;/li&gt;
&lt;li&gt;His name is in the title of an episode about a &lt;em&gt;different&lt;/em&gt; John Smith. ❌&lt;/li&gt;
&lt;li&gt;The episode has a structured &lt;code&gt;&amp;lt;podcast:person&amp;gt;&lt;/code&gt; tag naming him as guest. ✅&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A naive substring match delivers garbage. So detection is three layers: &lt;strong&gt;match → score → verify.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;Layer 1: ranked match signals&lt;/h3&gt;

&lt;p&gt;Not all evidence is equal. I match in priority order and record &lt;em&gt;which&lt;/em&gt; signal fired:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="nx"&gt;MatchSignal&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
  &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;person_tag&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;          &lt;span class="c1"&gt;// &amp;lt;podcast:person role="guest"&amp;gt; — structured, strongest&lt;/span&gt;
  &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;title_guest&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;         &lt;span class="c1"&gt;// full name in TITLE + a guest cue ("with", "feat.")&lt;/span&gt;
  &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;title_plain&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;         &lt;span class="c1"&gt;// full name in TITLE, no cue&lt;/span&gt;
  &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;description_guest&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;   &lt;span class="c1"&gt;// full name in DESCRIPTION + guest cue&lt;/span&gt;
  &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;description_plain&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;  &lt;span class="c1"&gt;// full name in DESCRIPTION, no cue (weakest)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The gold standard is the &lt;a href="https://github.com/Podcastindex-org/podcast-namespace/blob/main/docs/1.0.md#person" rel="noopener noreferrer"&gt;&lt;code&gt;&amp;lt;podcast:person&amp;gt;&lt;/code&gt; tag&lt;/a&gt; from the podcast namespace — structured metadata where a publisher explicitly says "this person was a guest." When it's present, the guesswork disappears. It usually isn't present, so I fall back to text, and lean on "guest cue" words — &lt;em&gt;with, featuring, ft, joins, sits down with, in conversation with&lt;/em&gt; — to separate a guest from a name-drop.&lt;/p&gt;
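&lt;p&gt;Roughly, the text fallback can be sketched like this. The cue list is the one above; the helper names, the whole-word matching rule, and checking for a cue anywhere in the same field (rather than next to the name) are assumptions for illustration, not the production code. The structured &lt;code&gt;person_tag&lt;/code&gt; check would run before any of this.&lt;/p&gt;

```typescript
// Sketch only: helper names and matching rules are illustrative assumptions.
type TextSignal = "title_guest" | "title_plain" | "description_guest" | "description_plain";

// Guest cues taken from the list in the post.
const GUEST_CUES: string[] = [
  "with", "featuring", "feat.", "ft", "joins", "sits down with", "in conversation with",
];

// Escape regex metacharacters so "feat." or "J. Smith" is matched literally.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, (ch) => "\\" + ch);
}

// Case-insensitive whole-phrase match: the phrase must be bounded by a
// non-word character or the start/end of the string.
function containsPhrase(haystack: string, phrase: string): boolean {
  const pattern = new RegExp("(^|\\W)" + escapeRegExp(phrase) + "(\\W|$)", "i");
  return pattern.test(haystack);
}

function hasGuestCue(text: string): boolean {
  return GUEST_CUES.some((cue) => containsPhrase(text, cue));
}

// Title evidence outranks description evidence; null means no text match.
function textSignal(fullName: string, title: string, description: string): TextSignal | null {
  if (containsPhrase(title, fullName)) {
    return hasGuestCue(title) ? "title_guest" : "title_plain";
  }
  if (containsPhrase(description, fullName)) {
    return hasGuestCue(description) ? "description_guest" : "description_plain";
  }
  return null;
}
```

&lt;p&gt;With these rules, a title like "Scaling Postgres with John Smith" yields &lt;code&gt;title_guest&lt;/code&gt;, while a description that only thanks John Smith in passing yields &lt;code&gt;description_plain&lt;/code&gt;.&lt;/p&gt;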

&lt;h3&gt;Layer 2: scoring, and the "John Smith" tax&lt;/h3&gt;

&lt;p&gt;Each signal has a base confidence:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;SIGNAL_SCORE&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Record&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;MatchSignal&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;person_tag&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.98&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;title_guest&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;title_plain&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;description_guest&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.55&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;description_plain&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then the part I'm fondest of. A name made of two extremely common tokens — "John Smith," "Mike Jones" — is &lt;em&gt;far&lt;/em&gt; more likely to be a coincidental match than "Lex Fridman" is. So common names pay a tax:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;commonNamePenalty&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toLowerCase&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\s&lt;/span&gt;&lt;span class="sr"&gt;+/&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;Boolean&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;commonCount&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;t&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;COMMON_TOKENS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;has&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;t&lt;/span&gt;&lt;span class="p"&gt;)).&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;commonCount&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;   &lt;span class="c1"&gt;// "john smith" — heavy damp&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;commonCount&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mf"&gt;0.08&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// "john fridman" — light damp&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Crucially, &lt;strong&gt;&lt;code&gt;person_tag&lt;/code&gt; matches are exempt from the penalty&lt;/strong&gt;: if a publisher structurally tagged the guest, I trust it regardless of how common the name is. The penalty only applies to the fuzzy text signals, where coincidence is actually possible.&lt;/p&gt;
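&lt;p&gt;Putting base score, penalty, and exemption together might look like the sketch below. The base scores and penalty values are the ones shown above; subtracting the penalty, rounding to two decimals, and the tiny &lt;code&gt;COMMON_TOKENS&lt;/code&gt; set are assumptions made for the example. The tier thresholds are 0.8 and 0.4.&lt;/p&gt;

```typescript
// Sketch only: subtraction, rounding, and the token list are assumptions.
type Signal = "person_tag" | "title_guest" | "title_plain" | "description_guest" | "description_plain";

const BASE_SCORE: { [K in Signal]: number } = {
  person_tag: 0.98,
  title_guest: 0.9,
  title_plain: 0.6,
  description_guest: 0.55,
  description_plain: 0.3,
};

// Hypothetical stand-in for a real list of very common name tokens.
const COMMON_TOKENS = new Set(["john", "smith", "mike", "jones"]);

function penaltyFor(fullName: string): number {
  const tokens = fullName.toLowerCase().split(/\s+/).filter(Boolean);
  // Single-token names are never penalised.
  const commonCount = tokens.length >= 2 ? tokens.filter((t) => COMMON_TOKENS.has(t)).length : 0;
  if (commonCount >= 2) return 0.2;
  if (commonCount === 1) return 0.08;
  return 0;
}

function scoreMatch(signal: Signal, fullName: string): number {
  // Structured tags are trusted as-is, however common the name.
  if (signal === "person_tag") return BASE_SCORE.person_tag;
  const raw = Math.max(0, BASE_SCORE[signal] - penaltyFor(fullName));
  // Round so floating-point noise (0.6 - 0.2) cannot slip under a threshold.
  return Math.round(raw * 100) / 100;
}

function tierFor(score: number): "A" | "B" | "C" {
  if (score >= 0.8) return "A";
  if (score >= 0.4) return "B";
  return "C";
}
```

&lt;p&gt;Under these assumptions, a &lt;code&gt;title_guest&lt;/code&gt; match for "Lex Fridman" stays at 0.9 and lands in Tier A, while the same signal for "John Smith" drops to 0.7 and lands in Tier B. That gap is the tax.&lt;/p&gt;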

&lt;h3&gt;Layer 3: verify, and "start strict"&lt;/h3&gt;

&lt;p&gt;The score collapses into three tiers, and the tier decides the &lt;em&gt;action&lt;/em&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;tier&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Tier&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;score&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nx"&gt;tier&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;A&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;       &lt;span class="c1"&gt;// auto-deliver&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;score&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nx"&gt;tier&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;B&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;  &lt;span class="c1"&gt;// hold for verification&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="nx"&gt;tier&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;C&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;                    &lt;span class="c1"&gt;// drop, silently&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tier&lt;/th&gt;
&lt;th&gt;Meaning&lt;/th&gt;
&lt;th&gt;Action&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;A&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;structured tag, or name in the title with a guest cue&lt;/td&gt;
&lt;td&gt;auto-deliver&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;B&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;name present but ambiguous&lt;/td&gt;
&lt;td&gt;hold; verify before delivering&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;C&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;passing mention / low signal&lt;/td&gt;
&lt;td&gt;drop&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The product decision baked in here: &lt;strong&gt;start strict.&lt;/strong&gt; Only Tier A auto-delivers. A missed appearance is invisible — you just never knew it existed. A &lt;em&gt;wrong&lt;/em&gt; appearance is loud and corrosive: it teaches you the feed is junk, and you unsubscribe. For a trust product, precision beats recall every time. I'd rather under-deliver and stay credible.&lt;/p&gt;

&lt;h3&gt;The Tier-B escape hatch: an LLM as a tie-breaker&lt;/h3&gt;

&lt;p&gt;Tier B is the interesting middle — real signal, real ambiguity. Rather than drop it, I optionally hand it to an LLM (Claude) with the episode metadata and the person's disambiguating context, and ask one narrow question: &lt;em&gt;is this plausibly this specific person, as a guest?&lt;/em&gt; If it promotes the match, it ships; otherwise it stays held.&lt;/p&gt;

&lt;p&gt;The key restraint: &lt;strong&gt;the LLM is a tie-breaker, not the pipeline.&lt;/strong&gt; It never sees Tier A (no need) or Tier C (not worth the tokens). It only adjudicates the genuinely ambiguous middle band. That keeps cost bounded and keeps the deterministic scoring in charge of the easy 90%.&lt;/p&gt;
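&lt;p&gt;The gating can be sketched as a small decision function. The &lt;code&gt;Verifier&lt;/code&gt; signature and all names here are hypothetical, and a real verifier would be an async call to the model; a synchronous callback just keeps the sketch easy to test with a stub.&lt;/p&gt;

```typescript
// Sketch only: names and the Verifier signature are hypothetical.
type MatchTier = "A" | "B" | "C";
type Decision = "deliver" | "hold" | "drop";

// Returns true when the model judges this to be the specific person, as a guest.
type Verifier = (episodeSummary: string, personContext: string) => boolean;

function decide(
  tier: MatchTier,
  episodeSummary: string,
  personContext: string,
  verify?: Verifier,
): Decision {
  if (tier === "A") return "deliver"; // confident enough, no model call
  if (tier === "C") return "drop";    // not worth the tokens
  // Tier B: without a verifier configured, the match simply stays held.
  if (verify === undefined) return "hold";
  return verify(episodeSummary, personContext) ? "deliver" : "hold";
}
```

&lt;p&gt;Because the verifier is only reachable from the Tier B branch, the number of model calls is bounded by the size of the ambiguous band, not by the number of candidate episodes.&lt;/p&gt;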

&lt;h2&gt;Things that bit me&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Unspecified &lt;code&gt;&amp;lt;podcast:person&amp;gt;&lt;/code&gt; role defaults to "host," not "guest."&lt;/strong&gt; Per the spec, a missing role means host. Get this backwards and you deliver every host as if they were a guest — a flood of false positives from the &lt;em&gt;highest-trust&lt;/em&gt; signal. Brutal.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Players cache RSS aggressively.&lt;/strong&gt; "Why isn't my new episode showing up" was almost always the player, not me. Worth knowing before you debug your own feed generator for an hour.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The whole thing is testable without the network.&lt;/strong&gt; Match and score are pure functions over normalized episode structs, so the test suite runs against recorded fixtures — no API key, no flakiness. The detection logic above is all covered by plain Vitest unit tests, which made tuning the penalties safe.&lt;/li&gt;
&lt;/ul&gt;
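&lt;p&gt;The role default from the first item is cheap to pin down in code. This is a minimal sketch with hypothetical function names; trimming and lower-casing the attribute is a defensive assumption on my part.&lt;/p&gt;

```typescript
// Sketch only: function names are hypothetical.
// A podcast:person element with no role attribute is treated as "host".
function effectiveRole(roleAttr?: string | null): string {
  const normalized = (roleAttr ?? "").trim().toLowerCase();
  return normalized === "" ? "host" : normalized;
}

// Only an explicit guest role should ever fire the person_tag signal.
function isGuestTag(roleAttr?: string | null): boolean {
  return effectiveRole(roleAttr) === "guest";
}
```

&lt;p&gt;A fixture with a role-less tag makes a good regression test: it should resolve to host and must never produce a &lt;code&gt;person_tag&lt;/code&gt; guest match.&lt;/p&gt;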

&lt;h2&gt;The stack, briefly&lt;/h2&gt;

&lt;p&gt;Next.js (App Router) for the control panel, API, and RSS serving · Postgres + Prisma for people/feeds/episodes/appearances and the fan-out · passwordless auth (magic link + OTP in one email) · the detection worker above on a cron · Claude for the Tier-B verifier · Vitest for the matching/scoring/feed logic.&lt;/p&gt;

&lt;h2&gt;Try it&lt;/h2&gt;

&lt;p&gt;That precision-first detection is the core of &lt;a href="https://guestvine.fm" rel="noopener noreferrer"&gt;GuestVine&lt;/a&gt;&lt;br&gt;
  — follow people, not shows, and their guest appearances land in whatever&lt;br&gt;
  podcast app you already use. Free for a few follows.&lt;/p&gt;

&lt;p&gt;Getting started is meant to take one step: follow some people, grab your feed URL, and paste it into any podcast app once. Guest appearances then arrive on their own. Play them inline or in your player; either way the audio streams from the original publisher and is never re-hosted. If you try it, the one piece of feedback I'd love: is getting your feed into your podcast app smooth enough? That's the step I'm least sure about.&lt;/p&gt;

&lt;p&gt;I'm happy to go deeper on any layer — the namespace parsing, the scoring tuning, or how the RSS fan-out works across multiple feeds per user. Ask in the comments.&lt;/p&gt;

</description>
      <category>showdev</category>
      <category>webdev</category>
      <category>saas</category>
      <category>indiehackers</category>
    </item>
  </channel>
</rss>
