<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: trycuebird</title>
    <description>The latest articles on DEV Community by trycuebird (@trycuebird_9a08bd77cbe639).</description>
    <link>https://dev.to/trycuebird_9a08bd77cbe639</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3971843%2F6a9d0c93-939f-4449-b3c0-715fd2e59ab4.png</url>
      <title>DEV Community: trycuebird</title>
      <link>https://dev.to/trycuebird_9a08bd77cbe639</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/trycuebird_9a08bd77cbe639"/>
    <language>en</language>
    <item>
      <title>The hardest part of my AI meeting app was the audio, not the AI</title>
      <dc:creator>trycuebird</dc:creator>
      <pubDate>Sun, 07 Jun 2026 01:42:34 +0000</pubDate>
      <link>https://dev.to/trycuebird_9a08bd77cbe639/the-hardest-part-of-my-ai-meeting-app-was-the-audio-not-the-ai-3320</link>
      <guid>https://dev.to/trycuebird_9a08bd77cbe639/the-hardest-part-of-my-ai-meeting-app-was-the-audio-not-the-ai-3320</guid>
      <description>&lt;p&gt;I'm building an AI copilot for live calls. I assumed the AI would be the hard part. It wasn't — it was getting clean audio off the machine. Here's what I wish I'd known.&lt;/p&gt;

&lt;p&gt;The constraint: no bot in the call&lt;br&gt;
Most meeting AIs join your call as a participant ("X's Notetaker has joined"). I didn't want that. The goal: capture both sides of a conversation with nothing in the meeting — just the local device.&lt;/p&gt;

&lt;p&gt;Two streams:&lt;/p&gt;

&lt;p&gt;Your mic (you): easy.&lt;br&gt;
The system audio (everyone else): the hard, platform-specific part.&lt;/p&gt;

&lt;p&gt;The lessons that cost me weeks&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Don't turn on system "voice processing."&lt;br&gt;
The OS has a comms/voice mode that looks helpful — echo cancellation and gain control for free. Enable it and it globally drops your speaker volume and stacks auto-gain on your mic. It's invisible on your machine — it only shows up when you're in a real call and the other person sounds quiet or pumped. Leave it off.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Don't stack echo cancellation / AGC.&lt;br&gt;
The meeting app (Zoom/Meet/Teams) already runs AEC + AGC + noise suppression. Add your own on the same stream and the two fight over the reference signal — quality drops. Process a separate copy, never what the meeting app hears.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Run capture in a separate process.&lt;br&gt;
I moved native capture into a small helper process that pipes raw PCM to the main app over IPC. That isolates the heavy native work, lets the app re-spawn the helper after a crash instead of going down with it, and keeps the UI thread free.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;"Ready" means frames are flowing — not a status flag.&lt;br&gt;
My first version trusted a one-shot "capture started" signal. It lied: after sleep/wake or a slow cold start, the flag fired but no audio came. Ground truth = actual frames arriving. Gate on liveness, not a boolean.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Keep a fallback path — but tear it down when native wakes.&lt;br&gt;
I run a browser-based fallback when the native helper isn't ready. The bug: leaving it running alongside native → double-captured audio. Make the switch reversible, and kill the fallback fully once native frames arrive.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Normalize early.&lt;br&gt;
Downsample each stream to 16 kHz mono before transcription, and keep the two parties as separate streams (or separate channels) so "them" and "you" never blur in the transcript. Cheap at capture time, painful later.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
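
&lt;p&gt;Lessons 4 and 5 fit together, so here is a minimal TypeScript sketch of both, written under assumptions rather than taken from the real app. The names LivenessGate, CaptureSwitch and FallbackCapture are hypothetical, and so are the thresholds (how many frames, inside what window). The idea: count frames that actually arrived recently, treat native capture as ready only while that count holds, and run the fallback only while it does not.&lt;/p&gt;

```typescript
// Hypothetical sketch: none of these names come from a real library or from the app itself.
type FallbackCapture = { start: () => void; stop: () => void };

// "Ready" is derived from frames that actually arrived, never from a one-shot flag.
class LivenessGate {
  private frameTimes: number[] = [];
  private readonly minFrames: number; // frames required inside the window
  private readonly windowMs: number;  // how far back a frame still counts

  constructor(minFrames: number, windowMs: number) {
    this.minFrames = minFrames;
    this.windowMs = windowMs;
  }

  // Call once per PCM frame received from the native helper.
  onFrame(nowMs: number): void {
    this.frameTimes.push(nowMs);
    this.prune(nowMs);
  }

  // Live only while enough recent frames are on record.
  isLive(nowMs: number): boolean {
    this.prune(nowMs);
    return this.frameTimes.length >= this.minFrames;
  }

  // Drop timestamps that fell out of the window.
  private prune(nowMs: number): void {
    const cutoff = nowMs - this.windowMs;
    this.frameTimes = this.frameTimes.filter((t) => t >= cutoff);
  }
}

// Keeps exactly one capture path active, and makes the switch reversible.
class CaptureSwitch {
  private fallbackRunning = false;
  private readonly gate: LivenessGate;
  private readonly fallback: FallbackCapture;

  constructor(gate: LivenessGate, fallback: FallbackCapture) {
    this.gate = gate;
    this.fallback = fallback;
  }

  // Wire this to the helper's frame callback.
  onNativeFrame(nowMs: number): void {
    this.gate.onFrame(nowMs);
    this.tick(nowMs);
  }

  // Call on every native frame and also from a periodic timer,
  // so a stalled helper is noticed even when no frames arrive.
  tick(nowMs: number): "native" | "fallback" {
    if (this.gate.isLive(nowMs)) {
      if (this.fallbackRunning) {
        this.fallback.stop(); // tear down fully: no double capture
        this.fallbackRunning = false;
      }
      return "native";
    }
    if (!this.fallbackRunning) {
      this.fallback.start(); // native went quiet: switch back
      this.fallbackRunning = true;
    }
    return "fallback";
  }
}
```

&lt;p&gt;In use, something like new CaptureSwitch(new LivenessGate(3, 1000), browserFallback) would demand three native frames within the last second before stopping the fallback. Those numbers are placeholders to tune against your own frame size, and browserFallback stands in for whatever start/stop wrapper your fallback path exposes. The periodic tick matters as much as the per-frame call: after sleep/wake no frame callback fires at all, so only a timer can notice the silence.&lt;/p&gt;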

&lt;p&gt;Takeaway&lt;br&gt;
The AI/LLM layer had the most tutorials and fewest surprises. The audio layer — OS quirks, processing that stacks invisibly, startup races — was where every real bug lived. Building anything that listens to a call? Budget your time accordingly.&lt;/p&gt;

&lt;p&gt;Happy to go deeper in the comments — especially how others handle AEC/AGC stacking cross-platform.&lt;/p&gt;

&lt;p&gt;(Building this as TryCuebird — a real-time interview copilot. The capture lessons above are the generalizable bits.)&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>electron</category>
      <category>career</category>
    </item>
  </channel>
</rss>
