<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Flo</title>
    <description>The latest articles on DEV Community by Flo (@flo152121063061).</description>
    <link>https://dev.to/flo152121063061</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3707335%2F135c27ec-9da0-44a7-bada-c4e8430fe475.png</url>
      <title>DEV Community: Flo</title>
      <link>https://dev.to/flo152121063061</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/flo152121063061"/>
    <language>en</language>
    <item>
      <title>I built a real-time audio pipeline from the browser to my server. Here's what actually works.</title>
      <dc:creator>Flo</dc:creator>
      <pubDate>Thu, 26 Feb 2026 22:41:49 +0000</pubDate>
      <link>https://dev.to/flo152121063061/i-built-a-real-time-audio-pipeline-from-the-browser-to-my-server-heres-what-actually-works-5465</link>
      <guid>https://dev.to/flo152121063061/i-built-a-real-time-audio-pipeline-from-the-browser-to-my-server-heres-what-actually-works-5465</guid>
      <description>&lt;p&gt;Getting audio from a browser to a server in real-time sounds like a two-line solution. It isn't.&lt;/p&gt;

&lt;p&gt;I built this pipeline for &lt;a href="https://www.livesuggest.com" rel="noopener noreferrer"&gt;LiveSuggest&lt;/a&gt;, an AI assistant that listens to meetings and gives suggestions as the conversation happens. That means streaming audio continuously, with as little delay as possible, across a WebSocket connection that can drop at any time.&lt;/p&gt;

&lt;h2&gt;The pipeline&lt;/h2&gt;

&lt;p&gt;Here's the full chain:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Capture audio with &lt;code&gt;getUserMedia&lt;/code&gt; (mic) or &lt;code&gt;getDisplayMedia&lt;/code&gt; (tab audio)&lt;/li&gt;
&lt;li&gt;Feed it into a &lt;code&gt;MediaRecorder&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Slice it into chunks every N seconds&lt;/li&gt;
&lt;li&gt;Encode each chunk to base64&lt;/li&gt;
&lt;li&gt;Send it over WebSocket to the server&lt;/li&gt;
&lt;li&gt;Server decodes and sends to a transcription API&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Every step has a gotcha.&lt;/p&gt;
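
&lt;p&gt;Before getting into the gotchas, here's roughly what the receiving end (step 6) looks like. This is a minimal sketch rather than the production handler: it assumes a Socket.io server, and &lt;code&gt;sendToTranscription&lt;/code&gt; is a placeholder, but the event name and payload shape match what the client emits later in this post.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Sketch of the server side: decode base64 and hand the bytes to the transcription API
io.on('connection', (socket) =&amp;gt; {
  socket.on('audio-chunk', async ({ sessionId, audio, format }) =&amp;gt; {
    const buffer = Buffer.from(audio, 'base64'); // back to raw WebM bytes
    await sendToTranscription(sessionId, buffer, format);
  });
});
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;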

&lt;h2&gt;MediaRecorder is great until it isn't&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;MediaRecorder&lt;/code&gt; handles encoding for you. I use &lt;code&gt;audio/webm;codecs=opus&lt;/code&gt; because it's widely supported and compresses well.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;mediaRecorder&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;MediaRecorder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;mimeType&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;audio/webm;codecs=opus&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
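
&lt;p&gt;One thing that snippet glosses over: not every browser accepts that exact mimeType string, and the constructor throws if it doesn't. A small guard with &lt;code&gt;MediaRecorder.isTypeSupported&lt;/code&gt; avoids that (the MP4 fallback here is just an illustration):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Pick a container/codec the current browser can actually record
const preferred = 'audio/webm;codecs=opus';
const mimeType = MediaRecorder.isTypeSupported(preferred)
  ? preferred
  : 'audio/mp4'; // e.g. Safari's recorder leans toward MP4/AAC

const mediaRecorder = new MediaRecorder(stream, { mimeType });
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;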



&lt;p&gt;The problem: you don't control the chunk boundaries. &lt;code&gt;ondataavailable&lt;/code&gt; fires when the browser feels like it, not when you need it, and if you rely on the timeslice argument, only the first chunk carries the WebM header, so later chunks can't be decoded on their own. Calling &lt;code&gt;mediaRecorder.stop()&lt;/code&gt; and &lt;code&gt;start()&lt;/code&gt; to force a boundary gives each chunk its own header, so every chunk is a self-contained file the transcription API can handle, but you can't just concatenate those files back into one continuous recording.&lt;/p&gt;

&lt;p&gt;I settled on 10-second segments. Short enough for responsive transcription, long enough for the transcription API to have decent context.&lt;/p&gt;
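
&lt;p&gt;The segmenting loop itself is small. Here's a sketch of the stop/start cycle (the interval constant and &lt;code&gt;sendChunk&lt;/code&gt; are placeholders; the base64 encoding that &lt;code&gt;sendChunk&lt;/code&gt; would do is shown in the next section):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Force a self-contained WebM segment every 10 seconds by restarting the recorder
const SEGMENT_MS = 10_000;

mediaRecorder.ondataavailable = (event) =&amp;gt; {
  if (event.data.size &amp;gt; 0) sendChunk(event.data); // each blob is a complete file
};

mediaRecorder.start();
setInterval(() =&amp;gt; {
  if (mediaRecorder.state === 'recording') {
    mediaRecorder.stop();  // flushes the current segment through ondataavailable
    mediaRecorder.start(); // immediately begin the next one
  }
}, SEGMENT_MS);
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;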

&lt;h2&gt;Base64 is wasteful but practical&lt;/h2&gt;

&lt;p&gt;Binary WebSocket frames would be more efficient. But base64 over JSON keeps the payload inspectable, works with Socket.io out of the box, and makes debugging way easier.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;reader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;FileReader&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="nx"&gt;reader&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;readAsDataURL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;blob&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nx"&gt;reader&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;onloadend&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;base64&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;reader&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;,&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
  &lt;span class="nx"&gt;socket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;emit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;audio-chunk&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;sessionId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;audio&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;base64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;format&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;webm&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;duration&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The 33% size overhead hasn't been an issue in practice. A 10-second Opus chunk is tiny.&lt;/p&gt;
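
&lt;p&gt;As a rough sanity check (assuming Opus settles around 32 kbps for speech): 10 seconds is about 40 KB of audio, and base64 inflates that to roughly 53 KB per message. Even at several times that bitrate, each chunk stays comfortably small.&lt;/p&gt;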

&lt;h2&gt;Mixing two audio sources&lt;/h2&gt;

&lt;p&gt;If you want both mic and system audio (from a browser tab), you need to mix them. The Web Audio API makes this possible but unintuitive:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;audioContext&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;AudioContext&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;destination&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;audioContext&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createMediaStreamDestination&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;micSource&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;audioContext&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createMediaStreamSource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;micStream&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;tabSource&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;audioContext&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createMediaStreamSource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;tabStream&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="nx"&gt;micSource&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;destination&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nx"&gt;tabSource&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;destination&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// destination.stream is your mixed stream&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The resulting stream goes into &lt;code&gt;MediaRecorder&lt;/code&gt;. Both sides of the conversation end up in one stream. It works better than you'd expect.&lt;/p&gt;
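
&lt;p&gt;To make the hand-off explicit: &lt;code&gt;destination.stream&lt;/code&gt; is an ordinary &lt;code&gt;MediaStream&lt;/code&gt;, so it plugs straight into the recorder from earlier:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Record the mixed (mic + tab) stream exactly like a single source
const mediaRecorder = new MediaRecorder(destination.stream, {
  mimeType: 'audio/webm;codecs=opus',
});
mediaRecorder.start();
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;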

&lt;h2&gt;What I learned about reliability&lt;/h2&gt;

&lt;p&gt;The stream can die at any time. Chrome's "Stop sharing" button kills &lt;code&gt;getDisplayMedia&lt;/code&gt; streams instantly. Listening for the &lt;code&gt;ended&lt;/code&gt; event on every track is not optional.&lt;/p&gt;
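
&lt;p&gt;A minimal version of that listener (the handler name is a placeholder):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// 'ended' fires when the user clicks "Stop sharing" or the device disappears
for (const track of stream.getTracks()) {
  track.addEventListener('ended', handleStreamEnded); // e.g. stop the recorder, tell the server
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;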

&lt;p&gt;Rate limiting saved me from a nasty bug. I do sliding-window rate limiting in Redis: 60 chunks per minute per session. Without it, a buggy client can silently flood the transcription API for hours.&lt;/p&gt;
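
&lt;p&gt;The check itself fits in one small function. A sketch assuming ioredis (the key name and limits mirror the description above but are illustrative, and the read-then-write isn't atomic, which is fine for a soft limit):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Sliding window: at most 60 chunks per session in any 60-second span
async function allowChunk(redis, sessionId) {
  const key = `rate:audio:${sessionId}`;
  const now = Date.now();
  await redis.zremrangebyscore(key, 0, now - 60_000); // drop entries older than the window
  const count = await redis.zcard(key);
  if (count &amp;gt;= 60) return false;
  await redis.zadd(key, now, `${now}:${Math.random()}`);
  await redis.expire(key, 120); // idle sessions clean up after themselves
  return true;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;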

&lt;p&gt;Small chunks are almost always noise. Buffers under 2KB get filtered before hitting the API. Same for transcriptions under 4 words — silence, breathing, keyboard sounds. The transcription model isn't cheap, and garbage in means garbage out regardless.&lt;/p&gt;
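
&lt;p&gt;The filters are deliberately dumb. Roughly (thresholds as above; &lt;code&gt;transcribe&lt;/code&gt; is a placeholder):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Drop obvious noise before paying for transcription
if (buffer.length &amp;lt; 2048) return; // under ~2 KB: almost certainly silence

const text = await transcribe(buffer);
if (text.trim().split(/\s+/).length &amp;lt; 4) return; // breathing, keyboard, half a word
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;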

&lt;p&gt;Reconnection is non-trivial. WebSocket drops happen. I use exponential backoff with jitter, and the server restores session state from Redis when a client reconnects to a different instance.&lt;/p&gt;
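
&lt;p&gt;The backoff is nothing exotic. A sketch of the delay calculation (the caps are illustrative; Socket.io's built-in reconnection options can be tuned to do the same thing):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Exponential backoff with jitter: 1s, 2s, 4s... capped at 30s,
// minus a random 0-50% so clients don't all reconnect in lockstep
function reconnectDelay(attempt) {
  const base = Math.min(1000 * 2 ** attempt, 30_000);
  return base - base * 0.5 * Math.random();
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;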

&lt;h2&gt;Was it worth building from scratch?&lt;/h2&gt;

&lt;p&gt;I considered third-party services that handle the whole pipeline. But owning the audio layer means controlling latency, cost, and what data leaves the app. For a product where those three things matter, it was worth the complexity.&lt;/p&gt;

&lt;p&gt;The pipeline now handles thousands of audio chunks per day. Not glamorous code, but it's the plumbing everything else depends on.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>javascript</category>
      <category>showdev</category>
      <category>webdev</category>
    </item>
    <item>
      <title>I tried to capture system audio in the browser. Here's what I learned.</title>
      <dc:creator>Flo</dc:creator>
      <pubDate>Mon, 12 Jan 2026 16:12:56 +0000</pubDate>
      <link>https://dev.to/flo152121063061/i-tried-to-capture-system-audio-in-the-browser-heres-what-i-learned-1f99</link>
      <guid>https://dev.to/flo152121063061/i-tried-to-capture-system-audio-in-the-browser-heres-what-i-learned-1f99</guid>
      <description>&lt;p&gt;I'm building LiveSuggest, a real-time AI assistant that listens to your meetings and gives you suggestions as you talk. Simple idea, right?&lt;/p&gt;

&lt;p&gt;Turns out, capturing audio from a browser tab is... complicated.&lt;/p&gt;

&lt;h2&gt;The good news&lt;/h2&gt;

&lt;p&gt;Chrome and Edge support it. You use &lt;code&gt;getDisplayMedia&lt;/code&gt;, the same API for screen sharing, but with an audio option:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;stream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nb"&gt;navigator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;mediaDevices&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getDisplayMedia&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;video&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;audio&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;systemAudio&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;include&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The user picks a tab to share, checks "Share tab audio", and boom — you get the audio stream. Works great for Zoom, Teams, Meet, whatever runs in a browser tab.&lt;/p&gt;

&lt;h2&gt;The bad news&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Firefox?&lt;/strong&gt; Implements &lt;code&gt;getDisplayMedia&lt;/code&gt; but completely ignores the audio part. No error, no warning. You just... don't get audio.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Safari?&lt;/strong&gt; Same story. The API exists, audio doesn't.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Mobile browsers?&lt;/strong&gt; None of them support it. iOS, Android, doesn't matter.&lt;/p&gt;

&lt;p&gt;So if you're building something that needs system audio, you're looking at Chrome/Edge desktop only. That's maybe 60-65% of your potential users.&lt;/p&gt;

&lt;h2&gt;What I ended up doing&lt;/h2&gt;

&lt;p&gt;I detect the browser upfront and show a clear message:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Firefox doesn't support system audio capture for meetings. Use Chrome or Edge for this feature. Microphone capture is still available."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;No tricks, no workarounds. Just honesty. Users appreciate knowing why something doesn't work rather than wondering if they did something wrong.&lt;/p&gt;

&lt;p&gt;For Firefox/Safari users, the app falls back to microphone-only mode. It's not ideal for capturing both sides of a conversation, but it's better than nothing.&lt;/p&gt;
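
&lt;p&gt;The detection is unglamorous. A sketch of the check and the fallback (the user-agent sniffing is deliberately crude, and a real check should also verify the returned stream actually has audio tracks, which the next section covers):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Rough capability check before offering tab-audio capture
function canCaptureTabAudio() {
  if (typeof navigator.mediaDevices.getDisplayMedia !== 'function') return false;
  const ua = navigator.userAgent;
  if (/Mobi/.test(ua)) return false; // no mobile browser supports tab audio
  return /Chrome\/|Edg\//.test(ua);  // Firefox and Safari expose the API but return no tab audio
}

// Firefox/Safari/mobile path: microphone only
if (!canCaptureTabAudio()) {
  const micStream = await navigator.mediaDevices.getUserMedia({ audio: true });
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;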

&lt;h2&gt;The annoying details&lt;/h2&gt;

&lt;p&gt;A few things that wasted my time so they don't waste yours:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You have to request video.&lt;/strong&gt; Even if you only want audio. &lt;code&gt;video: true&lt;/code&gt; is mandatory. I immediately stop the video track after getting the stream, but you can't skip it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The "Share tab audio" checkbox is easy to miss.&lt;/strong&gt; Chrome shows it in the sharing dialog, but it's not checked by default. If your user doesn't check it, you get a stream with zero audio tracks. No error, just silence.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The stream can die anytime.&lt;/strong&gt; User clicks "Stop sharing" in Chrome's toolbar? Your stream ends. You need to listen for the &lt;code&gt;ended&lt;/code&gt; event and handle it gracefully.&lt;/p&gt;
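
&lt;p&gt;Putting those three together, the capture call ends up looking roughly like this (simplified; &lt;code&gt;handleSharingStopped&lt;/code&gt; is a placeholder for whatever teardown your app needs):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;const stream = await navigator.mediaDevices.getDisplayMedia({
  video: true, // mandatory, even though we only want the audio
  audio: { systemAudio: 'include' },
});

// We never use the video, so release it immediately
stream.getVideoTracks().forEach(function (track) { track.stop(); });

// If "Share tab audio" wasn't checked there is no error, just zero audio tracks
if (stream.getAudioTracks().length === 0) {
  throw new Error('No tab audio: re-share with "Share tab audio" checked');
}

// "Stop sharing" ends the stream without warning
stream.getAudioTracks()[0].addEventListener('ended', handleSharingStopped);
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;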

&lt;h2&gt;Was it worth it?&lt;/h2&gt;

&lt;p&gt;Absolutely. For the browsers that support it, capturing tab audio is a game-changer. You can build things that weren't possible before — meeting assistants, live translators, accessibility tools.&lt;/p&gt;

&lt;p&gt;Just go in knowing that you'll spend time on browser detection and fallbacks. That's the web in 2025.&lt;/p&gt;

&lt;p&gt;If you're curious about what I built, check out &lt;a href="https://livesuggest.ai" rel="noopener noreferrer"&gt;LiveSuggest&lt;/a&gt;. And if you've found better workarounds for Firefox/Safari, I'd love to hear about them in the comments.&lt;/p&gt;

</description>
      <category>api</category>
      <category>javascript</category>
      <category>learning</category>
      <category>webdev</category>
    </item>
  </channel>
</rss>
