<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Iam Suriyan</title>
    <description>The latest articles on DEV Community by Iam Suriyan (@iam_suriyan_b9078a5b3a553).</description>
    <link>https://dev.to/iam_suriyan_b9078a5b3a553</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F4008954%2F906d2049-67f3-48be-956a-f25697489dc6.png</url>
      <title>DEV Community: Iam Suriyan</title>
      <link>https://dev.to/iam_suriyan_b9078a5b3a553</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/iam_suriyan_b9078a5b3a553"/>
    <language>en</language>
    <item>
      <title>Building a real-time voice-agent runtime in Rust: no GIL, one binary, 2,000 calls a box</title>
      <dc:creator>Iam Suriyan</dc:creator>
      <pubDate>Tue, 30 Jun 2026 04:28:07 +0000</pubDate>
      <link>https://dev.to/iam_suriyan_b9078a5b3a553/building-a-real-time-voice-agent-runtime-in-rust-no-gil-one-binary-2000-calls-a-box-12ko</link>
      <guid>https://dev.to/iam_suriyan_b9078a5b3a553/building-a-real-time-voice-agent-runtime-in-rust-no-gil-one-binary-2000-calls-a-box-12ko</guid>
      <description>&lt;p&gt;We built Flowcat, an Apache-2.0 native-Rust runtime for real-time voice AI agents&lt;br&gt;
(phone + WebRTC), as a clean-room counterpart to the architecture of pipecat&lt;br&gt;
(Python). This post is about the Rust-specific design decisions that let one&lt;br&gt;
process hold a flat ~0.6 ms p99 from 10 to 2,000 concurrent calls on a single box&lt;br&gt;
— and, honestly, about where that number does not mean what it looks like.&lt;/p&gt;

&lt;p&gt;Repo: &lt;a href="https://github.com/AreevAI/flowcat" rel="noopener noreferrer"&gt;https://github.com/AreevAI/flowcat&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;THE PROBLEM&lt;/p&gt;

&lt;p&gt;A voice agent carries a call through a pipeline:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;transport in  -&amp;gt;  VAD / turn-taking  -&amp;gt;  STT  -&amp;gt;  LLM  -&amp;gt;  TTS  -&amp;gt;  transport out
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;(or a single speech-to-speech model in the middle). At 50 audio frames/sec per&lt;br&gt;
call, every stage touches every frame. Run a few hundred concurrent calls and the&lt;br&gt;
runtime's per-frame overhead — not the AI — becomes the thing that stalls.&lt;/p&gt;

&lt;p&gt;In Python that overhead is real: the GIL serializes frame routing onto one core,&lt;br&gt;
so you scale by running one process per core (~14 on a 16-vCPU box), each with its&lt;br&gt;
own memory baseline and connection pools. And GC pauses show up as tail latency.&lt;/p&gt;

&lt;p&gt;THE RUST DESIGN&lt;/p&gt;

&lt;p&gt;Three decisions did most of the work.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Each pipeline stage is its own tokio task behind a bounded channel.&lt;br&gt;
Backpressure is just the channel filling up — no manual flow control.&lt;/p&gt;

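&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Each FrameProcessor owns the receive half of a bounded mpsc.
// A full channel naturally back-pressures the upstream stage.
let (tx, rx) = tokio::sync::mpsc::channel::&amp;lt;Frame&amp;gt;(CAP);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;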
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The hot audio frame is an Arc, so each hop moves a pointer, not PCM.&lt;br&gt;
A 20 ms mu-law frame is small, but copying it across 7 stages x 50 fps x N&lt;br&gt;
calls adds up fast. Cloning an Arc is a refcount bump.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;enum Frame {
    Audio(Arc&amp;lt;AudioFrame&amp;gt;),   // clone = refcount bump, not a buffer copy
    Text(...),
    Control(...),
    System(...),              // Start / Stop / Cancel / Interruption
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;System frames jump the queue. Start / Cancel / Interruption / End ride a&lt;br&gt;
separate priority channel and invoke start()/stop() lifecycle hooks, so an&lt;br&gt;
interruption (caller barges in) isn't stuck behind a backlog of audio frames&lt;br&gt;
in the normal queue.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;No GC means no pause; no GIL means one process uses every core. Measured, one&lt;br&gt;
process scales 8.4x across 14 cores.&lt;/p&gt;

&lt;p&gt;THE BENCHMARK&lt;/p&gt;

&lt;p&gt;Identical Rust WebSocket + mu-law load generator, full-duplex echo, 50 fps/call,&lt;br&gt;
10 s/point, on one Azure Standard_FX16mds_v2 (16 vCPU). pipecat is given its fair&lt;br&gt;
multiprocess deployment (12 workers, SO_REUSEPORT, one per core) — not a single&lt;br&gt;
process. p99 round-trip latency:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Concurrent calls&lt;/th&gt;
&lt;th&gt;Flowcat (1 process)&lt;/th&gt;
&lt;th&gt;pipecat (12 workers)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;50&lt;/td&gt;
&lt;td&gt;0.39 ms&lt;/td&gt;
&lt;td&gt;1.13 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;100&lt;/td&gt;
&lt;td&gt;0.51 ms&lt;/td&gt;
&lt;td&gt;33 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;250&lt;/td&gt;
&lt;td&gt;0.59 ms&lt;/td&gt;
&lt;td&gt;51 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;500&lt;/td&gt;
&lt;td&gt;0.51 ms&lt;/td&gt;
&lt;td&gt;843 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1000&lt;/td&gt;
&lt;td&gt;0.47 ms&lt;/td&gt;
&lt;td&gt;5,673 ms (77% throughput)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2000&lt;/td&gt;
&lt;td&gt;0.61 ms&lt;/td&gt;
&lt;td&gt;5,074 ms (41%, conns failing)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Per-frame framework routing measured ~0.20 µs in Rust vs ~106 µs in Python; RAM&lt;br&gt;
per idle session ~19.6 KB vs up to ~1 MB.&lt;/p&gt;

&lt;p&gt;The tail is the real story. Even at 10 calls, pipecat's p50/p90/p99 are all&lt;br&gt;
sub-millisecond — but its p99.9 is 102 ms and max is 163 ms. About 1 frame in&lt;br&gt;
1,000 eats a GC/GIL stall, which for real-time audio is an audible glitch.&lt;br&gt;
Multiprocess spreads that jitter across workers; it doesn't remove it, because&lt;br&gt;
it's intrinsic to each Python pipeline.&lt;/p&gt;

&lt;p&gt;WHAT THIS DOES NOT MEAN (the honest part)&lt;/p&gt;

&lt;p&gt;That 0.6 ms is runtime/framework overhead, not end-to-end conversational latency.&lt;br&gt;
What a caller hears is dominated by your STT/LLM/TTS providers (hundreds of ms) —&lt;br&gt;
Rust can't change that. The claim is narrower and more useful: the runtime itself&lt;br&gt;
never becomes the bottleneck or the source of a stall.&lt;/p&gt;

&lt;p&gt;Also: the ~525x per-frame framework-routing ratio compresses hard once real&lt;br&gt;
shared I/O (mu-law encode/decode, socket syscalls) is added, because that work is&lt;br&gt;
near-identical in both languages. The realistic end-to-end density win is a&lt;br&gt;
single-digit to low-double-digit multiple, not 525x. The latency table above is the&lt;br&gt;
measured WebSocket round trip including that I/O, not the framework floor.&lt;/p&gt;

&lt;p&gt;TRY IT&lt;/p&gt;

&lt;p&gt;The whole benchmark kit is reproducible:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker compose -f bench/compose.yml up --build   # on a 16-vCPU VM
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Default build pulls zero provider/network deps — every provider and transport is&lt;br&gt;
a dep:-gated Cargo feature. And you don't have to write Rust to use it: run&lt;br&gt;
flowcat-server from a YAML config and talk to an agent in your browser.&lt;/p&gt;

&lt;p&gt;Repo, full percentile distributions, and methodology:&lt;br&gt;
&lt;a href="https://github.com/AreevAI/flowcat" rel="noopener noreferrer"&gt;https://github.com/AreevAI/flowcat&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Apache-2.0, pre-1.0, built in the open. Feedback and provider PRs welcome.&lt;/p&gt;

&lt;p&gt;Disclosure: this writeup was drafted with LLM assistance and edited by the Flowcat&lt;br&gt;
maintainers; the benchmark numbers are from the reproducible kit in the repo.&lt;br&gt;
pipecat is an independent open-source project used here as an architecture&lt;br&gt;
reference and benchmark baseline; Flowcat is not affiliated with or endorsed by&lt;br&gt;
Daily or the pipecat project.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>performance</category>
      <category>rust</category>
    </item>
  </channel>
</rss>
