<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Nelson</title>
    <description>The latest articles on DEV Community by Nelson (@loopbreaker111).</description>
    <link>https://dev.to/loopbreaker111</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3881378%2F7f371656-16d1-425b-b787-f099e0e047df.jpeg</url>
      <title>DEV Community: Nelson</title>
      <link>https://dev.to/loopbreaker111</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/loopbreaker111"/>
    <language>en</language>
    <item>
      <title>We Open-Sourced Our Production Voice AI Stack (Rust Runtime, Sub-Second Latency)</title>
      <dc:creator>Nelson</dc:creator>
      <pubDate>Thu, 16 Apr 2026 00:32:02 +0000</pubDate>
      <link>https://dev.to/loopbreaker111/we-open-sourced-our-production-voice-ai-stack-rust-runtime-sub-second-latency-3gb9</link>
      <guid>https://dev.to/loopbreaker111/we-open-sourced-our-production-voice-ai-stack-rust-runtime-sub-second-latency-3gb9</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt; — We open-sourced &lt;a href="https://github.com/ferosai/feros" rel="noopener noreferrer"&gt;Feros&lt;/a&gt;, a full Voice Agent OS you can self-host in one &lt;code&gt;docker compose up&lt;/code&gt;. It has a Rust voice engine for sub-second latency, a Python control plane, a Next.js dashboard, and an AI builder that writes your agent for you. Apache 2.0.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;⭐ If this looks useful, &lt;a href="https://github.com/ferosai/feros" rel="noopener noreferrer"&gt;star us on GitHub&lt;/a&gt; — it's how others find the project.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The voice AI tax is real — and we got tired of paying it
&lt;/h2&gt;

&lt;p&gt;If you've shipped a voice agent at any non-trivial scale, you've hit the wall:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Managed platforms&lt;/strong&gt; (Vapi, Retell) are magical to start with and brutal to scale. Per-minute billing that looks like pocket change at 1,000 calls becomes a six-figure line item at 100,000. And if you're in healthcare, fintech, or anything with data residency requirements? "We handle it in our cloud" isn't good enough.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Low-level frameworks&lt;/strong&gt; (Pipecat, LiveKit) give you the Lego bricks but not the house. You spend three weeks plumbing VAD → STT → LLM → TTS before writing a single line of actual agent logic. Then you maintain that plumbing forever.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Visual node builders&lt;/strong&gt; (older-generation platforms) make you hand-wire every branch, intent, and call flow in a drag-and-drop UI. It gets unmaintainable the moment your agent needs to do anything non-trivial.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We built Feros to collapse all three layers into one self-hostable system that doesn't make you choose between speed, cost, and control.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Feros actually is
&lt;/h2&gt;

&lt;p&gt;Feros is a &lt;strong&gt;Voice Agent OS&lt;/strong&gt; — a complete, production-ready stack that handles everything from the WebRTC/telephony layer to the agent builder UI.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Browser / Phone
       │
  voice-server   ← Rust: telephony gateway, WebSocket router
       │
  voice-engine   ← Rust: VAD → STT → LLM → TTS orchestration
       │
  studio-api     ← Python (FastAPI): agent config, sessions, evals
       │
  studio-web     ← Next.js: dashboard, AI builder, live call monitor
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every component is &lt;strong&gt;swappable&lt;/strong&gt;. STT vendor going down? Change one config line. Want to use a local Whisper instance to eliminate STT costs entirely? There's an optional self-hosted inference stack included.&lt;/p&gt;
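
&lt;p&gt;As a purely illustrative sketch of what "one config line" means here (these variable names are hypothetical; the real keys live in the repo's &lt;code&gt;.env.example&lt;/code&gt;):&lt;/p&gt;

```shell
# Hypothetical .env fragment. Check .env.example in the repo for the
# actual variable names; the point is that the provider is a config
# value, not code.
STT_PROVIDER=deepgram
# Swap to the bundled self-hosted stack to eliminate STT costs:
# STT_PROVIDER=whisper-local
# STT_WHISPER_URL=http://localhost:9000
TTS_PROVIDER=elevenlabs
LLM_PROVIDER=openai
```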




&lt;h2&gt;
  
  
  The voice engine is Rust — and yes, that matters
&lt;/h2&gt;

&lt;p&gt;The hot path — voice activity detection (VAD), streaming STT, LLM inference, TTS synthesis, audio mixing — runs entirely in a Tokio async runtime written in Rust.&lt;/p&gt;

&lt;p&gt;Why Rust here specifically?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Latency predictability.&lt;/strong&gt; GC pauses in the hot path are not a latency spike you can explain away. At 20ms audio frames, a 50ms GC pause is audible and destroys the "natural conversation" illusion. Rust gives you deterministic performance with no garbage collector to pause the pipeline at the worst moment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Memory safety without overhead.&lt;/strong&gt; A live call session manages multiple async streams simultaneously — inbound audio chunks, STT partial results, LLM streaming tokens, TTS audio segments, WebRTC pacing. Getting these wrong means memory corruption or deadlocks. Rust's ownership model enforces correct concurrency at compile time.&lt;/p&gt;
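
&lt;p&gt;A toy analogue in Python's &lt;code&gt;asyncio&lt;/code&gt; (fabricated stand-in streams, not Feros code) of the per-call coordination problem, with several producers feeding one consumer:&lt;/p&gt;

```python
import asyncio

# Toy stand-in for the per-call coordination problem: several async
# streams (here fabricated STT partials and LLM tokens) feed one
# consumer. The Rust engine does the equivalent with Tokio tasks.

async def stt_partials():
    for text in ["book", "book an", "book an appointment"]:
        await asyncio.sleep(0.01)
        yield ("stt", text)

async def llm_tokens():
    for tok in ["Sure,", " when", " works?"]:
        await asyncio.sleep(0.015)
        yield ("llm", tok)

async def pump(stream, queue):
    # Forward every event from one stream into the shared queue.
    async for event in stream:
        await queue.put(event)

async def session():
    queue = asyncio.Queue()
    producers = [
        asyncio.create_task(pump(stt_partials(), queue)),
        asyncio.create_task(pump(llm_tokens(), queue)),
    ]
    events = []
    while True:
        if all(t.done() for t in producers) and queue.empty():
            break
        try:
            events.append(await asyncio.wait_for(queue.get(), timeout=0.2))
        except asyncio.TimeoutError:
            break
    return events

events = asyncio.run(session())
print(len(events), "events interleaved")
```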

&lt;p&gt;&lt;strong&gt;The number we care about:&lt;/strong&gt; voice-to-voice latency, the time from when the user stops speaking to when the agent starts responding. Once it creeps past roughly a second, the conversation stops feeling natural. The Feros pipeline is optimized end to end to keep that number sub-second, and we are constantly pushing it lower.&lt;/p&gt;
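
&lt;p&gt;To make the budget concrete, here is an illustrative decomposition of a sub-second turn; the per-stage figures are assumptions for the sake of the arithmetic, not measured Feros numbers:&lt;/p&gt;

```python
# Illustrative latency budget for one agent turn. These per-stage
# numbers are assumptions for illustration, not measured Feros figures.
BUDGET_MS = 1000  # the "sub-second" voice-to-voice target

stages_ms = {
    "vad_end_of_turn": 200,       # silence window before committing to end of turn
    "stt_finalize": 150,          # streaming STT emits the final transcript
    "llm_first_token": 300,       # time to first token from the LLM
    "tts_first_audio": 150,       # time to first synthesized audio chunk
    "transport_and_mixing": 100,  # network, jitter buffer, mixing
}

total = sum(stages_ms.values())
print(f"voice-to-voice: {total} ms of a {BUDGET_MS} ms budget")
assert BUDGET_MS >= total
```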




&lt;h2&gt;
  
  
  The AI builder is the part that might surprise you
&lt;/h2&gt;

&lt;p&gt;Instead of dragging nodes in a canvas, you describe your agent in plain language. The AI builder reads your intent and autonomously provisions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The system prompt&lt;/li&gt;
&lt;li&gt;Tool definitions (CRM lookups, calendar booking, webhook calls, etc.)&lt;/li&gt;
&lt;li&gt;Routing logic between conversation states&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This isn't a gimmick — it's genuinely the fastest path from "I need a voice agent that books appointments and checks account status" to a working, testable agent. You still have full access to the underlying configuration and can edit anything the AI generated.&lt;/p&gt;
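
&lt;p&gt;For illustration only (every field name below is invented, not Feros's actual schema), the builder's output is the kind of structured config you could also write or edit by hand:&lt;/p&gt;

```python
# Hypothetical agent config. All field names here are invented for
# illustration; the real schema lives in the Feros repo.
agent = {
    "system_prompt": "You book appointments and check account status.",
    "tools": [
        {"name": "check_account", "kind": "webhook",
         "url": "https://crm.example.com/lookup"},
        {"name": "book_slot", "kind": "calendar",
         "params": ["date", "time", "customer_id"]},
    ],
    "states": {
        "greet": {"next": ["identify"]},
        "identify": {"tool": "check_account", "next": ["book", "status"]},
        "book": {"tool": "book_slot", "next": ["done"]},
    },
}

# Sanity-check the routing: every referenced state is either declared
# above or a terminal leaf.
declared = set(agent["states"])
for state, spec in agent["states"].items():
    for nxt in spec.get("next", []):
        kind = "declared" if nxt in declared else "leaf"
        print(f"{state} to {nxt} ({kind})")
```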




&lt;h2&gt;
  
  
  One command to run the whole stack
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/ferosai/feros.git
&lt;span class="nb"&gt;cd &lt;/span&gt;feros
&lt;span class="nb"&gt;cp&lt;/span&gt; .env.example .env
docker compose up &lt;span class="nt"&gt;-d&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Open &lt;code&gt;http://localhost:3000&lt;/code&gt;. That's the full stack:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Service&lt;/th&gt;
&lt;th&gt;URL&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Studio Web&lt;/td&gt;
&lt;td&gt;&lt;code&gt;http://localhost:3000&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Studio API&lt;/td&gt;
&lt;td&gt;&lt;code&gt;http://localhost:8000&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Voice Server&lt;/td&gt;
&lt;td&gt;&lt;code&gt;http://localhost:8300&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;We publish pre-built multi-arch images so the default path doesn't require compiling Rust locally. If you need to build from source (e.g., you're modifying the engine):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker compose &lt;span class="nt"&gt;-f&lt;/span&gt; docker-compose.yml &lt;span class="nt"&gt;-f&lt;/span&gt; docker-compose.source.yml up &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nt"&gt;--build&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  The integrations layer: your secrets stay yours
&lt;/h2&gt;

&lt;p&gt;Every third-party integration — CRMs, calendars, webhooks — goes through an encrypted credential vault. Secrets are encrypted at rest and decrypted only inside the runtime. They never hit external audit logs or managed cloud infrastructure in plaintext. This was a non-negotiable for our early enterprise users.&lt;/p&gt;
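
&lt;p&gt;A toy sketch of the at-rest pattern (emphatically not Feros's implementation, and not production crypto; a real vault uses a vetted AEAD cipher such as AES-GCM via an audited library):&lt;/p&gt;

```python
import hashlib
import hmac
import secrets

# Toy illustration of "encrypted at rest, decrypted only in the runtime".
# NOT Feros's implementation and NOT production crypto.

RUNTIME_KEY = secrets.token_bytes(32)  # lives only in the runtime process

def _keystream(key, nonce, length):
    # SHA-256 in counter mode as a stand-in stream cipher.
    out = bytearray()
    counter = 0
    while length > len(out):
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def encrypt(plaintext):
    nonce = secrets.token_bytes(16)
    stream = _keystream(RUNTIME_KEY, nonce, len(plaintext))
    cipher = bytes(a ^ b for a, b in zip(plaintext, stream))
    tag = hmac.new(RUNTIME_KEY, nonce + cipher, hashlib.sha256).digest()
    return nonce + tag + cipher  # this blob is what hits the database

def decrypt(blob):
    nonce, tag, cipher = blob[:16], blob[16:48], blob[48:]
    expected = hmac.new(RUNTIME_KEY, nonce + cipher, hashlib.sha256).digest()
    assert hmac.compare_digest(tag, expected), "record was tampered with"
    stream = _keystream(RUNTIME_KEY, nonce, len(cipher))
    return bytes(a ^ b for a, b in zip(cipher, stream))

stored = encrypt(b"crm_api_key=sk-test-123")
assert b"crm_api_key" not in stored  # ciphertext at rest, not plaintext
assert decrypt(stored) == b"crm_api_key=sk-test-123"
print("round trip ok")
```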




&lt;h2&gt;
  
  
  What we're building next
&lt;/h2&gt;

&lt;p&gt;The roadmap is public and tracked in the repo:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Outbound calls&lt;/strong&gt; — agent-initiated dialing with retry and scheduling&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dynamic Agent Variables&lt;/strong&gt; — resolve runtime context at session start for personalized conversations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gemini Live native audio&lt;/strong&gt; — end-to-end multimodal backend (actively in progress)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Direct PSTN via SIP&lt;/strong&gt; — eliminating the Twilio/Telnyx dependency entirely&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent-to-agent evaluation&lt;/strong&gt; — a tester agent calls your target agent over live audio to evaluate regressions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Evaluation replay&lt;/strong&gt; — run historical transcripts against new agent versions&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Why open source, and why Apache 2.0?
&lt;/h2&gt;

&lt;p&gt;Because the voice AI infrastructure layer should not be a moat. It should be a foundation.&lt;/p&gt;

&lt;p&gt;We've been on the receiving end of per-minute pricing that punished growth, "enterprise plans" that required a sales call before you could see a price, and APIs that broke in production with no recourse. We built the thing we wanted to exist.&lt;/p&gt;

&lt;p&gt;Apache 2.0 means you can self-host it, build products on it, and modify it without legal friction.&lt;/p&gt;




&lt;h2&gt;
  
  
  Stack summary (for the skim readers)
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Technology&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Voice Engine&lt;/td&gt;
&lt;td&gt;Rust / Tokio&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Voice Server&lt;/td&gt;
&lt;td&gt;Rust&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Control Plane&lt;/td&gt;
&lt;td&gt;Python / FastAPI&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dashboard&lt;/td&gt;
&lt;td&gt;Next.js / TypeScript&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Database&lt;/td&gt;
&lt;td&gt;PostgreSQL&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Inference (optional)&lt;/td&gt;
&lt;td&gt;Whisper + Fish TTS on GPU&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Protocol&lt;/td&gt;
&lt;td&gt;Protobuf over WebSocket + WebRTC&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;License&lt;/td&gt;
&lt;td&gt;Apache 2.0&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Give it a spin
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/ferosai/feros.git
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We're actively building in public. If you run into anything, open an issue. If you have a provider or integration you need, open a discussion before implementing — we want to make sure the architecture stays coherent as the project grows.&lt;/p&gt;

&lt;p&gt;If this is interesting to you, a ⭐ on GitHub helps more people find it. That's the whole ask.&lt;/p&gt;

&lt;p&gt;→ &lt;strong&gt;&lt;a href="https://github.com/ferosai/feros" rel="noopener noreferrer"&gt;github.com/ferosai/feros&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;What voice AI problems are you dealing with right now? Cloud costs? Latency? Data residency? Drop it in the comments — we're actively shaping the roadmap around real use cases.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>ai</category>
      <category>voiceai</category>
      <category>agents</category>
    </item>
  </channel>
</rss>
