
Nelson

Posted on • Originally published at github.com

We Open-Sourced Our Production Voice AI Stack (Rust Runtime, Sub-Second Latency)

TL;DR — We open-sourced Feros, a full Voice Agent OS you can self-host in one docker compose up. It has a Rust voice engine for sub-second latency, a Python control plane, a Next.js dashboard, and an AI builder that writes your agent for you. Apache 2.0.

⭐ If this looks useful, star us on GitHub — it's how others find the project.


The voice AI tax is real — and we got tired of paying it

If you've shipped a voice agent at any non-trivial scale, you've hit the wall:

  • Managed platforms (Vapi, Retell) are magical to start with and brutal to scale. Per-minute billing that looks like pocket change at 1,000 calls becomes a six-figure line item at 100,000. And if you're in healthcare, fintech, or anything with data residency requirements? "We handle it in our cloud" isn't good enough.

  • Low-level frameworks (Pipecat, LiveKit) give you the Lego bricks but not the house. You spend three weeks plumbing VAD → STT → LLM → TTS before writing a single line of actual agent logic. Then you maintain that plumbing forever.

  • Visual node builders (older-generation platforms) make you hand-wire every branch, intent, and call flow in a drag-and-drop UI. It gets unmaintainable the moment your agent needs to do anything non-trivial.

We built Feros to collapse all three layers into one self-hostable system that doesn't make you choose between speed, cost, and control.


What Feros actually is

Feros is a Voice Agent OS — a complete, production-ready stack that handles everything from the WebRTC/telephony layer to the agent builder UI.

```
Browser / Phone
       │
  voice-server   ← Rust: telephony gateway, WebSocket router
       │
  voice-engine   ← Rust: VAD → STT → LLM → TTS orchestration
       │
  studio-api     ← Python (FastAPI): agent config, sessions, evals
       │
  studio-web     ← Next.js: dashboard, AI builder, live call monitor
```

Every component is swappable. STT vendor going down? Change one config line. Want to use a local Whisper instance to eliminate STT costs entirely? There's an optional self-hosted inference stack included.
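To give a feel for what "change one config line" means in practice, here's a hypothetical `.env` fragment — the variable names are ours for illustration, not necessarily the shipped ones:

```shell
# Hypothetical provider config (illustrative names, check .env.example for the real ones).
STT_PROVIDER=deepgram        # swap to a local Whisper backend to eliminate STT costs
TTS_PROVIDER=elevenlabs
LLM_PROVIDER=openai
LLM_MODEL=gpt-4o-mini
```

The point is that provider choice lives in configuration, not code, so a vendor outage is a redeploy rather than a rewrite.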


The voice engine is Rust — and yes, that matters

The hot path — VAD detection, streaming STT, LLM inference, TTS synthesis, audio mixing — runs entirely in a Tokio async runtime written in Rust.

Why Rust here specifically?

Latency predictability. GC pauses in the hot path are not a latency spike you can explain away. At 20ms audio frames, a 50ms GC pause is audible and destroys the "natural conversation" illusion. Rust has no garbage collector, so the hot path's timing stays deterministic.
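The back-of-envelope arithmetic behind that claim:

```python
# A 50 ms GC pause at a 20 ms frame cadence blows through multiple frame
# deadlines in a row — enough to produce an audible gap or stutter.
FRAME_MS = 20
GC_PAUSE_MS = 50

frames_missed = -(-GC_PAUSE_MS // FRAME_MS)  # ceiling division
print(frames_missed)  # → 3
```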

Memory safety without overhead. A live call session manages multiple async streams simultaneously — inbound audio chunks, STT partial results, LLM streaming tokens, TTS audio segments, WebRTC pacing. Getting these wrong means memory corruption or deadlocks. Rust's ownership model enforces correct concurrency at compile time.
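The shape of that concurrency — independent stages connected by channels, each running as its own task — can be sketched in Python with `asyncio`. This is illustrative only: the real engine is Rust/Tokio, and the stage names and transforms here are ours.

```python
import asyncio

# Toy pipeline mirroring the VAD → STT → LLM hot-path shape: each stage
# reads from an inbound queue and writes to an outbound one, so stages
# run concurrently the way Tokio tasks do in the real engine.

async def stage(inbox, outbox, transform):
    while True:
        item = await inbox.get()
        if item is None:              # sentinel: propagate shutdown downstream
            await outbox.put(None)
            return
        await outbox.put(transform(item))

async def run_pipeline(frames):
    q_vad, q_stt, q_llm, q_out = (asyncio.Queue() for _ in range(4))
    tasks = [
        asyncio.create_task(stage(q_vad, q_stt, lambda f: f)),            # VAD: pass speech frames through
        asyncio.create_task(stage(q_stt, q_llm, lambda f: f"text({f})")), # STT: frames → transcript
        asyncio.create_task(stage(q_llm, q_out, lambda t: f"reply({t})")),# LLM: transcript → response
    ]
    for f in frames:
        await q_vad.put(f)
    await q_vad.put(None)
    out = []
    while (item := await q_out.get()) is not None:
        out.append(item)
    await asyncio.gather(*tasks)
    return out

results = asyncio.run(run_pipeline(["frame1", "frame2"]))
print(results)  # → ['reply(text(frame1))', 'reply(text(frame2))']
```

In Rust the equivalent mistakes — a stage reading a buffer another stage is still writing, or two tasks deadlocking on each other's channels — are largely caught at compile time rather than on a live call.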

The number we care about: response latency — the time from when the user stops speaking to when the agent starts responding. Past a few hundred milliseconds the conversation stops feeling natural, so the Feros pipeline is optimized end to end to keep that gap under a second, and we're constantly pushing it lower.


The AI builder is the part that might surprise you

Instead of dragging nodes in a canvas, you describe your agent in plain language. The AI builder reads your intent and autonomously provisions:

  • The system prompt
  • Tool definitions (CRM lookups, calendar booking, webhook calls, etc.)
  • Routing logic between conversation states

This isn't a gimmick — it's genuinely the fastest path from "I need a voice agent that books appointments and checks account status" to a working, testable agent. You still have full access to the underlying configuration and can edit anything the AI generated.
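To make that concrete, here is a plausible shape for what the builder emits — the field names are illustrative, not the actual Feros schema:

```python
# Hypothetical agent config as the AI builder might generate it.
# Everything is plain data, so you can inspect and edit any of it afterwards.
agent_config = {
    "system_prompt": "You book appointments and check account status for callers.",
    "tools": [
        {"name": "crm_lookup", "type": "webhook", "url": "https://example.com/crm"},
        {"name": "book_appointment", "type": "calendar"},
    ],
    "states": {
        "greet":    {"next": ["identify"]},
        "identify": {"next": ["book", "account_status"]},
    },
}

print(sorted(agent_config.keys()))  # → ['states', 'system_prompt', 'tools']
```

The win is that the AI writes the first draft of all three layers — prompt, tools, routing — and you review data instead of hand-wiring nodes.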


One command to run the whole stack

```
git clone https://github.com/ferosai/feros.git
cd feros
cp .env.example .env
docker compose up -d
```

Open http://localhost:3000. That's the full stack:

| Service | URL |
| --- | --- |
| Studio Web | http://localhost:3000 |
| Studio API | http://localhost:8000 |
| Voice Server | http://localhost:8300 |

We publish pre-built multi-arch images so the default path doesn't require compiling Rust locally. If you need to build from source (e.g., you're modifying the engine):

```
docker compose -f docker-compose.yml -f docker-compose.source.yml up -d --build
```

The integrations layer: your secrets stay yours

Every third-party integration — CRMs, calendars, webhooks — goes through an encrypted credential vault. Secrets are encrypted at rest and decrypted only inside the runtime. They never hit external audit logs or managed cloud infrastructure in plaintext. This was a non-negotiable for our early enterprise users.
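A minimal sketch of the seal-at-rest / open-in-runtime pattern, assuming nothing about the real vault's implementation. The HMAC-derived XOR keystream here is a stand-in for a proper authenticated cipher (AES-GCM or similar) — do not use it for actual secrets:

```python
import base64
import hashlib
import hmac
import os

def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Derive a pseudorandom keystream from key + nonce (illustrative only).
    out = b""
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(4, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:length]

def seal(key: bytes, secret: str) -> str:
    """Encrypt a secret for storage; this blob is what lands in the database."""
    nonce = os.urandom(16)
    ct = bytes(a ^ b for a, b in zip(secret.encode(), _keystream(key, nonce, len(secret))))
    return base64.b64encode(nonce + ct).decode()

def open_sealed(key: bytes, blob: str) -> str:
    """Decrypt a stored secret — only ever called inside the runtime."""
    raw = base64.b64decode(blob)
    nonce, ct = raw[:16], raw[16:]
    return bytes(a ^ b for a, b in zip(ct, _keystream(key, nonce, len(ct)))).decode()

key = os.urandom(32)                 # in this model, the key lives only in the runtime
stored = seal(key, "crm-api-token")  # ciphertext at rest; plaintext never leaves the process
assert open_sealed(key, stored) == "crm-api-token"
```

The property that matters is the boundary: plaintext exists only inside the running process, never in the database, logs, or any managed cloud.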


What we're building next

The roadmap is public and tracked in the repo:

  • Outbound calls — agent-initiated dialing with retry and scheduling
  • Dynamic Agent Variables — resolve runtime context at session start for personalized conversations
  • Gemini Live native audio — end-to-end multimodal backend (actively in progress)
  • Direct PSTN via SIP — eliminating the Twilio/Telnyx dependency entirely
  • Agent-to-agent evaluation — a tester agent calls your target agent over live audio to evaluate regressions
  • Evaluation replay — run historical transcripts against new agent versions
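To make the Dynamic Agent Variables item concrete, here's one plausible shape — the names and lookup are guesses for illustration, not the shipped API:

```python
# Hypothetical: resolve per-caller context at session start and splice it
# into the agent's prompt template. The CRM is stubbed with a dict.
template = "You are speaking with {caller_name}, a {plan} customer since {since}."

def resolve_variables(caller_id: str) -> dict:
    crm = {"+15550100": {"caller_name": "Ada", "plan": "Pro", "since": "2021"}}
    return crm.get(caller_id, {"caller_name": "there", "plan": "Free", "since": "today"})

prompt = template.format(**resolve_variables("+15550100"))
print(prompt)  # → "You are speaking with Ada, a Pro customer since 2021."
```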

Why open source, and why Apache 2.0?

Because the voice AI infrastructure layer should not be a moat. It should be a foundation.

We've been on the receiving end of per-minute pricing that punished growth, "enterprise plans" that required a sales call before you could see a price, and APIs that broke in production with no recourse. We built the thing we wanted to exist.

Apache 2.0 means you can self-host it, build products on it, and modify it without legal friction.


Stack summary (for the skim readers)

| Layer | Technology |
| --- | --- |
| Voice Engine | Rust / Tokio |
| Voice Server | Rust |
| Control Plane | Python / FastAPI |
| Dashboard | Next.js / TypeScript |
| Database | PostgreSQL |
| Inference (optional) | Whisper + Fish TTS on GPU |
| Protocol | Protobuf over WebSocket + WebRTC |
| License | Apache 2.0 |

Give it a spin

```
git clone https://github.com/ferosai/feros.git
```

We're actively building in public. If you run into anything, open an issue. If you have a provider or integration you need, open a discussion before implementing — we want to make sure the architecture stays coherent as the project grows.

If this is interesting to you: ⭐ on GitHub helps more people find it. That's the whole ask.

github.com/ferosai/feros


What voice AI problems are you dealing with right now? Cloud costs? Latency? Data residency? Drop it in the comments — we're actively informing the roadmap from real use cases.
