SFU Architecture: How We Scale to 200 Participants per Server

#webrtc #architecture #rust #scalability

Originally published at v100.ai

At V100, we build AI video infrastructure entirely in Rust. 20 microservices. 0.01ms server processing. 220,000+ requests per second. Post-quantum encryption on every call.

This post is a deep dive into our architecture and the engineering decisions behind it.

Why Rust for Video Infrastructure?

Video infrastructure has unique constraints that make Rust the ideal choice:

Zero GC pauses — Real-time video processing can't tolerate garbage collection stops
Memory safety — Buffer overflows in media pipelines are a security nightmare
Performance — Our gateway processes requests in 10 microseconds, not 10 milliseconds
Small binaries — Our meeting signaling server is a 2MB binary serving WebRTC at scale

Our Stack

Component	Technology
Web Framework	Axum + Tokio
Database	PostgreSQL + TimescaleDB
Cache	Redis + Cachee (in-process)
Media	FFmpeg (sidecar)
Crypto	ML-KEM-768 + ML-DSA-65 (post-quantum)
Infra	Docker + AWS ECS Fargate

The Numbers

20 Rust microservices, including gateway, AI, transcription, video, conferencing, billing, and broadcasting
0.01 ms (10 µs) server processing latency
220K+ RPS sustained throughput
263ns pipeline latency
938 tests with zero failures
40+ languages for real-time transcription
7 platforms for social publishing (YouTube, TikTok, Instagram, LinkedIn, X, Facebook, Vimeo)
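
As a rough sanity check on how the latency and throughput figures relate, Little's law (L = λW) gives the average number of requests being processed at once. This assumes that both figures describe the same service and that 0.01 ms is the mean per-request processing time, neither of which is stated above.

```latex
L = \lambda W = 220{,}000\ \text{req/s} \times 10 \times 10^{-6}\ \text{s} = 2.2
```

Under those assumptions, only about two requests are in service at any instant at the quoted load.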

Key Services

Gateway (Axum) — JWT auth, rate limiting, CSRF/SSRF protection
AI Orchestration — Claude/Gemini proxy, streaming, compliance
Transcription — Deepgram + Whisper, word-level timestamps, 40+ languages
Meeting Signaling — WebRTC SDP/ICE, DashMap concurrency, 2MB binary
v100-turn — Full broadcast platform: ABR, DRM, DVR, spatial audio, deepfake detection
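
To make the Meeting Signaling entry more concrete, here is a minimal sketch of a concurrent room registry. It is not V100's code: the service is described as using DashMap, while this sketch uses only the standard library and imitates the same idea (a map split into independently locked shards) so that it compiles without external crates. `RoomRegistry`, `JoinError`, `SHARD_COUNT`, and the per-room cap of 200 (borrowed from the title, which states the figure per server) are all assumptions made for illustration.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::{HashMap, HashSet};
use std::hash::{Hash, Hasher};
use std::sync::RwLock;

/// Hypothetical cap, borrowed from the post's title for illustration only.
const MAX_PARTICIPANTS_PER_ROOM: usize = 200;
/// Hypothetical shard count; more shards means less lock contention.
const SHARD_COUNT: usize = 16;

/// One shard: room id -> set of participant ids, behind its own lock.
type Shard = RwLock<HashMap<String, HashSet<String>>>;

#[derive(Debug, PartialEq)]
pub enum JoinError {
    RoomFull,
    AlreadyJoined,
}

/// Std-only stand-in for a DashMap-style sharded concurrent map.
pub struct RoomRegistry {
    shards: Vec<Shard>,
}

impl RoomRegistry {
    pub fn new() -> Self {
        let shards = (0..SHARD_COUNT)
            .map(|_| RwLock::new(HashMap::new()))
            .collect();
        Self { shards }
    }

    /// Pick the shard that owns a room by hashing its id.
    fn shard_for(&self, room_id: &str) -> &Shard {
        let mut hasher = DefaultHasher::new();
        room_id.hash(&mut hasher);
        &self.shards[(hasher.finish() as usize) % SHARD_COUNT]
    }

    /// Add a participant and return the new room size.
    pub fn join(&self, room_id: &str, participant_id: &str) -> Result<usize, JoinError> {
        // Only the owning shard is locked; other shards stay available.
        let mut rooms = self.shard_for(room_id).write().unwrap();
        let members = rooms.entry(room_id.to_string()).or_default();
        if members.contains(participant_id) {
            return Err(JoinError::AlreadyJoined);
        }
        if members.len() >= MAX_PARTICIPANTS_PER_ROOM {
            return Err(JoinError::RoomFull);
        }
        members.insert(participant_id.to_string());
        Ok(members.len())
    }

    /// Remove a participant; drop the room once it is empty.
    /// Returns true if the participant was present.
    pub fn leave(&self, room_id: &str, participant_id: &str) -> bool {
        let mut rooms = self.shard_for(room_id).write().unwrap();
        let Some(members) = rooms.get_mut(room_id) else {
            return false;
        };
        let removed = members.remove(participant_id);
        if members.is_empty() {
            rooms.remove(room_id);
        }
        removed
    }

    /// Current number of participants in a room (0 if the room does not exist).
    pub fn participant_count(&self, room_id: &str) -> usize {
        let rooms = self.shard_for(room_id).read().unwrap();
        rooms.get(room_id).map_or(0, |members| members.len())
    }
}
```

A caller would share one registry across connection handlers (for example behind an `Arc`), calling `join` when a participant's signaling session is accepted and `leave` when it closes. Joins to rooms that hash to different shards do not contend for the same lock.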