Build and run real-time media pipelines: speech-to-text, voice agents, live audio processing

John — Mon, 16 Feb 2026 21:21:32 +0000

With StreamKit.dev you can build and run real-time media pipelines on your own infrastructure: speech-to-text, voice agents, and live audio processing that are composable, observable, and self-hosted. It is fully open source. Full reference documentation and a description are at https://streamkit.dev/

Who is this for?
StreamKit is built for developers who need to process real-time media — whether you’re building voice features for an app, prototyping an AI audio pipeline, or self-hosting alternatives to cloud speech APIs.

What you can build
Live transcription — Ingest audio via MoQ, run Whisper or SenseVoice STT, stream transcription updates to clients
Voice agents — TTS-powered bots using Kokoro, Piper, or Matcha that respond to audio input
Real-time translation — Bilingual streams with live subtitles using NLLB or Helsinki models
Audio processing — Mixing, gain control, format conversion, encoding/decoding pipelines
Content analysis — VAD for speech detection, keyword spotting, or custom safety filters
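
To make a few of the terms above concrete, here is a minimal Python sketch of gain control, mixing, and a toy energy-based VAD on raw float samples. It is a generic illustration, not StreamKit's API: the function names (`apply_gain`, `mix`, `is_speech`) and the `0.02` RMS threshold are assumptions chosen for this example, and a real pipeline would likely use a trained VAD model rather than an energy threshold.

```python
import math

def apply_gain(samples, gain_db):
    """Scale float samples in [-1.0, 1.0] by a gain in decibels, clipping the result."""
    factor = 10 ** (gain_db / 20)  # convert dB to a linear amplitude factor
    return [max(-1.0, min(1.0, s * factor)) for s in samples]

def mix(a, b):
    """Mix two mono tracks (same sample rate) by summing samples, then clipping."""
    n = max(len(a), len(b))
    a = a + [0.0] * (n - len(a))  # pad the shorter track with silence
    b = b + [0.0] * (n - len(b))
    return [max(-1.0, min(1.0, x + y)) for x, y in zip(a, b)]

def is_speech(frame, threshold=0.02):
    """Toy VAD: treat a frame as speech if its RMS energy exceeds the threshold.

    The threshold is a hypothetical value for illustration only.
    """
    if not frame:
        return False
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    return rms > threshold
```

In a streaming setting, each of these would run per audio frame (for example, 20 ms of samples at a time), which is the kind of step a pipeline node typically wraps.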

You can try the engine at https://demo.streamkit.dev/

DEV Community: John
