I built an open-source SIP Phone with real-time AI Text-to-Speech to help a friend with speech difficulties 🎙️

#opensource #cpp #a11y #ai

Hey everyone,

I wanted to share a weekend project I've been working on called Dyna SIP Phone. I originally built this for a dear friend of mine who has speech difficulties, to help him communicate over standard phone calls without barriers. It turned out to be quite useful for him, so I decided to open-source it!

What is it?
It's a minimal SIP client written in C++ that integrates local, high-quality neural Text-to-Speech (TTS) using ONNX Runtime. The user types text, which is synthesized and streamed directly into the VoIP call in real time.
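To give a rough idea of how "type, synthesize, stream" can fit together, here is a minimal sketch of a buffer that sits between a TTS thread and the call's audio thread. It is not the code from the repo: the `SpeechBuffer` class, its method names, and the 320-sample frame size in the usage note are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>
#include <vector>

// Hypothetical hand-off buffer: the TTS thread pushes synthesized PCM,
// the audio thread pulls fixed-size frames to send into the call.
class SpeechBuffer {
public:
    // Called by the TTS thread whenever a chunk of 16-bit PCM is ready.
    void push(const std::vector<std::int16_t>& pcm) {
        std::lock_guard<std::mutex> lock(mutex_);
        samples_.insert(samples_.end(), pcm.begin(), pcm.end());
    }

    // Called by the audio thread once per frame. Copies up to frame_size
    // samples into `frame`, pads the rest with silence, and returns how
    // many samples were real speech.
    std::size_t pop_frame(std::int16_t* frame, std::size_t frame_size) {
        assert(frame != nullptr);
        std::lock_guard<std::mutex> lock(mutex_);
        const std::size_t n = std::min(frame_size, samples_.size());
        const auto last = samples_.begin() + static_cast<std::ptrdiff_t>(n);
        std::copy(samples_.begin(), last, frame);
        samples_.erase(samples_.begin(), last);
        std::fill(frame + n, frame + frame_size, std::int16_t{0});
        return n;
    }

private:
    std::mutex mutex_;
    std::deque<std::int16_t> samples_;
};
```

With a 16 kHz stream and 20 ms frames, the audio side would call `pop_frame(frame, 320)` on every tick and always get a full frame back, so the call carries silence whenever nothing has been typed. A mutex keeps the sketch short; a lock-free ring buffer is usually a better fit for a real-time audio callback.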

Under the hood:

VoIP Engine: PJSIP.

TTS Engines: Piper and Kokoro (both run fully locally; I rewrote the inference in C++ on top of ONNX Runtime to keep latency low).

Audio DSP: I added a custom post-processing chain (EQ, Pitch Shifter, Compressor, Telephone Filter, etc.) to customize the synthesized voice and make it sound natural over the phone line.

UI: Built using HTML/CSS/JS rendered through GTK WebView.
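For anyone curious what a "Telephone Filter" stage in a chain like this can look like, here is a minimal sketch: a 300 Hz high-pass followed by a 3400 Hz low-pass, which roughly matches the classic narrowband telephone voice band. This is not the implementation from the repo. The class names, the cutoffs, and the `filtered_rms` helper are assumptions, and the coefficients come from the well-known "Audio EQ Cookbook" biquad formulas.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One second-order IIR section (Direct Form I), coefficients pre-divided by a0.
class Biquad {
public:
    Biquad(float b0, float b1, float b2, float a1, float a2)
        : b0_(b0), b1_(b1), b2_(b2), a1_(a1), a2_(a2) {}

    float process(float x) {
        const float y = b0_ * x + b1_ * x1_ + b2_ * x2_ - a1_ * y1_ - a2_ * y2_;
        x2_ = x1_; x1_ = x;
        y2_ = y1_; y1_ = y;
        return y;
    }

private:
    float b0_, b1_, b2_, a1_, a2_;
    float x1_ = 0.0f, x2_ = 0.0f, y1_ = 0.0f, y2_ = 0.0f;
};

enum class FilterKind { LowPass, HighPass };

// Audio EQ Cookbook low-pass / high-pass; q = 0.7071 gives a Butterworth response.
inline Biquad make_biquad(FilterKind kind, double cutoff_hz, double sample_rate_hz,
                          double q = 0.7071) {
    assert(cutoff_hz > 0.0 && cutoff_hz < sample_rate_hz / 2.0);
    const double pi = 3.14159265358979323846;
    const double w0 = 2.0 * pi * cutoff_hz / sample_rate_hz;
    const double cw = std::cos(w0);
    const double alpha = std::sin(w0) / (2.0 * q);
    const double a0 = 1.0 + alpha;
    const double edge = (kind == FilterKind::LowPass) ? (1.0 - cw) : (1.0 + cw);
    const double b0 = edge / 2.0;
    const double b1 = (kind == FilterKind::LowPass) ? edge : -edge;
    return Biquad(float(b0 / a0), float(b1 / a0), float(b0 / a0),
                  float(-2.0 * cw / a0), float((1.0 - alpha) / a0));
}

// Hypothetical chain stage: band-limits audio to roughly 300-3400 Hz.
class TelephoneFilter {
public:
    explicit TelephoneFilter(double sample_rate_hz)
        : high_pass_(make_biquad(FilterKind::HighPass, 300.0, sample_rate_hz)),
          low_pass_(make_biquad(FilterKind::LowPass, 3400.0, sample_rate_hz)) {}

    // Filters the buffer in place, sample by sample.
    void process(std::vector<float>& samples) {
        for (float& s : samples) s = low_pass_.process(high_pass_.process(s));
    }

private:
    Biquad high_pass_;
    Biquad low_pass_;
};

// Helper for experimenting: RMS of a unit-amplitude sine after filtering,
// measured over the second half of one second to skip the start-up transient.
inline double filtered_rms(double tone_hz, double sample_rate_hz) {
    const double pi = 3.14159265358979323846;
    const std::size_t n = static_cast<std::size_t>(sample_rate_hz);
    std::vector<float> buf(n);
    for (std::size_t i = 0; i < n; ++i)
        buf[i] = float(std::sin(2.0 * pi * tone_hz * double(i) / sample_rate_hz));
    TelephoneFilter filter(sample_rate_hz);
    filter.process(buf);
    double sum = 0.0;
    for (std::size_t i = n / 2; i < n; ++i) sum += double(buf[i]) * double(buf[i]);
    return std::sqrt(sum / double(n - n / 2));
}
```

Calling `filtered_rms(1000.0, 16000.0)` should return a value close to the unfiltered RMS of a sine (about 0.707), while a 50 Hz or 7000 Hz tone comes out strongly attenuated. Other stages such as an EQ or compressor could expose the same in-place `process` method, so a chain becomes a simple loop over stages.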

🎁 Bonus for developers:
If you don't care about SIP but want local TTS in your apps, the classes in piper.h and kokoro.h are designed to be completely decoupled from the SIP code, so you can extract them and drop them into your own C++ projects. Also, if anyone is looking for Italian voices, I've published some custom multi-speaker Piper fine-tunes on my website (linked in the repo).

Disclaimer: Please keep in mind that this is still very much in active development. It started as a quick weekend hack, so it's a very minimal implementation. But it does the job, and seeing it actually help my friend was a huge win.

If you are interested in VoIP, assistive tech, or C++ audio processing, I'd love for you to check out the repo! Any feedback or contribution is welcome.

GitHub Repo: dyna_sip_phone

DEV Community