shyn

Posted on Mar 29

I Built a Private Voice Chat App Because I Was Done Giving Discord My Conversations

#opensource #selfhosted #webrtc #privacy

Squawk is a self-hosted, open-source voice and text chat app for gaming groups. Here's why I built it and what I learned along the way.

I play games with a small group of friends. We've been using Discord for years. It works fine — but at some point I started thinking about what "works fine" actually means.

Every conversation we have passes through Discord's servers. Every voice session, every message, every "we're planning this at this time" — all of it sitting in a database I have no visibility into. For most people that's a totally acceptable trade-off. For me it started to feel like an unnecessary one.

So I built something.

What is Squawk?

Squawk is a self-hosted voice and text chat app built for small private groups. You run it on your own machine. Your friends connect through Tailscale, a zero-config private VPN, so nobody outside your tailnet can reach the server. There are no Squawk accounts to create, and no data leaves your network.

The core features:

🎤 Peer-to-peer voice chat via WebRTC (audio goes directly between peers, not through the Squawk server)
💬 Real-time text chat with typing indicators
👑 Channel ownership and permissions (kick, rename, delete)
🔔 Sound notifications for joins, leaves, messages
📱 Mobile-friendly with a voice/chat tab switcher
🐳 Docker support — two commands to get running

You can find it here: github.com/shynsec/squawk

Why build it instead of just using something else?

Mumble exists. TeamSpeak exists. I know.

But I wanted something that felt modern — a clean UI, text chat alongside voice, mobile support — without the overhead of running a full Mumble server and without sending my friends through a sign-up flow.

I also wanted to actually build something. There's a difference between knowing how WebRTC works and having shipped something that uses it. This project closed that gap for me.

What I learned

WebRTC is surprisingly approachable, until it isn't

The basics of WebRTC — creating a peer connection, exchanging offers and answers, adding ICE candidates — clicked pretty quickly. The browser APIs are well documented and the happy path is straightforward.

Where it gets interesting is the signaling layer. WebRTC handles the audio transport, but it doesn't handle how peers find each other. You need a signaling server to broker that initial handshake. I used Socket.io for this, which kept the implementation clean but also raised a security question: what stops a client from relaying WebRTC signals to peers in a completely different room?

The answer is nothing, by default. You have to explicitly enforce it. I ended up checking that both the sender and recipient are in the same room before relaying any offer, answer, or ICE candidate. Simple fix, but easy to miss.
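
As a rough illustration of that check, here is a minimal TypeScript sketch. It is not Squawk's actual code: the `RoomBySocket` map, the `SignalMessage` shape, and the `send` callback are hypothetical stand-ins for whatever the real server tracks, and the Socket.io wiring is left out so the logic stands alone.

```typescript
// Hypothetical in-memory state: socket ID -> name of the room it has joined.
type RoomBySocket = Map<string, string>;

// A signaling message as a client might send it (this shape is an assumption).
interface SignalMessage {
  to: string; // recipient socket ID
  kind: "offer" | "answer" | "ice-candidate";
  payload: unknown; // SDP or ICE data, opaque to the server
}

// Returns true only when sender and recipient are both in a room,
// and it is the same room.
function canRelay(rooms: RoomBySocket, senderId: string, recipientId: string): boolean {
  const senderRoom = rooms.get(senderId);
  const recipientRoom = rooms.get(recipientId);
  return senderRoom !== undefined && senderRoom === recipientRoom;
}

// Relays a signal through `send` if the room check passes; otherwise drops it.
// The return value reports whether the message was forwarded.
function relaySignal(
  rooms: RoomBySocket,
  senderId: string,
  msg: SignalMessage,
  send: (recipientId: string, from: string, msg: SignalMessage) => void
): boolean {
  if (!canRelay(rooms, senderId, msg.to)) return false;
  send(msg.to, senderId, msg);
  return true;
}
```

In a Socket.io handler, `send` would presumably wrap something like `io.to(recipientId).emit(...)`, with the sender ID taken from the connection itself rather than from anything the client supplies.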

Security on a "private" app still matters

It's tempting to think that because Squawk runs behind a private VPN, you don't need to worry much about security. That thinking is wrong for a few reasons.

First, anyone on your Tailscale tailnet can connect — and you might invite people you don't fully trust. Second, if you ever open the app to a wider audience, you want the security foundations already in place. Third, good habits are good habits regardless of the threat model.

Things I ended up implementing that I hadn't originally planned:

Rate limiting per socket connection (30 events per 5 seconds)
Input sanitisation on all usernames, channel names, and messages
Prototype pollution protection using Object.create(null) for state objects
Strict CORS limited to localhost and Tailscale IP ranges
Security headers (CSP, X-Frame-Options, Referrer-Policy)
Non-root Docker user

None of these are complicated individually. But thinking through all the surfaces that need covering took longer than I expected.
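
To make two of those items concrete, here is a small TypeScript sketch of a sliding-window limiter using the 30-events-per-5-seconds figure, with its per-socket state kept in an `Object.create(null)` table. The names (`allowEvent`, `forgetSocket`, `recentEvents`) are made up for illustration, and the sliding-window approach is an assumption; Squawk may count events differently.

```typescript
// Limits taken from the figures quoted in the list above.
const MAX_EVENTS = 30;
const WINDOW_MS = 5000;

// Null-prototype object: keys such as "__proto__" or "constructor" become
// plain own entries here instead of touching inherited properties.
const recentEvents: Record<string, number[]> = Object.create(null);

// Records an event for `socketId` at time `now` (milliseconds) and returns
// whether it is within the limit. Rejected events are not recorded.
function allowEvent(socketId: string, now: number): boolean {
  const cutoff = now - WINDOW_MS;
  // Keep only timestamps that still fall inside the window.
  const kept = (recentEvents[socketId] ?? []).filter((t) => t > cutoff);
  const allowed = kept.length < MAX_EVENTS;
  if (allowed) kept.push(now);
  recentEvents[socketId] = kept;
  return allowed;
}

// Call on disconnect so the table does not grow without bound.
function forgetSocket(socketId: string): void {
  delete recentEvents[socketId];
}
```

A server would call `allowEvent(socket.id, Date.now())` at the top of each event handler and ignore the event when it returns `false`. Passing the clock in as a parameter keeps the function easy to test.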

Ownership is harder than it sounds

The permissions system in Squawk is simple: whoever creates a channel owns it. They can kick users, rename the channel, delete it. But even that simple model has edge cases.

What happens when the owner leaves? Ownership has to transfer to someone. What if the owner renames themselves mid-session? The ownership record tied to their old name breaks. What if two people have the same display name — does one accidentally inherit the other's permissions?

These are the kinds of details that only surface when you actually try to use the thing with real people. Designing permissions on paper is one thing. Watching them break in practice teaches you something different.
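
One common way around the rename and duplicate-name problems is to tie ownership to a stable per-connection ID instead of a display name. The TypeScript sketch below assumes that approach, plus a simple "longest-present member inherits" rule for transfers. The types and function names are hypothetical, and Squawk's actual fix may differ.

```typescript
interface Member {
  id: string; // stable per-connection ID (hypothetical, e.g. a socket ID)
  displayName: string; // free to change or to collide with other names
}

interface Channel {
  name: string;
  ownerId: string | null; // ownership references the ID, never the name
  members: Member[]; // kept in join order
}

function isOwner(channel: Channel, memberId: string): boolean {
  return channel.ownerId === memberId;
}

// Renaming touches only the display name, so ownership is unaffected.
function rename(channel: Channel, memberId: string, newName: string): void {
  const member = channel.members.find((m) => m.id === memberId);
  if (member) member.displayName = newName;
}

// Removes a member. If they owned the channel, the longest-present
// remaining member inherits it (one possible policy among several).
function leave(channel: Channel, memberId: string): void {
  channel.members = channel.members.filter((m) => m.id !== memberId);
  if (channel.ownerId === memberId) {
    channel.ownerId = channel.members[0]?.id ?? null;
  }
}
```

With this shape, two users who both call themselves "sam" never share permissions, because every permission check goes through `isOwner` with an ID.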

Ship it before it's perfect

The first version of Squawk was a single HTML file and about 100 lines of server code. It worked. It was ugly. It did the one thing I needed it to do.

Every feature I added after that — text chat, typing indicators, mobile layout, Docker support, security hardening, the permissions system — came from actually using it and noticing what was missing. If I'd tried to design all of that upfront I'd probably still be planning.

What's next

Squawk is open source under MIT. If you want to self-host your own private voice chat, the setup is genuinely quick — especially with Docker:

git clone https://github.com/shynsec/squawk.git
cd squawk
# Edit Caddyfile with your IP or domain
docker compose up -d

Things I'm thinking about adding next:

User avatars
Push-to-talk mode
Channel passwords
Persistent message history

If any of that sounds interesting, or you have ideas, the repo is open. Issues and pull requests welcome.

Built by shynsec
