Have you ever stopped to think about what happens to the metadata of your voice calls?
We live in an era where end-to-end encrypted apps like WhatsApp, Signal, and Telegram are the norm for calls. While they encrypt the content of our conversations, they still rely on central servers to connect us. This means the server, and whoever controls it, is in a position to know who you are calling, when you called them, and how long you spoke.
I wanted to challenge this dependency. I asked myself: Is it possible to build a voice communication tool that strips away the server completely? No databases, no WebSockets, no accounts, and absolutely zero logs.
The result is a fully functional, browser-based voice chat that operates purely peer-to-peer using a single HTML file. Here is the story of why I built it, what I aimed to achieve, and how it actually works under the hood.
The "Why": Breaking Free from Metadata
In traditional VoIP applications, the architecture dictates that User A must talk to a server to find User B. The server acts as the middleman. Even if the server cannot hear the audio, it holds the directory.
For maximum privacy, eliminating the middleman is the only way to ensure zero metadata collection. If there is no server to handle the connection, there are no connection logs. If there are no logs, there is no server-side record of the call ever taking place. I wanted to build a tool for journalists, privacy advocates, or anyone who just wants a truly ghosted conversation.
The Goal: Zero Infrastructure
The objective was strict:
- No Backend: No Node.js, no WebSockets, no signaling server.
- No Installation: It had to run natively in any modern web browser.
- High Security: The connection setup had to be encrypted locally so that even if the connection data were intercepted in transit, it would be useless without the password.
I decided to pack everything into a single, standalone index.html file. You can run it from your local hard drive, and it will still work perfectly.
How It Works: The "Manual Signaling" Architecture
To understand how this works, we have to look at WebRTC (Web Real-Time Communication). WebRTC is amazing because once a connection is established, the audio streams directly between devices, fully encrypted (WebRTC mandates DTLS-SRTP for all media).
However, WebRTC has a famous chicken-and-egg problem: signaling. Before two devices can connect directly, they need to exchange connection details: media formats and codecs, encryption fingerprints, and network candidates (IP addresses and ports). This description is written in SDP (Session Description Protocol). Normally, developers build a central server just to pass this SDP data back and forth.
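Without a signaling server there is nothing to carry ICE candidates to the other side one at a time ("trickle ICE"), so a copy-paste flow typically waits until the browser has finished gathering candidates and then shares the complete SDP in one piece. Below is a minimal browser-side sketch of that idea. The function names and the Google STUN URL are illustrative assumptions, not code taken from the project.

```typescript
// Browser-only sketch of non-trickle ("gather everything first") signaling.
// Assumption: the STUN URL below is a commonly used public server, not necessarily the project's.
const rtcConfig: RTCConfiguration = {
  iceServers: [{ urls: "stun:stun.l.google.com:19302" }],
};

// Resolves once ICE gathering is finished, so the local SDP already lists every candidate.
function waitForIceGatheringComplete(pc: RTCPeerConnection): Promise<void> {
  if (pc.iceGatheringState === "complete") return Promise.resolve();
  return new Promise((resolve) => {
    const onStateChange = () => {
      if (pc.iceGatheringState === "complete") {
        pc.removeEventListener("icegatheringstatechange", onStateChange);
        resolve();
      }
    };
    pc.addEventListener("icegatheringstatechange", onStateChange);
  });
}

// Hypothetical helper: capture the microphone, create an offer, and return it as pasteable text.
async function createManualOffer(pc: RTCPeerConnection): Promise<string> {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  for (const track of stream.getTracks()) pc.addTrack(track, stream);
  await pc.setLocalDescription(await pc.createOffer());
  await waitForIceGatheringComplete(pc);
  // pc.localDescription now contains the full SDP, ICE candidates included.
  return JSON.stringify(pc.localDescription);
}
```

In a page this would be called as `const pc = new RTCPeerConnection(rtcConfig); const offerText = await createManualOffer(pc);`. The answering side mirrors it with `setRemoteDescription`, `createAnswer` and the same wait. Gathering can take several seconds on some networks, so a real implementation might add a timeout.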
To achieve "Zero Infrastructure," I replaced the automatic signaling server with Manual Signaling and Local Encryption.
Here is the exact flow:
- Step 1: The Shared Secret. User A and User B agree on a strong password beforehand (via a secure channel or in person). Both enter this password into the app.
- Step 2: Generating the Offer. User A's browser generates the WebRTC connection data (the SDP offer). Instead of sending this to a server, the browser uses the native Web Crypto API to encrypt it with AES-GCM, using a 256-bit key derived from the shared password.
- Step 3: The Copy-Paste Relay. User A copies this encrypted block of text and sends it to User B via any messaging app, SMS, or even an email.
- Step 4: Decryption and Answer. User B pastes the text into their app. Their browser uses the shared password to decrypt it, accepts the connection, generates an "Answer," encrypts it, and sends it back to User A.
- Step 5: Direct Connection. Once User A pastes the final answer, the browsers establish a direct P2P connection. The audio flows directly between them.
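Steps 2 and 4 can be sketched with the Web Crypto API. The post does not say how the password becomes an AES key, so this sketch assumes PBKDF2 with SHA-256, a random 16-byte salt and a random 12-byte IV per message, all packed as base64(salt || IV || ciphertext). The function names, the blob layout and the iteration count are hypothetical, not taken from the repository. It should run in browsers and in recent Node versions, where `globalThis.crypto` is available.

```typescript
// Sketch of password-based encryption for the signaling text.
// Assumptions (not from the project): PBKDF2-SHA-256, the parameters below,
// and the blob layout base64(salt || iv || ciphertext+tag).
const subtle = globalThis.crypto.subtle;

const PBKDF2_ITERATIONS = 600_000; // assumed work factor
const SALT_BYTES = 16;
const IV_BYTES = 12; // 96-bit IV, the standard size for AES-GCM

// Stretch the shared password into a non-extractable 256-bit AES-GCM key.
async function deriveKey(password: string, salt: BufferSource): Promise<CryptoKey> {
  const baseKey = await subtle.importKey(
    "raw",
    new TextEncoder().encode(password),
    "PBKDF2",
    false,
    ["deriveKey"],
  );
  return subtle.deriveKey(
    { name: "PBKDF2", salt, iterations: PBKDF2_ITERATIONS, hash: "SHA-256" },
    baseKey,
    { name: "AES-GCM", length: 256 },
    false,
    ["encrypt", "decrypt"],
  );
}

function toBase64(bytes: Uint8Array): string {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
  return btoa(binary);
}

function fromBase64(text: string): Uint8Array {
  const binary = atob(text.trim());
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i);
  return bytes;
}

// Hypothetical helper: SDP text in, one pasteable base64 string out.
async function encryptSignal(sdp: string, password: string): Promise<string> {
  const salt = globalThis.crypto.getRandomValues(new Uint8Array(SALT_BYTES));
  const iv = globalThis.crypto.getRandomValues(new Uint8Array(IV_BYTES));
  const key = await deriveKey(password, salt);
  const ciphertext = new Uint8Array(
    await subtle.encrypt({ name: "AES-GCM", iv }, key, new TextEncoder().encode(sdp)),
  );
  const blob = new Uint8Array(SALT_BYTES + IV_BYTES + ciphertext.length);
  blob.set(salt, 0);
  blob.set(iv, SALT_BYTES);
  blob.set(ciphertext, SALT_BYTES + IV_BYTES);
  return toBase64(blob);
}

// Hypothetical helper: rejects (OperationError) on a wrong password or a tampered blob.
async function decryptSignal(blobText: string, password: string): Promise<string> {
  const blob = fromBase64(blobText);
  const salt = blob.slice(0, SALT_BYTES);
  const iv = blob.slice(SALT_BYTES, SALT_BYTES + IV_BYTES);
  const ciphertext = blob.slice(SALT_BYTES + IV_BYTES);
  const key = await deriveKey(password, salt);
  const plaintext = await subtle.decrypt({ name: "AES-GCM", iv }, key, ciphertext);
  return new TextDecoder().decode(plaintext);
}
```

For example, `decryptSignal(await encryptSignal(offerText, password), password)` returns `offerText` unchanged. Because AES-GCM authenticates the ciphertext, a wrong password or an altered character makes `decryptSignal` reject instead of returning garbage. Anyone who intercepts the blob can still guess passwords offline, which is why the strength of the shared password from step 1 matters and why a deliberately slow key derivation is used here.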
Because the signaling data is encrypted locally before it ever leaves the browser, the messaging app used to relay the text sees only an opaque blob and has no idea what it means.
The Reality: Trade-offs and Limitations
Building a truly serverless application taught me a lot about network realities. By removing the server, you have to accept certain trade-offs:
- IP Visibility: Peer-to-peer means exactly that. The browsers connect directly, and the exchanged SDP lists each side's network addresses, so User A and User B can see each other's IP addresses. If you need to hide your IP address from the person you are talking to, you must route your connection through a VPN.
- The Corporate Firewall Problem (Symmetric NAT): This project uses basic STUN servers to discover each browser's public IP address and port, which is usually enough on home Wi-Fi and many mobile networks. However, if the users are behind symmetric NATs or strict corporate firewalls, the direct connection may fail. Standard apps solve this with TURN servers (which relay the media), but adding a TURN server would defeat the entire "serverless" philosophy of this project.
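Both limitations are visible in the SDP itself. Every `a=candidate` line carries an address, a port and a candidate type: `host` for a local interface, `srflx` for the public address a STUN server reported back, and `relay` for a TURN relay. The hypothetical helper below (not part of the project) lists them, which shows what a peer can read once an offer or answer is decrypted. Modern browsers often replace local host addresses with random `.local` mDNS names, but a `srflx` candidate still exposes the public IP. A STUN-only setup produces no `relay` candidates, so when the direct pairs are blocked there is nothing to fall back to.

```typescript
// Hypothetical helper (not part of the project): list the ICE candidates in an SDP string.
// Candidate line format (RFC 8839):
// a=candidate:<foundation> <component> <transport> <priority> <address> <port> typ <type> ...
interface CandidateInfo {
  address: string;
  port: number;
  type: string; // "host", "srflx" (found via STUN), "prflx" or "relay" (TURN)
}

function listCandidates(sdp: string): CandidateInfo[] {
  const prefix = "a=candidate:";
  const candidates: CandidateInfo[] = [];
  for (const line of sdp.split(/\r?\n/)) {
    if (!line.startsWith(prefix)) continue;
    const fields = line.slice(prefix.length).trim().split(/\s+/);
    // fields[4] = address, fields[5] = port, fields[6] = "typ", fields[7] = candidate type
    if (fields.length < 8 || fields[6] !== "typ") continue;
    candidates.push({ address: fields[4], port: Number(fields[5]), type: fields[7] });
  }
  return candidates;
}
```

Running it on a decrypted offer and finding only `host` and `srflx` entries is a quick way to confirm that no relay is involved, and also a reminder of exactly which addresses the other person receives.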
Final Thoughts
Building this was an incredible exercise in pushing the limits of modern browsers. We are so used to spinning up servers for everything, but sometimes, the most secure server is the one that doesn't exist.
- Check out the code on GitHub: https://github.com/furkiak/secure-p2p-voice
I would love to hear your thoughts on this architecture. Have you ever experimented with manual WebRTC signaling or the Web Crypto API? Let's discuss in the comments!