We made a classic architectural mistake building a video calling feature. It worked perfectly in local testing. It fell apart in production. Here's exactly what happened, how we debugged it, and what we should have done from the start.
The Setup
I was building EyeSmarty — a remote team monitoring app with live audio/video calling between managers and team members. The stack was Vue.js + Electron on the frontend, Node.js + Express on the backend, and Socket.IO already handling all real-time events across the app.
When the requirement came in for audio/video calling, the decision seemed obvious: "We already have Socket.IO for real-time things. Let's use it for calls too."
That decision cost us weeks.
How We Built It (The Wrong Way)
The initial implementation was straightforward:
- Capture audio/video from the browser using getUserMedia
- Encode the media into chunks
- Send each chunk to the server via Socket.IO
- Server receives the chunk and emits it to the other participant
- Other participant receives and plays it back
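The relay itself was only a few lines on the server, which is part of why it felt safe. A minimal sketch of that handler (event and field names like `media-chunk` and `targetUserId` are illustrative, not our exact code):

```javascript
// The naive relay pattern: every media chunk passes through the Node.js
// event loop and gets copied twice — once on receive, once on re-emit.
// `io` is the Socket.IO server; `socket` is one client's connection,
// and each user is assumed to have joined a room named by their user ID.
function registerMediaRelay(io, socket) {
  socket.on('media-chunk', ({ targetUserId, chunk }) => {
    // Re-emit the raw chunk to the other participant's room.
    io.to(targetUserId).emit('media-chunk', { from: socket.id, chunk });
  });
}
```

With tiny, occasional events this pattern is harmless; with a continuous 1–2 MB/s media stream per call, it turns the server into a choke point.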
In a quick local test with two browser tabs on the same machine, it worked: call established, audio heard, video visible. Feature marked as done. Shipped.
When Production Broke Everything
Real users on real networks told a completely different story.
Audio cutting out every few seconds — users reported audio playing for 2–3 seconds, going silent, then coming back. We assumed it was their internet connection.
Video freezing — frames would freeze mid-call, sometimes recovering, sometimes staying frozen for the entire session.
The whole app slowing down — with three or four simultaneous calls, the entire application degraded. Dashboard stopped updating in real time, screenshot uploads delayed, status events lagging. Not just the calls — everything.
Server CPU hitting 70–90% — with just a few concurrent calls.
How We Debugged It
Step 1: Rule out the network
We ran calls between two machines on the same local network. Problems persisted. Network quality ruled out.
Step 2: Measure what the server was actually doing
We added server-side logging to track events per second, payload sizes, and CPU correlation during active calls. The numbers were alarming.
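The logging didn't need to be fancy. A sketch of the kind of counter we hooked into the media handler (function and field names are illustrative):

```javascript
// Count events and bytes over a window so media throughput can be
// correlated with CPU usage. Call record() from the media-chunk handler,
// then read snapshot() once per second from a setInterval and log it.
function createEventMetrics() {
  let events = 0;
  let bytes = 0;
  return {
    record(payloadSize) {
      events += 1;
      bytes += payloadSize;
    },
    // Returns the window's totals and resets the counters.
    snapshot() {
      const out = { events, bytes };
      events = 0;
      bytes = 0;
      return out;
    }
  };
}
```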
A single 720p video call at 30fps produces roughly 1–2 MB of data per second per direction. With two participants:
- Server receiving: 1–2 MB/s
- Server re-emitting: 1–2 MB/s
- Total: 2–4 MB/s of raw media through a single Node.js process
With three simultaneous calls, that became 6–12 MB/s through a single-threaded process. Node.js was spending almost all of its time copying media buffers, leaving almost no capacity for everything else.
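The arithmetic is worth making explicit. Assuming 1–2 MB/s per direction, relay load grows linearly with concurrent calls, because the server both receives and re-emits every byte:

```javascript
// Total MB/s through the relay: each call costs the server two directions
// (receive + re-emit) of the per-direction media rate.
function relayThroughputMBps(concurrentCalls, mbPerSecPerDirection) {
  return concurrentCalls * 2 * mbPerSecPerDirection;
}

// One call at the upper estimate:    1 * 2 * 2 = 4 MB/s
// Three calls at the upper estimate: 3 * 2 * 2 = 12 MB/s
```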
Step 3: Understand why Socket.IO is the wrong tool
Socket.IO was designed for low-frequency, small-payload messaging. It's excellent for events like "User X just came online" or "Screenshot uploaded at timestamp Y" — tiny payloads that happen occasionally.
Audio and video are fundamentally different:
- Continuous — data flows every 33ms at 30fps without stopping
- High-bandwidth — orders of magnitude larger than any chat message
- Latency-sensitive — any delay in the relay chain adds directly to call latency
Every millisecond a media chunk spent sitting in the server's event queue added a millisecond of call latency. Audio delay becomes noticeable at around 150ms one way. Our relay architecture was regularly exceeding 800ms end-to-end.
The Fix: WebRTC
Researching how products like Zoom and Google Meet handle calls made one thing clear: for a one-to-one call, media doesn't need to pass through a general-purpose application server at all. The technology that makes this possible is WebRTC.
The architectural difference is fundamental:
- Socket.IO approach: Client A → Server → Client B
- WebRTC approach: Client A ←→ Client B directly
The server's role in WebRTC is signaling only — tell the two clients about each other, then step out of the way. Once the peer connection is established, media flows directly between browsers. The server sees none of it.
The Migration
Socket.IO wasn't removed — it became the signaling channel:
```javascript
// Lightweight signaling — perfect for Socket.IO
socket.emit('webrtc-offer', { targetUserId, sdpOffer });
socket.emit('webrtc-answer', { targetUserId, sdpAnswer });
socket.emit('ice-candidate', { targetUserId, candidate });
```
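On the server, signaling is just a forwarder. A minimal sketch of what those handlers can look like (event and field names follow the client calls above; the room-per-user lookup is illustrative):

```javascript
// Each signaling message is a few KB at most and fires only a handful of
// times per call setup — exactly the workload Socket.IO was built for.
// Assumes each user has joined a room named by their user ID.
function registerSignaling(io, socket) {
  for (const event of ['webrtc-offer', 'webrtc-answer', 'ice-candidate']) {
    socket.on(event, (payload) => {
      // Forward to the target user's room, tagging the sender.
      io.to(payload.targetUserId).emit(event, { ...payload, from: socket.id });
    });
  }
}
```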
WebRTC handled the actual media:
```javascript
// Media flows peer-to-peer once this connection is negotiated
const peerConnection = new RTCPeerConnection({
  iceServers: [{ urls: 'stun:stun.l.google.com:19302' }]
});

// localStream comes from navigator.mediaDevices.getUserMedia(...)
localStream.getTracks().forEach(track => {
  peerConnection.addTrack(track, localStream);
});

peerConnection.ontrack = (event) => {
  remoteVideoElement.srcObject = event.streams[0];
};
```
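The glue between the two halves is the handshake: signaling events have to be wired into the peer connection. A hedged sketch, assuming `pc` is the RTCPeerConnection and `socket` is the Socket.IO client from above (`wireSignaling` is an illustrative helper name):

```javascript
// Connect the signaling channel to the peer connection. ICE candidates
// trickle in asynchronously as each side discovers network paths.
function wireSignaling(pc, socket, targetUserId) {
  // Send each locally gathered ICE candidate to the peer.
  pc.onicecandidate = (event) => {
    if (event.candidate) {
      socket.emit('ice-candidate', { targetUserId, candidate: event.candidate });
    }
  };
  // Apply the peer's answer and candidates as they arrive.
  socket.on('webrtc-answer', ({ sdpAnswer }) => pc.setRemoteDescription(sdpAnswer));
  socket.on('ice-candidate', ({ candidate }) => pc.addIceCandidate(candidate));
}
```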
For NAT traversal, a STUN server tells each client their public IP so they can connect directly. For strict corporate firewalls, a TURN server acts as a relay for that edge case only.
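In practice the configuration carries both server types; the TURN URL and credentials below are placeholders for your own deployment, not real endpoints:

```javascript
// STUN for the common case (direct peer-to-peer connection), TURN as a
// relay fallback for networks that block peer-to-peer traffic entirely.
const rtcConfig = {
  iceServers: [
    { urls: 'stun:stun.l.google.com:19302' },
    {
      urls: 'turn:turn.example.com:3478',  // placeholder — your TURN server
      username: 'turn-user',               // placeholder credentials
      credential: 'turn-secret'
    }
  ]
};
```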
Results
| Metric | Socket.IO | WebRTC |
|---|---|---|
| End-to-end audio latency | 600–900ms | 80–150ms |
| Server CPU (3 active calls) | 70–90% | 8–15% |
| Server memory during calls | Steady climb | Flat |
| Video freeze frequency | Every 15–30s | Rare |
| Max stable concurrent calls | ~3 | 10+ |
Server load dropped by over 80%. The rest of the application ran smoothly again.
What I Took Away
1. Match the tool to the data characteristics, not just the category. Socket.IO is real-time. WebRTC is real-time. But the wrong choice doesn't fail immediately — it fails under production load.
2. Local testing hides production load problems. Two tabs on the same machine share the same CPU and loopback network. The problem was invisible until real users added real concurrent load.
3. Measure before you guess. Without server-side metrics, we'd have spent weeks trying to fix the users' internet connections. Data made the root cause undeniable.
4. Don't fight the platform. Browsers have WebRTC built in. Use the tool the platform provides for the exact use case.
I'm Muhammad Aqib, a Full Stack JavaScript Developer with 6+ years of experience building scalable web apps with React, Vue.js, Next.js, Node.js, Python and FastAPI. View my portfolio at muhammadaqib.com