When we set out to build AdminCondo — a property management platform for residential buildings in Mexico — we knew we needed more than dashboards and payment tracking. Security guards needed to call residents directly from their phone. Residents needed to answer from anywhere — even when the app was killed.
We needed a full intercom system, built into a mobile app, that worked as reliably as a phone call.
This is the story of how we built it with WebRTC, Socket.io, and a lot of pain navigating iOS background restrictions.
## The Architecture at a Glance
Our intercom system has four layers:
```
┌─────────────────────────────────────┐
│  React Native App (iOS / Android)   │
│    react-native-webrtc + CallKit    │
├─────────────────────────────────────┤
│     Socket.io Signaling Server      │
│         (Node.js / Express)         │
├─────────────────────────────────────┤
│         STUN / TURN Servers         │
│  (NAT traversal for cellular nets)  │
├─────────────────────────────────────┤
│       Push Notification Layer       │
│  iOS: PushKit VoIP | Android: FCM   │
└─────────────────────────────────────┘
```
The flow for a guard calling a resident looks like this:
- Guard taps "Call" in the app, which emits a make-call event via Socket.io
- Signaling server looks up the resident's extension, sends a push notification (PushKit on iOS, FCM on Android)
- Resident's phone wakes up — iOS shows the native CallKit incoming call screen; Android shows a foreground service notification
- Resident answers — the server brokers the WebRTC offer/answer exchange
- Peer connection established — audio flows directly between the two devices via STUN/TURN
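The server-side branch in step 2 — deliver over an open socket if the resident's app is connected, otherwise fall back to a platform push — can be sketched like this (all names here are illustrative, not the actual AdminCondo code):

```javascript
// Sketch of the signaling server's call-routing step (hypothetical names).
// Given an in-memory registry mapping extensions to connection state,
// decide how to reach the callee.
function routeIncomingCall(registry, extension) {
  const callee = registry.get(extension);
  if (!callee) {
    return { action: 'reject', reason: 'unknown-extension' };
  }
  if (callee.socketId) {
    // Callee's app is alive and connected: signal over the socket.
    return { action: 'socket', socketId: callee.socketId };
  }
  // App is backgrounded or killed: wake it with a platform push.
  return callee.platform === 'ios'
    ? { action: 'push', transport: 'pushkit', token: callee.voipToken }
    : { action: 'push', transport: 'fcm', token: callee.fcmToken };
}
```

In the real server this decision also depends on heartbeat-based presence (covered in the lessons below); the branch structure is the same.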
## The Hardest Part: Waking Up iOS
On Android, receiving a call when the app is backgrounded or killed is relatively straightforward: a high-priority FCM data message plus a foreground service gets the job done.
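For reference, the wake-up message the server sends is a data-only FCM message: it carries no `notification` key, so the app's own service (not the system tray) handles it and shows the call UI. The field names in `data` are ours, not a standard:

```javascript
// Build the high-priority, data-only FCM message used to wake the Android
// app for an incoming call. The object shape matches what firebase-admin's
// messaging().send() accepts; the data fields are our own convention.
function buildIncomingCallMessage(fcmToken, callId, callerName) {
  return {
    token: fcmToken,                // device registration token
    android: { priority: 'high' },  // wake the device promptly, even in Doze
    data: {                         // data-only: the app, not the system
      type: 'incoming_call',        // tray, decides how to present the call
      callId,
      callerName,
    },
  };
}
```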
iOS is a different beast entirely.
Apple requires VoIP apps to use PushKit for incoming call notifications. PushKit pushes are special: they wake your app even after the OS has killed it, and since iOS 13 every PushKit push must be paired with a call to CallKit's reportNewIncomingCall. If you receive a PushKit notification and fail to report a call within a few seconds, iOS terminates your app, and repeated failures will get your VoIP pushes throttled or stopped entirely.
The tricky part is timing. When the app is killed, PushKit wakes it, but your JavaScript bridge (React Native) hasn't loaded yet. The native layer must show the CallKit UI immediately. Only later, once JS is ready, does the app connect to the signaling server and complete the WebRTC handshake.
We solved this with a "pending call" pattern: the native layer stores the call metadata, and once the JS context boots, it queries both local storage and the signaling server for any active calls that need to be joined.
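The JS side of that pattern can be sketched as a small store (illustrative names; in the real app the save happens in native code the moment the PushKit payload arrives, before the CallKit UI is shown):

```javascript
// Sketch of the "pending call" pattern. The native PushKit handler stores
// call metadata synchronously; when the JS context finishes booting, it
// drains the store and rejoins any call that could still be ringing.
class PendingCallStore {
  constructor() {
    this.pending = new Map();
  }

  // Called (via the native layer) as soon as the push arrives — the
  // React Native bridge may not exist yet at this point.
  savePendingCall(callId, metadata) {
    this.pending.set(callId, { ...metadata, savedAt: Date.now() });
  }

  // Called once JS is up: return calls recent enough to still matter,
  // then clear everything. The caller re-verifies against the server.
  drain(maxAgeMs = 60_000) {
    const now = Date.now();
    const fresh = [...this.pending.values()].filter(
      (call) => now - call.savedAt <= maxAgeMs
    );
    this.pending.clear();
    return fresh;
  }
}
```

Draining locally is not enough on its own, which is why the app also asks the signaling server for active calls: the push may describe a call the caller has already hung up on.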
## The WebRTC Offer/Answer Timing Problem
A subtle bug that cost us days: don't create the WebRTC offer before the callee is ready.
In early versions, the caller would create an RTCPeerConnection and generate an offer immediately after tapping "Call." The problem? ICE candidate gathering starts as soon as the offer is set as the local description. Those candidates get sent to the signaling server, which tries to relay them to the callee — but the callee hasn't connected yet. By the time they answer, the candidates have been lost.
The fix was to defer offer creation — wait for the server to confirm that the callee has joined the call before creating the PeerConnection and generating the offer. This ensures that by the time ICE candidates start flowing, both peers are connected to the signaling server and ready to receive them.
## CallKit Audio Session Management
Another iOS-specific challenge: CallKit owns the audio session. When a call starts, CallKit activates an AVAudioSession on your behalf. If your WebRTC stack also tries to manage the audio session, you get conflicts — silence on one end, or audio routing to the wrong output.
The solution is to tell WebRTC to use "manual audio mode" and then bridge CallKit's audio session events. The order matters enormously here. You must wait for the audio session activation before trying to play ringback tones or capture microphone audio.
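One way to enforce that ordering is a small gate that queues audio work until CallKit reports activation (illustrative sketch; in the real app the activation signal is CallKit's didActivateAudioSession callback, bridged from native into JS):

```javascript
// Queue audio actions (start ringback, enable the mic) until CallKit has
// activated the AVAudioSession, then flush them in order.
class AudioSessionGate {
  constructor() {
    this.activated = false;
    this.queue = [];
  }

  // Run the action now if it's safe, otherwise hold it.
  run(action) {
    if (this.activated) action();
    else this.queue.push(action);
  }

  // Bridged from CallKit's didActivateAudioSession callback.
  onAudioSessionActivated() {
    this.activated = true;
    this.queue.splice(0).forEach((action) => action());
  }
}
```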
## NAT Traversal: Why You Need a TURN Server
STUN servers work great when both devices are on Wi-Fi. But in production, we found that roughly 15-20% of connections failed — especially when one party was on a cellular network behind carrier-grade NAT.
The fix was deploying our own TURN server (we use coturn). TURN acts as a relay — if direct peer-to-peer fails, audio is routed through your server. It uses more bandwidth, but the connection actually works. For an intercom system where reliability is non-negotiable, TURN is not optional.
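Wiring TURN in is just a matter of the RTCPeerConnection configuration; WebRTC falls back to the relay automatically when direct connectivity fails. Hostnames and credentials below are placeholders:

```javascript
// Example ICE configuration with both STUN and TURN (placeholder values).
// Offering TURN over TCP on port 443 helps on networks that block UDP.
const iceConfig = {
  iceServers: [
    { urls: 'stun:stun.example.com:3478' },
    {
      urls: [
        'turn:turn.example.com:3478?transport=udp',
        'turn:turn.example.com:443?transport=tcp',
      ],
      username: 'app-user',
      credential: 'shared-secret',
    },
  ],
};
```

With coturn, short-lived credentials generated from a shared secret (rather than the static pair shown here) are the usual way to avoid shipping long-lived TURN credentials in the app.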
## Lessons Learned
After shipping this to production and handling real calls between security guards and residents, here are our key takeaways:
- **iOS PushKit is non-negotiable for VoIP.** Regular push notifications are too unreliable for time-sensitive calls.
- **Never reorder the call setup sequence.** CallKit, audio session, ringback, socket signaling: this is a tightly coupled pipeline.
- **Defer the WebRTC offer.** Creating the offer before both peers are online leads to lost ICE candidates and failed connections.
- **Deploy your own TURN server.** STUN-only works in development. In production, carrier-grade NAT will break 15-20% of your connections.
- **Heartbeats matter.** You need to know if a user is truly online before deciding whether to signal via socket or push notification.
- **Test on real cellular networks.** Wi-Fi-to-Wi-Fi testing hides most of the real-world problems.
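The heartbeat point deserves a sketch. Clients ping every few seconds, and the server treats anyone silent for more than two intervals as offline, which is what drives the socket-vs-push decision (illustrative names and intervals):

```javascript
// Heartbeat-based presence tracking. A user is "online" only if their
// last heartbeat arrived within two heartbeat intervals.
const HEARTBEAT_INTERVAL_MS = 5_000;
const OFFLINE_AFTER_MS = 2 * HEARTBEAT_INTERVAL_MS;

class PresenceTracker {
  constructor() {
    this.lastSeen = new Map();
  }

  // Called on every heartbeat ping from a connected client.
  heartbeat(userId, now = Date.now()) {
    this.lastSeen.set(userId, now);
  }

  // Online means recently seen; everyone else gets a push instead.
  isOnline(userId, now = Date.now()) {
    const seen = this.lastSeen.get(userId);
    return seen !== undefined && now - seen <= OFFLINE_AFTER_MS;
  }
}
```

A socket's connect/disconnect events alone are not enough: a phone that drops off a cellular network can leave a socket that looks open on the server for minutes.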
## What We're Building Next
We're exploring adding video calling for visitor verification (guard shows the visitor on camera, resident confirms identity before granting access) and group conference calls for emergency situations.
If you're building something similar or have questions about implementing WebRTC in React Native, I'd be happy to discuss in the comments.
About the author: Aldo Montenegro is the founder of PopServices, a software company building technology for property management in Latin America. AdminCondo is their flagship product, used by residential buildings across Mexico.