Web Real-Time Communication (WebRTC) is an open source project currently being developed with an aim to provide real time, peer-to-peer communication between web applications.
This blog has been written assuming that the reader has zero knowledge of how WebRTC works and hence have explained the entire working in detail using simple terms and analogies wherever possible. Let’s get started !
In order to set-up a WebRTC connection, we need to perform the following two steps :
- Find the location of a peer.
- Notify a peer to setup WebRTC connection.
Step 1: Locating a peer
Think of this like making a phone call, when you need to talk to someone over the phone, you dial the other person’s phone number and connect with that person. The same thing happens when someone wants to call you. In case of mobile communication, we use the mobile/phone numbers as identification of a user. This identification is further used by telecommunication systems to locate a user.
However, web applications cannot ‘dial and call’ each other. Each of the million browsers in the world do not have a unique id (like a phone number) assigned to them. However, the system in which these applications reside is assigned an unique IP address which can be used to ‘locate’ a peer.
However, this process is not as easy as it sounds. Because, most of these systems sit behind a Network Address Translation (NAT) device. NAT devices are needed for security and IPv4 limitations on available public IP addresses. A NAT device assigns private IP addresses to systems within a local network. This private IP addresses are only valid and visible within the local network and cannot be used to accept communication from the outside world since the systems outside the network are not aware of the public IP of devices within the network.
Due to the involvement of NAT devices, a peer does not know it’s own public IP address since it is masked by a private IP address assigned by the NAT. And hence, it cannot share it’s public IP address with another peer for accepting connections. In more understandable terms, if you want someone to give you a phone call, you need to give your phone number to the other person. But, in the presence of NAT, it’s like staying in a hotel wherein the phone number of your room is hidden from the outside world, the calls coming to the hotel are handled at the reception and further redirected to your room on request. Such indirect form of connection is not intended in a peer-to-peer connection technology.
In order to overcome this, we use a protocol called as Interactive Connectivity Establishment (ICE). The job of ICE is to find the best possible path to connect two peers. ICE can perform direct connection , i.e in the absence of NAT and also indirect connections i.e in the presence of a NAT. ICE framework provides us with ‘ICE candidates’. ‘ICE candidates’ are nothing but objects that contain our own public IP address, port number and other connection related information.
In the absence of NAT, ICE is pretty straightforward since the public IP address of the peer is readily available. However, in the presence of NAT, ICE relies on entities called as Session Traversal Utilities for NAT (STUN) and/or Traversal Using Relays around NAT (TURN).
A STUN server basically allows a peer to find out it’s own public IP address. A peers that needs to know it’s own public IP address sends a request to the STUN server. The STUN server replies with the public IP address of that peer. This public address can now be shared with other peers so that they can locate you. However, if the peer is behind a complex NAT and/or a firewall, even STUN is not able to find and provide the requesting peer with it’s IP address. In such cases, ICE relies on TURN to establish the connection. TURN, as the name suggests, is a relay server and acts as an intermediary for transferring data, audio, video when direct connection is not possible between two peers.
The STUN server is only involved during the process of finding the public IP. Once WebRTC connection has been established, all further communication happen through WebRTC. However, in case of TURN, the TURN server is required throughout even after the WebRTC connection has been setup.
A TURN server is something that is not intended but due to limitations of STUN, we have to rely on it. A STUN server is only successful about 86% of the times.
“ICE is complex because we live in a complex world.”
Step 2: Notify a peer to setup WebRTC connection
Now that we have obtained the ICE candidates, the next step is to send this candidates to a peer that we wish to connect to. Along with the candidates, Session Descriptions like the session information, time description, media description are sent. The ICE candidates and the Session Description are bundled inside an object and conveyed using the Session Description Protocol (SDP). In certain cases, the ICE candidates are not bundled in the same object as Session Description and are sent separately, this is called as Trickle ICE (this is a whole new concept, let’s not go deep into it for now!).
I’ve written that we need to ‘send’ the information to the other peer. But, how are the candidates and session description transferred when we only know the IP address of the sender and unaware about the receiving peer’s IP address ? And since the WebRTC connection has not yet been established, through what medium are these information transferred ?
The answer to all these questions lie inside a concept called as Signalling Mechanism. Before a WebRTC connection is established, we need some medium to transfer the above information between the peers and let them know how to locate and connect with each other for a WebRTC connection. This is where signalling mechanism comes into picture. A signalling mechanism, as the name suggests, exchanges connection signals (ICE candidates, Session description, etc) between the two peers that intend to be connected.
WebRTC does not define any standard for implementing such a signalling mechanism and leaves it to the developer to create a mechanism of his/her choice. The signalling mechanism to exchange the information can be achieved by simply copy-pasting either information into respective peers or by using a communication channel like WebSockets, Socket.io, Server Side Events, etc. In short, a signalling mechanism is just a mode of exchanging connection related information between peers so that the peers can identify each other and start communicating further using WebRTC.
Let’s quickly run through the entire process step-by-step for better understanding.
If suppose, peer A would like to setup a WebRTC connection with peer B, they need to perform the following actions:
Peer A generates it’s ICE candidates using Interactive Connectivity Establishment (ICE). In most cases it requires a Session Traversal Utilities for NAT (STUN) or a Traversal Using Relays around NAT (TURN) server.
Peer A bundles the ICE candidate and Session Description into a single object. This object is stored as Local Description (a peer’s own connection information) within peer A and transferred to peer B via a signalling mechanism. This part is called as the Offer.
Peer B receives the offer and stores it as Remote Description (connection information of the peer at other end) for further use. Peer B generates it’s own ICE candidate and Session Description, stores them as it’s Local Description and sends it to peer A via the signalling mechanism. This part is called as the Answer. (Note: As said earlier, the ICE candidates in step 2 and 3 may also be sent separately)
Peer A receives the answer from peer B and stores it as it’s Remote Description.
With this, both the peers have the connection information of each other and can successfully start communicating via WebRTC !
Top comments (4)
It should be noted that the API and its browser support only recently stabilized.
Also, while using WebRTC for direct peer-to-peer-connections is rather easy, adding more peers is where the problems start.
Otherwise, it's a pretty great addition to the web tech stack, allowing a great deal of flexibility for sharing various streams of any format and content.
Does building a STUN server should be easy?
How to build a simple TURN server?
I am interested to see the role of 'servers' vs the role of the 'clients'
I'm not sure how easy or difficult it is to build a STUN or TURN server from scratch on our own since I haven't tried it.
For my previous WebRTC project, I used an open source TURN/STUN implementation called COTURN and hosted it on a DigitalOcean instance.
Also, there are multiple publicly available STUN servers that you can use.
tinyurl.com/y3345yop (All STUN servers in this list may not be live)
The task of STUN seems simple, which is telling you your IP, this is why I asked about the complexity of building one.