DEV Community



A beginner’s introduction to Session Initiation Protocol (SIP)



Have you ever stopped to think how convenient it is to instantly connect with friends and family who are miles away? Much of the internet telephony that makes this possible relies on the Session Initiation Protocol (SIP). SIP is an application layer protocol responsible for setting up, tearing down and managing multimedia sessions between devices over IP networks. This blog post gives you a high-level overview of SIP.

Before the advent of SIP, we primarily communicated through landline phone calls carried over the Public Switched Telephone Network (PSTN), whose basic voice service is fondly referred to as Plain Old Telephone Service (POTS). To establish a call over the PSTN, a dedicated channel (or circuit) is set up for the entire duration of the call. With the rise of IP networks, there was an opportunity to replace circuit-switched PSTN communication with packet-switched Voice over IP (VoIP). Unlike the PSTN, VoIP does not require a dedicated telephone line: the data is broken into packets that are sent across the Internet.

What is SIP and why do we need it?

VoIP communication, simply put, is data packets transferred back and forth between the communicating parties. These packets can be classified into two types:

  1. Signalling packets: These packets let the parties signal each other and establish, modify and end the communication between them.
  2. Media packets: These carry the actual content of the communication, such as audio, video, text or images. Analog signals such as speech are digitized by a codec into 0s and 1s and sent as media packets (typically using the Real-time Transport Protocol, RTP). On the receiver’s end, the codec decodes the media packets back into an analog signal.
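To make the encoding step more concrete, here is a toy Python sketch of linear 8-bit pulse-code modulation (PCM): it samples a sine wave standing in for an analog signal, maps each sample to an integer, and decodes it back. It is a simplification for illustration only; real VoIP codecs such as G.711 use non-linear (companded) quantization, and many others also compress the audio. The function names quantize and dequantize are hypothetical.

```python
import math

def quantize(sample: float) -> int:
    """Encode an analog sample in [-1.0, 1.0] as a signed 8-bit integer."""
    clamped = max(-1.0, min(1.0, sample))  # clip values outside the valid range
    return round(clamped * 127)

def dequantize(code: int) -> float:
    """Decode a signed 8-bit integer back into an approximate analog sample."""
    return code / 127

# Eight samples of one cycle of a sine wave stand in for the "analog" voice signal.
analog = [math.sin(2 * math.pi * n / 8) for n in range(8)]
encoded = [quantize(s) for s in analog]
decoded = [dequantize(c) for c in encoded]

# These integers are what would be packed, as bits, into media packets.
print(encoded)                           # [0, 90, 127, 90, 0, -90, -127, -90]
print(format(encoded[1] & 0xFF, "08b"))  # 01011010
```

The decoded values differ from the originals by at most half a quantization step, which is the information lost when a continuous signal becomes 0s and 1s.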

For communication to take place, the sender and receiver need to know two things:

  1. Each other’s IP address.
  2. Which media codecs the other side supports, so that both encode and decode the media in the same way.
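Agreeing on a codec can be pictured as intersecting the two sides’ codec lists. Below is a minimal Python sketch under a simple, assumed policy: walk the caller’s list in its order of preference and pick the first codec the callee also supports. In real SIP calls these lists are exchanged in Session Description Protocol (SDP) bodies using an offer/answer model; the function name choose_codec and the codec lists are illustrative.

```python
from __future__ import annotations

def choose_codec(offered: list[str], supported: list[str]) -> str | None:
    """Return the first offered codec (caller's preference order) that the callee supports."""
    supported_names = {codec.lower() for codec in supported}  # compare names case-insensitively
    for codec in offered:
        if codec.lower() in supported_names:
            return codec
    return None  # no codec in common: the two sides cannot exchange media

bob_offers = ["opus", "G722", "PCMU"]
alice_supports = ["PCMA", "PCMU", "G722"]
print(choose_codec(bob_offers, alice_supports))  # G722
```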

For this to work, the parties must first agree on a common signalling protocol (a common language, so to speak) for exchanging this information before any media flows between them. The Session Initiation Protocol is one such protocol: with SIP, the parties can signal to each other that they are ready to start communicating.

SIP is an application layer protocol, responsible for setting up, tearing down and managing multimedia sessions between endpoints over the IP network.

A high-level architectural overview of SIP

One of the most basic requirements for communication is knowing the address of the other person. In the SIP world, the endpoint devices that users interact with are called SIP User Agents (UAs). Strictly speaking, a UA acts as a User Agent Client (UAC) when it sends a request and as a User Agent Server (UAS) when it answers one, but here we simply call the endpoint devices UACs. Each user has a unique public address called a SIP AOR (Address of Record) that identifies them, much like an email address. A SIP AOR looks like “sip:user@domain”, for example “sip:alice@example.com”. As long as we know the SIP AOR of the person we want to talk to, we can initiate a media session with them.
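An AOR is just a URI, so splitting it into its parts is straightforward. The Python sketch below handles only the simple sip:user@domain form (plus the secure sips: variant). Full SIP URIs as defined in RFC 3261 can also carry passwords, ports, parameters and headers, which this hypothetical parse_aor helper deliberately rejects.

```python
import re
from typing import NamedTuple

class Aor(NamedTuple):
    scheme: str  # "sip", or "sips" for the secure variant
    user: str
    domain: str

# Simplified pattern: only scheme:user@domain, with no password, port, parameters or headers.
_AOR_PATTERN = re.compile(r"^(sips?):([^@:;\s]+)@([^:;>\s]+)$", re.IGNORECASE)

def parse_aor(uri: str) -> Aor:
    """Split a simple SIP AOR such as 'sip:alice@example.com' into its parts."""
    match = _AOR_PATTERN.match(uri.strip())
    if match is None:
        raise ValueError(f"not a simple SIP AOR: {uri!r}")
    scheme, user, domain = match.groups()
    # Scheme and host are case-insensitive, so normalise them; the user part is kept as-is.
    return Aor(scheme.lower(), user, domain.lower())

print(parse_aor("sip:alice@example.com"))
# Aor(scheme='sip', user='alice', domain='example.com')
```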

To actually exchange packets, however, we need to know the current IP address of the other party’s device in addition to their AOR. Unlike AORs, these IP addresses are not published and can change, for example when a device joins a different network. So how do UACs discover each other’s IP addresses?

Let us go over an example to understand this.

Alice and Bob are friends who have shared their SIP AORs with each other for communication.

Alice wants to make herself available for communication over SIP, so she logs into her User Agent Client (her endpoint device). As soon as she logs in, her UAC sends a REGISTER request to a Registrar Service, which registers her current IP address against her SIP AOR. The Registrar Service stores this SIP AOR + IP address combination in a SIP Registry (typically a database).
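The Registrar’s job boils down to maintaining a lookup table from AOR to current address. Here is a deliberately minimal, in-memory Python sketch of that idea; the Registrar class and its method names are hypothetical. A real registrar receives REGISTER requests over the network, authenticates them, lets a binding expire unless it is refreshed, and can hold several devices for one AOR.

```python
from __future__ import annotations

from dataclasses import dataclass, field

@dataclass
class Registrar:
    """Toy registrar: an in-memory stand-in for the SIP Registry (AOR -> IP address)."""
    bindings: dict[str, str] = field(default_factory=dict)

    def register(self, aor: str, ip_address: str) -> None:
        # A newer registration for the same AOR replaces the older binding.
        self.bindings[aor] = ip_address

    def unregister(self, aor: str) -> None:
        # Called when the user logs out; unknown AORs are ignored.
        self.bindings.pop(aor, None)

    def lookup(self, aor: str) -> str | None:
        # Returns None when the user is not currently registered.
        return self.bindings.get(aor)

registrar = Registrar()
registrar.register("sip:alice@example.com", "198.51.100.7")  # Alice logs in
print(registrar.lookup("sip:alice@example.com"))  # 198.51.100.7
```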

Bob would like to call Alice, so he places a call to Alice’s SIP AOR from his UAC.

At this point, Bob’s UAC sends an INVITE request (one of the request types defined by SIP) to a SIP proxy server. The INVITE carries Bob’s SIP AOR, Alice’s SIP AOR and Bob’s contact address (his IP address), and usually a Session Description Protocol (SDP) body listing the media codecs Bob’s UAC supports. The proxy consults the Location Service, which queries the Registry and retrieves the IP address corresponding to Alice’s SIP AOR (remember that when Alice logged in, her SIP AOR + IP address binding was stored in the Registry). With that address, the proxy forwards the INVITE to Alice’s UAC on Bob’s behalf. Alice’s UAC starts ringing and reports this back with a 180 Ringing response. When Alice accepts the call, her UAC replies with a 200 OK response containing her contact address (her IP address) and an SDP answer listing the codecs she accepts. Finally, Bob’s UAC sends an ACK request to confirm that it received the 200 OK.

So, as part of the INVITE request, Alice received Bob’s IP address and AOR, and as part of the 200 OK response, Bob received Alice’s IP address and AOR. The SDP offer and answer carried in these messages also tell both sides which codecs to use.
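To see where Alice’s UAC finds Bob’s details, here is a hand-written INVITE, roughly as it might arrive after a proxy has forwarded it, together with a small Python sketch that pulls out the caller’s AOR (From header) and IP address (Contact header). The addresses, tags and the parse_headers helper are illustrative assumptions. The message carries no SDP body to keep it short (SIP allows this, in which case the codec offer moves to the 200 OK), and the parser ignores details such as compact header names and multi-line headers.

```python
from __future__ import annotations

# Hypothetical INVITE as delivered to Alice's UAC; addresses come from documentation IP ranges.
INVITE = (
    "INVITE sip:alice@198.51.100.7 SIP/2.0\r\n"
    "Via: SIP/2.0/UDP 203.0.113.1:5060;branch=z9hG4bKproxy1\r\n"
    "Via: SIP/2.0/UDP 192.0.2.10:5060;branch=z9hG4bKbob1\r\n"
    "Max-Forwards: 69\r\n"
    "From: <sip:bob@example.org>;tag=1928301774\r\n"
    "To: <sip:alice@example.com>\r\n"
    "Call-ID: a84b4c76e66710@192.0.2.10\r\n"
    "CSeq: 314159 INVITE\r\n"
    "Contact: <sip:bob@192.0.2.10:5060>\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)

def parse_headers(message: str) -> tuple[str, dict[str, str]]:
    """Split a SIP message into its start line and a {lower-case name: value} dict."""
    head, _, _body = message.partition("\r\n\r\n")
    start_line, *header_lines = head.split("\r\n")
    headers: dict[str, str] = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Keep only the first value of repeated headers such as Via.
        headers.setdefault(name.strip().lower(), value.strip())
    return start_line, headers

start_line, headers = parse_headers(INVITE)
caller_aor = headers["from"].split(">")[0].lstrip("<")  # sip:bob@example.org
caller_contact = headers["contact"].strip("<>")         # sip:bob@192.0.2.10:5060
caller_ip = caller_contact.split("@")[1].split(":")[0]  # 192.0.2.10
print(caller_aor, caller_ip)
```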


Hurray, both the UACs have discovered each other and a media session can be initiated between the two! 🎉

This is a basic overview of how a VoIP session is established between two parties.
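The call setup can be summarized as a message ladder. In SIP, Alice’s acceptance is a 200 OK response, and Bob’s UAC confirms it with an ACK. The Python sketch below simulates the successful case under simplifying assumptions: a single proxy, no authentication, and a proxy that does not stay in the signalling path once the call is set up (so the final ACK goes straight from Bob to Alice). The place_call function and the IP addresses are hypothetical.

```python
from __future__ import annotations

Hop = tuple[str, str, str]  # (sender, receiver, message)

def place_call(registry: dict[str, str], caller_ip: str, callee_aor: str) -> list[Hop]:
    """Simulate SIP call setup through one proxy, returning the messages in order."""
    callee_ip = registry.get(callee_aor)  # Location Service lookup in the Registry
    if callee_ip is None:
        # Callee not registered: the proxy answers with an error such as 480 Temporarily Unavailable.
        return [(caller_ip, "proxy", "INVITE"), ("proxy", caller_ip, "480 Temporarily Unavailable")]
    return [
        (caller_ip, "proxy", "INVITE"),
        ("proxy", caller_ip, "100 Trying"),  # proxy confirms it received the INVITE
        ("proxy", callee_ip, "INVITE"),      # forwarded to the registered address
        (callee_ip, "proxy", "180 Ringing"),
        ("proxy", caller_ip, "180 Ringing"),
        (callee_ip, "proxy", "200 OK"),      # Alice accepts; carries her contact address
        ("proxy", caller_ip, "200 OK"),
        (caller_ip, callee_ip, "ACK"),       # sent directly, assuming no Record-Route
    ]

registry = {"sip:alice@example.com": "198.51.100.7"}
for sender, receiver, message in place_call(registry, "192.0.2.10", "sip:alice@example.com"):
    print(f"{sender:>13} -> {receiver:<13} {message}")
```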


Gary Audin sums it up really well: SIP “tells you the presence of the other party, makes a connection and lets you do whatever you want over the connection, but it has no idea of what’s going over the connection”. This is a very high-level overview of SIP, and the following resources go into more depth.


I hope this article has helped you understand the world of VoIP a teeny bit better.
