Quadcode team for Quadcode

Posted on Nov 25, 2022

Evolution of the Quadcode Internet Telephony Architecture

#voip #telephony #architecture

Over several years, functionality was added to our Internet telephony piece by piece. As a result, we reached a point where the only advantage of the existing VoIP architecture was its reliability. Any change caused headaches, and there was never a guarantee that it would work once released in the product. We decided to fix this and built a new solution within a year.

Quadcode Telephony Service

We make Internet telephony for our B2B clients. VoIP components that we use:

FreeSWITCH as telephone exchanges.
VoIP telephony providers from our customers.
More than 200 phone numbers from all over the world.
SIP phones and GSM gateways.
WebRTC clients in CRM and our WebRTC Webphone.
Integration with CRM.
Monitoring via Telegraf, Influx and Prometheus.
Homer to store SIP traces.
Ceph and AWS S3 for storing audio recordings of conversations.
DDoS and UDP protocol protection.

VoIP telephony seems simple and understandable only at first glance. In practice, usually only VoIP engineers understand how SIP and WebRTC work with all the nuances of these protocols. So let's start with a brief explanation before moving on to the architectural changes.

What is the SIP protocol?

Session Initiation Protocol (SIP) is a client-server application-layer protocol. It establishes, modifies and terminates communication sessions: multimedia conferences, telephone calls and the distribution of multimedia information. Clients most often interact within SIP in the form of a dialog, which is a sequence of SIP messages.

A SIP call can be divided into two interrelated parts:

Signaling.
Media transmission.

Dialogs and other SIP messages are transmitted in the signaling part. The SDP protocol, which is responsible for establishing the media connection, is embedded inside SIP messages. SDP carries information about the audio and video streams.

The SIP message itself looks something like this:

INVITE sip:1234567890@192.168.1.1:5080;transport=tcp;gw=123 SIP/2.0
Record-Route: <sip:192.168.1.2;lr=on;ftag=as1f5e1177;vsf=AAAAAAAAAAAAAAAAAAAAAAAACAwAAAcBAAAEAAYCODo1MDgw;vst=AAAAAA8HAAYAdAEPDBtxAAAbGAIfABkxMzQuMjUy>
Via: SIP/2.0/UDP 192.168.1.2;branch=z9hG4bK413d.eb4e995427afcd56704ae5c59c0e9fd9.0
Max-Forwards: 69
From: <sip:0987654321@192.168.1.2>;tag=as1f5e1177
To: 1234567890 <sip:1234567890@192.168.1.1>
Contact: <sip:0987654321@192.168.2.3:5080>
Call-ID: 6366d9bb789ea6f1235f83297564388d@192.168.2.3:5080
CSeq: 102 INVITE
Content-Type: application/sdp
Content-Length: 348
User-Agent: Provider SBC

v=0
o=provider 1719357914 1719357914 IN IP4 192.168.2.3
s=Provider MGW
c=IN IP4 192.168.2.3
t=0 0
m=audio 12554 RTP/AVP 8 0 9 18 3 101
a=rtpmap:8 PCMA/8000
a=rtpmap:0 PCMU/8000
a=rtpmap:9 G722/8000
a=rtpmap:18 G729/8000
a=fmtp:18 annexb=no
a=rtpmap:3 GSM/8000
a=rtpmap:101 telephone-event/8000
a=fmtp:101 0-16
a=ptime:20
a=sendrecv
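To make the structure of such a message concrete, here is a minimal Python sketch that splits a SIP request into its start line, headers and body. It is an illustration only, not a full parser: the function name `parse_sip_message` and the shortened sample are assumptions made for this example, and details such as folded header lines are ignored.

```python
def parse_sip_message(raw: str) -> dict:
    """Split a SIP request into start line, headers and body (simplified)."""
    # SIP uses CRLF line endings; normalize them so the split is uniform.
    normalized = raw.replace("\r\n", "\n")
    # Headers and body are separated by the first empty line.
    head, _, body = normalized.partition("\n\n")
    lines = head.split("\n")
    method, request_uri, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, and a header such as Via may repeat.
        headers.setdefault(name.strip().lower(), []).append(value.strip())
    return {
        "method": method,
        "request_uri": request_uri,
        "version": version,
        "headers": headers,
        "body": body,
    }


# Shortened version of the INVITE shown above (hypothetical sample data).
SAMPLE = (
    "INVITE sip:1234567890@192.168.1.1:5080;transport=tcp SIP/2.0\r\n"
    "Via: SIP/2.0/UDP 192.168.1.2;branch=z9hG4bK413d\r\n"
    "From: <sip:0987654321@192.168.1.2>;tag=as1f5e1177\r\n"
    "To: 1234567890 <sip:1234567890@192.168.1.1>\r\n"
    "Call-ID: 6366d9bb789ea6f1235f83297564388d@192.168.2.3:5080\r\n"
    "CSeq: 102 INVITE\r\n"
    "Content-Type: application/sdp\r\n"
    "\r\n"
    "v=0\r\n"
)

message = parse_sip_message(SAMPLE)
print(message["method"], message["headers"]["call-id"][0])
```

The part before the empty line is signaling; the body after it is the SDP description of the media.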

Like HTTP, SIP is a text protocol. Their response codes are similar: SIP also has 200s, 400s and 500s. But the protocol itself works a little differently. In HTTP, we send a request, get a response, and we're done. In SIP, everything is more complicated: one message, a response to it, another message, another response to it. These are the dialogs mentioned above. But dialogs aren't always there either: some exchanges use them and others don't. Response codes are also processed differently. For example, in HTTP a 500 is a clear error, while in SIP it may or may not be.
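As a rough illustration of how SIP responses differ from HTTP ones, the sketch below groups status codes into the response classes defined by the SIP specification and separates provisional responses from final ones. The helper names are assumptions made for this example; the point is that a single INVITE typically receives several responses before the final one.

```python
# Response classes as defined for SIP (1xx to 6xx).
RESPONSE_CLASSES = {
    1: "provisional",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
    6: "global failure",
}


def classify_response(status_code: int) -> str:
    """Return the SIP response class for a status code."""
    if not 100 <= status_code <= 699:
        raise ValueError(f"not a SIP status code: {status_code}")
    return RESPONSE_CLASSES[status_code // 100]


def is_final(status_code: int) -> bool:
    """1xx responses are provisional; everything from 200 up ends the transaction."""
    return status_code >= 200


# A typical answer sequence to one INVITE: Trying, Ringing, OK.
responses_to_invite = [100, 180, 200]
for code in responses_to_invite:
    print(code, classify_response(code), "final" if is_final(code) else "provisional")
```

Unlike HTTP, the caller has to keep state while the provisional responses arrive, and how a 4xx, 5xx or 6xx is handled (retry, failover to another route, or giving up) depends on the routing logic rather than on the code alone.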

Another difficulty is processing SDP information. The SDP is carried in the SIP message body and lists codecs, ports, IP addresses and so on. All of this information is negotiated between the server and the client, which is quite complicated and has a lot of nuances. For example, if the client is behind a NAT, it needs additional means to determine which external IP address to specify in the SDP, because the client only knows its IP address inside the network. The STUN protocol is usually used for this.
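The sketch below shows, under simplifying assumptions, how the codecs, port and IP address can be read out of an SDP body like the one in the INVITE above. The function name `parse_sdp` and the returned structure are hypothetical; only the `c=`, `m=` and `a=rtpmap` lines are handled.

```python
def parse_sdp(sdp: str) -> dict:
    """Extract the connection address, media ports and codecs from an SDP body."""
    result = {"connection_ip": None, "media": []}
    current = None  # the media section currently being read, if any
    for line in sdp.replace("\r\n", "\n").split("\n"):
        if len(line) < 2 or line[1] != "=":
            continue  # skip empty or malformed lines
        kind, value = line[0], line[2:]
        if kind == "c" and current is None:
            # Session-level connection line, e.g. "IN IP4 192.168.2.3".
            result["connection_ip"] = value.split()[2]
        elif kind == "m":
            # e.g. "audio 12554 RTP/AVP 8 0 9 18 3 101"
            media_type, port, proto, *formats = value.split()
            current = {
                "type": media_type,
                "port": int(port.split("/")[0]),
                "proto": proto,
                "payload_types": formats,
                "codecs": {},
            }
            result["media"].append(current)
        elif kind == "a" and value.startswith("rtpmap:") and current is not None:
            # e.g. "rtpmap:8 PCMA/8000" maps payload type 8 to a codec.
            payload, encoding = value[len("rtpmap:"):].split(" ", 1)
            current["codecs"][payload] = encoding
    return result


SDP_BODY = "\r\n".join([
    "v=0",
    "o=provider 1719357914 1719357914 IN IP4 192.168.2.3",
    "s=Provider MGW",
    "c=IN IP4 192.168.2.3",
    "t=0 0",
    "m=audio 12554 RTP/AVP 8 0 9 18 3 101",
    "a=rtpmap:8 PCMA/8000",
    "a=rtpmap:0 PCMU/8000",
    "a=rtpmap:101 telephone-event/8000",
    "a=sendrecv",
]) + "\r\n"

offer = parse_sdp(SDP_BODY)
print(offer["connection_ip"], offer["media"][0]["port"], offer["media"][0]["codecs"])
```

The `connection_ip` value is the address the other side will send media to, which is why a client behind a NAT has to put its external address there.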

It's no coincidence that the FreeSWITCH developer wiki says:

SIP is a crazy protocol and it will make you crazy too if you aren't careful.

Nevertheless, almost all modern VoIP telephony is based on the SIP protocol. In general, SIP-based VoIP can be seen as a set of various protocols: STUN, TURN, SIP, TLS, SDP, (S)RTP, etc.

What is WebRTC?

Web Real-Time Communication (WebRTC) is a standard that describes the transmission of signaling information, streaming audio, video and other content. It's applicable not only to telephony: WebRTC can work with SIP, XMPP and other protocols. It's used for real-time data exchange directly between browsers, or between a server and a browser.

The convenience of WebRTC is that it needs no additional plugins or extensions: it's built into the browser. On the server side, there are various software solutions that can interact over WebRTC:

FreeSWITCH.
Asterisk.
Flashphoner.
Kurento.
Etc.

WebRTC provides cross-platform compatibility for applications and independence from both hardware and the operating system. For developers, the technology is available through various JavaScript libraries. You can import these libraries into your project and write a full-fledged front end based on them. It'll work in Google Chrome, Safari, Firefox and other browsers. For example, we use the library sip.js.

But WebRTC has one big disadvantage: transcoding media and transmitting media streams require a lot of resources. This is compensated for by the fact that there's no need to develop your own software and protocols for the telephony core, which would take a lot of man-hours.

In our case, WebRTC is used specifically for phone calls. For them, SIP is all that remains at the upper signaling level, and media uses SRTP plus the media control protocol:

Old VoIP Architecture

In 2020, the architecture of our telephony looked like this:

On the left side of the diagram are VoIP providers connected via the internet. Clients connect to telephony via an internal network. As clients we have:

CRM users—connections from the browsers of support service and client managers.
Users who use Zoiper as a SIP phone from their smartphones or laptops.
Internal office telephony with Wi-Fi handsets and GSM gateways.

Telephony works in two environments: Production and Integration. Integration, or Int, is a mini-copy of production where we conduct integration testing and verification.

Advantages of the old architecture. The old scheme had only one advantage: reliability. Firstly, we worked with proven providers who were always ready to help resolve issues. Secondly, everything worked well, with one caveat: whenever something had to be changed, we inevitably faced difficulties.

Disadvantages of the old architecture. There were many more disadvantages. Firstly, the old scheme had no CI/CD, and we really didn't want to implement CI/CD on top of an architecture that was far from perfect and had many problems. That's why changing anything was painful: every change meant manual manipulation and digging into Git. Secondly, the architecture didn't scale: if something new was needed, we were forced to reinvent everything.

A bonus on top of all this was the poor documentation we inherited. Unfortunately, while the company was still a startup, nobody got around to documentation, and then it was too late. As far as we could, we brought it in line with how everything actually worked, but no one knew all the pitfalls.

Overall, it was a classic version of internal telephony, as in any company that uses open-source VoIP telephony solutions.

Cherry on top: the Int setup didn't match the Prod setup. This was a huge disadvantage, because telephony tests passing on Int didn't guarantee that everything would work in production.

We conduct development and functional testing in a third environment—Sandbox. And in the old architecture, this environment was kludgy: Sandbox was integrated with Int; it allowed you to do something, but again it didn't guarantee that everything would work the same way in production.

If we consider each FreeSWITCH inside, then the architecture of the old telephony and its integration with CRM looked something like this:

The red zone is the responsibility of the back office developers, and the gray zone is the responsibility of the VoIP telephony department. At some point, these areas of responsibility intermixed.

It was a nightmare. Everything was built as needed: functionality was added piece by piece, resulting in a tangled mix of areas of responsibility. It became impossible to understand which change could affect what, and where.

Integration relied on FreeSWITCH dialplans and on changes made directly to the FreeSWITCH SQL cache. fs_curl was also used with some custom changes, even though it hadn't been developed for 6 years. In other words, the CRM developers needed to know how to configure FreeSWITCH, how it works, and how everything built around it works.

Moreover, integration went through Go applications that were written by the back office but supported and run on the telephony server. On top of all this, a large number of Lua scripts ran in the telephony area of responsibility, and it became impossible to track where in the integration logic they were launched. Any change to the existing scheme was very complicated, and each new change could bring even more new problems.

At some point, we were faced with the question of expanding the capabilities of B2B telephony for our customers. We realized that it was impossible to continue living with the old architecture, and began to improve it.

New VoIP Architecture

It took a year to create a new solution. As a result, the new architecture looks like this:

We split the FreeSWITCH instances by task: SBCs are responsible only for telephony and for connecting telecom operators, and PBXs are responsible for connecting clients. The diagram shows SBC0X and PBX0X, which means that we can increase the number of both SBCs and PBXs as needed: SBC01, SBC02, PBX01, PBX02, etc.

If there's a need to use Kamailio or any other system for SIP routing on SBC instead of FreeSWITCH—no problem.

Advantages of the new architecture. VoIP reliability has remained at the same level: the providers and FreeSWITCH haven't changed. But now we have CI/CD, and everything is controlled through GitLab. The new architecture is also more secure: we integrated a DDoS protection service, added protection for the UDP protocol, and protected WebSocket connections with tokens.

The architecture is now scalable as well: we can increase the number of servers depending on the tasks and goals. Every step and every change in telephony is documented; we try to keep documentation as complete as possible.
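The article doesn't describe how the WebSocket tokens are built, so the following is only a generic sketch of the idea: a short-lived token signed with HMAC that a server could check before accepting a WebSocket connection. The token layout, the function names and the shared secret are all assumptions made for this example.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional


def issue_token(user: str, secret: bytes, ttl_seconds: int,
                now: Optional[float] = None) -> str:
    """Create a token of the form base64(user:expiry).signature (hypothetical format)."""
    issued_at = time.time() if now is None else now
    payload = f"{user}:{int(issued_at + ttl_seconds)}".encode()
    signature = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    return base64.urlsafe_b64encode(payload).decode() + "." + signature


def verify_token(token: str, secret: bytes,
                 now: Optional[float] = None) -> Optional[str]:
    """Return the user name if the token is authentic and not expired, else None."""
    try:
        encoded, signature = token.split(".", 1)
        payload = base64.urlsafe_b64decode(encoded.encode("ascii"))
        user, expires = payload.decode("utf-8").rsplit(":", 1)
        expires_at = int(expires)
    except (ValueError, UnicodeError):
        return None  # malformed token
    expected = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how much of the signature matched.
    if not hmac.compare_digest(expected.encode(), signature.encode()):
        return None
    current = time.time() if now is None else now
    return user if current < expires_at else None


SECRET = b"example-shared-secret"  # hypothetical; never hard-code real secrets
token = issue_token("agent42", SECRET, ttl_seconds=60, now=1000.0)
print(verify_token(token, SECRET, now=1030.0))
```

With a scheme like this, a leaked token stops working once it expires, and a client can't forge one without knowing the secret.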

The architecture for the Int environment has also changed. It's now almost identical to the prod solution:

The only difference is in the connected providers: placeholders (provider emulators) are connected instead of real ones. Now we don't need to buy individual numbers on Int. We can emulate the production numbers on the emulated provider and work with them in the same way as in the Prod environment. For the Sandbox environment, we wrapped the Int schema in Docker.

Now the same FreeSWITCH configuration works in Prod, Int, and Sandbox. For each developer, a different environment can be activated in Sandbox, one-to-one with the Int environment. This ensures both testing and development, and the confidence that everything will work the same everywhere.

The new FreeSWITCH architecture and its integration with CRM looks like this:

The areas of responsibility are now clearly divided. There are far fewer interaction points. Now the main point of interaction between FreeSWITCH and CRM is HTTP requests from FreeSWITCH to the CRM API (via a PHP JSON-to-XML converter), with the data exchanged as JSON. That is, when FreeSWITCH sends a request, the PHP converter requests JSON in a pre-agreed standard format from the API and, based on the JSON response, generates XML for FreeSWITCH with a configuration or dialplan. In the new architecture, there's no need to know the principles of configuring and working with FreeSWITCH in order to interact with it.
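The converter described here is written in PHP and its JSON format isn't shown, so the following Python sketch only illustrates the general idea of turning a standardized JSON answer into a FreeSWITCH XML dialplan. The JSON fields (`context`, `extensions`, `destination_pattern`, `actions`) are hypothetical; the XML layout follows the usual FreeSWITCH dialplan document structure.

```python
import json
import xml.etree.ElementTree as ET


def json_to_dialplan_xml(api_response: str) -> str:
    """Build a FreeSWITCH dialplan XML document from a JSON description."""
    data = json.loads(api_response)
    document = ET.Element("document", {"type": "freeswitch/xml"})
    section = ET.SubElement(document, "section", {"name": "dialplan"})
    context = ET.SubElement(section, "context", {"name": data["context"]})
    for ext in data["extensions"]:
        extension = ET.SubElement(context, "extension", {"name": ext["name"]})
        # Match the dialed number against a regular expression.
        condition = ET.SubElement(extension, "condition", {
            "field": "destination_number",
            "expression": ext["destination_pattern"],
        })
        for action in ext["actions"]:
            ET.SubElement(condition, "action", {
                "application": action["application"],
                "data": action.get("data", ""),
            })
    return ET.tostring(document, encoding="unicode")


# Hypothetical JSON as a CRM API might return it.
API_RESPONSE = json.dumps({
    "context": "default",
    "extensions": [{
        "name": "support_line",
        "destination_pattern": "^1001$",
        "actions": [
            {"application": "answer"},
            {"application": "bridge", "data": "user/1001"},
        ],
    }],
})

dialplan_xml = json_to_dialplan_xml(API_RESPONSE)
print(dialplan_xml)
```

The benefit of this split is that the CRM side only has to produce JSON in the agreed shape, while all knowledge of FreeSWITCH XML stays in the converter.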

The second point of contact is FreeSWITCH's event socket, through which the API can receive data from FreeSWITCH, create a callback via originate, or ask FreeSWITCH to update its configuration.
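As an illustration of the originate path, the sketch below only builds the text of an event socket command; it doesn't open a connection. The function name, the gateway name `provider_emulator` and the numbers are assumptions made for this example. The command shape (channel variables in braces, then the call URL, then an application) follows the usual originate syntax.

```python
from typing import Dict, Optional


def build_originate_command(call_url: str, application: str, app_args: str = "",
                            variables: Optional[Dict[str, str]] = None,
                            background: bool = True) -> str:
    """Format an originate command for the FreeSWITCH event socket."""
    prefix = ""
    if variables:
        # Channel variables go in braces before the call URL.
        # Values containing commas would need escaping; not handled in this sketch.
        pairs = ",".join(f"{name}={value}" for name, value in variables.items())
        prefix = "{" + pairs + "}"
    # bgapi runs the command in the background; api blocks until it completes.
    verb = "bgapi" if background else "api"
    # Event socket commands are terminated by an empty line.
    return f"{verb} originate {prefix}{call_url} &{application}({app_args})\n\n"


command = build_originate_command(
    "sofia/gateway/provider_emulator/1234567890",  # hypothetical gateway and number
    application="park",
    variables={"origination_caller_id_number": "0987654321"},
)
print(command)
```

An API would send a string like this over an authenticated event socket connection and then read the reply.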

Software Tools for Working with Telephony

To develop and work with telephony, we've created two software products.

The first is Web_fs_cli. FreeSWITCH has fs_cli; it's a FreeSWITCH command line client. The client works through the event socket and allows you to make requests to FreeSWITCH and receive responses. Through fs_cli, you can view the number of registrations or the list of FreeSWITCH users, and execute other commands.

Out of the box, this interface works only via the console: we connect to the FreeSWITCH event socket with fs_cli, log in and work. But this isn't very convenient. fs_cli is needed by both testers and developers, which sometimes means giving them SSH access to the server, and at times it's simply impossible to work with fs_cli remotely. Testers also often don't like typing requests into a console. That's why we developed a web interface for fs_cli. It lets you make requests to the FreeSWITCH event socket by entering a command or by using preset commands in Action. The responses are output directly in the browser.

For the Int environment, the Test originate mode is available in the Web_fs_cli interface. It generates calls from provider emulators to an Int SBC. We specify who is calling whom and after how long the call should end, and click "generate call". Through the event socket, an originate request is made to the provider emulators, and as a result an incoming call arrives at the SBC in the same way as it would in Prod. An automated announcement is connected to the call and says where the call came from and which numbers are involved.
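A web front end for the event socket has to read replies before it can show them in the browser. The sketch below is an assumption about how that could be done, not a description of Web_fs_cli: it reads one event socket frame (headers, an empty line, then an optional body of `Content-Length` bytes) from any binary stream. An in-memory stream stands in for a real socket here.

```python
import io
from typing import BinaryIO, Dict, Tuple


def read_esl_frame(stream: BinaryIO) -> Tuple[Dict[str, str], str]:
    """Read one event socket frame: headers, then an optional body."""
    headers: Dict[str, str] = {}
    while True:
        line = stream.readline()
        if line in (b"", b"\n"):
            break  # an empty line (or end of stream) ends the header block
        name, _, value = line.decode().rstrip("\n").partition(": ")
        headers[name] = value
    # The body length, if any, is announced in the Content-Length header.
    length = int(headers.get("Content-Length", 0))
    body = stream.read(length).decode() if length else ""
    return headers, body


# Simulated server output: an auth request, then the reply to an API command.
reply_body = b"2 total.\n"  # hypothetical command output
fake_socket = io.BytesIO(
    b"Content-Type: auth/request\n\n"
    b"Content-Type: api/response\n"
    b"Content-Length: " + str(len(reply_body)).encode() + b"\n\n" + reply_body
)

first_headers, first_body = read_esl_frame(fake_socket)
second_headers, second_body = read_esl_frame(fake_socket)
print(first_headers["Content-Type"], "|", second_body.strip())
```

With a real connection, `socket.makefile("rb")` gives a stream that this function can read from in the same way.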

This interface was launched at the end of 2021, and performs its tasks perfectly.

The second software product is Webphone. This is a WebRTC phone that works in a browser. It uses the sip.js library, which we also use for chat rooms and CRM. The phone supports WebSocket tokens and very often helps with debugging.

Why Our Solution Was Needed

Why maintain our own solution, legacy included, if B2B clients can use any of those already available on the market? Here's an example.

One of our B2B clients is a banking service. It had its own infrastructure and its own telephony solution, Twilio, a provider of cloud software for business communications. Twilio isn't a standalone dialer, and it could be integrated with only a small number of other platforms. The client had set up an integration with Zendesk.

The Twilio service was cheap, and the customer had no problems with the system itself. However, Zendesk was very expensive, so the customer decided to abandon both products. From that moment, the business began searching for the perfect telephony.

bOnline. The first option was bOnline. It met the needs at a reasonable cost. But after a couple of months of operation, regular technical problems with communication began, which the telephony system couldn't solve. Even when the difficulties could be eliminated, it took a lot of time. The client had to look for another platform.

Acefone. The next solution was Acefone, but the number of technical problems didn't decrease. One day, the banking service received a rather unpleasant comment from the platform: the ability to hear customers was blocked on the client's side, and the company's own administrators should solve the problem.

Quadcode had already worked with the banking service itself, so the client turned to us for a VoIP solution. The telephony team gathered business and technical requirements for the service and was able to offer a satisfying solution. Our Webphone also went into production for this client as a phone in the browser. It looks simple, but fits the current needs.

The full deployment, from receiving the task from the customer to a working production solution, took one week. With the old architecture, we wouldn't have been able to do anything. So, the decision to support and update our VoIP telephony was right for us.

Here's the feedback we received from the customer:

Possible technical problems are reported immediately, and they're solved very quickly. We can also safely expand the telephony functionality to meet new requirements. For example, relatively recently we added the ability to track who is currently online in the system. Switching to Quadcode telephony was the best solution.

Results of Architecture Refinement

The reliability of the new telephony architecture has remained the same. At the same time, we've increased security, added CI/CD and scalability, and now regularly maintain documentation. Now VoIP complies with Quadcode architecture standards and supports all three environments: Sandbox, Int and Prod.

Integration with external services is also standardized: it uses JSON, with a clear specification of what to use, how and when. And finally, there's no need to know FreeSWITCH in order to integrate with telephony. The development and implementation of new VoIP products in the new architecture takes one week, and our customers have the opportunity to expand functionality to meet new business requirements.
