Ably Blog for Ably

Posted on Jan 6, 2023 • Edited on Aug 10, 2023 • Originally published at ably.com

Alternatives to WebSockets for realtime features

#websocket #webdev

What is WebSocket?

WebSocket is a realtime protocol that provides a persistent, full-duplex communication channel between a web client (e.g., a browser) and a web server over a single TCP connection.

A WebSocket connection starts as an HTTP request/response handshake between the client and the server. The client always initiates the handshake; it sends a GET request to the server, indicating that it wants to upgrade the connection from HTTP to WebSockets. The server must return an HTTP 101 Switching Protocols response code for the WebSocket connection to be established.
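
To make the handshake a little more concrete: under RFC 6455, the client's GET request carries a random Sec-WebSocket-Key header, and the server's 101 response must include a Sec-WebSocket-Accept value derived from that key. The sketch below shows the derivation. It assumes a Node.js runtime (for the crypto module), and the helper names computeAcceptValue and buildUpgradeResponse are hypothetical; in practice a WebSocket library does this for you.

```typescript
import { createHash } from "node:crypto";

// Fixed GUID defined by RFC 6455; the server appends it to the client's key.
const WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

// Hypothetical helper: derives the Sec-WebSocket-Accept value from the
// Sec-WebSocket-Key header sent by the client (SHA-1, then base64).
function computeAcceptValue(secWebSocketKey: string): string {
  return createHash("sha1")
    .update(secWebSocketKey + WEBSOCKET_GUID)
    .digest("base64");
}

// Hypothetical helper: builds the raw "101 Switching Protocols" response
// that completes the upgrade from HTTP to WebSocket.
function buildUpgradeResponse(secWebSocketKey: string): string {
  return [
    "HTTP/1.1 101 Switching Protocols",
    "Upgrade: websocket",
    "Connection: Upgrade",
    `Sec-WebSocket-Accept: ${computeAcceptValue(secWebSocketKey)}`,
    "",
    "",
  ].join("\r\n");
}
```

For the sample key used in RFC 6455, dGhlIHNhbXBsZSBub25jZQ==, computeAcceptValue returns s3pPLMBiTxaQ9kYGzzhZRbK+xOo=. A client that receives any other value must fail the connection.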

Once the connection upgrade is successful and switches from the HTTP protocol to WebSocket, the client and server can freely exchange low-latency messages over the persistent connection as and when needed. After the WebSocket connection has served its purpose, it can be terminated via a closing handshake (both the client and server can initiate it).

The standardized WebSocket API, which is supported by the vast majority of browsers, extends the WebSocket protocol to web clients. The WebSocket API allows you to perform actions like creating the WebSocket object, managing the WebSocket connection, sending and receiving messages, and listening for events triggered by the WebSocket server.

Five alternatives to the WebSocket protocol

We’ll now look at five alternatives to the WebSocket protocol - similar technologies that allow you to power realtime communication for use cases like chat, live multiplayer games, and streaming live score updates.

1. Server-Sent Events

Server-Sent Events (SSE) is a server push technology based on something called Server-Sent DOM Events, which was first implemented in Opera 9. The idea is simple: a browser can subscribe to a stream of events generated by a server, receiving updates whenever a new event occurs. This led to the birth of the popular EventSource interface, which accepts an HTTP stream connection and keeps it open for data streaming. The connection is kept open until closed by calling EventSource.close().
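
In the browser, EventSource handles the wire format for you, but it helps to see what the stream actually looks like: plain text lines such as "event:", "data:" and "id:", with a blank line marking the end of each event. The following is a simplified sketch of a parser for that format. The names SseEvent and parseEventStream are hypothetical, and the sketch ignores the "retry" field and assumes it is handed complete events rather than partial network chunks.

```typescript
interface SseEvent {
  event: string; // event type; "message" when the stream doesn't name one
  data: string; // all data lines of the event, joined with "\n"
  id: string | undefined; // last event ID seen so far, if any
}

// Hypothetical helper: parses a block of text/event-stream content.
function parseEventStream(text: string): SseEvent[] {
  const events: SseEvent[] = [];
  let eventType = "message";
  let dataLines: string[] = [];
  let lastId: string | undefined;

  for (const line of text.split(/\r\n|\r|\n/)) {
    if (line === "") {
      // A blank line dispatches the pending event (if it has any data).
      if (dataLines.length > 0) {
        events.push({ event: eventType, data: dataLines.join("\n"), id: lastId });
      }
      eventType = "message";
      dataLines = [];
      continue;
    }
    if (line.startsWith(":")) continue; // comment line, often used as a keep-alive

    const colon = line.indexOf(":");
    const field = colon === -1 ? line : line.slice(0, colon);
    let value = colon === -1 ? "" : line.slice(colon + 1);
    if (value.startsWith(" ")) value = value.slice(1); // strip one leading space

    if (field === "event") eventType = value;
    else if (field === "data") dataLines.push(value);
    else if (field === "id") lastId = value;
    // Other fields (such as "retry") are ignored in this sketch.
  }
  return events;
}
```

The id field is what makes the built-in reconnection useful: when the connection drops, the browser reconnects and sends the last ID it saw in a Last-Event-ID header, so the server can resume from that point.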

Server-Sent Events advantages

Built-in support for reconnections.
Supported by all modern browsers.
Lightweight protocol that’s more efficient and less resource-intensive than other options like long polling.

Server-Sent Events disadvantages

It’s mono-directional; only the server can push data to the client, which makes SSE a poor choice for realtime use cases that require bidirectional communication, such as chat apps.
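Over HTTP/1.1, browsers limit you to six concurrent SSE connections per browser and domain. The limit is much higher over HTTP/2, where the number of simultaneous streams is negotiated between client and server (100 by default).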
It only supports UTF-8 text data; SSE can’t handle binary data.

SSE is a good choice for scenarios where you don’t need two-way messaging, such as streaming live score updates. For use cases where you need bidirectional communication, WebSocket is the better option.


2. Long polling

Long polling is a client pull technology that makes HTTP request/response polling more efficient. With basic polling, repeated requests to a server waste resources: each one involves establishing a new connection, parsing the HTTP headers, querying for new data, generating and delivering a response, and finally closing the connection and cleaning up.

To avoid this effort, in long polling, the server elects to hold a client connection open for as long as possible, and delivers a response when new data becomes available or if a timeout threshold is reached.
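
Seen from the client, long polling boils down to a loop: issue a request, wait for the server to answer or time out, handle any messages, then immediately issue the next request. Here is a minimal sketch of that loop. The transport is abstracted behind a hypothetical PollFn (in a real app it would wrap fetch or XMLHttpRequest), and the cursor parameter is an assumption about how the server knows which messages the client has already seen.

```typescript
// What a single held request can resolve to (hypothetical shape).
type PollResult<T> =
  | { kind: "data"; messages: T[]; cursor: number } // new data arrived
  | { kind: "timeout" }; // server gave up waiting; nothing new

// Hypothetical transport: performs one long-held HTTP request.
type PollFn<T> = (cursor: number) => Promise<PollResult<T>>;

async function longPoll<T>(
  poll: PollFn<T>,
  onMessage: (message: T) => void,
  shouldContinue: () => boolean,
): Promise<void> {
  let cursor = 0; // position of the last message we have seen
  while (shouldContinue()) {
    // Only one request is in flight at a time, so responses are
    // handled in the order the server produced them.
    const result = await poll(cursor);
    if (result.kind === "data") {
      for (const message of result.messages) onMessage(message);
      cursor = result.cursor;
    }
    // On a timeout there is nothing to handle; the loop simply re-issues
    // the request with the same cursor.
  }
}
```

A production version would also need error handling and some backoff between failed requests, which this sketch leaves out.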

Long polling advantages

Long polling is implemented on the back of XMLHttpRequest, which is near-universally supported by devices, so there’s usually little need to implement any fallbacks.
Where exceptions must be handled, or where a server can be queried for new data but doesn’t support long polling (let alone more modern standards), basic polling can still be of limited use. It can be implemented using XMLHttpRequest, or via JSONP through simple HTML script tags.

Long polling disadvantages

Long polling is more resource intensive on the server than WebSockets.
Long polling can come with a latency overhead because it requires several hops between servers and devices. Gateways often have different ideas of how long a typical connection is allowed to stay open, and might sometimes close it while processing is still underway.
Reliable message ordering can be an issue, since it’s possible for multiple HTTP requests from the same client to be in flight simultaneously. Due to various factors, such as unreliable network conditions, there’s no guarantee that the requests issued by the client and the responses returned by the server will reach their destination in the right order.

Long polling is an early precursor to WebSockets. When it comes to building high-performance, low-latency realtime apps, WebSocket is a superior choice in almost every way. That’s not to say that long polling is obsolete; there are certain environments such as corporate networks with proxy servers that block WebSocket connections. In such scenarios, long polling is useful as a fallback mechanism for WebSockets.


3. MQTT

MQTT (Message Queuing Telemetry Transport) is a publish-subscribe messaging protocol dating back to 1999, when IBM’s Andy Stanford-Clark and Arlen Nipper (now at Cirrus Link) published the first iteration.

In an MQTT architecture, we have:

Publishers (producers) and subscribers (consumers). Note that a publisher can also be a subscriber.
A broker, which acts as the middleware MQTT server that manages the exchange of messages between publishers and subscribers. Note that messages are published to topics (or channels), which the broker uses to route each message to the interested subscribers.
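
As a rough illustration of the broker's role, here is a toy in-memory broker that routes published messages to subscribers by topic. It is a sketch rather than a real MQTT implementation: there is no networking, QoS, or retained messages, and the names InMemoryBroker and topicMatches are hypothetical. The "+" (single-level) and "#" (multi-level) wildcards do come from the MQTT specification, although this sketch skips some of its rules, such as the special handling of topics starting with "$".

```typescript
type Handler = (topic: string, payload: string) => void;

// Checks a subscription filter against a concrete topic name.
function topicMatches(filter: string, topic: string): boolean {
  const filterLevels = filter.split("/");
  const topicLevels = topic.split("/");
  for (let i = 0; i < filterLevels.length; i++) {
    if (filterLevels[i] === "#") return true; // matches all remaining levels
    if (i >= topicLevels.length) return false; // filter is longer than topic
    if (filterLevels[i] !== "+" && filterLevels[i] !== topicLevels[i]) return false;
  }
  return filterLevels.length === topicLevels.length;
}

// Toy broker: publishers and subscribers never talk to each other directly.
class InMemoryBroker {
  private subscriptions: { filter: string; handler: Handler }[] = [];

  subscribe(filter: string, handler: Handler): void {
    this.subscriptions.push({ filter, handler });
  }

  // Delivers the message to every matching subscriber and returns
  // how many deliveries were made.
  publish(topic: string, payload: string): number {
    let delivered = 0;
    for (const { filter, handler } of this.subscriptions) {
      if (topicMatches(filter, topic)) {
        handler(topic, payload);
        delivered++;
      }
    }
    return delivered;
  }
}
```

With this in place, a dashboard could subscribe to "sensors/+/temperature" and receive readings from every room, while the sensors themselves only need to know the topic they publish to.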

MQTT advantages

Lightweight protocol that’s ideal for networks with limited bandwidth or unpredictable connectivity, and devices with limited CPU, memory, and battery life.
It’s a reliable protocol, with three different levels of data delivery guarantees: 0 (at most once delivery), 1 (at least once delivery), and 2 (exactly-once delivery).
It’s bidirectional and flexible - it provides one-to-one, one-to-many, and many-to-many communication.

MQTT disadvantages

Not a good choice for sending photos, video, or audio data.
You can’t send MQTT messages directly to a browser, because web browsers don’t have MQTT support built-in; you need an additional transport such as WebSocket.
The base MQTT protocol doesn’t use encrypted communication. Many MQTT brokers allow you to use MQTT over TLS for enhanced security, but this leads to increased CPU usage, which may be a problem for constrained devices.

Due to being lightweight by design, MQTT is a better choice than WebSockets for many IoT use cases, such as collecting data from temperature or pressure sensors in realtime. However, as previously mentioned, MQTT can’t directly send messages to a browser. That’s why WebSocket is often used as a transport for streaming MQTT data to browser clients (MQTT over WebSockets).


4. WebRTC

Web Real-Time Communication (WebRTC) is a framework that enables you to add realtime communication (RTC) capabilities to your web and mobile applications. WebRTC allows the transmission of arbitrary data (video, voice, and generic data) in a peer-to-peer fashion.

WebRTC consists of several interrelated APIs. Here are the key ones:

RTCPeerConnection. Allows you to connect to a remote peer, maintain and monitor the connection, and close it once it has fulfilled its purpose.
RTCDataChannel. Provides a bi-directional network communication channel that allows peers to transfer arbitrary data.
MediaStream. Designed to let you access streams of media from local input devices like cameras and microphones. It serves as a way to manage actions on a data stream, like recording, sending, resizing, and displaying the stream’s content.

WebRTC advantages

Strong security guarantees, as encryption is mandatory: media transmitted over WebRTC is encrypted and authenticated with the help of the Secure Real-Time Transport Protocol (SRTP), while data channels are secured with DTLS.
Open-source and free to use; plus, it’s supported by organizations such as Apple, Google, and Microsoft.
Platform and device-independent; a WebRTC application will work on any browser that supports WebRTC, irrespective of operating systems or the types of devices.

WebRTC disadvantages

Even though WebRTC is a peer-to-peer technology, you still have to manage and pay for web servers. For two peers to “talk” to each other, you need to use a signaling server to set up, manage, and terminate the WebRTC communication session.
WebRTC can be extremely CPU-intensive when dealing with video content and large groups of users.
It’s hard to get started with WebRTC. There are plenty of concepts to explore and master: its various interfaces, codecs, network address translations (NATs) & firewalls, UDP (the main underlying communications protocol used by WebRTC), and many others.

WebRTC is primarily designed for streaming audio and video content (over UDP), and will generally be a better choice than WebSockets in such scenarios. On the other hand, WebSocket is a better choice when data integrity (guaranteed ordering and delivery) is crucial, as you benefit from the underlying reliability of TCP.

Oftentimes, WebRTC and WebSocket are complementary technologies. WebRTC peers coordinate communication through a process called signaling. It’s important to note that WebRTC does not provide a standard signaling implementation, allowing developers to use different protocols for this purpose. The WebSocket protocol is often used as a signaling mechanism for WebRTC applications.
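
To give a feel for what a signaling server does, here is a minimal in-memory sketch of a relay that forwards offers, answers, and ICE candidates between two peers. Because WebRTC doesn't standardize signaling, everything here is an assumption: the SignalMessage shape, the SignalingRoom class, and the idea that each peer is represented by a send callback (which, in a real deployment, would typically wrap that peer's WebSocket connection).

```typescript
type SignalType = "offer" | "answer" | "ice-candidate";

// Hypothetical message envelope; the payload would carry a serialized
// session description or ICE candidate.
interface SignalMessage {
  type: SignalType;
  from: string;
  to: string;
  payload: string;
}

type Send = (message: SignalMessage) => void;

// Hypothetical relay: it never inspects the payload, it only routes it.
class SignalingRoom {
  private peers = new Map<string, Send>();

  join(peerId: string, send: Send): void {
    this.peers.set(peerId, send);
  }

  leave(peerId: string): void {
    this.peers.delete(peerId);
  }

  // Forwards a message to its recipient; returns false if that peer
  // is not (or no longer) connected.
  relay(message: SignalMessage): boolean {
    const send = this.peers.get(message.to);
    if (!send) return false;
    send(message);
    return true;
  }
}
```

Once the peers have exchanged an offer, an answer, and their ICE candidates through a relay like this, the media or data itself flows peer-to-peer and no longer passes through the signaling server.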


5. WebTransport

WebTransport is a nascent realtime technology offering client-server messaging over HTTP/3. There are two key WebTransport concepts:

Datagrams. A datagram is a self-contained packet of data that can arrive in any order, or not at all. Designed for use cases that require low latency, and where best-effort data transmission is good enough.
Streams. The Streams APIs provide reliable, ordered data transfer. Note that you can create both unidirectional and bidirectional streams.

WebTransport advantages

By using the Datagrams API, or via multiple Streams API instances, you don’t have to worry about head-of-line blocking.
Establishing new connections is very fast - this is because HTTP/3 uses QUIC under the hood; a QUIC handshake is known to be faster than setting up TLS over TCP.
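
If head-of-line blocking is unfamiliar, the toy model below may help. It is not the WebTransport API; it is a simplified simulation, with hypothetical names, of what an application gets to read as packets arrive out of order. A single ordered stream has to hold back later packets until the missing one shows up, whereas datagrams are handed over as soon as they arrive.

```typescript
interface Packet {
  seq: number; // position in the original send order, starting at 0
  payload: string;
}

// Single ordered stream: for each arrival, returns what the application
// can read at that moment. Later packets wait for any missing earlier one.
function deliverAsOrderedStream(arrivals: Packet[]): string[][] {
  const buffered = new Map<number, string>();
  let nextSeq = 0;
  return arrivals.map((packet) => {
    buffered.set(packet.seq, packet.payload);
    const released: string[] = [];
    while (buffered.has(nextSeq)) {
      released.push(buffered.get(nextSeq) as string);
      buffered.delete(nextSeq);
      nextSeq++;
    }
    return released;
  });
}

// Datagrams: every packet is readable as soon as it arrives,
// in whatever order the network delivered it.
function deliverAsDatagrams(arrivals: Packet[]): string[][] {
  return arrivals.map((packet) => [packet.payload]);
}
```

Suppose packet 0 is delayed and the arrival order is 1, 2, 0. The ordered stream delivers nothing for the first two arrivals and then all three payloads at once, while the datagram model delivers each payload immediately. Spreading independent messages across multiple streams has a similar effect, because a gap in one stream doesn't hold up the others.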

WebTransport disadvantages

WebTransport is still an emerging technology. As of November 2022, WebTransport is a draft specification with W3C, and there’s always a chance that aspects related to how it works may change.
WebTransport is not yet supported by all browsers. For example, as of November 2022 you couldn’t use WebTransport in Firefox or Safari.

Unlike the other WebSocket alternatives we’ve covered in this article - which are well-established technologies that have been around for a while - WebTransport is more of a potential future alternative to WebSockets. We don’t know how it will evolve in the coming years, how likely developers are to adopt it, or what pitfalls it might have when used in a production system. In comparison, WebSocket has been around for over a decade; it’s a robust, stable technology with a large and active community, which currently makes it a superior choice to WebTransport.


Multi-protocol capabilities for the win

We hope you’ve found this article helpful as a starting point for discovering potential alternatives to WebSockets. However, the reality is that many production-ready realtime systems don’t use just one protocol, but a mixture of multiple protocols.

For example, if you’re developing a video conferencing solution, WebRTC is a great option for sending audio and video data between peers. In this scenario, WebSocket complements WebRTC, and is frequently used as a signaling mechanism for WebRTC peers.

Another example: due to its lightweight design, MQTT is an excellent choice for collecting telemetry data from IoT sensors. However, if you want to use this data to power realtime dashboards that can be monitored in a browser, MQTT is unsuitable, as it’s not supported in browsers. What you can do is send the data to browsers over WebSockets. That’s why many MQTT brokers nowadays also support WebSockets (or MQTT over WebSockets).

If you do decide to use WebSockets as the primary transport protocol for your use case, you still need to consider supporting fallback transports, as certain environments block WebSocket connections (e.g., restrictive corporate networks). SSE and long polling are often used as fallbacks for WebSockets.
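
One simple way to structure such a fallback is to try transports in order of preference and settle on the first one that works. The sketch below illustrates the idea; the preference order follows the paragraph above, while the pickTransport function and the canConnect probe are hypothetical (a real probe would attempt an actual connection, usually asynchronously and with a timeout).

```typescript
type Transport = "websocket" | "sse" | "long-polling";

// Preferred transport first, most widely compatible fallback last.
const TRANSPORT_PREFERENCE: Transport[] = ["websocket", "sse", "long-polling"];

// Hypothetical selector: returns the first transport the probe accepts,
// or null if the environment blocks all of them.
function pickTransport(canConnect: (transport: Transport) => boolean): Transport | null {
  for (const transport of TRANSPORT_PREFERENCE) {
    if (canConnect(transport)) return transport;
  }
  return null;
}
```

Keep in mind that SSE and long polling only cover the server-to-client direction on their own, so a fallback of this kind usually pairs them with ordinary HTTP requests for messages sent by the client.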

It's important to be aware of these considerations and potential challenges up front, so we suggest watching Alex Booker's video on the challenges of scaling WebSockets.

