Jean-Paul Rustom

WebSockets Explained Under 10 Minutes (With Visuals)

Use-cases

WebSockets are used in systems that need to display data in real time and with low latency, such as chat applications, stock price tickers, or game leaderboards.

Chat app

Leaderboard

History of HTTP 1.0 & HTTP 1.1

Now before moving forward, let’s step back and talk a little bit about history.

HTTP 1.0 ( 1996 )

In HTTP 1.0, each separate request would have its own TCP connection.

We would open a TCP connection, send a request, and close the connection as soon as the response was received.
For example, if we wanted to load four images, we would open and close four separate TCP connections, which would kill performance.

HTTP 1.0 Performance Killed

HTTP 1.1 ( 1997 )

In HTTP 1.1, things changed.

Connections are now persistent by default, the behavior that clients previously had to request with the Connection: Keep-Alive header.

We can initiate a TCP connection, keep it open, and have multiple requests and responses in this single TCP connection.

HTTP 1.1 Keep-Alive

Because the WebSocket handshake relies on HTTP 1.1 features such as persistent connections and the Upgrade header, the handshake request must use HTTP 1.1 or later.

Polling

Historically, web apps that needed bidirectional communication had to abuse HTTP by polling the server for updates.

But sending multiple requests is expensive and could cause server overload.
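To put a rough number on that cost, here is a minimal sketch that counts how many polling requests come back without any new data. The function name and the figures (a one-hour window, a poll every 2 seconds, 30 real updates) are made up for illustration, and the sketch assumes each update is picked up by a separate poll.

```python
def polling_cost(duration_s: int, interval_s: int, updates: int) -> dict:
    """Count requests sent by fixed-interval polling over a time window."""
    requests = duration_s // interval_s      # one request per interval
    useful = min(updates, requests)          # polls that return new data
    return {"requests": requests, "useful": useful, "wasted": requests - useful}


# Hypothetical scenario: one hour of polling every 2 seconds, 30 updates.
cost = polling_cost(duration_s=3600, interval_s=2, updates=30)
print(cost)
```

Under these assumed numbers, 1770 of the 1800 requests carry nothing new, and each one still pays for a full set of HTTP headers.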

Problems with polling

A simpler solution would be to use a single TCP connection for traffic in both directions.

This is what the WebSocket Protocol provides.

Bi-directional Protocol

WebSocket is a stateful, bidirectional protocol whose connection starts with an HTTP handshake.

It uses a single TCP connection for traffic in both directions.

The connection between client and server stays open until it is terminated by the client or by the server.

WebSocket Connection

WebSocket defines two URI schemes, wss for secure (TLS) connections and ws for unencrypted ones:

wss://jaypmedia.com/socket
ws://jaypmedia.com/socket
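As a small illustration, the sketch below splits such a URL into the pieces a client needs before opening the TCP connection. The helper name `websocket_endpoint` is hypothetical; the default ports (80 for ws, 443 for wss) are the ones defined by the WebSocket RFC.

```python
from urllib.parse import urlsplit

# Default ports defined by the WebSocket RFC for each scheme.
DEFAULT_PORTS = {"ws": 80, "wss": 443}


def websocket_endpoint(url: str) -> tuple[str, int, bool, str]:
    """Return (host, port, use_tls, path) for a ws:// or wss:// URL."""
    parts = urlsplit(url)
    if parts.scheme not in DEFAULT_PORTS or parts.hostname is None:
        raise ValueError(f"not a WebSocket URL: {url!r}")
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    use_tls = parts.scheme == "wss"          # wss means WebSocket over TLS
    return parts.hostname, port, use_tls, parts.path or "/"


print(websocket_endpoint("wss://jaypmedia.com/socket"))
print(websocket_endpoint("ws://jaypmedia.com/socket"))
```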

WebSocket vs HTTP

Also, WebSocket is not HTTP.
Its framing is more complex than a plain request and response, but the connection is persistent and each message carries far less overhead.
It is an independent TCP-based protocol.
Its only relationship to HTTP is that its handshake is interpreted by HTTP servers as an HTTP Upgrade Request.

Now don’t get the wrong idea: HTTP is great, but the request-response model doesn’t cover bidirectional communication.

Handshake

The WebSocket handshake is the bridge from HTTP to WebSockets.

Client

The WebSocket protocol begins its connection to the server as a simple HTTP request.

To start the connection, the client sends an HTTP GET request that includes at least the following headers:

GET /chat HTTP/1.1
Host: server.jaypmedia.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13

If any of these headers is missing, the server should respond with an HTTP error such as 400 Bad Request.
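A server-side check along those lines might look like the sketch below. It is a simplified illustration, not a full validator: the function name `handshake_status` is hypothetical, headers are assumed to arrive as an already-parsed dictionary, and the key and version headers are only checked for presence.

```python
# Headers this sketch looks for; None means "only has to be present".
REQUIRED_HEADERS = {
    "host": None,
    "upgrade": "websocket",
    "connection": "upgrade",
    "sec-websocket-key": None,
    "sec-websocket-version": None,
}


def handshake_status(headers: dict[str, str]) -> int:
    """Return 101 if the request looks like a WebSocket handshake, else 400."""
    normalized = {name.lower(): value for name, value in headers.items()}
    for name, expected in REQUIRED_HEADERS.items():
        value = normalized.get(name)
        if value is None:
            return 400
        # A header may carry several comma-separated tokens,
        # for example "Connection: keep-alive, Upgrade".
        tokens = [token.strip().lower() for token in value.split(",")]
        if expected is not None and expected not in tokens:
            return 400
    return 101


request = {
    "Host": "server.jaypmedia.com",
    "Upgrade": "websocket",
    "Connection": "Upgrade",
    "Sec-WebSocket-Key": "dGhlIHNhbXBsZSBub25jZQ==",
    "Sec-WebSocket-Version": "13",
}
print(handshake_status(request))
```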

Client Handshake

  • The Upgrade header was introduced in HTTP/1.1 to let the client ask the server to switch to a different protocol. Connection: Upgrade tells the server that the Upgrade header applies to this connection.

  • The Sec-WebSocket-Key header is used during the WebSocket handshake to ensure that the client and server are both speaking the WebSocket protocol.
    Its value is a randomly generated 16-byte nonce, base64 encoded.

  • The Sec-WebSocket-Version header indicates the version of the WebSocket protocol that the client supports.

  • If the client is a web browser, it will supply the Origin header.
    If the server does not wish to accept connections from this origin, it can choose to reject the connection.
    A server can, for example, accept connections only from a list of allowed origins.
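The origin check from the last bullet can be sketched as a simple allowlist lookup. The allowlist contents and the function name are hypothetical, and letting requests without an Origin header through (as non-browser clients may send them) is a policy assumption of this sketch, not a rule.

```python
# Hypothetical allowlist; a real server would load this from its configuration.
ALLOWED_ORIGINS = {"https://jaypmedia.com"}


def origin_allowed(headers: dict[str, str], allowed: set[str] = ALLOWED_ORIGINS) -> bool:
    """Decide whether a handshake request passes the origin allowlist."""
    origin = headers.get("Origin")
    if origin is None:
        # Assumption: clients that send no Origin header are let through.
        return True
    return origin in allowed


print(origin_allowed({"Origin": "https://jaypmedia.com"}))
print(origin_allowed({"Origin": "https://evil.example"}))
```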

Listed Origins

If you are using a browser that supports WebSocket, the whole handshake, including the generation of the relevant headers, is handled automatically by the JavaScript WebSocket API.
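Outside the browser, a client has to produce these headers itself. The following is a minimal sketch of that step, with hypothetical helper names `make_key` and `build_handshake`; it only builds the request bytes and does not open a socket.

```python
import base64
import os


def make_key() -> str:
    """Generate a Sec-WebSocket-Key: 16 random bytes, base64 encoded."""
    return base64.b64encode(os.urandom(16)).decode("ascii")


def build_handshake(host: str, path: str, key: str) -> bytes:
    """Build the client's opening handshake as raw HTTP/1.1 request bytes."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Upgrade: websocket",
        "Connection: Upgrade",
        f"Sec-WebSocket-Key: {key}",
        "Sec-WebSocket-Version: 13",
    ]
    # HTTP header lines end with CRLF; an empty line ends the header block.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


key = make_key()
print(build_handshake("server.jaypmedia.com", "/chat", key).decode("ascii"))
```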

Server

The handshake from the server looks like this:

        HTTP/1.1 101 Switching Protocols
        Upgrade: websocket
        Connection: Upgrade
        Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=

Handshake from server

According to the WebSocket spec, the header field Sec-WebSocket-Accept indicates whether the server is willing to accept the connection.
To get its value, the server concatenates the value of Sec-WebSocket-Key received from the client with a predefined globally unique identifier (GUID) defined by the RFC.
The resulting string is then hashed with SHA-1, and the hash is base64 encoded.
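The computation described above can be sketched in a few lines. The function name `compute_accept` is hypothetical; the GUID is the fixed value from the WebSocket RFC, and the hash is SHA-1.

```python
import base64
import hashlib

# Fixed GUID defined by the WebSocket RFC.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def compute_accept(key: str) -> str:
    """Derive Sec-WebSocket-Accept from the client's Sec-WebSocket-Key."""
    digest = hashlib.sha1((key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


# Key from the client handshake shown earlier.
print(compute_accept("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

Running it on the key from the client request reproduces the Sec-WebSocket-Accept value in the server response above.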

Sec-WebSocket-Accept generation

This GUID, often called the magic string, was chosen because it is very unlikely to be used by servers that do not understand WebSockets.

The server responds with a 101 status code.

Any code other than 101 results in an error and means that the WebSocket handshake was not completed.
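On the client side, checking the server's reply could look like the sketch below. It is a simplified illustration that assumes the full response head is already available as a string; the function name `handshake_succeeded` is hypothetical, and a complete client would also verify the Upgrade and Connection headers.

```python
import base64
import hashlib

WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"  # fixed by the RFC


def handshake_succeeded(response: str, key: str) -> bool:
    """Check the status code and Sec-WebSocket-Accept of a server reply."""
    head = response.split("\r\n\r\n", 1)[0]
    status_line, *header_lines = head.split("\r\n")
    parts = status_line.split(" ", 2)
    if len(parts) < 2 or parts[1] != "101":
        return False                      # any other status: no WebSocket
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    digest = hashlib.sha1((key + WEBSOCKET_GUID).encode("ascii")).digest()
    expected = base64.b64encode(digest).decode("ascii")
    return headers.get("sec-websocket-accept") == expected


reply = (
    "HTTP/1.1 101 Switching Protocols\r\n"
    "Upgrade: websocket\r\n"
    "Connection: Upgrade\r\n"
    "Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=\r\n"
    "\r\n"
)
print(handshake_succeeded(reply, "dGhlIHNhbXBsZSBub25jZQ=="))
```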

Once the client and server have both sent their handshakes, and if the handshake was successful, then the data transfer part starts.

WebSocket Frames

Now let’s go a little bit deeper, shall we?

After a successful handshake, clients and servers transfer data back and forth as a sequence of frames.

Frame

There are control frames and data frames.

Control frames communicate state about the WebSocket, for example the close frame which is used for closing connections.

Data frames on the other hand, as their name implies, carry regular application data.

Unlike control frames, they can be fragmented.
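As a rough sketch of fragmentation, the function below splits one message into several frames. It uses two header fields covered in the frame structure section: the FIN bit, which is set only on the last fragment, and the opcode, which is the message type on the first fragment and 0 (continuation) on the rest. The function name and the tuple representation are made up for illustration; no actual bytes are framed here.

```python
OPCODE_CONTINUATION = 0x0
OPCODE_TEXT = 0x1


def fragment(payload: bytes, chunk_size: int, opcode: int = OPCODE_TEXT) -> list[tuple[bool, int, bytes]]:
    """Split a message into (fin, opcode, chunk) tuples, one per frame."""
    chunks = [payload[i:i + chunk_size] for i in range(0, len(payload), chunk_size)]
    if not chunks:
        chunks = [b""]                    # an empty message is still one frame
    frames = []
    for index, chunk in enumerate(chunks):
        fin = index == len(chunks) - 1    # FIN only on the last fragment
        frame_opcode = opcode if index == 0 else OPCODE_CONTINUATION
        frames.append((fin, frame_opcode, chunk))
    return frames


for frame in fragment(b"hello world", chunk_size=4):
    print(frame)
```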

For security reasons explained in the RFC, a client MUST mask all frames that it sends to the server, whether or not TLS is used.

Those concerns relate to intermediary proxies that do not understand WebSockets.

The RFC also requires a server to close the connection if it receives an unmasked frame.

On the other hand, a server must not mask the frames it sends to the client.

A client must close the connection if it receives a masked frame.

WebSocket Frame Structure

Now let’s zoom a little bit into our frame.

This is what the frame’s bit layout looks like.

For the details of what each field means, you can check the WebSocket spec.

We will go through some of the fields.

1- FIN (1 bit): Indicates whether this is the final fragment of a message.

2- Opcode (4 bits): Defines the interpretation of the payload data.

  • 0: Continuation frame (used for every fragment of a message after the first one)
  • 1: Text Frame
  • 2: Binary Frame
  • 8: Connection Close

3- Mask (1 bit): If set to 1, a masking key is present. Must be 1 for frames sent by the client.

4- Masking key (0 or 32 bits): Present only if the MASK bit is set to 1, that is, when the frame is sent by the client.

The masking key is a random key of 4 bytes chosen by the client.

To unmask the payload, we XOR each of its bytes with the masking key, cycling through the key’s 4 bytes.
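The XOR step can be sketched as follows. The function name `apply_mask` is hypothetical; because XOR is its own inverse, the same function both masks and unmasks. The sample key and bytes are the masked "Hello" example from the WebSocket RFC.

```python
def apply_mask(payload: bytes, masking_key: bytes) -> bytes:
    """XOR every payload byte with the masking key byte at position i mod 4."""
    if len(masking_key) != 4:
        raise ValueError("masking key must be exactly 4 bytes")
    return bytes(byte ^ masking_key[i % 4] for i, byte in enumerate(payload))


key = bytes.fromhex("37fa213d")           # sample key from the RFC example
masked = apply_mask(b"Hello", key)
print(masked.hex())                       # 7f9f4d5158
print(apply_mask(masked, key))            # b'Hello'
```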

Frame Header size

A WebSocket frame header is at most 14 bytes (2 base bytes, 8 bytes of extended payload length, and 4 bytes of masking key), which is far smaller than a big HTTP header block, which can reach a size of 16 kilobytes.
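To see where the header bytes go, here is a minimal header parser. It is a sketch under a few assumptions: the function name `parse_frame_header` is made up, the input is assumed to contain the complete header, and the reserved bits are ignored. It relies on one field not covered above, the payload length, which the RFC encodes in 7 bits, or in 2 or 8 extra bytes when those 7 bits hold the marker values 126 or 127.

```python
import struct


def parse_frame_header(data: bytes) -> dict:
    """Decode the header fields at the start of a WebSocket frame."""
    first, second = data[0], data[1]
    fin = bool(first & 0x80)              # top bit of byte 0
    opcode = first & 0x0F                 # low 4 bits of byte 0
    masked = bool(second & 0x80)          # top bit of byte 1
    length = second & 0x7F                # 7-bit length or a marker value
    offset = 2
    if length == 126:                     # length is in the next 2 bytes
        (length,) = struct.unpack_from("!H", data, offset)
        offset += 2
    elif length == 127:                   # length is in the next 8 bytes
        (length,) = struct.unpack_from("!Q", data, offset)
        offset += 8
    masking_key = data[offset:offset + 4] if masked else b""
    offset += len(masking_key)
    return {
        "fin": fin,
        "opcode": opcode,
        "masked": masked,
        "payload_length": length,
        "masking_key": masking_key,
        "header_size": offset,
    }


# Masked text frame carrying "Hello" (example bytes from the RFC).
small = parse_frame_header(bytes.fromhex("818537fa213d7f9f4d5158"))
# Header of a masked binary frame that uses the 8-byte length field.
large = parse_frame_header(bytes([0x82, 0xFF]) + (70000).to_bytes(8, "big") + b"\x01\x02\x03\x04")
print(small["header_size"], large["header_size"])
```

The first frame needs a 6-byte header; the second uses every optional field and reaches the largest possible header size.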

Conclusion

And that’s it! We’re done with this quick explanation of WebSockets.

Of course, you can try writing your own WebSocket server or, you know, use existing libraries that handle all this complexity.
