DEV Community

ibrahim ali


A deeper look into WebSockets

WebSockets provide a persistent, bidirectional communication channel that is set up through HTTP and then runs over a single TCP socket. Ordinary communication between a client and a server follows a request-response pattern: the client makes a request, and the server interprets it and responds accordingly. While this works well for most communication between server and client, sometimes you need to communicate in real time. The request-response structure of HTTP becomes impractical for this purpose, and an early workaround was a technique called HTTP long polling.

HTTP long polling works by having the client send a request to the server with a long timeout. The server holds the request open and responds as soon as it has data to push to the client, or when the timeout expires. Upon completion of that request, the client immediately sends a new request to "keep the connection open". While this works, it has quite a few drawbacks. First and foremost, it ties up server resources even when there is no data available to be returned. Other issues include scaling problems as usage increases and the need to re-establish the connection after a disconnect. This is where WebSockets come in to remedy the situation.
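The long-polling loop described above can be sketched without any network code. This is only an in-process simulation: a `queue.Queue` stands in for the server's pending data, and the names `long_poll` and `poll_loop` are hypothetical, chosen for illustration.

```python
import queue


def long_poll(server_queue, timeout):
    """Simulate one long-poll request: wait up to `timeout` seconds for data."""
    try:
        return server_queue.get(timeout=timeout)
    except queue.Empty:
        # The request timed out with nothing to return.
        return None


def poll_loop(server_queue, rounds, timeout=0.05):
    """Re-issue a long-poll request `rounds` times and collect what arrives."""
    received = []
    for _ in range(rounds):
        message = long_poll(server_queue, timeout)
        if message is not None:
            received.append(message)
        # A real client would immediately send the next request here,
        # even when the previous one came back empty.
    return received
```

Note that every round occupies the simulated server for the full timeout when no data is queued, which is the resource cost mentioned above.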

A WebSocket connection starts out as a simple HTTP request. After the initial connection is established and the required conditions are met, the client and server keep the TCP/IP connection that carried the HTTP handshake open and reuse it as a WebSocket. If this succeeds, the WebSocket becomes a way for the server and client to send message data back and forth in both directions over that same TCP connection. In effect, a request-response pair is upgraded into a WebSocket.

The HTTP request sent from the client to the server needs to contain a few headers to indicate that it wants to be upgraded to a WebSocket connection. The Connection header is set to Upgrade, which indicates that the connection should stay open and change protocols once the current transaction completes. The Upgrade header, set to websocket, names the protocol to switch to. The request also needs a Sec-WebSocket-Key header, a randomly generated 16-byte value encoded in base64, which the server uses to prove that it understood this specific handshake.
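As a rough illustration, the opening request could be assembled like this. The function name `build_handshake_request` and the host and path values are hypothetical. The `Sec-WebSocket-Version: 13` header is not discussed above, but RFC 6455 also requires it, so the sketch includes it.

```python
import base64
import os


def build_handshake_request(host, path="/"):
    """Return (request_text, key) for a WebSocket opening handshake."""
    # 16 random bytes, base64-encoded, form the Sec-WebSocket-Key value.
    key = base64.b64encode(os.urandom(16)).decode("ascii")
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Upgrade: websocket",
        "Connection: Upgrade",
        f"Sec-WebSocket-Key: {key}",
        "Sec-WebSocket-Version: 13",
    ]
    # HTTP header lines are separated by CRLF and end with a blank line.
    return "\r\n".join(lines) + "\r\n\r\n", key
```

The key is returned alongside the request text because the client needs it later to check the server's reply.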

The reply from the server must also satisfy a few criteria. It needs to return the HTTP status 101 Switching Protocols to confirm that it is changing to a WebSocket connection. Its headers must echo the Connection and Upgrade headers and include a Sec-WebSocket-Accept header. The server computes this value by appending a fixed GUID defined by the WebSocket specification to the client's Sec-WebSocket-Key, hashing the result with SHA-1, and base64-encoding the hash. A matching value tells the client that the server accepted the handshake.
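The Sec-WebSocket-Accept computation is short enough to sketch directly. The function name `compute_accept` is hypothetical. The GUID is the constant from RFC 6455, and the sample key used in the usage note below is the one from that RFC's handshake example.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def compute_accept(sec_websocket_key):
    """Derive the Sec-WebSocket-Accept value from the client's key."""
    # Concatenate key and GUID, SHA-1 hash the result, then base64-encode it.
    combined = (sec_websocket_key + WEBSOCKET_GUID).encode("ascii")
    digest = hashlib.sha1(combined).digest()
    return base64.b64encode(digest).decode("ascii")
```

For the RFC's sample key `dGhlIHNhbXBsZSBub25jZQ==`, this returns `s3pPLMBiTxaQ9kYGzzhZRbK+xOo=`. A client can run the same computation on the key it sent and compare the result with the header it received.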

Framed protocol

WebSockets are a framed protocol, meaning that each message is divided into chunks of data (frames), each with its own properties encoded in the frame header. The FIN bit indicates that a frame is the last one in the chain of frames that makes up a message. The opcode indicates how the frame's payload data should be interpreted.
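To make the FIN bit and opcode concrete, here is a minimal sketch that decodes the first byte of a frame header. The helper name `parse_first_byte` is hypothetical. The opcode values are the ones defined in RFC 6455, and the three reserved bits between FIN and the opcode are ignored for simplicity.

```python
# Opcode values defined by RFC 6455; others are reserved.
OPCODES = {
    0x0: "continuation",
    0x1: "text",
    0x2: "binary",
    0x8: "close",
    0x9: "ping",
    0xA: "pong",
}


def parse_first_byte(byte):
    """Return (fin, opcode_name) for the first byte of a frame header."""
    fin = bool(byte & 0x80)   # highest bit: is this the final frame?
    opcode = byte & 0x0F      # lowest four bits: how to read the payload
    return fin, OPCODES.get(opcode, "reserved")
```

For example, a first byte of `0x81` decodes to a final text frame, while `0x01` is a text frame with more frames to follow.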


Masking transforms all of the payload data into different values to prevent something known as cache poisoning. The client XORs the payload with a random 32-bit masking key, and every frame sent from client to server must be masked. This is done so that intermediaries such as proxies don't mistake WebSocket data for HTTP traffic and cache it. If an intermediary is faulty, how it handles data that has left its origin can't be controlled, and if the data were to become cached, future connections would run the risk of receiving incorrect responses. The masking key is sent in the frame header, and the receiver uses it to unmask the data once it reaches its destination. The payload len field encodes the length of the payload. It is 7 bits wide, so it can directly hold lengths up to 125 bytes; for longer payloads it is set to 126 or 127, and the actual length follows in an extended 16-bit or 64-bit length field.
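Both masking and the payload length encoding can be sketched in a few lines. The function names `mask_payload` and `encode_length` are hypothetical. The masking rule (XOR each byte with the key byte at index `i % 4`) and the 126/127 length markers follow RFC 6455.

```python
import struct


def mask_payload(payload, masking_key):
    """XOR each payload byte with the 4-byte masking key, repeating the key."""
    return bytes(b ^ masking_key[i % 4] for i, b in enumerate(payload))


def encode_length(length, masked=True):
    """Encode the second header byte plus any extended length bytes."""
    mask_bit = 0x80 if masked else 0x00  # top bit flags a masked payload
    if length <= 125:
        return bytes([mask_bit | length])
    if length <= 0xFFFF:
        # 126 means: the real length follows as a 16-bit unsigned integer.
        return bytes([mask_bit | 126]) + struct.pack("!H", length)
    # 127 means: the real length follows as a 64-bit unsigned integer.
    return bytes([mask_bit | 127]) + struct.pack("!Q", length)
```

Because XOR is its own inverse, calling `mask_payload` a second time with the same key restores the original payload, which is how the receiver unmasks a frame.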

Breaking a connection

A close frame initiates the closing sequence of a WebSocket connection; either the client or the server can send it. The frame generally contains a status code and a reason for closing. The party that receives it responds with a close frame of its own, and once both close frames have been exchanged, the underlying TCP connection is closed.
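A close frame is an ordinary frame with opcode 0x8 whose payload starts with a 2-byte status code followed by an optional UTF-8 reason. The sketch below builds an unmasked close frame, as a server would send it; a client would additionally have to mask the payload. The function names are hypothetical, and status code 1000 is the "normal closure" code from RFC 6455.

```python
import struct


def build_close_frame(code=1000, reason=""):
    """Build an unmasked close frame (FIN set, opcode 0x8)."""
    payload = struct.pack("!H", code) + reason.encode("utf-8")
    if len(payload) > 125:
        # Control frames may not use the extended length fields.
        raise ValueError("close frame payload is limited to 125 bytes")
    return bytes([0x88, len(payload)]) + payload


def parse_close_payload(payload):
    """Return (status_code, reason) from a close frame's payload."""
    if len(payload) < 2:
        return None, ""  # a close frame may carry no status code at all
    (code,) = struct.unpack("!H", payload[:2])
    return code, payload[2:].decode("utf-8")
```

The receiving side would typically parse the payload, echo a close frame back, and then close the TCP socket.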
