Introduction
Let's start with the basics. The web works on a client-server model: the client initiates a request, which passes through the internet and reaches the server. The server then processes it and sends back a response the same way the request arrived.
- Client - usually the browser, but it can also be a web crawler bot indexing pages for search results.
- Internet - many networks of connected computers
- Server - THE computer that responds to that particular request
HTTP
HTTP stands for Hypertext Transfer Protocol. But what is hypertext? Hypertext is a way of linking text to other text or resources. It is the underlying concept of the World Wide Web, where web pages are interconnected by hyperlinks. It is text you see on a computer which, when clicked, redirects you to a resource. This resource is a document (usually an HTML file) present on a computer; it can be any computer, be it yours or someone else's in the cloud. And now, what is a protocol? To send a request to the server, we need to follow some standard rules - a protocol.
In short, HTTP is the standard set of rules that describes the way we can make a request, allowing us to transfer text/files/media (hypertext/hypermedia) over the internet.
Between the client and the server, there are many entities (routers, switches, load balancers) through which the requests pass. But discussing them would be out of scope for this article. You can read more about this here.
It's Simple and Stateless
All HTTP messages are human-readable, which makes it easier for developers to debug and reduces complexity. Whenever a request is made by the client, the server doesn't store any information about it, which makes the protocol stateless.
You might ask: if the server doesn't have the data from previous requests, how will it be able to keep track of what response to send for an authenticated user, or a user with different permissions?
Well, though HTTP is stateless, it can keep track of this data using cookies whenever a new session is created. This is also where we send the JWT of an authenticated user in an HTTP header, to let the server know that the user is logged in.
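For example, here's a minimal sketch of how a client might attach a JWT on every request, using the Fetch API available in Node.js 18+ (the URL and token are hypothetical placeholders):

```ts
// Placeholder JWT issued by the server at login time.
const token = "eyJhbGciOi...";

// Because HTTP is stateless, the client re-sends the token on every request;
// the server reads this header each time instead of remembering a session.
const response = await fetch("https://api.example.com/profile", {
  headers: {
    Authorization: `Bearer ${token}`,
  },
});

console.log(response.status); // e.g. 200 for a valid token, 401 otherwise
```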
Building Blocks
HTTP is built on top of the TCP/IP protocols, and the web consists of 4 building blocks:
- textual format to represent Hypertext files (HTML)
- protocol to exchange these files (HTTP)
- client to display these files (browser)
- server to give access to these files
Evolution of HTTP
HTTP/0.9
The initial implementation of HTTP was remarkably straightforward. It's also called the one-line protocol: a request contained only the GET method followed by the path to the resource.
REQUEST
GET /fancyWebPage/index.html
RESPONSE
<html>
HTML page with only text.
No images or other resources are included in this file.
</html>
- only GET method was provided to access the resources
- no HTTP Headers. The response was always a (HTML) file.
- no status or error codes. If there was an error, an HTML file with the generated error description was sent.
HTTP/1.0
- versioning was added at the end of the resource path.
- status code was also sent at the beginning of the response.
- HTTP Headers were introduced for both requests and responses, which allowed us to send metadata.
- This allowed us to transfer documents other than plain HTML, as long as the appropriate value was included in the Content-Type header.
- The connection is closed after the response is received. We need to create a new connection for every request.
REQUEST
GET /fancyWebPage/index.html HTTP/1.0
User-Agent: BrowserName/3.0 (OS-Name 2.0)
RESPONSE
HTTP/1.0 200 OK # status line with status code
Date: Fri, 11 Aug 2023 12:23:22 GMT # metadata
Server: company/2.0 project/2.8 # metadata
Content-Type: text/html # metadata
<html> # html content
HTML page with images and other file content.
<img src="/cat.png" />
</html>
If the response contains other resources, subsequent calls are made to the server to fetch them. Here, in the next call, the browser asks the server for the cat.png file.
HTTP/1.1
- it introduced the 'keep-alive' mechanism: connections can be reused, so there's no longer a need to open a new connection for every request (see the sketch after this list).
- Pipelining was introduced, which allowed the client to make a new request without waiting for the response to the previous one. The responses were sent in the order they were requested.
- But this couldn't be handled well by the proxy servers (the entities) between the client and the server.
- Cache control mechanisms and support for chunked responses were added.
- It became easier to create new headers and methods. The protocol stayed stable for more than 15 years.
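Here's a minimal sketch of connection reuse in Node.js; the host is a placeholder, and the keepAlive flag tells the agent to hold the TCP connection open across requests:

```ts
import http from "node:http";

// With keepAlive enabled, the agent reuses one TCP connection for all three
// requests instead of opening and closing a connection each time.
const agent = new http.Agent({ keepAlive: true });

for (let i = 0; i < 3; i++) {
  http.get({ host: "example.com", path: "/", agent }, (res) => {
    console.log(`response ${i}: ${res.statusCode}`);
    res.resume(); // drain the body so the socket can go back to the pool
  });
}
```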
Here's an image from MDN showing the request/response structure.
HTTPS
Netscape came up with an additional layer, SSL, on top of HTTP to encrypt the transmission of messages between client and server.
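In Node.js, serving over HTTPS only changes how the server is created; here's a minimal sketch, where the certificate and key paths are placeholders you'd get from a certificate authority:

```ts
import https from "node:https";
import { readFileSync } from "node:fs";

const server = https.createServer(
  {
    key: readFileSync("server-key.pem"),   // placeholder private key
    cert: readFileSync("server-cert.pem"), // placeholder certificate
  },
  (req, res) => {
    res.writeHead(200, { "Content-Type": "text/plain" });
    // The SSL/TLS layer encrypts this response on the wire.
    res.end("hello over HTTPS\n");
  }
);

server.listen(8443);
```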
Authoring before REST
When Tim Berners-Lee created the World Wide Web (and the browser), he wanted a medium where everyone could create, edit, and share documents over the internet. But eventually, webpages became read-only for most users. Write access was limited to a few folks who could change these documents on the servers.
In 1996, the people of the World Wide Web Consortium addressed the problem of authoring on the web, and HTTP was extended to allow authoring, creating WebDAV - Web Distributed Authoring and Versioning. It is an extension to HTTP that lets clients edit remote content on the web. Any server that supports WebDAV can act as a file server. There are other extensions which work similarly, like CalDAV, which allows the client to schedule events on the server, and CardDAV, which allows you to share contact information on the remote server.
Popular clients of WebDAV are Microsoft Office, OpenOffice, Dropbox, etc. Most file sharing systems you can think of use WebDAV behind the scenes to create/share/modify/delete your files or folders over the internet.
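To make this concrete, here's a minimal sketch of listing a folder on a WebDAV server from Node.js 18+; the server URL and credentials are hypothetical, and PROPFIND is one of the methods WebDAV adds to HTTP:

```ts
// PROPFIND lists the contents of a collection, much like `ls` on a file server.
const res = await fetch("https://dav.example.com/files/", {
  method: "PROPFIND",
  headers: {
    Depth: "1", // only the immediate children of this folder
    Authorization: "Basic " + Buffer.from("user:pass").toString("base64"),
  },
});

console.log(res.status);       // 207 Multi-Status on success
console.log(await res.text()); // XML describing each file and folder
```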
✨REST✨
In 2000, Representational State Transfer (REST) was described in Roy Fielding's dissertation as an architectural style designed to use the HTTP protocol. It uses the basic methods defined in HTTP/1.1 and allows any web application to modify data, without the need to update its servers. One drawback was that each website defined its own RESTful APIs and had complete control over them.
Think of this as using an open API (MovieDB) on your website to display content, while also allowing your clients to modify the data (marking a movie as favourite). Any such modification wouldn't affect the data on the server where the open API is being hosted. Instead, these changes are persisted on your website's server, where you handle them on top of the open API data.
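A minimal sketch of that favourite-movie call; the endpoint, body shape, and token are hypothetical, since every site defines its own RESTful API on top of the plain HTTP/1.1 methods:

```ts
const token = "eyJhbGciOi..."; // placeholder JWT identifying the logged-in user

// POST the favourite to *our* server; MovieDB's own data is untouched.
const res = await fetch("https://my-movie-site.example.com/api/favourites", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${token}`,
  },
  body: JSON.stringify({ movieId: 550 }), // hypothetical movie id
});

console.log(res.status); // e.g. 201 Created
```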
HTTP/2
As webpages became complex, with more and more scripts added for interactivity, more and more data was being transferred through HTTP requests. This created more overhead for HTTP/1.1 connections. Around 2010, Google created an experimental protocol, SPDY, which laid the foundation for HTTP/2.
It was officially standardised in 2015, and 35.7% of websites use HTTP/2.
- Binary protocol, unlike HTTP/1.1, which is text-based
- multiplexed protocol. Enables the client to make parallel requests over a single connection.
- It compresses headers, which are common across a similar set of requests. This removes duplication and the overhead of data transmitted.
- Server push. This allows the server to populate the client's cache before the client asks (see the sketch below).
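Here's a minimal sketch of server push with Node.js's built-in http2 module; the certificate paths and the pushed stylesheet are placeholders (browsers only speak HTTP/2 over TLS, hence createSecureServer):

```ts
import http2 from "node:http2";
import { readFileSync } from "node:fs";

const server = http2.createSecureServer({
  key: readFileSync("server-key.pem"),   // placeholder
  cert: readFileSync("server-cert.pem"), // placeholder
});

server.on("stream", (stream, headers) => {
  if (headers[":path"] === "/") {
    // Push /style.css into the client's cache before it's even requested.
    stream.pushStream({ ":path": "/style.css" }, (err, pushStream) => {
      if (err) return;
      pushStream.respond({ ":status": 200, "content-type": "text/css" });
      pushStream.end("body { color: teal; }");
    });
    stream.respond({ ":status": 200, "content-type": "text/html" });
    stream.end('<link rel="stylesheet" href="/style.css">Hello over HTTP/2');
  }
});

server.listen(8443);
```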
HTTP/3
It uses QUIC instead of TCP at the transport layer. 26.5% of websites use HTTP/3. QUIC is a connection-oriented protocol that creates a stateful interaction between a client and server. QUIC authenticates the entirety of each packet and encrypts as much of each packet as is practical. QUIC packets are carried in UDP datagrams to better facilitate deployment in existing systems and networks.
Initially, QUIC was described as an acronym for Quick UDP Internet Connections, but RFC 9000 mentions it as a name, not an acronym.
- multiplexed protocol. Unlike HTTP/2, which multiplexes streams over a single TCP connection (so one lost packet stalls all the streams), QUIC runs multiple independent streams over UDP.
- Endpoints communicate in QUIC by exchanging QUIC packets. Most packets contain frames, which carry information and application data between endpoints.
- Application protocols exchange information over a QUIC connection via streams, which are ordered sequences of bytes.
- bidirectional streams, which allow both endpoints to send data;
- unidirectional streams, which allow a single endpoint to send data.
- A credit-based scheme is used to limit stream creation and to bound the amount of data that can be sent.
- QUIC depends on congestion control to avoid network congestion, using the loss detection and recovery mechanisms described in QUIC-RECOVERY (RFC 9002).
How to create a server with the HTTP/2/3 protocols?
Many programming languages have in-built libraries for us to create servers. Node.js has the http and http2 libraries to build servers with the HTTP protocols.
Golang has similar libraries: http, http2, and http3.
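For instance, a minimal sketch of a server using Node.js's built-in http module (the port is arbitrary):

```ts
import http from "node:http";

// A bare-bones HTTP/1.1 server: one callback handles every request.
const server = http.createServer((req, res) => {
  res.writeHead(200, { "Content-Type": "text/plain" });
  res.end(`You requested ${req.url}\n`);
});

server.listen(3000, () => {
  console.log("listening on http://localhost:3000");
});
```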
But the support for creating servers which use HTTP/3 is still a work in progress in the Node.js community. As of today, 18 Aug 2023, there's no library available for us to get onto HTTP/3.
Keep track of HTTP/3 and QUIC
- Node.js - status: currently blocked on QUIC implementation #38478
References
HTTP - MDN Docs
What is WebDAV
RFC 9000 - QUIC: A UDP-Based Multiplexed and Secure Transport