HTTP is one of the most widely used technologies today and is indispensable for web services.
We can easily send HTTP requests with whichever HTTP client library we like, and just as easily run HTTP servers in whatever language and framework we like, without thinking about what happens inside those libraries or frameworks.
However, I believe that understanding the basic ideas behind HTTP is useful for many programmers, for example when deciding which HTTP headers to add or when trying to understand what HTTP/2 is.
In this post, I'm going to explain how HTTP works, starting by running TCP socket programs.
Let's start by taking a look at TCP, the protocol underlying HTTP, using a tiny TCP socket program.
What TCP offers is quite simple: sending and receiving data as a byte stream. It defines no data format, no marker for the end of the data, and no rule about when data may be sent. Thus, the client can send whatever data it wants to the server whenever it wants, and vice versa.
Here are sample Ruby programs for a TCP server and client. Each sends the data typed on the keyboard and prints the data received from the other side.
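# server.rb
require 'socket'

port = 20000
server = TCPServer.new(port)
socket = server.accept # wait for a single client to connect

# Print every line received from the client.
Thread.new do
  loop do
    data = socket.gets
    if data.nil? # nil means the client closed the connection
      puts 'connection closed'
      break
    end
    p data
  end
end

# Send every line typed on the keyboard to the client.
loop do
  data = gets
  break if data.nil? # end of keyboard input (Ctrl-D)
  socket.print(data)
end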
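# client.rb
require 'socket'

host = '127.0.0.1'
port = 20000
socket = TCPSocket.open(host, port)

# Print every line received from the server.
Thread.new do
  loop do
    data = socket.gets
    if data.nil? # nil means the server closed the connection
      puts 'connection closed'
      break
    end
    p data
  end
end

# Send every line typed on the keyboard to the server.
loop do
  data = gets
  break if data.nil? # end of keyboard input (Ctrl-D)
  socket.sendmsg(data)
end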
You can check how they work by opening two terminals and running one script in each.
$ ruby server.rb
$ ruby client.rb
* Note that server.rb must be run first, because client.rb expects the server to be running already.
You can type any text you want. Each time you press the Enter key, the line is sent, and you will notice that both the client and the server can send data to and receive data from each other at any time.
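To make the "byte stream" point concrete, here is a minimal sketch that runs both ends of a connection in one process. The method name demo_byte_stream is made up for this example, and it uses port 0 (let the OS pick a free port) instead of 20000 so that it does not clash with server.rb. The client writes one line in two separate pieces, and the other side still reads it as a single line, because TCP keeps no boundaries between writes.

```ruby
require 'socket'

# Hypothetical helper: returns the line read on the accepting side after the
# connecting side sends it in two separate writes.
def demo_byte_stream
  server = TCPServer.new('127.0.0.1', 0) # port 0: the OS chooses a free port
  port = server.addr[1]

  client = TCPSocket.new('127.0.0.1', port)
  peer = server.accept

  client.write('hel')  # first piece
  client.write("lo\n") # second piece, ending the line
  peer.gets            # reads up to the newline, however the writes were split
ensure
  [client, peer, server].each { |s| s&.close }
end
```

Calling `p demo_byte_stream` prints "hello\n": the receiver sees one line, not two messages.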
Now, let's take a look at how HTTP clients send HTTP requests, using the previous client.rb.
Before going ahead, we need a simple HTTP server as a sandbox.
In theory, we could build an HTTP server program by extending the server.rb program above, but that is too challenging to do within this post.
Instead of building an HTTP server ourselves, we can use WEBrick::HTTPServer in Ruby as an HTTP server on localhost. Stop server.rb first if it is still running, since both listen on port 20000. Then copy and paste the command below and run it to start an HTTP server. (On Ruby 3.0 and later, WEBrick is no longer bundled with Ruby, so run gem install webrick beforehand.)
$ ruby -rwebrick -e 'WEBrick::HTTPServer.new(:DocumentRoot => "./", :Port => 20000).start'
The goal of this section is to send an HTTP request to the sandbox server above and receive a 200 response with a proper response body, by sending the appropriate data with client.rb.
As we saw in the previous section, communication between a client and a server over a TCP socket has no rules about how data is sent or received.
HTTP, which is built on TCP, is one of the protocols that specify how a server and a client communicate with each other. In other words, HTTP defines the data format, when to send data, when to close the connection, and so on.
You can check the specification in detail in RFC 7230. (I am also reading it as I write this post.)
Let's start with the first line of a request. As defined in Section 3.1.1 of RFC 7230, the HTTP method, the request target and the HTTP protocol version appear on the first line, in the format below.
method SP request-target SP HTTP-version CRLF
If we want to send a GET request to the root path with HTTP/1.1, for example, the first line should look like this.
GET / HTTP/1.1
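The grammar above can be written down as a small Ruby sketch. The method name request_line and the constant names are chosen for this example only. SP is a single space and CRLF is a carriage return followed by a line feed.

```ruby
SP = ' '
CRLF = "\r\n"

# Builds "method SP request-target SP HTTP-version CRLF".
def request_line(method, target, version = 'HTTP/1.1')
  [method, target, version].join(SP) + CRLF
end

p request_line('GET', '/') # prints "GET / HTTP/1.1\r\n"
```

Note that a line typed on the keyboard normally ends with a bare LF rather than CRLF. RFC 7230 (Section 3.5) allows a recipient to accept a single LF as the line terminator, which is presumably why typing the line by hand still works with the sandbox server.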
So let's try sending it: run client.rb and type the line above.
$ ruby client.rb
GET / HTTP/1.1
Then press the Enter key twice, and you should get an HTTP response with status code 200.
"HTTP/1.1 200 OK \r\n"
"Content-Type: text/html; charset=\"UTF-8\"\r\n"
"Server: WEBrick/1.3.1 (Ruby/2.4.1/2017-03-22)\r\n"
"Date: Tue, 23 Jan 2018 13:33:51 GMT\r\n"
"Content-Length: 9173\r\n"
"Connection: Keep-Alive\r\n"
Cool! We successfully sent an HTTP request to the server with a simple, tiny, toy TCP socket program.
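The same exchange could also be scripted instead of typed. The following is a minimal sketch, not part of the original client.rb: the method name fetch_response_head is made up for this example, and it assumes an HTTP server is listening on the given host and port. Unlike the keyboard session, it ends each line with CRLF and adds a Host header, which RFC 7230 (Section 5.4) requires in HTTP/1.1 requests.

```ruby
require 'socket'

# Hypothetical helper: sends a bare GET request and returns the status line
# and header lines of the response (the body is left unread).
def fetch_response_head(host, port, target = '/')
  socket = TCPSocket.new(host, port)
  # Request line, one header field, then an empty line to end the header section.
  socket.write("GET #{target} HTTP/1.1\r\nHost: #{host}:#{port}\r\n\r\n")

  lines = []
  while (line = socket.gets)
    break if line == "\r\n" # empty line: the response headers are over
    lines << line.chomp
  end
  lines
ensure
  socket&.close
end
```

With the WEBrick sandbox running, `puts fetch_response_head('127.0.0.1', 20000)` should print header lines similar to the ones shown above.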
Now, let's take a closer look at when the server sends the response to the client.
Remember that you pressed the Enter key twice to send the HTTP request, even though the first line ("GET / HTTP/1.1") had already been sent when you pressed Enter the first time.
This is because the server is supposed to return the response only after the client has finished sending the whole request. TCP itself does not mark the end of the data, so the server has to determine that it has received the entire request by looking at the data itself, according to the HTTP specification.
The format of an entire HTTP message, which includes requests, is defined as follows (see Section 3 of RFC 7230).
HTTP-message = start-line
               *( header-field CRLF )
               CRLF
               [ message-body ]
You may notice that header-field lines are supposed to follow the first line, so the server cannot respond before it knows that the header-field section is over. In other words, the server detects the end of the header fields by receiving two CRLFs in a row (one CRLF at the end of the last header-field, or of the first line when there are no header fields, and another as the empty line that follows).
That is why the HTTP server sent the response only after you pressed the Enter key twice: the second Enter produced the empty line, which means there are no header fields and the request message ends here.
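The rule "keep reading until an empty line" can be sketched as follows. This only illustrates the idea and is not how WEBrick is implemented. The method name read_message_head is made up for this example, and StringIO stands in for a socket so that the sketch runs without a network connection.

```ruby
require 'stringio'

# Hypothetical helper: reads the first line and the header fields from io.
# Returns [first_line, headers], or nil if the stream ends before the empty line.
def read_message_head(io)
  first_line = io.gets
  return nil if first_line.nil?

  headers = {}
  while (line = io.gets)
    line = line.chomp
    return [first_line.chomp, headers] if line.empty? # empty line: headers are over

    name, value = line.split(':', 2)
    headers[name] = value.to_s.strip
  end
  nil # the stream ended first, so the message is still incomplete
end

p read_message_head(StringIO.new("GET / HTTP/1.1\r\n"))     # prints nil
p read_message_head(StringIO.new("GET / HTTP/1.1\r\n\r\n")) # prints ["GET / HTTP/1.1", {}]
```

The first call corresponds to pressing Enter once (the message is not finished yet), and the second to pressing it twice.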
Without the persistent connections that the Keep-Alive header negotiates (and that became the default in HTTP/1.1), an HTTP connection is closed once the server has sent a single response to the client. (See Appendix A.1.2 of RFC 7230.)
In HTTP/1.0, each connection is established by the client prior to the request and closed by the server after sending the response.
You can try sending data again after receiving the response, and you may soon notice that the connection has already been closed.
The HTTP connection is over at that point, and you must connect to the server again to send another request.
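The following sketch shows what a closed connection looks like from the client side. It does not use WEBrick: start_one_shot_server is a made-up toy stub that answers every connection with one fixed HTTP/1.0-style response and then closes it, and request_once is an equally hypothetical client helper. After the response, further reads report the end of the stream, so each request needs a fresh connection.

```ruby
require 'socket'

# Toy stand-in for a server that closes the connection after one response.
# Returns the port it listens on (chosen by the OS).
def start_one_shot_server
  server = TCPServer.new('127.0.0.1', 0)
  Thread.new do
    loop do
      conn = server.accept
      loop do # discard the request up to the empty line
        line = conn.gets
        break if line.nil? || line == "\r\n"
      end
      conn.write("HTTP/1.0 200 OK\r\nContent-Length: 2\r\n\r\nok")
      conn.close
    end
  end
  server.addr[1]
end

# Opens a new connection, sends one request, and reads until the server closes.
def request_once(port)
  socket = TCPSocket.new('127.0.0.1', port)
  socket.write("GET / HTTP/1.0\r\n\r\n")
  response = socket.read    # returns once the server has closed the connection
  closed = socket.gets.nil? # any further read reports end of stream
  [response, closed]
ensure
  socket&.close
end
```

Calling `request_once(port)` twice with the port returned by `start_one_shot_server` opens two separate connections, and each call returns the response text together with true for the closed flag.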
In this post, we connected to an HTTP server with a TCP socket program. It should now be clear that HTTP is built on top of a TCP connection and introduces common rules for using TCP for efficient communication between web applications and clients (mobile apps, CLI programs, etc.).
Please note that this post covers only a couple of HTTP's features. To understand HTTP in practice, you need to know much more, such as the roles of request headers, connection management (such as the Keep-Alive header), and the other things stated in the RFC documents.
I hope this post helps you take the first step toward understanding how HTTP works.