A Dive Into The World of Web.

#webdev #beginners

An Overview of The Web

The Web is an Internet-based distributed information system. Since there is no central control or administration for the Web, anyone can potentially publish material on it and retrieve information from it. The Web is the most widely used of the Internet's services, which also include electronic mail, the File Transfer Protocol (FTP), chat rooms, mailing lists, instant messaging, and newsgroups.

Although most of the details are abstracted away from us as users, we can still look at the nitty-gritty stuff.

What happens in a nutshell?

As soon as you enter, say, www.amazon.com, the game of requests and responses begins:

The URL you entered gets resolved: your computer needs to know the exact address to send a request to, i.e., the address of the web server on which Amazon is hosted.
A request is sent to the server of the website.
The response of the server is parsed: the web server sends the web page back to your web browser, which reads it.
The page is rendered and displayed, and voilà, you have the information you needed.

Before we dig deep into these steps, first we must get familiar with some major web components. A basic understanding of all of these components and how they work together lays a good foundation to dive into the world of the Web.

Networks: The local-area networks and wide-area networks that connect computers worldwide and together form the Internet.
Clients and Servers: The machines connected to the Web generally act as clients or as servers.
- Clients are the typical web user's internet-connected devices (for example, your computer connected to your Wi-Fi, or your phone connected to your mobile network) and the web-accessing software available on those devices. A web browser such as Firefox or Chrome is an HTTP client: it runs on your computer and accesses web servers on Internet hosts.
- Servers are special computers that store web pages, sites, or apps. When a client device wants to access a web page, a copy of the web page is downloaded from the server onto the client machine to be displayed in the user's web browser.
  - But what is a web browser anyway, and how does it work? A web browser is software that helps users obtain information from the Web, letting you see text, images, and videos from anywhere in the world. Typically a browser supports the display of HTML files and images in standard formats. The files are transferred using the Hypertext Transfer Protocol, which defines how text, images, and video are transmitted on the Web. This information needs to be shared and displayed in a consistent format so that people using any browser, anywhere in the world, can see it.
Documents: Web pages, coded in HTML, supply the information that makes up the World Wide Web.
Protocols: For programs and computers from different vendors, running different operating systems, to communicate on a network, a detailed set of rules and conventions must be established for all parties to follow. Web clients and servers talk to one another using the HyperText Transfer Protocol (HTTP), which is built on top of the Internet protocols. Networking protocols are no mystery. Think about the protocol for making telephone calls. You (a client process) must pick up the phone, listen for the dial tone, dial a valid telephone number, and wait for the other side (the server process) to pick up the phone. Then you must say hello, identify yourself, and so on; you can’t deviate from this protocol if you want your call to succeed. The same goes for computer programs talking to one another through a computer network.

Now let’s have a look at all four steps mentioned above in detail.

Step 1 - URL Gets Resolved

The website code you requested is obviously not stored on your computer, so it needs to be fetched from the server where it is stored. When you connect to the internet you do so via an Internet Service Provider (ISP). You type a domain name or web address into your browser to visit a site; for example google.com, amazon.com, microsoft.com, etc.
The Web uses Uniform Resource Locators (URLs) to identify (locate) resources (files and services) available on the Internet. A URL may identify a host, a server port, and the target file stored on that host. URLs are used, for example, by browsers to retrieve information and by HTML to link to other resources.

A full URL usually has the form

scheme://server:port/pathname

The scheme part indicates the information service type and therefore the protocol to use. Common schemes are http, https, ftp, file, mailto, telnet, news, etc.
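To make the scheme://server:port/pathname form a bit more concrete, here is a small sketch that splits a URL into those parts using Python's standard urllib.parse module. The function name describe_url, the DEFAULT_PORTS table, and the www.example.com URL are made up for illustration; they are not part of any real site or API.

```python
from urllib.parse import urlsplit

# Well-known default ports for a few schemes, used when the URL names no port.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}


def describe_url(url):
    """Split a URL into its scheme, server, port and pathname parts."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "server": parts.hostname,
        # Fall back to the scheme's default port if none is written in the URL.
        "port": parts.port or DEFAULT_PORTS.get(parts.scheme),
        # An empty path means "the root of the site".
        "pathname": parts.path or "/",
    }


if __name__ == "__main__":
    # Hypothetical URL, used only to show all four parts at once.
    print(describe_url("https://www.example.com:8443/docs/index.html"))
    print(describe_url("http://www.amazon.com"))
```

Most URLs you type leave the port out, which is why the sketch falls back to a default based on the scheme.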

When you ask your browser for a web page, the request is sent across the Internet to a special computer known as a web server which hosts the website.
Web servers are special computers that are constantly connected to the Internet, and are optimized to send web pages out to people who request them.
It is called a “server” because it serves something; in our case, it serves the website.
Because the Internet is a global network of computers, each computer connected to the Internet must have a unique address. An Internet address (in the common IPv4 format) has the form xxx.xxx.xxx.xxx, where each xxx is a number from 0 to 255. This address is known as the IP address (IP stands for Internet Protocol); it is like a unique identifier for that computer.
You enter “www.amazon.com” (that is called “a domain”), but the server which hosts the source code of a website is actually identified via its IP (= Internet Protocol) address. The browser sends a “request” (see step 2) to the server with the IP address you entered (indirectly - you, of course, entered “www.amazon.com”).
In short, the user asks for a particular website, and that is what’s called an HTTP request.
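The xxx.xxx.xxx.xxx address format described above can be checked with Python's standard ipaddress module. This is only a sketch: the helper names is_valid_ipv4 and octets are invented here, and 192.0.2.10 is an address from a range reserved for documentation, not the address of a real website.

```python
import ipaddress


def is_valid_ipv4(text):
    """Return True if text is a well-formed dotted IPv4 address."""
    try:
        ipaddress.IPv4Address(text)
        return True
    except ipaddress.AddressValueError:
        # Raised for a wrong number of parts or a part outside 0-255.
        return False


def octets(text):
    """Return the four numbers of an IPv4 address as a list of integers."""
    return list(ipaddress.IPv4Address(text).packed)


if __name__ == "__main__":
    print(is_valid_ipv4("192.0.2.10"))   # four parts, each within 0-255
    print(is_valid_ipv4("256.0.2.10"))   # 256 is out of range
    print(octets("192.0.2.10"))
```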

How does the domain “www.amazon.com” get translated to its IP address?
As discussed above, every host on the Internet has an IP address, and hosts that serve websites usually have a domain name as well. To translate the domain into its IP address, there is a special type of server out there on the internet - not just one but many servers of that type - the so-called “name server” or “DNS server” (where DNS = “Domain Name System”). The Domain Name System provides a distributed database service that supports the dynamic update and retrieval of information contained in the namespace.
Your computer contacts a network of servers called Domain Name System (DNS) servers to obtain the address of the target host before making contact with it. These act like phone books; they tell your computer the IP address associated with the requested domain name.
The job of these DNS servers is to translate domains to IP addresses. You can imagine those servers as huge dictionaries that store translation tables: Domain => IP address.
When you enter “www.amazon.com”, the browser therefore first fetches the IP address from such a DNS server.
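The "dictionary" picture of DNS can be sketched in a few lines. The table below is a toy stand-in for a name server: the domains and addresses are placeholders taken from ranges reserved for documentation, and the function names toy_lookup and real_lookup are made up for this example. A real lookup goes through the operating system's resolver, which Python exposes as socket.gethostbyname.

```python
import socket

# A toy translation table: domain => IP address.
# These are documentation-only addresses, not real servers.
TOY_DNS_TABLE = {
    "www.example.com": "192.0.2.10",
    "www.example.org": "198.51.100.7",
}


def toy_lookup(domain):
    """Look a domain up in the toy table, ignoring case and a trailing dot."""
    key = domain.lower().rstrip(".")
    if key not in TOY_DNS_TABLE:
        raise LookupError(f"no address known for {domain!r}")
    return TOY_DNS_TABLE[key]


def real_lookup(domain):
    """Ask the system resolver, which in turn queries real DNS servers."""
    return socket.gethostbyname(domain)  # needs a working network connection


if __name__ == "__main__":
    print(toy_lookup("www.example.com"))
```

The real system is distributed across many servers rather than held in one table, but the question being answered is the same: which IP address belongs to this name?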

So far we’re just getting the request to the right place. Now, once the IP address is known, we advance to step 2.

Step 2 - Request is sent

With the IP address resolved, the browser goes ahead and sends a request to the server with that IP address. The unique number that the DNS server returns to your computer allows your browser to contact the web server that hosts the website you requested. “A request” is not just a term. It really is a technical thing that happens behind the scenes.
The browser bundles up a bunch of information (What’s the exact URL? Which kind of request should be made? Should metadata be attached? And so on.) and sends that data package to the IP address.
The data is sent via the “HyperText Transfer Protocol” (known as “HTTP”) - as discussed above it is a standardized protocol which defines what a request (and response) has to look like, which data may be included (and in which form) and how the request will be submitted.
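To show what such a standardized request can look like on the wire, here is a minimal sketch that builds the text of a plain HTTP/1.1 GET request. The function name build_get_request is invented for this example, and a real browser sends many more headers than the three shown.

```python
def build_get_request(host, path="/"):
    """Build the bytes of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",  # request line: method, path, protocol version
        f"Host: {host}",         # which website we want (required in HTTP/1.1)
        "Accept: text/html",     # metadata: the kind of content we prefer
        "Connection: close",     # ask the server to close the connection afterwards
        "",                      # an empty line marks the end of the headers
        "",
    ]
    # HTTP separates lines with a carriage return plus line feed.
    return "\r\n".join(lines).encode("ascii")


if __name__ == "__main__":
    print(build_get_request("www.amazon.com").decode("ascii"))
```

The request line answers "which kind of request and which exact URL", and the header lines carry the metadata.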
Because HTTP is used, a full URL actually looks like this:

http://www.amazon.com

The browser auto-completes it for you.
There is also HTTPS - it’s like HTTP but encrypted. Most modern pages use that instead of HTTP. A full URL then becomes:

https://www.amazon.com

Since the whole process and format is standardized, there is no guessing about how that request has to be read by the server. Also, your request is routed across the network toward the server with the specified IP address. This is not a direct journey. It requires hopping from one intermediate machine (a router) to the next until the request arrives.
The server then handles the request appropriately and returns a so-called “response”. Again, a “response” is a technical thing and kind of similar to a “request”. You could say it’s basically a “request” in the opposite direction.
Like a request, a response can contain data, metadata, etc. When requesting a page like amazon.com, the response will contain the code that is required to render the page onto the screen.
Now, what actually happens on the server?
The server that receives the request figures out exactly what we’re asking for. It builds up the right content, often pulling information from a database. In the end, a response has to be sent. That response doesn’t have to contain “a website”. It can contain any data - including files or images. For a web page, the server responds with some combination of HTML, CSS, and JavaScript.
Some servers are programmed to generate websites dynamically based on the request (e.g. a profile page that contains your personal data); other servers return pre-generated HTML pages (e.g. a news page). Or both are done - for different parts of a webpage. There is also a third alternative: websites that are pre-generated but that change their appearance and data in the browser.
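The difference between a pre-generated page and a dynamically generated one can be sketched as a single function that maps a requested path to a response. Everything here is hypothetical: the paths /news and /profile/..., the page contents, and the function name handle_request are assumptions made for illustration, not how any particular server works.

```python
from html import escape

# A pre-generated page: the same content for every visitor.
STATIC_PAGES = {
    "/news": "<h1>News</h1><p>The same page for every visitor.</p>",
}


def handle_request(path):
    """Return (status code, headers, body) for a requested path."""
    headers = {"Content-Type": "text/html; charset=utf-8"}
    if path in STATIC_PAGES:
        return 200, headers, STATIC_PAGES[path]
    prefix = "/profile/"
    if path.startswith(prefix) and len(path) > len(prefix):
        # Dynamic page: built at request time from data in the request.
        # escape() keeps user-supplied text from being read as HTML.
        user = escape(path[len(prefix):])
        return 200, headers, f"<h1>Profile</h1><p>Hello, {user}!</p>"
    return 404, headers, "<h1>Not Found</h1>"


if __name__ == "__main__":
    print(handle_request("/news"))
    print(handle_request("/profile/ada"))
    print(handle_request("/missing"))
```

A real server would typically pull the profile data from a database instead of echoing the name in the path.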
For our simple case, we have a server that returns the code to display a website. So let’s continue with step 3.

Step 3 - Response is Parsed

The web server then sends the web page you requested back to your web browser. But web pages containing images and lots of text are too large to send as a single packet of data. To overcome this, a single web page gets broken up into many small packets of data, each wrapped in the information needed to put the page back together. Data sent over the Internet (and most networks) is sent in manageable chunks, and on the Internet these chunks are known as packets. A packet envelops the transmitted data with address information so the data can be routed through intermediate computers on the network. Because there are multiple routes from the source to the destination host, the Internet is resilient and can keep operating even if parts of the network are down.
When the web browser fetches data from an internet-connected server, it uses a piece of software called a rendering engine to translate that data into text and images. This data is written in some combination of Hypertext Markup Language (HTML), Cascading Style Sheets (CSS), and JavaScript, and web browsers read this code to create what we see, hear, and experience on the internet.
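The idea of chopping a page into packets and rebuilding it on the other side can be sketched as follows. This is a deliberate simplification: each "packet" here is just a sequence number plus a chunk of bytes, whereas real packets also carry source and destination addresses and other information. The function names to_packets and reassemble are made up for this example.

```python
import random


def to_packets(data, size):
    """Split data into (sequence number, chunk) pairs of at most size bytes."""
    return [
        (seq, data[start:start + size])
        for seq, start in enumerate(range(0, len(data), size))
    ]


def reassemble(packets):
    """Rebuild the original data by putting the chunks back in order."""
    return b"".join(chunk for _, chunk in sorted(packets))


if __name__ == "__main__":
    page = b"<html><body><h1>Hello</h1></body></html>"
    packets = to_packets(page, size=8)
    random.shuffle(packets)  # packets may take different routes and arrive out of order
    print(reassemble(packets) == page)
```

The sequence number is the piece of "information needed to put the page back together": without it, the receiver could not tell in which order the chunks belong.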
The browser receives the response sent by the server. This alone doesn’t display anything on the screen though.
Instead, the next step is that the browser parses the response, just as the server did with the request. Again, the standardization enforced by HTTP helps here.
The browser checks the data and metadata that are enclosed in the response. And based on that, it decides what to do.
You might’ve had cases where a PDF opened in your browser. That happened because the response informed the browser that the data is not a website but a PDF document instead. And the browser tries to pick the best handling mechanism for any data type it detects.

Back to our website scenario.
In that case, the response would contain a specific piece of metadata (the Content-Type header) that tells the browser that the response data is of type text/html.
This allows the browser to then parse the actual data that’s attached to the response as HTML code.
The browser knows how to parse HTML and now simply goes through the entire response data (also called “the response body”) to render the website.
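Here is a rough sketch of how a response could be split into metadata and body, and how the data type in the metadata could drive the decision about what to do next. The function names parse_response and choose_handler and the handler labels are invented for illustration; real browsers handle many more cases (compressed bodies, chunked transfer, and so on).

```python
def parse_response(raw):
    """Split a raw HTTP response into status code, headers and body."""
    # An empty line separates the metadata (head) from the response body.
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    # The status line looks like "HTTP/1.1 200 OK".
    status_code = int(status_line.split(" ", 2)[1])
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_code, headers, body


def choose_handler(headers):
    """Pick a way to handle the body, based on the declared data type."""
    media_type = headers.get("content-type", "").split(";")[0].strip().lower()
    if media_type == "text/html":
        return "render as web page"
    if media_type == "application/pdf":
        return "open in PDF viewer"
    return "offer as download"


if __name__ == "__main__":
    raw = (b"HTTP/1.1 200 OK\r\n"
           b"Content-Type: text/html; charset=utf-8\r\n"
           b"\r\n"
           b"<h1>Hi!</h1>")
    status, headers, body = parse_response(raw)
    print(status, choose_handler(headers), body)
```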

Step 4 - Page is Displayed

As mentioned, the browser goes through the HTML data returned by the server and builds a website based on that.
It is important to know, though, that HTML is not meant to carry instructions regarding what the site should look like (i.e. how it should be styled). It really only defines the structure and tells the browser which content is a heading, which content is an image, which content is a paragraph, etc. This is especially important for accessibility - screen readers get all the useful information out of the HTML structure.
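To see that HTML describes structure rather than looks, here is a small sketch that reads a snippet of HTML with Python's standard html.parser module and lists which content is a heading, an image, or a paragraph. The class name OutlineParser, the helper outline, and the sample markup are made up for this example, and a browser's real parser is far more thorough.

```python
from html.parser import HTMLParser


class OutlineParser(HTMLParser):
    """Collect headings and paragraphs with their text, and images with their alt text."""

    TEXT_TAGS = {"h1", "h2", "h3", "p"}

    def __init__(self):
        super().__init__()
        self.outline = []
        self._current = None  # the heading or paragraph we are currently inside

    def handle_starttag(self, tag, attrs):
        if tag in self.TEXT_TAGS:
            self._current = [tag, ""]
            self.outline.append(self._current)
        elif tag == "img":
            self.outline.append(["img", dict(attrs).get("alt") or ""])

    def handle_endtag(self, tag):
        if self._current is not None and tag == self._current[0]:
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self._current[1] += data


def outline(html_text):
    """Return a list of (tag, text) pairs describing the page structure."""
    parser = OutlineParser()
    parser.feed(html_text)
    parser.close()
    return [tuple(entry) for entry in parser.outline]


if __name__ == "__main__":
    sample = '<h1>Shop</h1><img src="logo.png" alt="Logo"><p>Welcome <b>back</b>!</p>'
    print(outline(sample))
```

Note that nothing in the result says how big the heading is or what color the paragraph has; that kind of information lives in CSS.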

So let’s review what we have learned so far:

The browser goes to the DNS server and finds the real address of the server that the website lives on.
The browser sends an HTTP request message to the server, asking it to send a copy of the website to the client. This message, and all other data sent between the client and the server, are sent across your internet connection using TCP/IP.
If the server approves the client's request, the server sends the client a "200 OK" message, which means "Of course you can look at that website! Here it is", and then starts sending the website's files to the browser as a series of small chunks called data packets.
The browser assembles the small chunks into a complete website and displays it to you.