How does the web work? What happens behind the scenes? Where do you start if you're completely new to all this?
I don't have a definitive answer, but I've studied this long enough to put together a post that covers the important bits in one place.
You can use this as a cheat sheet or as last-minute prep for web/software engineering interviews.
It can't be an exhaustive list, but I'm making the effort to get as close to one as possible to help anyone who's looking for it.
Okay, let's start with "what happens when you type in Google.com or visit any URL in your web browser?"
What happens when you type Google.com in the browser?
When you type in Google.com and hit Enter, your browser first needs to translate the domain name Google.com into an IP address. This process is called domain name resolution and is handled by DNS (Domain Name System).
Here’s how it starts: your browser looks for the IP address of Google.com in its own DNS cache. If it’s not there, it checks your OS (operating system) DNS cache. If it’s not there, the request goes to your router. If it’s not there either, the request goes to your ISP’s DNS resolver.
Still not there? Your ISP will send that request to one of the 13 root servers (not exactly 13 physical servers but 13 named authorities that maintain hundreds of these physical servers around the world).
The root server will share the address of the respective TLD (Top Level Domain) server with your ISP. In this case, it is “.COM” TLD server.
Your ISP then contacts that respective TLD Server, which returns the address of the Authoritative DNS server.
Finally, your ISP contacts the authoritative DNS server, which returns the IP address (the “A” record in DNS) back to it.
Your ISP then sends it back to your router and your router sends it back to you. At this point, your ISP, Router, OS and Browser will most likely maintain a cached record of this IP in their tables.
Why? The next time you or someone on your network requests Google.com, one of these entities will know where to send that request without making a full DNS trip.
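To make the cache chain concrete, here is a minimal sketch in Python. It is a simulation only: the `CacheLayer` class, the `resolve` function and the address 203.0.113.10 (a documentation-only IP, not Google's) are my own assumptions for illustration, and the root → TLD → authoritative walk is collapsed into one lookup.

```python
from typing import Optional

# Hypothetical data: stands in for the root -> TLD -> authoritative walk.
# 203.0.113.10 is a documentation-only address, not Google's real IP.
AUTHORITATIVE = {"google.com": "203.0.113.10"}


class CacheLayer:
    """One hop in the lookup chain (browser, OS, router, or ISP resolver)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.records: dict[str, str] = {}


def full_dns_trip(domain: str) -> Optional[str]:
    """Simplified stand-in for asking root, TLD and authoritative servers."""
    return AUTHORITATIVE.get(domain)


def resolve(domain: str, layers: list[CacheLayer]) -> tuple[Optional[str], str]:
    """Return (ip, answered_by): check each cache in order, then do the full trip."""
    for layer in layers:
        if domain in layer.records:
            return layer.records[domain], layer.name
    ip = full_dns_trip(domain)
    if ip is not None:
        # Every layer caches the answer on the way back.
        for layer in layers:
            layer.records[domain] = ip
    return ip, "authoritative server"


layers = [CacheLayer(n) for n in ("browser", "os", "router", "isp")]
first = resolve("google.com", layers)    # nothing cached yet: full trip
second = resolve("google.com", layers)   # answered from the browser cache
print(first)
print(second)
```

The second lookup never leaves the first cache layer, which is the whole point of caching at every hop.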
Now that your browser knows the address of Google.com, it is time to start making HTTP requests to actually get the page/resource.
Github repo: https://github.com/alex/what-happens-when
What is HTTP?
HTTP stands for HyperText Transfer Protocol. What is a protocol? A protocol is a set of rules that tells two machines how to talk to each other (similar to the rules in our language).
HTTP is a stateless protocol, which means the server doesn’t keep any state between requests. HTTP by itself is also connectionless: the client (your browser/machine) and the server are only aware of each other for the duration of the current request.
To actually carry the messages, HTTP runs on top of TCP (Transmission Control Protocol), which opens the connection and keeps it alive.
HTTP/1.0
A new TCP connection was required for each request/response pair. That leads to poor performance because creating a TCP connection for every request is expensive in both time and resources.
Image source: Mozilla
HTTP/1.1 (persistent connections are default)
We still have to wait for a response before sending the next request, but we can have multiple request/response pairs in a single connection.
We can also open multiple connections, but most browsers allow only up to 6 parallel connections per domain.
To work around this limit, a technique called domain sharding is used, where resources are delivered from multiple subdomains. For example, you may have seen images being served from i1.wp.com, i2.wp.com on WordPress.com.
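As a rough illustration of domain sharding, the sketch below maps each resource path to one of a few subdomains. The `shard_url` helper, the `example.com` domain and the shard count of three are all assumptions of mine, not how WordPress.com actually does it.

```python
import zlib

SHARD_COUNT = 3  # assumption: three subdomains, i0..i2


def shard_url(path: str, base_domain: str = "example.com") -> str:
    """Map a resource path to one of SHARD_COUNT subdomains, deterministically."""
    # crc32 is stable across runs (unlike Python's built-in hash() for str),
    # so the same path always lands on the same host and stays cacheable.
    shard = zlib.crc32(path.encode("utf-8")) % SHARD_COUNT
    return f"https://i{shard}.{base_domain}{path}"


paths = ["/img/logo.png", "/img/hero.jpg", "/img/icon.svg", "/img/photo.webp"]
urls = [shard_url(p) for p in paths]
for url in urls:
    print(url)
```

Because the browser's connection limit applies per hostname, spreading resources across hostnames raises the number of downloads that can run in parallel.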
HTTP/1.1 (with pipelining)
Multiple requests can be sent without waiting for a response, but the responses have to come back in the order they were requested. This is poorly supported by browsers and servers and is almost never used.
HTTP/2 (multiplexing)
Multiple requests can be sent without waiting for a response, and the responses can come back in any order. This vastly improves performance. HTTP/2 can also break responses into smaller pieces (frames) and send them as soon as they are ready.
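The difference between in-order and any-order responses is easy to see with a little arithmetic. This is a simplified model under my own assumptions: all requests go out at time 0, the `ready_at` timings are made up, and bandwidth is ignored.

```python
from itertools import accumulate

# Assumed time (ms) at which the server has each response ready,
# listed in the order the requests were sent.
ready_at = {"index.html": 50, "big-report.pdf": 400, "style.css": 60, "logo.png": 80}


def pipelined_delivery(ready: dict[str, int]) -> dict[str, int]:
    """HTTP/1.1 pipelining: responses must leave in request order,
    so each one waits for every response ahead of it."""
    running_max = accumulate(ready.values(), max)
    return dict(zip(ready.keys(), running_max))


def multiplexed_delivery(ready: dict[str, int]) -> dict[str, int]:
    """HTTP/2 multiplexing: each response is delivered as soon as it is ready."""
    return dict(ready)


pipelined = pipelined_delivery(ready_at)
multiplexed = multiplexed_delivery(ready_at)
print(pipelined)    # style.css and logo.png are stuck behind the slow PDF
print(multiplexed)  # small files arrive without waiting for the PDF
```

In the pipelined case one slow response holds up everything queued behind it, a problem usually called head-of-line blocking.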
HTTP/2 (with push)
Let’s say we request index.html. The server can see that this file needs a few more files, like style.css and scripts.js, and automatically “push” them to us. The browser then doesn’t need to make separate requests for them.
Reference:
https://stackoverflow.com/questions/36517829/what-does-multiplexing-mean-in-http-2
Refer to this sweet animation, which explains all this beautifully: https://freecontent.manning.com/animation-http-1-1-vs-http-2-vs-http-2-with-push/
What about HTTPS?
HTTPS is a way of encrypting HTTP. It basically wraps HTTP messages up in an encrypted format using SSL/TLS.
SSL stands for Secure Sockets Layer and TLS for Transport Layer Security. Both are cryptographic protocols designed to secure communication over a computer network.
SSL is basically deprecated and TLS is what we all use (so if someone says SSL today, they’re most likely referring to TLS).
Note that TLS is not mandatory according to the HTTP/2 spec but most browsers will only allow HTTP/2 over TLS.
Encryption: Asymmetric vs Symmetric
The TLS handshake uses asymmetric encryption, where the server has two keys: public and private. Data can be locked/encrypted using the server’s public key but can only be unlocked/decrypted using the server’s private key.
The server sends its public key to the client. It can be seen by anyone on the network, but that doesn’t matter because it is a public key (which can only be used to lock/encrypt things).
The client now creates a new symmetric key on its side (a key that can be used to both lock and unlock). The client locks this symmetric key in a box using the server’s public key and sends it to the server.
People can again see this box on the network but cannot unlock it without the server’s private key. (Note: The act of secretly listening to a private conversation is called eavesdropping.)
Once the server receives this box, it unlocks it using its private key and finds the symmetric key with a note from the client which says “we shall now communicate using this symmetric key”.
At this point, the client and server don’t need to send keys anymore. They can send data by locking it with the symmetric key and unlocking it with the same symmetric key.
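Here is a toy version of that exchange in Python, just to show the locking and unlocking steps. Please treat it as an illustration only: the tiny primes, the XOR "cipher" and all variable names are my own assumptions and are not secure. Real TLS relies on vetted libraries and much larger keys, and current TLS versions agree on the shared key with a different mechanism.

```python
import secrets

# --- Server side: a toy RSA key pair (tiny primes, for illustration only) ---
p, q = 61, 53
n = p * q                      # 3233, part of both keys
phi = (p - 1) * (q - 1)        # 3120
e = 17                         # public exponent
d = pow(e, -1, phi)            # private exponent (modular inverse, Python 3.8+)

public_key, private_key = (e, n), (d, n)


def xor_cipher(data: bytes, key: int) -> bytes:
    """Toy symmetric cipher: the same call both locks and unlocks."""
    key_bytes = key.to_bytes(2, "big")
    return bytes(b ^ key_bytes[i % 2] for i, b in enumerate(data))


# --- Client side: create a symmetric key and lock it with the public key ---
symmetric_key = secrets.randbelow(n - 2) + 2      # random value in [2, n-1]
box = pow(symmetric_key, public_key[0], public_key[1])

# --- Server side: unlock the box with the private key ---
recovered_key = pow(box, private_key[0], private_key[1])

# --- Both sides now share the key and use it in both directions ---
ciphertext = xor_cipher(b"GET /inbox", symmetric_key)
plaintext = xor_cipher(ciphertext, recovered_key)
print(recovered_key == symmetric_key, plaintext)
```

Only `box` and `ciphertext` ever cross the network in this sketch; the private key and the symmetric key never do.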
💥 Boom! That’s how data is securely sent over the network!
Reference: https://www.cloudflare.com/learning/cdn/cdn-ssl-tls-security/
HTTP Request Methods
Each HTTP request has its own headers, which tell the server what the request is all about. You can see them by opening developer tools in your browser, heading to the network tab, loading a page and then inspecting each request.
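If you'd rather see the raw text than the developer tools view, this small sketch splits a hand-written HTTP/1.1 request into its parts. The request text, the `demo-client/1.0` user agent and the `parse_request` helper are made up for illustration; a real server would use a proper HTTP parser.

```python
RAW_REQUEST = (
    "GET /users/207 HTTP/1.1\r\n"
    "Host: example.com\r\n"
    "Accept: application/json\r\n"
    "User-Agent: demo-client/1.0\r\n"
    "\r\n"
)


def parse_request(raw: str) -> tuple[str, str, str, dict[str, str]]:
    """Split a raw HTTP/1.x request into method, path, version and headers."""
    # A blank line separates the header section from the (optional) body.
    head, _, _body = raw.partition("\r\n\r\n")
    request_line, *header_lines = head.split("\r\n")
    method, path, version = request_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalise them to lowercase.
        headers[name.strip().lower()] = value.strip()
    return method, path, version, headers


method, path, version, headers = parse_request(RAW_REQUEST)
print(method, path, version)
print(headers)
```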
HTTP methods are described by two properties (a method can have both, one or neither):
- Safe methods: They don’t change anything on the server side.
- Idempotent methods: They produce the same results no matter how many times they are called.
Idempotence catchphrase →
Send and send and send my friend, it makes no difference in the end.
Now let’s look at each of these methods:
GET
Fetch a resource from the server.
Example: GET /users/207
Safe: ✅
Idempotent: ✅
POST
Create a new resource on the server (or submit data for the server to process).
Example: POST /users
Safe: ❌
Idempotent: ❌
PUT
Create a new resource or overwrite if one is present.
Example: PUT /users/207
Safe: ❌
Idempotent: ✅
DELETE
Remove a resource from the server.
Example: DELETE /users/207
Safe: ❌
Idempotent: ✅
PATCH
Apply partial modifications to a resource.
Example: PATCH /users/207
Safe: ❌
Idempotent: ❌
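The safe/idempotent flags above are easier to remember after watching them play out. Below is a minimal in-memory sketch; the `UserStore` class is a hypothetical stand-in for a server, not a real framework.

```python
class UserStore:
    """In-memory stand-in for the server side of /users."""

    def __init__(self) -> None:
        self.users: dict[int, dict] = {}
        self.next_id = 1

    def get(self, user_id: int):
        return self.users.get(user_id)           # reads only: safe

    def post(self, data: dict) -> int:
        user_id = self.next_id                    # the server picks the ID
        self.next_id += 1
        self.users[user_id] = dict(data)
        return user_id

    def put(self, user_id: int, data: dict) -> None:
        self.users[user_id] = dict(data)          # create or overwrite

    def delete(self, user_id: int) -> None:
        self.users.pop(user_id, None)             # already gone? same end state


store = UserStore()
store.post({"name": "Omkar"})
store.post({"name": "Omkar"})       # repeated POST -> a second resource
after_posts = len(store.users)

store.put(207, {"name": "Omkar"})
store.put(207, {"name": "Omkar"})   # repeated PUT -> still one /users/207
after_puts = len(store.users)

store.delete(207)
store.delete(207)                   # repeated DELETE -> same end state
after_deletes = len(store.users)

print(after_posts, after_puts, after_deletes)
```

Sending the same POST twice leaves two users behind, while repeating PUT or DELETE leaves the store exactly as it was after the first call.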
POST vs PUT vs PATCH
There is an excellent StackOverflow post that covers this in detail. Here’s an excerpt from that post →
POST to a URL creates a child resource at a server defined URL.
PUT to a URL creates/replaces the resource in its entirety at the client defined URL.
PATCH to a URL updates part of the resource at that client defined URL.
If the endpoint on the server is idempotent (i.e. it is safe to repeat the request over and over again) and the URI is the address of the resource being updated, then we can safely use PUT. Otherwise, use POST.
PUT will essentially take an object and replace the entire resource at the URI. For example, if we had to update only the email of a user, we would send the entire resource:
PUT /users/207
{
    "name": "Omkar Bhagat",
    "email": "omkar@bhagat.com",
    "city": "Mumbai"
}
If we only send email (a partial resource) as follows →
PUT /users/207
{
    "email": "omkar@google.com"
}
Then the rest of the properties will end up with NULL values (or be dropped, depending on how the server implements PUT). That’s where PATCH comes to our rescue.
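The same difference can be sketched with plain dictionaries. The two helper functions are my own simplification: here PUT drops the missing fields rather than setting them to NULL, and PATCH is modelled as a simple merge, which is only one of several ways a server may interpret a patch document.

```python
stored = {"name": "Omkar Bhagat", "email": "omkar@bhagat.com", "city": "Mumbai"}
update = {"email": "omkar@google.com"}


def apply_put(current: dict, body: dict) -> dict:
    """PUT: the request body becomes the whole resource."""
    return dict(body)


def apply_patch(current: dict, body: dict) -> dict:
    """PATCH (simple merge style): only the supplied fields change."""
    return {**current, **body}


after_put = apply_put(stored, update)
after_patch = apply_patch(stored, update)
print(after_put)     # name and city are gone
print(after_patch)   # name and city survive, email is updated
```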
Reference: https://medium.com/@kumaraksi/using-http-methods-for-restful-services-e6671cf70d4d
Reference models
When it comes to online communication, we use a conceptual model (or framework) that partitions the communication system into abstract layers.
Why is this important? Let’s understand this with a real life example.
- You place an order for food via a smartphone app
- The order is received at the server and sent to the restaurant
- That restaurant prepares the food and packages it when done
- A delivery person arrives on a motorcycle, collects the food and delivers it to you!
Now this whole process can be partitioned into different layers (where each layer operates independently). You can fix problems at each layer, make changes and it won’t affect the other layers.
Say, instead of delivering food via motorcycle, you decide to upgrade to a truck. Or your organization switches from an iOS app to an Android app. It doesn’t affect the other layers. If there’s a problem at any particular layer, you know who to call and what to do. That’s the power of abstraction.
There are two models to remember:
OSI Model
OSI (Open Systems Interconnection) is a 7-layer reference model published by ISO (International Organization for Standardization) in 1984. No modern protocol suite implements this model fully.
Image Source: GeeksForGeeks
TCP/IP Model
The TCP/IP networking model is used by the TCP/IP protocol suite, which is what the internet (an inter-network of computing devices) runs on.
Image Source: Wikipedia
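One way to picture layering is as wrapping and unwrapping: on the way out, each layer adds its own header around whatever the layer above handed it, and on the way in, each layer strips only its own header. The sketch below uses the four TCP/IP layers with made-up text headers like `[transport-hdr]`; real headers are binary and carry ports, addresses and so on.

```python
LAYERS = ["application", "transport", "internet", "link"]  # TCP/IP model, top to bottom


def encapsulate(payload: str) -> str:
    """Sender: each layer below the application wraps what it gets from above."""
    for layer in LAYERS[1:]:
        payload = f"[{layer}-hdr]{payload}"
    return payload


def decapsulate(frame: str) -> str:
    """Receiver: each layer strips only its own header, bottom to top."""
    for layer in reversed(LAYERS[1:]):
        header = f"[{layer}-hdr]"
        if not frame.startswith(header):
            raise ValueError(f"expected {header}")
        frame = frame[len(header):]
    return frame


message = "GET / HTTP/1.1"
frame = encapsulate(message)
print(frame)
print(decapsulate(frame))
```

No layer needs to understand what is inside the payload it carries, which is why one layer can be swapped out without touching the others.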
TCP vs UDP
These are two important transport layer protocols. The basic difference is that TCP is connection oriented and UDP is connectionless (meaning it doesn’t establish a connection before sending data).
TCP is reliable: it tracks what has been delivered, and if data is lost in transit it retransmits it. It also checksums each segment to make sure the data received is not corrupted. UDP never retransmits anything and carries only a basic checksum.
TCP also provides flow control, which ensures the sender does not overwhelm the receiver with too many packets at a time. To do this, both sender and receiver maintain “buffers”. Each acknowledgment the receiver sends back carries the current size of its receive window, which tells the sender how much more data it can accept.
UDP doesn’t provide flow control; datagrams are simply sent as fast as the application produces them.
TCP ensures the packets sent and received are in the correct order. UDP doesn’t care about order.
Since TCP does a lot of extra work (establish connection, error-check, flow control, ordering) to be “reliable”, it is slower than UDP.
TCP is best suited for applications that require high reliability → HTTP, HTTPS, SSH, FTP, SMTP, etc. For example, email, file transfer, internet banking (where every digit/packet matters).
UDP is best for applications that need speed and efficiency like streaming videos, online games, DNS resolution, VoIP, etc.
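The ordering and retransmission behaviour described above can be sketched without touching a real socket. This is a deliberately simplified simulation: the segment numbering, the lost segment and both `*_receive` functions are assumptions of mine (real TCP numbers bytes rather than whole segments and uses cumulative acknowledgments).

```python
def udp_receive(arrived: list[tuple[int, str]]) -> str:
    """UDP-style: hand data to the application exactly as it arrived."""
    return "".join(data for _seq, data in arrived)


def tcp_receive(arrived: list[tuple[int, str]], total: int) -> tuple[str, list[int]]:
    """TCP-style: reorder by sequence number and report gaps to retransmit."""
    by_seq = dict(arrived)
    missing = [seq for seq in range(total) if seq not in by_seq]
    in_order = "".join(by_seq[seq] for seq in sorted(by_seq))
    return in_order, missing


sent = [(0, "HE"), (1, "LL"), (2, "O "), (3, "WE"), (4, "B!")]
# Assumed network behaviour: segment 2 is lost, the rest arrive shuffled.
arrived = [(1, "LL"), (0, "HE"), (4, "B!"), (3, "WE")]

udp_view = udp_receive(arrived)
print(udp_view)                       # scrambled and incomplete

_partial, missing = tcp_receive(arrived, total=len(sent))
print(missing)                        # the receiver knows what to ask for again

# The sender retransmits whatever was reported missing.
arrived += [segment for segment in sent if segment[0] in missing]
tcp_view, still_missing = tcp_receive(arrived, total=len(sent))
print(tcp_view, still_missing)
```

The extra round of "what's missing, send it again" is exactly the overhead that makes TCP slower and UDP attractive when a dropped packet doesn't matter much.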
Email Protocols
Just because I mentioned SMTP in the previous section, I’ll quickly cover 3 email protocols.
- SMTP (Simple Mail Transfer Protocol)
- POP3 (Post Office Protocol 3)
- IMAP (Internet Message Access Protocol)
SMTP
SMTP is pretty much the delivery truck that takes your email and delivers it to the right address after you hit the send button. It’s the protocol for transferring email over the internet.
POP3
POP3 is what you can use to receive/download an email. By default, it’s designed to delete the email from the server once you’ve downloaded it.
IMAP
IMAP is another protocol you can use to receive/download email (along with the folder structure), but it keeps the email on the server. You can think of it as a remote file server. For example, you can download email on your mobile and laptop and it’ll continue to stay in your email inbox (and at all other places).
I'm tempted to end the post here but I'll quickly add a bonus Networking section here which could help to cover a few things at a glance.
Networking
Networking is vast but as the title of this post says, I’ll only cover the high level view.
Hub → It can only broadcast to every other node connected to it.
Switch → It learns the MAC (media access control) address of every node connected to it, so it knows where to send the data within the network.
Router → It can do all of the things above and route traffic between networks. It’s intelligent and knows how to find the shortest path to the destination.
MAC address → Media Access Control. Every network component has its own MAC address (physical address) that’s embedded on its hardware. Your phone, router, desktop, laptop, everything has one.
DHCP → Dynamic Host Configuration Protocol. This is what gets (leases) you a dynamic IP every time you connect to a network.
NAT → Network Address Translation. It’s what turns private addresses into public and vice versa.
ARP → Address Resolution Protocol. It maps an IP address to a MAC address on the local network: an ARP request is a broadcast asking “who has this IP address?”, and the device that owns it replies with its MAC address.
IPv4 → IP version 4 is a 32-bit address containing four octets (4x8 bits). Example: 12.244.233.165. This is still very much in use.
IPv6 → It’s a 128-bit address containing eight hextets (8x16 bits) separated by colons. Example: 2001:0db8:0000:0000:0000:ff00:0042:7879. Adoption is still growing.
CDN → Content Delivery Network. The idea is to distribute/cache your content around the world at different servers and serve it to the client from the nearest server (resulting in faster delivery).
Subnetting & CIDR → Subnetting is the process of dividing a network into smaller network sections. CIDR (Classless Inter-Domain Routing) replaces the older fixed class A/B/C scheme with a prefix length, written like 192.168.1.0/24.
BGP / OSPF → Border Gateway Protocol, Open Shortest Path First are routing protocols. To oversimplify, BGP (an exterior gateway protocol) is used at the edge of your network to connect your network to the Internet. OSPF (an interior gateway protocol) is usually used internally inside your network.
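A few of the terms above (IPv4, IPv6, subnetting and CIDR) can be poked at directly with Python's standard `ipaddress` module. The addresses are the examples from this post plus a private 192.168.1.0/24 network that I picked for illustration.

```python
import ipaddress

v4 = ipaddress.ip_address("12.244.233.165")
v6 = ipaddress.ip_address("2001:0db8:0000:0000:0000:ff00:0042:7879")

print(v4.version, v4.max_prefixlen)   # 4 32
print(v6.version, v6.max_prefixlen)   # 6 128
print(v6.compressed)                  # 2001:db8::ff00:42:7879

# CIDR notation: the /24 says the first 24 bits identify the network.
network = ipaddress.ip_network("192.168.1.0/24")
print(network.num_addresses)          # 256

# Subnetting: split the /24 into two /25 networks.
subnets = list(network.subnets(prefixlen_diff=1))
print(subnets)

# Membership test: which subnet does a host fall into?
host = ipaddress.ip_address("192.168.1.200")
print(host in subnets[0], host in subnets[1])   # False True
print(host.is_private)                          # True
```

Private addresses like 192.168.x.x are the ones NAT translates to a public address before traffic leaves your network.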
Finish
There's so much more to cover but it can't be done in one single post. Please let me know if you find this helpful and if you'd like me to dive deeper into any of these topics. Thanks.