SATYA SOOTAR


How a Browser Works: A Beginner-Friendly Guide to Browser Internals

What Happens When You Type google.com in the Browser?

Before we dive into the internals of the browser, let's take a quick look at what happens when you type google.com into the browser's address bar.

DNS Resolution

The browser doesn't know where google.com lives. Machines on the internet find each other by IP address; names like google.com, amazon.com, or chaicode.com exist for us humans. To reach a website, the browser needs the IP address of the machine that hosts it.

The browser delegates this task to DNS (the Domain Name System). Think of DNS as the phone book of the internet: it maps domain names to their IP addresses.
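To make the lookup concrete, here is a minimal sketch in Python that asks the operating system's resolver for the addresses behind a name. The helper name `resolve` is made up for this example, and a real browser adds its own caching on top of this step.

```python
import socket


def resolve(hostname: str) -> list[str]:
    """Return the unique IP addresses the OS resolver reports for hostname."""
    # getaddrinfo() consults the local hosts file and DNS on our behalf.
    results = socket.getaddrinfo(hostname, 443, proto=socket.IPPROTO_TCP)
    addresses = []
    for _family, _type, _proto, _canonname, sockaddr in results:
        ip = sockaddr[0]  # sockaddr is (ip, port) for IPv4, (ip, port, ...) for IPv6
        if ip not in addresses:
            addresses.append(ip)
    return addresses
```

Calling `resolve("google.com")` needs network access and returns whatever addresses DNS hands out at that moment, so the result differs between machines and over time.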

We will not do a deep dive into DNS now. But if you want to understand how DNS works, then please visit this blog.

Protocols: TCP and UDP

After DNS resolution, we get the IP address of the machine. But as a client, we can't directly ask for the resources. We need to establish a proper connection first by following a protocol.

There are a lot of protocols, but in this blog we will briefly discuss only two of them:

  • TCP
  • UDP

1. Transmission Control Protocol (TCP)

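  • It is a reliable, connection-oriented protocol.
  • Delivers packets (chunks of data) to the application in the order they were sent.
  • Detects packet loss.
  • Retransmits lost packets.
  • Provides flow control, so a fast sender does not overwhelm a slow receiver.
  • Provides congestion control, so senders slow down when the network is overloaded.
  • Used for web browsing (HTTP), email, and file transfer.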
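As a rough illustration of these properties, the sketch below runs a tiny echo server and a client on the same machine over the loopback interface. The function names `run_echo_server` and `tcp_round_trip` are invented for this example; the handshake, ordering, and retransmission are all handled by the operating system, not by this code.

```python
import socket
import threading


def run_echo_server(server_sock: socket.socket) -> None:
    """Accept one client and echo back everything it sends."""
    conn, _addr = server_sock.accept()
    with conn:
        while True:
            data = conn.recv(1024)
            if not data:  # empty bytes means the client finished sending
                break
            conn.sendall(data)


def tcp_round_trip(chunks: list[bytes]) -> bytes:
    """Send chunks over a local TCP connection and return what comes back."""
    # Port 0 asks the OS to pick any free port on the loopback interface.
    server_sock = socket.create_server(("127.0.0.1", 0))
    port = server_sock.getsockname()[1]
    worker = threading.Thread(target=run_echo_server, args=(server_sock,), daemon=True)
    worker.start()

    # create_connection() calls connect(), where the OS does the handshake.
    with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
        for chunk in chunks:
            client.sendall(chunk)
        client.shutdown(socket.SHUT_WR)  # tell the server no more data is coming
        received = b""
        while True:
            data = client.recv(1024)
            if not data:
                break
            received += data

    worker.join(timeout=5)
    server_sock.close()
    return received
```

However the chunks are split up on the wire, the bytes come back in the order they were sent, which is the ordering guarantee from the list above.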

2. User Datagram Protocol (UDP)

  • It is connectionless and gives no delivery guarantee.
  • Lost packets are not retransmitted.
  • Packets may arrive out of order.
  • No congestion control.
  • Used where speed matters more than completeness, such as video calls and DNS lookups.
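For contrast, here is a small sketch of UDP on the loopback interface. The helper name `udp_send_and_receive` is made up for this example. Notice that there is no connect step and no acknowledgement: the sender just fires a datagram at an address.

```python
import socket


def udp_send_and_receive(message: bytes) -> bytes:
    """Send one datagram to a local UDP socket and return what it received."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(("127.0.0.1", 0))  # port 0: let the OS choose a free port
        receiver.settimeout(2.0)         # give up instead of waiting forever
        address = receiver.getsockname()

        sender.sendto(message, address)  # no handshake, no delivery guarantee
        data, _sender_address = receiver.recvfrom(2048)
        return data
    finally:
        sender.close()
        receiver.close()
```

On the loopback interface the datagram practically always arrives. On a real network it could be lost, and nothing in UDP itself would notice or resend it, which is why the receiver uses a timeout.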

To connect to a web server, the browser usually uses TCP, which opens the connection with a 3-way handshake.

We will not do a deep dive into protocols now. But if you are interested in how the 3-way handshake works, please visit this blog. It will help you understand it in the simplest way possible.

HTTPS and HTTP

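A TCP connection on its own is not encrypted. After it is established, the browser and the server perform a TLS handshake, in which they verify the server's certificate and agree on keys for encryption and decryption. HTTP running over this encrypted connection is what we call HTTPS.

After that, the browser sends an HTTP request over this connection to get the resources (the web page) of the website.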
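Here is a hedged sketch of these two steps in Python. `build_get_request`, `parse_status_line`, and `fetch_over_https` are names invented for this example. It speaks plain HTTP/1.1 to keep the text readable, while real browsers usually negotiate HTTP/2 or HTTP/3 and send many more headers.

```python
import socket
import ssl


def build_get_request(host: str, path: str = "/") -> bytes:
    """Build the raw bytes of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",  # ask the server to close after one response
        "",                   # blank line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_status_line(response: bytes) -> tuple[str, int, str]:
    """Split the first line of a response into (version, status code, reason)."""
    status_line = response.split(b"\r\n", 1)[0].decode("ascii")
    version, code, reason = status_line.split(" ", 2)
    return version, int(code), reason


def fetch_over_https(host: str, path: str = "/") -> bytes:
    """TCP connect, TLS handshake, then one HTTP request. Needs network access."""
    context = ssl.create_default_context()  # verifies the server certificate
    with socket.create_connection((host, 443), timeout=5) as tcp_sock:
        # wrap_socket() performs the TLS handshake on top of the TCP connection.
        with context.wrap_socket(tcp_sock, server_hostname=host) as tls_sock:
            tls_sock.sendall(build_get_request(host, path))
            response = b""
            while chunk := tls_sock.recv(4096):
                response += chunk
    return response
```

The first two helpers work offline, so you can inspect the request bytes without touching the network. `fetch_over_https` is the part that actually opens a connection.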

What Is a Browser?

A web browser is software that enables users to access and view content on the World Wide Web. Its primary function is to locate and retrieve web pages, images, videos, documents, and other files from servers and display them on the user’s device.

When you type a website’s URL into the browser and hit Enter, the browser sends a request to the server where the website’s files are stored using protocols like HTTP or HTTPS. The server responds by sending back files, usually written in HTML, CSS, or JavaScript, which the browser interprets and displays as a web page.

Components of a Browser

Browser Components

1. User Interface

This is the part of the browser's own software that we see and interact with: the address bar, back and forward buttons, extensions area, and so on.

In simple terms, it is every part of the browser except the window where you see the requested page.

Browser UI

2. Browser Engine

A browser engine is a bridge between the user interface and the rendering engine. It controls and coordinates everything that happens inside the browser.

Think of it as the brain of the browser.

3. Rendering Engine

The rendering engine is responsible for loading and displaying web content. It starts by getting the contents of the requested document from the networking layer.

Critical Rendering Path

Parser

  • The HTML bytes are parsed by the HTML parser and converted into a DOM tree. DOM stands for Document Object Model; the name goes back to the early days of the web, when HTML files were called documents.

index.html

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <title>DOM and Frame Tree Example</title>

    <link rel="stylesheet" href="styles.css" />
  </head>

  <body>
    <p>
      Hello
      <span>Students</span>
    </p>

    <div>
      <img src="https://via.placeholder.com/120" alt="Sample Image" />
    </div>
  </body>
</html>


DOM (Document Object Model)
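To get a feel for the HTML parser step, here is a toy DOM builder written with Python's built-in `html.parser` module. The names `Node`, `TinyDomBuilder`, `build_dom`, and `outline` are made up for this sketch. A real browser parser follows the much stricter HTML specification and recovers from broken markup, which this one does not.

```python
from html.parser import HTMLParser

# Elements that never have children or a closing tag.
VOID_TAGS = {"meta", "link", "img", "br", "hr", "input"}


class Node:
    def __init__(self, name, text=None):
        self.name = name      # tag name, "#text", or "#document"
        self.text = text      # only set for text nodes
        self.children = []


class TinyDomBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.stack = [self.root]  # path of currently open elements

    def handle_starttag(self, tag, attrs):
        node = Node(tag)
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)  # following nodes become its children

    def handle_endtag(self, tag):
        # Naive: assumes tags are properly nested.
        if tag not in VOID_TAGS and len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:  # skip whitespace-only text between tags
            self.stack[-1].children.append(Node("#text", text))


def build_dom(html):
    builder = TinyDomBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def outline(node, depth=0):
    """Return the tree as indented lines, one per node."""
    label = node.name if node.text is None else f'{node.name} "{node.text}"'
    lines = ["  " * depth + label]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines
```

Running `print("\n".join(outline(build_dom(html))))` on the index.html listing prints an indented tree that should roughly mirror the DOM diagram.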

  • The CSS file is parsed by the CSS parser and converted into a CSSOM. CSSOM stands for CSS Object Model (CSS itself stands for Cascading Style Sheets).

styles.css

/* BODY */
body {
  margin: 20px;
  font-family: Arial, sans-serif;
}

/* P FRAME */
p {
  font-size: 16px;
}

/* TEXT NODE STYLE (SPAN INSIDE P) */
p span {
  font-size: 16px;
  font-weight: bold;
}

/* IMAGE FRAME */
img {
  font-size: 16px; /* included only to mirror the diagram */
  float: right;
}


CSSOM (CSS Object Model)
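The CSS parser step can be sketched in a similar spirit. The function `parse_css` below is a made-up helper that turns simple rules into a dictionary. It ignores at-rules, nested blocks, specificity, and the cascade, all of which a real CSSOM has to handle.

```python
import re


def parse_css(css: str) -> dict[str, dict[str, str]]:
    """Map each selector to its declarations, e.g. {"p": {"font-size": "16px"}}."""
    # Strip /* ... */ comments first so they do not end up inside selectors.
    css = re.sub(r"/\*.*?\*/", "", css, flags=re.DOTALL)
    rules: dict[str, dict[str, str]] = {}
    # Each match is one "selector { declarations }" block.
    for selector, body in re.findall(r"([^{}]+)\{([^{}]*)\}", css):
        declarations = rules.setdefault(selector.strip(), {})
        for declaration in body.split(";"):
            if ":" in declaration:
                prop, value = declaration.split(":", 1)
                declarations[prop.strip()] = value.strip()
    return rules
```

Feeding it the stylesheet listing above gives one entry per selector (`body`, `p`, `p span`, `img`), which is a flat approximation of what the CSSOM diagram shows as a tree.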

  • The DOM and the CSSOM are separate trees, and the browser can build them in parallel. Frame construction needs both.

Frame Constructor

  • A frame constructor transforms DOM nodes with computed styles into frame objects.
  • It is responsible for creating frames for elements and text.
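As a simplified picture of frame construction, the sketch below walks a hand-built DOM and attaches styles to produce frames. `DomNode`, `Frame`, `DEFAULT_DISPLAY`, and `construct_frames` are hypothetical names, and styles are looked up by tag name only. This is not how Gecko implements it, just the general idea.

```python
from dataclasses import dataclass, field


@dataclass
class DomNode:
    name: str                                   # tag name or "#text"
    children: list = field(default_factory=list)
    text: str = ""


@dataclass
class Frame:
    kind: str                                   # "block", "inline", or "text"
    source: str                                 # tag name, or the text itself
    style: dict
    children: list = field(default_factory=list)


# Assumed defaults, standing in for the browser's built-in stylesheet.
DEFAULT_DISPLAY = {"head": "none", "span": "inline", "img": "inline"}


def construct_frames(node, computed_styles):
    """Return a Frame for node, or None if the node is not displayed."""
    if node.name == "#text":
        return Frame("text", node.text, {})
    style = computed_styles.get(node.name, {})
    display = style.get("display", DEFAULT_DISPLAY.get(node.name, "block"))
    if display == "none":
        return None  # no frame is created, so its children are skipped too
    frame = Frame(display, node.name, style)
    for child in node.children:
        child_frame = construct_frames(child, computed_styles)
        if child_frame is not None:
            frame.children.append(child_frame)
    return frame
```

The point to notice is that the frame tree is not a copy of the DOM: nodes that are not displayed, such as `head`, get no frame at all, while text gets its own frames.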

Frame Tree

Reflow

  • Reflow is the layout step; it handles all the geometry calculations.
  • Reflow computes the size and position of each frame in the frame tree.

Reflow process:

  • Start at the root frame
  • Apply CSS box model rules
  • Calculate widths and heights
  • Position children relative to parents
  • Propagate layout information
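The reflow steps above can be sketched as a small recursive function. `Box` and `reflow` are hypothetical names, every box is treated as a block that stacks vertically, and margin collapsing, padding, borders, and floats are left out on purpose.

```python
from dataclasses import dataclass, field


@dataclass
class Box:
    name: str
    margin: int = 0
    content_height: int = 0          # intrinsic height, used by leaf boxes
    children: list = field(default_factory=list)
    x: int = 0
    y: int = 0
    width: int = 0
    height: int = 0


def reflow(box, x, y, available_width):
    """Compute position and size for box and all of its descendants."""
    # Box model: the margin pushes the box inward on every side.
    box.x = x + box.margin
    box.y = y + box.margin
    box.width = available_width - 2 * box.margin

    # Position children relative to this box, stacking them top to bottom.
    cursor_y = box.y
    for child in box.children:
        reflow(child, box.x, cursor_y, box.width)
        cursor_y += child.height + 2 * child.margin

    # Propagate layout information upward: the parent grows to fit its children.
    box.height = max(cursor_y - box.y, box.content_height)
    return box
```

For example, a `body` box with a 20px margin in an 800px-wide viewport ends up 760px wide, and its children are placed one below the other starting at y = 20.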

Painting

  • Before painting, the browser converts the frame tree into a display list.
  • Painting converts this display list into actual pixels.
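A display list is just an ordered list of drawing commands. The sketch below uses a grid of characters in place of real pixels. The `fill_rect` command and the `paint` function are invented for this example, and real engines have many more command types such as text, borders, and images.

```python
def paint(display_list, width, height, background="."):
    """Rasterize fill_rect commands onto a character grid, in list order."""
    canvas = [[background] * width for _ in range(height)]
    for op, x, y, w, h, color in display_list:
        if op != "fill_rect":
            continue  # this sketch only understands one command type
        # Clip the rectangle to the canvas, then fill it cell by cell.
        for row in range(max(y, 0), min(y + h, height)):
            for col in range(max(x, 0), min(x + w, width)):
                canvas[row][col] = color
    return ["".join(row) for row in canvas]
```

Because commands run in order, a later command paints over an earlier one, which is how content stacked on top hides what is underneath.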

The rendering engine flow explained above is from Firefox's Gecko engine. Other rendering engines follow a similar flow.

Below is the rendering engine flow of Safari's WebKit engine. Both are almost the same, with minor differences in terminology.

WebKit Render Engine Flow

Gecko calls the tree of visually formatted elements a frame tree, where each element is a frame. WebKit uses the term render tree, and it consists of render objects. WebKit uses the term layout for placing elements, while Gecko calls it reflow.

Attachment is WebKit’s term for connecting DOM nodes and visual information to create the render tree. In Gecko, this occurs during frame construction.

4. Networking Layer

This is the layer where all the network calls are made.

Discussing the networking layer of the browser is a separate rabbit hole and not within the scope of this blog.

5. JavaScript Interpreter

The JavaScript interpreter, or JavaScript engine, executes JavaScript code. Through browser APIs, that code can read and modify the DOM and the CSSOM.

Browsers do not all share one JavaScript engine, although Chromium-based browsers such as Chrome and Edge all use V8. Node.js is also built on V8.

Examples:

  • Chrome - V8 Engine
  • Safari - JavaScriptCore
  • Firefox - SpiderMonkey

Javascript Engine

6. UI Backend

The UI backend is used to draw basic widgets like combo boxes, alerts, pop-up windows, and frames.

This backend exposes a generic interface that is not platform-specific. Underneath, it uses operating system UI methods.

7. Data Storage

The browser needs to save data locally (cookies, cache, etc.), so the data storage component handles this part.

Modern browsers also support storage mechanisms like:

  • localStorage
  • IndexedDB
  • File System
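As one small example of the data storage component, here is a sketch of how cookies could be kept per host. `CookieStoreSketch` is a hypothetical in-memory class: real browsers persist cookies to disk and enforce rules such as Path, expiry, Secure, and SameSite, all of which this sketch ignores.

```python
from http.cookies import SimpleCookie


class CookieStoreSketch:
    """Hypothetical in-memory cookie store, keyed by host name."""

    def __init__(self):
        self._by_host = {}

    def save_from_header(self, host, set_cookie_value):
        """Store the cookie carried by one Set-Cookie header value."""
        parsed = SimpleCookie()
        parsed.load(set_cookie_value)  # attributes like Path are parsed, not enforced
        jar = self._by_host.setdefault(host, {})
        for name, morsel in parsed.items():
            jar[name] = morsel.value

    def cookie_header_for(self, host):
        """Build the Cookie header value to send back to host."""
        jar = self._by_host.get(host, {})
        return "; ".join(f"{name}={value}" for name, value in jar.items())
```

The key idea is the scoping: cookies saved for one host are never sent to another host.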

Conclusion

Flow of the Internet

When you type google.com in the browser and hit Enter, the browser first resolves the domain name through DNS to get the IP address of the server. It then opens a connection to the server (usually over TCP), secures it with TLS, and sends an HTTP request over that encrypted connection (HTTPS) to fetch the website's resources. Once the response comes back, the rendering engine parses the HTML into the DOM and the CSS into the CSSOM, constructs frames, calculates layout through reflow, and finally paints pixels on the screen so the web page becomes visible to us.


Hope you liked this blog. If there’s any mistake or something I can improve, do tell me. You can find me on LinkedIn and X, where I post more stuff.
