Jaison D Souza

How does HTML work? In simple words.

HTML is a markup language responsible for the structure of a web page.
The skeleton of a web page is HTML.

Instead of talking about various tags and how to use them, let us understand how HTML works, how it renders, and how the browser displays it.

When a client (your browser) asks a server for a web page, which happens whenever you open a link, the server responds with an HTML document.
It is plain text, nothing else.

<html>
  <head>
    <title> My Web Page </title>
  </head>
  <body>
    <p> Hello World </p>
  </body>
</html>

The browser cannot directly render this raw text in a meaningful way.
So it must convert the text into internal structures that it understands.

I will briefly explain the steps involved.


Tokenization and State Machine

Step 1: Tokenization

The browser reads the HTML character by character and converts it into tokens.
The rules for producing these tokens are defined by the HTML specification.

The HTML code shown earlier would be converted into tokens like these (the whitespace between tags is left out to keep the list short):

  • StartTag: html
  • StartTag: head
  • StartTag: title
  • Text: My Web Page
  • EndTag: title
  • EndTag: head
  • StartTag: body
  • StartTag: p
  • Text: Hello World
  • EndTag: p
  • EndTag: body
  • EndTag: html

Why is this conversion needed, and how does it happen?

This process is implemented using a state machine.

The state machine has multiple states, and it moves between them based on the characters it reads, such as <, >, /, and text content. These transitions let the tokenizer identify opening tags, closing tags, and text.

The state machine follows strict rules to convert the input stream into tokens.
Although it may look complicated, it is essentially a series of conditional transitions (similar to if-else logic).
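To make the if-else idea concrete, here is a minimal sketch of such a state machine in JavaScript. It is not how a real browser does it: the state names and the tokenize function are my own, it only understands plain tags and text (no attributes, comments, or DOCTYPE), and it drops whitespace-only text to keep the output short.

```javascript
// Minimal tokenizer sketch: a state machine with three states.
// Assumption: input contains only plain tags and text.
function tokenize(html) {
  const tokens = [];
  let state = "data";   // one of "data", "tagOpen", "tagName"
  let buffer = "";      // characters collected for the current token
  let isEndTag = false;

  // Emit buffered text, skipping whitespace-only runs between tags.
  const flushText = () => {
    const text = buffer.trim();
    if (text) tokens.push({ type: "Text", value: text });
    buffer = "";
  };

  for (const ch of html) {
    if (state === "data") {
      if (ch === "<") {
        flushText();
        isEndTag = false;
        state = "tagOpen";
      } else {
        buffer += ch;
      }
    } else if (state === "tagOpen") {
      if (ch === "/") {
        isEndTag = true;      // "</" starts an end tag
      } else {
        buffer = ch;          // first character of the tag name
      }
      state = "tagName";
    } else if (state === "tagName") {
      if (ch === ">") {
        const type = isEndTag ? "EndTag" : "StartTag";
        tokens.push({ type, name: buffer.toLowerCase() });
        buffer = "";
        state = "data";
      } else {
        buffer += ch;
      }
    }
  }
  flushText(); // trailing text after the last tag, if any
  return tokens;
}

const sampleTokens = tokenize("<body><p> Hello World </p></body>");
console.log(sampleTokens);
```

Running it on the small snippet gives five tokens: StartTag body, StartTag p, Text "Hello World", EndTag p, EndTag body. The real tokenizer in the HTML specification has many more states, but the shape is the same: look at the current state, look at the next character, decide what to do.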

Why do we need tokens?
Tokens make it easier to convert the HTML into a DOM because each token says explicitly what it is: a start tag, an end tag, or a piece of text.


DOM, CSSOM, and Render Tree

Step 2: Tokens → DOM

The next step is building the DOM (Document Object Model).

The tokens are processed and converted into a tree structure called the DOM.
Each node in the tree represents an element, a piece of text, or a comment from the HTML document. Attributes are stored on their element node rather than as separate nodes in the tree.

For the HTML shown earlier, the DOM is a hierarchical structure starting with the html root node, branching into head (containing title → My Web Page) and body (containing p → Hello World).
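Here is a rough sketch of how a flat list of tokens can become a tree, using a stack of currently open elements. The buildTree and treeLines functions and the node shapes are made up for illustration; they are not the real DOM API, and the sketch assumes every end tag matches the most recently opened element.

```javascript
// Tree-construction sketch: a stack of open elements turns a flat
// token list into a nested tree. Node shapes here are hypothetical.
function buildTree(tokens) {
  const root = { type: "document", children: [] };
  const openElements = [root]; // the last entry is the current parent

  for (const token of tokens) {
    const parent = openElements[openElements.length - 1];
    if (token.type === "StartTag") {
      const element = { type: "element", name: token.name, children: [] };
      parent.children.push(element);
      openElements.push(element);  // later tokens become its children
    } else if (token.type === "Text") {
      parent.children.push({ type: "text", value: token.value });
    } else if (token.type === "EndTag" && openElements.length > 1) {
      openElements.pop();          // close the current element
    }
  }
  return root;
}

// Return one indented line per node, so the tree shape is easy to see.
function treeLines(node, depth = 0) {
  const label = node.type === "text" ? JSON.stringify(node.value) : node.name;
  const lines = ["  ".repeat(depth) + label];
  for (const child of node.children || []) {
    lines.push(...treeLines(child, depth + 1));
  }
  return lines;
}

const pageTokens = [
  { type: "StartTag", name: "html" },
  { type: "StartTag", name: "head" },
  { type: "StartTag", name: "title" },
  { type: "Text", value: "My Web Page" },
  { type: "EndTag", name: "title" },
  { type: "EndTag", name: "head" },
  { type: "StartTag", name: "body" },
  { type: "StartTag", name: "p" },
  { type: "Text", value: "Hello World" },
  { type: "EndTag", name: "p" },
  { type: "EndTag", name: "body" },
  { type: "EndTag", name: "html" },
];

const dom = buildTree(pageTokens);
console.log(treeLines(dom.children[0]).join("\n"));
```

The printed outline has html at the top, head and body one level in, and the two text nodes at the deepest level. A real browser also has to recover from broken markup (missing or mismatched end tags), which this sketch ignores.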


Step 3: DOM + CSSOM → Render Tree

When CSS is added, the browser parses it into another structure called the CSSOM (CSS Object Model).

  • DOM contains structure and content
  • CSSOM contains styling rules

The browser combines the DOM and CSSOM to build the Render Tree.

The Render Tree contains only visible elements and the styles that apply to them (for example, elements with display: none are excluded).

The Render Tree defines:

  • what needs to be displayed
  • how it should be styled
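The combining step can be sketched like this. Everything here is simplified and assumed for illustration: styleRules is a plain object standing in for the CSSOM (tag name to declarations), every element defaults to display: block, and there is no selector matching, cascade, or inheritance.

```javascript
// Render-tree sketch: walk the DOM, attach each element's styles, and
// skip anything with display: none together with its whole subtree.
// `styleRules` is a hypothetical stand-in for the CSSOM.
function buildRenderTree(node, styleRules) {
  if (node.type === "text") {
    return { type: "text", value: node.value, children: [] };
  }
  const style = { display: "block", ...(styleRules[node.name] || {}) };
  if (style.display === "none") return null; // not part of the render tree

  const children = node.children
    .map((child) => buildRenderTree(child, styleRules))
    .filter((child) => child !== null);
  return { type: "box", name: node.name, style, children };
}

const bodyNode = {
  type: "element", name: "body", children: [
    { type: "element", name: "p",
      children: [{ type: "text", value: "Hello World" }] },
    { type: "element", name: "aside",
      children: [{ type: "text", value: "Hidden note" }] },
  ],
};
const styleRules = {
  p: { color: "navy" },
  aside: { display: "none" },
};

const renderTree = buildRenderTree(bodyNode, styleRules);
// aside is dropped, so only p remains under body
console.log(renderTree.children.map((child) => child.name));
```

Notice that the aside element is still in the DOM (JavaScript can find it and change it), it just never makes it into the render tree.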

Note on JavaScript:
When JavaScript modifies the DOM, for example:

document.body.innerHTML += "<div>Hello</div>"

the browser may trigger layout recalculation (reflow) and repainting.
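However, the entire tree is not always re-rendered; only the affected parts are updated when possible.
Keep in mind that innerHTML += is a heavy way to add content, because the browser re-parses everything inside body. Adding a single node with appendChild or insertAdjacentHTML touches much less of the tree.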


Layout and Paint

Step 4: Layout Calculation (Reflow)

In this step, the browser calculates the exact position and size of each node in the Render Tree.

For every node, the browser computes:

  • width
  • height
  • position

This process is recursive and can be expensive, especially when changes affect large portions of the page.

The result of layout is the exact geometry (size and position) of every box, which the paint step needs in order to generate the final visual output.
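A tiny sketch shows why this is recursive. It assumes a very simple model that I made up for illustration: every box fills its parent's width, children stack vertically, and each text node is exactly one line of LINE_HEIGHT pixels. Real layout (inline text, floats, flexbox, grid) is far more involved.

```javascript
// Block-layout sketch: compute x, y, width, and height for every node.
// LINE_HEIGHT, `padding`, and the node shapes are illustrative assumptions.
const LINE_HEIGHT = 20; // pretend each text node is one 20px line

function layout(node, x, y, width) {
  const box = { name: node.name || "text", x, y, width, height: 0, children: [] };
  if (node.type === "text") {
    box.height = LINE_HEIGHT;
    return box;
  }
  const padding = node.padding || 0;
  let cursorY = y + padding;             // where the next child starts
  for (const child of node.children) {
    const childBox = layout(child, x + padding, cursorY, width - 2 * padding);
    box.children.push(childBox);
    cursorY += childBox.height;          // stack children top to bottom
  }
  box.height = cursorY - y + padding;    // content plus top and bottom padding
  return box;
}

const renderTreeRoot = {
  type: "box", name: "body", padding: 8, children: [
    { type: "box", name: "p",
      children: [{ type: "text", value: "Hello World" }] },
    { type: "box", name: "p",
      children: [{ type: "text", value: "Second paragraph" }] },
  ],
};

const layoutTree = layout(renderTreeRoot, 0, 0, 800);
console.log(layoutTree.height, layoutTree.children.map((child) => child.y));
```

In this model the height of a parent depends on its children, and the position of each child depends on the siblings before it. So if the first paragraph grows, every box after it moves and every ancestor gets taller, which is why reflow can get expensive.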


Step 5: Paint

During the paint phase, the browser converts layout information into actual pixels.

This includes painting:

  • text
  • colors
  • borders
  • backgrounds
  • shadows

In simple terms:

Layout → Paint → Pixels on the screen
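One way to picture the paint step is as a walk over the laid-out boxes that produces an ordered list of drawing commands. The sketch below is only an illustration under my own assumptions: the paint function, the command names (fillRect, strokeRect, drawText), and the box shapes are hypothetical, and the real painting order follows more detailed CSS rules.

```javascript
// Paint sketch: turn a laid-out tree into an ordered list of drawing
// commands. Command names and box fields are hypothetical.
function paint(box, commands = []) {
  const rect = { x: box.x, y: box.y, width: box.width, height: box.height };
  // A box's own background and border go first...
  if (box.background) commands.push({ op: "fillRect", color: box.background, ...rect });
  if (box.border) commands.push({ op: "strokeRect", color: box.border, ...rect });
  if (box.text) commands.push({ op: "drawText", text: box.text, x: box.x, y: box.y });
  // ...then its children, so they end up drawn on top of it.
  for (const child of box.children || []) paint(child, commands);
  return commands;
}

const laidOut = {
  x: 0, y: 0, width: 800, height: 36, background: "white",
  children: [
    { x: 8, y: 8, width: 784, height: 20, border: "gray",
      children: [{ x: 8, y: 8, width: 784, height: 20, text: "Hello World" }] },
  ],
};

const displayList = paint(laidOut);
console.log(displayList.map((command) => command.op));
```

The order matters: the white background is filled first, then the gray border, then the text, so the text is never hidden behind the background of its own container.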


Optional next step (for completeness)

Modern browsers often perform an additional step called Compositing, where painted layers are combined and optimized (often using the GPU) before being displayed on the screen.


That's all. Thank you for reading till the end. You should now have a basic idea of how HTML and a small part of the browser work. These are my learnings from studying how HTML works. If there are any mistakes, feel free to correct me.
