Rinon Tendrinomena

Detailed Steps in DOM Construction

1. HTML source code → Bytes

  • When you request a webpage (https://example.com), the server sends raw bytes across the network (TCP packets).
  • These bytes are not text yet — they’re just sequences of numbers.

Example: the string <h1> in UTF-8 encoding is sent as bytes:

60 104 49 62

(< = 60, h = 104, 1 = 49, > = 62 in ASCII/UTF-8).
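
The byte values above can be checked with a short sketch. Python is used here only for illustration; a browser does this work in native code.

```python
# Encode the markup as UTF-8 and list the resulting byte values.
markup = "<h1>"
byte_values = list(markup.encode("utf-8"))
print(byte_values)  # [60, 104, 49, 62]
```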

2. Bytes → Characters

  • The browser needs to decode the bytes into actual characters.
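  • It uses the character encoding declared by the server in the HTTP header (Content-Type: text/html; charset=UTF-8) or in the document itself (<meta charset="utf-8">). If both are present, the HTTP header takes precedence over the <meta> tag.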

Example:
Bytes 60 104 49 62 → Characters <h1>.
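
As a rough illustration of why the declared encoding matters, the sketch below decodes the same bytes in Python. The "é" example is an addition for illustration and is not part of the original walkthrough.

```python
# Decode the four bytes from step 1 back into characters.
raw = bytes([60, 104, 49, 62])
text = raw.decode("utf-8")
print(text)  # <h1>

# Outside the ASCII range the chosen encoding matters:
# "é" is two bytes in UTF-8, and decoding them as Latin-1 garbles the text.
e_acute = "é".encode("utf-8")
print(list(e_acute))              # [195, 169]
print(e_acute.decode("utf-8"))    # é
print(e_acute.decode("latin-1"))  # Ã©
```

For pure ASCII markup such as <h1>, most common encodings produce the same result, which is why a wrong charset usually shows up only in accented or non-Latin text.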

3. Characters → Tokens

  • Now the browser runs the HTML tokenizer.
  • The tokenizer scans the characters and groups them into tokens, which represent meaningful chunks of HTML.

Types of tokens:

  • StartTag token: <h1>
  • EndTag token: </h1>
  • Text token: Hello World
  • Comment token: <!-- note -->

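At this point there are no parent/child relationships yet, just a flat sequence of tokens. In practice the tokenizer hands each token to the tree builder as soon as it is emitted, so steps 3 to 5 run interleaved rather than as separate passes.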
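
To make the token stream concrete, here is a minimal sketch using Python's standard html.parser module as a stand-in for a browser tokenizer. It is not a spec-compliant HTML tokenizer, and the TokenCollector class name is made up for this example.

```python
from html.parser import HTMLParser


class TokenCollector(HTMLParser):
    """Records each token the parser reports, in source order."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        self.tokens.append(("StartTag", tag))

    def handle_endtag(self, tag):
        self.tokens.append(("EndTag", tag))

    def handle_data(self, data):
        self.tokens.append(("Text", data))

    def handle_comment(self, data):
        self.tokens.append(("Comment", data))


collector = TokenCollector()
collector.feed("<h1>Hello World</h1><!-- note -->")
collector.close()

for token in collector.tokens:
    print(token)
# ('StartTag', 'h1')
# ('Text', 'Hello World')
# ('EndTag', 'h1')
# ('Comment', ' note ')
```

The result is a flat list: nothing in it records that the text sits inside the h1.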

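4. Tokens → Nodes

Tokens that carry content are turned into nodes (objects in memory). An end tag creates no node of its own; it tells the parser that the current element is finished.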

Types of nodes:

  • Element nodes: <h1> → an element node named h1.
  • Text nodes: Hello World → a text node containing a string.
  • Comment nodes: <!-- note -->.
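
A minimal sketch of the token-to-node step might look like this. The Element, Text, and Comment classes and the node_for helper are hypothetical names for illustration, far simpler than a browser's real node objects; in this sketch an end tag yields no node.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    name: str
    children: list = field(default_factory=list)


@dataclass
class Text:
    data: str


@dataclass
class Comment:
    data: str


def node_for(token):
    """Return the node a token creates, or None if it creates no node."""
    kind, value = token
    if kind == "StartTag":
        return Element(value)
    if kind == "Text":
        return Text(value)
    if kind == "Comment":
        return Comment(value)
    return None  # an EndTag closes an element instead of creating one


tokens = [("StartTag", "h1"), ("Text", "Hello World"), ("EndTag", "h1")]
nodes = [node_for(t) for t in tokens]
print(nodes)
```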

5. Nodes → DOM Tree

The browser’s HTML tree builder arranges these nodes into a tree structure based on the nesting rules of HTML.

Example input:

<body>
  <h1>Hello</h1>
  <p>World</p>
</body>

Becomes a tree (simplified: the implied html and head elements and whitespace-only text nodes are omitted):

Document
 └── body (Element)
      ├── h1 (Element)
      │     └── "Hello" (Text)
      └── p (Element)
            └── "World" (Text)
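
The nesting step can be sketched with a stack of open elements. This is a simplified illustration built on Python's html.parser, not how a browser implements it: the TreeBuilder class and outline function are hypothetical names, and the sketch assumes well-formed input with matching end tags and no void elements such as <br>.

```python
from html.parser import HTMLParser


class TreeBuilder(HTMLParser):
    """Builds a nested tree using a stack of currently open elements."""

    def __init__(self):
        super().__init__()
        self.root = {"label": "Document", "children": []}
        self.open_elements = [self.root]

    def _append(self, label):
        # Attach a new node to whichever element is currently open.
        node = {"label": label, "children": []}
        self.open_elements[-1]["children"].append(node)
        return node

    def handle_starttag(self, tag, attrs):
        # A start tag creates an element and makes it the current parent.
        self.open_elements.append(self._append(f"{tag} (Element)"))

    def handle_endtag(self, tag):
        # An end tag closes the current element; no node is created.
        if len(self.open_elements) > 1:
            self.open_elements.pop()

    def handle_data(self, data):
        # Skip whitespace-only text to match the simplified tree above.
        if data.strip():
            self._append(f'"{data.strip()}" (Text)')


def outline(node, depth=0):
    """Return the tree as indented lines, one node per line."""
    lines = ["  " * depth + node["label"]]
    for child in node["children"]:
        lines.extend(outline(child, depth + 1))
    return lines


builder = TreeBuilder()
builder.feed("<body>\n  <h1>Hello</h1>\n  <p>World</p>\n</body>")
builder.close()
print("\n".join(outline(builder.root)))
# Document
#   body (Element)
#     h1 (Element)
#       "Hello" (Text)
#     p (Element)
#       "World" (Text)
```

The stack is what turns the flat token sequence into parent/child relationships: each new node becomes a child of whatever element is on top of the stack when it arrives.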

This DOM tree is the structure your JavaScript code queries and traverses with APIs such as document.getElementById() and document.querySelector().

Here's a summary of the pipeline:

HTML (bytes) → decode → characters → tokenize → tokens → build → nodes → structured DOM tree.
