Detailed Steps in DOM Construction

#javascript #beginners #webdev

1. HTML source code → Bytes

When you request a webpage (https://example.com), the server sends raw bytes across the network (TCP packets).
These bytes are not text yet — they’re just sequences of numbers.

Example: the string <h1> in UTF-8 encoding is sent as bytes:

60 104 49 62

(< = 60, h = 104, 1 = 49, > = 62 in ASCII/UTF-8).
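As a minimal sketch, the same byte values can be reproduced in JavaScript, assuming a runtime that provides the standard TextEncoder (modern browsers and Node.js). The variable names are illustrative only.

```javascript
// Encode the markup "<h1>" into the UTF-8 bytes that would travel over the network.
const encoder = new TextEncoder(); // TextEncoder always produces UTF-8
const h1Bytes = Array.from(encoder.encode("<h1>"));
console.log(h1Bytes); // 60, 104, 49, 62

// Characters outside ASCII need more than one byte in UTF-8.
const eAcuteBytes = Array.from(encoder.encode("\u00e9")); // "é"
console.log(eAcuteBytes); // 195, 169
```

The second call shows why a byte count and a character count are not the same thing once the text leaves the ASCII range.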

2. Bytes → Characters

The browser needs to decode the bytes into actual characters.
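It uses the character encoding declared by the server (Content-Type: text/html; charset=UTF-8) or by a <meta charset="utf-8"> tag in the HTML. If the file starts with a byte order mark (BOM), that takes precedence over both, and the HTTP header takes precedence over the <meta> tag.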

Example:
Bytes 60 104 49 62 → Characters <h1>.
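The decoding step can be sketched with the standard TextDecoder, assuming the encoding label has already been determined to be UTF-8. This only illustrates the idea; it is not the browser's internal decoder.

```javascript
// Decode raw bytes back into characters, as the browser does before parsing.
const receivedBytes = new Uint8Array([60, 104, 49, 62]);
const decoder = new TextDecoder("utf-8"); // label assumed to come from the header or <meta charset>
const characters = decoder.decode(receivedBytes);
console.log(characters); // <h1>

// A multi-byte UTF-8 sequence decodes to a single character.
const accented = decoder.decode(new Uint8Array([195, 169]));
console.log(accented.length); // 1
```

If the wrong encoding were used here, the same bytes would decode to different characters, which is why the charset declaration matters.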

3. Characters → Tokens

Now the browser runs the HTML tokenizer.
The tokenizer scans the characters and groups them into tokens, which represent meaningful chunks of HTML.

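Types of tokens: DOCTYPE, start tag (such as <h1>), end tag (such as </h1>), comment, character (the text between tags), and end-of-file.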

At this point, the browser doesn’t know about parent/child relationships yet — it just has a sequence of tokens.
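To make the idea concrete, here is a heavily simplified tokenizer sketch. The tokenize function and the token object shape are hypothetical. The sketch handles only attribute-free start tags, end tags, and text, and it groups a run of text into one token for readability. A real HTML tokenizer is a large state machine that also handles attributes, comments, DOCTYPE, and malformed input.

```javascript
// Hypothetical, simplified tokenizer: start tags, end tags, and text only.
// Anything else (attributes, comments, DOCTYPE) is not handled and is skipped.
function tokenize(html) {
  const tokens = [];
  const pattern = /<\/([a-zA-Z][a-zA-Z0-9]*)>|<([a-zA-Z][a-zA-Z0-9]*)>|([^<]+)/g;
  for (const match of html.matchAll(pattern)) {
    if (match[1] !== undefined) {
      tokens.push({ type: "EndTag", name: match[1].toLowerCase() });
    } else if (match[2] !== undefined) {
      tokens.push({ type: "StartTag", name: match[2].toLowerCase() });
    } else {
      tokens.push({ type: "Character", data: match[3] });
    }
  }
  return tokens;
}

const h1Tokens = tokenize("<h1>Hello</h1>");
console.log(h1Tokens);
// { type: "StartTag", name: "h1" }
// { type: "Character", data: "Hello" }
// { type: "EndTag", name: "h1" }
```

Note that the output is a flat list: nothing in it says that "Hello" belongs inside the h1.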

4. Tokens → Nodes
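Tokens that represent content are turned into nodes (objects in memory): a start tag token creates an Element node, and character tokens become Text nodes. End tag tokens do not create a node; they signal that the current element is complete.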

Types of nodes: Document, DocumentType, Element, Text, and Comment.
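A rough sketch of this mapping, using plain objects as stand-ins for real DOM nodes. The createNode helper, the token shape, and the property names are assumptions made for illustration; they are not browser APIs.

```javascript
// Hypothetical helper: turn one token into a plain-object stand-in for a DOM node.
// Returns null for tokens that do not create a node (for example, end tags).
function createNode(token) {
  switch (token.type) {
    case "StartTag":
      return { nodeType: "Element", tagName: token.name, children: [] };
    case "Character":
      return { nodeType: "Text", data: token.data };
    case "Comment":
      return { nodeType: "Comment", data: token.data };
    default:
      return null;
  }
}

const elementNode = createNode({ type: "StartTag", name: "p" });
const textNode = createNode({ type: "Character", data: "World" });
const noNode = createNode({ type: "EndTag", name: "p" });
console.log(elementNode.nodeType, textNode.nodeType, noNode); // Element Text null
```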

5. Nodes → DOM Tree
The browser’s HTML tree builder arranges these nodes into a tree structure based on the nesting rules of HTML.

Example input:

<body>
  <h1>Hello</h1>
  <p>World</p>
</body>

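Becomes this tree (simplified: a real parser also adds the implied html and head elements and keeps the whitespace between tags as Text nodes):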

Document
 └── body (Element)
      ├── h1 (Element)
      │     └── "Hello" (Text)
      └── p (Element)
            └── "World" (Text)
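The nesting logic can be sketched with a stack of open elements, which is the core idea behind the tree builder. The buildTree function and the object shapes below are hypothetical, the token list is written by hand, and the sketch assumes well-formed input. A real tree builder also recovers from errors and inserts implied elements.

```javascript
// Hypothetical, simplified tree builder using a stack of open elements.
// Assumes well-formed input and skips whitespace-only text for readability.
function buildTree(tokens) {
  const documentNode = { nodeType: "Document", children: [] };
  const openElements = [documentNode];
  for (const token of tokens) {
    const parent = openElements[openElements.length - 1];
    if (token.type === "StartTag") {
      const element = { nodeType: "Element", tagName: token.name, children: [] };
      parent.children.push(element);
      openElements.push(element); // later tokens become its descendants
    } else if (token.type === "EndTag") {
      if (openElements.length > 1) openElements.pop(); // close the current element
    } else if (token.type === "Character" && token.data.trim() !== "") {
      parent.children.push({ nodeType: "Text", data: token.data });
    }
  }
  return documentNode;
}

const tree = buildTree([
  { type: "StartTag", name: "body" },
  { type: "StartTag", name: "h1" },
  { type: "Character", data: "Hello" },
  { type: "EndTag", name: "h1" },
  { type: "StartTag", name: "p" },
  { type: "Character", data: "World" },
  { type: "EndTag", name: "p" },
  { type: "EndTag", name: "body" },
]);

const bodyNode = tree.children[0];
console.log(bodyNode.children.map((child) => child.tagName)); // h1, p
console.log(bodyNode.children[0].children[0].data); // Hello
```

The element on top of the stack is always the parent of whatever comes next, which is how a flat token list turns into parent/child relationships.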

This DOM tree is the structure your JavaScript code can query and traverse with APIs like document.getElementById() and element.children.

Here's the summary of the whole pipeline:

HTML (bytes) → decode → characters → tokenize → tokens → create nodes → nodes → build tree → DOM tree.