1. HTML source code → Bytes
- When you request a webpage (https://example.com), the server sends raw bytes across the network (TCP packets).
- These bytes are not text yet — they’re just sequences of numbers.
Example: the string <h1> in UTF-8 encoding is sent as bytes:
60 104 49 62
(< = 60, h = 104, 1 = 49, > = 62 in ASCII/UTF-8).
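To see those numbers for yourself, here is a minimal Python sketch. The helper name utf8_bytes is made up for this illustration; it only shows what the encoded bytes look like, not how a server actually sends them.

```python
def utf8_bytes(text):
    """Return the UTF-8 encoding of text as a list of byte values."""
    return list(text.encode("utf-8"))

# ASCII characters take one byte each.
print(utf8_bytes("<h1>"))  # [60, 104, 49, 62]

# Characters outside ASCII take more than one byte in UTF-8.
print(utf8_bytes("é"))     # [195, 169]
```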
2. Bytes → Characters
- The browser needs to decode the bytes into actual characters.
- It uses the character encoding declared by the server in the Content-Type header (Content-Type: text/html; charset=UTF-8) or in the HTML itself (<meta charset="utf-8">).
Example:
Bytes 60 104 49 62 → Characters <h1>.
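The decoding step can be sketched in Python as below. This is a rough illustration, not what a browser really does: the function names are hypothetical, the fallback to UTF-8 when no charset is declared is an assumption of this sketch, and the <meta charset> lookup is left out entirely.

```python
from email.message import Message

def charset_from_content_type(header_value, default="utf-8"):
    """Pull the charset parameter out of a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() returns None when no charset is declared.
    return msg.get_content_charset() or default  # default is an assumption

def decode_body(raw, content_type):
    """Decode raw response bytes using the charset named in the header."""
    encoding = charset_from_content_type(content_type)
    # Undecodable bytes become U+FFFD instead of raising an error.
    return raw.decode(encoding, errors="replace")

print(decode_body(bytes([60, 104, 49, 62]), "text/html; charset=UTF-8"))  # <h1>
```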
3. Characters → Tokens
- Now the browser runs the HTML tokenizer.
- The tokenizer scans the characters and groups them into tokens, which represent meaningful chunks of HTML.
Common token types (the HTML specification also defines DOCTYPE and end-of-file tokens):
- StartTag token: <h1>
- EndTag token: </h1>
- Text token: Hello World
- Comment token: <!-- note -->
At this point, the browser doesn’t know about parent/child relationships yet — it just has a sequence of tokens.
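To make the "flat sequence of tokens" idea concrete, here is a small sketch built on Python's standard html.parser module. It is not a spec-compliant HTML5 tokenizer like the one in a browser, and the TokenCollector class and tokenize function are names invented for this example.

```python
from html.parser import HTMLParser

class TokenCollector(HTMLParser):
    """Record each piece of markup as a (type, value) pair, in order."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        self.tokens.append(("StartTag", tag))

    def handle_endtag(self, tag):
        self.tokens.append(("EndTag", tag))

    def handle_data(self, data):
        self.tokens.append(("Text", data))

    def handle_comment(self, data):
        self.tokens.append(("Comment", data))

def tokenize(html):
    """Return a flat list of tokens; no nesting information is kept."""
    collector = TokenCollector()
    collector.feed(html)
    collector.close()
    return collector.tokens

for token in tokenize("<h1>Hello World</h1><!-- note -->"):
    print(token)
```

The result is just a list: nothing in it says that the text belongs inside the h1. That relationship only appears once the tree is built.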
4. Tokens → Nodes
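The tree builder turns tokens into nodes (objects in memory). Start tags, text, and comments each produce a node; an end tag creates nothing and only marks where its element closes.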
Types of nodes:
- Element nodes: <h1> → an element node named h1.
- Text nodes: Hello World → a text node containing the string.
- Comment nodes: <!-- note --> → a comment node.
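A token-to-node mapping might look like the sketch below. The Node dataclass and node_from_token function are hypothetical stand-ins for the browser's internal objects, and the tokens are assumed to be simple (type, value) pairs.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    kind: str                      # "element", "text", or "comment"
    value: str                     # tag name, text content, or comment body
    children: list = field(default_factory=list)

def node_from_token(token):
    """Create the node for one token, or None if the token creates no node."""
    kind, value = token
    if kind == "StartTag":
        return Node("element", value)
    if kind == "Text":
        return Node("text", value)
    if kind == "Comment":
        return Node("comment", value)
    # End tags (and anything unrecognized) produce no node in this sketch.
    return None

print(node_from_token(("StartTag", "h1")))
print(node_from_token(("Text", "Hello World")))
print(node_from_token(("EndTag", "h1")))  # None
```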
5. Nodes → DOM Tree
The browser’s HTML tree builder arranges these nodes into a tree structure based on the nesting rules of HTML.
Example input:
<body>
  <h1>Hello</h1>
  <p>World</p>
</body>
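Becomes a tree (simplified: a real parser also inserts the html and head elements, plus text nodes for the whitespace between tags):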
Document
 └── body (Element)
      ├── h1 (Element)
      │     └── "Hello" (Text)
      └── p (Element)
            └── "World" (Text)
Now this DOM tree is the structure your JavaScript code can traverse with APIs like document.getElementById().
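The nesting itself can be sketched with a stack of open elements: a start tag pushes a new element, an end tag pops back to its match. This is a heavily simplified illustration with made-up names (Node, build_tree, render); it skips whitespace-only text to match the diagram above and ignores the error-recovery rules a real tree builder follows.

```python
class Node:
    def __init__(self, name, text=None):
        self.name = name          # tag name, "#text", "#comment", or "#document"
        self.text = text          # content for text and comment nodes
        self.children = []

def build_tree(tokens):
    """Arrange (type, value) tokens into a tree using a stack of open elements."""
    document = Node("#document")
    open_elements = [document]    # the last item is the current parent
    for kind, value in tokens:
        if kind == "StartTag":
            element = Node(value)
            open_elements[-1].children.append(element)
            open_elements.append(element)
        elif kind == "EndTag":
            # Close everything up to and including the nearest matching element.
            for i in range(len(open_elements) - 1, 0, -1):
                if open_elements[i].name == value:
                    del open_elements[i:]
                    break
        elif kind == "Text" and value.strip():
            # Simplification: whitespace-only text is dropped.
            open_elements[-1].children.append(Node("#text", value))
        elif kind == "Comment":
            open_elements[-1].children.append(Node("#comment", value))
    return document

def render(node, depth=0):
    """Return the tree as indented lines, one node per line."""
    label = repr(node.text) if node.name == "#text" else node.name
    lines = ["  " * depth + label]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines

tokens = [
    ("StartTag", "body"),
    ("StartTag", "h1"), ("Text", "Hello"), ("EndTag", "h1"),
    ("StartTag", "p"), ("Text", "World"), ("EndTag", "p"),
    ("EndTag", "body"),
]
print("\n".join(render(build_tree(tokens))))
```

Running it prints the same shape as the diagram: body under the document, h1 and p under body, and each text node under its element.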
Here's the structure summary:
HTML source → bytes → (decode) → characters → (tokenize) → tokens → (tree construction) → nodes → DOM tree.
 
 
              
 
    