1. HTML source code → Bytes
- When you request a webpage (https://example.com), the server sends raw bytes across the network (TCP packets).
- These bytes are not text yet — they’re just sequences of numbers.
Example: the string <h1> in UTF-8 encoding is sent as the bytes 60 104 49 62 (< = 60, h = 104, 1 = 49, > = 62 in ASCII/UTF-8).
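The byte values above can be checked with a short Python sketch. Python is used here only for illustration (browsers are not written this way), and the variable names are made up for the example.

```python
# Encode the markup "<h1>" the way a server would before sending it.
markup = "<h1>"
raw = markup.encode("utf-8")   # a bytes object: b'<h1>'

# Each byte is just an integer between 0 and 255.
byte_values = list(raw)
print(byte_values)             # [60, 104, 49, 62]
```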
2. Bytes → Characters
- The browser needs to decode the bytes into actual characters.
- It uses the character encoding specified by the server (Content-Type: text/html; charset=UTF-8) or by the HTML <meta charset="utf-8"> tag.
Example: bytes 60 104 49 62 → characters <h1>.
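A small Python sketch of the decoding step, assuming UTF-8 as in the example above. The second half shows why the declared encoding matters: the same bytes decoded with the wrong encoding (latin-1 is picked here purely as an illustration) produce different characters.

```python
# Decode the bytes from the example back into characters.
raw = bytes([60, 104, 49, 62])
text = raw.decode("utf-8")
print(text)                            # <h1>

# The same bytes mean different things under different encodings.
accented = "é".encode("utf-8")         # two bytes: 0xC3 0xA9
right = accented.decode("utf-8")       # decoded with the correct encoding
wrong = accented.decode("latin-1")     # decoded with the wrong encoding
print(right, wrong)                    # é Ã©
```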
3. Characters → Tokens
- Now the browser runs the HTML tokenizer.
- The tokenizer scans the characters and groups them into tokens, which represent meaningful chunks of HTML.
Types of tokens:
- StartTag token: <h1>
- EndTag token: </h1>
- Text token: Hello World
- Comment token: <!-- note -->
At this point, the browser doesn’t know about parent/child relationships yet — it just has a sequence of tokens.
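To see a flat token sequence like this, here is a minimal sketch built on Python's standard html.parser module. It is not the tokenizer a browser uses and it does not follow every rule of the HTML specification; the class name TokenCollector and the token labels are assumptions made for this example.

```python
from html.parser import HTMLParser


class TokenCollector(HTMLParser):
    """Records each piece of markup as a (kind, value) pair, in order."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        self.tokens.append(("StartTag", tag))

    def handle_endtag(self, tag):
        self.tokens.append(("EndTag", tag))

    def handle_data(self, data):
        self.tokens.append(("Text", data))

    def handle_comment(self, data):
        self.tokens.append(("Comment", data))


collector = TokenCollector()
collector.feed("<h1>Hello World</h1><!-- note -->")
collector.close()

# The result is a flat list: no nesting information yet.
for token in collector.tokens:
    print(token)
```

The list holds four entries in document order: a start tag, a text run, an end tag, and a comment.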
4. Tokens → Nodes
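Tokens are turned into nodes (objects in memory). A start tag creates an element node, text creates a text node, and a comment creates a comment node. An end tag creates no node of its own; it only marks where its element ends.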
Types of nodes:
- Element nodes: <h1> → an element node named h1.
- Text nodes: Hello World → a text node containing a string.
- Comment nodes: <!-- note --> → a comment node.
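As a rough sketch of this step, the Python below maps (kind, value) tokens to small node objects. The class names ElementNode, TextNode, and CommentNode and the token format are hypothetical, chosen to mirror the lists above rather than any real browser API.

```python
from dataclasses import dataclass, field


@dataclass
class ElementNode:
    tag: str
    children: list = field(default_factory=list)  # filled in by the tree builder


@dataclass
class TextNode:
    data: str


@dataclass
class CommentNode:
    data: str


def node_for(token):
    """Return the node a token creates, or None if it creates none."""
    kind, value = token
    if kind == "StartTag":
        return ElementNode(value)
    if kind == "Text":
        return TextNode(value)
    if kind == "Comment":
        return CommentNode(value)
    return None  # an EndTag closes an element but creates no node


tokens = [
    ("StartTag", "h1"),
    ("Text", "Hello World"),
    ("EndTag", "h1"),
    ("Comment", " note "),
]
nodes = [n for n in map(node_for, tokens) if n is not None]
print(nodes)
```

Four tokens yield three nodes here, because the end tag only signals that the h1 element is finished.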
5. Nodes → DOM Tree
The browser’s HTML tree builder arranges these nodes into a tree structure based on the nesting rules of HTML.
Example input:
<body>
  <h1>Hello</h1>
  <p>World</p>
</body>
Becomes a tree (simplified: a real parser also adds the implied html and head elements):
Document
└── body (Element)
    ├── h1 (Element)
    │   └── "Hello" (Text)
    └── p (Element)
        └── "World" (Text)
This DOM tree is the structure your JavaScript code can query and traverse with APIs like document.getElementById().
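The nesting step can be sketched with a stack of open elements: a start tag pushes a new element, an end tag pops back to its match, and text attaches to whatever element is currently on top. This is a simplified illustration in Python using the standard html.parser module. The Node and TreeBuilder classes and the dump helper are assumptions for this example, and the sketch skips the error recovery, implied elements, and void-element handling that a real tree builder performs.

```python
from html.parser import HTMLParser


class Node:
    def __init__(self, name, text=None):
        self.name = name        # tag name, "Document", or "#text"
        self.text = text        # only set for text nodes
        self.children = []


class TreeBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.document = Node("Document")
        self.open_elements = [self.document]  # stack; last item is the current parent

    def handle_starttag(self, tag, attrs):
        element = Node(tag)
        self.open_elements[-1].children.append(element)
        self.open_elements.append(element)

    def handle_endtag(self, tag):
        # Pop back to the nearest open element with this name, if there is one.
        for i in range(len(self.open_elements) - 1, 0, -1):
            if self.open_elements[i].name == tag:
                del self.open_elements[i:]
                break

    def handle_data(self, data):
        if data.strip():  # skip whitespace-only text to keep the output short
            self.open_elements[-1].children.append(Node("#text", data))


def dump(node, depth=0):
    """Return the tree as indented lines, one per node."""
    label = repr(node.text) if node.name == "#text" else node.name
    lines = ["  " * depth + label]
    for child in node.children:
        lines.extend(dump(child, depth + 1))
    return lines


builder = TreeBuilder()
builder.feed("<body><h1>Hello</h1><p>World</p></body>")
builder.close()
outline = dump(builder.document)
print("\n".join(outline))
```

The printed outline has the same shape as the tree above: Document, then body, then h1 and p as siblings, each with one text child.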
Here's the summary of the pipeline:
Bytes → (decode) → characters → (tokenize) → tokens → nodes → DOM tree.