Firstly, we need to understand the web browser on your device a little better. In this article I'll refer to three core components of browsers.
The first is the rendering engine (also called the browser engine), which reads HTML and CSS files and renders (outputs) the content on the screen. This is the component that creates the DOM! It can actually be used outside the browser, for example email clients use a rendering engine to display HTML email. You may have heard of the rendering engines used in popular browsers - Blink (Chromium browsers, i.e. Chrome, recent versions of Microsoft Edge and many more), Gecko (Firefox) and Webkit (Safari).
These components work together inside your browser to produce webpages. They tend to be written mainly in the programming language C++.
The core functionality that browsers provide is, like the web itself, not centralised, but based on certain standards. When referring to the features browsers make available to developers, I'll refer to the Mozilla Developer Network web docs rather than the actual standards, because they are a more accessible guide to the tools available to us and how they are implemented in different browsers.
Well, the browser provides it in a variable called
window to access the stuff on it, just saying
someProperty is the same as saying
window.someProperty (in most cases).
The definition of what the browser should provide on the window object has been standardised, using interfaces. This is an object-orientated programming term which refers to the description of an object, rather than the object itself. Though an interface is generally a point of interaction, here it means the description of an object, because that enables the interaction of objects to happen smoothly, since they know what properties and methods another object has.
Here's two things we should know about interfaces:
The interface name is written in PascalCase as a convention.
Interfaces can take properties and methods from other interfaces, by inheriting them from an ancestor interface, or getting them from an unrelated interface called a mixin. We'll see this later.
Here's MDN's documentation on the interface for the window object: Window.
Have a look and you'll see there's quite a lot on there. The functionality the browser gives us to communicate with it is known as Web APIs.
API stands for application programming interface. In other words, someone wrote an application, in this case the browser, and they also wrote a set of features and rules so you could interface (interact) with it using programming.
When you call
fetch(), or any other Web API method, you are making use of the runtime environment provided by the browser. The main difference with these methods is that they are asynchronous, which means they don't necessarily run immediately after the previous command in your JS code - you make a request for an action, which is queued up and runs when possible. For example in the case of
fetch(), there will be a delay while it gets the requested resource.
The Web APIs make use of objects with properties and methods, just like the window object. In the fetch API, one of these is the Response object. The API defines exactly what the structure of the object should be.
But we're not going to talk about all the weird and wonderful APIs available to us in the browser: we want to know what the DOM is. There's just one more thing to look at first: a property of the window object called document.
Just like how the window object is the container for almost all of the 'global' stuff (console, scrollbars, window dimensions etc.) in your browser, the document is a container for the content, i.e. the webpage itself. It represents what you give the browser, not what's already there. This can be an HTML, XML or SVG document, but we're just going to talk about HTML.
You can give your browser a HTML file by asking it to open one stored locally on your device, or you can request to view a website, causing the browser to retrieve the file from that website's server via the internet. The browser's rendering engine (mentioned at the beginning) then does two things: parse the HTML (read the code line by line), then create a tree of elements.
When I say create a tree, I'm not talking about planting. It's one way of storing data with a programming language, by creating objects that have 'family' relationships between them. These 'family' relationships are the same you create in an HTML document.
The relationships are defined by edges (which clearly ought to be called 'branches', but never mind...). The objects at the end of an edge are known as nodes, because this means the place where lines join (it's also the place where a leaf and stem join on a plant, so it's a bit closer to the tree metaphor). But remember, a node is still just a type of object.
The node at the very top of the tree is called the root. Visually, the structure would be sort of like a tree. What the browser creates is known as a document tree: a node tree where the root node is a document. It stores information about the document in that root node, and each HTML element on the page and any text inside them also has its own node.
Let's finally talk about the DOM.
The DOM, technically, is not the document tree, i.e. the data structure itself. It's the model that describes how the data should be stored and interacted with. However, you will often hear people saying things like 'manipulating the DOM', which is simpler to say than 'manipulating the document tree'. I'll use DOM in this sense too, for convenience.
The technical term for it is an 'object model', which means it defines some objects and how they can be manipulated, but we don't need to worry about that. Just know that's what DOM stands for: Document Object Model.
To be clear, the DOM is a generic API for manipulating documents. There is a specific offshoot for HTML called the HTML DOM API (remember that other types of documents can be modelled by the DOM). But this distinction doesn't really affect us practically.
We can see the interfaces we need in MDN's documentation on the DOM and HTML DOM. (The 'official' description is currently WHATWG's DOM Living Standard, and the HTML DOM is defined in WHATWG's HTML Living Standard.)
Let's use an example to understand interfaces.
window as the global object), I have access to the
document object, as discussed.
It's described by the Document interface. On the list of methods, you will see Document.querySelector(). This lets me use CSS selector syntax to get an element from the document - in this case, an HTML element, because our document is HTML.
Now say I have an
<input> element in my HTML file with an id
const input = document.querySelector('#my-input');
In my example, I now have a variable pointing to the element object. Specifically, it's a HTML input element, described by the HTMLInputElement interface (part of the HTML DOM). You can see from the properties listed that I can access the value (the text) in the input and read/write it. Pretty useful.
Now looking at the methods, you'll see things like blur() and focus(). Very useful too. But look at where they come from - they are inherited from HTMLElement. My
input is a type of HTMLElement, so it gets properties and methods shared by all HTML elements.
The inheritance doesn't stop there - HTMLElement is a type of Element (now we're back in the generic DOM API). There's some useful stuff there too, like setAttribute(), so I could add, say, a class on my input field in certain circumstances.
Let's keep moving on up. An element is a type of Node. We know what those are. Element isn't the only type of node - Document is also, of course, a type of Node, since it's the root node of the tree. And we mentioned before that the text inside an element gets its own node, Text, which you can read/write from the node with the textContent property.
Note: we may be confused here because there is also an HTMLElement.innerText and an Element.innerHTML property. As MDN explains, these properties have poorer performance and
innerHTML can leave you vulnerable to cross-site scripting (e.g. I get the value from my input and set the
innerHTML of a
div somewhere else to whatever it is - someone could have written a
textContent is the better property to use.
Handily, browsers give us some nice tools that help us view and interact with the DOM.
Here I opened Chrome developer tools on the Google homepage and inspected their festive logo
The Elements tab shows us the image tag and its place in the document. It looks like it's just an HTML tag, but it's not. We could see the original HTML by right-clicking the page and selecting 'view page source'.
In fact, the Elements tab is a visual representation of the DOM, and the elements in it are objects.
Let's prove this by going to the Console tab. If we enter
$0 (the Console shortcut for logging the element currently selected in the Elements tab) this will just show us the same representation. But if I use
console.dir I can see the object:
Here we can see all the object's properties, including those inherited properties.
We can see the prototype object by expanding the
__proto__ property. If we kept following the chain up we'd end up at
console.log(window) on a blank HTML page you could still find them. When I accessed the logo
Hopefully you now have a better idea of what DOM objects are and how to inspect them. If you want to learn more about inspecting the DOM with Chrome devtools, Google provides a guide here.
Now we understand the DOM and how to use it, let’s look more closely at the process of rendering a page, so we can think more carefully about how we use the DOM.
This is why it's often advised that you put the
<script> tag at the bottom of your HTML, so the HTML can be loaded first. Alternatively, you can change the default behaviour by using the
async attributes on the script tag.
The browser also creates a CSS Object Model (CSSOM). This is similar to the DOM, but instead of representing your HTML document, it represents your CSS style sheets and their content with interfaces.
It’s an API, so you could interact with it to alter your styles, but you are usually better off defining all the styles you will need in your stylesheet first, then if necessary changing what they apply to using the DOM, by altering the class names on your elements (or using the
style attribute on the elements if you prefer).
To get ready for rendering, the DOM and CSSOM are combined to create another tree, the render tree. Anything that won’t be displayed on the page, e.g. the
<head> element, is excluded. The render tree contains all the information the browser needs to display the webpage.
The browser assembles the layout of elements on the page (like doing a pencil sketch before a painting), then paints the elements to the screen.
This means that if we respond to user interaction on the page by changing the DOM, the browser will have to do some work to re-layout and repaint items on the page. This has a performance cost, and could be what we would call expensive in performance terms. However, the browser responds to events efficiently as possible, only doing as much re-layout and repainting as necessary. This is explained in Tali Garsiel's research on how browsers work.
Bear that in mind, because there is sometimes a misconception that the reason we have fancy front-end frameworks is that the DOM itself is slow. That wouldn't make sense - frameworks still have to use the DOM, so they couldn't possibly make it faster. Really, it's all down to how you use the DOM.
Let's look briefly at the history and present of DOM manipulation to understand this.
Modern libraries and frameworks don't need to tackle deficiency in the DOM, but they do aim to improve your efficiency and productivity in using it. It's not the sole reason they exist, but it's a big one.
If you are writing a simple website with limited user interaction, you probably won't run into the efficiency problem, provided you're not doing something very silly performance-wise with your DOM manipulation. But simple sites are not all we have on the web today – web applications such as Facebook are very common.
The core front end libraries and frameworks most used today are React, Angular and Vue.js. These aim to take efficient DOM manipulation off your hands, so there is more emphasis on what you want the page to look like, not how this should be achieved. If you want to make web applications professionally, your best bet is to simply pick one of these frameworks and learn it (you don't have to, but most companies use one of them or one like them).
Let's recap the key points:
- The DOM is an API provided by browsers, but the term is also often used to refer to the document tree. The document tree is a model of your HTML document created by the browser's rendering engine.
- Front end libraries and frameworks can help improve your productivity with the DOM, but you should be aware of why you are using them to ensure you get the best out of them.
Thanks for reading and happy DOM manipulation! 🙂
I cross-reference my sources as much as possible. If you think some information in this article is incorrect, please leave a polite comment or message me with supporting evidence 🙂.
* = particularly recommended for further study
- Browser engine - Wikipedia
- Window - MDN
- API - MDN Glossary
- Tree (data structure) - Wikipedia
- What is the Document Object Model? - w3.org
- * Document Object Model (and related pages) - MDN
- * Ryan Seddon: So how does the browser actually render a website | JSConf EU 2015
- How Browsers Work: Behind the scenes of modern web browsers - Tali Garsiel, published at html5rocks.com
Document tree image credit: Birger Eriksson, CC BY-SA 3.0, via Wikimedia Commons (side banner removed)