Marcio Frayze

Posted on Nov 16, 2022 • Edited on Dec 29, 2022

Virtual DOM: What problem does it solve?

#elm #react #webdev #javascript

A Brazilian Portuguese version of this article is available here.

I vividly remember a conversation I had with a co-worker in 2014. He was enthusiastic about a JavaScript library developed internally by Facebook that the company had decided to make open-source. I'm talking, of course, about React.

Like many at the time, I was skeptical. Mixing JavaScript with HTML?! What a horrible idea! And JSX? For heaven's sake, no! Leave my HTML templates alone! It is obvious that CSS should be separated from HTML, and HTML from JavaScript! 😤

My opinion was no exception. The web development community in general was not very receptive to React's ideas. At a time when many people, including myself, still believed that two-way data binding was the future, the ideas proposed by React's team were very disruptive, and those promoting the project had difficulty conveying the advantages of this new approach.

As the years passed, my view changed, as did the opinion of many in the frontend community. And as you may know, today React holds a significant slice of the market and has influenced many other libraries and frameworks (not only for the web but for other kinds of graphical interfaces, including mobile and desktop applications).

But from the very beginning, in the first versions of React, one thing caught my attention: the Virtual DOM. I was intrigued by that idea. What problem was it trying to solve? 🤔

What is a Virtual DOM anyway?

Explaining to a developer the concept of Virtual DOM is relatively easy. It is nothing more than a JavaScript object that somehow represents a possible DOM state.

The idea is that instead of changing the DOM objects directly, a representation of it is kept in memory in a JavaScript object. When any change to the screen is needed, a new Virtual DOM is created, with a new representation of the page. The library will then compare these two objects (the previous version which represents the old screen and the one that represents the new content that should be displayed), find out what changes need to be applied to the browser DOM so that it looks the same as what is in the Virtual DOM, and finally apply these changes to the browser DOM to sync it with the new Virtual DOM.
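To make this more concrete, here is a minimal sketch of the idea in plain JavaScript. The `h` helper, the `diff` function and the patch names (`CREATE`, `REMOVE`, `REPLACE`, `SET_PROP`) are made up for illustration and are not the API of React, Elm or any other library. Real implementations also deal with keys, reordering and event handlers, which this sketch ignores.

```javascript
// Hypothetical helper: builds a plain object describing one element.
// Text nodes are represented as plain strings.
function h(tag, props, children) {
  return { tag, props: props || {}, children: children || [] };
}

// Compares two virtual trees and returns the list of changes ("patches")
// that would have to be applied to the browser DOM.
// `path` is the list of child indexes that leads from the root to the node.
function diff(oldNode, newNode, path = []) {
  if (oldNode === undefined) return [{ type: "CREATE", path, node: newNode }];
  if (newNode === undefined) return [{ type: "REMOVE", path }];

  // Text nodes: either identical or replaced.
  if (typeof oldNode === "string" || typeof newNode === "string") {
    return oldNode === newNode ? [] : [{ type: "REPLACE", path, node: newNode }];
  }

  // A different tag means the whole subtree is replaced.
  if (oldNode.tag !== newNode.tag) {
    return [{ type: "REPLACE", path, node: newNode }];
  }

  const patches = [];

  // Same tag: compare properties (a value of undefined means "remove it").
  const names = new Set([
    ...Object.keys(oldNode.props),
    ...Object.keys(newNode.props),
  ]);
  for (const name of names) {
    if (oldNode.props[name] !== newNode.props[name]) {
      patches.push({ type: "SET_PROP", path, name, value: newNode.props[name] });
    }
  }

  // Then compare the children position by position.
  const length = Math.max(oldNode.children.length, newNode.children.length);
  for (let i = 0; i < length; i++) {
    patches.push(...diff(oldNode.children[i], newNode.children[i], [...path, i]));
  }
  return patches;
}

// The previous screen and the new screen, both as plain objects.
const before = h("ul", { class: "places" }, [
  h("li", {}, ["Food bank A"]),
  h("li", {}, ["Food bank B"]),
]);
const after = h("ul", { class: "places" }, [
  h("li", {}, ["Food bank A"]),
  h("li", {}, ["Food bank C"]),
]);

const patches = diff(before, after);
console.log(JSON.stringify(patches));
// prints [{"type":"REPLACE","path":[1,0],"node":"Food bank C"}]
```

Only one text node differs between the two trees, so only one patch is produced. A library would then apply that single change to the browser DOM and leave everything else alone.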

When I asked my colleague what the main advantage of using a Virtual DOM was, he replied without hesitation: "making changes to the DOM is very costly, so making the changes using a Virtual DOM is much faster!". 🚀

I understand that updating (or creating a new) JavaScript object is much easier and faster than changing the actual DOM of the browser. But to display or change something on the screen, we have to change the DOM anyway! That is, with the addition of the Virtual DOM, to make any visual changes, we now need to update the information in two places: in the Virtual DOM and in the browser DOM. This second part is done automatically by React itself (or any other library that uses the concept of Virtual DOM), but still needs to be done. Without this, the user would not be able to see the new representation.

So I couldn't understand how something like this would increase performance! The browser needs to do what it did before (update the DOM) and now has even more work to do: update the Virtual DOM, diff it against the previous version to find out what changed, create the patches that make the two representations equal and, after all this, finally update the DOM. 😮‍💨

💡 And the answer to my question is really simple: Virtual DOM doesn't increase your application's performance!

When I continued this discussion with other developers, I was told that it would be faster for large applications. But I couldn't understand how that would be possible... At least not in the magical way they were trying to sell me. 🧚

So I decided to do what any stubborn person usually does: I refused to use Virtual DOM. But since I'm also curious, I allowed myself to keep this idea in the back of my head, hibernating. Until I understood the real motivation for its creation and what problem it solves, I wouldn't use it.

My first Single-Page Application: Implementing a SPA imperatively

It took me a while to get into the single-page applications wave. When the first versions of Angular, React, Elm and so many other solutions for creating SPAs appeared, at work I was still implementing systems using the old-fashioned JSF (JavaServer Faces) and, at home, I was trying out Ruby on Rails and also learning how to develop native apps for iOS and Android.

But one day in 2014 I decided to implement a new page in my spare time and felt it would be a good time to finally understand how to implement a SPA. It's no longer available but, in short, it was a webapp where you could find (and register) places to donate food.

The backend was implemented using RackStep, a Ruby library I was developing at the time, in conjunction with MongoDB. To keep things simple, I chose Heroku as the cloud provider. For the frontend, I decided to implement everything in pure JavaScript! I would never do this in an application at work but, as it was something quite experimental, I chose to do everything without any library or framework and figure out in practice - through a lot of pain, trial and error - the kind of problems those fancy JavaScript libraries and frameworks could help me with in future projects. The question I wanted to answer was essentially this: what is the minimum number of dependencies required to develop a SPA? At first I thought it would be zero. And so it was, until production. Even the build scripts were written in good old Bash. To be honest, I used some dependencies to minify (reduce the size of) the files. I also used the Google Maps API. That's all.

Early on it became clear that I would need some way to control the state of the application. What's happening? What step is the user at? Is she reading the description? Registering a new address? Looking for addresses in the region?...

These were some of the easy-to-predict states. A small state machine was enough. But because it was a SPA that communicated with a server, each request could result in some kind of error. What should happen when a request returns an error? What if it times out? What if the address the user entered is not found? What if the person does not allow access to their location in the browser?!

🚨 For every question I tried to answer, the state machine grew exponentially, as did the number of bugs! 🐛

One of the first architectural decisions I made was to opt for a synchronous flow. Even though the experience might be suboptimal, each step needed to be very well defined. While the app was waiting for a response from the server, every other action on the screen was blocked and an animation in the "Wait, loading information" style was shown. That helped a lot! But it didn't solve everything. The bugs kept popping up 🐛 and the code became increasingly annoying to maintain. Clearly that solution wouldn't scale well. 🙁

Another recurring problem was how to synchronize what appeared on the screen with the application data. For example, when a person typed an address, this would be recorded directly in the fields on the screen (and only there). When the user pressed a button, it was necessary to:

Get this information directly from the DOM (doing a query in the input fields);
Validate the entry;
- If it was a failure, change the DOM to display an error message and stop the flow;
Lock the screen (displaying a "Wait, loading" message), make the AJAX call to the backend and wait for its return;
Validate the backend's response;
- In case of failure, display error message and stop the flow;
Finally, display a message indicating that the address has been successfully registered!
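The steps above can be sketched roughly as follows. This is not the original code of the app (which is no longer available), only an illustration of the style. The `createFakeDom` stand-in, the field names and the `saveAddress(address, callback)` wrapper around the AJAX call are all assumptions made so that the sketch can run outside a browser.

```javascript
// Stand-in for the browser DOM so the sketch also runs outside a browser.
// In a real page these objects would come from document.querySelector(...).
function createFakeDom() {
  return {
    addressInput: { value: "" },
    message: { textContent: "" },
    loadingOverlay: { hidden: true },
  };
}

// Click handler for a hypothetical "Register" button.
// `saveAddress(address, callback)` is an assumed wrapper around the AJAX
// call; it invokes callback(error, response) when the backend answers.
function onRegisterClick(dom, saveAddress) {
  // 1. The only copy of the data lives in the DOM, so read it from there.
  const address = dom.addressInput.value.trim();

  // 2. Validate the entry; on failure, write the error into the DOM and stop.
  if (address === "") {
    dom.message.textContent = "Please enter an address.";
    return;
  }

  // 3. Lock the screen and call the backend.
  dom.loadingOverlay.hidden = false;
  dom.message.textContent = "Wait, loading information";

  saveAddress(address, function (error, response) {
    // Every exit path has to remember to unlock the screen.
    dom.loadingOverlay.hidden = true;

    // 4. Validate the backend's response; on failure, show an error and stop.
    if (error || !response || response.ok !== true) {
      dom.message.textContent = "Could not register the address.";
      return;
    }

    // 5. Finally, report success.
    dom.message.textContent = "Address successfully registered!";
  });
}

// Usage with a stubbed backend that always succeeds.
const dom = createFakeDom();
dom.addressInput.value = "Some Street, 123";
onRegisterClick(dom, function (address, callback) {
  callback(null, { ok: true });
});
console.log(dom.message.textContent); // prints "Address successfully registered!"
```

Notice that every branch reads from or writes to the screen by hand. Nothing in the code describes what the screen should look like for a given state; it only describes how to get from one screen to the next.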

In addition, I still needed to handle all the other kinds of errors that an asynchronous process can generate.

Throughout this process, something began to annoy me:

🚨 I was saving the entire state of my data model in the DOM. In the same place where I was supposed to display the representation of this data. That is, the view layer was mixing with the model/business layer.

Any visual change could lead to changes in the queries I used to obtain and assign business values. I isolated it as much as I could, but it bothered me. A lot.

I tried different approaches to separate the layers, but it was very difficult to keep the data in sync. It vaguely followed the ideas of two-way data binding, but it was very common for the business layer to be out of sync with the view layer (DOM). In the end I chose to leave the whole state in the DOM. It was the easiest way to avoid headaches, but I wasn't happy about it. I knew that, again, this solution wouldn't scale very well. 🙁

In the back of my mind, one question kept coming up: would a Virtual DOM help me here? But the answer would not come until a few years later.

First experiences with declarative software development

Around 2017 I began to be interested in functional programming. I went straight into the most extreme language I found at the time: Haskell. After reading a lot and taking some online courses, I could barely get past the famous "hello world". 😬 It was tough. The potential advantages of that approach were noticeable, but in practice I could do almost nothing. Frustrated, I looked for simpler alternatives. I didn't want to give up, but I decided to postpone this adventure into Haskell. I had clearly bitten off more than I could chew, especially since there was no one around with the necessary knowledge to guide me properly. That's when I decided to learn the Elm programming language. 🌳

Elm has many similarities to Haskell. Both are functional, strongly typed and pure languages. The big difference is that Elm is focused on the development of webapps, while Haskell is a general-purpose programming language. In addition, there are several other technical differences that make Haskell a more complex language.

My goal was to use Elm as a bridge: I would learn the basics through it and then migrate to Haskell. But in the middle of the process I fell in love with Elm's philosophy and ended up postponing this trip to Haskell to this day. And it was through this language that I finally had a clearer first view of the advantages of using that idea of Virtual DOM! 🤩

The Elm Architecture

To develop webapps in Elm I was forced to understand The Elm Architecture, also known as Model View Update (or just MVU). I have already described this model in detail in this video (in Brazilian Portuguese), and one of its fundamental features is that it is a declarative model. And this is the magic word: once I understood its meaning, the Virtual DOM started to make perfect sense.

💡 The main motivation of creating the Virtual DOM is to enable a declarative approach in developing the view layer of a webapp.

The MVU Architecture is divided into 3 parts:

Model — a data structure where the state (model) of your application is stored.
View — a function that takes as a parameter a model and returns a data structure representing a new version of the DOM (in other words, returns a Virtual DOM).
Update — a function that allows you to "change" (create a new) model through messages.
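As a rough illustration, the three parts could look like this for a simple counter. The sketch is written in JavaScript rather than Elm, so it only imitates the shape of the architecture. The names `init`, `update` and `view`, the message strings and the object layout of the virtual nodes are assumptions chosen for this example.

```javascript
// Model: the whole application state in one plain value.
const init = { count: 0 };

// Update: receives a message and the current model and returns a NEW model.
// The model passed in is never mutated.
function update(message, model) {
  switch (message) {
    case "Increment":
      return { count: model.count + 1 };
    case "Decrement":
      return { count: model.count - 1 };
    default:
      return model;
  }
}

// View: receives a model and returns a description of the screen
// (a Virtual DOM made of plain objects), not real DOM nodes.
// `onClick` holds the message a runtime would send to `update` on a click.
function view(model) {
  return {
    tag: "div",
    children: [
      { tag: "button", onClick: "Decrement", children: ["-"] },
      { tag: "span", children: [String(model.count)] },
      { tag: "button", onClick: "Increment", children: ["+"] },
    ],
  };
}

const next = update("Increment", init);
console.log(view(next).children[1].children[0]); // prints "1"
console.log(init.count); // prints 0, the original model is untouched
```

In Elm the runtime wires these pieces together: it calls `view`, listens for events, passes the resulting messages to `update` and calls `view` again with the new model.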

I will not go into details of the entire MVU model in this article. I'll focus on the first two letters: M and V (Model and View). They are the ones that will solve most of the problems I described having encountered in my previous attempt, when I tried to create my first single-page application using an imperative approach. And they are the ones hiding the secret advantages of using a Virtual DOM. 🥷

If you want to understand the MVU model in practice, I recommend watching this video (in Brazilian Portuguese).

The view function

The view function is very simple. As I said earlier, its responsibility is quite specific: from a model, it should be able to produce an equivalent presentation layer (view). The best way to think about it is as a pure transformation function: the input is a data structure representing the current state of the application, and the output is a representation of what should be displayed on the screen for that state. Each time the application state changes, the view function is re-executed to get the new screen representation.

💡 The view function allows you to generate the entire representation of the screen from scratch, ignoring any previous context. There is no other state being kept anywhere. Screen representation always depends solely and exclusively on the current state of the application.

That is why we call this approach declarative. The sequence of steps that led the application to its current state does not matter. The only thing that matters is the current state itself.
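Here is a small sketch of what that means for the address form from my first SPA. The model shape (`address`, `status`, `error`), the status names and the `el` helper are hypothetical; the point is only that the screen is computed from the model and from nothing else.

```javascript
// Hypothetical helper that builds a virtual node as a plain object.
const el = (tag, props, children) => ({ tag, props, children });

// The whole screen is derived from the model alone. Nothing is read from
// the previous screen or from the browser DOM.
function view(model) {
  const children = [el("input", { value: model.address }, [])];

  switch (model.status) {
    case "loading":
      children.push(el("p", {}, ["Wait, loading information"]));
      break;
    case "failed":
      children.push(el("p", {}, ["Error: " + model.error]));
      break;
    case "saved":
      children.push(el("p", {}, ["Address successfully registered!"]));
      break;
    default:
      children.push(el("button", {}, ["Register"]));
  }
  return el("form", {}, children);
}

// Two sessions that reached the same state through different histories:
// one succeeded at the first attempt, the other after two timeouts.
// The model does not record the history, so the values are equal.
const firstTry = { address: "Some Street, 123", status: "saved" };
const afterTwoTimeouts = { address: "Some Street, 123", status: "saved" };

const sameScreen =
  JSON.stringify(view(firstTry)) === JSON.stringify(view(afterTwoTimeouts));
console.log(sameScreen); // prints true
```

Each status from the old state machine becomes one branch of a single function, instead of a set of DOM changes scattered across event handlers.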

And that's the big difference from the imperative approach. In an imperative implementation, you have to take care of the synchronization between the state of the application and what is shown on the screen. When one changes, you must make sure that the other is updated too. Any small error in this process can generate inconsistencies, and debugging this type of scenario can be quite stressful. With the help of the view function, this sync becomes fully transparent! As I said, every time the model changes, the view function is re-executed.

But maybe you're thinking, what about the other way around? When you change the screen, how will the model be updated? And the answer is simple:

💡 You never update the screen (DOM objects) directly! That's why this model is called one-way data flow.

And how is the model updated?

In the MVU architecture, some rules are enforced. In addition to not being allowed to change the objects of the DOM directly, it is also forbidden to change the model (application state) directly. The only way to change the model is within the update function.

But this part deviates a little from the scope of defining a Virtual DOM and therefore will be part of another article. 😉

Couldn't we do all this directly into the DOM?

One last important question that needs to be answered is:

🤔 Couldn't we just re-generate the entire DOM every time there is a change, thus ruling out the need for a virtual DOM?

In theory, it's possible. But this approach brings two major problems.

The first one is performance-related. Re-creating the entire browser DOM every time any changes are required is very costly and time-consuming. Perhaps this will become possible in the future, but today this approach would perform very poorly even on extremely simple pages.

The second big problem is even worse: the user experience would be terrible! That's because, the way browsers work today, re-creating the entire DOM object tree would cause the page scrollbar state to be lost (the page would return to the top all the time), the screen elements would flash, among other small behaviors that would be noticeable to our users.

The Virtual DOM solves these problems, and by using a library or framework, this whole job is fully transparent to the developer who is implementing the page.

Other approaches

Although Virtual DOM is a fairly appropriate way to solve the problems described throughout this article, not every modern library and framework uses this concept. Svelte, for example, uses another approach. So it is clear that there are alternative ways to handle these problems, each with its advantages and disadvantages.

Conclusions

Although the concept of Virtual DOM is apparently simple, it brings a big change in the way graphical interfaces are developed. The first time I came across this concept was during the development of webapps with the React library and also the Elm programming language, but the technique has since spread to other types of graphical interfaces. The same one-way data flow mechanism is very present today, for example, in the development of native mobile apps.

After learning The Elm Architecture (or MVU Architecture), it became much easier to understand Android's Jetpack Compose, Apple's SwiftUI, the mechanism for creating and updating the screens in Flutter and also, of course, React Native.

And it is easy to understand where the idea that the Virtual DOM would bring better performance came from. This technique does allow the number of changes in the DOM to be minimized with little effort. So, in a way, it helps a lot in the development of fast and light pages. But I prefer to give the credit not to the Virtual DOM, but to the declarative approach. Virtual DOM is just one tool to achieve this way of implementing web pages.