DEV Community: Erik Hoogeboom

Improving Largest Contentful Paint (LCP) using iterators

Erik Hoogeboom — Tue, 07 Nov 2023 06:00:00 +0000

In the previous part I explained how we used an iterator to improve the Time to First Byte (TTFB) for our search result pages, which we generate using Server-Side Rendering (SSR). The way this iterator is used enables a type of lazy loading. It improves the Time to First Byte by executing an expensive call to load the result products while rendering the page instead of before rendering it. Because the products are now loaded halfway through rendering the HTML, a new phenomenon appears: a small yet very noticeable pause as soon as the iterator starts loading the 40 products to show on the search result page.

This pause happens when the page looks somewhat like this:

After the short pause, rendering continues and the products are shown on the result page:

Even though the pause is relatively short (between 100 and 400 ms), it hurts an important performance metric: the Largest Contentful Paint (LCP).

What is the ‘Largest Contentful Paint’?

According to web.dev, the Largest Contentful Paint “marks the point in the page load timeline when the page’s main content has likely loaded—a fast LCP helps reassure the user that the page is useful”. In other words: the Largest Contentful Paint is the moment at which the page seems to have finished loading. You will want this moment to come as early as possible to avoid annoying the user with a seemingly slow website.

This means it is crucial to get as many pixels of the first screen in their final state as quickly as possible. By the ‘first screen’ I mean what the user sees without scrolling, also known as ‘above the fold’. Of course, whatever happens below the fold is also important, but not as important as what happens above it.

If the search result page pauses loading for a bit while a large part of the screen is still empty, this will negatively affect the LCP. After all, the LCP is recorded when most of the screen seems to have reached its final state.

This specific problem with our search result page only applies to desktop browsers. That’s because on a mobile browser the first search result is just outside the first view. Or, if you will, just below the fold. That is why the LCP on mobile devices is already much better than on desktop browsers. But since a significant part of our traffic comes from desktop browsers, this problem needed a solution.

To improve the LCP we modified the iterator that is returned when the templates start looping over the 40 products in the ProductList.

Before the changes this is how the Renderer interacts with the ProductList:

As you can see, the first time an iterator is created for the ProductList is the moment when the 40 products are loaded using getProducts(). After that, the products can be easily iterated over by getting an iterator from an ArrayList that holds the 40 loaded products. But loading the 40 products takes up a large part of the time spent before the LCP.

Batched loading

In order to improve the LCP we want to load the first two rows of products (8 in total) as soon as possible. The remaining 32 products are less important as they are only needed below the fold. To accomplish this we want to load the products in batches of 8 products at a time.

If we want more control over which of those 40 products are loaded, and when, we have to implement a custom iterator.

In our case (using Java with Freemarker), when a template starts to loop over a list of items, the template engine sees that the ProductList implements the Iterable interface, so it uses the iterator() method to create a new iterator for the list. Previously, this is where we loaded all 40 products (if not done already) and then called iterator() on the filled ArrayList holding the 40 products. If we want the products to be loaded in batches of 8 at a time, we have to implement an actual iterator to do so.

In the final solution, a custom ProductListIterator is created when the iterator() method on the ProductList is called. Upon creation, the following parameters are passed to the iterator: the ArrayList holding the 40 product keys, a reference to the ProductService from which to load products, and a reference to a Map holding already-loaded products. This last parameter prevents products from being loaded twice when the ProductList is iterated over more than once.

The following diagram shows the final solution:

When iterating over a list, hasNext() and next() are called for each iteration. The call to hasNext() only checks whether there is another item to iterate over, and the call to next() makes the next item ‘current’. In our custom iterator, we use an internal index to keep track of the current product. It is a zero-based integer so we can use it to get the current product’s key from the ArrayList holding the 40 product keys.

A call to next() first increments the internal index. After that, the current product key is retrieved from the list of product keys. If the product for the current key has not been loaded yet, a batch of product keys to load is collected. This batch contains the current product key and the next 7 keys. Then the products are retrieved from the ProductService and stored in the Map holding already-loaded products. Finally, the current product is taken from the map and used as the return value.

In the following 7 calls to next(), the then-current product will already have been loaded, so its key will already exist in the Map holding already-loaded products. These calls to next() will not cause any products to be loaded and will return a product immediately.

When the next() method is called for the 9th time, another 8 products will be loaded using getProducts(). And so forth, until all 40 products have been iterated over.

When the ProductList is iterated over for a second time, all products will have been loaded already so they will not be loaded again.
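The steps described above can be sketched roughly as follows. This is a simplified illustration, not the production code: the Product class, the ProductService interface, the getProducts(List<Long>) signature, and the use of Long keys are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical stand-in for the real product class.
class Product {
    final long key;

    Product(long key) {
        this.key = key;
    }
}

// Hypothetical service interface; the real signature may differ.
interface ProductService {
    List<Product> getProducts(List<Long> keys);
}

class ProductListIterator implements Iterator<Product> {
    private static final int BATCH_SIZE = 8;

    private final List<Long> productKeys;
    private final ProductService productService;
    private final Map<Long, Product> loadedProducts; // shared between iterators
    private int index = -1; // incremented before use, so the first product is at 0

    ProductListIterator(List<Long> productKeys,
                        ProductService productService,
                        Map<Long, Product> loadedProducts) {
        this.productKeys = productKeys;
        this.productService = productService;
        this.loadedProducts = loadedProducts;
    }

    @Override
    public boolean hasNext() {
        // Only checks for another key; never loads anything.
        return index + 1 < productKeys.size();
    }

    @Override
    public Product next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        index++;
        Long currentKey = productKeys.get(index);
        if (!loadedProducts.containsKey(currentKey)) {
            // Collect the current key plus (at most) the next 7 keys.
            int end = Math.min(index + BATCH_SIZE, productKeys.size());
            List<Long> batch = new ArrayList<>(productKeys.subList(index, end));
            for (Product product : productService.getProducts(batch)) {
                loadedProducts.put(product.key, product);
            }
        }
        // Returns null if the service did not return the requested product.
        return loadedProducts.get(currentKey);
    }
}
```

In this sketch the ProductList would own the Map and pass the same instance to every iterator it creates, which is what makes a second iteration free of any calls to the ProductService.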

In summary, we were able to dramatically improve both the Time to First Byte (TTFB) and Largest Contentful Paint (LCP) of our search result pages by utilizing a custom iterator. And we did not have to change a single template to accomplish this.

Improving web performance using iterators

Erik Hoogeboom — Sat, 08 Apr 2023 17:42:15 +0000

In this post I will explain how we were able to improve the performance of our website by using iterators. In the first part I will show how we used an iterator to improve the time to first byte (TTFB) for some specific pages. In the second part I will show how we modified our iterator to improve the largest contentful paint (LCP) of those pages.

Part 1: improving TTFB

The time to first byte (TTFB) is an important metric to determine a website’s responsiveness. In general, web.dev recommends a TTFB of 0.8 seconds or less.
Of course we followed the general recommendations for improving TTFB, but even after that we noticed that the search engine result pages (SERP) in particular had a relatively poor TTFB. To find out what was causing this, we added logging in several places and were quickly able to see exactly where most of the processing time was spent.

Like most dynamic websites, we use a template engine to render the HTML. In the following diagram (which is of course a very much simplified version of what is actually going on) you can see which methods are executed in order to render a search engine result page:

As you can see, the RequestController is being called to serve the request. It figures out which page is being requested and collects all data required to render that page. As soon as all data has been collected the execution is passed to the Renderer which is when the first bytes are sent back to the client.

When a product search page is being requested the SolrClient is called to get the first 40 results to be shown on the requested search result page. Since our Solr instance does not actually store the complete product objects, only the product ids are used from the search result. The next step is to have the ProductService load the 40 product objects using those 40 product ids.

This is where most of the time is spent. In our case, loading those 40 products typically takes anywhere between 100ms and 400ms. Not only does this vary (by a lot), but it is obviously too much processing time before the first byte. In comparison, fetching search results from Solr takes only about 20ms.

Loading those 40 product objects takes that much time because they are complex objects. Retrieving a single product from the database takes a few queries in different database tables. On top of that, there are 2 layers of object cache. But with a product catalog of over a million products, a search query can easily yield results that are not in any cache and have to be retrieved from the database.

Since the process of loading products from the database is already very much optimized using combined queries, there was no way to further optimize this. The only way forward… was to load nothing at all. At least, not before TTFB. After all, the actual products are not yet needed before the template can start rendering and the first bytes of HTML can be sent to the user.

The first idea to optimize the TTFB was to change two things:

Pass only a list of product ids to the template
Have the template load the products at the time they are needed

This approach had a number of drawbacks:

This required building a custom template function and explaining to template developers how and when to use it
The whole operation required coordinating when to change the templates and when to stop loading products before TTFB
We did not have a clear view of which templates had to be changed

That last one especially was a deal breaker. Our website uses search results in many places, for instance on the home page, or on category pages that show the first 10 results of a search request for those categories. Those inline search results use the same data structure as the normal search result page. So in order to eliminate loading products before TTFB, we would have had to change a lot of templates. This would have required even more coordination of when to change the templates and when to stop loading products before TTFB.

Ideally the solution would not require changing anything in any template. This is why using a custom iterator is such a valuable solution. No template has to be changed and the application developer can control the entire change.

So what exactly is an iterator?

In simple terms: it is an object that implements a predefined way of iterating over it. The actual implementation differs from one programming language to another, but in pretty much any modern language you can work with custom iterators. I say custom because all of these languages already come shipped with standard iterators. Those standard iterators are used when iterating over a simple list or array. These are so common that you probably never realized you are actually already using iterators all the time.
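To make this concrete for Java, here is a minimal sketch of a custom iterable. The Countdown class is purely hypothetical and has nothing to do with our product list. It only shows that a for-each loop calls iterator() once and then hasNext() and next() for every item, which is the same contract a template engine relies on.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical example class: iterating over it yields from, from - 1, ..., 1.
class Countdown implements Iterable<Integer> {
    private final int from;

    Countdown(int from) {
        this.from = from;
    }

    @Override
    public Iterator<Integer> iterator() {
        // A fresh iterator per loop, so a Countdown can be iterated repeatedly.
        return new Iterator<Integer>() {
            private int remaining = from;

            @Override
            public boolean hasNext() {
                return remaining > 0;
            }

            @Override
            public Integer next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return remaining--;
            }
        };
    }

    // Collects the values that a for-each loop sees.
    static List<Integer> collect(Iterable<Integer> source) {
        List<Integer> seen = new ArrayList<>();
        for (Integer value : source) { // calls iterator(), then hasNext()/next()
            seen.add(value);
        }
        return seen;
    }
}
```

The loop in collect() does not know or care how the values are produced. That is the property that lets an iterator do extra work, such as loading data, behind the scenes.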

Template engines in Java (JSP, Velocity, Freemarker), PHP (Twig, Blade) and Python (DTL, Jinja) all support working with iterators in their loop or foreach directives. All of these template engines will keep working if you replace an existing list or array with an iterator or, depending on the language, an iterable object.

In our case, the actual solution was to replace the existing list of products with a custom ProductList. This is a custom class that implements the Collection interface so it can be used in a loop directive in our templates. The list initially holds only 40 product ids and a reference to the ProductService from which it can retrieve the actual product objects. As soon as any template starts iterating over the list, it checks whether the 40 products have been loaded already and loads them if necessary. This allows the ProductList to be iterated over multiple times. The products will be loaded only once, but no sooner than actually required. This is also known as ‘lazy loading’.

Now that the 40 products are no longer loaded before TTFB but implicitly by the templates when they start iterating over the ProductList, the process looks more like the following diagram:
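As a rough illustration, a lazy-loading ProductList along these lines might look like the sketch below. The Product class, the ProductService interface and its getProducts(List<Long>) signature are assumptions made for the example, and extending AbstractCollection is just one convenient way to satisfy the Collection interface.

```java
import java.util.AbstractCollection;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the real product class.
class Product {
    final long id;

    Product(long id) {
        this.id = id;
    }
}

// Hypothetical service interface; the real signature may differ.
interface ProductService {
    List<Product> getProducts(List<Long> ids);
}

class ProductList extends AbstractCollection<Product> {
    private final List<Long> productIds;
    private final ProductService productService;
    private List<Product> products; // stays null until the first iteration

    ProductList(List<Long> productIds, ProductService productService) {
        this.productIds = new ArrayList<>(productIds);
        this.productService = productService;
    }

    @Override
    public Iterator<Product> iterator() {
        if (products == null) {
            // The expensive call happens here, while the template is rendering.
            products = productService.getProducts(productIds);
        }
        return products.iterator();
    }

    @Override
    public int size() {
        // Known from the ids alone, so asking for the size loads nothing.
        return productIds.size();
    }
}
```

Because the loading is hidden inside iterator(), a template that loops over the list does not need to change, and a second loop over the same list reuses the products that were already loaded.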

As you can see, the time-consuming getProducts() method is no longer called before TTFB. It is now being called from inside the rendering process whenever a template starts iterating over the ProductList for the first time. This will cause the call to render() to take more time than before, but the TTFB will be a lot better than before.

In the next part I will explain how we were able to improve the Largest Contentful Paint (LCP) metric for the same search result page by improving the inner workings of this ProductList.