
Taylor Hunt


Making the world’s fastest website, and other mistakes

This is a story about a lot of things:

  • Fitting a Fortune 20 site in 20kB
  • Diving into site speed so deep we’ll see fangly fish
  • React thwarting my goal of serving users as they are
  • Burning out trying to do the right thing
  • And by the end, some code I dare you to try.

The situation: frustratingly typical

I work on Kroger’s ecommerce sites for their regional chains, most of which share a codebase. You’d probably guess the front-end stack: React, Redux, and their usual symptoms of too much JavaScript.

The WebPageTest activity graph shows a dense forest of yellow representing JavaScript execution, with a majority of red underneath showing when interaction was impossible. The worst part? The x-axis is over 46 seconds.

The facts:

In particular, React SSR was one of those changes that looks faster, but looks can be deceiving. In retrospect, I’m amazed developers get away with considering SSR+rehydration an improvement at all.

Make your code faster… by running it twice!

how React SSR works, apparently

The backstory: developer bitten by a radioactive WebPageTest

I used to ask other developers to stop writing slow code.[1] Such as…

  • “Please cut down on the <div>s, they make our DOM big and slow.”

  • “Please avoid CSS like .Component > * + *, it combines with our big DOM into noticeable lag.”

  • “Please don’t use React for everything, it caps how fast we can be.” (Especially if it renders big DOMs with complex styles…)

Nobody listened. But, honestly, why would they?

This carried on, and it was cool/cool/depressing/cool. But a new design system inflicted enough Tailwind to hurt desktop Time to First Paint by 0.5 seconds, and that was enough to negotiate for a dedicated Web Performance team.

Which went well, until it didn’t. Behold, the industry-standard life of a speed optimization team:

  1. Success with uncontroversial changes like better build configuration, deduplicating libraries, and deleting dead code
  2. Auditing other teams’ code and suggesting improvements
  3. Doing the improvements ourselves after said suggestions never escaped backlogs
  4. Trying to make the improvements stick with bundle size monitoring, Lighthouse checks in PRs, and other new layers of process
  5. Hearing wailing and gnashing of teeth about having to obey said layers of process
  6. Realizing we needed to justify why we were annoying everyone else before we were considered a net negative to the bottom line

The thing was, WebPageTest frowning at our speed didn’t translate into bad mobile traffic; in fact, most users were on iPhone.[2] From a business perspective, when graphs go up and to the right, who cares if the site could be faster?

To prove we weren’t wasting everyone’s time, we used WPO Stats and internal data to calculate that each kB of client-side JavaScript cost us ≈$100,000 per year, and every millisecond until Time to Interactive at least $40,000.[3]

But proving speed = money only moved us from Anger to the Bargaining stage of performance grief: hoarding improvements to use later, empty promises to fix massive regressions after a deadline, and protesting numbers with appeals to “developer experience”.

The 5 Stages of Performance Grief

  • Denial: It’s fast enough. You’ve seen those M1 benchmarks, right?
  • Anger: You mean I have to care about this, too!? We just got done having to care about accessibility!
  • Bargaining: I promise we will eventually consolidate on just three tooltip libraries if you let us skip the bundle check.
  • Sadness: I should have realized the dark path I was going down when I tried to see if npm install * worked.
  • Acceptance: I love my slow website.

Proving that speed mattered wasn’t enough: we also had to convince people emotionally. To show everyone, god dammit, how much better our site would be if it were fast.

So I decided to make a demo site that reused our APIs, but in a way that was as fast as possible.

Spoiler: surprising myself, I succeeded. And then things got weird. But before I can tell you that story, I have to tell you this story…

The goal: how fast is possible?

HTTP/1.1 204 No Content
Cache-Control: max-age=999999999,immutable

This is the fastest web page. You may not like it, but this is what peak performance looks like.

That may seem unhelpful — of course a useful page is slower than literally nothing! — but anything added to a frontend can only slow it down. The further something pushes you from the Web’s natural speed, the more work needed to claw it back.

That said, some leeway is required, or I’d waste time micro-optimizing every little facet. You do want to know when your content, design, or development choices start impacting your users. For everything added, you should balance its benefits with its costs. That’s why performance budgets exist.

But to figure out my budget, I first needed some sort of higher-level goal.

Some sort of higher-level goal

🎯 Be so fast it’s fun on the worst devices and networks our customers use.

Target device: bestselling phone at a local Kroger

  • Hot Pepper’s Poblano VLE5
  • $35 ($15 on sale)
  • Specs: 1 GB RAM, 8 GB total disk storage, and a 1.1 GHz processor

Target connection: “slow 3G”

  • 400kbps bandwidth
  • 400ms round-trip time latency
  • At the time, this was what Google urged developers to test on, and what WebPageTest’s “easy” configuration and Lighthouse used
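
To get a feel for what those connection numbers mean for a byte budget, here is a rough back-of-the-envelope sketch. It is an illustration only: it assumes 1 kB = 1000 bytes, ignores TCP slow start, TLS setup, and packet loss, and the `roundTrips` parameter is a hypothetical knob, not something from the original measurements.

```typescript
// Rough lower bound on transfer time over a throttled connection.
// Assumption: 1 kB = 1000 bytes. Ignores slow start, TLS, and packet loss.
function estimateTransferMs(
  kilobytes: number,
  bandwidthKbps: number,
  rttMs: number,
  roundTrips: number = 1, // hypothetical: how many round trips to count
): number {
  const bits = kilobytes * 1000 * 8;
  // 1 kbps is 1000 bits per second, which is 1 bit per millisecond.
  const transferMs = bits / bandwidthKbps;
  return roundTrips * rttMs + transferMs;
}

// The "slow 3G" preset described above.
const SLOW_3G = { bandwidthKbps: 400, rttMs: 400 };

console.log(estimateTransferMs(20, SLOW_3G.bandwidthKbps, SLOW_3G.rttMs));
console.log(estimateTransferMs(170, SLOW_3G.bandwidthKbps, SLOW_3G.rttMs));
```

Even this optimistic model shows how quickly kilobytes turn into seconds at 400kbps; real connections will be slower.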

Unfortunately, connections get worse than the “slow 3G” preset, and one example is cellular data inside said Kroger. Big-box store architectures double as Faraday cages, losing enough packets to sap bandwidth and inflate latency.

Ultimately, I went with “slow 3G” because it balanced the USA’s mostly-faster speeds with the signal interference inside stores. Alex Russell also mentioned “we still see latency like that in rural areas” when I had him fact-check this post.

(These device and connection targets are highly specific to this project: I walked inside stores with a network analyzer, asked the front desk which phone was the most popular, etc. I would not consider them a “normal” baseline.)

(Wait, don’t spotty connections mean you should reach for a Service Worker?)

Yes, when networks are so bad you must treat them as optional, that’s a job for Service Workers.

I will write about special SW sauce (teaser: offline streams, navigation preload cache digests, and the frontier of critical CSS), but even the best service worker is irrelevant for a site’s first load.

Although I knew what specs I was aiming for, I didn’t know what they meant for my budget. Luckily, someone else did.

Google’s suggestions to be fast on mobile

Google seems to know their way around web performance, but they never officially endorse a specific budget, since it can’t be one-size-fits-all.

But while Google is cagey about a specific budget, Alex Russell, their former chief performance mugwump, isn’t. He’s written vital information showing how much the Web needs to speed up to stay relevant, and this post was exactly what I needed:

Putting it all together, under ideal conditions, our rough budget for critical-path resources (CSS, JS, HTML, and data) at:

  • 170KB for sites without much JS
  • 130KB for sites built with JS frameworks

Can You Afford It? Real-world Performance Budgets

(Alex has since updated these numbers, but they were the ones I used at the time. Please read both if you’re at all interested — Alex accounts for those worse-than-usual networks I mentioned, shows his work behind the numbers, and makes no bones about what exactly slows down web pages.)

Unfortunately, the hardware Alex cited clocks 2GHz to the Poblano’s 1.1GHz. That means the budget should lower to 100kB or so, but I couldn’t commit to that. Why?

Engineering around analytics

As usual, third-parties ruin everything. You can see the 2022 site’s cross-origin bytes situation, and it doesn’t include same-origin third-parties like Dynatrace.

Cross-section diagram of clowns packed into a car.

from The Physics Of: Clown Cars · Car and Driver

I can’t publish exact figures, but at the time it was scarcely better. Barring discovery of the anti-kilobyte, I needed to figure out which third-parties had to go. Sure, most of them made $, but I was out to show that dropping them could make $$$.

After lots of rationalizing, I ended with ≈138kB of third-party JS I figured the business wouldn’t let me live without. Like the story of filling a jar with rocks, pebbles, and sand, I figured engineering around those boulders would be easier than starting with a “fast enough” site and having it ruined later.

Some desperate lazy-loading experiments later, I found my code couldn’t exceed 20kB (after compression) to heed Alex’s advice.
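
A budget like that only helps if something checks it. The following is a minimal sketch of such a check, assuming you already know each resource's compressed size; the `Resource` shape, the file names, and the byte counts are hypothetical placeholders, not measurements from the real site.

```typescript
interface Resource {
  name: string;
  compressedBytes: number; // size after compression, as sent over the wire
}

interface BudgetReport {
  totalBytes: number;
  remainingBytes: number;
  withinBudget: boolean;
}

// Sum the compressed sizes and compare them against the budget.
function checkBudget(resources: Resource[], budgetBytes: number): BudgetReport {
  const totalBytes = resources.reduce((sum, r) => sum + r.compressedBytes, 0);
  return {
    totalBytes,
    remainingBytes: budgetBytes - totalBytes,
    withinBudget: totalBytes <= budgetBytes,
  };
}

// Hypothetical first-party sizes; replace with real measurements.
const firstParty: Resource[] = [
  { name: "document.html", compressedBytes: 9000 },
  { name: "inline.css", compressedBytes: 6000 },
  { name: "app.js", compressedBytes: 4000 },
];

console.log(checkBudget(firstParty, 20000));
```

Something of this shape could run in a build step and fail when `withinBudget` is false, which is one way to apply the kind of bundle size monitoring mentioned earlier.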

Okay, 20kB. Now what?

20 kilobytes ain’t much. react + react-dom are nearly twice that. An obvious alternative is the 4kB Preact, but that wouldn’t help the component code or the Redux disaster — and I still needed HTML and CSS! I had to look beyond the obvious choices.

What does a website truly need? If I answered that, I could omit everything else.

Well, what can’t a website omit, even if you tried?

You can make a real site with only HTML — people did it all the time, before CSS and JS existed.

Maybe if I sprinkled the HTML with just enough CSS to look good… and if I had any room left, some laser-focused JavaScript for the pieces that benefit most from complex interactivity.

(Yes, I see you with the Svelte.js shirt in the back. I talk about it in the next post.)

Amazon serves basically what I just described if you visit with a really bad User-Agent:

What Amazon shows for Opera/9.80 (J2ME/MIDP; Opera Mini/5.1.21214/28.2725; U; ru) Presto/2.8.119 Version/11.10. as viewed by Opera Mini: one product on screen, navigation at bottom, and a site header stripped down to a logo, links to deals/cart/lists, and a searchbar.

And this is the lowest-fidelity version of Amazon, which appears with a skin=noskin cookie.

A product page for two mason jars in glorious Web 1.0 style. Times New Roman on a blank backdrop, and the only formatting is centering the image and Add buttons and some horizontal rules.

So my plan seemed possible, and apparently profitable enough that Amazon does it. Seemed good enough to try.

But everyone knows classic page navigation is slow!

Are you sure about that? The way I figured…

  • If you inline CSS and generate HTML efficiently, their overhead is negligible compared to the network round-trip.
  • A SPA still requests JSON data to render, yeah? Even if you inline that JSON into the initial response, JSON→JavaScript→HTML cannot possibly be faster than skipping straight to the HTML part.
  • Concatenating strings on a server should not be a huge bottleneck. And if it were, how does React SSR justify concatenating those strings twice into both HTML and hydration data?
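
As a minimal sketch of the string-concatenation point above, server-side HTML generation can be as small as the following. The `Product` shape and function names are hypothetical illustrations, not the real site's code.

```typescript
interface Product {
  name: string;
  price: string;
}

// Escape the characters that are unsafe in HTML text and attribute values.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Go straight from data to HTML, with no client-side JSON or rendering step.
function renderProductList(products: Product[]): string {
  const items = products
    .map((p) => `<li>${escapeHtml(p.name)} <b>${escapeHtml(p.price)}</b></li>`)
    .join("");
  return `<ul>${items}</ul>`;
}

console.log(renderProductList([{ name: "Mason jars <2-pack>", price: "$7.99" }]));
```

The browser receives markup it can parse and paint immediately; nothing has to be downloaded, parsed, and executed before the list appears.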

But don’t take my word for it — we’ll find out how that stacks up next time. In particular, I first need to solve a problem: how do you send a page before all its slow data sources finish?

  1. I still ask other developers to stop writing slow code, but I used to, too

  2. That does not count as insider information. Any US website with a similar front-end payload will tell you the same. 

  3. Those numbers were very loose, conservative estimates. They’re also no longer accurate — they’re much higher now — but they still work as a bare minimum. 

Top comments (17)

peerreynders

Entertaining read!

After lots of rationalizing, I ended with ≈138kB of third-party JS I figured the business wouldn’t let me live without.

Perhaps time to banish them to a web worker - Partytown style - to preserve your TTI.

Fictional conversation at some unnamed retail location:
Customer: "Excuse me. Do you carry the 'Hot Pepper’s Poblano VLE5' phone?"
Clerk: "I'm sorry. We had to discontinue that model. Customers kept returning it convinced they had a defective unit after they tried to access our web front."

Taylor Hunt

Indeed, later the third-parties had grown enough that they exceeded my budget by themselves. I did “solve” it with some code in the same spirit as Partytown, but a different approach. (I promise I’ll write about it later!)

Sir Broddock

Good post, I found it entertaining. I would've liked to have seen mention of other approaches to delivering content quickly such as edge caching. Having data close to the user can be more impactful than shaving off KBs, but it's great to see such a passion for performance that is often sorely neglected.

Taylor Hunt

I agree. I originally had a piece talking about edge caching and rendering, and maybe I should dust it off and post it.

beqa

Sadness: I should have realized the dark path I was going down when I tried to see if npm install * worked.

I don't know why I laughed on this so much

Pause Laugh

back in my day we would just serve content based on the user's environment.

seems like the proper solution isn't creating Lowest Common Denominator designs, so the same site is served on weak and strong hardware, but being able to dynamically serve content based on the specs of the device

Meai

"JSON→JavaScript→HTML cannot possibly be faster than skipping straight to the HTML part."

I thought about this a lot and I think there is a fairly good case for it being faster on most sites: If a page has a lot of repeated html elements then those repeated elements would need network data to transfer. Meanwhile your js can repeat that structure for free on the client. E.g imagine the simplest example: A 100000 long html table. If you transmit that entire thing, it will be huge. If you render it via js, the data could come from a json and then at least you save on the shell of the table rows, the extras.

The only downside is that all your js needs to be loaded first, so it should ideally be inlined and as small as possible....which it probably won't be at all because the whole point of doing things this way is because you want application level rendering control via js.

It's kind of like a video game engine though: You would never think to save the coordinates of every pixel of every texture of every character and think that this speeds up the initial render. No, you just save enough of the character data so everything can be moved into place in the first frame render. Or in a networking engine: You dont transmit the entire model, you only transmit the difference needed to be able to render the correct state. To constantly transmit the whole html structure should and would be considered wasteful.

Taylor Hunt

That’s certainly plausible: truly scrutinized JSON can have fewer bytes wrapping around the data bytes than HTML with element wrappers and styling attributes. I’d have to benchmark to find the inflection point — if the HTML cruft is repetitive, compression makes the question more complicated.

Joe Kent

Doing the improvements ourselves after said suggestions never escaped backlogs

I felt this pain

Simon Wicki

This was a great read and you listed many great resources. Well done 👌

ayoubmehd

Legendary, Love it

Filip Oščádal

"progressive upgrade" is the keyword here

Waylon Walker

Fascinating to hear your journey from an F20!

svgatorapp

Following for more of this very insightful (and entertaining) writing style!

Aakash Goplani

Hey, I'd like to experiment with Party-town and Dynatrace in my project but unfortunately in the official docs, there is no reference for Dynatrace integration! Anyone have idea how to integrate Dynatrace with Partytown?

Dom Sipowicz

In a way, AMP was the solution...
