This is a story about a lot of things:
- Fitting a Fortune 20 site in 20kB
- Diving into site speed so deep we’ll see fangly fish
- React thwarting my goal of serving users as they are
- Burning out trying to do the right thing
- And by the end, some code I dare you to try.
The situation: frustratingly typical
React/Redux packages used totaled 44.7 kB before any feature code.
Our WebPageTest results spoke for themselves.
This was after investing in Server-Side Rendering (SSR), a performance team, and automated regression testing.
In particular, React SSR was one of those changes that looks faster, but looks can be deceiving. In retrospect, I’m amazed developers get away with considering SSR+rehydration an improvement at all.
Make your code faster… by running it twice!
—how React SSR works, apparently
The backstory: developer bitten by a radioactive WebPageTest
I used to ask other developers to stop writing slow code.1 Such as…
“Please cut down on the `<div>`s, they make our DOM big and slow.”
“Please avoid CSS like `.Component > * + *`, it combines with our big DOM into noticeable lag.”
“Please don’t use React for everything, it caps how fast we can be.” (Especially if it renders big DOMs with complex styles…)
Nobody listened. But, honestly, why would they?
This carried on, and it was cool/cool/depressing/cool. But a new design system inflicted enough Tailwind to hurt desktop Time to First Paint by 0.5 seconds, and that was enough to negotiate for a dedicated Web Performance team.
Which went well, until it didn’t. Behold, the industry-standard life of a speed optimization team:
- Success with uncontroversial changes like better build configuration, deduplicating libraries, and deleting dead code
- Auditing other teams’ code and suggesting improvements
- Doing the improvements ourselves after said suggestions never escaped backlogs
- Trying to make the improvements stick with bundle size monitoring, Lighthouse checks in PRs, and other new layers of process
- Hearing wailing and gnashing of teeth about having to obey said layers of process
- Realizing we needed to justify why we were annoying everyone else, before we were written off as a net negative to the bottom line
The thing was, WebPageTest frowning at our speed didn’t translate into bad mobile traffic — in fact, most users were on iPhone.2 From a business perspective, when graphs go up and to the right, who cares if the site could be faster?
|Stage|Reaction|
|---|---|
|Denial|It’s fast enough. You’ve seen those M1 benchmarks, right?|
|Anger|You mean I have to care about this, too!? We just got done having to care about accessibility!|
|Bargaining|I promise we will eventually consolidate on just three tooltip libraries if you let us skip the bundle check|
|Sadness|I should have realized the dark path I was going down when I tried to see if…|
|Acceptance|I love my slow website.|
Proving that speed mattered wasn’t enough: we also had to convince people emotionally. To show everyone, god dammit, how much better our site would be if it were fast.
So I decided to make a demo site that reused our APIs, but in a way that was as fast as possible.
Spoiler: surprising myself, I succeeded. And then things got weird. But before I can tell you that story, I have to tell you this story…
The goal: how fast is possible?
    HTTP/1.1 204 No Content
    Cache-Control: max-age=999999999,immutable
This is the fastest web page. You may not like it, but this is what peak performance looks like.
That may seem unhelpful — of course a useful page is slower than literally nothing! — but anything added to a frontend can only slow it down. The further something pushes you from the Web’s natural speed, the more work needed to claw it back.
That said, some leeway is required, or I’d waste time micro-optimizing every little facet. You do want to know when your content, design, or development choices start impacting your users. For everything added, you should balance its benefits with its costs. That’s why performance budgets exist.
But to figure out my budget, I first needed some sort of higher-level goal.
Some sort of higher-level goal
🎯 Be so fast it’s fun on the worst devices and networks our customers use.
- Target device: the bestselling phone at a local Kroger
  - Hot Pepper’s Poblano VLE5
  - $35 ($15 on sale)
  - Specs: 1 GB RAM, 8 GB total disk storage, and a 1.1 GHz processor
- Target connection: “slow 3G”
  - 400 kbps bandwidth
  - 400 ms round-trip time latency
  - At the time, what Google urged testing on, and what WebPageTest’s “easy” configuration & Lighthouse used
Unfortunately, connections get worse than the “slow 3G” preset, and one example is cellular data inside said Kroger. Big-box store architectures double as Faraday cages, dropping enough packets to sap bandwidth and inflate latency.
Ultimately, I went with “slow 3G” because it balanced the USA’s mostly-faster speeds with the signal interference inside stores. Alex Russell also mentioned “we still see latency like that in rural areas” when I had him fact-check this post.
(These device and connection targets are highly specific to this project: I walked inside stores with a network analyzer, asked the front desk which phone was the most popular, etc. I would not consider them a “normal” baseline.)
(Wait, don’t spotty connections mean you should reach for a Service Worker?)
Yes, when networks are so bad you must treat them as optional, that’s a job for Service Workers.
I will write about special SW sauce (teaser: offline streams, navigation preload cache digests, and the frontier of critical CSS), but even the best service worker is irrelevant for a site’s first load.
Although I knew what specs I was aiming for, I didn’t know what they meant for my budget. Luckily, someone else did.
Google’s suggestions to be fast on mobile
Google seems to know their way around web performance, but they never officially endorse a specific budget, since it can’t be one-size-fits-all.
But while Google is cagey about a specific budget, Alex Russell — their former chief performance mugwump — isn’t. He’s written vital information showing how much the Web needs to speed up to stay relevant, and this post was exactly what I needed:
Putting it all together, under ideal conditions, our rough budget for critical-path resources (CSS, JS, HTML, and data) at:
- 170KB for sites without much JS
- 130KB for sites built with JS frameworks
(Alex has since updated these numbers, but they were the ones I used at the time. Please read both if you’re at all interested — Alex accounts for those worse-than-usual networks I mentioned, shows his work behind the numbers, and makes no bones about what exactly slows down web pages.)
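For a sense of where numbers in that range come from, here's a back-of-the-envelope version of the math on the “slow 3G” target above. The round-trip accounting below is my own rough rendition, not Alex's exact model:

```javascript
// Back-of-the-envelope critical-path budget on "slow 3G".
// Assumptions are mine, loosely following Alex Russell's method.
const bandwidthBytesPerSec = (400 * 1000) / 8; // 400 kbps -> 50,000 B/s
const rttSec = 0.4;                            // 400 ms round trips
const targetSec = 5;                           // first-load goal

// Connection setup before any payload arrives: DNS lookup, TCP
// handshake, TLS negotiation (~2 round trips), then the HTTP request.
const setupRoundTrips = 5;
const transferSec = targetSec - setupRoundTrips * rttSec; // 3 s left

const budgetBytes = transferSec * bandwidthBytesPerSec;
console.log(Math.round(budgetBytes / 1000) + " kB"); // prints "150 kB"
```

With these assumptions the answer lands at 150 kB, right between the 130 kB and 170 kB figures: the budget is mostly a function of how many round trips you burn before bytes start flowing.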
Unfortunately, the hardware Alex cited clocks in at 2 GHz to the Poblano’s 1.1 GHz. That means the budget should shrink to 100 kB or so, but I couldn’t commit to that. Why?
Engineering around analytics
As usual, third-parties ruin everything. You can see the 2022 site’s cross-origin bytes situation, and it doesn’t include same-origin third-parties like Dynatrace.
I can’t publish exact figures, but at the time it was scarcely better. Barring discovery of the anti-kilobyte, I needed to figure out which third-parties had to go. Sure, most of them made $, but I was out to show that dropping them could make $$$.
After lots of rationalizing, I ended up with ≈138 kB of third-party JS I figured the business wouldn’t let me live without. Like the story of filling a jar with rocks, pebbles, and sand, I figured engineering around those boulders would be easier than starting with a “fast enough” site and having it ruined later.
Some desperate lazy-loading experiments later, I found my code couldn’t exceed 20kB (after compression) to heed Alex’s advice.
Okay, 20kB. Now what?
20 kilobytes ain’t much. react + react-dom are nearly twice that. An obvious alternative is the 4 kB Preact, but that wouldn’t help the component code or the Redux disaster — and I still needed HTML and CSS! I had to look beyond the obvious choices.
What does a website truly need? If I answered that, I could omit everything else.
Well, what can’t you omit from a website, even if you try?
You can make a real site with only HTML — people did it all the time, before CSS and JS existed.
(Yes, I see you with the Svelte.js shirt in the back. I talk about it in the next post.)
So my plan seemed possible, and apparently profitable enough that Amazon does it. Seemed good enough to try.
But everyone knows classic page navigation is slow!
Are you sure about that? The way I figured…
- If you inline CSS and generate HTML efficiently, their overhead is negligible compared to the network round-trip.
- Concatenating strings on a server should not be a huge bottleneck. And if it were, how does React SSR justify concatenating those strings twice into both HTML and hydration data?
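To make that concrete, here's a minimal sketch of the approach: render HTML by plain string concatenation with the critical CSS inlined, so the first response needs no extra round trips. Every name here is illustrative, not from our codebase or any framework:

```javascript
// Sketch: server rendering by plain string concatenation, with the
// page's critical CSS inlined into a <style> tag.
const escapeHtml = (s) =>
  s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");

// A stand-in for whatever critical CSS the page actually needs.
const criticalCss = "body{font:16px/1.4 sans-serif;margin:0}";

function renderPage(title, items) {
  return (
    "<!doctype html><html><head>" +
    "<meta charset=utf-8>" +
    "<title>" + escapeHtml(title) + "</title>" +
    "<style>" + criticalCss + "</style>" +
    "</head><body><ul>" +
    items.map((item) => "<li>" + escapeHtml(item) + "</li>").join("") +
    "</ul></body></html>"
  );
}
```

The work per request is a handful of string appends, which is cheap next to a 400 ms round trip, and it happens once, not once on the server and again during hydration.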
But don’t take my word for it — we’ll find out how that stacks up next time. In particular, I first need to solve a problem: how do you send a page before all its slow data sources finish?
I still ask other developers to stop writing slow code, but I used to, too. ↩
That does not count as insider information. Any US website with a similar front-end payload will tell you the same. ↩
Those numbers were very loose, conservative estimates. They’re also no longer accurate — they’re much higher now — but they still work as a bare minimum. ↩
Top comments (17)
Perhaps time to banish them to a web worker - Partytown style - to preserve your TTI.
Fictional conversation at some unnamed retail location:
Customer: "Excuse me. Do you carry the 'Hot Pepper’s Poblano VLE5' phone?"
Clerk: "I'm sorry. We had to discontinue that model. Customers kept returning it convinced they had a defective unit after they tried to access our web front."
Indeed, later the third-parties had grown enough that they exceeded my budget by themselves. I did “solve” it with some code in the same spirit as Partytown, but a different approach. (I promise I’ll write about it later!)
Good post, I found it entertaining. I would've liked to have seen mention of other approaches to delivering content quickly such as edge caching. Having data close to the user can be more impactful than shaving off KBs, but it's great to see such a passion for performance that is often sorely neglected.
I agree. I originally had a piece talking about edge caching and rendering, and maybe I should dust it off and post it.
I don't know why I laughed at this so much
back in my day we would just serve content based on the user's environment.
seems like the proper solution isn't creating Lowest Common Denominator designs, so the same site is served on weak and strong hardware, but being able to dynamically serve content based on the specs of the device
I felt this pain
I thought about this a lot, and I think there is a fairly good case for it being faster on most sites: if a page has a lot of repeated HTML elements, then those repeated elements need network data to transfer, while your JS can repeat that structure for free on the client. E.g., imagine the simplest example: a 100,000-row HTML table. If you transmit that entire thing, it will be huge. If you render it via JS, the data could come from JSON, and then you at least save on the shell of the table rows.
The only downside is that all your js needs to be loaded first, so it should ideally be inlined and as small as possible....which it probably won't be at all because the whole point of doing things this way is because you want application level rendering control via js.
It's kind of like a video game engine though: You would never think to save the coordinates of every pixel of every texture of every character and think that this speeds up the initial render. No, you just save enough of the character data so everything can be moved into place in the first frame render. Or in a networking engine: You dont transmit the entire model, you only transmit the difference needed to be able to render the correct state. To constantly transmit the whole html structure should and would be considered wasteful.
That’s certainly plausible: truly scrutinized JSON can have fewer bytes wrapping around the data bytes than HTML with element wrappers and styling attributes. I’d have to benchmark to find the inflection point — if the HTML cruft is repetitive, compression makes the question more complicated.
Great article. Looking forward to the next one.
This was a great read and you listed many great resources. Well done 👌
Legendary, Love it
Following for more of this very insightful (and entertaining) writing style!
"progressive upgrade" is the keyword here
Fascinating to hear your journey from an F20!
Hey, I'd like to experiment with Party-town and Dynatrace in my project but unfortunately in the official docs, there is no reference for Dynatrace integration! Anyone have idea how to integrate Dynatrace with Partytown?
In a way, AMP was the solution...