Every team I’ve worked with has had this moment:
Someone ships a “massive performance refactor.”
They post a screenshot of Lighthouse:
🌈 “We went from 67 → 95!”
Everyone celebrates… and nothing changes.
Conversion doesn’t move. Bounce rates don’t drop. Users don’t even notice.
That’s when you realize: Lighthouse isn’t the finish line. It’s a lab test.
Real performance is what happens when real users use your app — on bad devices, slow networks, and bloated pages full of third-party scripts.
Let’s talk about what lab tests really are, how they differ from real-world testing, and how to actually measure performance impact that matters.
Lighthouse Isn’t the Finish Line — It’s a Lab Test
Lighthouse, WebPageTest, and the lab report inside PageSpeed Insights are what we call lab tests —
synthetic, controlled, reproducible environments that tell you how your app should perform under ideal conditions.
They’re run in Google’s datacenters or your local environment, usually with:
- A single device type (often high-end desktop or emulated mobile)
- A fixed network profile (Fast 3G or 4G)
- A cold cache (first load)
- No background scripts, extensions, or user data
That makes them consistent — great for detecting regressions.
But also isolated — bad for measuring reality.
The Four Major Types of Performance Tests
| Type | Description | Use Case | Tools |
|---|---|---|---|
| 🧪 Lab Tests | Controlled, reproducible environment. Usually one device/network setup. | Detecting regressions, comparing builds, CI checks. | Lighthouse, WebPageTest, Calibre CI |
| 🌍 Field Tests (RUM) | Data collected from real users via browser APIs. | Measure real-world experience (LCP, INP, CLS). | Perfume.js, Vercel Analytics, SpeedCurve, Datadog RUM |
| ⚙️ Synthetic Monitoring | Automated tests that simulate users at intervals from multiple regions. | Detecting live performance drops or regressions in production. | Checkly, Pingdom, Uptrends, Datadog Synthetics |
| 🧭 A/B Performance Experiments | Compare old vs optimized builds or feature flags under real traffic. | Measuring the business impact of optimizations. | LaunchDarkly, Optimizely, custom rollouts |
Why You Need Both Lab and Real Data
- Lab tests are like a medical check-up — they show how healthy your app looks in a clean environment.
- RUM and synthetic data are the real patient data — they show how your app behaves in the wild.
You need both:
- Lab to catch regressions early.
- RUM/Synthetic to confirm real impact.
Without RUM, you’re optimizing for the lab, not the user.
Without lab tests, you can’t prevent accidental slowdowns.
Example: When the Lab Lies
Imagine your team ships a huge performance refactor — lazy loading, image compression, and bundle splitting.
Lighthouse score jumps from 70 → 95.
Everyone cheers.
Then you check RUM data a week later:
- Median LCP barely moved.
- INP got worse on low-end Android devices.
- Conversion rate didn’t change.
Why?
Because Lighthouse runs on a clean, high-end environment.
Your real users run on throttled CPUs, 3G networks, or under thermal throttling on mobile.
Those factors dominate actual experience far more than your code improvements.
In other words — the lab said your app was healthy, but the patients are using different hardware.
That’s the gap between synthetic success and real-world performance.
Lab tools show potential; real data shows impact.
Step 1. Stop Treating Synthetic Metrics as Success
Lab tests are useful — but only as early indicators.
They show regressions, not user experience.
If you ship a refactor that scores 95 but feels identical, you didn’t make the app faster — you just optimized for a benchmark.
The real goal isn’t to improve a score.
It’s to make your app feel faster for real people.
Step 2. Define What “Fast” Actually Means for Your Product
Performance is not absolute — it’s contextual.
Ask yourself:
“What does fast mean for our users?”
For an e-commerce site:
- Time from landing → product visible (LCP)
- Time to first interaction (INP)
- Time to checkout confirmation
For a SaaS dashboard:
- Chart render time after data load
- Response time between user actions
For a content site:
- Perceived load time
- Scroll latency
These are the moments users actually feel.
Before you refactor, define what metric equals “user happiness” in your context.
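For product-specific moments like these, the browser’s User Timing API lets you define your own metric. Here’s a minimal sketch for the SaaS dashboard case; the mark names and the `/rum/custom-metrics` endpoint are assumptions for illustration, not a prescribed setup.

```typescript
// Minimal sketch: a custom "chart render after data load" metric built on the
// User Timing API. Mark names and the beacon endpoint are illustrative.
export function markDataLoaded(): void {
  performance.mark("dashboard:data-loaded");
}

export function markChartRendered(): void {
  performance.mark("dashboard:chart-rendered");

  // Measure the span users actually feel: data arrival -> chart on screen.
  const measure = performance.measure(
    "dashboard:chart-render-time",
    "dashboard:data-loaded",
    "dashboard:chart-rendered"
  );

  // Ship it alongside your other RUM data (hypothetical endpoint).
  navigator.sendBeacon(
    "/rum/custom-metrics",
    JSON.stringify({ name: measure.name, duration: measure.duration })
  );
}
```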
Step 3. Measure with Real-User Data (RUM)
Synthetic tests show potential; RUM (Real User Monitoring) shows reality.
Use libraries like Perfume.js or analytics such as Vercel Analytics, SpeedCurve, or Datadog RUM.
Measure from actual browsers:
- LCP (Largest Contentful Paint) — the time when the largest visible element in the viewport is painted.
  - It’s not “main content,” but whichever element is largest at render time.
  - Valid LCP candidates include: `<img>` elements, `<image>` inside `<svg>`, background images loaded via `url()`, video poster images, and block-level text elements (`<p>`, `<h1>`, etc.).
  - LCP represents when the most significant visual element becomes visible — a strong proxy for perceived load time.
- INP (Interaction to Next Paint) — measures how quickly the page responds to user input during the entire session.
- CLS (Cumulative Layout Shift) — quantifies how much visible content unexpectedly moves while loading.
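Here is a minimal collection sketch using Google’s `web-vitals` package (Perfume.js or a vendor SDK works just as well); the `/rum/vitals` endpoint and the extra dimensions are assumptions you would replace with your own.

```typescript
// Minimal sketch: collect Core Web Vitals from real browsers and beacon them
// to your own backend. The /rum/vitals endpoint is a placeholder.
import { onCLS, onINP, onLCP, type Metric } from "web-vitals";

function sendToAnalytics(metric: Metric): void {
  const body = JSON.stringify({
    name: metric.name,     // "LCP" | "INP" | "CLS"
    value: metric.value,   // milliseconds for LCP/INP, unitless score for CLS
    rating: metric.rating, // "good" | "needs-improvement" | "poor"
    page: location.pathname,
    // Add your own dimensions here: device class, app version, region, ...
  });

  // sendBeacon survives page unloads better than a plain fetch.
  if (!navigator.sendBeacon("/rum/vitals", body)) {
    fetch("/rum/vitals", { method: "POST", body, keepalive: true });
  }
}

onLCP(sendToAnalytics);
onINP(sendToAnalytics);
onCLS(sendToAnalytics);
```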
How Google Actually Measures INP
INP (Interaction to Next Paint) is the newest Core Web Vital; it replaced FID (First Input Delay) in March 2024.
Unlike FID, which captured only the first input, INP observes all interactions (clicks, taps, keypresses) throughout a session and reports the worst of them (technically a high percentile once a page has many interactions) — the longest time from input to the next frame rendered.
Google collects this data progressively from real Chrome users through the Chrome User Experience Report (CrUX) and browser telemetry.
Each session contributes anonymized samples aggregated by domain, producing real-world performance data visible in:
- PageSpeed Insights → Field Data tab
- CrUX Dashboard (BigQuery / Looker Studio)
Google’s thresholds:
- Good INP ≤ 200 ms (P75)
- Needs improvement 200–500 ms
- Poor > 500 ms
That 75th percentile (P75) is the line Google Search uses when evaluating Core Web Vitals for ranking.
So even if Lighthouse shows a perfect 100, CrUX field data may still rank your site as slow if your real users experience higher interaction latency.
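If you’re curious what that looks like at the API level, here’s a rough sketch of the underlying mechanism using the Event Timing API (Chromium-based browsers). It only tracks the single slowest interaction, which is a simplification of the real metric, so prefer the `web-vitals` library in production.

```typescript
// Rough sketch of the mechanism behind INP: observe Event Timing entries and
// keep the slowest user interaction seen so far. The real metric takes a high
// percentile over long sessions; use the web-vitals library in production.
interface InteractionEntry extends PerformanceEntry {
  interactionId?: number; // non-zero only for entries tied to a user interaction
}

let worstInteractionMs = 0;

const observer = new PerformanceObserver((list) => {
  for (const entry of list.getEntries() as InteractionEntry[]) {
    if (entry.interactionId && entry.duration > worstInteractionMs) {
      worstInteractionMs = entry.duration;
      console.debug(`Slowest interaction so far: ${entry.name}, ${entry.duration} ms`);
    }
  }
});

// durationThreshold (Event Timing spec): only report interactions slower than 40 ms.
observer.observe({
  type: "event",
  buffered: true,
  durationThreshold: 40,
} as PerformanceObserverInit);
```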
Then break it down by:
- Device (desktop vs low-end mobile)
- Region / network
- App version (old vs new)
RUM gives you percentile data (P75, P95) across real users — the difference between “it’s fast on my machine” and “it’s fast for 80 % of our audience.”
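On the aggregation side, the math is just percentiles over your raw samples, sliced by those dimensions. A minimal sketch, assuming the beacons land in some queryable store; the sample shape and segment names are illustrative.

```typescript
// Minimal sketch: turn raw RUM samples into the percentiles you actually track.
interface VitalSample {
  name: "LCP" | "INP" | "CLS";
  value: number;
  device: "desktop" | "mobile-high-end" | "mobile-low-end";
  appVersion: string;
}

function percentile(values: number[], p: number): number {
  if (values.length === 0) return NaN;
  const sorted = [...values].sort((a, b) => a - b);
  const index = Math.max(0, Math.ceil((p / 100) * sorted.length) - 1);
  return sorted[index];
}

// Example: P75 / P95 LCP for low-end mobile users on a given release.
function lcpForSegment(
  samples: VitalSample[],
  device: VitalSample["device"],
  appVersion: string
) {
  const values = samples
    .filter((s) => s.name === "LCP" && s.device === device && s.appVersion === appVersion)
    .map((s) => s.value);
  return { p75: percentile(values, 75), p95: percentile(values, 95) };
}
```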
Step 4. Create an A/B Test for Performance
If you want to know whether your performance work mattered, you need a controlled comparison.
✅ Set up an A/B or gradual rollout:
- Group A → old build
- Group B → optimized build
Then compare:
- Web Vitals (LCP, INP, CLS)
- Business metrics (conversion, retention, bounce)
If your app got “faster” but conversion didn’t move → you probably optimized something users didn’t care about.
Performance work must always connect to user or business outcomes.
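The plumbing is small: tag every RUM beacon with the rollout variant so both groups land on the same dashboard. A sketch assuming a simple cookie-based flag; the `getVariant` helper, cookie name, and endpoint are placeholders for whatever your flag system (LaunchDarkly, Optimizely, a custom rollout) provides.

```typescript
// Sketch: tag Web Vitals beacons with the rollout variant so the old and the
// optimized build can be compared side by side. getVariant() and the endpoint
// are placeholders for your own setup.
import { onCLS, onINP, onLCP, type Metric } from "web-vitals";

function getVariant(): "control" | "optimized" {
  // e.g. read a feature-flag cookie set by your rollout system.
  return document.cookie.includes("perf_refactor=on") ? "optimized" : "control";
}

function report(metric: Metric): void {
  navigator.sendBeacon(
    "/rum/vitals",
    JSON.stringify({
      name: metric.name,
      value: metric.value,
      variant: getVariant(), // the key dimension for the A/B comparison
      page: location.pathname,
    })
  );
}

onLCP(report);
onINP(report);
onCLS(report);
```

If your business events (purchase, signup) carry the same variant tag, the comparison in the next step becomes a straightforward group-by.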
Step 5. Correlate Technical Metrics with Business Outcomes
Real impact = when you can say:
“Reducing LCP from 4 s → 2 s improved checkout completion by 3 %.”
That’s a win you can show to both engineers and product managers.
How to get there:
- Capture performance metrics via RUM.
- Log key business events (purchase, signup, retention).
- Correlate them in your analytics system.
Even a simple scatter plot (LCP vs conversion rate) can reveal patterns.
That’s how you go from “we think it’s faster” to “we know it’s paying off.”
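A minimal sketch of that analysis, assuming you can join RUM sessions with conversion events by session id; the data shapes are illustrative.

```typescript
// Sketch: bucket sessions by LCP and compute conversion rate per bucket.
// This is the data behind an "LCP vs conversion" chart.
interface SessionMetrics {
  sessionId: string;
  lcpMs: number;
  converted: boolean; // joined from your business-event log (purchase, signup, ...)
}

function conversionByLcpBucket(sessions: SessionMetrics[], bucketMs = 500) {
  const buckets = new Map<number, { total: number; converted: number }>();

  for (const s of sessions) {
    const bucket = Math.floor(s.lcpMs / bucketMs) * bucketMs;
    const stats = buckets.get(bucket) ?? { total: 0, converted: 0 };
    stats.total += 1;
    if (s.converted) stats.converted += 1;
    buckets.set(bucket, stats);
  }

  return [...buckets.entries()]
    .sort(([a], [b]) => a - b)
    .map(([bucket, { total, converted }]) => ({
      lcpRange: `${bucket}-${bucket + bucketMs} ms`,
      sessions: total,
      conversionRate: converted / total,
    }));
}
```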
Step 6. Close the Feedback Loop in CI/CD
Once you know what matters, automate it.
- Use Lighthouse CI or Calibre to detect regressions in PRs.
- Add RUM dashboards visible to everyone (Datadog, Grafana, Vercel Analytics).
- Set alerts for real metrics, not just lab ones (e.g. LCP P75 > 3 s).
The goal isn’t to chase 100/100 — it’s to never regress on what truly affects users.
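As a sketch of what a “real metric” alert can look like: a scheduled job that queries your RUM store for the field P75 and fails when it crosses the budget. The query URL is hypothetical; in practice this is usually a Datadog monitor or a Grafana alert rule over your own metrics store.

```typescript
// Sketch: fail a scheduled job (or trigger an alert) when real-user LCP P75
// crosses the budget. The RUM query URL is hypothetical.
const LCP_P75_BUDGET_MS = 3000;

async function checkFieldBudget(): Promise<void> {
  const res = await fetch("https://rum.example.com/api/vitals/p75?metric=LCP&window=7d");
  const { p75 } = (await res.json()) as { p75: number };

  if (p75 > LCP_P75_BUDGET_MS) {
    console.error(`LCP P75 is ${p75} ms, over the ${LCP_P75_BUDGET_MS} ms budget`);
    process.exit(1); // fail the pipeline / page the on-call
  }
  console.log(`LCP P75 is ${p75} ms, within budget`);
}

checkFieldBudget();
```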
Step 7. Communicate Results Like an Engineer, Not a Salesperson
The biggest mistake is presenting performance wins as vanity metrics:
“We improved Lighthouse by 20 points!”
No.
Say this instead:
“We reduced LCP by 35 %, and users now see the product 1.2 s sooner. Checkout conversions went up 2.8 %.”
That’s how you make performance work visible and valuable.
Step 8. Recognize When “Faster” Stops Paying Off
Every optimization has diminishing returns.
If your app already loads in ~2 s on average devices, shaving another 200 ms won’t move metrics — but fixing layout shifts or input delay might.
The goal isn’t perfection; it’s perceived speed.
Real users care about “feels fast,” not “measured fast.”
Key Takeaways
- Lighthouse = lab test, not success metric.
- Combine Lab + RUM + A/B for complete performance visibility.
- Define what fast means for your users.
- Correlate technical and business outcomes.
- Once users stop noticing slowness, you’re done.
Final Thoughts
Performance is the easiest thing to brag about and the hardest thing to prove.
You can’t screenshot real impact — you have to measure it.
The next time you optimize something, ask yourself:
“Did this make my users’ experience faster — or just my metrics look better?”
That’s the difference between a performance refactor and a performance result.