TL;DR: A client's site scored 91 on Lighthouse while real users waited 4.2 seconds for LCP. The score is a diagnostic signal, not a goal. This guide covers running Lighthouse reproducibly, reading the report past the score, and making the fixes that move field metrics.
📖 Reading time: ~30 min
What's in this article
- The Problem: Your Lighthouse Score is 94 and Your Users Still Hate the Site
- The Gap Between Your Lighthouse Score and Your Users' Reality
- Running Lighthouse the Right Way (Not Just Chrome DevTools)
- Reading the Report Without Getting Distracted by the Score
- The Score Is a Lagging Indicator — Start With the Metrics
- The Fixes That Actually Move Your Score (Ranked by ROI)
- The LCP Trap Is Where Most Sites Bleed Points First
- Webpack and Build Tool Settings That Lighthouse Rewards
- Integrating Lighthouse into CI So You Stop Shipping Regressions
- Lab Data vs Field Data: The Part Everyone Skips
- When to Trust Your Lighthouse Score and When to Ignore It
- When Lighthouse Scores Don't Actually Matter
The Problem: Your Lighthouse Score is 94 and Your Users Still Hate the Site
The Gap Between Your Lighthouse Score and Your Users' Reality
A client came to me frustrated. Their site scored 91 on Lighthouse. Conversions were tanking. Users were bouncing. I pulled their Chrome User Experience Report (CrUX) data and found a 4.2-second LCP at the 75th percentile. That means 1 in 4 real users waited over four seconds to see the main content load. The 91 score was technically accurate and completely useless at the same time. That's the moment I stopped treating Lighthouse scores as a goal and started treating them as one diagnostic signal among several.
Here's what most tutorials skip: Lighthouse runs in a simulated environment. Specifically, it emulates a Moto G4 with 4x CPU throttling and a throttled network connection. Your MacBook Pro with 32GB RAM and a 1Gbps office connection runs Lighthouse in under 2 seconds and hands you a 94. That 94 reflects a machine that doesn't exist in your user base. The simulation is intentional — it's meant to surface problems your fast hardware hides — but the moment you treat that score as "performance achieved," you've lost the plot. I've seen teams spend two sprints chasing a 95+ score while their real users on mid-range Android devices in suburban Ohio were staring at spinners.
The fundamental issue is lab data vs field data. Lighthouse is a lab tool. It runs one controlled test, in one browser context, with one set of throttling parameters. Field data is what actually happened to real users across thousands of sessions — different devices, different network conditions, different geographies, different cache states. Google exposes field data through CrUX, which feeds into PageSpeed Insights. Run the same URL through PageSpeed Insights and look at the "Discover what your real users are experiencing" section. If your lab LCP is 1.8s but your field LCP is 4.1s, you have a real-world problem that your Lighthouse workflow is not catching. The two numbers can diverge by 2x or more, and they frequently do.
This guide covers three things specifically: running Lighthouse in a way that gives you trustworthy output (not just the Chrome DevTools click-and-pray method), reading the report honestly instead of fixating on the score, and identifying which fixes actually move field metrics. That last part is the hard one. Plenty of optimizations improve your Lighthouse score without touching LCP, INP, or CLS in any meaningful way for real users. Those are vanity wins. We're going after the ones that show up in CrUX.
One concrete thing you can do right now: stop running Lighthouse from Chrome DevTools on your dev machine. Instead, use the CLI with explicit flags so the test conditions are repeatable:
npx lighthouse https://yoursite.com \
--preset=desktop \
--throttling-method=simulate \
--output=html \
--output-path=./report.html \
--chrome-flags="--headless"
Better yet, run it against a staging URL from a CI box — a Linux container with modest specs will give you numbers far closer to what your users see than your local machine ever will. The throttling method flag matters too: simulate is Lighthouse's default and runs faster but is less accurate; devtools applies actual network throttling through Chrome and is slower but reflects real conditions more faithfully. Most teams don't know that flag exists. Switch to devtools throttling and watch your scores drop — that's not a bad thing, that's calibration.
Running Lighthouse the Right Way (Not Just Chrome DevTools)
Most developers I've talked to only ever run Lighthouse through Chrome DevTools — hit F12, click the Lighthouse tab, generate report, done. That works fine for a quick sanity check, but it's the least reliable way to get scores you can act on or compare over time. The moment your colleague runs the same URL on their machine and gets a score 15 points different from yours, you realize the problem.
There are three real ways to run Lighthouse, and each has a specific job:
- Chrome DevTools — use it for exploratory debugging mid-session. Quick feedback while you're already in the browser. Never trust these numbers for reporting or comparison.
- CLI — use it for local deep-dives and CI integration. Full control over throttling, output format, and Chrome flags. This is where serious performance work happens.
- PageSpeed Insights API — use it when you want scores from Google's servers with real-world field data layered in (Core Web Vitals from the Chrome UX Report). Free to use, with an API key required once you go past the unauthenticated rate limit. The endpoint is
https://www.googleapis.com/pagespeedonline/v5/runPagespeed?url=YOUR_URL&strategy=mobile&key=YOUR_KEY. The quota without a key is roughly 25 requests per 100 seconds — fine for occasional checks, not fine for automated pipelines.
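If you script against that endpoint, a small helper can pull the lab and field LCP out of one response. The helper name is mine; the fields it reads (`lighthouseResult.audits`, `loadingExperience.metrics`) follow the API's documented response shape, but treat this as a sketch:

```javascript
// Pull lab LCP (Lighthouse run) and field p75 LCP (CrUX) from a single
// PageSpeed Insights v5 response. Field data is absent for low-traffic
// origins, so guard for it and return null rather than crashing.
function extractLcp(psiResponse) {
  const lab =
    psiResponse.lighthouseResult?.audits?.['largest-contentful-paint']
      ?.numericValue ?? null; // milliseconds, lab measurement
  const field =
    psiResponse.loadingExperience?.metrics?.LARGEST_CONTENTFUL_PAINT_MS
      ?.percentile ?? null; // milliseconds, 75th percentile of real users
  return { labLcpMs: lab, fieldLcpMs: field };
}
```

Usage (Node 18+): `const data = await (await fetch(endpointUrl)).json(); const { labLcpMs, fieldLcpMs } = extractLcp(data);`. If the two numbers diverge by 2x, you have the lab-vs-field problem described above.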
The thing that caught me off guard early on: DevTools Lighthouse scores diverge from PageSpeed Insights scores for concrete, fixable reasons. Your browser extensions inject JavaScript and CSS. Your local network has different latency characteristics than Google's lab environment. Your machine's CPU throttling simulation differs from the baseline hardware Google's servers assume. I've seen 20-point gaps on the same URL in the same session because of nothing more than an ad blocker and a password manager running. The fix for local runs isn't just using incognito — it's also using the CLI with controlled flags.
Here's the CLI setup and the command I actually use:
npm install -g lighthouse
lighthouse https://yoursite.com \
--only-categories=performance \
--output=json \
--output-path=./report.json \
--throttling-method=simulate \
--preset=desktop
A few things worth breaking down here. --throttling-method=simulate uses CPU and network simulation rather than actually throttling your connection — it's faster and more consistent across machines, which matters when you're comparing runs. --only-categories=performance skips accessibility, SEO, and best practices audits, which cuts the run time significantly and keeps the JSON output focused. And --preset=desktop overrides the default, which is mobile. Mobile is Lighthouse's default because Google's ranking signals prioritize mobile performance — fair reason — but if your actual traffic is 80% desktop, optimizing for mobile scores first is the wrong trade-off. Know your users before picking a preset.
For CI pipelines, add --chrome-flags='--headless' to the command above. Without it, Lighthouse will try to open a Chrome window and either fail on a headless server or behave inconsistently depending on the display environment. The full CI-ready command looks like this:
lighthouse https://yoursite.com \
--only-categories=performance \
--output=json \
--output-path=./report.json \
--throttling-method=simulate \
--preset=desktop \
--chrome-flags='--headless'
One gotcha I hit on a GitHub Actions runner: Chrome wasn't installed by default on the ubuntu-22.04 image I was using. You either need to install it explicitly in your workflow or use the browser-actions/setup-chrome action before the Lighthouse step. Also, if you're running this against a localhost URL in CI — say, a preview deploy on port 3000 — you'll need --chrome-flags='--headless --no-sandbox --disable-dev-shm-usage'. The --no-sandbox flag is specifically needed in containerized environments where Chrome refuses to start without it. Skipping it silently kills the audit with a cryptic exit code.
Reading the Report Without Getting Distracted by the Score
The Score Is a Lagging Indicator — Start With the Metrics
Most developers open Lighthouse, see a score of 67, and immediately start chasing it like it's a leaderboard. That's the wrong move. The score is a weighted composite, and understanding those weights tells you where to spend your time. LCP (Largest Contentful Paint) and CLS (Cumulative Layout Shift) each contribute 25%. TBT (Total Blocking Time) contributes 30% — the single biggest lever in the formula. FCP (First Contentful Paint) and Speed Index each contribute 10%. Do the math: if your TBT is brutal, no amount of image compression is going to move that needle meaningfully. I've watched devs spend a week optimizing images and go from 61 to 64. The TBT was 3,400ms the whole time.
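The weighting math is worth sanity-checking yourself. Here's a sketch of the composite using the Lighthouse v10 weights quoted above; note the real pipeline first maps each raw metric through a log-normal scoring curve to produce the per-metric 0–1 scores this function consumes:

```javascript
// Lighthouse v10 performance-score weights. Each metric score is 0–1
// (already passed through Lighthouse's log-normal scoring curve).
const WEIGHTS = { fcp: 0.10, si: 0.10, lcp: 0.25, tbt: 0.30, cls: 0.25 };

// Weighted average of per-metric scores, scaled to the familiar 0–100.
function compositeScore(metricScores) {
  const total = Object.entries(WEIGHTS).reduce(
    (sum, [metric, weight]) => sum + weight * (metricScores[metric] ?? 0),
    0
  );
  return Math.round(total * 100);
}
```

Run the numbers: perfect FCP, Speed Index, LCP, and CLS with a zero TBT score still caps you at 70. That's why a week of image work barely moved the 61.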
TBT: The Metric That Bites You Late
TBT measures the total time that the main thread was blocked for more than 50ms during page load — specifically between FCP and Time to Interactive. Every long task contributes its "excess" time above that 50ms threshold. So a 200ms task contributes 150ms to TBT. It's a lab proxy for INP (Interaction to Next Paint), which is the real-world responsiveness metric that Google uses in ranking signals. The reason most developers ignore TBT until it causes problems: it doesn't feel broken during development. Your machine is fast, your DevTools CPU throttling is set to 1x, and your test data is minimal. Ship to production, get users on mid-range Android devices, and suddenly clicking "Add to Cart" feels like submitting a form over dial-up. The culprit is almost always a third-party script or a large JS bundle executing on load with zero chunking.
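That definition reduces to a one-liner, which makes the arithmetic easy to check. A sketch:

```javascript
// TBT = sum of each long task's excess over the 50ms threshold,
// for main-thread tasks between FCP and TTI. Durations in milliseconds.
function totalBlockingTime(taskDurations) {
  return taskDurations
    .filter((duration) => duration > 50)
    .reduce((sum, duration) => sum + (duration - 50), 0);
}
```

So a 200ms task contributes 150ms, a 40ms task contributes nothing, and three moderately long tasks can quietly add up to a brutal TBT.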
Opportunities vs Diagnostics: Two Very Different Conversations
Lighthouse splits its findings into two buckets and most people treat them the same. Don't. Opportunities have estimated time savings attached — things like "Eliminate render-blocking resources (estimated savings: 1.2s)" or "Properly size images (estimated savings: 840ms)". These are actionable and come with a measurable payoff. Diagnostics are structural observations with no estimated savings: "Avoid an excessive DOM size", "Minimize main-thread work", "Avoid chaining critical requests". Diagnostics tell you something is architecturally wrong, but they won't tell you how bad. I treat Diagnostics as code review feedback and Opportunities as sprint tickets. Different priority, different process.
How to Actually Read the Filmstrip Waterfall
The filmstrip at the top of the report is more useful than most devs realize. It shows screenshots of the page rendering at timed intervals — usually every 0.5–1s. Scroll past it to look at the network waterfall in the Trace. Open Chrome DevTools, run Lighthouse from the Lighthouse tab, then flip to Performance and hit record while reloading — you'll get a full trace. Look for the long orange bars (scripting) and purple bars (rendering) on the main thread. The thing Lighthouse flags in the report isn't always the actual bottleneck. I've had cases where Lighthouse complained about render-blocking CSS, but the real problem was a 900KB third-party analytics bundle that fired on DOMContentLoaded and blocked the main thread for 2.4 seconds. That showed clearly in the Performance trace as a solid orange block. Lighthouse mentioned "reduce JavaScript execution time" as a diagnostic — easy to overlook if you're focused on the opportunity items.
Cross-Reference With Coverage Before You Optimize Anything
Lighthouse can tell you that unused JavaScript is a problem, but it can't tell you which files are the worst offenders with any granularity. The Coverage tab in DevTools does. Open it via Ctrl+Shift+P → "Show Coverage", hit record, reload the page, and stop recording. You'll get a file-by-file breakdown of what percentage of each JS and CSS file was actually executed during load. I've routinely seen files sitting at 12–15% usage. A 400KB vendor bundle where only 50KB runs on initial load is a chunking problem, not a download problem. That distinction matters because the fixes are completely different — code splitting vs compression vs tree shaking. Lighthouse gives you a category; Coverage gives you a target. Use both.
- Open Coverage: F12 → Ctrl+Shift+P → type "Coverage" → select "Show Coverage"
- Click the record button (circle icon), reload the page, then stop recording
- Sort by "Unused Bytes" descending — the top offenders are your first targets
- Red bars in the visualization = unused code; green = executed during load
- Cross-reference filenames against your webpack/Vite bundle analysis to confirm the source
One thing that caught me off guard: Coverage shows unused code at load time specifically. A file that looks 80% unused might be fully used after user interaction. Don't delete code based on Coverage alone — use it to decide what to lazy-load, not what to remove.
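If you collect the same data programmatically with Puppeteer's `page.coverage` API, which emits entries of the same `{ text, ranges }` shape, the unused-bytes math is a short fold. A sketch, with the entry shape assumed from that API:

```javascript
// Coverage entry: { url, text, ranges: [{ start, end }] }, where ranges
// are the byte offsets of code that actually executed during load.
function unusedBytes(entry) {
  const used = entry.ranges.reduce((sum, r) => sum + (r.end - r.start), 0);
  const total = entry.text.length;
  return {
    total,
    used,
    unused: total - used,
    pctUnused: Math.round(((total - used) / total) * 100),
  };
}
```

Sort your entries by `unused` descending and you've reproduced the "top offenders first" workflow from the DevTools panel, but in a form you can run in CI.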
The Fixes That Actually Move Your Score (Ranked by ROI)
The LCP Trap Is Where Most Sites Bleed Points First
I've audited dozens of sites where the team had "optimized images" and "added lazy loading everywhere" — and their Lighthouse score was still in the 50s. The single most common culprit: the hero image isn't preloaded. The browser discovers it late because it's buried in CSS or a component that loads after the main bundle. By the time it starts fetching, LCP is already blown. The fix is one line in your <head>:
<link rel="preload" as="image" href="/hero.jpg" fetchpriority="high">
That fetchpriority="high" attribute is the part people skip. Without it, the preload hint competes equally with everything else. With it, the browser front-queues this request. I've seen LCP drop from 4.2s to 1.8s on a production site from that single addition. If your hero image is served from a CDN with multiple formats, preload the most modern format your audience can handle and let the <picture> element do the rest — more on that in a moment.
Render-Blocking Resources: The Webpack Config That Actually Helps
Render-blocking JavaScript is still responsible for a huge chunk of poor FCP scores. The standard advice is "split your bundle" — but here's the specific config that works without causing a waterfall of requests:
// webpack.config.js
module.exports = {
optimization: {
splitChunks: {
chunks: 'all',
cacheGroups: {
vendor: {
test: /[\\/]node_modules[\\/]/,
name: 'vendors',
priority: -10,
reuseExistingChunk: true,
},
commons: {
name: 'commons',
minChunks: 2,
priority: -20,
reuseExistingChunk: true,
},
},
},
},
};
The chunks: 'all' setting is non-obvious — it tells webpack to split both async and sync chunks, not just dynamically imported ones. Without that, your vendor libraries stay monolithic. Pair this with <link rel="preload" as="script"> for your critical chunks so the browser fetches them in parallel with parsing. The gotcha: preloading too many scripts is just as bad as blocking — limit it to the two or three chunks that appear on every page.
Images: Still the Highest-ROI Fix, and Most Teams Are Half-Doing It
Switching to WebP was the right call two years ago. Switching to AVIF is the right call now — it compresses 30–50% better than WebP at equivalent visual quality. The <picture> element handles the progressive fallback cleanly:
<picture>
  <source srcset="/hero.avif" type="image/avif" />
  <source srcset="/hero.webp" type="image/webp" />
  <img
    src="/hero.jpg"
    alt="Hero image"
    width="1200"
    height="630"
    loading="eager"
    fetchpriority="high"
  />
</picture>
Notice loading="eager" on the hero. This is where I see teams shoot themselves in the foot — they add loading="lazy" to every image sitewide via a blanket rule, and Lighthouse flags the LCP image as lazy-loaded, which delays it significantly. The rule is simple: anything above the fold gets loading="eager" (or just omit the attribute, that's the default). Everything that's not visible on initial render gets loading="lazy". Also: always set explicit width and height. Without them, the browser doesn't know the image's aspect ratio before it loads, which causes layout shift and tanks your CLS score.
TBT Is Trickier — Break Up Long Tasks or at Least Defer the Worst Offenders
Total Blocking Time measures how long the main thread is locked up between FCP and TTI. The honest answer for most apps is that analytics and marketing scripts are the primary offenders. Before reaching for anything fancy, try this:
// Instead of this — runs immediately, blocks main thread on load
initAnalytics();
// Do this — waits for user interaction before initializing
let analyticsLoaded = false;
function loadAnalytics() {
if (!analyticsLoaded) {
analyticsLoaded = true;
initAnalytics();
}
}
['click', 'scroll', 'keydown'].forEach(evt =>
window.addEventListener(evt, loadAnalytics, { once: true, passive: true })
);
For genuinely long tasks in your own code — data processing, filtering large arrays, complex rendering logic — scheduler.postTask() is the right modern tool. It lets you yield to the browser between chunks of work:
async function processLargeDataset(items) {
const results = [];
for (let i = 0; i < items.length; i++) {
results.push(processItem(items[i]));
// Yield to browser every 50 items
if (i % 50 === 0) {
await scheduler.postTask(() => {}, { priority: 'background' });
}
}
return results;
}
Browser support for scheduler.postTask() is decent in Chromium-based browsers but absent in Firefox stable as of writing, so check if you need a fallback. The old setTimeout(() => {}, 0) trick still works but is less precise — the scheduler API gives you actual priority control.
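A minimal feature-detected wrapper might look like this; `yieldToMain` is my name for it, and the `setTimeout` branch is the imprecise fallback mentioned above:

```javascript
// Yield the main thread between chunks of work. Uses scheduler.postTask
// where it exists (Chromium) and falls back to a plain macrotask
// elsewhere (Firefox, Node), which is less precise: no priority control.
function yieldToMain() {
  if (typeof globalThis.scheduler?.postTask === 'function') {
    return globalThis.scheduler.postTask(() => {}, { priority: 'background' });
  }
  return new Promise((resolve) => setTimeout(resolve, 0));
}
```

Swap `await yieldToMain();` in for the bare `scheduler.postTask` call in the loop above and the same code runs safely in every browser.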
Third-Party Scripts: The Real Villain, and Facades Are Your Best Defense
I ran Lighthouse on a client's marketing site and found that three tag manager scripts, a chat widget, and an embedded YouTube video were responsible for over 60% of the TBT. Third-party scripts can't be code-split or tree-shaken by you — you don't own them. Your options are ordering, deferral, and facades.
async and defer are not interchangeable. async downloads in parallel but executes as soon as it's ready, potentially interrupting HTML parsing. defer downloads in parallel but waits until the HTML is fully parsed before executing. For analytics and non-critical scripts, defer is almost always what you want. For scripts that depend on load order, use defer consistently across all of them — deferred scripts execute in document order.
For YouTube embeds and chat widgets, the facade pattern gives you dramatic wins. Instead of loading the full embed on page load, render a static thumbnail with a play button. Only load the actual iframe when the user clicks it:
<div
  class="yt-facade"
  data-videoid="dQw4w9WgXcQ"
  style="background-image: url('https://i.ytimg.com/vi/dQw4w9WgXcQ/hqdefault.jpg')"
  onclick="this.outerHTML = `<iframe src='https://www.youtube.com/embed/${this.dataset.videoid}?autoplay=1' allow='autoplay' allowfullscreen></iframe>`"
>
  <button class="yt-facade-play" aria-label="Play video">▶</button>
</div>
For chat widgets, the same principle applies — render a fake chat bubble, load the actual SDK only when clicked. Libraries like lite-youtube-embed handle the YouTube case for you with proper accessibility built in. The performance difference on a page with three YouTube embeds is not subtle — you're looking at saving several megabytes of JavaScript and dozens of network requests on initial load.
Webpack and Build Tool Settings That Lighthouse Rewards
Start with the bundle analyzer — everything else is guesswork
Before you touch a single Webpack config option, run the bundle analyzer. I've watched developers spend hours tweaking splitChunks settings while their real problem was that they accidentally imported all of lodash, or that moment.js was dragging in 67 locale files nobody asked for. The command is simple:
# First, generate the stats file
npx webpack --profile --json > stats.json
# Then visualize it
npx webpack-bundle-analyzer stats.json
You'll get an interactive treemap in your browser. Look for anything that doesn't belong — third-party libraries that are larger than your entire application code, duplicated dependencies pulled in by different packages, or dev-only utilities that somehow made it into production. I once found a full copy of faker.js in a production bundle because a developer imported it in a utility file without thinking. Lighthouse's "Reduce unused JavaScript" warning is the symptom. The bundle analyzer is the diagnosis.
Code splitting with React.lazy() — the actual pattern that moves the needle
Lighthouse's Time to Interactive score tanks when you ship a monolithic JS bundle. The fix is route-level code splitting, and here's the pattern I actually use in production:
import React, { lazy, Suspense } from 'react';
import { Routes, Route } from 'react-router-dom';
const Dashboard = lazy(() => import('./pages/Dashboard'));
const Settings = lazy(() => import('./pages/Settings'));
const Reports = lazy(() => import('./pages/Reports'));
function App() {
return (
    <Suspense fallback={<PageSkeleton />}>
      <Routes>
        <Route path="/dashboard" element={<Dashboard />} />
        <Route path="/settings" element={<Settings />} />
        <Route path="/reports" element={<Reports />} />
      </Routes>
    </Suspense>
  );
}
The thing that caught me off guard the first time: the fallback matters more than it looks. If you pass null or an empty fragment, users on slow connections see a blank screen with no layout — which tanks your Cumulative Layout Shift score because the page reflows when the component loads. Use a skeleton or a minimal shell that matches the rough shape of what's coming. Also, don't lazy-load your above-the-fold content. I've seen people lazy-load their hero section and wonder why Largest Contentful Paint got worse.
Tree shaking is opt-in, not default — here's what you're missing
Everyone assumes tree shaking just works. It doesn't, not reliably. Two things have to be true simultaneously. First, you need "sideEffects": false in your package.json — or a more specific array if your package genuinely has side effects like CSS imports:
// package.json
{
"name": "your-app",
"sideEffects": ["*.css", "*.scss", "./src/polyfills.js"]
}
Second, you need ES modules throughout your dependency chain. If any library in your import tree ships only CommonJS (require()-style), Webpack can't tree-shake it because CJS modules are evaluated at runtime, not statically analyzable. Check your node_modules for packages that only have a main field in their package.json and no module or exports field — those are your tree-shaking dead zones. The practical move: when you're evaluating a new dependency, check whether it ships ESM before you install it. bundlephobia.com will show you the tree-shaken size versus the full size, and the gap tells you everything.
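That package.json check is scriptable. A rough heuristic, not a guarantee, since an `exports` map can still point only at CommonJS files:

```javascript
// Heuristic: a package declaring "module" or "exports" likely ships ESM
// that webpack can statically analyze. A "main"-only package is almost
// certainly CommonJS: a tree-shaking dead zone.
function likelyTreeShakeable(pkg) {
  return Boolean(pkg.module || pkg.exports);
}
```

Point it at `node_modules/<dep>/package.json` while auditing dependencies, then confirm the suspicious ones on bundlephobia.com before swapping them out.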
Font loading is two things, not one
Most developers fix half the font problem and leave the other half bleeding. The complete fix requires both a font-display: swap declaration and a preload hint for your critical font file. Here's what the full setup looks like:
<link
  rel="preload"
href="/fonts/inter-regular.woff2"
as="font"
type="font/woff2"
crossorigin
/>
/* In your CSS */
@font-face {
font-family: 'Inter';
src: url('/fonts/inter-regular.woff2') format('woff2');
font-weight: 400;
font-style: normal;
font-display: swap;
}
font-display: swap tells the browser to render text immediately in a fallback font, then swap when your custom font loads — Lighthouse rewards this because users see text faster. The preload hint tells the browser to fetch the WOFF2 file at high priority before it parses your CSS and discovers it needs the font. Without the preload, even with swap, you still get a flash. Do both. And only preload the weight you actually use above the fold — preloading your entire type system is counterproductive and will show up as an unused preload warning in Lighthouse.
HTTP caching headers: Lighthouse checks this and most devs punt on it
Lighthouse audits your caching policy under "Serve static assets with an efficient cache policy." Most dev server configs don't carry over to production, so what worked locally breaks silently. The correct setup for Webpack-generated assets is a long cache lifetime on anything with a content hash in the filename, and a short or no-cache policy on your HTML entry points:
# Nginx config for a typical Webpack output
location /static/ {
# Webpack adds [contenthash] to filenames — safe to cache forever
add_header Cache-Control "public, max-age=31536000, immutable";
}
location / {
# Your HTML should never be cached — it references the hashed assets
add_header Cache-Control "no-cache, no-store, must-revalidate";
}
The immutable directive tells browsers not to revalidate even when the user hits refresh — this is safe because if the file changes, it gets a new hash and a new URL. The gotcha I hit: if you're deploying to S3 + CloudFront, you have to set cache headers at the S3 object level and make sure CloudFront isn't overriding them with its own behavior settings. CloudFront's default behavior strips certain headers and applies its own TTL rules. Check your CloudFront cache behavior settings explicitly — don't assume the S3 metadata flows through.
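The routing logic in that Nginx config reduces to one rule: content hash in the filename means cache forever. Sketched as a function, assuming webpack's default hex `[contenthash]` format; useful if you set headers in application code or a deploy script instead of Nginx:

```javascript
// Decide a Cache-Control header from the asset filename. Hashed assets
// (e.g. main.3f2a9c1b.js) get a new URL on every change, so caching
// them forever is safe. HTML references those hashes, so never cache it.
const CONTENT_HASH_RE = /\.[0-9a-f]{8,}\./i;

function cacheControlFor(filename) {
  if (filename.endsWith('.html') || filename === '/') {
    return 'no-cache, no-store, must-revalidate';
  }
  return CONTENT_HASH_RE.test(filename)
    ? 'public, max-age=31536000, immutable'
    : 'public, max-age=3600'; // unhashed asset: short TTL, revalidatable
}
```

The same rule applies when setting S3 object metadata for a CloudFront deploy: run each uploaded key through logic like this rather than one bucket-wide setting.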
Integrating Lighthouse into CI So You Stop Shipping Regressions
Setting Up lhci the Right Way
Manual Lighthouse runs are better than nothing, but they don't stop anyone from merging a PR that tanks your LCP by 800ms. I've watched that happen too many times — someone refactors an image loading strategy, CI goes green, and production is suddenly sluggish. The fix is @lhci/cli, which runs Lighthouse as part of your pipeline and fails the build when metrics cross your thresholds. Here's the exact setup that's worked for me.
Install the CLI globally or as a dev dependency (I prefer dev dependency so the version is pinned per project):
npm install --save-dev @lhci/cli
Then drop a lighthouserc.js in your project root. Here's a real config I use — not a toy example:
// lighthouserc.js
module.exports = {
ci: {
collect: {
url: ['http://localhost:3000/fixture'],
numberOfRuns: 3,
startServerCommand: 'npm run start:ci',
},
assert: {
assertions: {
'largest-contentful-paint': ['error', { maxNumericValue: 2500 }],
'total-blocking-time': ['error', { maxNumericValue: 300 }],
'cumulative-layout-shift': ['warn', { maxNumericValue: 0.1 }],
'uses-optimized-images': ['warn', {}],
'render-blocking-resources': ['error', {}],
},
},
upload: {
target: 'temporary-public-storage',
},
},
};
The maxNumericValue for LCP is in milliseconds, and TBT is also in milliseconds — don't confuse them with the score (0–100). Setting largest-contentful-paint to error at 2500ms means any PR that pushes LCP past 2.5 seconds will fail the build. warn for CLS logs it without blocking the merge, which is useful when you're still fixing layout shift issues and don't want to halt all development.
GitHub Actions Workflow That Actually Works
Here's the workflow snippet I use. It assumes you're deploying a preview URL with something like Vercel or Netlify, but I've wired it to a local server for projects that don't have preview deployments:
name: Lighthouse CI
on:
pull_request:
branches: [main]
jobs:
lighthouse:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Setup Node
uses: actions/setup-node@v4
with:
node-version: '20'
- name: Install dependencies
run: npm ci
- name: Build
run: npm run build
- name: Run Lighthouse CI
run: |
npx lhci autorun
env:
LHCI_GITHUB_APP_TOKEN: ${{ secrets.LHCI_GITHUB_APP_TOKEN }}
The LHCI_GITHUB_APP_TOKEN is optional but worth setting up — it lets lhci post a status check directly on the PR with a link to the Lighthouse report. Without it, you're digging through Action logs to find failures. The GitHub App setup takes about five minutes via the Lighthouse CI GitHub App page.
The Fixture Page Problem (This One Will Bite You)
The thing that caught me off guard when I first set this up: lhci scores drift significantly if you're pointing at a page that loads live data. An API that returns 10 items on Monday might return 500 items on Friday, your LCP changes, and now you're chasing phantom regressions that have nothing to do with your frontend code. The fix is to create a stable fixture route — a page that always renders the same content, same image sizes, same everything. I keep /fixture in my app specifically for CI. It renders a hardcoded dataset with no external API calls.
# Instead of pointing at your real page:
# url: ['https://preview-abc123.vercel.app/products']
# Point at your fixture:
url: ['https://preview-abc123.vercel.app/fixture']
# Or even better, run against localhost with a seed:
startServerCommand: 'NODE_ENV=test npm run start'
url: ['http://localhost:3000/fixture']
This also makes your runs faster — no waiting on external APIs, no flakiness from network variance. Three Lighthouse runs against a local server with static data will be far more consistent than three runs against a live preview hitting a real database.
lhci Assertions vs Webpack Performance Budgets — They're Not the Same Thing
I see developers treat these as interchangeable. They're not. Webpack performance budgets (set via performance.maxAssetSize and performance.maxEntrypointSize in your webpack config) catch bundle size problems at build time — before the browser ever loads anything. They're static analysis on file sizes. lhci assertions catch runtime performance problems — actual paint timing, main thread blocking, and layout shifts as measured in a real browser. You need both.
- Use webpack budgets to fail the build when a bundle exceeds 250KB gzipped, before anyone has to run a browser.
- Use lhci assertions to catch problems that don't show up in file sizes — a third-party script that blocks rendering, a large DOM that causes TBT to spike, or an unoptimized font load that delays LCP.
A perfectly sized bundle can still have terrible LCP if you're loading a hero image without fetchpriority="high". webpack won't catch that. lhci will. Run both in your pipeline and treat them as complementary checks, not redundant ones.
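For reference, the webpack side of that pairing is only a few lines of config. Note that webpack budgets measure emitted (uncompressed) bytes, so a 250KB-gzipped target needs a larger uncompressed ceiling; the numbers here are illustrative, not canonical:

```javascript
// webpack.config.js: build-time performance budget. With hints set to
// 'error', the build fails when any emitted asset or entrypoint
// exceeds the cap, before a browser ever loads anything.
module.exports = {
  performance: {
    hints: 'error',
    maxAssetSize: 500 * 1024,      // bytes per asset, pre-gzip
    maxEntrypointSize: 500 * 1024, // total bytes per entrypoint, pre-gzip
  },
};
```

Keep this in the same pipeline as the lhci assertions: this gate is instant and catches size regressions; lhci is slower and catches runtime ones.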
Lab Data vs Field Data: The Part Everyone Skips
Your Lighthouse score is a lab measurement. It runs on a controlled machine, throttled network, no browser extensions, no third-party cookies messing with layout, no real user variability. That 95 you got on staging? Meaningless until you cross-reference it with what actual humans on actual phones are experiencing. I learned this the hard way when a client's site sat at 92 in Lighthouse while Google Search Console was showing that 40% of their real users had a CLS score in the "Poor" range. The two numbers were measuring completely different realities.
The ground truth is CrUX — the Chrome User Experience Report. Google collects real-world performance data from Chrome users who have opted into syncing, and it covers LCP, FID (now INP), CLS, and TTFB. You can access this data two ways without spending anything. The first is PageSpeed Insights — scroll past the Lighthouse section and you'll see a "Discover what your real users are experiencing" block. That's CrUX data for your URL or origin. The second is BigQuery, which gives you the raw dataset. Google loads the full CrUX data monthly into a public BigQuery table and you get 1TB of free query processing per month, which is more than enough for most audits. Here's a query to pull your origin's Core Web Vitals breakdown:
SELECT
  origin,
  experimental.popularity.rank AS popularity_rank,
  bin.start AS lcp_start,
  bin.density AS lcp_density
FROM
  `chrome-ux-report.country_us.202404`,
  UNNEST(largest_contentful_paint.histogram.bin) AS bin
WHERE
  origin = 'https://yourdomain.com'
ORDER BY
  lcp_start ASC;
Swap the table suffix for the month you want (format: YYYYMM) and swap the origin. If you get no rows back, your site doesn't have enough CrUX data — Google requires a minimum traffic threshold, and low-traffic sites simply won't appear. That's a genuine limitation nobody mentions in tutorials. For those sites, Lighthouse is your only structured signal, so weigh it accordingly.
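Since the densities in each histogram sum to roughly 1.0, turning query results into a headline number is one fold. A sketch that computes the share of page loads in the "good" LCP range (under 2500ms), assuming CrUX's bin shape (`start`, `end`, `density`, with the last bin open-ended):

```javascript
// CrUX histogram bins: [{ start, end, density }], densities sum to ~1.
// Returns the fraction of page loads whose LCP fell in the good range.
function goodLcpShare(bins) {
  return bins
    .filter((bin) => bin.end !== undefined && bin.end <= 2500)
    .reduce((sum, bin) => sum + bin.density, 0);
}
```

A result of 0.55 means 55% of sampled real-user loads had good LCP: the field-data counterpart to whatever your lab score claims.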
Setting Up Search Console So You Actually See the Problem
Go to Google Search Console → Core Web Vitals (under Experience in the left sidebar). You'll see a chart split by mobile and desktop, with URLs bucketed into Good, Needs Improvement, and Poor. The thing that caught me off guard the first time was the 28-day rolling window — you won't see the impact of your optimization immediately. You ship a fix, and you're waiting nearly a month for the field data to reflect it. Plan your sprint timelines around that. If you click into a "Poor" URL group, Search Console often clusters similar pages together (it's not always exact URL matches), and it'll tell you which metric is the problem. That's your starting point, not your Lighthouse run.
The Web Vitals Extension for Development
Install the Web Vitals Chrome extension from Google. It's not a replacement for CrUX, but it gives you a live overlay of LCP, INP, and CLS as you browse your own site on your real machine. Use it while testing on a mid-range Android device over a mobile hotspot — not your M2 MacBook on gigabit fiber. That device/network gap is where most of the discrepancy between lab and field scores lives. A web-font swap that causes CLS is nearly invisible on a fast connection; on a slow one it's catastrophic, because the fallback text sits on screen far longer before the late-arriving font shifts the layout. The extension will catch it. Lighthouse often won't flag it at the same severity because its throttling simulation only goes so far.
When to Trust Your Lighthouse Score and When to Ignore It
Trust Lighthouse when: you're catching regressions in CI before they ship, you're diagnosing specific technical issues (render-blocking resources, unused JavaScript, image sizing), and when CrUX data isn't available for your site. Lighthouse is excellent as a diagnostic tool and a regression gate. Ignore it as a success metric. A score of 90+ does not mean your real users have good Core Web Vitals — it means your configured test scenario passed. Sites with heavy A/B testing frameworks, personalization scripts, or ads will almost always score well in Lighthouse (ad and personalization payloads often don't fire for a fresh, cookieless automated session, and anything injected after the trace ends never shows up in the report) and terribly in field data. I've seen e-commerce sites with Lighthouse scores in the high 80s that Google flagged as failing Core Web Vitals because the ad stack was injecting layout shifts 4 seconds after load on real sessions. Lighthouse never saw that script. CrUX did.
The honest answer on when to use which: use Lighthouse daily during development as a feedback loop, use CrUX and Search Console weekly as your actual performance scoreboard. If the two diverge significantly — and they often do — trust CrUX. It's measuring your users. Lighthouse is measuring a robot.
When Lighthouse Scores Don't Actually Matter
Lighthouse Is a Lab Tool — Don't Confuse It for Ground Truth
The most expensive mistake I see junior devs make is spending two days chasing a Lighthouse score from 72 to 90 for an internal HR dashboard that 40 people use on a corporate LAN. Lighthouse literally cannot authenticate against your OAuth-protected app without a custom Chrome extension or a lot of Puppeteer gymnastics. Even if you hack it to work, you're measuring a simulated 4G throttled experience for users who are on a gigabit office network sitting 10ms from your CDN edge node. The score tells you nothing useful about their actual experience.
Internal tooling is the clearest case, but real-time SPAs are a close second. If you're building a trading platform, a collaborative editor like a Notion clone, or a live logistics dashboard, your users are hammering the app with constant DOM mutations from WebSockets. Lighthouse's LCP and TBT measurements happen once, on initial load, on a cold cache. They don't capture what your users actually feel — which is whether the app freezes when 50 data rows update simultaneously. The metric you actually want to optimize is INP (Interaction to Next Paint). A Lighthouse score of 65 with a p75 INP under 100ms is a better outcome than a score of 95 with an INP of 400ms. Stop optimizing the wrong thing.
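That INP concern is fixable at the rendering layer. A common pattern is to coalesce a burst of WebSocket messages into a single DOM write per animation frame; here's a minimal sketch under that assumption (createBatcher and applyBatch are illustrative names, not a real library API):

```javascript
// Sketch: coalesce a burst of WebSocket row updates into one DOM write
// per frame, so 50 simultaneous updates cost one layout pass instead of 50.
// createBatcher/applyBatch are illustrative names, not from any framework.
function createBatcher(applyBatch, schedule) {
  const pending = new Map(); // rowId -> latest payload (later messages win)
  let scheduled = false;
  return function queueRowUpdate(rowId, payload) {
    pending.set(rowId, payload);
    if (scheduled) return; // a flush is already queued for this frame
    scheduled = true;
    schedule(() => {
      scheduled = false;
      const batch = new Map(pending);
      pending.clear();
      applyBatch(batch); // one DOM write for the whole burst
    });
  };
}

// In the browser, requestAnimationFrame is the natural scheduler:
//   const queueRowUpdate = createBatcher(renderRows, (cb) => requestAnimationFrame(cb));
```

Because later messages for the same row overwrite earlier ones in the map, the main thread only ever paints the freshest state, which is what keeps interactions responsive under a firehose of updates.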
Your CrUX Data Is the Real Answer
Chrome User Experience Report data is field data — actual users, actual devices, actual network conditions. If your CrUX shows green Core Web Vitals but Lighthouse is flagging your site, stop. The field data wins every time. You can pull your CrUX data directly from the PageSpeed Insights API without opening a browser:
curl "https://www.googleapis.com/pagespeedonline/v5/runPagespeed\
?url=https://yoursite.com\
&strategy=mobile\
&key=YOUR_API_KEY" | jq '.loadingExperience.metrics'
If LARGEST_CONTENTFUL_PAINT_MS.category comes back as "FAST" and your CUMULATIVE_LAYOUT_SHIFT_SCORE is green, you have a good user experience. Lighthouse complaining about render-blocking resources or image formats at that point is just noise. I've seen teams add lazy loading in ways that actually hurt perceived performance (the classic move is slapping loading="lazy" on the hero image, which delays the LCP element itself) because they were fixing a Lighthouse flag on a page where CrUX was already excellent. Don't break what's working.
Measure What Users Actually Experience With the web-vitals Package
The web-vitals npm package is the correct tool for this job. It uses the same measurement logic that Chrome uses for CrUX. Install it once and wire it into your analytics pipeline:
npm install web-vitals
import { onCLS, onINP, onLCP, onFCP, onTTFB } from 'web-vitals';

function sendToAnalytics(metric) {
  // Replace with your own endpoint or use GA4, Datadog, whatever
  navigator.sendBeacon('/analytics', JSON.stringify({
    name: metric.name,
    value: metric.value,
    rating: metric.rating, // "good", "needs-improvement", "poor"
    id: metric.id,
    navigationType: metric.navigationType,
  }));
}

onCLS(sendToAnalytics);
onINP(sendToAnalytics);
onLCP(sendToAnalytics);
onFCP(sendToAnalytics);
onTTFB(sendToAnalytics);
The thing that caught me off guard the first time I set this up is that onINP fires at the end of the page lifecycle (when the user navigates away or the page is hidden), not on each interaction. So if you're logging to console during development and wondering why nothing shows up immediately — that's why. The metric.rating field is genuinely useful because it maps to the same thresholds Google uses: LCP under 2.5s is "good", 2.5–4s is "needs improvement", over 4s is "poor". You're not guessing anymore.
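If you aggregate those beacons server-side, the same bucketing is trivial to reproduce. A sketch using the LCP thresholds quoted above (rateLCP is my name for it, not part of the web-vitals API):

```javascript
// Sketch: reproduce web-vitals' LCP rating buckets for server-side
// aggregation. The thresholds are Google's published cut-offs;
// the function itself is illustrative, not a library export.
function rateLCP(valueMs) {
  if (valueMs <= 2500) return 'good';
  if (valueMs <= 4000) return 'needs-improvement';
  return 'poor';
}
```

Matching the thresholds exactly means your dashboard buckets line up with what Search Console and CrUX will eventually report.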
The practical split I use now: Lighthouse in CI to catch regressions on public, cacheable pages (landing pages, marketing, docs). The web-vitals package running in production for everything else — authenticated apps, SPAs, anything with real user variance. Chasing a Lighthouse number without field data backing it up is cargo-cult optimization. The score isn't the point — your users' actual experience is.
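For the CI half of that split, the standard tool is Lighthouse CI (@lhci/cli). Here's a hedged example of a lighthouserc.json that fails the build on a regression; the URL and budget numbers are placeholders to adapt to your own pages:

```json
{
  "ci": {
    "collect": {
      "url": ["https://yoursite.com/"],
      "numberOfRuns": 3
    },
    "assert": {
      "assertions": {
        "categories:performance": ["error", { "minScore": 0.9 }],
        "largest-contentful-paint": ["error", { "maxNumericValue": 2500 }],
        "cumulative-layout-shift": ["error", { "maxNumericValue": 0.1 }]
      }
    },
    "upload": { "target": "temporary-public-storage" }
  }
}
```

Asserting on the raw metric values rather than only the composite score is the point: the score can stay flat while LCP quietly regresses.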
Originally published on techdigestor.com. Follow for more developer-focused tooling reviews and productivity guides.