A Lighthouse score of 95 on a marketing page. CrUX showing 68% of real users experiencing "Poor" LCP. Both measuring the same URL.
The page was getting most of its traffic from users in Southeast Asia on mid-range Android devices over 4G. Lighthouse was running on a simulated Moto G4, but with a fast local network and a warm server. The simulated device wasn't the problem — the network conditions and server geography were.
This is the gap that field data closes. Lighthouse tells you what's possible in a controlled environment. Field data tells you what's actually happening.
Start with what you already have
Before building anything, check the Chrome UX Report. PageSpeed Insights shows CrUX data for any URL with enough traffic — real-user LCP, INP, and CLS distributions, broken down into Good / Needs Improvement / Poor buckets. Google Search Console's Core Web Vitals report shows the same data aggregated by page group.
CrUX has two limitations worth knowing. It only includes URLs that Chrome users have visited with usage reporting enabled, so low-traffic pages won't appear. And it's aggregated over a 28-day rolling window, which means a regression you shipped last week might not fully show up for another three weeks.
For most teams, CrUX is the right first stop. If your CrUX data looks fine, synthetic testing is probably sufficient. If it doesn't — or if you don't have enough traffic to appear in CrUX — you need your own RUM pipeline.
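The CrUX data behind PageSpeed Insights is also queryable directly. A minimal sketch against the public CrUX API — the endpoint is real, but `apiKey` is a placeholder you'd create in Google Cloud, and the exact response shape is worth checking against the API docs:

```javascript
// Query the CrUX API for a URL's field-data p75 values (PHONE form factor).
async function fetchCruxP75(url, apiKey) {
  const res = await fetch(
    `https://chromeuxreport.googleapis.com/v1/records:queryRecord?key=${apiKey}`,
    {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ url, formFactor: 'PHONE' }),
    },
  );
  const { record } = await res.json();
  return extractP75(record);
}

// Pull p75 out of a CrUX record; the metric keys are the API's own names.
// Note: the API may return CLS p75 as a string.
function extractP75(record) {
  const pick = (name) => record.metrics[name]?.percentiles?.p75;
  return {
    lcp: pick('largest_contentful_paint'),
    inp: pick('interaction_to_next_paint'),
    cls: pick('cumulative_layout_shift'),
  };
}
```

If a URL has too little traffic, the API returns a 404 rather than an empty record — that's the signal you need your own RUM pipeline.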
The web-vitals library is the right foundation
The web-vitals library handles the attribution and calculation details that the raw PerformanceObserver API leaves to you. It reports the same values Google uses for CrUX, which matters when you're trying to correlate your internal data with what's affecting your search ranking.
import { onLCP, onINP, onCLS } from 'web-vitals/attribution';

function sendToAnalytics(metric) {
  fetch('/api/vitals', {
    method: 'POST',
    body: JSON.stringify({
      name: metric.name,
      value: metric.value,
      rating: metric.rating, // 'good' | 'needs-improvement' | 'poor'
      attribution: metric.attribution,
      page: location.pathname,
      navigationType: metric.navigationType,
    }),
    keepalive: true,
  });
}

onLCP(sendToAnalytics);
onINP(sendToAnalytics);
onCLS(sendToAnalytics);
The /attribution import is the important part. metric.attribution for INP includes the interaction target, the event type, and the three-phase breakdown — input delay, processing time, presentation delay — that tells you where the time was lost. For CLS, it includes the element that shifted and how far. This attribution data is what makes field measurements actionable rather than just decorative.
keepalive: true on the fetch ensures the request completes even if the user navigates away before it finishes.
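A common variant — sketched here as a hypothetical `sendMetric` helper, with `/api/vitals` as a placeholder endpoint — prefers `navigator.sendBeacon` when available, since beacons are queued for delivery even while the page unloads, and falls back to keepalive fetch:

```javascript
// Prefer sendBeacon for delivery during unload; fall back to keepalive fetch.
// The `nav` parameter defaults to the real navigator and exists so the
// function can be exercised outside a browser.
function sendMetric(metric, nav = navigator) {
  const body = JSON.stringify(metric);
  if (typeof nav.sendBeacon === 'function') {
    // Returns true if the browser accepted the beacon for delivery.
    return nav.sendBeacon('/api/vitals', body);
  }
  fetch('/api/vitals', { method: 'POST', body, keepalive: true });
  return true;
}
```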
Segment before you draw conclusions
Raw averages hide almost everything. A p75 LCP of 2.4 seconds looks fine — until you split by device type and see that desktop users are at 1.1s and mobile users are at 3.8s. Or split by page and find that 90% of the bad INP scores are concentrated on one product listing page.
The minimum useful dimensions to capture with every metric:
- Page path (not the full URL — strip query strings and dynamic segments)
- Connection type (`navigator.connection?.effectiveType` gives you "4g", "3g", "2g", or "slow-2g")
- Device memory (`navigator.deviceMemory`, bucketed roughly into low/mid/high)
function getDeviceContext() {
  const nav = navigator as any; // connection and deviceMemory are non-standard
  return {
    connection: nav.connection?.effectiveType ?? 'unknown',
    memory:
      nav.deviceMemory == null
        ? 'unknown'
        : nav.deviceMemory <= 2
          ? 'low'
          : nav.deviceMemory <= 4
            ? 'mid'
            : 'high',
  };
}
navigator.connection and navigator.deviceMemory are Chrome-only and non-standard, but they're available for the large share of your users on Chrome where the data is most useful. Treat unknown as a valid bucket rather than an error.
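The page-path dimension needs similar care. A sketch of the normalization the first list item describes — the segment patterns here are hypothetical placeholders you'd adapt to your own route structure:

```javascript
// Collapse dynamic URL segments into stable page templates so metrics
// group by page type rather than by individual resource.
function normalizePath(pathname) {
  return pathname
    .split('/')
    .map((segment) => {
      if (/^\d+$/.test(segment)) return ':id'; // numeric IDs
      if (/^[0-9a-f]{8}-[0-9a-f]{4}-/.test(segment)) return ':uuid'; // UUIDs
      return segment;
    })
    .join('/');
}
```

With this in place, `/products/12345/reviews` and `/products/67890/reviews` aggregate into one row instead of fragmenting your data across thousands of near-identical paths.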
With connection type as a dimension, "mobile users have worse LCP" becomes "mobile users on 3G have worse LCP, but mobile users on 4G are on par with desktop" — and that's a different optimization problem.
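The segmentation itself is a small amount of code once the dimensions are recorded. A minimal sketch, assuming each sample has already been tagged with a segment label (e.g. `'mobile-4g'`) on the client:

```javascript
// Compute p75 per segment from tagged metric samples.
function p75BySegment(samples) {
  const groups = new Map();
  for (const { segment, value } of samples) {
    if (!groups.has(segment)) groups.set(segment, []);
    groups.get(segment).push(value);
  }
  const result = {};
  for (const [segment, values] of groups) {
    values.sort((a, b) => a - b);
    // Index of the 75th-percentile sample in the sorted list.
    result[segment] = values[Math.max(0, Math.ceil(values.length * 0.75) - 1)];
  }
  return result;
}
```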
The metrics you're not collecting yet
INP is reported once per page session — the worst interaction in the session. That's the metric Google uses, but for debugging you want to know which specific interactions are bad, not just the worst one.
The web-vitals library's onINP callback includes attribution for the worst interaction. For catching the long tail, collecting individual slow interaction events separately is more useful:
onINP((metric) => {
  sendToAnalytics({
    ...metric,
    // attribution.interactionTarget is the CSS selector of the element
    interactionTarget: metric.attribution.interactionTarget,
    interactionType: metric.attribution.interactionType,
  });
});
interactionTarget gives you the CSS selector of the element the user interacted with. When you see the same selector appearing repeatedly in your slow INP data, you've found the component to fix.
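On the collection side, that repetition is easy to surface. A hedged sketch of a server-side rollup — field names follow the payload shape used in the snippets above, and the 200ms cutoff mirrors the INP "good" boundary:

```javascript
// Rank element selectors by how often they appear in slow INP reports,
// so the worst-offending components surface first.
function rankSlowTargets(reports, thresholdMs = 200) {
  const counts = new Map();
  for (const { interactionTarget, value } of reports) {
    if (value <= thresholdMs) continue; // ignore interactions within budget
    counts.set(interactionTarget, (counts.get(interactionTarget) ?? 0) + 1);
  }
  // [selector, count] pairs, most frequent first.
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}
```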
From measurement to alerting
Collecting data answers "how bad is it." Alerting answers "when did it get worse."
A performance regression that appears in your RUM data on Tuesday might not surface in a weekly review until Friday. By then it's been affecting real users for three days and the deployment that caused it is buried under subsequent changes.
The gap between measurement and alerting is where most RUM setups fall short. Teams collect the data, build a dashboard, and check it periodically. They're doing monitoring — looking at history — rather than alerting.
Setting up threshold alerts on your p75 values — LCP over 2.5s, INP over 200ms, CLS over 0.1 — against a rolling window of real user data catches regressions close to when they happen. If you'd rather not build and maintain that alerting layer, RPAlert handles it for React apps: it collects Web Vitals from real browsers and sends a Slack or Discord notification within 60 seconds of a threshold crossing. The free tier covers one app, which is enough to see whether the alerting model is useful before committing to it.
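If you do build that layer yourself, the core check is small. A minimal sketch, assuming the samples passed in are already restricted to the rolling window you care about (say, the last 24 hours):

```javascript
// Standard "Good" boundaries for each Core Web Vital.
const THRESHOLDS = { LCP: 2500, INP: 200, CLS: 0.1 };

// Compute the window's p75 and flag whether it crossed the threshold.
function checkRegression(metricName, windowSamples) {
  const sorted = [...windowSamples].sort((a, b) => a - b);
  const p75 = sorted[Math.max(0, Math.ceil(sorted.length * 0.75) - 1)];
  return { p75, breached: p75 > THRESHOLDS[metricName] };
}
```

Run it against each new batch of samples and notify only when `breached` flips from false to true, so a persistently bad metric doesn't page you every five minutes.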
What to do with the data
Field data changes the sequence of performance work.
Without it: pick metrics to optimize based on Lighthouse scores and engineering intuition, ship improvements, re-run Lighthouse, hope.
With it: look at where real users are experiencing the worst scores, segment to find the specific pages and conditions, fix those, watch the field data improve.
The third article in this series noted that CLS is slow to debug because the shifts depend on network timing and server conditions that don't exist locally. The same is true for INP on low-memory devices and LCP on slow connections. Lab data can't replicate those conditions reliably. Field data is the only measurement that captures them.
Running the web-vitals instrumentation in production for a week generates enough data to prioritize the rest of the work in this series. That's where to start.