When I first built web applications, I would often find out about performance problems from frustrated users. My phone would ring with reports that a page was "slow" or "broken." I was stuck in a cycle of reacting to problems after they had already damaged the user's experience. I needed a way to see what was happening inside my application as it ran, to catch issues before my users did. This is what performance monitoring and observability are for.
Think of it like the dashboard in your car. You don't wait for the engine to smoke to know there's a problem. You watch the gauges for temperature, oil pressure, and fuel levels. In software, our gauges are metrics, traces, and logs. They tell us the health of the application in real time.
In JavaScript, especially in the browser, this is both crucial and challenging. Users have different devices, network speeds, and browser capabilities. A feature that works perfectly on my fast development machine might be unusable on an older phone on a slow cellular network. To build resilient applications, we must instrument them to show us this reality.
Here is a set of techniques I use to build that dashboard. They help move from guessing about performance to knowing exactly what is happening.
1. Gathering the Standard Gauges: Core Web Vitals
The web platform provides a powerful set of built-in tools to measure fundamental user experience metrics. These are the Core Web Vitals: Largest Contentful Paint (LCP), First Input Delay (FID), and Cumulative Layout Shift (CLS). (Note that FID has since been replaced by Interaction to Next Paint, INP, as an official Core Web Vital, though browsers still report the underlying first-input entry.) They answer basic questions. How long does it take for the main content to appear? How responsive is the page to my first click? Does the page jump around as it loads?
We can collect this data using the PerformanceObserver API. It allows us to watch for specific performance events as they happen. Instead of writing complex timing logic, we ask the browser to tell us when these events occur.
Here’s how I set up a basic observer for these core metrics.
class VitalSignsMonitor {
constructor() {
this.metricData = {};
this.setupCoreVitalObserver();
}
setupCoreVitalObserver() {
const vitalObserver = new PerformanceObserver((entryList) => {
for (const entry of entryList.getEntries()) {
this.recordVital(entry);
}
});
// Tell the observer what to look for. Using type + buffered (one observe()
// call per type) replays entries that fired before this code ran;
// LCP, in particular, often occurs before any script has registered an observer.
vitalObserver.observe({ type: 'largest-contentful-paint', buffered: true });
vitalObserver.observe({ type: 'first-input', buffered: true });
vitalObserver.observe({ type: 'layout-shift', buffered: true });
}
recordVital(entry) {
switch (entry.entryType) {
case 'largest-contentful-paint':
// LCP: When the main content paints
this.metricData.lcp = {
value: entry.renderTime || entry.loadTime,
elementTag: entry.element?.tagName
};
console.log(`LCP: ${this.metricData.lcp.value}ms`);
break;
case 'first-input': {
// FID: Delay before responding to first click/tap
const delay = entry.processingStart - entry.startTime;
this.metricData.fid = { value: delay, interaction: entry.name };
console.log(`FID: ${delay}ms for ${entry.name}`);
break;
}
case 'layout-shift':
// CLS: Only count shifts not caused by user interaction (like typing)
if (!entry.hadRecentInput) {
// Add to the cumulative score
this.metricData.cls = (this.metricData.cls || 0) + entry.value;
console.log(`CLS added: ${entry.value}. Total: ${this.metricData.cls}`);
}
break;
}
}
}
// Start monitoring when the page loads
const vitals = new VitalSignsMonitor();
This gives me the foundational health check. If LCP is over 2.5 seconds, I know the page feels slow to load. If FID is over 100 milliseconds, the page feels unresponsive. If CLS is over 0.1, the visual experience is jarring. These are my first, most important alerts.
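To make those thresholds actionable in reporting code, I classify each reading against the published Core Web Vitals boundaries. This is a minimal sketch: the threshold values are the documented good/poor cut-offs quoted above, but the table and the `rateVital` helper name are my own.

```javascript
// Published Core Web Vitals thresholds (good vs. poor boundaries).
// LCP and FID are in milliseconds; CLS is a unitless score.
const VITAL_THRESHOLDS = {
  lcp: { good: 2500, poor: 4000 },
  fid: { good: 100, poor: 300 },
  cls: { good: 0.1, poor: 0.25 }
};

// Classify a metric reading as 'good', 'needs-improvement', or 'poor'
function rateVital(name, value) {
  const t = VITAL_THRESHOLDS[name];
  if (!t) return 'unknown';
  if (value <= t.good) return 'good';
  if (value <= t.poor) return 'needs-improvement';
  return 'poor';
}
```

A label like "needs-improvement" is much easier to alert and aggregate on than a raw millisecond value.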
2. Spotting Long Tasks That Block the Main Thread
JavaScript runs on a single main thread in the browser. If a piece of your code takes too long to run, it blocks everything else. The user can't click, type, or scroll. This creates a terrible, "frozen" feeling.
The browser can tell us about these "Long Tasks." A long task is any operation that keeps the main thread busy for more than 50 milliseconds. Monitoring them is critical for understanding jank.
I add another observer method to my monitor class to watch for these.
setupLongTaskObserver() {
const longTaskObserver = new PerformanceObserver((list) => {
for (const entry of list.getEntries()) {
// Entry is already > 50ms by definition
console.warn(`Main thread blocked for ${entry.duration}ms`, entry);
// reportToServer is assumed to be a helper on this class that posts to an analytics endpoint
this.reportToServer('long_task', {
duration: entry.duration,
startTime: entry.startTime,
// What caused it? The attribution may tell us (like a script URL)
culprit: entry.attribution?.[0]?.containerSrc || 'unknown'
});
}
});
longTaskObserver.observe({ entryTypes: ['longtask'] });
}
When I see a long task reported, I know I need to investigate that specific block of code. Maybe I need to break up a large calculation, defer non-essential work, or move something to a Web Worker.
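For the large-calculation case, the usual fix is to process the work in slices and yield back to the main thread between them, so input handling and rendering can run. Here is one sketch of that pattern; `processInChunks` and the chunk size of 50 are my own illustrative choices (newer Chromium browsers also offer `scheduler.yield()` for the same purpose).

```javascript
// Process an array in small chunks, yielding to the event loop
// between chunks so the main thread is never blocked for long.
function processInChunks(items, processItem, chunkSize = 50) {
  return new Promise((resolve) => {
    let index = 0;
    function doChunk() {
      const end = Math.min(index + chunkSize, items.length);
      for (; index < end; index++) {
        processItem(items[index]);
      }
      if (index < items.length) {
        // Yield so the browser can handle input, paint, etc.
        setTimeout(doChunk, 0);
      } else {
        resolve();
      }
    }
    doChunk();
  });
}
```

The first chunk runs synchronously, so small inputs finish immediately; large inputs give the main thread a chance to breathe every 50 items instead of producing one giant long task.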
3. Creating Your Own Custom Metrics
Standard vitals are great, but every application is unique. You need to measure what matters for your specific business. How long does the checkout process take? How fast does the search results page load? How quickly does a chart render after data arrives?
This is where custom metrics come in. I use the performance.mark() and performance.measure() APIs to track these custom journeys.
Let's say I want to measure the time from when a user opens a product filter to when the updated product list is fully displayed.
function monitorFilterInteraction() {
// Mark the start point
performance.mark('filter-open-start');
// Later, when the results are drawn on screen
function onResultsRendered() {
// Mark the end point
performance.mark('filter-results-ready');
// Create a measurement between the two marks
performance.measure('filter-to-results', 'filter-open-start', 'filter-results-ready');
// Get the duration
const measures = performance.getEntriesByName('filter-to-results');
const lastMeasure = measures[measures.length - 1];
console.log(`Filter to results took ${lastMeasure.duration}ms`);
// Send to analytics
sendMetric('ui_filter_duration', lastMeasure.duration);
}
// Simulate calling this when rendering is done
setTimeout(onResultsRendered, 350); // Simulated render delay
}
// Simulate user opening a filter
// Guard with ?. in case the button is absent on this page
document.querySelector('#filter-button')?.addEventListener('click', monitorFilterInteraction);
I can wrap this pattern into a helper class to make it cleaner for tracking many different user actions.
class CustomTimingTracker {
constructor() {
this.timers = new Map();
}
start(timerName) {
const startMark = `${timerName}-start`;
performance.mark(startMark);
this.timers.set(timerName, startMark);
}
end(timerName, metadata = {}) {
const startMark = this.timers.get(timerName);
if (!startMark) {
console.error(`Timer "${timerName}" was not started.`);
return;
}
const endMark = `${timerName}-end`;
performance.mark(endMark);
performance.measure(timerName, startMark, endMark);
const measures = performance.getEntriesByName(timerName);
const duration = measures[measures.length - 1].duration;
// Clean up old marks to avoid memory buildup
performance.clearMarks(startMark);
performance.clearMarks(endMark);
performance.clearMeasures(timerName);
this.timers.delete(timerName);
// Report with context
this.reportTiming(timerName, duration, metadata);
}
reportTiming(name, duration, metadata) {
const data = {
name,
duration,
timestamp: Date.now(),
path: window.location.pathname,
...metadata
};
// Use sendBeacon for reliable delivery, even during page unload.
// Wrapping the payload in a Blob sets the Content-Type; a bare string is sent as text/plain.
const blob = new Blob([JSON.stringify(data)], { type: 'application/json' });
navigator.sendBeacon('/api/user-timing', blob);
}
}
const tracker = new CustomTimingTracker();
// Usage in my app
tracker.start('add-to-cart-process');
// ... after the item is confirmed in the cart
tracker.end('add-to-cart-process', { productId: 'ABC123', cartSize: 5 });
Now I have data that is specific to my application's logic. I can see which features are fast and which are slow, and I can prioritize optimization efforts based on real user impact.
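On the analysis side, a single slow sample shouldn't trigger panic. Like the Chrome UX Report does for the Core Web Vitals, I look at the 75th percentile of a metric across sessions. A minimal nearest-rank percentile helper (the function name is mine) might look like this:

```javascript
// Nearest-rank percentile: sort the samples, then pick the value at
// the rank that covers p percent of them.
function percentile(values, p) {
  if (values.length === 0) return undefined;
  const sorted = [...values].sort((a, b) => a - b);
  const index = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.max(0, index)];
}
```

`percentile(durations, 75)` then answers a concrete question: three quarters of users saw this duration or better.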
4. Following a Request Through the System: Distributed Tracing
Modern applications are rarely one single piece. A click in the browser might trigger a call to an API gateway, which calls a user service, which queries a database, and then calls a payment service. If the whole process is slow, which part is to blame?
Distributed tracing solves this. The idea is to generate a unique ID for the initial request (the "trace") and pass it through every subsequent service call. Each step of the journey is recorded as a "span" with its own timing and metadata. At the end, you can see the entire flow in one place.
Implementing this on the frontend means instrumenting our network requests (like fetch or XMLHttpRequest) to add these trace headers and record spans.
Here’s a simplified version of how I might wrap the fetch API.
class FrontendTracer {
constructor(serviceName) {
this.serviceName = serviceName;
}
instrumentFetch() {
const originalFetch = window.fetch;
const self = this;
window.fetch = async function(resource, init = {}) {
// Generate or get existing trace/span IDs
const traceId = self.getCurrentTraceId() || self.generateId();
const spanId = self.generateId();
// Start timing this span
const startTime = performance.now();
// Prepare headers, adding trace context
const headers = new Headers(init.headers || {});
headers.set('X-Trace-ID', traceId);
headers.set('X-Span-ID', spanId);
headers.set('X-Parent-Span-ID', self.getCurrentSpanId() || '');
try {
// Make the request with the new headers
const response = await originalFetch(resource, { ...init, headers });
const duration = performance.now() - startTime;
// Record this span as successful
self.recordSpan({
traceId,
spanId,
name: `fetch:${resource}`,
service: self.serviceName,
duration,
startTime,
tags: {
httpMethod: init.method || 'GET',
httpStatus: response.status,
url: resource
}
});
return response;
} catch (error) {
const duration = performance.now() - startTime;
// Record this span as failed
self.recordSpan({
traceId,
spanId,
name: `fetch:${resource}`,
service: self.serviceName,
duration,
startTime,
tags: {
httpMethod: init.method || 'GET',
url: resource,
error: error.message
},
error: true
});
throw error; // Re-throw the original error
}
};
}
generateId() {
return Math.random().toString(36).substring(2) + Date.now().toString(36);
}
getCurrentTraceId() {
// In a real app, this might be stored in a context or global state
return window.__ACTIVE_TRACE_ID;
}
getCurrentSpanId() {
return window.__ACTIVE_SPAN_ID;
}
recordSpan(spanData) {
// Send span data to a backend collector
const spanPayload = {
...spanData,
timestamp: Date.now()
};
// Use sendBeacon for non-blocking, reliable delivery
navigator.sendBeacon('/api/spans', JSON.stringify(spanPayload));
}
}
// Initialize the tracer for my "web-store-frontend" service
const tracer = new FrontendTracer('web-store-frontend');
tracer.instrumentFetch();
Now, every outgoing request carries a trace ID. When my backend services also read these headers and forward them, the entire chain of events can be pieced together in a tool like Jaeger or Zipkin. I can see that a slow checkout was caused by a delay in the payment service, not my frontend code.
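The backend half is symmetric. As a sketch (the header names match the frontend code above, but the function name is my own), a service receiving a request keeps the trace ID, mints a fresh span ID, and records the caller's span as its parent when it calls the next service:

```javascript
// Build the outgoing trace headers for a downstream call, given the
// incoming request headers (lowercased, as Node.js presents them).
function propagateTraceHeaders(incomingHeaders, newSpanId) {
  return {
    'X-Trace-ID': incomingHeaders['x-trace-id'],            // same trace end to end
    'X-Span-ID': newSpanId,                                 // this hop's own span
    'X-Parent-Span-ID': incomingHeaders['x-span-id'] || ''  // who called us
  };
}
```

In production, most teams use the standardized W3C Trace Context `traceparent` header and an OpenTelemetry SDK rather than custom header names, but the propagation idea is the same.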
5. Catching and Understanding Errors
Errors will happen. A network request fails, a third-party library throws an exception, or a user encounters a state you didn't anticipate. Silently logging to the console is not enough. We need to capture these errors with context so we can fix them.
We need to listen for global errors, promise rejections, and even intercept console.error calls. For each error, we should gather as much useful information as possible: the error message, the stack trace, the user's browser, the page they were on, and even what the performance was like at that moment.
class ErrorCatcher {
constructor() {
this.errors = [];
this.setupGlobalHandlers();
}
setupGlobalHandlers() {
// Catch standard JS errors
window.addEventListener('error', (event) => {
this.capture({
type: 'global_error',
message: event.message,
file: event.filename,
line: event.lineno,
column: event.colno,
errorStack: event.error?.stack,
timestamp: new Date().toISOString()
});
// Optional: Prevent default browser error reporting if needed
// event.preventDefault();
});
// Catch unhandled Promise rejections
window.addEventListener('unhandledrejection', (event) => {
this.capture({
type: 'unhandled_rejection',
reason: event.reason?.message || String(event.reason),
// Note: event.promise itself is omitted; Promise objects don't survive JSON serialization
timestamp: new Date().toISOString()
});
});
// Optionally, catch console.errors (useful for captured library errors)
const originalConsoleError = console.error;
console.error = (...args) => {
this.capture({
type: 'console_error',
message: args.join(' '),
timestamp: new Date().toISOString(),
stack: new Error().stack // Get a stack trace for the console call
});
// Call the original to still see it in dev tools
originalConsoleError.apply(console, args);
};
}
capture(errorInfo) {
// Add contextual information
const enrichedError = {
...errorInfo,
url: window.location.href,
userAgent: navigator.userAgent,
language: navigator.language,
viewport: `${window.innerWidth}x${window.innerHeight}`,
// Add performance context at the moment of error
memory: performance.memory ? {
usedMB: Math.round(performance.memory.usedJSHeapSize / (1024 * 1024)),
totalMB: Math.round(performance.memory.totalJSHeapSize / (1024 * 1024))
} : null,
// Get the page's load performance
pageLoadTiming: this.getPageLoadTiming()
};
this.errors.push(enrichedError);
this.sendToServer(enrichedError);
}
getPageLoadTiming() {
const navEntry = performance.getEntriesByType('navigation')[0];
if (navEntry) {
return {
dnsTime: navEntry.domainLookupEnd - navEntry.domainLookupStart,
tcpTime: navEntry.connectEnd - navEntry.connectStart,
requestTime: navEntry.responseEnd - navEntry.requestStart,
domReadyTime: navEntry.domContentLoadedEventEnd - navEntry.startTime,
fullLoadTime: navEntry.loadEventEnd - navEntry.startTime
};
}
return null;
}
sendToServer(errorData) {
// Use fetch with keepalive or sendBeacon to ensure it's sent even during page unload
const url = '/api/client-errors';
const data = JSON.stringify(errorData);
if (navigator.sendBeacon) {
navigator.sendBeacon(url, data);
} else {
// Fallback for older browsers
fetch(url, {
method: 'POST',
body: data,
headers: { 'Content-Type': 'application/json' },
keepalive: true // Important for error reporting on page exit
});
}
}
}
const errorCatcher = new ErrorCatcher();
With this in place, I no longer have to ask users, "What did you do before it broke?" The error report tells me their browser, the page, the sequence of events, and the state of the page's performance. This turns vague bug reports into specific, actionable tickets.
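One refinement worth adding before shipping this: a broken render loop can throw the same error hundreds of times per second and flood the reporting endpoint. A small deduplication layer caps the noise; the fingerprint fields and the per-error limit here are my own choices.

```javascript
// Collapse repeated identical errors into a capped number of reports.
function errorFingerprint(info) {
  return [info.type, info.message, info.file, info.line].join('|');
}

class DedupedReporter {
  constructor(maxPerFingerprint = 10) {
    this.counts = new Map();
    this.maxPerFingerprint = maxPerFingerprint;
  }
  // Returns true while this error is still under its reporting cap
  shouldSend(info) {
    const key = errorFingerprint(info);
    const count = (this.counts.get(key) || 0) + 1;
    this.counts.set(key, count);
    return count <= this.maxPerFingerprint;
  }
}
```

Wiring `shouldSend` in front of `sendToServer` keeps the first N occurrences of each distinct error and silently drops the rest, while the counts remain available locally.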
6. Watching Resource Load Performance
A page is made up of many files: HTML, JavaScript, CSS, images, fonts. If one of these is very large or takes a long time to download, it holds up the entire page. The Resource Timing API lets us see the detailed timeline of every single resource loaded by the page.
We can use a PerformanceObserver to capture this data as well.
function monitorResourceLoads() {
const resourceObserver = new PerformanceObserver((list) => {
for (const entry of list.getEntries()) {
// Only care about resources that took a significant time
if (entry.duration > 1000) { // 1 second threshold
console.log(`Slow resource: ${entry.name}`, {
type: entry.initiatorType, // 'script', 'link', 'img', etc.
size: entry.transferSize, // bytes transferred (0 for cross-origin resources without a Timing-Allow-Origin header)
duration: entry.duration,
dnsTime: entry.domainLookupEnd - entry.domainLookupStart,
tcpTime: entry.connectEnd - entry.connectStart,
requestTime: entry.responseEnd - entry.requestStart
});
// Flag particularly large JavaScript or CSS files
if ((entry.initiatorType === 'script' || entry.initiatorType === 'link') && entry.transferSize > 500000) {
reportSlowAsset(entry.name, entry.transferSize);
}
}
}
});
// Start observing resource entries
resourceObserver.observe({ entryTypes: ['resource'] });
}
// Call this early in page life
monitorResourceLoads();
This technique helped me find a problem where a marketing analytics script from a third party was taking over 4 seconds to load on mobile networks. Because I could see it in the resource timing data, I had the evidence to push back and request a lighter-weight alternative or load it asynchronously.
7. Measuring Memory Usage Over Time
Memory leaks in single-page applications are a common source of gradual performance decline. The user starts with a snappy app, but after using it for 10 minutes, it becomes sluggish and eventually crashes. JavaScript's garbage collector manages memory, but if you accidentally keep references to DOM elements or large objects, that memory can never be freed.
The performance.memory API (available in Chrome) gives insight into the memory heap. While you can't get a full memory profile from this, you can watch for trends.
class MemoryWatchdog {
constructor(sampleInterval = 30000) { // Sample every 30 seconds
this.samples = [];
this.intervalId = null;
this.start(sampleInterval);
}
start(interval) {
if (!performance.memory) {
console.warn('performance.memory API not available.');
return;
}
this.intervalId = setInterval(() => {
this.sample();
}, interval);
}
sample() {
const mem = performance.memory;
const sample = {
timestamp: Date.now(),
usedMB: Math.round(mem.usedJSHeapSize / (1024 * 1024)),
totalMB: Math.round(mem.totalJSHeapSize / (1024 * 1024))
};
this.samples.push(sample);
console.log(`Memory: ${sample.usedMB}MB / ${sample.totalMB}MB used`);
// Check for a steady upward trend, indicating a possible leak
if (this.samples.length > 10) {
const recent = this.samples.slice(-10);
const trend = this.calculateTrend(recent.map(s => s.usedMB));
if (trend > 5) { // Increasing by more than 5 MB per sample on average
console.warn(`Potential memory leak detected. Trend: +${trend.toFixed(1)}MB per sample.`);
this.reportLeakWarning(recent);
}
}
// Keep only the last 100 samples
if (this.samples.length > 100) {
this.samples = this.samples.slice(-100);
}
}
calculateTrend(values) {
// Simple linear trend calculation
const n = values.length;
const indices = Array.from({ length: n }, (_, i) => i);
const sumX = indices.reduce((a, b) => a + b, 0);
const sumY = values.reduce((a, b) => a + b, 0);
const sumXY = indices.reduce((sum, x, i) => sum + x * values[i], 0);
const sumX2 = indices.reduce((sum, x) => sum + x * x, 0);
const slope = (n * sumXY - sumX * sumY) / (n * sumX2 - sumX * sumX);
return slope;
}
reportLeakWarning(sampleData) {
const data = {
type: 'memory_leak_warning',
samples: sampleData,
url: window.location.href
};
navigator.sendBeacon('/api/memory-alert', JSON.stringify(data));
}
stop() {
if (this.intervalId) {
clearInterval(this.intervalId);
}
}
}
// Be cautious with this in production. Sampling itself has a cost.
// It's often better used during development or for specific user sessions.
// const memWatcher = new MemoryWatchdog();
This watchdog won't tell you what is leaking, but it's a strong signal that you need to open your browser's DevTools Memory profiler and investigate further.
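The kind of steady growth the watchdog flags usually traces back to something like an unbounded cache or listener list that only ever grows. As an illustration (the class name is mine), the fix is often just an eviction policy:

```javascript
// An unbounded Map cache grows forever; capping it with simple
// insertion-order eviction keeps memory flat over a long session.
class BoundedCache {
  constructor(maxEntries = 100) {
    this.maxEntries = maxEntries;
    this.store = new Map();
  }
  set(key, value) {
    if (!this.store.has(key) && this.store.size >= this.maxEntries) {
      // Maps iterate in insertion order, so the first key is the oldest
      const oldestKey = this.store.keys().next().value;
      this.store.delete(oldestKey);
    }
    this.store.set(key, value);
  }
  get(key) {
    return this.store.get(key);
  }
}
```

Insertion-order eviction is the crudest policy; swapping in LRU logic is straightforward, but even this cap turns a slow leak into a flat line on the memory chart.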
8. Putting It All Together: A Coherent Monitoring Strategy
Individually, these techniques are useful. Together, they form a powerful observability system. The key is to connect the data points. When an error occurs, can I see the trace ID from the failed network request? When a user reports slowness, can I check the Core Web Vitals and Long Tasks for their session?
Here’s a conceptual "orchestrator" class that ties a few of these pieces together for a cohesive user session.
class SessionMonitor {
constructor(sessionId) {
this.sessionId = sessionId;
this.vitals = new VitalSignsMonitor();
this.errorCatcher = new ErrorCatcher();
this.tracker = new CustomTimingTracker();
this.pageChanges = 0;
this.captureInitialPerf();
this.setupVisibilityListeners();
}
captureInitialPerf() {
// Capture the initial page load performance once everything is ready
window.addEventListener('load', () => {
setTimeout(() => { // Wait a tick for all late actions to settle
const navTiming = performance.getEntriesByType('navigation')[0];
if (navTiming) {
this.tracker.reportTiming('full_page_load', navTiming.loadEventEnd - navTiming.startTime, { isInitial: true });
}
}, 0);
});
}
setupVisibilityListeners() {
// Track how users interact with the page (tab switches, minimizes)
document.addEventListener('visibilitychange', () => {
this.logEvent('visibility_change', { state: document.visibilityState });
});
}
logEvent(eventName, details) {
const event = {
sessionId: this.sessionId,
event: eventName,
timestamp: Date.now(),
details
};
// Send event stream to backend
navigator.sendBeacon('/api/user-events', JSON.stringify(event));
}
// A method to call for starting a monitored user journey
startJourney(journeyName, startDetails) {
const journeyId = `${journeyName}_${Date.now()}`;
this.logEvent('journey_started', { journeyName, journeyId, ...startDetails });
this.tracker.start(journeyId);
return {
end: (endDetails) => {
this.tracker.end(journeyId, endDetails);
this.logEvent('journey_ended', { journeyName, journeyId, ...endDetails });
},
step: (stepName) => {
const stepId = `${journeyId}_step_${stepName}`;
this.tracker.start(stepId);
return {
end: () => this.tracker.end(stepId)
};
}
};
}
}
// On application startup
const sessionId = crypto.randomUUID(); // Persist in sessionStorage so the ID survives reloads
const monitor = new SessionMonitor(sessionId);
// Example: Monitoring a video upload flow
const uploadJourney = monitor.startJourney('video_upload', { fileSize: 1250000 });
const encodeStep = uploadJourney.step('encode_server_call');
// Simulate the encode API call
fetch('/api/encode', { method: 'POST', body: videoData })
.then(() => {
encodeStep.end();
uploadJourney.end({ status: 'success' });
})
.catch((err) => {
encodeStep.end();
uploadJourney.end({ status: 'error', message: err.message });
});
With this structure, I can reconstruct a user's session. I can see they loaded the page (initial perf data), clicked on "Upload" (custom timing started), made a server call (distributed trace), and then either succeeded or hit an error (captured by the error handler, linked by the session and journey IDs).
The Goal is Clarity, Not Just Data
The biggest mistake I made early on was collecting too much data without a plan. I ended up with oceans of numbers and no insight. The goal isn't to log everything. The goal is to ask specific questions and instrument your code to answer them.
- Is the user's first experience fast? → Measure Core Web Vitals.
- Does the app feel responsive during use? → Monitor Long Tasks and custom interaction timings.
- When a process fails, why did it fail? → Implement error catching with context and distributed tracing.
- Is performance getting worse over time? → Track key metrics over sessions and set up alerts for regressions.
Start small. Add basic error catching and Core Web Vitals monitoring to your next project. Then, as you encounter specific performance questions, add the custom metrics or traces needed to answer them. Over time, you will build a comprehensive monitoring system that gives you confidence. You'll know how your application performs in the wild, and you'll be able to fix problems before most users ever notice them.