As developers, we know the pain of a slow browser extension. The moment a tool designed to help you starts hogging resources, it becomes a liability. That was the challenge we faced with Gaze Guard, our image classification extension. The initial, naive implementation — a simple "scan everything all the time" approach — was functional but created noticeable performance bottlenecks on image-heavy sites.
We decided to go back to the drawing board and completely re-engineer the core scanning logic. The goal was simple: deliver powerful, real-time image classification with near-zero impact on the user's browsing experience. This article breaks down the eight key architectural changes that transformed Gaze Guard from a resource hog into a performance powerhouse.
1. Viewport-Based Scanning: The IntersectionObserver Revolution
The most significant change was moving away from classifying every image on the page immediately, an approach that caused massive, synchronous classification spikes on page load.
The Fix: We adopted a viewport-based scanning strategy using the IntersectionObserver API.
- Lazy Discovery: Images are only processed when they come near the viewport, using a generous `rootMargin`.
- Minimal Workload: As soon as an image is handled (blurred or cleared), it is immediately unobserved. This keeps the observer's workload small and prevents the system from being overwhelmed by hundreds or thousands of off-screen elements.
This change alone eliminated the "big synchronous classification spikes" that plagued the previous version, ensuring a smooth initial page load.
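Here is a minimal sketch of the pattern. The `enqueueForClassification` helper and the 200px margin are illustrative, not Gaze Guard's actual internals:

```javascript
// Observe images and classify them only as they approach the viewport.
const imageObserver = new IntersectionObserver(
  (entries) => {
    for (const entry of entries) {
      if (!entry.isIntersecting) continue;
      // Unobserve immediately so the observer's workload stays small.
      imageObserver.unobserve(entry.target);
      enqueueForClassification(entry.target); // hypothetical queue helper
    }
  },
  { rootMargin: '200px' } // a generous margin: start work just before visibility
);

document.querySelectorAll('img').forEach((img) => imageObserver.observe(img));
```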
2. Event-Driven DOM Updates: Replacing Polling with MutationObserver
The old architecture relied on "aggressive interval scanning" — a fixed timer that constantly woke up to check for new images. This is a classic anti-pattern for performance.
The Fix: We replaced polling with an event-driven approach using MutationObserver.
- Initial Scan: A single, non-recurring `scanImagesOnce()` handles the initial page content.
- Dynamic Content: For sites with infinite scroll or dynamic feeds, a `MutationObserver` watches for added nodes. It only scans the subtrees of the newly added elements.
- Throttling: To handle bursts of DOM changes, the observer's callback is debounced by 100ms, collapsing multiple mutations into a single, efficient scan.
The result is that work only happens when the page actually changes, eliminating the constant, unnecessary overhead of fixed timers.
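A simplified sketch of the debounced observer, where `scanSubtreeForImages` is a stand-in for the real scanning logic:

```javascript
// Watch for dynamically added content and scan only the new subtrees.
let debounceTimer = null;
const pendingNodes = [];

const domObserver = new MutationObserver((mutations) => {
  for (const mutation of mutations) {
    for (const node of mutation.addedNodes) {
      if (node.nodeType === Node.ELEMENT_NODE) pendingNodes.push(node);
    }
  }
  // Debounce: collapse a burst of mutations into a single scan.
  clearTimeout(debounceTimer);
  debounceTimer = setTimeout(() => {
    const nodes = pendingNodes.splice(0);
    nodes.forEach((node) => scanSubtreeForImages(node)); // hypothetical scanner
  }, 100);
});

domObserver.observe(document.body, { childList: true, subtree: true });
```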
3. Batching and Throttling the ML Pipeline
Even with smart discovery, the machine learning classification process is the heaviest part of the workload. Running a large batch of classifications synchronously will inevitably freeze the UI.
The Fix: We introduced batching and cooperative scheduling for the ML pipeline.
- Batch Processing: Images waiting for classification are put into an `analysisQueue`. The `processQueue()` function handles them in small batches (e.g., `BATCH_SIZE = 5`).
- Yielding to the Browser: Crucially, between batches, the process yields to the browser using a tiny `setTimeout(..., 5)` and `tf.nextFrame()`. This allows layout and paint operations to run, keeping the UI responsive and preventing jank.
- Timeouts: Each classification is wrapped in a timeout (`TIMEOUT_MS = 3000`). This prevents "stuck" images (due to slow network or bad CORS) from blocking the entire queue.
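A sketch of what this cooperative scheduling might look like; `classifyImage` stands in for the actual model call, and the constants mirror the values above:

```javascript
const BATCH_SIZE = 5;
const TIMEOUT_MS = 3000;

const analysisQueue = [];
let processing = false;

// Reject if a single classification hangs (slow network, bad CORS, etc.).
function withTimeout(promise, ms) {
  return Promise.race([
    promise,
    new Promise((_, reject) =>
      setTimeout(() => reject(new Error('classification timed out')), ms)
    ),
  ]);
}

async function processQueue() {
  if (processing) return;
  processing = true;
  while (analysisQueue.length > 0) {
    const batch = analysisQueue.splice(0, BATCH_SIZE);
    await Promise.all(
      batch.map((img) =>
        withTimeout(classifyImage(img), TIMEOUT_MS).catch(() => {
          /* a stuck image must not block the rest of the queue */
        })
      )
    );
    // Yield so the browser can run layout and paint between batches.
    await tf.nextFrame();
    await new Promise((resolve) => setTimeout(resolve, 5));
  }
  processing = false;
}
```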
4. Caching Classification Results to Avoid Rework
Why classify the same image twice? On sites with repeated elements, or during back/forward navigation, re-running the ML model is pure waste.
The Fix: A robust, multi-layered verdict cache.
- In-Memory Cache: `srcVerdicts` provides fast, O(1) lookups for the current session.
- Persistent Cache: `persistentVerdicts` stores results in `chrome.storage.local`, ensuring that if a user navigates away and comes back, or even restarts the browser, known images are instantly recognized.
When an image is spotted, the extension checks the cache first. If a verdict exists, it immediately applies the blur or clears the image, skipping the entire classification pipeline.
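A simplified sketch of the two-layer lookup; the shape of the stored object is an assumption, with only the `chrome.storage.local` usage taken from the description above:

```javascript
// In-memory layer: fast, O(1) lookups for the current session.
const srcVerdicts = new Map();

async function getCachedVerdict(src) {
  if (srcVerdicts.has(src)) return srcVerdicts.get(src);

  // Persistent layer: survives navigation and browser restarts.
  const stored = await chrome.storage.local.get('persistentVerdicts');
  const verdict = stored.persistentVerdicts?.[src];
  if (verdict !== undefined) {
    srcVerdicts.set(src, verdict); // promote to memory for next time
    return verdict;
  }
  return null; // miss: fall through to the full classification pipeline
}
```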
5. Smarter and Cheaper DOM Operations (Micro-Optimizations)
Performance often comes down to avoiding expensive browser operations, particularly those that trigger layout thrashing (reflows).
| Optimization | Old Approach (Expensive) | New Approach (Efficient) | Why it Matters |
|---|---|---|---|
| Image Dimensions | Accessing `element.width`/`height` | Using `naturalWidth`/`naturalHeight` | Accessing `element.width`/`height` can force a synchronous layout/reflow. |
| Background Images | Using `getComputedStyle` | Using only inline `element.style.backgroundImage` | `getComputedStyle` can also trigger reflows, while inline style access is cheap. |
| Irrelevant Content | Scanning all images | Skipping images smaller than 16x16px | Avoids wasting cycles on icons, tracking pixels, and other irrelevant elements. |
| Vector Graphics | Feeding SVGs to the raster classifier | Short-circuiting SVGs as "safe" | Prevents feeding vector graphics into a model designed for raster images, saving classification time. |
| DOM Traversal | `querySelectorAll('*')` deep scan | Targeted `findAllImages()` strategy | The previous deep scan was explicitly marked as "too expensive" and replaced with a targeted search for `<img>` tags and selective iframe handling. |
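Several of these checks can be combined into a single cheap pre-filter. This is a sketch under stated assumptions: the function name and exact checks are illustrative, not Gaze Guard's actual `findAllImages()` internals:

```javascript
// Cheap pre-filter applied before an image is observed or queued.
function shouldSkipImage(img) {
  // naturalWidth/naturalHeight come from the decoded image data,
  // so reading them does not force a synchronous layout.
  if (img.naturalWidth > 0 && img.naturalWidth < 16) return true;
  if (img.naturalHeight > 0 && img.naturalHeight < 16) return true;

  // Short-circuit SVGs as "safe": they are vector graphics and should
  // not be fed to a raster-image classifier.
  const src = img.currentSrc || img.src || '';
  if (src.split('?')[0].toLowerCase().endsWith('.svg')) return true;

  return false;
}
```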
6. More Efficient Model and Backend Usage
The machine learning model itself needs to be managed efficiently.
- Model Caching: The NSFWJS model is loaded once and cached in memory, preventing repeated heavy initializations.
- TensorFlow Backend:
ensureTfReady()is used to enable TensorFlow.js production mode and allow it to select the best available backend (WebGL, WASM, etc.), rather than forcing a slower CPU-based backend. - Clean Cleanup: Dangerous blanket cleanup calls like
tf.disposeVariables()(which could wipe model weights) were removed, ensuring the model stays "warm" and ready for immediate use.
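A sketch of the load-once pattern: the `getModel` wrapper is illustrative, while `tf.enableProdMode()`, `tf.ready()`, and `nsfwjs.load()` are the standard TensorFlow.js/NSFWJS calls:

```javascript
// Assumes tf (TensorFlow.js) and nsfwjs are bundled with the extension.
let modelPromise = null;

function getModel() {
  if (!modelPromise) {
    modelPromise = (async () => {
      tf.enableProdMode(); // skip debug-only sanity checks
      await tf.ready();    // let tf.js pick the best backend (WebGL, WASM, ...)
      return nsfwjs.load(); // loaded once; the instance stays warm in memory
    })();
  }
  return modelPromise; // concurrent callers share the same load
}
```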
7. Background Fetching for Problematic Images
Some images, particularly those with strict CORS policies or special hosting, can fail to load in the content script, leading to repeated, failing attempts.
The Fix: We defer to the background service worker (`background.js`).
For images that fail to load directly, the content script sends a message to the background worker, which fetches the image and returns a data URI. This offloads network handling and retry logic to Chrome's more robust extension architecture, keeping the main content script simpler and less prone to getting stuck.
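A sketch of both halves of that handoff; the message shape and names are illustrative:

```javascript
// --- content script ---
// Ask the background worker to fetch an image that failed to load directly.
async function fetchViaBackground(src) {
  const response = await chrome.runtime.sendMessage({ type: 'FETCH_IMAGE', src });
  return response?.dataUri ?? null;
}

// --- background.js ---
chrome.runtime.onMessage.addListener((message, sender, sendResponse) => {
  if (message.type !== 'FETCH_IMAGE') return;
  (async () => {
    try {
      const res = await fetch(message.src);
      const blob = await res.blob();
      // Build a base64 data URI without FileReader (service-worker safe).
      const bytes = new Uint8Array(await blob.arrayBuffer());
      let binary = '';
      for (const b of bytes) binary += String.fromCharCode(b);
      sendResponse({ dataUri: `data:${blob.type};base64,${btoa(binary)}` });
    } catch {
      sendResponse({ dataUri: null });
    }
  })();
  return true; // keep the message channel open for the async response
});
```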
8. Turning Off Heavy Work on Unsupported Contexts
Finally, we prevent the extension from running in contexts where it can't or shouldn't work, such as PDF viewers.
- Robust Detection: A comprehensive `isPdfEnvironment()` check detects PDF viewers based on URLs, content types, and embedded tags.
- Immediate Stop: When a PDF is detected, `stopAll()` is called immediately. This cancels all observers, clears queues, and removes any existing blur markers, eliminating all overhead in a context where image classification is not useful.
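A simplified sketch of both functions, reusing the observer and queue names from the earlier sketches; the real checks are more comprehensive, and the blur-marker attribute is hypothetical:

```javascript
function isPdfEnvironment() {
  if (location.pathname.toLowerCase().endsWith('.pdf')) return true;
  if (document.contentType === 'application/pdf') return true;
  return Boolean(document.querySelector('embed[type="application/pdf"]'));
}

function stopAll() {
  imageObserver.disconnect(); // stop viewport tracking
  domObserver.disconnect();   // stop watching for DOM changes
  analysisQueue.length = 0;   // drop pending classification work
  // Remove existing blur markers ("data-gg-blurred" is a hypothetical attribute).
  document.querySelectorAll('[data-gg-blurred]').forEach((el) => {
    el.removeAttribute('data-gg-blurred');
  });
}

if (isPdfEnvironment()) stopAll();
```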
Conclusion: Performance as a Feature
The performance improvements in Gaze Guard are not just a nice-to-have; they are a core feature. By applying principles of lazy loading, event-driven architecture, cooperative scheduling, and aggressive caching, we've managed to integrate a heavy machine learning workload into the browser without compromising the user experience.
This journey highlights that in extension development, optimizing for the DOM and the browser's event loop is just as critical as optimizing the core algorithm. We hope this deep dive provides valuable insights for anyone building performance-sensitive tools for the web.