How to build a visual uptime monitor with Go and headless Chrome

#go #tutorial #webdev #devops

Most uptime monitors work by making an HTTP request and checking the response code. It's fast, cheap, and catches about half the things that actually go wrong in production.

The other half - JavaScript crashes, a CDN serving stale content, React hydration failures, missing elements - only shows up when you look at what the page actually renders, not what the server returns.

This is a walkthrough of how I built the visual monitoring piece of GrabDiff: capture screenshots with headless Chrome, diff them against a baseline, and send an alert when something looks wrong. I'll use Go and chromedp, which is what GrabDiff runs on under the hood. If you'd rather just use the thing than build it, GrabDiff has a free plan with three monitors and no card required.

The architecture

The core loop is simple:

1. On a schedule, capture a screenshot of a URL using headless Chrome
2. Compare it pixel-by-pixel against a stored baseline image
3. If the diff percentage exceeds a threshold, send an alert with the diff image attached
4. Otherwise, store the new screenshot (optionally updating the baseline over time)

The interesting engineering is in steps 2 and 3 - getting the diff right and making alerts actionable.
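The loop above can be sketched with the capture, diff, and alert steps injected as functions. This is a minimal sketch under assumptions, not GrabDiff's actual code: the Checker type, its field names, and the Check signature are all hypothetical, and the threshold is expressed as a percentage of changed pixels.

```go
package main

// Checker wires the capture, diff, and alert steps together.
// All names here are illustrative assumptions.
type Checker struct {
	Capture   func(url string) ([]byte, error)
	Diff      func(baseline, current []byte) (percent float64, diffImage []byte, err error)
	Alert     func(url string, percent float64, diffImage []byte) error
	Threshold float64 // percent of changed pixels that triggers an alert
}

// Check runs one iteration of the loop and reports whether an alert was sent.
// Storing the new screenshot (step 4) is left to the caller.
func (c Checker) Check(url string, baseline []byte) (bool, error) {
	current, err := c.Capture(url)
	if err != nil {
		return false, err
	}
	percent, diffImage, err := c.Diff(baseline, current)
	if err != nil {
		return false, err
	}
	if percent <= c.Threshold {
		return false, nil
	}
	if err := c.Alert(url, percent, diffImage); err != nil {
		return false, err
	}
	return true, nil
}
```

Passing the steps in as functions keeps the loop testable without a browser: a test can supply a fake Capture and a Diff that returns a fixed percentage.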

Capturing screenshots with chromedp

chromedp is a Go library that controls Chrome via the DevTools Protocol. It handles the browser lifecycle, navigation, and screenshot capture.

package screenshot

import (
    "context"
    "time"

    "github.com/chromedp/chromedp"
)

func Capture(url string) ([]byte, error) {
    opts := append(chromedp.DefaultExecAllocatorOptions[:],
        chromedp.Flag("headless", true),
        chromedp.Flag("disable-gpu", true),
        chromedp.Flag("no-sandbox", true),
        chromedp.WindowSize(1280, 800),
    )

    allocCtx, cancel := chromedp.NewExecAllocator(context.Background(), opts...)
    defer cancel()

    ctx, cancel := chromedp.NewContext(allocCtx)
    defer cancel()

    ctx, cancel = context.WithTimeout(ctx, 30*time.Second)
    defer cancel()

    var buf []byte
    err := chromedp.Run(ctx,
        chromedp.Navigate(url),
        chromedp.Sleep(2*time.Second), // wait for JS to settle
        chromedp.FullScreenshot(&buf, 90),
    )
    if err != nil {
        return nil, err
    }
    return buf, nil
}

A few things worth noting:

chromedp.Sleep(2*time.Second) is a blunt instrument but effective. For most pages, 2 seconds is enough for the JavaScript to execute and the page to reach a stable state. For pages with complex async data fetching you might need more, or you can use chromedp.WaitVisible to wait for a specific element.

chromedp.FullScreenshot captures the entire page height, not just the viewport. This is usually what you want for monitoring - you care about the whole page, not just what happens to be visible above the fold.

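The 90 in FullScreenshot is JPEG quality. Passing 100 makes chromedp return a PNG instead (larger files, lossless). chromedp.CaptureScreenshot also returns a PNG, but it captures only the current viewport rather than the full page.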

Pixel diffing

Once you have a screenshot, you need to compare it against the baseline. The core operation is straightforward: decode both images, iterate over pixels, count how many differ by more than some per-channel threshold.

package screenshot

import (
    "bytes"
    "image"
    "image/color"
    _ "image/jpeg"
    "image/png"
    "math"
)

type DiffResult struct {
    DiffPercent float64
    DiffImage   []byte // PNG with differences highlighted
}

func Diff(baseline, current []byte) (*DiffResult, error) {
    baseImg, _, err := image.Decode(bytes.NewReader(baseline))
    if err != nil {
        return nil, err
    }
    currImg, _, err := image.Decode(bytes.NewReader(current))
    if err != nil {
        return nil, err
    }

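    baseBounds, currBounds := baseImg.Bounds(), currImg.Bounds()
    // Full-page screenshots change height when content is added or removed,
    // so diff over the union of both bounds and count any pixel that only
    // one image covers as changed.
    bounds := baseBounds.Union(currBounds)
    diffImg := image.NewRGBA(bounds)

    var diffPixels int
    totalPixels := bounds.Dx() * bounds.Dy()

    for y := bounds.Min.Y; y < bounds.Max.Y; y++ {
        for x := bounds.Min.X; x < bounds.Max.X; x++ {
            p := image.Pt(x, y)
            inBoth := p.In(baseBounds) && p.In(currBounds)

            br, bg, bb, _ := baseImg.At(x, y).RGBA()
            cr, cg, cb, _ := currImg.At(x, y).RGBA()

            // RGBA() returns values in [0, 65535]
            dr := math.Abs(float64(br) - float64(cr))
            dg := math.Abs(float64(bg) - float64(cg))
            db := math.Abs(float64(bb) - float64(cb))

            // threshold: 10% channel difference (6553 out of 65535)
            if !inBoth || dr > 6553 || dg > 6553 || db > 6553 {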
                diffPixels++
                // highlight in red
                diffImg.Set(x, y, color.RGBA{R: 255, G: 0, B: 0, A: 255})
            } else {
                // keep original, slightly dimmed for context
                r, g, b, a := currImg.At(x, y).RGBA()
                diffImg.Set(x, y, color.RGBA{
                    R: uint8(r>>8) / 2,
                    G: uint8(g>>8) / 2,
                    B: uint8(b>>8) / 2,
                    A: uint8(a >> 8),
                })
            }
        }
    }

    diffPercent := float64(diffPixels) / float64(totalPixels) * 100

    var buf bytes.Buffer
    if err := png.Encode(&buf, diffImg); err != nil {
        return nil, err
    }

    return &DiffResult{
        DiffPercent: diffPercent,
        DiffImage:   buf.Bytes(),
    }, nil
}

The diff image produced here shows changed pixels in red against a dimmed version of the current screenshot. This gives you at a glance where the change is - useful when you're trying to tell whether it's a minor layout shift or something more serious.

On threshold tuning: 1% is a good starting point for DiffPercent. Anything above that is almost certainly a real change. Below 0.1% is usually antialiasing noise. The right number depends on how dynamic your pages are.

Storing baselines

You need to store the baseline image somewhere. For a simple setup, an object store (S3, Backblaze B2, R2) works well - store the baseline under a key like {monitor_id}/baseline.jpg and update it when the user explicitly marks a new baseline.

// On first check, or when user resets the baseline
func (s *Store) SetBaseline(ctx context.Context, monitorID string, img []byte) error {
    key := fmt.Sprintf("%s/baseline.jpg", monitorID)
    return s.upload(ctx, key, img, "image/jpeg")
}

// On each check
func (s *Store) GetBaseline(ctx context.Context, monitorID string) ([]byte, error) {
    key := fmt.Sprintf("%s/baseline.jpg", monitorID)
    return s.download(ctx, key)
}

One design decision worth thinking about: should the baseline update automatically? There are arguments either way. If you update it automatically after every "clean" check, you adapt to intentional page changes without manual intervention, but every change that slips under your threshold accumulates silently, and you'll eventually be diffing against something that looks nothing like your original known-good state. If you require explicit resets, the baseline stays pinned to a state you approved, at the cost of a manual reset after every intentional change.

GrabDiff requires explicit baseline resets. The reasoning: if you're updating the baseline automatically, you can drift into a state where "clean" means "whatever the page looked like yesterday" rather than "the page as I intended it." Explicit resets keep you honest.

Alerting

When DiffPercent exceeds your threshold, you want to notify someone fast. The two most useful channels are email (with the diff image attached) and webhooks.

type Alert struct {
    MonitorURL  string
    DiffPercent float64
    DiffImage   []byte
    CheckedAt   time.Time
}

func (s *EmailSender) SendAlert(ctx context.Context, to string, a Alert) error {
    body := fmt.Sprintf(
        "Visual change detected on %s\n\nDiff: %.2f%% of pixels changed\nChecked at: %s\n\nSee attached diff image.",
        a.MonitorURL, a.DiffPercent, a.CheckedAt.Format(time.RFC1123),
    )

    msg := gomail.NewMessage()
    msg.SetHeader("From", s.from)
    msg.SetHeader("To", to)
    msg.SetHeader("Subject", fmt.Sprintf("[GrabDiff] Change detected: %s", a.MonitorURL))
    msg.SetBody("text/plain", body)
    msg.AttachReader("diff.png", bytes.NewReader(a.DiffImage))

    return s.dialer.DialAndSend(msg)
}

The diff image as an attachment is the key thing. An alert that just says "something changed" is nearly useless - you have to go look at the site yourself to know if it matters. An alert with a diff image attached tells you immediately whether this is "someone changed a button color" or "the entire main content area is gone."

Scheduling

For the scheduling loop, a simple approach is a ticker per monitor:

func (w *Worker) Run(ctx context.Context, monitor Monitor) {
    ticker := time.NewTicker(monitor.Interval)
    defer ticker.Stop()

    for {
        select {
        case <-ctx.Done():
            return
        case <-ticker.C:
            if err := w.check(ctx, monitor); err != nil {
                slog.Error("check failed", "monitor", monitor.ID, "err", err)
            }
        }
    }
}

For anything beyond a handful of monitors, you'll want a proper job queue (River, Asynq, or even a simple Postgres-backed queue) rather than in-process goroutines. In-process schedulers don't survive restarts gracefully and make horizontal scaling harder.

The tradeoffs you'll hit in production

False positives from dynamic content. Timestamps, "last updated" labels, live counters, personalized greetings - these change on every screenshot and will trigger alerts constantly. You either need to mask those regions before diffing, or accept a higher threshold that makes you insensitive to small changes everywhere.

Headless Chrome resource usage. A single Chrome instance capturing a screenshot uses roughly 200-400MB of RAM and takes 3-8 seconds depending on page complexity. If you're running hundreds of monitors at frequent intervals, you need a pool of browser instances and careful scheduling to avoid spiking resource usage.

Authentication. Monitoring pages behind a login requires scripting the auth flow:

chromedp.Run(ctx,
    chromedp.Navigate("https://app.example.com/login"),
    chromedp.SendKeys(`input[name="email"]`, email),
    chromedp.SendKeys(`input[name="password"]`, password),
    chromedp.Click(`button[type="submit"]`),
    chromedp.WaitVisible(`#dashboard`),
    chromedp.Navigate(targetURL),
    chromedp.FullScreenshot(&buf, 90),
)

This works, but it's fragile - login flows change, CAPTCHAs appear, session handling has edge cases. Plan for maintenance.

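SSRF. If you're accepting URLs from users, you need to validate them before passing them to Chrome. Users will point your monitor at http://169.254.169.254/latest/meta-data/ or internal network addresses. Resolve the hostname and check the IP against the RFC 1918, loopback, and link-local ranges before making any request. Chrome does its own DNS resolution and follows redirects, so treat this check as a first line of defense and also restrict the browser's network egress where you can.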
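The IP check itself is short with net/netip (Go 1.18 or later). This is a minimal sketch: IsBlockedAddr is a hypothetical helper that takes an already resolved IP as a string, and the set of ranges it rejects is a starting point, not a complete policy.

```go
package main

import "net/netip"

// IsBlockedAddr reports whether ip, a literal IP address, points somewhere
// a user-supplied URL should not be allowed to reach.
func IsBlockedAddr(ip string) bool {
	addr, err := netip.ParseAddr(ip)
	if err != nil {
		return true // not a valid IP literal: refuse rather than guess
	}
	addr = addr.Unmap() // treat ::ffff:10.0.0.1 the same as 10.0.0.1

	return addr.IsPrivate() || // RFC 1918 and IPv6 unique local addresses
		addr.IsLoopback() ||
		addr.IsLinkLocalUnicast() || // includes 169.254.0.0/16
		addr.IsLinkLocalMulticast() ||
		addr.IsMulticast() ||
		addr.IsUnspecified()
}
```

Run it on every address the hostname resolves to, not just the first one.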

What this gets you

A working version of the above - capture, diff, alert - is maybe 500 lines of Go. It'll catch blank pages, missing elements, major layout regressions, and CDN serving stale content. It won't catch every failure, but it catches the ones that HTTP monitors miss entirely.

If you want to run it yourself, the full approach is essentially what I described. If you'd rather not manage the Chrome instances and the storage and the scheduling, I built GrabDiff to handle all of that - it does the screenshot diffing, sends you the diff image in the alert, and handles SSL/domain/cron monitoring alongside the visual checks. Free plan, three monitors, no credit card.

The point is that "HTTP 200" and "the page works" are not the same thing, and the gap between them is where the interesting production failures live. Visual monitoring is how you close that gap.