Apogee Watcher

Posted on • Originally published at apogeewatcher.com

Product Spotlight: How Apogee Watcher Discovers Pages Automatically

You cannot monitor what you have not listed. For a single marketing site, that list might live in a spreadsheet. For an agency with dozens of properties, each with new landing pages, campaign paths, and refactors, the list rots within weeks. Someone adds /pricing/v2 and nobody updates the monitor. A client launches a seasonal URL and your Core Web Vitals monitoring checklist still points at last quarter’s sitemap export.

This post is a product spotlight on how Apogee Watcher discovers pages automatically so your PageSpeed coverage stays aligned with what is actually on the site—not with what someone remembered to paste into a config file.

The problem: static URL lists do not scale

Manual URL maintenance fails in predictable ways:

  • Onboarding friction — Every new client site means transcribing URLs from a crawl export or guessing priorities.
  • Drift — Product and marketing teams ship pages continuously; monitoring configs rarely keep pace.
  • Hidden risk — High-traffic templates (PLPs, hubs, key funnels) often live several clicks from the homepage. If they never make it into your test set, you will not see regressions until search, ads, or support tell you.

Automated PageSpeed monitoring only helps if the set of URLs under test reflects reality. Discovery is the bridge between “we run Lighthouse on a schedule” and “we run Lighthouse on the pages that matter.”

What “automatic page discovery” should mean

A serious discovery flow does more than grab the homepage:

  1. Prefer structured sources — Read the site’s own inventory (sitemap.xml, sitemap indexes, nested sitemaps) before guessing from links.
  2. Fall back gracefully — When sitemaps are missing, incomplete, or stale, follow internal links within the same domain with sensible depth and rate limits.
  3. Stay within bounds — Cap how many URLs you ingest per run so large sites do not overwhelm quotas or your monitoring plan.
  4. Make ownership obvious — Show which URLs were found automatically versus added manually, so teams can trust and prune the list.
  5. Fit agency workflows — Discovery should support portfolios: many sites, each with its own rules, without a separate crawl script per client.

That is the bar we designed Apogee Watcher against.

How Apogee Watcher discovers pages

Apogee Watcher uses a sitemap-first, crawl-second pipeline for external sites (no access to your clients’ codebases required).

Sitemap discovery (primary)

For most production sites, the canonical list of important URLs already exists in XML:

  • Standard locations such as /sitemap.xml and sitemap index files.
  • Recursive handling of sitemap indexes so nested sitemaps are followed.
  • Support for compressed sitemaps where applicable.

When a sitemap is healthy, you get broad coverage quickly—often including sections marketing forgot to mention in the handover doc.
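The sitemap handling described above can be sketched roughly as follows. This is an illustrative helper, not Apogee Watcher’s actual implementation: it parses a `<urlset>`, recurses into `<sitemapindex>` entries, and transparently decompresses gzipped sitemaps. The `fetch` callable (url → bytes) is a stand-in for real HTTP.

```python
# Hypothetical sketch of sitemap-first discovery: handles plain sitemaps,
# nested sitemap indexes, and gzip-compressed files.
import gzip
import xml.etree.ElementTree as ET

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def parse_sitemap(xml_bytes, fetch, seen=None):
    """Return page URLs from a sitemap document.

    `fetch` is a callable url -> bytes, injected so the recursion over
    nested sitemaps can be exercised without network access.
    """
    if seen is None:
        seen = set()
    if xml_bytes[:2] == b"\x1f\x8b":      # gzip magic bytes: compressed sitemap
        xml_bytes = gzip.decompress(xml_bytes)
    root = ET.fromstring(xml_bytes)
    urls = []
    if root.tag == f"{NS}sitemapindex":   # sitemap index: follow nested sitemaps
        for loc in root.iter(f"{NS}loc"):
            child = loc.text.strip()
            if child not in seen:         # guard against circular references
                seen.add(child)
                urls += parse_sitemap(fetch(child), fetch, seen)
    else:                                 # plain <urlset>: collect page URLs
        urls = [loc.text.strip() for loc in root.iter(f"{NS}loc")]
    return urls
```

In a real pipeline the same `fetch` would also probe standard locations such as `/sitemap.xml` before falling back to crawling.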

HTML crawling (fallback)

When sitemaps are absent, blocked, or incomplete, Watcher can fall back to crawling the site’s own HTML:

  • Follow links that stay on the same domain so discovery does not wander onto third parties.
  • Respect depth and delay settings so crawls remain predictable.

Crawling is a safety net, not a replacement for a good sitemap—but in the real world, you need both.
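A fallback crawl with those two properties (same-host links only, bounded depth) can be sketched like this. Again, this is a simplified illustration under assumed behaviour, not the product’s code; politeness delays and robots.txt checks are omitted, and `fetch_html` (url → HTML string) stands in for real HTTP.

```python
# Hypothetical breadth-first crawl that stays on the start URL's host and
# respects depth and URL-count caps.
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class LinkParser(HTMLParser):
    """Collect href values from <a> tags."""
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)

def crawl(start_url, fetch_html, max_depth=2, max_urls=100):
    host = urlparse(start_url).netloc
    seen = {start_url}
    frontier = [(start_url, 0)]           # (url, depth) queue
    while frontier and len(seen) < max_urls:
        url, depth = frontier.pop(0)
        if depth >= max_depth:            # depth cap: do not expand further
            continue
        parser = LinkParser()
        parser.feed(fetch_html(url))
        for href in parser.hrefs:
            absolute = urljoin(url, href)
            # same-host filter: never wander onto third-party domains
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                frontier.append((absolute, depth + 1))
    return sorted(seen)
```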

Controlled scope

Discovery runs honour configuration for maximum URLs and maximum crawl depth, aligned with how agencies actually operate: you want coverage, not an accidental import of ten thousand tracking-parameter variants.
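As a toy illustration of that kind of scoping (the parameter names here are invented for the example, not Watcher’s actual configuration schema), a post-discovery filter might cap URL counts and drop tracking-parameter variants:

```python
# Hypothetical scope filter: enforce a URL cap and skip URLs that only
# differ by common tracking parameters.
from urllib.parse import urlparse, parse_qs

TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "gclid", "fbclid"}

def within_scope(urls, max_urls=500):
    kept = []
    for url in urls:
        params = parse_qs(urlparse(url).query)
        if TRACKING_PARAMS & params.keys():   # tracking variant: skip it
            continue
        kept.append(url)
        if len(kept) >= max_urls:             # honour the per-run cap
            break
    return kept
```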

Syncing with your monitoring inventory

Discovered URLs are synced into your page list and marked as auto-discovered, so you can filter, review, and deactivate what should not be tested—without losing the audit trail of what the system found.
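Conceptually, that sync is a merge that labels provenance. The shape below is a sketch under assumed field names (`source`, `active` are illustrative, not the product’s data model): manual entries are preserved, and anything new arrives tagged as auto-discovered.

```python
# Hypothetical inventory sync: merge discovered URLs into the page list,
# labelling their origin so manual and automatic entries stay distinguishable.
def sync_inventory(existing, discovered):
    inventory = {page["url"]: page for page in existing}
    for url in discovered:
        if url not in inventory:              # never overwrite manual entries
            inventory[url] = {"url": url, "source": "auto-discovered", "active": True}
    return list(inventory.values())
```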

Together with scheduled PageSpeed Insights-based tests, performance budgets, and email alerts when scores breach thresholds, discovery closes the loop from “what exists on the site” to “what we measure continuously.”

Why this matters beyond monitoring

Reliable URL inventory is also the substrate for other workflows. Once pages are known and tested, the same performance signals can feed prioritisation—what to fix first, what to show a client, and how prospecting fits alongside monitoring. Our introduction to that pipeline is in From monitoring to pipeline: why PageSpeed data works for agency prospecting; this spotlight focuses on the discovery piece that makes the rest possible.

If you are new to how Core Web Vitals fit into the story, start with What are Core Web Vitals? A practical guide for 2026—then come back here for how Watcher keeps the URL list honest.

Running discovery in Apogee Watcher

From the site view in the admin, use the Discover Pages action. A modal lets you choose the discovery method (sitemap or HTML crawl), set caps for URLs and depth, decide whether to respect robots.txt, supply an optional custom sitemap URL, and add include/exclude patterns where you need tighter control. After the run, you see counts for discovered versus skipped URLs and any errors—then new pages appear in your inventory marked as auto-discovered, ready for scheduling alongside the pages you added by hand.

What you can achieve

You can move from brittle spreadsheets to a living inventory of URLs that updates when you run discovery—so your lab scores and trends track real site structure, not last month’s export. That is how agencies keep Core Web Vitals coverage credible without hiring someone to babysit URL lists.


Join the early-access waitlist if you want multi-site PageSpeed monitoring with automated discovery, budgets, and alerts built for agency portfolios.

FAQ

Does Apogee Watcher replace my SEO crawler?

No. Discovery is optimised to build and refresh a monitoring URL set, not to replace full SEO audits. Use your SEO tools for indexation and content strategy; use Watcher to keep performance tests aligned with the URLs you care about.

What if my client’s sitemap is wrong?

You can add or remove pages manually, adjust discovery settings, and re-run discovery when the site changes. Auto-discovered pages are labelled so you can audit the list.

Will discovery hit my PageSpeed API quota?

Discovery itself ingests URLs; scheduled tests consume PSI quota. Caps on URLs and crawl behaviour help keep both discovery and testing within plan limits—see API usage in the product for your tier.

Is crawling allowed on every site?

You should follow each client’s robots.txt policy and contractual terms. Watcher exposes options (including respect for robots rules in configuration) so teams can align discovery with site owner expectations.
