Spencer Johnson's *Who Moved My Cheese?* is about adapting when things change around you. In cloud security compliance, the cheese moves constantly. Microsoft renames a menu. Google moves a toggle to a different admin panel. Your carefully written implementation guide now points to a screen that no longer exists. Your client follows the old steps and hits a dead end.
We built an automated pipeline that crawls vendor documentation weekly, compares it against our internal implementation guides, and flags meaningful drift: renamed UI paths, deprecated features, relocated settings. It runs on Cloudflare Workers, uses Cloudflare Workflows for durability and Claude for semantic analysis, and costs about 50 cents a week at scale. Here's how.
## The Problem: Documentation Drift
We maintain implementation guides for security controls, such as "Enable MFA for Admin Accounts in Microsoft Entra" or "Configure Conditional Access Policies." Each guide has specific, click-by-click instructions referencing particular admin portal locations, feature names, and configuration options.
Cloud vendors update their platforms constantly. Microsoft alone pushes changes to Entra ID (formerly Azure AD) multiple times a month. When they rename a feature or reorganize an admin panel, our guides become incorrect. We call this documentation drift.
Before building this system, drift detection was entirely manual. A consultant would notice a discrepancy during a client engagement, report it, and someone would update the guide. The feedback loop was slow and reactive.
## The Architecture: A Three-Gate Pipeline
We wanted something cheap to run, resistant to false alarms, and automated. The solution is a three-gate pipeline where each gate filters out unnecessary work before reaching the next, more expensive step.
### Gate 1: RSS Check (Free, Milliseconds)
Microsoft Learn exposes an RSS endpoint that reports page modification dates. Before crawling anything, we check whether the vendor has even published changes since our last run. If the RSS timestamps haven't moved, we skip the entire source. This gate eliminates most checks in a typical week.
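The Gate 1 decision reduces to a simple timestamp comparison. A minimal sketch, assuming the feed's entry dates have already been parsed out of the RSS XML (the function name and shape are ours, not from any library):

```typescript
// Gate 1: has the vendor published anything since our last run?
// `entryDates` are the modification dates pulled from the RSS feed.
function hasPublishedChanges(entryDates: Date[], lastRunAt: Date): boolean {
  // If any entry was modified after our last run, the source is worth crawling.
  return entryDates.some((d) => d.getTime() > lastRunAt.getTime());
}
```

If this returns `false`, the source is skipped for the week at zero cost.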
### Gate 2: Content Hash (Cheap, Seconds)
When RSS indicates something might have changed, we crawl the vendor page and compute a SHA-256 hash of the markdown content. If the hash matches what we stored last time, the page content hasn't actually changed. Maybe the RSS date shifted for editorial reasons, or the page got a cosmetic update. Skip.
### Gate 3: AI Drift Analysis (API Cost, Seconds)
Only when the content has genuinely changed do we send it to Claude for semantic comparison against our implementation guide. The AI isn't doing a text diff. It detects specific categories of drift:
- UI path changes: "Settings > Security" became "Protection > Authentication"
- Feature deprecation: a feature we reference no longer exists
- Step reordering: the vendor changed the sequence of configuration steps
- New prerequisites: a step now requires something it didn't before
- Setting relocation: a toggle moved from one admin panel to another
It deliberately ignores minor wording changes, formatting differences, and additive features that don't affect existing instructions.
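A sketch of how Gate 3 might be wired up. The prompt wording, category labels, and model id are illustrative, not our production values; the HTTP call follows the shape of the Anthropic Messages API:

```typescript
// Drift categories the analysis is allowed to report (labels are illustrative).
const DRIFT_CATEGORIES = [
  "ui_path_change",
  "feature_deprecation",
  "step_reordering",
  "new_prerequisite",
  "setting_relocation",
] as const;

function buildDriftPrompt(vendorDoc: string, guide: string): string {
  return [
    "Compare the vendor documentation to our implementation guide.",
    `Report only drift in these categories: ${DRIFT_CATEGORIES.join(", ")}.`,
    "Ignore wording changes, formatting, and additive features that do not affect existing steps.",
    "Respond with a JSON array of {category, summary_en, summary_ja}.",
    `<vendor_doc>${vendorDoc}</vendor_doc>`,
    `<guide>${guide}</guide>`,
  ].join("\n\n");
}

// Calls the Anthropic Messages API; error handling and response parsing elided.
async function analyzeDrift(apiKey: string, vendorDoc: string, guide: string) {
  const res = await fetch("https://api.anthropic.com/v1/messages", {
    method: "POST",
    headers: {
      "x-api-key": apiKey,
      "anthropic-version": "2023-06-01",
      "content-type": "application/json",
    },
    body: JSON.stringify({
      model: "claude-sonnet-4-20250514", // illustrative model id
      max_tokens: 2048,
      messages: [{ role: "user", content: buildDriftPrompt(vendorDoc, guide) }],
    }),
  });
  return res.json();
}
```

Constraining the response to a fixed category list is what keeps the output structured enough to store and review.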
## Why a Companion Worker?
Our main application runs on SvelteKit with Cloudflare's adapter, which doesn't support Workflow classes or scheduled handlers. So the crawler runs as a separate Worker that shares the same D1 database. Two Workers, one database, clean separation of concerns.
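The shared-database arrangement is just a matter of binding the same D1 instance in both Workers. A sketch of the companion Worker's `wrangler.toml` (all names and IDs are placeholders):

```toml
# companion worker's wrangler.toml (sketch; names and IDs are placeholders)
name = "doc-drift-crawler"
main = "src/index.ts"

[triggers]
crons = ["0 2 * * 1"]  # one scheduled run per week

[[d1_databases]]
binding = "DB"
database_name = "app-db"         # the same D1 database the SvelteKit app binds
database_id = "<shared-d1-uuid>"

[[workflows]]
name = "doc-drift"
binding = "DOC_DRIFT_WORKFLOW"
class_name = "DocDriftWorkflow"
```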
The companion Worker uses Cloudflare Workflows for durability. Each doc source gets its own named steps. If a crawl job fails partway through, the workflow resumes from the last completed step rather than starting over. Durable sleep steps let the Worker wait for asynchronous crawl jobs without burning CPU time.
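The per-source step structure looks roughly like this. In the real Worker this logic lives in a class extending `WorkflowEntrypoint` from `cloudflare:workers`; here a minimal `Step` interface stands in so the sketch is self-contained, and the crawl calls are placeholders:

```typescript
// Minimal stand-in for the Cloudflare Workflows step API.
interface Step {
  do<T>(name: string, fn: () => Promise<T>): Promise<T>;
  sleep(name: string, ms: number): Promise<void>;
}

interface DocSource { id: string; url: string; }

async function checkSource(step: Step, source: DocSource): Promise<string> {
  // Each named step is checkpointed: if the workflow fails after this step,
  // it resumes from the next step instead of re-submitting the crawl.
  const jobId = await step.do(`start-crawl-${source.id}`, async () => {
    return `job-${source.id}`; // placeholder: would call Browser Rendering
  });
  // Durable sleep: the Worker burns no CPU while the async crawl job runs.
  await step.sleep(`wait-${source.id}`, 20_000);
  return step.do(`fetch-result-${source.id}`, async () => {
    return `result-of-${jobId}`; // placeholder: would poll the crawl job
  });
}
```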
## Dealing with a Flaky API

The Cloudflare Browser Rendering `/crawl` endpoint is asynchronous and occasionally unreliable. Jobs sometimes error out immediately for no apparent reason. We handle this with:
- 3 crawl attempts with 20-second backoff between retries
- 15-second stagger between sources to avoid rate limiting
- Polling with durable sleep: the Worker sleeps (zero CPU cost) between poll checks
This brought our success rate from roughly 50% to about 90%. The remaining failures are typically transient and resolve on the next weekly run.
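The retry policy above can be captured in a small generic helper. In the real workflow the waits are durable sleep steps; `setTimeout` here keeps the sketch self-contained:

```typescript
// Retry a flaky async call: 3 attempts with a fixed backoff between them.
async function withRetries<T>(
  fn: () => Promise<T>,
  attempts = 3,
  backoffMs = 20_000,
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Wait before the next attempt, but not after the final failure.
      if (i < attempts - 1) {
        await new Promise((resolve) => setTimeout(resolve, backoffMs));
      }
    }
  }
  throw lastError;
}
```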
## What Drift Detection Looks Like
When the system detects drift, it stores a bilingual summary (English and Japanese) with structured change details. A staff member reviews the flag and either updates the implementation guide or dismisses it as a false positive.
Real drift we've caught so far:
- Microsoft deprecated per-user MFA in favor of Conditional Access policies
- UI paths in the Entra admin center were reorganized
- A grace period for legacy authentication blocking was removed
Each of these would have eventually caused confusion during a client engagement. Instead, we caught them within a week of the vendor change.
## Cost
For our current 10 doc sources (5 M365 controls × 2 languages), the weekly cost rounds to zero. At scale (say 120 sources covering Microsoft 365, Google Workspace, and Cloudflare) we estimate about $0.50 per week. Gate 1 and Gate 2 eliminate most of the expensive AI calls.
| Component | Weekly Cost |
|---|---|
| RSS checks (Gate 1) | Free |
| Browser Rendering crawls | ~$0.30 |
| Claude AI analysis (Gate 3) | ~$0.15 |
| Worker compute + D1 storage | Free tier |
| Total (~120 sources) | ~$0.50 |
## What's Next
The current system covers Microsoft 365 identity controls. The pipeline is vendor-agnostic. Adding a new doc source is just a database row with a URL and crawl configuration. Next steps:
- Expand coverage to remaining M365 controls plus Google Workspace and Cloudflare
- Build the review UI so staff can view diffs, approve updates, and track guide versions
- Vendor-specific adapters for Gate 1 (Google and Cloudflare have different changelog patterns than Microsoft's RSS)
- Automatic guide updates where the AI suggests specific text changes, not just flags
## Key Takeaways
Vendor documentation drift is a real operational risk for any team maintaining compliance guides. Catching it doesn't require a complex system. A pipeline that filters aggressively at each stage keeps both costs and false alarms low. Cloudflare's Worker platform, with Workflows for durability and Browser Rendering for crawling, turned out to be a natural fit for this kind of periodic, asynchronous inspection work.
The code runs once a week, costs pocket change, and has already caught real drift that would have burned consultant time. That's a good trade.
Originally published at cogley.jp
Rick Cogley is CEO of eSolia Inc., providing bilingual IT outsourcing and infrastructure services in Tokyo, Japan.

