DEV Community

Cover image for I stopped letting my agent browse 50 sites and the monitoring got way more reliable
Lars Winstand
Lars Winstand

Posted on • Originally published at standardcompute.com

I stopped letting my agent browse 50 sites and the monitoring got way more reliable

I got into this problem the same way a lot of people do: by building the wrong thing first.

The wrong thing was: "What if I just let an agent browse everything?"

On paper, it sounds great. Point OpenClaw at 50 docs sites, blogs, and changelogs. Let GPT-5 or Claude watch for updates. Done.

In practice, it turns into a very expensive way to rediscover the same navigation menus over and over.

Sites time out. Selectors break. JavaScript changes. Login sessions expire. Your browser agent spends half its day proving that /docs still exists.

The setup looks smart in a diagram and fragile everywhere else.

The pattern that actually worked was much simpler:

  • poll XML sitemaps
  • poll RSS feeds
  • poll known changelog pages
  • dedupe unseen URLs into a queue
  • run an LLM only on new items
  • use browser automation only as a fallback

That one change made monitoring way more reliable.

The rule: discovery first, browsing second

Most monitoring pipelines start at the most expensive layer.

They start with Chromium, DOM selectors, and agent loops.

That is backwards.

If your job is to detect change across 50 sites, the first question is not:

Can an agent navigate this site?

It is:

Does this site already publish structured signals that tell me what changed?

Usually the answer is yes.

For docs sites, blogs, and release notes, the structured signals are often:

  • sitemap.xml
  • RSS/Atom feeds
  • changelog pages

If those exist, your monitor should start there.

XML sitemaps are boring, which is why they work

A lot of docs sites already expose exactly what you need.

Typical sitemap entry:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/docs/new-page</loc>
    <lastmod>2025-06-01</lastmod>
  </url>
</urlset>
Enter fullscreen mode Exit fullscreen mode

That gives you:

  • the canonical URL
  • sometimes a lastmod timestamp
  • a much cheaper way to detect changes than full crawling

So instead of asking an agent to wander around a docs site every 6 hours, you fetch one XML file and compare entries.

That is the whole trick.

Not fancy. Very effective.

RSS is old internet plumbing. Good.

RSS still rules for blogs, release notes, and changelogs.

Why?

Because the metadata is stable.

You usually get fields like:

  • guid
  • link
  • title
  • pubDate
  • description

That means less guessing, less parsing nonsense, and fewer brittle selectors.

If a site gives you RSS, take the win.

If it gives you a sitemap too, even better.

If it gives you both and you still launch a browser for every poll, you're paying compute to ignore free structure.

The architecture I keep coming back to

This is the setup I would use again tomorrow.

  1. Schedule polling.
  2. Fetch RSS feeds, sitemaps, and changelog pages.
  3. Normalize everything into a common shape.
  4. Deduplicate by URL or GUID.
  5. Queue unseen items.
  6. Run an LLM only on new items.
  7. Use browser automation only for edge cases.

In code, the normalized record usually looks something like this:

{
  "source": "rss",
  "site": "example.com",
  "url": "https://example.com/blog/new-release",
  "guid": "release-2-4-1",
  "title": "Release 2.4.1",
  "timestamp": "2025-06-01T10:30:00Z"
}
Enter fullscreen mode Exit fullscreen mode

And the dedupe logic can be painfully simple:

def dedupe_key(item):
    return item.get("guid") or item["url"]

seen = set()
queue = []

for item in items:
    key = dedupe_key(item)
    if key in seen:
        continue
    seen.add(key)
    queue.append(item)
Enter fullscreen mode Exit fullscreen mode

You do not need a heroic architecture here.

You need a queue and memory.

That can be:

  • PostgreSQL
  • SQLite
  • Redis
  • DuckDB
  • even a tiny Python service if your volume is low

A practical n8n version

If you're building this in n8n, the workflow is straightforward.

Schedule Trigger
  -> RSS Read
  -> HTTP Request (sitemap.xml or changelog URL)
  -> Code node / Function node to normalize items
  -> Data store or DB lookup for dedupe
  -> Loop Over Items
  -> LLM node for classification + summary
  -> Slack / email / webhook / database
Enter fullscreen mode Exit fullscreen mode

A minimal polling sketch:

# n8n building blocks
1) Schedule Trigger -> every 30 minutes
2) RSS Read -> fetch feed items
3) HTTP Request -> GET https://site.com/sitemap.xml
4) Code -> parse + normalize entries
5) Database -> upsert URL/GUID
6) If -> process only unseen items
7) LLM -> summarize/classify
8) Send -> Slack, Discord, email, webhook
Enter fullscreen mode Exit fullscreen mode

The important part is not the LLM node.

The important part is that unseen URLs go through once.

Without that, your pipeline gets weird fast:

  • duplicate summaries
  • duplicate alerts
  • retries reprocessing old items
  • one flaky site stalling everything else

When browser automation actually makes sense

I am not anti-agent.

I am anti-agent-for-the-easy-90%.

There are real cases where browser automation is the right move:

  • client-rendered changelog pages
  • authenticated docs
  • stale or missing sitemaps
  • no RSS feed
  • weird navigation that hides updated content

That is where OpenClaw, Playwright, or browser-based Apify actors earn their keep.

But that should be the fallback path, not the default path.

A clean split looks like this:

Method What it's good at
XML sitemap polling Best for docs and blog URL discovery with low request volume
RSS feed polling Best for blogs, release notes, and changelogs with stable metadata
Browser/agent crawling Best for dynamic, authenticated, or broken edge cases

That split is what makes the system survive contact with reality.

Why the LLM output got better too

This was the part I did not expect.

Once I stopped asking GPT-5 or Claude to browse everything and only handed them new, deduped URLs, the output improved.

Summaries got tighter.

Classification got more consistent.

Noise dropped.

Of course it did.

I stopped using the model as:

  • a crawler
  • a scheduler
  • a diff engine
  • a parser
  • a summarizer

and gave it one job.

That is usually where LLM systems get better: not by adding more autonomy, but by reducing how much junk reaches the model.

Cost matters more than people admit

This is also where the pricing model starts to matter.

If your monitoring stack depends on browser sessions and repeated LLM calls for every poll, you feel every bad architectural choice in your bill.

That is why this pattern pairs well with Standard Compute.

If you're running agents or automations all day in n8n, Make, Zapier, OpenClaw, or custom workflows, flat-rate API access changes the way you build. You stop optimizing around token anxiety and start optimizing around reliability.

That is a much healthier constraint.

Standard Compute is a drop-in OpenAI-compatible API that gives you unlimited AI compute for a predictable monthly price, with dynamic routing across GPT-5.4, Claude Opus 4.6, and Grok 4.20.

That makes a big difference for pipelines like this, where you want to summarize and classify lots of changes without constantly thinking, "Should I skip this call to save money?"

The architecture still matters most.

But predictable cost lets you build the sane version instead of the anxious version.

A tiny Python example

Here is the kind of flow I mean.

import feedparser
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

RSS_FEEDS = [
    "https://example.com/feed.xml",
]

SITEMAPS = [
    "https://example.com/sitemap.xml",
]

def fetch_rss_items(url):
    feed = feedparser.parse(url)
    items = []
    for entry in feed.entries:
        items.append({
            "source": "rss",
            "url": entry.get("link"),
            "guid": entry.get("id") or entry.get("guid"),
            "title": entry.get("title"),
            "timestamp": entry.get("published"),
        })
    return items

def fetch_sitemap_urls(url):
    xml = requests.get(url, timeout=20).text
    soup = BeautifulSoup(xml, "xml")
    items = []
    for node in soup.find_all("url"):
        loc = node.find("loc")
        lastmod = node.find("lastmod")
        if not loc:
            continue
        items.append({
            "source": "sitemap",
            "url": loc.text.strip(),
            "guid": None,
            "title": None,
            "timestamp": lastmod.text.strip() if lastmod else None,
        })
    return items

def dedupe_key(item):
    return item.get("guid") or item["url"]

seen = set()
queue = []

for feed in RSS_FEEDS:
    for item in fetch_rss_items(feed):
        key = dedupe_key(item)
        if key not in seen:
            seen.add(key)
            queue.append(item)

for sitemap in SITEMAPS:
    for item in fetch_sitemap_urls(sitemap):
        key = dedupe_key(item)
        if key not in seen:
            seen.add(key)
            queue.append(item)

print(f"Queued {len(queue)} unseen items")
Enter fullscreen mode Exit fullscreen mode

That is not production-ready.

But it shows the actual shape of the solution.

The monitor's job is to discover new stuff cheaply.

The model's job is to interpret it.

Debugging checklist before you blame the agent

If your monitoring setup is flaky, check the plumbing first.

  • Are sitemap URLs still valid?
  • Are RSS feeds returning stable GUIDs?
  • Are you deduping by URL or GUID consistently?
  • Are retries re-enqueuing old items?
  • Are browser fallbacks isolated per site?
  • Are you polling too often?
  • Are you storing last-seen timestamps correctly?

If OpenClaw is in the loop, I would also sanity check the basics:

openclaw status
openclaw status --all
openclaw health --json
Enter fullscreen mode Exit fullscreen mode

A lot of "agent problems" are really monitoring problems.

The pattern I would recommend for 50 sites

If you forced me to pick one design for monitoring docs, changelogs, and blog posts across 50 sites, I would pick this every time:

  • sitemap polling first
  • RSS polling second
  • changelog page checks where needed
  • a dedupe queue in the middle
  • LLM classification and summaries only for unseen items
  • browser automation reserved for exceptions

That pattern is less impressive in a demo.

It is much better in production.

And if you're running this kind of workflow at scale, the combination that makes the most sense to me is:

  • structured discovery for reliability
  • queues for sanity
  • LLMs for interpretation
  • flat-rate compute so you can actually let the automation run

That last part is the underrated one.

Per-token pricing pushes people toward weird compromises. They under-process, under-summarize, or avoid useful checks because every extra step feels billable.

Flat-rate compute changes the posture. You can let the queue drain. You can summarize everything new. You can run agents 24/7 without babysitting a token meter.

That is the whole appeal of Standard Compute for automation-heavy teams: OpenAI-compatible API, predictable monthly cost, and enough headroom to build systems that are actually useful instead of systems designed around billing fear.

The main lesson, though, is simpler than that:

The clever answer was not "let the agent browse everything."

The clever answer was to stop being impressed by that idea.

Top comments (0)