My AI agent was blind.
It could read text, write code, call APIs — but the moment I asked it to work with a webpage, it hit a wall. "Go check if this landing page looks broken." "Tell me what the pricing page says now." "Monitor this competitor's homepage for changes." All blocked.
The obvious fix: give it a browser. The actual experience: install Puppeteer, debug the Chrome binary path, hit memory limits in Lambda, watch it break on every third-party CDN that detects headless browsers. An afternoon of yak-shaving every time.
I built SnapAPI to fix this.
What SnapAPI is
A REST API that wraps a headless browser. You send a URL, you get back a screenshot, PDF, or structured page data. No Puppeteer, no containers, no Chrome binary management.
A few lines of Python vs. a weekend of DevOps:
import requests

resp = requests.get(
    "https://snapapi.tech/v1/analyze",
    params={"url": "https://example.com"},
    headers={"X-API-Key": "YOUR_KEY"},
)
data = resp.json()
print(data["title"])         # "Example Domain"
print(data["text_summary"])  # "This domain is for use in illustrative examples..."
The three calls I use constantly
1. Analyze — structured page intelligence
This is the workhorse for AI pipelines. Instead of dumping raw HTML into a context window, I use /v1/analyze to get a clean, token-efficient JSON summary:
curl "https://snapapi.tech/v1/analyze?url=https://news.ycombinator.com" \
  -H "X-API-Key: YOUR_KEY"
Response:
{
  "url": "https://news.ycombinator.com",
  "title": "Hacker News",
  "description": "Links to stuff",
  "headings": [
    { "level": 1, "text": "Hacker News" }
  ],
  "links": [
    { "text": "new", "href": "https://news.ycombinator.com/newest" },
    { "text": "past", "href": "https://news.ycombinator.com/front" },
    { "text": "comments", "href": "https://news.ycombinator.com/newcomments" }
  ],
  "text_summary": "Ask HN: What are you working on? | 312 comments\nShow HN: ...",
  "load_time_ms": 847
}
Feed that text_summary to GPT-4 instead of the raw HTML. You go from 150k tokens of angle brackets to 2k tokens of actual content.
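In Python, the glue can be as small as one prompt builder. A sketch, assuming the response fields match the example above (the 8,000-character cap is an arbitrary safety margin, not an API limit):

```python
def build_page_prompt(analysis: dict, question: str) -> str:
    """Turn an /v1/analyze response into a compact LLM prompt.

    Uses only fields shown in the sample response (title, text_summary).
    """
    summary = analysis.get("text_summary", "")[:8000]  # arbitrary cap
    return (
        f"Page title: {analysis.get('title', '')}\n"
        f"Page content:\n{summary}\n\n"
        f"Question: {question}"
    )

# Usage (sketch; fetch `analysis` with requests.get as in the quickstart):
# prompt = build_page_prompt(analysis, "What are the top stories today?")
# ...send `prompt` to whichever LLM client you use
```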
2. Screenshot — visual verification
AI agents that can see screenshots can verify things that text parsing misses: broken layouts, missing images, visual regressions, forms that didn't render.
curl "https://snapapi.tech/v1/screenshot?url=https://snapapi.tech&width=1280&height=800&format=png" \
  -H "X-API-Key: YOUR_KEY" \
  --output page.png
Parameters worth knowing:
- full_page=true — captures the entire scrollable page, not just the viewport
- dark_mode=true — renders in dark mode (useful for testing)
- block_ads=true — blocks ad scripts before capture
- wait_for_selector=.main-content — waits for a specific element before shooting
- delay=1000 — waits N milliseconds after load (for JS-heavy SPAs)
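A small helper makes those options easier to compose than hand-built query strings. A sketch using the parameter names listed above; only flags you actually set get sent:

```python
from typing import Optional

def screenshot_params(url: str, *, full_page: bool = False,
                      dark_mode: bool = False, block_ads: bool = False,
                      wait_for_selector: Optional[str] = None,
                      delay_ms: Optional[int] = None) -> dict:
    """Build the query dict for /v1/screenshot from the options above."""
    params = {"url": url}
    if full_page:
        params["full_page"] = "true"
    if dark_mode:
        params["dark_mode"] = "true"
    if block_ads:
        params["block_ads"] = "true"
    if wait_for_selector:
        params["wait_for_selector"] = wait_for_selector
    if delay_ms is not None:
        params["delay"] = str(delay_ms)
    return params

# Usage (sketch):
# resp = requests.get("https://snapapi.tech/v1/screenshot",
#                     params=screenshot_params("https://snapapi.tech",
#                                              full_page=True, block_ads=True),
#                     headers={"X-API-Key": "YOUR_KEY"})
# open("page.png", "wb").write(resp.content)
```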
I pipe screenshots directly to GPT-4V: "Does this page look broken? What changed since yesterday?"
3. Batch — process multiple URLs in one call
When you're monitoring several competitors' pricing pages, checking 50 product pages for freshness, or building a dataset:
curl -X POST "https://snapapi.tech/v1/batch" \
  -H "X-API-Key: YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": [
      "https://competitor-a.com/pricing",
      "https://competitor-b.com/pricing",
      "https://competitor-c.com/pricing"
    ],
    "endpoint": "analyze",
    "params": {}
  }'
Response:
{
  "total": 3,
  "succeeded": 3,
  "failed": 0,
  "duration_ms": 2841,
  "results": [
    { "url": "https://competitor-a.com/pricing", "title": "Pricing — CompA", "text_summary": "..." },
    { "url": "https://competitor-b.com/pricing", "title": "Plans — CompB", "text_summary": "..." },
    { "url": "https://competitor-c.com/pricing", "title": "Pricing — CompC", "text_summary": "..." }
  ]
}
One API call. Three pages. No rate limit juggling, no thread management.
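Handling the response in Python can stay equally small. A sketch: the field names follow the sample response above, but treating a per-result "error" key as the failure marker is my assumption, since the sample only shows successes:

```python
def split_batch_results(body: dict):
    """Split a /v1/batch response body into successes and failures.

    Assumes failed entries carry an "error" key (not shown in the
    sample response; verify against the docs).
    """
    results = body.get("results", [])
    ok = [r for r in results if "error" not in r]
    bad = [r for r in results if "error" in r]
    return ok, bad

# Usage (sketch):
# resp = requests.post("https://snapapi.tech/v1/batch",
#                      headers={"X-API-Key": "YOUR_KEY"},
#                      json={"urls": urls, "endpoint": "analyze", "params": {}})
# ok, bad = split_batch_results(resp.json())
```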
Real use cases from my own pipelines
AI research assistant: Agent gets asked "what does Company X's product page say?" — calls /v1/analyze, feeds structured JSON to the LLM instead of raw HTML. Works reliably even on JS-heavy SPAs.
Automated visual regression: Cron job calls /v1/screenshot on a set of pages after every deploy. Screenshots stored in S3. If diff score exceeds threshold, Slack alert fires. Cost: ~$0.001/screenshot.
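The change-detection step can start embarrassingly simple. A stdlib-only sketch that only catches exact byte changes; the threshold-based diff score mentioned above would come from a perceptual diff (e.g. pixelmatch), which this does not implement:

```python
import hashlib

def screenshot_changed(prev_png: bytes, new_png: bytes) -> bool:
    """Cheapest possible regression check: did the bytes change at all?

    Will false-positive on nondeterministic renders (timestamps, ads);
    use a perceptual diff for a real threshold-based score.
    """
    return hashlib.sha256(prev_png).digest() != hashlib.sha256(new_png).digest()
```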
Competitive monitoring: Weekly job batches competitor pricing and feature pages. LLM diffs the extracted text against last week's version. Email alert on any meaningful change.
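Before paying for an LLM call, a cheap pre-filter can decide whether anything changed at all. A sketch using Python's difflib; the 0.05 cutoff in the usage note is an arbitrary example, not a recommendation:

```python
import difflib

def change_ratio(old_summary: str, new_summary: str) -> float:
    """Rough change score between two text_summary snapshots.

    0.0 means identical, 1.0 means completely different.
    """
    sm = difflib.SequenceMatcher(None, old_summary, new_summary)
    return 1.0 - sm.ratio()

# Usage (sketch):
# if change_ratio(last_week, this_week) > 0.05:
#     ...ask the LLM for a plain-English diff of the two summaries...
```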
OG image generation: /v1/render takes raw HTML and returns a screenshot. Feed it a styled HTML template, get back a 1200×630 social share image. No canvas, no serverless Chrome, no font loading headaches.
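A minimal sketch of the template side; note that posting the HTML as a JSON body is my guess at the /v1/render request shape, so check the docs before relying on it:

```python
def og_card_html(title: str, subtitle: str) -> str:
    """Minimal 1200x630 card template for /v1/render.

    The inline styling is illustrative; swap in your own fonts/branding.
    """
    return f"""<!doctype html>
<html><body style="margin:0;width:1200px;height:630px;box-sizing:border-box;
  display:flex;flex-direction:column;justify-content:center;padding:60px;
  background:#111;color:#fff;font-family:sans-serif">
  <h1 style="font-size:64px;margin:0">{title}</h1>
  <p style="font-size:32px;color:#aaa">{subtitle}</p>
</body></html>"""

# Usage (sketch; request shape assumed, not documented here):
# requests.post("https://snapapi.tech/v1/render",
#               headers={"X-API-Key": "YOUR_KEY"},
#               json={"html": og_card_html("SnapAPI", "A browser behind a REST call")})
```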
Other endpoints
/v1/pdf — generates a PDF from any URL. Useful for reports, invoices, archival. Supports custom margins, landscape mode, background printing.
/v1/metadata — lightweight, fast metadata pull (title, og:image, canonical, favicon) without a full render. Use when you just need basic page info without executing JavaScript.
Getting started
Free tier requires no credit card. API key in your dashboard within 30 seconds.
# 1. Sign up at https://snapapi.tech
# 2. Copy your API key from the dashboard
# 3. Try it:
curl "https://snapapi.tech/v1/analyze?url=https://example.com" \
  -H "X-API-Key: YOUR_KEY"
Full docs at snapapi.tech/docs.
If your AI pipeline currently handles URLs by fetching raw HTML and dumping it into the context window — this is a direct upgrade. Structured output, real rendering, lower token cost, higher reliability.
Top comments (6)
This is a powerful example of how “vision” is becoming the missing layer in AI agents. Instead of relying on DOM or assumptions, giving agents real visual context changes everything—from UI understanding to automation reliability. As more workflows shift toward perception-first design, APIs like this feel less like a feature and more like a necessity.
The batch endpoint for monitoring multiple pages is where this really shines for me. I run AI agents that audit a financial data site across 8,000+ stock pages in 12 languages, and the biggest pain point has always been verifying that programmatically generated pages actually render correctly — not just that the data is there, but that layouts aren't broken, charts loaded, and content didn't get garbled during translation. Right now I'm doing that through headless browser checks, and the yak-shaving you described is painfully accurate. The structured text_summary approach is especially interesting for SEO monitoring — instead of scraping competitor pages and parsing messy HTML, you could feed clean summaries into an LLM to detect meaningful changes in pricing or positioning. Have you seen much adoption for that competitive monitoring use case specifically?
What’s been more interesting though is the second part you mentioned: using structured summaries instead of raw HTML. Once you stop thinking in terms of DOM parsing and start thinking in terms of “what actually changed”, it becomes way more useful.
We’ve been experimenting with piping those summaries into an LLM to generate plain-English diffs like: “pricing dropped, messaging shifted toward X, new feature emphasis on Y”
That turns it from monitoring into something closer to competitive intelligence.
Still early, but the signal is way higher than traditional scraping. Curious how you’re handling prioritization across 8k pages: are you diffing everything, or scoring for meaningful changes first?
Started working on this, should be ready tomorrow.
The token reduction from raw HTML to structured text_summary is the real win here — we hit the same wall feeding page scrapes into agent context and ended up building something similar. Curious how it handles SPAs where the initial DOM is basically empty until JS runs.
I've not seen an issue, though surely one exists somewhere. Nothing is 100% problem-proof. Curious what you built on your end?