Stop Scraping Pages by Hand — One API Call Returns Everything You Need
I used to have this gross four-step process every time I needed to understand what a webpage was doing:

- Screenshot it
- `curl` the HTML and pipe it through a parser
- Fire up Puppeteer to extract structured data
- Manually look up the tech stack

Four round trips. Four scripts to maintain. Four things that break when a site updates its layout.
Then I added a single /v1/analyze endpoint to SnapAPI and collapsed all four steps into one.
Here's what a single call returns now:
```json
{
  "page_type": "landing_page",
  "cta": "Start for free",
  "navigation": ["Docs", "Pricing", "Changelog", "Sign In"],
  "buttons": ["Start for free", "View docs", "See pricing"],
  "forms": [{ "action": "/signup", "fields": ["email"] }],
  "headings": {
    "h1": ["The Screenshot API that Developers Actually Use"],
    "h2": ["One line of code", "No Puppeteer", "Free tier included"]
  },
  "links": { "internal": 14, "external": 3, "total": 17 },
  "word_count": 847,
  "load_time_ms": 1243,
  "technologies": ["Cloudflare", "Google Analytics", "Stripe"],
  "screenshot": "<base64 PNG>"
}
```
One HTTP GET. One response. Everything about the page — structure, intent, and a visual.
Why This Matters
If you're building any of these things, you've probably felt the pain:
- Competitive intelligence tools — you want to know if a competitor changed their CTA or added a new pricing tier
- SEO auditing scripts — you need word counts, heading structure, and link counts at scale
- AI agents — your agent needs to understand a page before acting on it, not just see a blob of HTML
- Lead enrichment pipelines — you're building profiles of companies and need to know what tech stack they're running
The traditional approach is to either scrape HTML (and fight bot detection), run your own Puppeteer cluster (and babysit it), or stitch together 3–4 different APIs (expensive, fragile).
The analyze endpoint is a single call that does all of that in under 2 seconds.
The Code
Node.js
```javascript
const res = await fetch(
  "https://snapapi.tech/v1/analyze?" + new URLSearchParams({
    url: "https://stripe.com",
    api_key: "YOUR_KEY",
    screenshot: "true"
  })
);
const data = await res.json();

console.log("Page type:", data.page_type);
console.log("Primary CTA:", data.cta);
console.log("Tech stack:", data.technologies);
console.log("Word count:", data.word_count);
```
Python
```python
import requests

response = requests.get("https://snapapi.tech/v1/analyze", params={
    "url": "https://stripe.com",
    "api_key": "YOUR_KEY",
    "screenshot": "true"
})
data = response.json()

print(f"Page type: {data['page_type']}")
print(f"Primary CTA: {data['cta']}")
print(f"Technologies: {', '.join(data['technologies'])}")
print(f"Word count: {data['word_count']}")
```
curl
```bash
curl "https://snapapi.tech/v1/analyze?url=https://stripe.com&api_key=YOUR_KEY" | jq .
```
Real Use Case: Competitor Monitoring in 20 Lines
Here's a script that watches 10 competitor homepages and alerts you when their CTA or tech stack changes:
```javascript
const fetch = require("node-fetch"); // on Node 18+, fetch is global and this line can go
const fs = require("fs");

const API_KEY = process.env.SNAPAPI_KEY;
const COMPETITORS = [
  "https://stripe.com",
  "https://clerk.dev",
  "https://vercel.com",
  // ... add yours
];
const STATE_FILE = "./competitor-state.json";

const previous = fs.existsSync(STATE_FILE)
  ? JSON.parse(fs.readFileSync(STATE_FILE))
  : {};

async function analyze(url) {
  const res = await fetch(
    `https://snapapi.tech/v1/analyze?url=${encodeURIComponent(url)}&api_key=${API_KEY}`
  );
  return res.json();
}

async function run() {
  const current = {};

  for (const url of COMPETITORS) {
    const data = await analyze(url);
    const key = url;

    current[key] = {
      cta: data.cta,
      technologies: data.technologies,
      page_type: data.page_type,
      word_count: data.word_count,
    };

    if (previous[key]) {
      const prev = previous[key];

      if (prev.cta !== data.cta) {
        console.log(`🔔 CTA changed on ${url}: "${prev.cta}" → "${data.cta}"`);
      }

      const addedTech = data.technologies.filter(t => !prev.technologies.includes(t));
      if (addedTech.length) {
        console.log(`🔔 New tech detected on ${url}: ${addedTech.join(", ")}`);
      }
    }
  }

  fs.writeFileSync(STATE_FILE, JSON.stringify(current, null, 2));
  console.log("✅ Done. Checked", COMPETITORS.length, "competitors.");
}

run().catch(console.error);
```
Run this on a cron and you'll know the moment a competitor A/B tests a new headline or switches payment providers.
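For a concrete scheduling setup, here's a hypothetical crontab entry — the path and `monitor.js` filename are placeholders for wherever you save the script above:

```shell
# Run the competitor check every morning at 08:00;
# append stdout and stderr to a log so CTA/tech alerts are kept.
0 8 * * * cd /path/to/project && SNAPAPI_KEY=your_key node monitor.js >> monitor.log 2>&1
```

Cron runs with a minimal environment, so setting `SNAPAPI_KEY` inline (or in the crontab header) is safer than relying on your shell profile.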
Real Use Case: AI Agent Page Understanding
If you're building an AI agent that needs to browse the web, raw HTML is a terrible input — it's noisy, enormous, and the model spends tokens on nav bars and cookie banners.
The analyze endpoint solves this by pre-extracting the structure:
```javascript
async function getPageContext(url) {
  const res = await fetch(
    `https://snapapi.tech/v1/analyze?url=${encodeURIComponent(url)}&api_key=${API_KEY}`
  );
  const data = await res.json();

  // Return a compact summary for the LLM
  return {
    summary: `This is a ${data.page_type}. Primary CTA: "${data.cta}". ` +
      `Main heading: "${data.headings?.h1?.[0]}". ` +
      `Word count: ${data.word_count}. ` +
      `Technologies: ${data.technologies?.join(", ")}.`,
    screenshot: data.screenshot, // base64 for vision models
  };
}
```
Feed that summary + screenshot to GPT-4o or Claude and you get much better responses than dumping raw HTML.
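As a sketch of what that hand-off can look like, here's one way to package the context into an OpenAI-style vision request. This assumes the `{ summary, screenshot }` shape returned by `getPageContext` above; the payload format follows the Chat Completions API's content-part convention for images:

```javascript
// Build an OpenAI-style chat request that pairs the text summary
// with the screenshot as a base64 data URL (vision input).
function buildVisionRequest(context) {
  return {
    model: "gpt-4o",
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: `Analyze this page. ${context.summary}` },
          {
            type: "image_url",
            image_url: { url: `data:image/png;base64,${context.screenshot}` },
          },
        ],
      },
    ],
  };
}

// Example with a stubbed context (no network call):
const req = buildVisionRequest({
  summary: 'This is a landing_page. Primary CTA: "Start for free".',
  screenshot: "iVBORw0KGgo=", // placeholder base64
});
console.log(req.messages[0].content.length); // 2 parts: text + image
```

The point is that the model sees a two-part message — a dense text summary plus one image — instead of tens of kilobytes of markup.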
Real Use Case: SEO Audit at Scale
```javascript
const pages = [
  "https://yoursite.com",
  "https://yoursite.com/pricing",
  "https://yoursite.com/blog",
  "https://yoursite.com/about",
];

const results = await Promise.all(
  pages.map(async url => {
    const res = await fetch(
      `https://snapapi.tech/v1/analyze?url=${encodeURIComponent(url)}&api_key=${API_KEY}`
    );
    const data = await res.json();
    return {
      url,
      h1_count: data.headings?.h1?.length ?? 0,
      word_count: data.word_count,
      has_form: data.forms.length > 0,
      internal_links: data.links?.internal ?? 0,
      cta: data.cta,
    };
  })
);

// Print a quick audit table
console.table(results);
```
Output:
```
┌─────────────────────────────────┬──────────┬────────────┬──────────┬────────────────┬────────────────┐
│ url                             │ h1_count │ word_count │ has_form │ internal_links │ cta            │
├─────────────────────────────────┼──────────┼────────────┼──────────┼────────────────┼────────────────┤
│ https://yoursite.com            │ 1        │ 847        │ true     │ 14             │ Start for free │
│ https://yoursite.com/pricing    │ 1        │ 412        │ false    │ 8              │ Get started    │
│ https://yoursite.com/blog       │ 0        │ 2341       │ false    │ 23             │                │
│ https://yoursite.com/about      │ 1        │ 631        │ false    │ 11             │ Contact us     │
└─────────────────────────────────┴──────────┴────────────┴──────────┴────────────────┴────────────────┘
```
Missing H1 on the blog? No CTA on the about page? This surfaces in seconds.
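Those spot-checks are easy to automate. Here's a minimal sketch that turns the `results` array from the audit script into a list of flagged problems — the rules (exactly one H1, a CTA present, at least 300 words) are illustrative thresholds of mine, not SnapAPI recommendations:

```javascript
// Flag common SEO problems in the audit results.
function flagIssues(results) {
  const issues = [];
  for (const page of results) {
    if (page.h1_count !== 1) {
      issues.push(`${page.url}: expected exactly one H1, found ${page.h1_count}`);
    }
    if (!page.cta) {
      issues.push(`${page.url}: no CTA detected`);
    }
    if (page.word_count < 300) {
      issues.push(`${page.url}: thin content (${page.word_count} words)`);
    }
  }
  return issues;
}

// Against the blog row from the table above:
console.log(flagIssues([
  { url: "https://yoursite.com/blog", h1_count: 0, word_count: 2341, cta: "" },
]));
// → flags the missing H1 and the missing CTA
```

Pipe the output into Slack or an issue tracker and the audit becomes a standing check instead of a one-off.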
Batch Mode
If you need to analyze 10+ pages, use the batch endpoint to parallelize everything:
```javascript
const res = await fetch("https://snapapi.tech/v1/batch", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "X-API-Key": API_KEY
  },
  body: JSON.stringify({
    endpoint: "analyze",
    urls: [
      "https://stripe.com",
      "https://paddle.com",
      "https://lemon.squeezy.com",
    ]
  })
});
const results = await res.json();
// results is an array of analyze responses, one per URL
```
10 URLs → ~3–4 seconds → structured intelligence on all of them.
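To line those batch results up for comparison, one sketch — this assumes each entry in `results` carries the analyzed `url` alongside the analyze fields; check the actual batch response shape, since it may nest results differently:

```javascript
// Collapse batch results into a url -> CTA map for quick diffing.
// Entries without a detected CTA map to null.
function ctaByUrl(results) {
  return Object.fromEntries(results.map(r => [r.url, r.cta ?? null]));
}

// Example with stubbed batch output:
console.log(ctaByUrl([
  { url: "https://stripe.com", cta: "Start now" },
  { url: "https://paddle.com" },
]));
// → one key per URL, with null for the missing CTA
```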
What's Returned (Full Schema)
| Field | Type | Description |
|---|---|---|
| `page_type` | string | `landing_page`, `blog`, `pricing`, `docs`, `ecommerce`, etc. |
| `cta` | string | The primary call-to-action button text |
| `navigation` | string[] | Top nav link labels |
| `buttons` | string[] | All button text on the page |
| `forms` | object[] | Form action, method, and field names |
| `headings` | object | H1–H6 arrays |
| `links` | object | `internal`, `external`, `total` counts |
| `word_count` | number | Visible word count (not HTML) |
| `load_time_ms` | number | Time to interactive |
| `technologies` | string[] | Detected libraries, CDNs, analytics, payment providers |
| `screenshot` | string | Base64 PNG (when `screenshot=true`) |
Free Tier
SnapAPI has a free tier — 100 calls/month, no card required. Grab a key at snapapi.tech and try it against any page you want.
If you're auditing a lot of pages or building something that runs continuously, the paid tiers start at $9/month.
Wrapping Up
The analyze endpoint is what happens when you stop making developers stitch together a scraper + a parser + a Puppeteer script + a Wappalyzer clone — and just collapse it all into a single API call.
One request. One response. Everything you need to understand a webpage programmatically.