Boehner

Stop Scraping Pages by Hand — One API Call Returns Everything You Need


I used to have this gross four-step process every time I needed to understand what a webpage was doing:

  1. Screenshot it
  2. curl the HTML and pipe it through a parser
  3. Fire up Puppeteer to extract structured data
  4. Manually look up the tech stack

Four round trips. Four scripts to maintain. Four things that break when a site updates its layout.

Then I added a single /v1/analyze endpoint to SnapAPI and collapsed all four steps into one.

Here's what a single call returns now:

{
  "page_type": "landing_page",
  "cta": "Start for free",
  "navigation": ["Docs", "Pricing", "Changelog", "Sign In"],
  "buttons": ["Start for free", "View docs", "See pricing"],
  "forms": [{ "action": "/signup", "fields": ["email"] }],
  "headings": {
    "h1": ["The Screenshot API that Developers Actually Use"],
    "h2": ["One line of code", "No Puppeteer", "Free tier included"]
  },
  "links": { "internal": 14, "external": 3, "total": 17 },
  "word_count": 847,
  "load_time_ms": 1243,
  "technologies": ["Cloudflare", "Google Analytics", "Stripe"],
  "screenshot": "<base64 PNG>"
}

One HTTP GET. One response. Everything about the page — structure, intent, and a visual.


Why This Matters

If you're building any of these things, you've probably felt the pain:

  • Competitive intelligence tools — you want to know if a competitor changed their CTA or added a new pricing tier
  • SEO auditing scripts — you need word counts, heading structure, and link counts at scale
  • AI agents — your agent needs to understand a page before acting on it, not just see a blob of HTML
  • Lead enrichment pipelines — you're building profiles of companies and need to know what tech stack they're running

The traditional approach is to either scrape HTML (and fight bot detection), run your own Puppeteer cluster (and babysit it), or stitch together 3–4 different APIs (expensive, fragile).

The analyze endpoint is a single call that does all of that in under 2 seconds.


The Code

Node.js

const res = await fetch(
  "https://snapapi.tech/v1/analyze?" + new URLSearchParams({
    url: "https://stripe.com",
    api_key: "YOUR_KEY",
    screenshot: "true"
  })
);

const data = await res.json();

console.log("Page type:", data.page_type);
console.log("Primary CTA:", data.cta);
console.log("Tech stack:", data.technologies);
console.log("Word count:", data.word_count);

Python

import requests

response = requests.get("https://snapapi.tech/v1/analyze", params={
    "url": "https://stripe.com",
    "api_key": "YOUR_KEY",
    "screenshot": "true"
})

data = response.json()
print(f"Page type: {data['page_type']}")
print(f"Primary CTA: {data['cta']}")
print(f"Technologies: {', '.join(data['technologies'])}")
print(f"Word count: {data['word_count']}")

curl

curl "https://snapapi.tech/v1/analyze?url=https://stripe.com&api_key=YOUR_KEY" | jq .
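jq also makes it easy to slice the response without writing a script. A quick sketch, using a saved sample that mirrors the response shown at the top of the post (swap in the live curl output):

```shell
# Save a sample response locally (in practice: curl ... -o response.json)
cat > response.json <<'EOF'
{"page_type":"landing_page","cta":"Start for free","technologies":["Cloudflare","Google Analytics","Stripe"]}
EOF

jq -r '.cta' response.json            # just the primary CTA
jq -r '.technologies[]' response.json # one technology per line
jq '{page_type, cta}' response.json   # a trimmed-down object
```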

Real Use Case: Competitor Monitoring in 20 Lines

Here's a script that watches 10 competitor homepages and alerts you when their CTA or tech stack changes:

// Node 18+ ships a global fetch; on older versions: npm install node-fetch
const fetch = require("node-fetch");
const fs = require("fs");

const API_KEY = process.env.SNAPAPI_KEY;
const COMPETITORS = [
  "https://stripe.com",
  "https://clerk.dev",
  "https://vercel.com",
  // ... add yours
];

const STATE_FILE = "./competitor-state.json";
const previous = fs.existsSync(STATE_FILE) 
  ? JSON.parse(fs.readFileSync(STATE_FILE)) 
  : {};

async function analyze(url) {
  const res = await fetch(
    `https://snapapi.tech/v1/analyze?url=${encodeURIComponent(url)}&api_key=${API_KEY}`
  );
  return res.json();
}

async function run() {
  const current = {};

  for (const url of COMPETITORS) {
    const data = await analyze(url);
    const key = url;
    current[key] = {
      cta: data.cta,
      technologies: data.technologies,
      page_type: data.page_type,
      word_count: data.word_count,
    };

    if (previous[key]) {
      const prev = previous[key];
      if (prev.cta !== data.cta) {
        console.log(`🔔 CTA changed on ${url}: "${prev.cta}" → "${data.cta}"`);
      }
      const addedTech = data.technologies.filter(t => !prev.technologies.includes(t));
      if (addedTech.length) {
        console.log(`🔔 New tech detected on ${url}: ${addedTech.join(", ")}`);
      }
    }
  }

  fs.writeFileSync(STATE_FILE, JSON.stringify(current, null, 2));
  console.log("✅ Done. Checked", COMPETITORS.length, "competitors.");
}

run().catch(console.error);

Run this on a cron and you'll know the moment a competitor A/B tests a new headline or switches payment providers.
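For reference, a hypothetical crontab entry for an hourly run (the script path, log path, and key value are placeholders, not anything SnapAPI prescribes):

```shell
# Run the monitor hourly; append output to a log.
# All paths and the env var value are placeholders — adjust to your setup.
0 * * * * SNAPAPI_KEY=your_key /usr/bin/node /opt/monitor/competitor-watch.js >> /var/log/competitor-watch.log 2>&1
```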


Real Use Case: AI Agent Page Understanding

If you're building an AI agent that needs to browse the web, raw HTML is a terrible input — it's noisy, enormous, and the model spends tokens on nav bars and cookie banners.

The analyze endpoint solves this by pre-extracting the structure:

async function getPageContext(url) {
  const res = await fetch(
    `https://snapapi.tech/v1/analyze?url=${encodeURIComponent(url)}&api_key=${API_KEY}`
  );
  const data = await res.json();

  // Return a compact summary for the LLM
  return {
    summary: `This is a ${data.page_type}. Primary CTA: "${data.cta}". ` +
             `Main heading: "${data.headings?.h1?.[0]}". ` +
             `Word count: ${data.word_count}. ` +
             `Technologies: ${data.technologies?.join(", ")}.`,
    screenshot: data.screenshot, // base64 for vision models
  };
}

Feed that summary + screenshot to GPT-4o or Claude and you get much better responses than dumping raw HTML.
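To make that concrete, here's a sketch of wrapping the summary and screenshot in a vision-model message. The content shape follows OpenAI's Chat Completions image-input format; that part is an assumption for illustration, not anything SnapAPI-specific, so adapt it to your provider's SDK:

```javascript
// Sketch: shape the analyze output into a vision-model message.
// The content structure follows OpenAI's image-input format (an
// assumption here); Anthropic and others use a slightly different shape.
function buildVisionMessage(ctx) {
  return {
    role: "user",
    content: [
      { type: "text", text: `Describe this page and suggest an action.\n${ctx.summary}` },
      { type: "image_url", image_url: { url: `data:image/png;base64,${ctx.screenshot}` } },
    ],
  };
}

// Example with placeholder data (real values come from getPageContext):
const msg = buildVisionMessage({
  summary: 'This is a landing_page. Primary CTA: "Start for free".',
  screenshot: "iVBORw0KGgo", // truncated base64 PNG placeholder
});
console.log(msg.content[1].image_url.url.slice(0, 22)); // "data:image/png;base64,"
```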


Real Use Case: SEO Audit at Scale

const pages = [
  "https://yoursite.com",
  "https://yoursite.com/pricing",
  "https://yoursite.com/blog",
  "https://yoursite.com/about",
];

const results = await Promise.all(
  pages.map(async url => {
    const res = await fetch(
      `https://snapapi.tech/v1/analyze?url=${encodeURIComponent(url)}&api_key=${API_KEY}`
    );
    const data = await res.json();
    return {
      url,
      h1_count: data.headings?.h1?.length ?? 0,
      word_count: data.word_count,
      has_form: data.forms.length > 0,
      internal_links: data.links?.internal ?? 0,
      cta: data.cta,
    };
  })
);

// Print a quick audit table
console.table(results);

Output:

┌─────────────────────────────────┬──────────┬────────────┬──────────┬────────────────┬──────────────────┐
│ url                             │ h1_count │ word_count │ has_form │ internal_links │ cta              │
├─────────────────────────────────┼──────────┼────────────┼──────────┼────────────────┼──────────────────┤
│ https://yoursite.com            │ 1        │ 847        │ true     │ 14             │ Start for free   │
│ https://yoursite.com/pricing    │ 1        │ 412        │ false    │ 8              │ Get started      │
│ https://yoursite.com/blog       │ 0        │ 2341       │ false    │ 23             │                  │
│ https://yoursite.com/about      │ 1        │ 631        │ false    │ 11             │ Contact us       │
└─────────────────────────────────┴──────────┴────────────┴──────────┴────────────────┴──────────────────┘

Missing H1 on the blog? No CTA on the about page? This surfaces in seconds.
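A few extra lines turn that table into a pass/fail check. A sketch (the thresholds are arbitrary; tune them to your site):

```javascript
// Sketch: flag common SEO problems from the audit rows built above.
function auditIssues(rows) {
  const issues = [];
  for (const page of rows) {
    if (page.h1_count !== 1) issues.push(`${page.url}: expected exactly one H1, found ${page.h1_count}`);
    if (!page.cta) issues.push(`${page.url}: no primary CTA detected`);
    if (page.word_count < 300) issues.push(`${page.url}: thin content (${page.word_count} words)`);
  }
  return issues;
}

// Using the blog row from the table above:
const issues = auditIssues([
  { url: "https://yoursite.com/blog", h1_count: 0, word_count: 2341, cta: "" },
]);
issues.forEach(i => console.log("⚠️", i));
```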


Batch Mode

If you need to analyze 10+ pages, use the batch endpoint to parallelize everything:

const res = await fetch("https://snapapi.tech/v1/batch", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "X-API-Key": API_KEY
  },
  body: JSON.stringify({
    endpoint: "analyze",
    urls: [
      "https://stripe.com",
      "https://paddle.com",
      "https://lemonsqueezy.com",
    ]
  })
});

const results = await res.json();
// results is an array of analyze responses, one per URL

10 URLs → ~3–4 seconds → structured intelligence on all of them.
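Since the batch response is one analyze object per URL, pairing results back to their source URLs is a one-liner. A sketch, assuming results arrive in the same order as the request's urls array:

```javascript
// Sketch: zip batch results back onto the URLs that produced them.
// Assumes results are ordered the same as the request's `urls` array.
function summarizeBatch(urls, results) {
  return urls.map((url, i) => ({
    url,
    page_type: results[i]?.page_type ?? "unknown",
    cta: results[i]?.cta ?? null,
  }));
}

const summary = summarizeBatch(
  ["https://stripe.com", "https://paddle.com"],
  [{ page_type: "landing_page", cta: "Start now" }, { page_type: "pricing", cta: null }]
);
console.log(summary);
```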


What's Returned (Full Schema)

| Field | Type | Description |
| --- | --- | --- |
| page_type | string | landing_page, blog, pricing, docs, ecommerce, etc. |
| cta | string | The primary call-to-action button text |
| navigation | string[] | Top nav link labels |
| buttons | string[] | All button text on the page |
| forms | object[] | Form action, method, and field names |
| headings | object | H1–H6 arrays |
| links | object | internal, external, total counts |
| word_count | number | Visible word count (not raw HTML) |
| load_time_ms | number | Time to interactive |
| technologies | string[] | Detected libraries, CDNs, analytics, payment providers |
| screenshot | string | Base64 PNG (when screenshot=true) |
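Not every field is guaranteed on every page (a blog post may have no forms, and screenshot only appears when requested), so it's worth normalizing a response before you rely on it. A defensive sketch:

```javascript
// Sketch: normalize an analyze response so downstream code can assume
// every field exists, even when the API omits optional ones.
function normalize(data) {
  return {
    page_type: data.page_type ?? "unknown",
    cta: data.cta ?? null,
    navigation: data.navigation ?? [],
    buttons: data.buttons ?? [],
    forms: data.forms ?? [],
    headings: data.headings ?? {},
    links: { internal: 0, external: 0, total: 0, ...(data.links ?? {}) },
    word_count: data.word_count ?? 0,
    technologies: data.technologies ?? [],
  };
}

const safe = normalize({ page_type: "blog" });
console.log(safe.technologies); // []
```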

Free Tier

SnapAPI has a free tier — 100 calls/month, no card required. Grab a key at snapapi.tech and try it against any page you want.

If you're auditing a lot of pages or building something that runs continuously, the paid tiers start at $9/month.


Wrapping Up

The analyze endpoint is what happens when you stop making developers stitch together a scraper + a parser + a Puppeteer script + a Wappalyzer clone — and just collapse it all into a single API call.

One request. One response. Everything you need to understand a webpage programmatically.

Try the live demo →
Read the docs →
