I've written a lot of scrapers. The HTML parsing part is never the interesting part — it's the part that takes the longest. You know what data you want. You know where it lives on the page. Getting it out shouldn't require 40 lines of cheerio and a prayer.
So I built an API that takes CSS selectors and gives you JSON. That's it.
```bash
curl -s -X POST https://structapi.duckdns.org/extract \
  -H "X-API-Key: $KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://news.ycombinator.com",
    "fields": [
      {"name": "title", "selector": ".titleline > a"},
      {"name": "link", "selector": ".titleline > a", "attr": "href"}
    ]
  }'
```
Returns:
```json
{
  "success": true,
  "data": {
    "title": "Show HN: StructAPI — Turn websites into JSON",
    "link": "https://structapi.duckdns.org"
  }
}
```
You define fields with CSS selectors. You get back an object matching your schema. No HTML in the middle.
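If you'd rather call this from a script than from the shell, the same request can be built with Python's standard library. This is a minimal sketch, not an official client: the helper names `build_extract_request` and `extract` are made up for illustration, and it assumes only what the curl example shows (the `/extract` endpoint, the `X-API-Key` header, and a `success`/`data` response). Error responses aren't described in this post, so treating `"success": false` as the failure signal is an assumption.

```python
import json
import urllib.request

API_BASE = "https://structapi.duckdns.org"  # base URL from the curl example


def build_extract_request(api_key, url, fields):
    """Build the POST /extract request without sending it."""
    body = json.dumps({"url": url, "fields": fields}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/extract",
        data=body,
        method="POST",
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
    )


def extract(api_key, url, fields, timeout=30):
    """Send the request and return the "data" object (hypothetical helper)."""
    request = build_extract_request(api_key, url, fields)
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    # Assumption: failures come back with "success" set to false.
    if not payload.get("success"):
        raise RuntimeError(f"extraction failed: {payload}")
    return payload["data"]


# The same field definitions as the curl example.
HN_FIELDS = [
    {"name": "title", "selector": ".titleline > a"},
    {"name": "link", "selector": ".titleline > a", "attr": "href"},
]
```

With a key in hand, `extract(key, "https://news.ycombinator.com", HN_FIELDS)` should return the `data` object from the response above. Keeping request construction separate from sending makes the payload easy to unit-test without touching the network.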
Why I built this
Every scraping API I tried falls into one of two buckets:
- Proxy-first services (ScrapingBee, ScraperAPI, BrightData): they handle unblocking, IP rotation, and captcha solving, and then dump raw HTML on you. You still have to parse it.
- AI extraction (Diffbot, $299/mo): it extracts structured data, but you can't control the schema. The AI picks what it thinks is relevant. If it picks wrong, tough luck.
I wanted the middle ground: you control the extraction, I handle the HTTP. CSS selectors are the interface — they're precise, testable, and every developer already knows them.
What it does
/extract — You provide a URL and an array of field definitions (name + CSS selector). We fetch the page, run the selectors, return JSON. Single values, arrays, attribute extraction, nested selectors — all work.
/auto — Don't know the selectors? We auto-detect title, headings, links, images, and paragraphs from any URL. Good for quick looks, not for production.
/usage — Check your current month's request count and remaining quota.
What it costs
| Tier | Requests/mo | Price |
|---|---|---|
| Free | 100 | $0 |
| Starter | 10,000 | $29/mo |
| Pro | 50,000 | $99/mo |
| Scale | 200,000 | $299/mo |
Free tier: no credit card needed. Run `curl -X POST https://structapi.duckdns.org/keys -H "Content-Type: application/json" -d '{}'` and you get a key back.
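To see how the tiers compare for a given volume, the arithmetic can be sketched in a few lines. The numbers below are copied from the table; the function names are made up for this example, and the per-1,000 figure assumes you use the full monthly quota.

```python
# Plans copied from the pricing table: (name, requests per month, USD per month).
# Listed in ascending order of both quota and price.
TIERS = [
    ("Free", 100, 0),
    ("Starter", 10_000, 29),
    ("Pro", 50_000, 99),
    ("Scale", 200_000, 299),
]


def cheapest_tier(requests_per_month):
    """Return the first (cheapest) tier whose quota covers the volume, or None."""
    for name, quota, _price in TIERS:
        if requests_per_month <= quota:
            return name
    return None  # above the Scale quota; the table lists nothing larger


def cost_per_thousand(name):
    """Price per 1,000 requests, assuming the full quota is used."""
    for tier_name, quota, price in TIERS:
        if tier_name == name:
            return price / quota * 1000
    raise KeyError(name)


for name, _quota, _price in TIERS[1:]:
    print(f"{name}: ${cost_per_thousand(name):.3f} per 1,000 requests")
```

At full utilization that works out to about $2.90 per 1,000 requests on Starter, $1.98 on Pro, and roughly $1.50 on Scale.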
What it doesn't do (yet)
- JS rendering (React, Vue SPAs) — static HTML extraction only for now
- IP rotation / residential proxies — coming after the first 5 paid customers
- Captcha solving — not planning to support this
- Screenshots or PDFs — text extraction only
How it's built
Node.js on Express with better-sqlite3 for usage tracking. Stripe for billing (checkout, webhooks, customer portal). Caddy reverse-proxies to provide HTTPS. Hosted on a $12/mo VPS.
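To make the usage-tracking piece concrete, here is one way monthly quotas can be counted in SQLite. This is an illustrative sketch only, not the code from the repo: the real service is Node.js with better-sqlite3 and its schema isn't described in this post, so the `usage` table, its columns, and the `record_request` helper are all hypothetical. The example uses Python's built-in `sqlite3` module so it runs without installing anything.

```python
import sqlite3
from datetime import datetime, timezone


def open_db(path=":memory:"):
    """Create the hypothetical usage table: one row per API key per month."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS usage (
               api_key  TEXT NOT NULL,
               month    TEXT NOT NULL,               -- formatted as YYYY-MM
               requests INTEGER NOT NULL DEFAULT 0,
               PRIMARY KEY (api_key, month)
           )"""
    )
    return db


def record_request(db, api_key, quota, now=None):
    """Count one request; return the remaining quota, or None if over quota."""
    now = now or datetime.now(timezone.utc)
    month = now.strftime("%Y-%m")
    with db:  # one transaction: commit on success, roll back on error
        # Make sure a row exists for this key and month.
        db.execute(
            "INSERT OR IGNORE INTO usage (api_key, month, requests) VALUES (?, ?, 0)",
            (api_key, month),
        )
        # Only increment while under quota, so the counter never exceeds it.
        cur = db.execute(
            "UPDATE usage SET requests = requests + 1 "
            "WHERE api_key = ? AND month = ? AND requests < ?",
            (api_key, month, quota),
        )
        if cur.rowcount == 0:
            return None
        (used,) = db.execute(
            "SELECT requests FROM usage WHERE api_key = ? AND month = ?",
            (api_key, month),
        ).fetchone()
    return quota - used
```

Because the month is part of the primary key, a new month starts from zero without any reset job, and an endpoint like `/usage` could be served by reading the same row.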
Source code on GitHub: https://github.com/92SM/structapi
Try it
```bash
# Get a free key
curl -X POST https://structapi.duckdns.org/keys \
  -H "Content-Type: application/json" \
  -d '{}'

# Extract data
KEY="***"
curl -X POST https://structapi.duckdns.org/extract \
  -H "X-API-Key: $KEY" \
  -H "Content-Type: application/json" \
  -d '{"url":"https://example.com","fields":[{"name":"h1","selector":"h1"}]}'
```