I've written a lot of scrapers. The HTML parsing part is never the interesting part — and it's always the part that takes the longest. You know what data you want. You know where it lives on the page. Getting it out shouldn't require 40 lines of cheerio and a prayer.
So I built StructAPI. You send a URL and CSS selectors. You get JSON.
The pitch
```bash
curl -s -X POST https://structapi.duckdns.org/extract \
  -H "X-API-Key: $KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://news.ycombinator.com",
    "fields": [
      {"name": "title", "selector": ".titleline > a"},
      {"name": "link", "selector": ".titleline > a", "attr": "href"}
    ]
  }'
```
```json
[
  {"title": "Show HN: A thing", "link": "https://thing.com"},
  {"title": "Why databases are weird", "link": "https://dbpost.com"}
]
```
That's it. Define fields. Get structured data. No HTML in between.
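If you'd rather call it from a script than from curl, here is a rough Python sketch of the same request using only the standard library. The endpoint, the `X-API-Key` header, and the `name`/`selector`/`attr` field shape come straight from the curl call above. The function names (`build_payload`, `build_request`, `extract`) and the tuple shorthand for fields are made up for this example, and it assumes the response is a JSON array of rows like the sample output.

```python
import json
import urllib.request

# Endpoint taken from the curl example in this post.
EXTRACT_URL = "https://structapi.duckdns.org/extract"


def build_payload(url, fields):
    """Build the request body: a target URL plus a list of field specs.

    `fields` is a list of (name, selector) or (name, selector, attr) tuples.
    The tuple shorthand is a convenience of this sketch, not part of the API.
    """
    specs = []
    for field in fields:
        name, selector, *rest = field
        spec = {"name": name, "selector": selector}
        if rest:  # optional "attr", as in the "link" field of the curl example
            spec["attr"] = rest[0]
        specs.append(spec)
    return {"url": url, "fields": specs}


def build_request(api_key, payload):
    """Wrap the payload in a POST request with the same headers curl sends."""
    return urllib.request.Request(
        EXTRACT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def extract(api_key, url, fields, timeout=30):
    """Send the request and decode the JSON body (assumed to be a list of rows)."""
    request = build_request(api_key, build_payload(url, fields))
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `extract(key, "https://news.ycombinator.com", [("title", ".titleline > a"), ("link", ".titleline > a", "href")])` sends the same body as the curl command. Error handling (bad keys, rate limits, timeouts) is left out because the post doesn't say how the API reports those.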
Why this exists
Every scraping API I found falls into two camps:
Camp 1 — The proxy layer (ScrapingBee, ScraperAPI, BrightData): They handle IP rotation, captcha solving, browser rendering — then dump raw HTML on you. The parsing is still your problem. You're paying for unblocking, not extraction.
Camp 2 — The black box (Diffbot): They auto-extract structured data with AI. Works great until it doesn't — and you can't tell it which fields you care about. If the AI picks wrong, that's that. Also: $299/month minimum.
StructAPI sits in a third camp: you define the schema, we return the data. No AI guessing. No raw HTML to parse. Just CSS selectors → JSON.
Pricing
| Tier | Requests | Price |
|---|---|---|
| Free | 100/mo | $0 |
| Starter | 10,000/mo | $29/mo |
| Pro | 50,000/mo | $99/mo |
| Scale | 200,000/mo | $299/mo |
No credit card needed for the free tier. Proxy rotation and JS rendering are planned for the paid plans and will launch once the first paying customers are on board.
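To see which tier fits a given volume, the table boils down to a few lines of Python. This is only a sketch: the numbers are copied from the table above, the helper names are invented for the example, and it assumes a hard monthly quota with no overage billing, which the table doesn't spell out either way.

```python
# Tiers copied from the pricing table: (name, requests per month, USD per month).
TIERS = [
    ("Free", 100, 0),
    ("Starter", 10_000, 29),
    ("Pro", 50_000, 99),
    ("Scale", 200_000, 299),
]


def cheapest_tier(monthly_requests):
    """Return (name, price) of the lowest-priced tier covering the volume, or None."""
    for name, quota, price in TIERS:  # list is already sorted by price
        if monthly_requests <= quota:
            return name, price
    return None  # above the Scale quota; the table lists no larger plan


def price_per_thousand(quota, price):
    """Effective USD per 1,000 requests if the whole quota is used."""
    return price / quota * 1000
```

With the full quota used, that works out to $2.90 per 1,000 requests on Starter, $1.98 on Pro, and about $1.50 on Scale.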
What's next
The API is live now. If you're tired of writing HTML parsers for every project, give it a shot.
Sign up: https://structapi.duckdns.org/keys
Docs: https://structapi.duckdns.org/docs
GitHub: https://github.com/92SM/structapi