Sergio Morales

I Built a $29/Month API That Turns Any Website Into Structured JSON (No AI Black Box)

I've written a lot of scrapers. The HTML parsing part is never the interesting part — it's the part that takes the longest. You know what data you want. You know where it lives on the page. Getting it out shouldn't require 40 lines of cheerio and a prayer.

So I built an API that takes CSS selectors and gives you JSON. That's it.

curl -s -X POST https://structapi.duckdns.org/extract \
  -H "X-API-Key: $KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://news.ycombinator.com",
    "fields": [
      {"name": "title", "selector": ".titleline > a"},
      {"name": "link", "selector": ".titleline > a", "attr": "href"}
    ]
  }'

Returns:

{
  "success": true,
  "data": {
    "title": "Show HN: StructAPI — Turn websites into JSON",
    "link": "https://structapi.duckdns.org"
  }
}

You define fields with CSS selectors. You get back an object matching your schema. No HTML in the middle.
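For anyone calling this from a script instead of curl, here is a minimal Python sketch of the same request using only the standard library. The endpoint, headers, and body shape come from the curl example above; the function names `build_extract_request` and `extract` and the 30-second timeout are my own assumptions for illustration, not part of the API.

```python
import json
import urllib.request

BASE_URL = "https://structapi.duckdns.org"


def build_extract_request(api_key, url, fields):
    """Build (but do not send) the POST /extract request shown in the curl example."""
    body = json.dumps({"url": url, "fields": fields}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/extract",
        data=body,
        method="POST",
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
    )


def extract(api_key, url, fields, timeout=30):
    """Send the request and decode the JSON response (hypothetical helper)."""
    request = build_extract_request(api_key, url, fields)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Keeping the request builder separate from the network call makes the payload easy to inspect or unit-test without spending quota.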

Why I built this

Every scraping API I tried fell into one of two buckets:

  1. Proxy-first services (ScrapingBee, ScraperAPI, BrightData): they handle unblocking, IP rotation, and captcha solving, then dump raw HTML on you. You still have to parse it.

  2. AI extraction (Diffbot, $299/mo): it returns structured data, but you can't control the schema. The AI picks what it thinks is relevant. If it picks wrong, tough luck.

I wanted the middle ground: you control the extraction, I handle the HTTP. CSS selectors are the interface — they're precise, testable, and every developer already knows them.

What it does

/extract — You provide a URL and an array of field definitions (name + CSS selector). We fetch the page, run the selectors, return JSON. Single values, arrays, attribute extraction, nested selectors — all work.
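To make "run the selectors, return JSON" concrete, here is a toy sketch of selector-driven extraction in Python. It is not the service's code. It assumes a deliberately tiny selector grammar (`tag`, `.class`, or `tag.class` only), returns just the first match per field, and ignores void elements like `img` unless an `attr` is requested. The names `FieldExtractor` and `extract_fields` are hypothetical.

```python
from html.parser import HTMLParser


def parse_simple_selector(selector):
    """Split 'tag.class' into (tag or None, class or None). Toy grammar only."""
    tag, _, cls = selector.strip().partition(".")
    return (tag.lower() or None, cls or None)


class FieldExtractor(HTMLParser):
    def __init__(self, fields):
        super().__init__()
        self.fields = fields
        self.matches = {f["name"]: [] for f in fields}
        self._open = []  # in-progress text captures: [tag, field name, depth, chunks]

    def handle_starttag(self, tag, attrs):
        # Track nesting so an inner <div> does not close an outer <div> capture.
        for cap in self._open:
            if cap[0] == tag:
                cap[2] += 1
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        for f in self.fields:
            want_tag, want_cls = parse_simple_selector(f["selector"])
            if (want_tag and want_tag != tag) or (want_cls and want_cls not in classes):
                continue
            if "attr" in f:
                self.matches[f["name"]].append(attr_map.get(f["attr"]))
            else:
                self._open.append([tag, f["name"], 1, []])

    def handle_data(self, data):
        for cap in self._open:
            cap[3].append(data)

    def handle_endtag(self, tag):
        for cap in list(self._open):
            if cap[0] == tag:
                cap[2] -= 1
                if cap[2] == 0:
                    self.matches[cap[1]].append("".join(cap[3]).strip())
                    self._open.remove(cap)


def extract_fields(html, fields):
    """Return {field name: first matched text or attribute, or None}."""
    parser = FieldExtractor(fields)
    parser.feed(html)
    parser.close()
    return {name: (vals[0] if vals else None) for name, vals in parser.matches.items()}
```

A selector like `.titleline > a` uses a child combinator, which this sketch does not handle; a real CSS selector engine is what makes the full syntax work.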

/auto — Don't know the selectors? We auto-detect title, headings, links, images, and paragraphs from any URL. Good for quick looks, not for production.
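As a rough illustration of what this kind of auto-detection can look like, here is a heuristic sketch in Python. The five output keys mirror the list above, but the logic (collect `title`, `h1` to `h6`, `a[href]`, `img[src]`, and `p` in document order) is my assumption about a reasonable baseline, not a description of the endpoint's actual rules. `AutoDetector` and `auto_detect` are hypothetical names.

```python
from html.parser import HTMLParser

TEXT_TAGS = {"title", "h1", "h2", "h3", "h4", "h5", "h6", "p"}


class AutoDetector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.found = {"title": None, "headings": [], "links": [],
                      "images": [], "paragraphs": []}
        self._tag = None   # text-bearing tag currently being captured
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "a" and attr_map.get("href"):
            self.found["links"].append(attr_map["href"])
        elif tag == "img" and attr_map.get("src"):
            self.found["images"].append(attr_map["src"])
        elif tag in TEXT_TAGS and self._tag is None:
            self._tag, self._chunks = tag, []

    def handle_data(self, data):
        if self._tag is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag != self._tag:
            return
        # Collapse runs of whitespace so multi-line markup yields clean text.
        text = " ".join("".join(self._chunks).split())
        if tag == "title":
            self.found["title"] = text
        elif tag == "p":
            self.found["paragraphs"].append(text)
        else:
            self.found["headings"].append(text)
        self._tag = None


def auto_detect(html):
    parser = AutoDetector()
    parser.feed(html)
    parser.close()
    return parser.found
```

Heuristics like these grab everything that looks plausible, which is why explicit selectors are the better fit once you know what you want.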

/usage — Check your current month's request count and remaining quota.

What it costs

Tier      Requests/mo   Price
Free      100           $0
Starter   10,000        $29/mo
Pro       50,000        $99/mo
Scale     200,000       $299/mo

Free tier: no credit card needed. Run curl -X POST https://structapi.duckdns.org/keys -H "Content-Type: application/json" -d '{}' and you get a key back.
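To compare tiers, it helps to work out the price per 1,000 requests. The sketch below does that arithmetic from the table above and picks the smallest tier that covers a given monthly volume. It assumes quotas are hard caps with no overage billing, which the table does not specify; `cost_per_1000` and `smallest_fitting_tier` are hypothetical helper names.

```python
# (tier name, requests per month, price in USD per month), from the pricing table
TIERS = [
    ("Free", 100, 0),
    ("Starter", 10_000, 29),
    ("Pro", 50_000, 99),
    ("Scale", 200_000, 299),
]


def cost_per_1000(requests, price):
    """Price in USD per 1,000 requests when the full quota is used."""
    return price / requests * 1000


def smallest_fitting_tier(monthly_requests):
    """First tier whose quota covers the volume, or None if none does.

    TIERS is sorted by both quota and price, so the first fit is also the cheapest.
    """
    for name, quota, _price in TIERS:
        if monthly_requests <= quota:
            return name
    return None


if __name__ == "__main__":
    for name, quota, price in TIERS:
        print(f"{name}: ${cost_per_1000(quota, price):.3f} per 1,000 requests")
```

At full utilization that works out to about $2.90 per 1,000 requests on Starter, $1.98 on Pro, and roughly $1.50 on Scale.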

What it doesn't do (yet)

  • JS rendering (React, Vue SPAs) — static HTML extraction only for now
  • IP rotation / residential proxies — coming after first 5 paid customers
  • Captcha solving — not planning to support this
  • Screenshots or PDFs — text extraction only

How it's built

Express on Node.js, with better-sqlite3 for usage tracking. Stripe handles billing (checkout, webhooks, customer portal). Caddy sits in front as a reverse proxy and terminates HTTPS. It all runs on a $12/mo VPS.
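For a sense of what per-key monthly usage tracking in SQLite can look like, here is a sketch using Python's built-in `sqlite3` module rather than better-sqlite3. The table layout, column names, and function names are assumptions made for illustration; the real schema lives in the repository.

```python
import sqlite3
from datetime import datetime, timezone


def open_db(path=":memory:"):
    """Open the database and create the (hypothetical) usage table if needed."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS usage ("
        " api_key TEXT NOT NULL,"
        " month TEXT NOT NULL,"
        " count INTEGER NOT NULL DEFAULT 0,"
        " PRIMARY KEY (api_key, month))"
    )
    return db


def current_month(now=None):
    """Month bucket such as '2025-01', in UTC."""
    now = now or datetime.now(timezone.utc)
    return now.strftime("%Y-%m")


def record_request(db, api_key, quota, month=None):
    """Count one request; return False if the key is already at its quota."""
    month = month or current_month()
    with db:  # commits on success, rolls back on error
        db.execute(
            "INSERT OR IGNORE INTO usage (api_key, month, count) VALUES (?, ?, 0)",
            (api_key, month),
        )
        cursor = db.execute(
            "UPDATE usage SET count = count + 1"
            " WHERE api_key = ? AND month = ? AND count < ?",
            (api_key, month, quota),
        )
        return cursor.rowcount == 1


def remaining(db, api_key, quota, month=None):
    """Requests left this month for the key."""
    month = month or current_month()
    row = db.execute(
        "SELECT count FROM usage WHERE api_key = ? AND month = ?",
        (api_key, month),
    ).fetchone()
    return quota - (row[0] if row else 0)
```

Keying rows on (api_key, month) means the counter resets on its own when the month rolls over, with no cleanup job needed.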

Source code on GitHub: https://github.com/92SM/structapi

Try it

# Get a free key
curl -X POST https://structapi.duckdns.org/keys -H "Content-Type: application/json" -d '{}'

# Extract data
KEY="***"
curl -X POST https://structapi.duckdns.org/extract \
  -H "X-API-Key: $KEY" \
  -H "Content-Type: application/json" \
  -d '{"url":"https://example.com","fields":[{"name":"h1","selector":"h1"}]}'

Docs: https://structapi.duckdns.org/docs