I Built an API That Turns Any Website Into JSON Using Just CSS Selectors

#webdev #api #scraping #showdev

I've written a lot of scrapers. The HTML parsing part is never the interesting part — and it's always the part that takes the longest. You know what data you want. You know where it lives on the page. Getting it out shouldn't require 40 lines of cheerio and a prayer.

So I built StructAPI. You send a URL and CSS selectors. You get JSON.

The pitch

curl -s -X POST https://structapi.duckdns.org/extract \
  -H "X-API-Key: $KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://news.ycombinator.com",
    "fields": [
      {"name": "title", "selector": ".titleline > a"},
      {"name": "link", "selector": ".titleline > a", "attr": "href"}
    ]
  }'

Response:

[
  {"title": "Show HN: A thing", "link": "https://thing.com"},
  {"title": "Why databases are weird", "link": "https://dbpost.com"}
]

That's it. Define fields. Get structured data. No HTML in between.
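If you'd rather call it from code than from curl, the same request can be sketched in Python using only the standard library. The endpoint, header names, and body shape are copied from the curl example above. The helper names (`field`, `build_request`, `extract`) are made up for illustration and are not an official client. It is also an assumption, inferred from the example, that leaving out `attr` returns the element's text.

```python
import json
import urllib.request

# Endpoint taken from the curl example.
ENDPOINT = "https://structapi.duckdns.org/extract"


def field(name, selector, attr=None):
    """Build one entry of the "fields" array (hypothetical helper).

    Assumption: omitting "attr" yields the element's text, as the
    "title" field in the curl example suggests.
    """
    entry = {"name": name, "selector": selector}
    if attr is not None:
        entry["attr"] = attr
    return entry


def build_request(url, fields, api_key):
    """Assemble the POST request without sending it."""
    body = json.dumps({"url": url, "fields": fields}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        method="POST",
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
    )


def extract(url, fields, api_key, timeout=30):
    """Send the request and decode the JSON response body."""
    request = build_request(url, fields, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With a valid key, `extract("https://news.ycombinator.com", [field("title", ".titleline > a"), field("link", ".titleline > a", attr="href")], key)` sends the same body as the curl call. Error handling and retries are left out to keep the sketch short.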

Why this exists

Every scraping API I found falls into one of two camps:

Camp 1, the proxy layer (ScrapingBee, ScraperAPI, Bright Data): They handle IP rotation, CAPTCHA solving, and browser rendering, then dump raw HTML on you. The parsing is still your problem. You're paying for unblocking, not extraction.

Camp 2, the black box (Diffbot): They auto-extract structured data with AI. Works great until it doesn't, and you can't tell it which fields you care about. If the AI picks wrong, that's that. Also: $299/month minimum.

StructAPI sits in a third camp: you define the schema, we return the data. No AI guessing. No raw HTML to parse. Just CSS selectors → JSON.
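For a sense of what "raw HTML to parse" means in practice, here is a rough sketch of the hand-written equivalent of the single selector `.titleline > a`, using only Python's standard library. The HTML snippet is invented for illustration (it is not real Hacker News markup), the class name `TitlelineParser` is hypothetical, and none of this is meant to describe how StructAPI works internally.

```python
from html.parser import HTMLParser


class TitlelineParser(HTMLParser):
    """Hand-rolled stand-in for the selector `.titleline > a`.

    Limitation: void elements such as <br> or <img> would unbalance
    the stack, so this only suits simple, well-nested markup.
    """

    def __init__(self):
        super().__init__()
        self.stack = []      # class attribute of every currently open element
        self.rows = []       # finished {"title", "link"} records
        self.current = None  # record being filled while inside a matching <a>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # The direct parent is whatever sits on top of the stack right now.
        parent_classes = self.stack[-1].split() if self.stack else []
        if tag == "a" and "titleline" in parent_classes:
            self.current = {"title": "", "link": attrs.get("href")}
        self.stack.append(attrs.get("class") or "")

    def handle_data(self, data):
        # Collect text only while inside a matching link.
        if self.current is not None:
            self.current["title"] += data

    def handle_endtag(self, tag):
        if self.stack:
            self.stack.pop()
        if tag == "a" and self.current is not None:
            self.rows.append(self.current)
            self.current = None


# Made-up markup that mimics the structure the selector targets.
SAMPLE = """
<table>
  <tr><td><span class="titleline"><a href="https://thing.com">Show HN: A thing</a></span></td></tr>
  <tr><td><span class="titleline"><a href="https://dbpost.com">Why databases are weird</a></span></td></tr>
</table>
"""

parser = TitlelineParser()
parser.feed(SAMPLE)
rows = parser.rows
```

That is about 35 lines for one selector and two fields, before fetching the page or handling malformed markup. With the schema approach, the same intent is two entries in the `fields` array.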