DEV Community

Paras Tejpal
Paras Tejpal

Posted on

I Built a Free API That Scrapes Any Website Using Plain English - No CSS Selectors

I've wasted days of my life maintaining CSS selectors.

You know the drill - you write the perfect scraper, it works great for a week, then the site does a frontend redesign, your selectors break, and you spend another afternoon hunting through the DOM again.

So I built Opticparse - a completely different approach.

How It Works

Instead of selectors, Opticparse:

  1. Opens a real Chromium browser (via Playwright)
  2. Navigates to your URL and waits for JavaScript to load
  3. Screenshots the page
  4. Sends the screenshot to a vision AI model
  5. Returns structured JSON based on your natural language query
curl -X POST https://opticparse.onrender.com/api/vision-scrape \
  -H "X-API-Key: YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "target_url": "https://news.ycombinator.com",
    "extraction_query": "Extract all story titles and upvote counts as a JSON array"
  }'
Enter fullscreen mode Exit fullscreen mode

No selectors. No XPath. No DOM inspection. The AI figures out where everything is from the screenshot.

The AI Provider Rotation

The smartest part: if one AI provider rate-limits, the next one kicks in automatically.

Provider order:

  1. Groq - llama-3.2-11b-vision (fastest free inference, < 1s)
  2. GitHub Models - gpt-4o (free 150 req/day fallback)
  3. OpenRouter - gpt-4o (additional free credits)

Zero downtime, effectively unlimited free capacity.

The Stealth Mode

Cloudflare and other WAFs detect headless browsers by checking navigator.webdriver. I added a simple init script to neutralize this:

await context.add_init_script(
    "Object.defineProperty(navigator, 'webdriver', {get: () => undefined});"
)
Enter fullscreen mode Exit fullscreen mode

Combined with a real Chrome user agent, this bypasses most basic bot detection.

Try It Free

Available on RapidAPI with a free tier - no credit card needed.

GitHub (MIT): https://github.com/parastejpal987-cmyk/opticparse

What websites have you tried to scrape that kept breaking? Let me know in the comments!

Top comments (0)