DEV Community

Alex Spinov


I Replaced 200 Lines of CSS Selectors With 3 Lines of AI — Here's the Code

Last month I was scraping product data from 15 different e-commerce sites.

Every site had different HTML. Every site broke my selectors every few weeks. I was spending more time maintaining scrapers than actually using the data.

Then I tried something stupid: I sent the raw HTML to Claude and said "extract the product name, price, and rating."

It worked. On every site. Without a single CSS selector.


The Old Way (200+ lines per site)

# site1.py
price = soup.find('span', class_='price-tag__amount').text.strip()
name = soup.find('h1', {'data-testid': 'product-title'}).text
rating = soup.find('div', class_='star-rating').get('aria-label')

# site2.py (completely different selectors)
price = soup.select_one('.pdp-price .sale-price').text
name = soup.select_one('#productName').text
rating = soup.select_one('.ratings-count span').text

# site3.py (different again)
# ... you get the idea

15 sites × 15 selectors each = 225 CSS selectors that break whenever a site updates their HTML.

The New Way (3 lines)

import anthropic

def extract(html_text, prompt):
    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    response = client.messages.create(
        model="claude-sonnet-4-6-20250514",
        max_tokens=1024,
        messages=[{
            "role": "user",
            # Truncate to the first 5,000 characters to keep token costs bounded
            "content": f"Extract from this text. Return JSON only.\n"
                       f"Task: {prompt}\nText:\n{html_text[:5000]}"
        }]
    )
    return response.content[0].text

# Works on ANY site. No selectors.
data = extract(page_text, "Extract: product name, price, rating")

That's it. Three lines of extraction logic that work on every site.
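One practical wrinkle: `extract` returns raw text, and models sometimes wrap a JSON answer in a Markdown code fence even when told not to. A small helper makes the output safe to consume (the `parse_extraction` name is mine, not from the original repo; a minimal sketch, not a definitive implementation):

```python
import json
import re

def parse_extraction(raw: str) -> dict:
    """Parse the model's reply into a dict, tolerating a ```json fence."""
    # Strip a leading ``` or ```json fence and a trailing ``` if present
    cleaned = re.sub(r"^```(?:json)?\s*|\s*```$", "", raw.strip())
    return json.loads(cleaned)
```

So `parse_extraction('```json\n{"price": "$19.99"}\n```')` and a bare `'{"price": "$19.99"}'` both come back as the same dict, and malformed output raises a `JSONDecodeError` you can catch and retry on.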

But What About Cost?

This is the first question everyone asks. Here's the math:

  • Claude Sonnet: ~$0.003 per page (3 cents per 10 pages)
  • 1,000 pages/day = $3/day = $90/month
  • Compare to: 20 hours/month maintaining selectors × $50/hr = $1,000/month

AI extraction is 10x cheaper than manual maintenance.

For low-volume scraping (<100 pages/day), cost is basically zero.
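If you want to sanity-check those numbers for your own workload, here's a back-of-envelope calculator. The per-token rates and token counts are assumptions (roughly 1,250 input tokens for the 5,000-character truncation above; check current pricing before relying on this):

```python
# Assumed per-token rates -- verify against current API pricing
INPUT_RATE = 3.00 / 1_000_000    # $/input token
OUTPUT_RATE = 15.00 / 1_000_000  # $/output token

def page_cost(input_tokens=1250, output_tokens=100):
    """Estimated cost of one extraction call, in dollars."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

daily_cost = 1000 * page_cost()  # cost for 1,000 pages/day
```

Under these assumptions a page lands at well under a cent, in the same ballpark as the figures above; the exact number shifts with page size and how much JSON the model returns.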

When NOT to Use This

Honesty time — LLM extraction isn't always the answer:

  • High volume (10K+ pages/day): Selectors are faster and cheaper at scale
  • Simple, stable sites: If the HTML never changes, selectors work fine
  • Structured APIs available: Always prefer an API over scraping
  • Real-time data: LLM adds ~1-2 seconds latency per page

The Hybrid Approach (What I Actually Use)

def smart_extract(url, prompt):
    # Try CSS selectors first (fast + free)
    result = try_selectors(url)
    if result and result.is_valid():
        return result

    # Fall back to LLM (slower but never breaks)
    return llm_extract(url, prompt)

Selectors for speed. LLM as fallback. Best of both worlds.
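To make that sketch concrete without tying it to any one parsing library, here's a minimal, self-contained version. `selector_fn` and `llm_fn` are placeholders for your own selector-based and LLM-based extractors, and the validity check is simply "all required fields are non-empty":

```python
def smart_extract(html, selector_fn, llm_fn, required=("name", "price")):
    """Try the cheap selector path first; fall back to the LLM.

    selector_fn, llm_fn: callables taking raw HTML, returning a dict.
    Returns (data, source) so you can log which path was taken.
    """
    result = selector_fn(html)
    # Selectors "break" by returning None or missing fields, not by erroring
    if result and all(result.get(k) for k in required):
        return result, "selectors"
    return llm_fn(html), "llm"
```

Returning the source alongside the data is a cheap way to monitor drift: when the share of `"llm"` responses for a site climbs, you know its selectors have rotted and it's time to update them.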


Your Turn

I'm genuinely curious: are you still writing CSS selectors for scraping, or have you switched to AI extraction?

If you've tried LLM-based scraping, what was your experience? Better? Worse? Weird edge cases?

Drop your story in the comments — I'll share the most interesting ones in a follow-up post.


I open-sourced my extraction code: LLM Data Extraction on GitHub. Star it if you want to try the approach yourself.

More scraping tools: Awesome Web Scraping 2026 — 77+ free tools and APIs.
