Last month I was scraping product data from 15 different e-commerce sites.
Every site had different HTML. Every site broke my selectors every few weeks. I was spending more time maintaining scrapers than actually using the data.
Then I tried something stupid: I sent the raw HTML to Claude and said "extract the product name, price, and rating."
It worked. On every site. Without a single CSS selector.
## The Old Way (200+ lines per site)

```python
# site1.py
price = soup.find('span', class_='price-tag__amount').text.strip()
name = soup.find('h1', {'data-testid': 'product-title'}).text
rating = soup.find('div', class_='star-rating').get('aria-label')

# site2.py (completely different selectors)
price = soup.select_one('.pdp-price .sale-price').text
name = soup.select_one('#productName').text
rating = soup.select_one('.ratings-count span').text

# site3.py (different again)
# ... you get the idea
```

15 sites × ~15 selectors each = 225 CSS selectors that break whenever a site updates its HTML.
## The New Way (one short function)

```python
import anthropic

def extract(html_text, prompt):
    client = anthropic.Anthropic()
    response = client.messages.create(
        model="claude-sonnet-4-6-20250514",
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": f"Extract from this text. Return JSON only.\n"
                       f"Task: {prompt}\nText:\n{html_text[:5000]}"
        }]
    )
    return response.content[0].text

# Works on ANY site. No selectors.
data = extract(page_text, "Extract: product name, price, rating")
```

That's it. One short function, and it works on every site.
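One practical wrinkle: even when you ask for "JSON only," models sometimes wrap the output in a markdown code fence. A small helper (hypothetical name, not part of the snippet above) makes the response safe to parse:

```python
import json
import re

def parse_llm_json(raw: str) -> dict:
    """Parse JSON from an LLM response, tolerating an optional
    ```json ... ``` markdown wrapper. (Hypothetical helper.)"""
    match = re.search(r"```(?:json)?\s*(.*?)\s*```", raw, re.DOTALL)
    if match:
        raw = match.group(1)
    return json.loads(raw)
```

Feeding the model's reply through this before touching the fields saves you from the occasional stray fence breaking your pipeline.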
## But What About Cost?
This is the first question everyone asks. Here's the math:
- Claude Sonnet: ~$0.003 per page (3 cents per 10 pages)
- 1,000 pages/day = $3/day = $90/month
- Compare to: 20 hours/month maintaining selectors × $50/hr = $1,000/month
AI extraction is 10x cheaper than manual maintenance.
For low-volume scraping (<100 pages/day), cost is basically zero.
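The arithmetic above is easy to adapt to your own volume. A back-of-the-envelope estimator (the ~$0.003/page figure is this post's assumption; your actual rate depends on page size and model pricing):

```python
def monthly_cost(pages_per_day: int, cost_per_page: float = 0.003) -> float:
    """Estimated monthly spend in dollars, assuming 30 days/month."""
    return pages_per_day * cost_per_page * 30

# 1,000 pages/day at ~$0.003/page -> about $90/month
print(monthly_cost(1000))
```

Plug in your own per-page cost once you've measured real token counts; trimming the HTML (as the `[:5000]` slice above does) is the biggest lever.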
## When NOT to Use This

To be honest, LLM extraction isn't always the answer:
- High volume (10K+ pages/day): Selectors are faster and cheaper at scale
- Simple, stable sites: If the HTML never changes, selectors work fine
- Structured APIs available: Always prefer an API over scraping
- Real-time data: LLM adds ~1-2 seconds latency per page
## The Hybrid Approach (What I Actually Use)

```python
def smart_extract(url, prompt):
    # Try CSS selectors first (fast + free)
    result = try_selectors(url)
    if result and result.is_valid():
        return result
    # Fall back to LLM (slower but never breaks)
    return llm_extract(url, prompt)
```
Selectors for speed. LLM as fallback. Best of both worlds.
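`try_selectors()` and `is_valid()` are left abstract above. A minimal sketch of what the result type and validity check could look like (all names here are hypothetical, not from the post's code):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Extraction:
    """Result of one extraction attempt. A selector run that only
    half-worked (e.g. price missing) should fail validation so the
    LLM fallback kicks in."""
    name: Optional[str] = None
    price: Optional[float] = None
    rating: Optional[float] = None

    def is_valid(self) -> bool:
        # Require the fields you can't live without; rating is optional
        return self.name is not None and self.price is not None
```

The key design point: validation is what makes the hybrid safe. Without it, a selector that silently matches the wrong element returns garbage instead of triggering the fallback.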
## Your Turn
I'm genuinely curious: are you still writing CSS selectors for scraping, or have you switched to AI extraction?
If you've tried LLM-based scraping, what was your experience? Better? Worse? Weird edge cases?
Drop your story in the comments — I'll share the most interesting ones in a follow-up post.
I open-sourced my extraction code: LLM Data Extraction on GitHub. Star it if you want to try the approach yourself.
More scraping tools: Awesome Web Scraping 2026 — 77+ free tools and APIs.