Phil Rentier Digital

Posted on Jun 15 • Originally published at rentierdigital.xyz

Web Scraping Is Dead. Vibe Scraping Just Replaced It

#ai #programming #aitools #claude

I had a Python script for scraping Amazon. 280 lines. 3 libraries. A proxy rotation I'd configured by hand, a VPS running 24/7 to keep it alive, and a cron job that emailed me whenever it crashed (which was often enough that I'd stopped reading the alerts).

Whenever Amazon changed its HTML structure, I lost a full day rebuilding selectors I'd already written once, chasing a page that didn't know I existed.

TLDR: 6 weeks ago I connected 1 MCP server to Claude Code and stopped writing Python scripts for web data entirely. This article is about what became possible after that, and about who just inherited the kind of market intelligence that enterprise data teams used to protect behind $80K/year contracts.

That server was BrightData. I added it to Claude Code, described what I wanted in plain English, and structured data came back. This is a different category of thing, not a faster version of the old one.

The Old Way Was a Dev Tax

Web scraping had a real cost, and it wasn't the data.

You needed a scraping library: BeautifulSoup, Playwright, Puppeteer, take your pick. You needed a proxy rotation service, because most sites start blocking after a few dozen requests from the same IP. You needed to handle CAPTCHAs, which meant either a third-party solving service or bypass logic that broke every 6 weeks.

You needed a VPS or cloud function to run it continuously. And you needed to maintain all of it every time a target site changed its structure, which large e-commerce sites do constantly, without notice, without caring that your pipeline depended on them.
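To make that concrete, here is a minimal sketch of just one piece of the old stack: round-robin proxy rotation with retry on block. It is not my original 280-line script. The `Blocked` exception and the injected `fetch` callable are hypothetical stand-ins for whatever HTTP client and block detection you would actually wire in.

```python
import itertools
from typing import Callable, Iterable, Optional


class Blocked(Exception):
    """Raised by the fetcher when the target site refuses the request."""


def fetch_with_rotation(
    url: str,
    proxies: Iterable[str],
    fetch: Callable[[str, str], str],
    max_attempts: int = 5,
) -> str:
    """Try the URL through successive proxies until one is not blocked."""
    pool = itertools.cycle(proxies)  # round-robin over the proxy list
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        proxy = next(pool)
        try:
            return fetch(url, proxy)
        except Blocked as exc:
            last_error = exc  # this IP is burned for now, move to the next one
    raise RuntimeError(f"all {max_attempts} attempts were blocked") from last_error
```

And that is before CAPTCHA handling, selector parsing, scheduling, or alerting, each of which needs its own equivalent.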

Every Amazon HTML update felt like a patch note that silently nerfed your main build. You didn't know until prod broke.

I documented the Python WAF bypass playbook back in 2024. It was a real problem worth solving. The code worked. It also took 3 days to write and half a day every month to maintain.

That's the dev tax. Every hour maintaining a scraper is an hour not building what the data was supposed to inform. The information was always there, publicly. The cost was the access layer, not the data itself.

For vibe-coders, the whole stack was a wall. You can't vibe-code your way through proxy rotation and CAPTCHA logic. That combined complexity kept web data extraction a skill for a specific type of builder, and kept everyone else out.

The Python scraper era just hit its 'You Died' screen.

What "Vibe Scraping" Actually Means

The term didn't come from a marketing team.

In November 2025, a channel with 2,130 subscribers posted a video titled "VIBE WEB SCRAPING is VIBE CODING for scraping data from many websites using AI prompts." It pulled 363,000 views. Outlier score of 145.9x the channel's average.

The market named this before the articles existed.

Vibe coding gave builders the power to create apps without writing infrastructure. Vibe scraping does the same thing for data access. You describe what you want to extract. The AI orchestrates the calls. The infrastructure layer disappears from your workflow. Proxy config, HTML selectors, CAPTCHA logic: BrightData owns all of it.

The old stack had a filter built in: developers who could write and maintain the full access layer. Remove that filter and the set of people who can use web data as a competitive input goes from "devs and well-funded data teams" to "anyone with Claude Code and a clear intent." Different game entirely.

1 Line of Config. Just Ask.

The install takes less than a minute.

brightdata add mcp

1 CLI command. The BrightData CLI (updated June 11, 2026) integrates directly into Claude Code, Cursor, and Codex with zero manual configuration required. Restart Claude Code. You can now ask it to scrape anything.
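If you are curious what an MCP server entry looks like, here is a sketch of building a project-level `.mcp.json` by hand. Everything specific in it is an assumption rather than something this article verified: that the server ships as the npm package `@brightdata/mcp`, that it is launched with `npx`, and that it reads its key from an `API_TOKEN` environment variable. The CLI command above may write something different.

```python
import json


def mcp_config(api_token: str) -> dict:
    """Build an MCP server entry (package name and env var are assumptions)."""
    return {
        "mcpServers": {
            "brightdata": {
                "command": "npx",
                "args": ["-y", "@brightdata/mcp"],  # assumed package name
                "env": {"API_TOKEN": api_token},    # assumed variable name
            }
        }
    }


if __name__ == "__main__":
    # Print the JSON so it can be reviewed before saving it as .mcp.json.
    print(json.dumps(mcp_config("<your-api-token>"), indent=2))
```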

BrightData handles the rest: anti-bot evasion, CAPTCHA solving, proxy rotation across millions of IPs, and structured extraction across 40+ platforms including Amazon, LinkedIn, Instagram, TikTok, YouTube, Google Maps, Walmart, eBay, and Etsy.

From your side: describe what you want in plain English. Claude picks the right tool, makes the calls, returns structured data.

The free tier covers 5,000 requests per month. That's enough to run every use case in this article at least once and decide if this belongs in your workflow.

1 thing worth saying: I've written about why CLIs outperform MCPs for AI agents and I still think that argument holds in most cases. BrightData is 1 genuine exception. The MCP here isn't a convenience wrapper. It gives Claude structured access to 40+ extraction presets and real-time CAPTCHA handling that would take weeks to replicate with a CLI approach. The abstraction earns its place.

6 Things I Built. 1 Pattern.

These 6 use cases aren't a menu. They're connected by a thread: each one represents a type of intelligence that large companies used to pay teams to produce, now accessible to a solo builder in an afternoon.

Competitor content intelligence. My competitors post on LinkedIn, YouTube, and Twitter. Their posting cadence tells you what's resonating. Their video transcripts tell you their messaging. I have Claude Code scraping all of that daily, summarizing what's new, and dropping a digest in Slack. (Karen from Accounting asked why I always seem to know what the competition is up to before the weekly strategy meeting. I told her I just pay attention. This was not the whole truth.)

Kevin Badi at AI Operations documented a similar setup: monitor Twitter, TikTok, Instagram, YouTube, and LinkedIn, transcribe the videos, summarize, deliver by email or Slack. "Smaller AI agencies can now compete with and outperform larger enterprise companies," he noted. The math checks out.
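The delivery step is the only part that still looks like code. A minimal sketch, assuming a Slack incoming webhook (which accepts a JSON body with a `text` field) and a hypothetical `updates` dict that the scraping session has already produced:

```python
import json
import urllib.request
from typing import Dict, List


def build_digest(updates: Dict[str, List[str]]) -> str:
    """Turn {competitor: [summary, ...]} into one Slack-friendly message."""
    lines = ["Competitor digest"]
    for competitor in sorted(updates):
        items = updates[competitor]
        if not items:
            continue  # nothing new from this competitor today
        lines.append(f"*{competitor}*")
        lines.extend(f"- {item}" for item in items)
    return "\n".join(lines)


def post_to_slack(webhook_url: str, text: str) -> int:
    """POST the digest to a Slack incoming webhook and return the HTTP status."""
    body = json.dumps({"text": text}).encode("utf-8")
    request = urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

Keeping the formatting separate from the network call means the digest can be previewed locally before anything reaches the channel.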

CRM lead enrichment. A CSV of prospects goes in: names, companies, job titles. Claude Code adds emails, phone numbers, LinkedIn profiles, and recent activity signals, automatically, at scale. Outbound that used to require a dedicated data team now runs in a single Claude session.
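The shape of that job, CSV in and wider CSV out, can be sketched as follows. The `lookup` callable is a hypothetical placeholder for whatever actually finds the extra fields. In my setup that is Claude calling the MCP tools, not a Python function.

```python
import csv
import io
from typing import Callable, Dict

Lead = Dict[str, str]


def enrich_csv(src: str, lookup: Callable[[Lead], Lead]) -> str:
    """Read leads from CSV text, merge in looked-up fields, return CSV text."""
    reader = csv.DictReader(io.StringIO(src))
    rows = [{**lead, **lookup(lead)} for lead in reader]

    # Keep the original columns first, then append any new ones in order seen.
    fieldnames = list(reader.fieldnames or [])
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

Leads the lookup cannot enrich simply get empty cells in the new columns, so the file stays importable.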

Price tracking. BrightData has structured extractors for Amazon, Walmart, eBay, and Etsy. I describe the products I want to monitor and the alert condition. Claude sets up the extraction. When a competitor adjusts pricing on a category I care about, I know before the end of the day, without having opened a single product page manually.
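An "alert condition" here is nothing exotic. A minimal sketch, assuming two hypothetical price snapshots keyed by product ID and a percentage threshold I picked for illustration:

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PriceAlert:
    product: str
    old_price: float
    new_price: float
    change_pct: float


def price_alerts(
    previous: Dict[str, float],
    current: Dict[str, float],
    threshold_pct: float = 5.0,
) -> List[PriceAlert]:
    """Flag products whose price moved by at least threshold_pct in either direction."""
    alerts = []
    for product, new_price in current.items():
        old_price = previous.get(product)
        if not old_price:
            continue  # new listing or missing baseline: nothing to compare against
        change_pct = (new_price - old_price) / old_price * 100
        if abs(change_pct) >= threshold_pct:
            alerts.append(PriceAlert(product, old_price, new_price, round(change_pct, 2)))
    return alerts
```

The extraction produces the snapshots. The comparison is the part you describe in your prompt.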

(Quick digression unrelated to scraping: I spent 15 minutes this week checking whether my pool pump control panel generates anything scrapeable. It doesn't. The local admin page requires auth, there's no API, and the manufacturer never imagined someone would want to feed pump telemetry into Claude. I checked anyway. This is what happens when you get a tool that can do things: you immediately try to apply it to everything, including things with no business case.)

LLM brand monitoring. What does ChatGPT recommend when someone asks about your product category? What does Perplexity surface when your target customer searches for competitors? BrightData can extract those outputs in real time. The discipline is called Generative Engine Optimization (GEO) and it's roughly 18 months old. Nobody has solid monitoring tools for it yet.

I'll be honest: I'm not entirely sure how this evolves once the major LLMs change how they surface brands in generated responses. Worth watching closely, worth not betting the whole roadmap on.

Hiring signal analysis. Job postings are the best free strategic intelligence on the open web. A competitor opening a VP Sales role has probably just closed funding. One posting 10 data engineering positions is likely pivoting hard toward AI infrastructure. One closing all its customer success roles is either automating support or about to have a rough quarter.

BrightData extracts structured job posting data continuously. Claude reads the signals. What a competitive intelligence team takes weeks to compile, this setup surfaces in a morning.
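Most of the signal is in what changed between two snapshots. A minimal sketch, assuming each snapshot has been reduced to a set of role titles (a simplification, since real postings carry IDs, locations, and dates):

```python
from typing import Dict, List, Set


def diff_postings(previous: Set[str], current: Set[str]) -> Dict[str, List[str]]:
    """Compare two snapshots of a company's open roles."""
    return {
        "opened": sorted(current - previous),  # roles that appeared since last check
        "closed": sorted(previous - current),  # roles that disappeared
    }


if __name__ == "__main__":
    last_week = {"VP Sales", "Customer Success Manager"}
    this_week = {"VP Sales", "Data Engineer"}
    print(diff_postings(last_week, this_week))
```

Interpreting the diff, for example deciding whether a closed role was filled or cut, is still a judgment call that I leave to Claude and then sanity-check myself.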

Review mining. Every competitor in my market has hundreds of Amazon reviews, Trustpilot entries, and Google Maps ratings. In those reviews is the exact language customers use to describe what frustrates them, what they wish was different, what made them switch. That language belongs in my positioning, my landing page copy, my onboarding scripts. Claude extracts all reviews for a target, clusters recurring complaints by theme, and produces a positioning brief. 3 weeks of work for a marketing team. 20 minutes here.
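For intuition about the clustering step, here is a deliberately crude stand-in: keyword matching instead of an LLM. The theme names and keyword lists are made up for illustration, and substring matching will produce false positives ("late" also matches "plate"), which is part of why I hand this step to Claude instead.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical themes; in practice these emerge from the reviews themselves.
THEMES: Dict[str, List[str]] = {
    "shipping": ["late", "shipping", "delivery"],
    "durability": ["broke", "cracked", "stopped working"],
    "support": ["support", "refund", "no response"],
}


def cluster_reviews(reviews: List[str]) -> List[Tuple[str, List[str]]]:
    """Group reviews under every theme they mention, most frequent theme first."""
    buckets: Dict[str, List[str]] = defaultdict(list)
    for review in reviews:
        lowered = review.lower()
        for theme, keywords in THEMES.items():
            if any(keyword in lowered for keyword in keywords):
                buckets[theme].append(review)
    return sorted(buckets.items(), key=lambda item: len(item[1]), reverse=True)
```

The output keeps the customers' own wording under each theme, which is the part that ends up in the positioning brief.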

The pattern is always the same. The information was already public. The bottleneck was always access.

What It Can't Do (Yet)

Public data only. BrightData gives you access to the open web: product pages, social profiles, job listings, reviews, pricing data. Anything behind a login is out of scope. If you need data from authenticated sessions or private APIs, this doesn't help.

The free tier runs out faster than you'd expect. 5,000 requests per month sounds generous until you're running competitor monitoring across 10 profiles, 3 times a day, across 5 platforms. The math gets tight fast. Paid plans scale with volume and the pricing is reasonable for what it delivers, but factor it into your cost model before you build a workflow that depends on it.
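The arithmetic, under the optimistic assumption that each check of one profile on one platform costs exactly one request (real extractions may cost more):

```python
FREE_TIER_REQUESTS = 5_000


def monthly_requests(profiles: int, runs_per_day: int, platforms: int, days: int = 30) -> int:
    """Requests per month if every profile/platform check costs one request."""
    return profiles * runs_per_day * platforms * days


used = monthly_requests(profiles=10, runs_per_day=3, platforms=5)
print(used, FREE_TIER_REQUESTS - used)  # 4500 500
```

That leaves about 500 requests a month for everything else: retries, price tracking, review pulls, and the experiments you will inevitably run.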

The prompt quality ceiling is real. Vague request, vague output. The LLM equivalent of "undefined is not a function." "Scrape my competitor's posts" produces worse results than "extract the last 30 posts from this LinkedIn company page, include full post text, engagement count, and posting date, return as structured JSON." The infrastructure problem goes away. The thinking problem stays.
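A specific prompt also gives you something to check the answer against. A minimal sketch of validating what comes back, assuming the JSON is a list of objects with the hypothetical keys `text`, `engagement_count`, and `posted_on` (an ISO date). The real key names depend on what you asked for.

```python
import json
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Post:
    text: str
    engagement_count: int
    posted_on: date


def parse_posts(raw_json: str, limit: int = 30) -> List[Post]:
    """Parse and validate the structured output; fail loudly on surprises."""
    items = json.loads(raw_json)
    if not isinstance(items, list):
        raise ValueError("expected a JSON array of posts")
    if len(items) > limit:
        raise ValueError(f"expected at most {limit} posts, got {len(items)}")
    return [
        Post(
            text=str(item["text"]),
            engagement_count=int(item["engagement_count"]),
            posted_on=date.fromisoformat(item["posted_on"]),
        )
        for item in items
    ]
```

If a field is missing or a date is malformed, this raises instead of quietly passing bad data downstream.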

They Paid $80K for This Data

Enterprise proxy contracts for this kind of web access used to run $10,000 to $80,000 per year depending on volume and platform coverage. That's before staffing the team to use the data, build the pipelines, and maintain the extraction layer when sites changed.

The moat wasn't proprietary information. The public web was always public. The moat was the cost and complexity of access, which reserved serious data operations for companies with serious budgets.

That moat just changed hands.

What changed isn't the data sitting on those pages. Every price on Amazon, every job posting on LinkedIn, every review on Trustpilot was accessible yesterday and it's accessible today. What changed is who can read it at scale, without a team, without a six-figure contract, without writing a single line of Python.

I keep thinking about what this means for the solo builder going from a working demo to something they can actually ship: not the 20-engineer company that already has a data team, but the person who just got a product to work and needs real market intelligence before betting on a pricing strategy or positioning. They now have access to the same competitive data that funded startups were using to make those calls. The informational playing field just leveled, in real time. 🎯

If you're in that gap between working demo and shipped product, Vibe Coding, For Real covers the method I use to make that jump. The data access layer we've built here slots directly into the competitive research stage.

The web was always public. What changed is who can actually read it.

Sources

RTILA channel, YouTube, November 2025: "VIBE WEB SCRAPING is VIBE CODING for scraping data from many websites using AI prompts" (363,000 views on a 2,130-subscriber channel, outlier score 145.9x the channel's average)
Kevin Badi, AI Operations: Claude + BrightData MCP documentation (Competitive Intel Agent, CRM Lead Enrichment use cases)
BrightData official MCP documentation: free tier 5,000 req/month, anti-bot infrastructure, structured extraction presets
BrightData Skills README, GitHub brightdata/skills: platform coverage (Amazon, LinkedIn, Instagram, TikTok, YouTube, Google Maps, Walmart, eBay, Etsy, Home Depot)
BrightData CLI, GitHub (updated June 11, 2026): brightdata add mcp Claude Code integration

This post may contain affiliate links. If you click them, I might earn a small commission (it costs you nothing and helps me keep shipping quality articles every day for your reading pleasure).