DEV Community

Salim MHB

How I Built a "Vision-Based" Web Scraper in n8n (No CSS Selectors Needed)

The Problem: "Fragile" Scrapers 💥
If you have ever built a web scraper, you know the pain.

You spend hours inspecting elements, finding the right CSS selector (div.product-card > span.price), and building your logic. It runs perfectly for a week.

Then the website updates its UI. The class names change from .price to .p-4.text-bold, and your scraper breaks.

I got tired of this cycle. So, I decided to build a scraper that doesn't read code. It "sees" the page, just like a human does.

The Solution: Multimodal AI (Gemini 1.5 Pro) 👁️
With the rise of multimodal LLMs (models that accept images as input), we don't always need to parse HTML anymore. We can take a screenshot and ask the AI what it sees.

Here is how I built this workflow in n8n.

Step 1: The Stack 🛠️
n8n: For orchestration.

ScrapingBee (or Puppeteer): To render the page and take a screenshot.

Google Gemini 1.5 Pro: To analyze the image (in my experience, it's cheaper and often faster than GPT-4 Vision for this task).

Step 2: The Logic 🧠

  1. Render & Screenshot: First, don't fetch the HTML; fetch a binary image. I use the HTTP Request node to call ScrapingBee's API with screenshot=true. This returns a visual representation of the page.

  2. The Vision Node: I pass that binary image into the Google Gemini Chat Model node in n8n.

  3. The Prompt (The Secret Sauce): This is where the magic happens. You need to be very specific to get clean JSON.

My Prompt: "Analyze this image of an e-commerce product page. Extract the Product Title, Price, and Availability status. Return the data ONLY as a valid JSON object. Do not include markdown formatting or backticks."
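For readers who want to see roughly what the HTTP Request and Gemini nodes do under the hood, here is a minimal Python sketch of steps 1 to 3 outside n8n. It assumes ScrapingBee's https://app.scrapingbee.com/api/v1/ endpoint with api_key, url and screenshot query parameters, and Gemini's REST generateContent method with the screenshot sent as a base64 inline_data part. The function names (screenshot_request_url, gemini_payload, vision_scrape) are hypothetical, and the endpoints, the PNG MIME type, and the model name should all be checked against each service's current documentation.

```python
import base64
import json
import urllib.request
from urllib.parse import urlencode

# Assumed endpoints (verify against current docs): ScrapingBee's HTML API and
# Gemini's REST generateContent method. The model name is the one used in this post.
SCRAPINGBEE_ENDPOINT = "https://app.scrapingbee.com/api/v1/"
GEMINI_ENDPOINT = (
    "https://generativelanguage.googleapis.com/v1beta/models/"
    "gemini-1.5-pro:generateContent"
)

PROMPT = (
    "Analyze this image of an e-commerce product page. Extract the Product Title, "
    "Price, and Availability status. Return the data ONLY as a valid JSON object. "
    "Do not include markdown formatting or backticks."
)


def screenshot_request_url(api_key: str, target_url: str) -> str:
    """Step 1: build the GET URL asking ScrapingBee to render target_url as a screenshot."""
    params = {"api_key": api_key, "url": target_url, "screenshot": "true"}
    return SCRAPINGBEE_ENDPOINT + "?" + urlencode(params)


def gemini_payload(image_bytes: bytes, prompt: str = PROMPT) -> dict:
    """Steps 2-3: request body holding the prompt plus the screenshot inlined as base64."""
    return {
        "contents": [
            {
                "parts": [
                    {"text": prompt},
                    {
                        "inline_data": {
                            # Assumption: the screenshot comes back as a PNG.
                            "mime_type": "image/png",
                            "data": base64.b64encode(image_bytes).decode("ascii"),
                        }
                    },
                ]
            }
        ]
    }


def vision_scrape(scrapingbee_key: str, gemini_key: str, target_url: str) -> str:
    """Run both HTTP calls and return the model's raw text reply (needs network access)."""
    shot_url = screenshot_request_url(scrapingbee_key, target_url)
    with urllib.request.urlopen(shot_url, timeout=120) as resp:
        image_bytes = resp.read()
    request = urllib.request.Request(
        GEMINI_ENDPOINT,
        data=json.dumps(gemini_payload(image_bytes)).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-goog-api-key": gemini_key},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=120) as resp:
        reply = json.load(resp)
    # The generated text sits in the first candidate's first content part.
    return reply["candidates"][0]["content"]["parts"][0]["text"]
```

Only vision_scrape touches the network; the two builder functions are pure, which makes it easy to inspect exactly what gets sent before spending any API credits.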

  4. The Output: The AI looks at the pixels, not the code. Even if the website obfuscates its HTML class names, the AI still sees "$19.99" in big bold text.

It returns:

```json
{
  "title": "n8n AI Mastery Pack",
  "price": "$19.99",
  "availability": "In Stock"
}
```
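The prompt asks for bare JSON, but models occasionally wrap their reply in a markdown code fence anyway, so it can pay to parse the reply defensively before it reaches the rest of the workflow. Below is a minimal Python sketch, assuming the reply should be a single JSON object with the three keys shown above. parse_product and price_to_float are hypothetical helper names, and the price normalizer assumes a period as the decimal separator; the same logic could plausibly live in an n8n Code node.

```python
import json
import re

REQUIRED_KEYS = ("title", "price", "availability")

# Matches an optional ```json ... ``` wrapper that models sometimes add despite the prompt.
_FENCE = re.compile(r"^\s*```(?:json)?\s*(.*?)\s*```\s*$", re.DOTALL)


def parse_product(raw: str) -> dict:
    """Turn the model's text reply into a product dict.

    Raises ValueError if the reply is not a JSON object with the expected keys.
    """
    match = _FENCE.match(raw)
    text = match.group(1) if match else raw.strip()
    data = json.loads(text)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data


def price_to_float(price: str) -> float:
    """Hypothetical normalizer: '$19.99' -> 19.99. Drops currency symbols and
    thousands separators; assumes '.' is the decimal separator."""
    return float(re.sub(r"[^\d.]", "", price))
```

Raising on a malformed reply, rather than passing it through, lets the workflow route that item to an error branch or a retry instead of writing garbage downstream.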
Why this changes everything 🚀
Zero Maintenance: The website can change its entire underlying code. As long as the visual design remains similar, your scraper keeps working.

Bypasses Obfuscation: Some sites scramble their HTML to stop scrapers. Vision AI doesn't care.

Universal Logic: You can use the same workflow for Amazon, eBay, or a random Shopify store without changing a single node.

The Trade-off ⚖️
It is slower and slightly more expensive (API costs) than standard HTML parsing. My advice: use a hybrid approach. Try standard scraping first; if it fails, trigger the Vision Agent as a fallback.
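One way to wire up that fallback, sketched in Python: try a fixed class-name extractor first and only call the vision path when a field is missing. The class names (product-title, price, stock), the helper names, and the tiny html.parser-based extractor are all hypothetical stand-ins for whatever selector logic you already have; vision_extract stands for the screenshot-plus-Gemini call.

```python
from html.parser import HTMLParser
from typing import Callable, Optional


class FirstTextAfterClass(HTMLParser):
    """Deliberately simple: grabs the first non-empty text after a tag carrying target_class."""

    def __init__(self, target_class: str):
        super().__init__()
        self.target_class = target_class
        self.armed = False
        self.found: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.found is None and self.target_class in classes:
            self.armed = True

    def handle_data(self, data):
        if self.armed and self.found is None and data.strip():
            self.found = data.strip()
            self.armed = False


def selector_extract(html: str) -> Optional[dict]:
    """The 'standard' scraper: fixed class names, returns None if any field is missing."""
    selectors = {"title": "product-title", "price": "price", "availability": "stock"}
    result = {}
    for key, css_class in selectors.items():
        finder = FirstTextAfterClass(css_class)
        finder.feed(html)
        finder.close()
        if finder.found is None:
            return None
        result[key] = finder.found
    return result


def hybrid_scrape(html: str, vision_extract: Callable[[], dict]) -> tuple[str, dict]:
    """Try the cheap selector path first; call the slower, paid vision path only on failure."""
    data = selector_extract(html)
    if data is not None:
        return "selectors", data
    return "vision", vision_extract()
```

Passing vision_extract as a zero-argument callable means the screenshot and Gemini calls are never made when the selectors succeed. In n8n, the same decision could be expressed with an IF node that checks the HTML extraction's output and routes failures to the vision branch.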

Want the Workflow? 📦
I spent a lot of time refining the prompts and error handling for this Vision Agent, and I bundled it into a pack of 4 Production-Ready n8n Agents (including a Long-Term Memory Bot and an Auto-Reporter).

If you want to skip the build time and just import the JSON, you can grab the pack here:

👉 Download the n8n AI Mastery Pack

(Save 10+ hours of development time. It includes the exact Vision Logic I described above.)
