<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Salim MHB</title>
    <description>The latest articles on DEV Community by Salim MHB (@salim_mhb).</description>
    <link>https://dev.to/salim_mhb</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3638737%2F4e1146ad-0596-4524-ae1f-56e45388f1aa.jpg</url>
      <title>DEV Community: Salim MHB</title>
      <link>https://dev.to/salim_mhb</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/salim_mhb"/>
    <language>en</language>
    <item>
      <title>How I Built a "Vision-Based" Web Scraper in n8n (No CSS Selectors Needed)</title>
      <dc:creator>Salim MHB</dc:creator>
      <pubDate>Tue, 02 Dec 2025 07:13:27 +0000</pubDate>
      <link>https://dev.to/salim_mhb/how-i-built-a-vision-based-web-scraper-in-n8n-no-css-selectors-needed-3c40</link>
      <guid>https://dev.to/salim_mhb/how-i-built-a-vision-based-web-scraper-in-n8n-no-css-selectors-needed-3c40</guid>
      <description>&lt;p&gt;The Problem: "Fragile" Scrapers 💥&lt;br&gt;
If you have ever built a web scraper, you know the pain.&lt;/p&gt;

&lt;p&gt;You spend hours inspecting elements, finding the right CSS selector (div.product-card &amp;gt; span.price), and building your logic. It runs perfectly for a week.&lt;/p&gt;

&lt;p&gt;Then, the website updates its UI. The class names change from .price to .p-4.text-bold. Your scraper breaks.&lt;/p&gt;

&lt;p&gt;I got tired of this cycle. So, I decided to build a scraper that doesn't read code. It "sees" the page, just like a human does.&lt;/p&gt;

&lt;p&gt;The Solution: Multimodal AI (Gemini 1.5 Pro) 👁️&lt;br&gt;
With the rise of Multimodal LLMs (models that accept images as input), we don't need to parse HTML anymore. We can just take a screenshot and ask the AI what it sees.&lt;/p&gt;

&lt;p&gt;Here is how I built this workflow in n8n.&lt;/p&gt;

&lt;p&gt;Step 1: The Stack 🛠️&lt;br&gt;
n8n: For orchestration.&lt;/p&gt;

&lt;p&gt;ScrapingBee (or Puppeteer): To render the page and take a screenshot.&lt;/p&gt;

&lt;p&gt;Google Gemini 1.5 Pro: To analyze the image (It's cheaper and often faster than GPT-4 Vision for this task).&lt;/p&gt;

&lt;p&gt;Step 2: The Logic 🧠&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Render &amp;amp; Screenshot: Don't fetch the HTML; fetch a binary image. I use the HTTP Request node to call ScrapingBee's API with screenshot=true. This returns the visual representation of the website.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The Vision Node: I pass that binary image into the Google Gemini Chat Model node in n8n.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The Prompt (The Secret Sauce): This is where the magic happens. You need to be very specific to get clean JSON.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
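
&lt;p&gt;Outside n8n, Step 1 can be sketched in a few lines of Python. This is a minimal sketch assuming a ScrapingBee API key; the endpoint and the screenshot=true parameter follow ScrapingBee's public API, and everything else (the helper name, the placeholder URL) is illustrative.&lt;/p&gt;

```python
# Sketch of Step 1 outside n8n: build the ScrapingBee request that
# returns a PNG screenshot instead of the page's HTML.
from urllib.parse import urlencode

SCRAPINGBEE_ENDPOINT = "https://app.scrapingbee.com/api/v1/"

def screenshot_request_url(api_key: str, target_url: str) -> str:
    """Build the GET URL that asks for the rendered image, not the DOM."""
    params = {
        "api_key": api_key,
        "url": target_url,
        "screenshot": "true",  # the flag that switches output to an image
    }
    return SCRAPINGBEE_ENDPOINT + "?" + urlencode(params)

# With the requests library you would then do, for example:
# png_bytes = requests.get(screenshot_request_url(KEY, "https://example.com/product")).content
```

&lt;p&gt;In n8n itself, the same call lives in the HTTP Request node with the response format set to binary/file, so the screenshot flows downstream as binary data.&lt;/p&gt;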

&lt;p&gt;My Prompt: "Analyze this image of an e-commerce product page. Extract the Product Title, Price, and Availability status. Return the data ONLY as a valid JSON object. Do not include markdown formatting or backticks."&lt;/p&gt;
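
&lt;p&gt;Even with the "no markdown" instruction, vision models occasionally wrap their answer in code fences anyway. A small defensive parser, the kind of thing you would drop into an n8n Code node, keeps the workflow from crashing on that cosmetic wrapper. This is a sketch, not the pack's exact node.&lt;/p&gt;

```python
import json

FENCE = "`" * 3  # a triple backtick, built up to keep this snippet clean

def parse_model_json(raw: str) -> dict:
    """Strip an optional markdown code fence, then decode the JSON body."""
    text = raw.strip()
    if text.startswith(FENCE):
        lines = text.splitlines()[1:]       # drop the opening fence line
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]              # drop the closing fence line
        text = "\n".join(lines)
    return json.loads(text)
```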

&lt;ol start="4"&gt;
&lt;li&gt;The Output: The AI looks at the pixels, not the code. Even if the website obfuscates its HTML classes, the AI still sees "$19.99" in big bold text.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;It returns:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;{
  "title": "n8n AI Mastery Pack",
  "price": "$19.99",
  "availability": "In Stock"
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Why This Changes Everything 🚀&lt;br&gt;
Zero Maintenance: The website can change its entire underlying code. As long as the visual design remains similar, your scraper keeps working.&lt;/p&gt;

&lt;p&gt;Bypasses Obfuscation: Some sites scramble their HTML to stop scrapers. Vision AI doesn't care.&lt;/p&gt;

&lt;p&gt;Universal Logic: You can use the same workflow for Amazon, eBay, or a random Shopify store without changing a single node.&lt;/p&gt;

&lt;p&gt;The Trade-off ⚖️&lt;br&gt;
It is slower and slightly more expensive (API costs) than standard HTML parsing. My advice: Use a "Hybrid" approach. Try standard scraping first; if it fails, trigger the Vision Agent as a fallback.&lt;/p&gt;

&lt;p&gt;Want the Workflow? 📦&lt;br&gt;
I spent a lot of time refining the prompts and error handling for this Vision Agent, and I bundled it into a pack of 4 Production-Ready n8n Agents (including a Long-Term Memory Bot and an Auto-Reporter).&lt;/p&gt;

&lt;p&gt;If you want to skip the build time and just import the JSON, you can grab the pack here:&lt;/p&gt;

&lt;p&gt;👉 &lt;a href="https://salim4mhb.gumroad.com/l/n8n-mastery-pack" rel="noopener noreferrer"&gt;Download the n8n AI Mastery Pack&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;(Save 10+ hours of development time. It includes the exact Vision Logic I described above.)&lt;/p&gt;

</description>
      <category>automation</category>
      <category>ai</category>
      <category>webscraping</category>
      <category>n8nbrightdatachallenge</category>
    </item>
    <item>
      <title>Stop Building Basic Bots: How I Built 4 "Production-Ready" AI Agents in n8n (Vision, Memory, &amp; Reporting)</title>
      <dc:creator>Salim MHB</dc:creator>
      <pubDate>Mon, 01 Dec 2025 09:43:05 +0000</pubDate>
      <link>https://dev.to/salim_mhb/stop-building-basic-bots-how-i-built-4-production-ready-ai-agents-in-n8n-vision-memory--4e20</link>
      <guid>https://dev.to/salim_mhb/stop-building-basic-bots-how-i-built-4-production-ready-ai-agents-in-n8n-vision-memory--4e20</guid>
      <description>&lt;p&gt;We all love building workflows in n8n. But let’s be honest: there is a huge gap between a simple "Hello World" chatbot and a robust, production-ready AI Agent that can handle real-world complexity.&lt;/p&gt;

&lt;p&gt;I spent the last few weeks pushing n8n to its limits to solve four specific headaches I faced in automation: Memory, Dynamic Scraping, Content Analysis, and Reporting.&lt;/p&gt;

&lt;p&gt;Here is a breakdown of the 4 advanced agents I built, the tech stack I used, and how they solve problems standard workflows can't.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The "Amnesia" Problem (Long-Term Memory Agent)
The Problem: Most LLM chains in n8n forget the user's context as soon as the execution ends. The Solution: I built an agent that mimics human memory.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;How it works: Instead of relying solely on window memory, this workflow connects to Google Docs.&lt;/p&gt;

&lt;p&gt;The Logic: The AI analyzes the user's input. If it detects personal details or preferences, it "saves" them to a specific doc (Long-Term Memory). If it detects a request, it saves it as a "Note."&lt;/p&gt;

&lt;p&gt;Result: A bot that actually remembers who you are weeks later.&lt;/p&gt;
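
&lt;p&gt;The routing logic can be sketched like this. In the real workflow an LLM node does the classification; this keyword stub only illustrates the two-destination routing, and the hint lists and destination names are made up for the example.&lt;/p&gt;

```python
# Toy router for the memory agent: decide where an incoming message
# should be persisted. A real workflow would replace the keyword
# matching with an LLM classification step.
MEMORY_HINTS = ("my name is", "i prefer", "i like", "i live in")
REQUEST_HINTS = ("please", "can you", "remind me", "todo")

def route_message(text: str) -> str:
    lowered = text.lower()
    if any(hint in lowered for hint in MEMORY_HINTS):
        return "long_term_memory"  # append to the doc of personal facts
    if any(hint in lowered for hint in REQUEST_HINTS):
        return "notes"             # append to the Notes doc
    return "chat_only"             # nothing worth persisting
```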

&lt;ol start="2"&gt;
&lt;li&gt;The "Vision" Scraper (Scraping Without Selectors)&lt;br&gt;
The Problem: Traditional scraping relies on CSS selectors; if the website updates its UI, your scraper breaks.&lt;br&gt;
The Solution: An agent that "sees" instead of reading code.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The Stack: ScrapingBee (for rendering) + Google Gemini Vision.&lt;/p&gt;

&lt;p&gt;How it works: The workflow takes a screenshot of the webpage. Then, it passes that image to Gemini 1.5 Pro with a prompt to extract structured JSON data (Prices, Titles, etc.).&lt;/p&gt;

&lt;p&gt;Why it helps: It’s virtually unbreakable because it doesn't care about div or class names.&lt;/p&gt;

&lt;ol start="3"&gt;
&lt;li&gt;The YouTube Analyst&lt;br&gt;
The Problem: I needed to extract insights from technical videos without watching them for 40 minutes.&lt;br&gt;
The Solution: An automated Summarizer &amp;amp; Analyst.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Flow: YouTube API (Get URL) -&amp;gt; Extract Transcript -&amp;gt; OpenAI (Analyze) -&amp;gt; Telegram.&lt;/p&gt;

&lt;p&gt;Key Feature: It doesn't just summarize; it breaks down definitions, characteristics, and actionable steps into a structured report delivered to chat.&lt;/p&gt;
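
&lt;p&gt;One practical detail in that flow: long transcripts can blow past a model's context window, so the transcript usually gets split before the analysis step. A minimal word-bounded chunker might look like this; the 2000-word default is an illustrative number, not the pack's setting.&lt;/p&gt;

```python
# Split a transcript into word-bounded chunks so each piece fits
# comfortably inside the model's context window.
def chunk_transcript(text: str, max_words: int = 2000) -> list:
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]
```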

&lt;ol start="4"&gt;
&lt;li&gt;The Auto-Reporter&lt;br&gt;
The Problem: Manually updating spreadsheets with community stats (GitHub, etc.) is tedious.&lt;br&gt;
The Solution: A fully automated reporting agent.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Flow: Scrapes Data -&amp;gt; Aggregates Stats -&amp;gt; Generates Markdown Report -&amp;gt; Saves to Drive/Emails stakeholders.&lt;/p&gt;
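
&lt;p&gt;The "Generates Markdown Report" step boils down to turning a dict of aggregated stats into a Markdown table. A sketch, with field names that are illustrative rather than the pack's schema:&lt;/p&gt;

```python
# Turn aggregated stats into the Markdown report the workflow
# saves to Drive or emails to stakeholders.
def build_report(stats: dict, title: str = "Community Report") -> str:
    lines = ["# " + title, "", "| Metric | Value |", "| --- | --- |"]
    for metric, value in stats.items():
        lines.append("| " + metric + " | " + str(value) + " |")
    return "\n".join(lines)
```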

&lt;p&gt;Why I Bundled These&lt;br&gt;
Building these from scratch involved a lot of trial and error, specifically figuring out the prompt engineering for the Vision model and the logic routing for the Memory agent.&lt;/p&gt;

&lt;p&gt;If you want to build these yourself, I highly recommend exploring Gemini's Vision capabilities in n8n—it's a game changer for scraping.&lt;/p&gt;

&lt;p&gt;However, if you want to skip the debugging phase...&lt;/p&gt;

&lt;p&gt;I’ve packaged all 4 of these workflows into a "Mastery Pack." They are cleaned up, annotated, and ready to import.&lt;/p&gt;

&lt;p&gt;💡 Think about it: A developer's hour is valuable. You can build this yourself, or you can save 10+ hours of development time and grab the JSON files instantly.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7k3564rg8gn2la5dww7m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7k3564rg8gn2la5dww7m.png" alt=" " width="800" height="285"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;👉 &lt;a href="https://salim4mhb.gumroad.com/l/n8n-mastery-pack" rel="noopener noreferrer"&gt;Get the n8n AI Mastery Pack here&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let me know in the comments if you have questions about the Vision Scraping logic!&lt;/p&gt;

</description>
      <category>n8n</category>
      <category>automation</category>
      <category>ai</category>
      <category>webscraping</category>
    </item>
  </channel>
</rss>
