<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Salim MHB</title>
    <description>The latest articles on DEV Community by Salim MHB (@salim_mhb).</description>
    <link>https://dev.to/salim_mhb</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3638737%2F4e1146ad-0596-4524-ae1f-56e45388f1aa.jpg</url>
      <title>DEV Community: Salim MHB</title>
      <link>https://dev.to/salim_mhb</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/salim_mhb"/>
    <language>en</language>
    <item>
      <title>How I Built a "Vision-Based" Web Scraper in n8n (No CSS Selectors Needed)</title>
      <dc:creator>Salim MHB</dc:creator>
      <pubDate>Tue, 02 Dec 2025 07:13:27 +0000</pubDate>
      <link>https://dev.to/salim_mhb/how-i-built-a-vision-based-web-scraper-in-n8n-no-css-selectors-needed-3c40</link>
      <guid>https://dev.to/salim_mhb/how-i-built-a-vision-based-web-scraper-in-n8n-no-css-selectors-needed-3c40</guid>
      <description>&lt;p&gt;The Problem: "Fragile" Scrapers 💥&lt;br&gt;
If you have ever built a web scraper, you know the pain.&lt;/p&gt;

&lt;p&gt;You spend hours inspecting elements, finding the right CSS selector (div.product-card &amp;gt; span.price), and building your logic. It runs perfectly for a week.&lt;/p&gt;

&lt;p&gt;Then, the website updates its UI. The class names change from .price to .p-4.text-bold. Your scraper breaks.&lt;/p&gt;

&lt;p&gt;I got tired of this cycle. So, I decided to build a scraper that doesn't read code. It "sees" the page, just like a human does.&lt;/p&gt;

&lt;p&gt;The Solution: Multimodal AI (Gemini 1.5 Pro) 👁️&lt;br&gt;
With the rise of Multimodal LLMs (models that accept images as input), we don't need to parse HTML anymore. We can just take a screenshot and ask the AI what it sees.&lt;/p&gt;

&lt;p&gt;Here is how I built this workflow in n8n.&lt;/p&gt;

&lt;p&gt;Step 1: The Stack 🛠️&lt;br&gt;
n8n: For orchestration.&lt;/p&gt;

&lt;p&gt;ScrapingBee (or Puppeteer): To render the page and take a screenshot.&lt;/p&gt;

&lt;p&gt;Google Gemini 1.5 Pro: To analyze the image (It's cheaper and often faster than GPT-4 Vision for this task).&lt;/p&gt;

&lt;p&gt;Step 2: The Logic 🧠&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Render &amp;amp; Screenshot: Don't fetch the HTML; fetch a binary image. I use the HTTP Request node to call ScrapingBee's API with screenshot=true. This returns the visual representation of the website.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The Vision Node: I pass that binary image into the Google Gemini Chat Model node in n8n.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The Prompt (The Secret Sauce): This is where the magic happens. You need to be very specific to get clean JSON.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
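
&lt;p&gt;Outside n8n, Step 1 can be sketched in a few lines of Python. This is a minimal sketch assuming a ScrapingBee API key; the endpoint and the screenshot=true parameter follow ScrapingBee's public API, and everything else (the helper name, the placeholder URL) is illustrative.&lt;/p&gt;

```python
# Sketch of Step 1 outside n8n: build the ScrapingBee request that
# returns a PNG screenshot instead of the page's HTML.
from urllib.parse import urlencode

SCRAPINGBEE_ENDPOINT = "https://app.scrapingbee.com/api/v1/"

def screenshot_request_url(api_key: str, target_url: str) -> str:
    """Build the GET URL that asks for the rendered image, not the DOM."""
    params = {
        "api_key": api_key,
        "url": target_url,
        "screenshot": "true",  # the flag that switches output to an image
    }
    return SCRAPINGBEE_ENDPOINT + "?" + urlencode(params)

# With the requests library you would then do, for example:
# png_bytes = requests.get(screenshot_request_url(KEY, "https://example.com/product")).content
```

&lt;p&gt;In n8n itself, the same call lives in the HTTP Request node with the response format set to binary/file, so the screenshot flows downstream as binary data.&lt;/p&gt;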

&lt;p&gt;My Prompt: "Analyze this image of an e-commerce product page. Extract the Product Title, Price, and Availability status. Return the data ONLY as a valid JSON object. Do not include markdown formatting or backticks."&lt;/p&gt;
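
&lt;p&gt;Even with the "no markdown" instruction, vision models occasionally wrap their answer in code fences anyway. A small defensive parser, the kind of thing you would drop into an n8n Code node, keeps the workflow from crashing on that cosmetic wrapper. This is a sketch, not the pack's exact node.&lt;/p&gt;

```python
import json

FENCE = "`" * 3  # a triple backtick, built up to keep this snippet clean

def parse_model_json(raw: str) -> dict:
    """Strip an optional markdown code fence, then decode the JSON body."""
    text = raw.strip()
    if text.startswith(FENCE):
        lines = text.splitlines()[1:]       # drop the opening fence line
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]              # drop the closing fence line
        text = "\n".join(lines)
    return json.loads(text)
```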

&lt;ol start="4"&gt;
&lt;li&gt;The Output: The AI looks at the pixels, not the code. Even if the website obfuscates its HTML classes, the AI still sees "$19.99" in big bold text.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;It returns:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;{
  "title": "n8n AI Mastery Pack",
  "price": "$19.99",
  "availability": "In Stock"
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Why This Changes Everything 🚀&lt;br&gt;
Zero Maintenance: The website can change its entire underlying code. As long as the visual design remains similar, your scraper keeps working.&lt;/p&gt;

&lt;p&gt;Bypasses Obfuscation: Some sites scramble their HTML to stop scrapers. Vision AI doesn't care.&lt;/p&gt;

&lt;p&gt;Universal Logic: You can use the same workflow for Amazon, eBay, or a random Shopify store without changing a single node.&lt;/p&gt;

&lt;p&gt;The Trade-off ⚖️&lt;br&gt;
It is slower and slightly more expensive (API costs) than standard HTML parsing. My advice: Use a "Hybrid" approach. Try standard scraping first; if it fails, trigger the Vision Agent as a fallback.&lt;/p&gt;

&lt;p&gt;Want the Workflow? 📦&lt;br&gt;
I spent a lot of time refining the prompts and error handling for this Vision Agent, and I bundled it into a pack of 4 Production-Ready n8n Agents (including a Long-Term Memory Bot and an Auto-Reporter).&lt;/p&gt;

&lt;p&gt;If you want to skip the build time and just import the JSON, you can grab the pack here:&lt;/p&gt;

&lt;p&gt;👉 &lt;a href="https://salim4mhb.gumroad.com/l/n8n-mastery-pack" rel="noopener noreferrer"&gt;Download the n8n AI Mastery Pack&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;(Save 10+ hours of development time. It includes the exact Vision Logic I described above.)&lt;/p&gt;

</description>
      <category>automation</category>
      <category>ai</category>
      <category>webscraping</category>
      <category>n8nbrightdatachallenge</category>
    </item>
    <item>
      <title>Stop Building Basic Bots: How I Built 4 "Production-Ready" AI Agents in n8n (Vision, Memory, &amp; Reporting)</title>
      <dc:creator>Salim MHB</dc:creator>
      <pubDate>Mon, 01 Dec 2025 09:43:05 +0000</pubDate>
      <link>https://dev.to/salim_mhb/stop-building-basic-bots-how-i-built-4-production-ready-ai-agents-in-n8n-vision-memory--4e20</link>
      <guid>https://dev.to/salim_mhb/stop-building-basic-bots-how-i-built-4-production-ready-ai-agents-in-n8n-vision-memory--4e20</guid>
      <description>&lt;p&gt;We all love building workflows in n8n. But let’s be honest: there is a huge gap between a simple "Hello World" chatbot and a robust, production-ready AI Agent that can handle real-world complexity.&lt;/p&gt;

&lt;p&gt;I spent the last few weeks pushing n8n to its limits to solve four specific headaches I faced in automation: Memory, Dynamic Scraping, Content Analysis, and Reporting.&lt;/p&gt;

&lt;p&gt;Here is a breakdown of the 4 advanced agents I built, the tech stack I used, and how they solve problems standard workflows can't.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The "Amnesia" Problem (Long-Term Memory Agent)
The Problem: Most LLM chains in n8n forget the user's context as soon as the execution ends. The Solution: I built an agent that mimics human memory.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;How it works: Instead of relying solely on window memory, this workflow connects to Google Docs.&lt;/p&gt;

&lt;p&gt;The Logic: The AI analyzes the user's input. If it detects personal details or preferences, it "saves" them to a specific doc (Long-Term Memory). If it detects a request, it saves it as a "Note."&lt;/p&gt;

&lt;p&gt;Result: A bot that actually remembers who you are weeks later.&lt;/p&gt;
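
&lt;p&gt;The routing logic can be sketched like this. In the real workflow an LLM node does the classification; this keyword stub only illustrates the two-destination routing, and the hint lists and destination names are made up for the example.&lt;/p&gt;

```python
# Toy router for the memory agent: decide where an incoming message
# should be persisted. A real workflow would replace the keyword
# matching with an LLM classification step.
MEMORY_HINTS = ("my name is", "i prefer", "i like", "i live in")
REQUEST_HINTS = ("please", "can you", "remind me", "todo")

def route_message(text: str) -> str:
    lowered = text.lower()
    if any(hint in lowered for hint in MEMORY_HINTS):
        return "long_term_memory"  # append to the doc of personal facts
    if any(hint in lowered for hint in REQUEST_HINTS):
        return "notes"             # append to the Notes doc
    return "chat_only"             # nothing worth persisting
```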

&lt;ol start="2"&gt;
&lt;li&gt;The "Vision" Scraper (Scraping Without Selectors)&lt;br&gt;
The Problem: Traditional scraping relies on CSS selectors; if the website updates its UI, your scraper breaks.&lt;br&gt;
The Solution: An agent that "sees" instead of reading code.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The Stack: ScrapingBee (for rendering) + Google Gemini Vision.&lt;/p&gt;

&lt;p&gt;How it works: The workflow takes a screenshot of the webpage. Then, it passes that image to Gemini 1.5 Pro with a prompt to extract structured JSON data (Prices, Titles, etc.).&lt;/p&gt;

&lt;p&gt;Why it helps: It’s virtually unbreakable because it doesn't care about div or class names.&lt;/p&gt;

&lt;ol start="3"&gt;
&lt;li&gt;The YouTube Analyst&lt;br&gt;
The Problem: I needed to extract insights from technical videos without watching them for 40 minutes.&lt;br&gt;
The Solution: An automated Summarizer &amp;amp; Analyst.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Flow: YouTube API (Get URL) -&amp;gt; Extract Transcript -&amp;gt; OpenAI (Analyze) -&amp;gt; Telegram.&lt;/p&gt;

&lt;p&gt;Key Feature: It doesn't just summarize; it breaks down definitions, characteristics, and actionable steps into a structured report delivered to chat.&lt;/p&gt;
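
&lt;p&gt;One practical detail in that flow: long transcripts can blow past a model's context window, so the transcript usually gets split before the analysis step. A minimal word-bounded chunker might look like this; the 2000-word default is an illustrative number, not the pack's setting.&lt;/p&gt;

```python
# Split a transcript into word-bounded chunks so each piece fits
# comfortably inside the model's context window.
def chunk_transcript(text: str, max_words: int = 2000) -> list:
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]
```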

&lt;ol start="4"&gt;
&lt;li&gt;The Auto-Reporter&lt;br&gt;
The Problem: Manually updating spreadsheets with community stats (GitHub, etc.) is tedious.&lt;br&gt;
The Solution: A fully automated reporting agent.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Flow: Scrapes Data -&amp;gt; Aggregates Stats -&amp;gt; Generates Markdown Report -&amp;gt; Saves to Drive/Emails stakeholders.&lt;/p&gt;
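
&lt;p&gt;The "Generates Markdown Report" step boils down to turning a dict of aggregated stats into a Markdown table. A sketch, with field names that are illustrative rather than the pack's schema:&lt;/p&gt;

```python
# Turn aggregated stats into the Markdown report the workflow
# saves to Drive or emails to stakeholders.
def build_report(stats: dict, title: str = "Community Report") -> str:
    lines = ["# " + title, "", "| Metric | Value |", "| --- | --- |"]
    for metric, value in stats.items():
        lines.append("| " + metric + " | " + str(value) + " |")
    return "\n".join(lines)
```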

&lt;p&gt;Why I Bundled These&lt;br&gt;
Building these from scratch involved a lot of trial and error, specifically figuring out the prompt engineering for the Vision model and the logic routing for the Memory agent.&lt;/p&gt;

&lt;p&gt;If you want to build these yourself, I highly recommend exploring Gemini's Vision capabilities in n8n—it's a game changer for scraping.&lt;/p&gt;

&lt;p&gt;However, if you want to skip the debugging phase...&lt;/p&gt;

&lt;p&gt;I’ve packaged all 4 of these workflows into a "Mastery Pack." They are cleaned up, annotated, and ready to import.&lt;/p&gt;

&lt;p&gt;💡 Think about it: A developer's hour is valuable. You can build this yourself, or you can save 10+ hours of development time and grab the JSON files instantly.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7k3564rg8gn2la5dww7m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7k3564rg8gn2la5dww7m.png" alt=" " width="800" height="285"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;👉 &lt;a href="https://salim4mhb.gumroad.com/l/n8n-mastery-pack" rel="noopener noreferrer"&gt;Get the n8n AI Mastery Pack here&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let me know in the comments if you have questions about the Vision Scraping logic!&lt;/p&gt;

</description>
      <category>n8n</category>
      <category>automation</category>
      <category>ai</category>
      <category>webscraping</category>
    </item>
  </channel>
</rss>
