Building an AI‑Powered Web Scraper in n8n (HTTP, HTML, JS, OpenAI)

This workflow is a clean, end‑to‑end example of how to pull content from the web, process it with JavaScript, and then hand it off to an AI model – all from a single visual canvas.


What this workflow does

At a high level, the workflow:

  • Starts when you manually click Execute workflow.
  • Sends an HTTP Request to a target URL.
  • Parses the response using an HTML node to extract just the part of the page you care about.
  • Runs a Code in JavaScript step to clean or reshape that content.
  • Feeds the processed text into Message a model, which generates a final, human‑readable summary or analysis.

It is ideal for turning any web page into something more readable: a short summary, a bullet‑point brief for teammates, or even a draft blog post.


Step‑by‑step through each node

1. Manual trigger


The workflow begins with a When clicking ‘Execute workflow’ node.

  • This keeps things simple for a portfolio demo: there is no cron, no webhooks, just a clear “Run” button.
  • It is great for exploratory analysis or for showcasing the pipeline in a live walkthrough, because you control exactly when the flow starts.

2. HTTP Request: fetching the page

Next, the flow moves into HTTP Request (GET), pointing at the target URL (for example, https://www.leftclick.ai).

  • The request node is responsible for downloading the raw HTML of the page, including all the markup, scripts, and styles.
  • In a real project, this is where you might add headers, authentication, or query parameters if the page sits behind an API or needs filters; a rough plain-JavaScript equivalent of the request is sketched below.
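
For clarity, here is roughly what the node does under the hood, expressed as plain JavaScript. The URL and User-Agent header are illustrative placeholders, not the workflow's exact settings:

```javascript
// Rough plain-JavaScript equivalent of the HTTP Request node's GET.
// The URL and User-Agent are placeholders; swap in your own target.
const response = await fetch('https://www.leftclick.ai', {
  method: 'GET',
  headers: {
    // Some sites reject requests that lack a browser-like User-Agent.
    'User-Agent': 'Mozilla/5.0 (compatible; n8n-scraper-demo/1.0)',
  },
});

if (!response.ok) {
  throw new Error(`Request failed: ${response.status} ${response.statusText}`);
}

// The full page markup, including scripts and styles, as one string.
const rawHtml = await response.text();
```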


3. HTML node: extracting meaningful content

Raw HTML is noisy, so the next step uses an HTML node (named extractHtmlContent) to pull out just the parts that matter.

  • Using CSS selectors, the node can target the main article container, headings, or specific sections instead of the entire page (see the sketch after this list).
  • This dramatically reduces token usage for the model and improves response quality, because the AI sees focused text instead of layout code and navigation.
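
Conceptually, the extraction works like this cheerio sketch (n8n's HTML node is backed by a cheerio-style parser; the sample markup and selectors here are placeholders for illustration):

```javascript
// Conceptual sketch of CSS-selector extraction, using cheerio directly.
// The sample markup and selectors are placeholders; inspect your target
// page to pick real ones.
const cheerio = require('cheerio');

const rawHtml = `
  <html><body>
    <nav>Menu links…</nav>
    <main><article>
      <h1>Page title</h1>
      <p>The content we actually care about.</p>
    </article></main>
    <footer>Copyright…</footer>
  </body></html>`;

const $ = cheerio.load(rawHtml);

const extracted = {
  title: $('main article h1').text().trim(),
  // Only the article's text; nav, footer, and scripts are left behind.
  body: $('main article').text().trim(),
};

console.log(extracted);
```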


4. Code in JavaScript: cleaning and shaping the text

After extraction, the Code in JavaScript node gives you a place to fine‑tune the content before handing it to the model.

  • Typical tasks here include stripping leftover tags, normalizing whitespace, truncating extra‑long pages, or assembling a structured prompt for the AI.
  • This node is where you can add your own opinionated logic, for example building a JSON object with title, summary, and keyPoints that gets passed forward; a minimal version is sketched below.
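
Here is a minimal sketch of what this Code node might contain. The incoming `title` and `body` field names and the character cap are assumptions for illustration, so adjust them to match what your HTML node actually outputs:

```javascript
// n8n Code node (mode: "Run Once for All Items").
// Assumes the HTML node emitted `title` and `body` fields; rename to match yours.
const items = $input.all();

return items.map((item) => {
  let text = item.json.body ?? '';

  // Strip any leftover tags and collapse runs of whitespace.
  text = text.replace(/<[^>]+>/g, ' ').replace(/\s+/g, ' ').trim();

  // Cap very long pages to keep token usage predictable (the limit is arbitrary).
  const MAX_CHARS = 8000;
  if (text.length > MAX_CHARS) text = text.slice(0, MAX_CHARS);

  return {
    json: {
      title: item.json.title ?? 'Untitled page',
      content: text,
      // A ready-to-send prompt for the "Message a model" step.
      prompt: `Summarize the following page in 3-5 bullet points:\n\n${text}`,
    },
  };
});
```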


5. Message a model: turning data into narrative

Finally, the processed text flows into Message a model, which calls an OpenAI‑compatible Chat Completions or Responses endpoint.

  • The node can send a system message (for example: “You are a helpful web analyst.”) and a user message containing the cleaned page content.
  • The model then returns a well‑structured explanation, summary, or analysis that becomes the workflow’s final output, perfect for feeding into emails, dashboards, or documentation; a direct‑call equivalent is sketched below.
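
For readers who prefer to see the wire format, here is roughly what gets sent to an OpenAI-style Chat Completions endpoint if you call it directly; the model name and prompt are placeholders, not the workflow's exact settings:

```javascript
// Illustrative direct call to an OpenAI-compatible Chat Completions endpoint.
// Model name and prompt are placeholders for illustration.
const prompt = 'Summarize the following page in 3-5 bullet points:\n\n…';

const res = await fetch('https://api.openai.com/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
  },
  body: JSON.stringify({
    model: 'gpt-4o-mini',
    messages: [
      { role: 'system', content: 'You are a helpful web analyst.' },
      { role: 'user', content: prompt },
    ],
  }),
});

const data = await res.json();
// The final, human-readable summary produced by the model.
const summary = data.choices[0].message.content;
```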

Why this workflow is worth showcasing

This workflow demonstrates several skills that are valuable for modern web and automation work.

  • It shows understanding of HTTP, HTML parsing, and JavaScript data shaping, all wired together in a visual automation tool.
  • It highlights practical use of AI APIs: not just calling a model, but preparing high‑quality input and integrating the response into a repeatable process.
