DEV Community

EthanAIgo1

n8n + LLM Scraper: Capture AI Answers in a No-Code Workflow

TL;DR:

  • n8n talks to the Scrapeless LLM Chat Scraper with one HTTP Request node: no code, no SDK. The node POSTs to https://api.scrapeless.com/api/v2/scraper/execute with an x-api-token header and a JSON body, and the response enters the workflow as data the next node can consume.
  • The request body is { actor, input } and nothing else. Use the body {"actor":"scraper.chatgpt","input":{"prompt":"…","country":"US","web_search":true}}; the node returns an envelope { status, task_id, task_result } — the uniform response format for all Scrapeless LLM actors.
  • A Schedule Trigger turns the call into a standing monitor. Chain Schedule Trigger → HTTP Request → IF → Set/Sheet/DB and n8n will rerun the prompt on your chosen cadence, appending each returned answer to a sheet or table without manual intervention.
  • The IF node handles the empty run as data, not as a failure. Answers are produced per session, so an empty task_result means “no result for this run” rather than an error. Branch on emptiness, log that there’s nothing to persist, and continue; a later scheduled run may produce a populated result.
  • The MCP Client node is the agent-node alternative. If your workflow acts as an AI agent instead of a static pipeline, point n8n’s MCP Client node at the Scrapeless MCP server and the same scraping capability becomes a callable tool the agent can invoke.
  • Free to start. New Scrapeless accounts receive trial credits — sign up at app.scrapeless.com.

Introduction: the answer engine becomes a workflow input

LLM answer engines now sit between users and the open web: the brand-level questions — who gets recommended, which sources are cited, what price appears — are often decided inside ChatGPT before a single link is clicked. Polling that surface on a schedule is a data-collection task, and many teams already run scheduled jobs in n8n.

The snag is that ChatGPT doesn’t expose an official “answer” API, and driving the chat UI from an automation tool means dealing with login screens, streamed responses, and fields populated client-side after rendering. n8n's HTTP Request node can call any REST endpoint, but there’s nothing for it to call until rendering, residential egress, and parsing are handled elsewhere.

The Scrapeless LLM Chat Scraper is that elsewhere: a single POST returns the rendered answer as JSON, so the HTTP Request node gets a simple endpoint and downstream steps consume structured fields. This post shows how to wire n8n to that actor with no code (a Schedule Trigger, one HTTP Request node, an IF branch to skip empty runs, and a storage node) and explains the agent-node path for workflows that use the scraper as an AI tool. For a ranked comparison of answer-engine scrapers, see the best LLM scrapers.

A note on scope: the request contract below was validated against the live scraper.chatgpt actor, and each n8n parameter name was checked against the current n8n node reference. The end-to-end workflow is described from those two verified sources — this post does not include a screenshot of a run as proof.


What You Can Do With It

  • Scheduled answer monitoring. Execute a consistent set of prompts on an hourly or daily cadence and append each ChatGPT response to a spreadsheet, converting answer drift into a time series you can analyze instead of relying on manual checks.
  • Share-of-citation tracking. Inspect task_result.search_result to obtain the sources the model consulted, then aggregate domain counts across runs to determine which sites the model keeps citing for your category.
  • Brand-mention alerts. Use a conditional branch that checks whether the response text names your product, and trigger a Slack or email node off the IF when a mention appears or disappears.
  • Multi-engine capture in one workflow. Clone the HTTP Request node and swap the actor string to scraper.gemini or scraper.perplexity; the request envelope remains identical, so downstream nodes don’t need to change.
  • No-ops handoff to non-developers. Once the workflow is in place, teammates can edit the prompt list in a Set node or a sheet without touching code, and the capture keeps running.
  • Agent tool calls. Surface the scraper through the MCP Client node so an n8n AI agent can decide when to query an answer engine as part of a larger task.
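
The brand-mention alert above boils down to comparing two consecutive answers. Inside n8n this is an IF condition on the answer text; the Python below is only a sketch of that logic for testing it outside the workflow. The function names and the whole-word, case-insensitive matching rule are assumptions, not part of Scrapeless or n8n.

```python
import re


def mentions_brand(result_text, brand):
    """Return True when the answer text names the brand as a whole word (case-insensitive)."""
    if not result_text:
        return False  # empty or missing answer: nothing can be mentioned
    pattern = r"\b" + re.escape(brand) + r"\b"
    return re.search(pattern, result_text, flags=re.IGNORECASE) is not None


def mention_change(previous_text, current_text, brand):
    """Compare two runs and report 'appeared', 'disappeared', or None when nothing changed."""
    before = mentions_brand(previous_text, brand)
    after = mentions_brand(current_text, brand)
    if after and not before:
        return "appeared"
    if before and not after:
        return "disappeared"
    return None
```

A Slack or email node would fire only when the result is not None. The word-boundary match assumes the brand name starts and ends with a letter or digit.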

Why the Scrapeless LLM Chat Scraper for n8n

The Scrapeless LLM Chat Scraper is the scraper.chatgpt actor in the Universal Scraping API family, and it maps neatly to n8n because it’s a single authenticated POST that accepts JSON and returns JSON. For building no-code workflows it provides:

  • A single REST endpoint the HTTP Request node calls directly — no SDK to install on the n8n host, no browser to drive.
  • Server-side rendering, residential egress, and anti-bot handling, so the node receives a finished answer rather than a login page.
  • The country field on the request, which pins the egress market from inside the JSON body — one node covers per-market capture.
  • One { status, task_id, task_result } envelope shared across scraper.chatgpt, scraper.gemini, and scraper.perplexity, so a working node duplicates to the other engines unchanged.
  • An x-api-token header as the only auth — a single n8n credential or header value, reusable across every node that calls Scrapeless.

Get your API key on the free plan at app.scrapeless.com.


Prerequisites

  • An n8n instance (cloud or self-hosted) where you can add a workflow
  • A Scrapeless account and API key — sign up at app.scrapeless.com
  • The API key available to paste into the HTTP Request node's header (or stored as an n8n credential)
  • A destination for the captured rows — a Set node, a Google Sheets node, or a database node such as Postgres

You don't need any language runtime, proxy, or CAPTCHA solver on your side — the integration is a simple HTTP request and all the heavy processing happens on the Scrapeless servers.


The workflow at a glance

The entire capture consists of four nodes arranged sequentially:

Schedule Trigger  →  HTTP Request  →  IF  →  Set / Google Sheets / Postgres
   (interval)        (POST actor)    (empty?)     (store the answer)

Schedule Trigger kicks off on a configured interval, the HTTP Request node invokes scraper.chatgpt, the IF node determines whether the response contains content, and the final storage node persists the row. When the IF node follows its empty branch, that run (a no-answer case) is recorded and discarded — it is not retried. Each node description below only includes parameters present in the current n8n node reference.


Step 1 — Schedule Trigger

The Schedule Trigger initiates the workflow on a regular timetable so captures happen automatically without manual starts. Add a Schedule Trigger node (type version 1.3) and configure its Trigger Rules to use an interval — e.g., every hour, every few hours, or once per day, chosen based on how quickly the answers you track typically change. For monitoring an answer engine, running daily or twice a day is usually sufficient because trends over weeks provide the meaningful signal, not minute-level fluctuations.

Each trigger firing produces one item. If you need multiple prompts in a single run, follow the trigger with a Set node that emits your list of prompts, or load the prompts from a sheet — each prompt will then pass through the HTTP Request node as a separate item.
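
To make the fan-out concrete, here is a minimal Python sketch of how a prompt list becomes one request body per item. n8n represents each item as an object with a json key; the helper names and the default country value are illustrative assumptions, and in the real workflow the Set node and the HTTP Request node do this without code.

```python
def prompts_to_items(prompts):
    """Mimic n8n's item list: one {"json": {...}} item per non-blank prompt."""
    return [{"json": {"prompt": p.strip()}} for p in prompts if p and p.strip()]


def payload_for_item(item, country="US"):
    """Build the actor call for a single item; every parameter sits under "input"."""
    return {
        "actor": "scraper.chatgpt",
        "input": {
            "prompt": item["json"]["prompt"],
            "country": country,
            "web_search": True,
        },
    }
```

Each item yields its own payload, which mirrors the HTTP Request node running once per incoming item.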


Step 2 — HTTP Request node: call the actor

This HTTP Request node is the integration point: it POSTs the actor invocation to Scrapeless and returns the parsed response back into your workflow. Add an HTTP Request node (type version 4.4) and configure it with these parameters:

  • Method → POST
  • URL → https://api.scrapeless.com/api/v2/scraper/execute
  • Send Headers → on. Add one header: name x-api-token, value your Scrapeless API key (or reference an n8n credential).
  • Send Body → on.
  • Body Content Type → JSON.
  • Specify Body → Using JSON, then paste the actor call into the JSON field.

The request body is the complete contract — it must include the actor identifier and an input object:

{
  "actor": "scraper.chatgpt",
  "input": {
    "prompt": "best running shoes 2026",
    "country": "US",
    "web_search": true
  }
}

If you need the prompt to vary per item, replace the literal string with an n8n expression that pulls the incoming item's value (for example, the prompt field from the Set node or spreadsheet row that feeds this node). The country field pins the egress market for the run, and web_search lets the model consult live sources, which helps recommendation-style prompts resolve. All parameters must live under input; placing prompt or country at the top level will cause the actor to reject the request.

Increase the node's Timeout — rendering a complete answer can take time, and the default short timeout may terminate the request before the response arrives. Give the call enough headroom.

The node returns the standard envelope as the item's JSON: { status, task_id, task_result }. Downstream nodes should read the generated text from task_result.result_text and the cited sources from task_result.search_result.
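
If you want to check the contract outside n8n before wiring the node, the same call can be sketched with Python's standard library. The endpoint, header name, and body shape come from this post; the function names and the 180-second timeout are assumptions chosen for illustration, so set the timeout to whatever headroom your runs need.

```python
import json
import urllib.request

SCRAPELESS_URL = "https://api.scrapeless.com/api/v2/scraper/execute"


def build_payload(prompt, country="US", web_search=True):
    """Return the { actor, input } body; all parameters live under "input"."""
    return {
        "actor": "scraper.chatgpt",
        "input": {"prompt": prompt, "country": country, "web_search": web_search},
    }


def build_request(api_key, payload):
    """Prepare the POST with the x-api-token header, without sending it."""
    return urllib.request.Request(
        SCRAPELESS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-token": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def execute(api_key, payload, timeout_s=180):
    """Send the request and return the parsed envelope (timeout value is an assumption)."""
    request = build_request(api_key, payload)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling execute("YOUR_API_KEY", build_payload("best running shoes 2026")) should return the same envelope the HTTP Request node hands to the next node.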

Get your API key on the free plan: app.scrapeless.com


Step 3 — IF node: branch on an empty answer

The IF node decides whether there is anything worth storing. ChatGPT responses are produced per session, so the same prompt can yield a full reply on one run and an empty task_result on another — this is normal behavior, not a failure. Place an IF node (type version 2.3) immediately after the HTTP Request node and create a single Conditions rule that verifies the answer field isn't empty — for example, check that the expression task_result.result_text is not empty.

  • True branch (answer present) → wire to the storage node in Step 4.
  • False branch (answer empty) → record that the run produced nothing and stop. A NoOp node, or a Set node that writes an "empty run" marker row, is enough.

The empty branch should not re-invoke the actor. The next scheduled run is the next opportunity for a populated answer, and the pattern depends on aggregating only the runs that return content. Treat an empty result as nullable data, not an error to chase.
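
The branch decision is a null-safe emptiness check on the answer text. The Python below sketches that check so the edge cases are explicit; the function names and the "store" / "empty_run" labels are hypothetical, and treating whitespace-only text as empty is an assumption you may or may not want in your IF condition.

```python
def has_answer(envelope):
    """Return True only when task_result.result_text holds non-blank text."""
    task_result = (envelope or {}).get("task_result")
    if not isinstance(task_result, dict):
        return False  # missing or null task_result counts as an empty run
    text = task_result.get("result_text")
    return isinstance(text, str) and text.strip() != ""


def route(envelope):
    """Pick the branch: store the answer, or record an empty run without retrying."""
    return "store" if has_answer(envelope) else "empty_run"
```

Note that the empty path returns a label and nothing else: it never re-invokes the actor.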


Step 4 — Store the answer

The storage node converts each completed answer into a row you can query later. Connect the IF node's answer-present branch to the destination that fits your flow:

  • Set node → reduce the item to the fields you want to keep: the prompt, task_result.result_text, the source domains from task_result.search_result, the task_id, and a capture timestamp. Handy as the final shaping step even if another node performs the actual write.
  • Google Sheets node → append one row per run to produce a shareable, no-database log that non-developers can read and edit.
  • Postgres (or another database) node → insert a record into a table when you need captures to feed a warehouse or populate a dashboard.

Always include task_id and the run time on each row. Answer length, citation count, and the listed sources will vary between runs, so the useful output is the time series across captures rather than any single response.
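
As a sketch of the shaping step, the Python below flattens one envelope into the row described above. The column names and the comma-joined domain list are illustrative assumptions; in n8n a Set node with expressions does the same job.

```python
from datetime import datetime, timezone
from urllib.parse import urlsplit


def to_row(envelope, captured_at=None):
    """Flatten one envelope into a storable row; missing fields become None or ''."""
    result = envelope.get("task_result") or {}
    domains = []
    for source in result.get("search_result") or []:
        host = urlsplit(source.get("url") or "").hostname
        if host and host not in domains:
            domains.append(host)  # keep first-seen order, skip duplicates
    when = captured_at or datetime.now(timezone.utc)
    return {
        "task_id": envelope.get("task_id"),
        "captured_at": when.isoformat(),
        "prompt": result.get("prompt"),
        "result_text": result.get("result_text"),
        "source_domains": ",".join(domains),
    }
```

Passing captured_at explicitly keeps the row reproducible in tests; in a live run the default stamps the current UTC time.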


The official Scrapeless node — and why this guide uses HTTP Request

There is an official Scrapeless community node, n8n-nodes-scrapeless. Install it, add a Scrapeless credential once, and the node exposes typed operations for three distinct surfaces: Deep SerpApi (Google Search and Google Trends), the Universal Scraping API (Web Unlocker), and the Crawler (Scrape and Crawl). For those kinds of tasks, the node is the more straightforward option — you won't need to manually construct request URLs or JSON payloads.

The LLM Chat Scraper actors — scraper.chatgpt, scraper.gemini, scraper.perplexity, and scraper.aimode — are not available as operations in the current node release, so when you need to capture an answer engine's response the HTTP Request node is the correct choice. It calls /api/v2/scraper/execute directly, which matches the requests assembled in the steps above. If a future node version adds an LLM-specific operation, the Scrapeless credential and overall workflow layout remain valid — only the central node would be swapped out.


The agent-node alternative: MCP Client + Scrapeless MCP server

When your workflow is driven by an AI agent instead of a fixed sequence of nodes, use n8n's MCP Client node rather than a custom HTTP request. The MCP Client node opens a connection to an MCP server and exposes that server's toolset to an n8n AI agent, letting the agent invoke those tools autonomously whenever its reasoning requires them. If you point the MCP Client at the Scrapeless MCP server, the answer-engine capture becomes one of the agent’s callable tools — the agent itself decides when to call ChatGPT as part of a broader task, instead of you embedding that call into a static branch.

These two approaches solve different problems. The HTTP Request node is ideal for deterministic, scheduled captures — identical prompts, fixed cadence, and predictable rows. The MCP Client node is the right choice when you want an agent to dynamically decide whether to query and what to ask. Both approaches use the same Scrapeless surface; the only difference is who initiates the call.


What You Get Back

The HTTP Request node delivers the actor's standard envelope as the item JSON. The actual reply is nested under task_result: the generated prose appears in result_text, and any sources consulted are listed in search_result. The example below shows the structure scraper.chatgpt emits; the field values come from a live run and have been truncated for brevity.

// Schema is what scraper.chatgpt returns; field values are an illustrative sample from a live run.
{
  "status": "success",
  "task_id": "…",
  "task_result": {
    "prompt": "best running shoes 2026",
    "model": "gpt-5-mini",
    "result_text": "Here are the best running shoes in 2026, based on recent testing across major brands (ASICS, Nike, HOKA, Adidas, Brooks, Saucony) …",
    "content_references": [],
    "search_result": [
      { "title": "10 Best Running Shoes of 2026 | Lab Tested & Ranked", "url": "https://…", "snippet": "…", "attribution": "outdoorgearlab.com" }
    ],
    "links": [],
    "web_search": true
  }
}

A few practical notes for handling this in n8n:

  • Every field is nullable. result_text may be empty and search_result can be an empty array for a particular run — that's exactly why the Step 3 IF node exists. Always check for missing/null fields in any expression that reads them.
  • search_result is the citation surface. Each entry contains title, url, snippet, and attribution. Use a Set node to extract the host from the URL and aggregate counts across runs to measure share-of-citation.
  • web_search echoes the request. This boolean indicates whether live-source fetching was enabled for that run; include web_search: true in the request body when you want better resolution for recommendation-style prompts.
  • Output varies run to run. Response length and the number of sources may change even for the same prompt — persist the capture timestamp and task_id with every stored record.
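
Share-of-citation from the notes above can be sketched as a count of cited hosts across stored envelopes. This is a minimal illustration, not a Scrapeless feature: the function name is hypothetical, and counting each domain at most once per run and stripping a leading "www." are assumptions about how you want to measure share.

```python
from collections import Counter
from urllib.parse import urlsplit


def citation_share(envelopes):
    """Return {domain: share} where each domain counts at most once per run."""
    counts = Counter()
    for envelope in envelopes:
        result = (envelope or {}).get("task_result") or {}
        hosts = set()
        for source in result.get("search_result") or []:
            host = urlsplit(source.get("url") or "").hostname or ""
            if host.startswith("www."):
                host = host[4:]  # treat www.example.com and example.com as one site
            if host:
                hosts.add(host)
        counts.update(hosts)
    total = sum(counts.values())
    if total == 0:
        return {}  # no citations in any run
    return {host: count / total for host, count in counts.most_common()}
```

Runs with an empty or missing search_result simply contribute nothing, which matches the nullable-field handling described above.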

Conclusion: a four-node standing capture

Linking n8n with the Scrapeless LLM Chat Scraper can be implemented with a single HTTP Request node: POST { actor, input } to /api/v2/scraper/execute including an x-api-token header, parse task_result from the response, branch on empty runs, and persist the resulting row. Adding a Schedule Trigger makes that workflow a continuous monitor, and adding the MCP Client node exposes it as an agent-facing tool when needed. Scope your prompt set tightly, fix country per target market, treat every field as nullable, and save task_id together with a timestamp so you get a time-series signal. Execute a stable prompt set on a schedule using Universal Scraping API credits, and the scraper output becomes a normalized input for downstream workflow logic. The request schema and field names were validated against the live LLM Chat Scraper actor, and node settings checked against the current n8n node reference.


FAQ

Q: Do I need to write any code to connect n8n to the LLM Chat Scraper?
No. You can use n8n's built-in HTTP Request node: set it to POST, point it at /api/v2/scraper/execute, include an x-api-token header, and send a JSON body. There’s nothing to install on the n8n host and you don’t need to add a Code node or custom SDK.

Q: Where does my Scrapeless API key go in n8n?
Put it in the HTTP Request node headers — turn on Send Headers, add a header named x-api-token and either paste your key or reference an n8n credential so the secret isn’t embedded in the node. That same header is used for every Scrapeless call within the workflow.

Q: How do I send several prompts in one run?
Chain a Schedule Trigger to a Set node that emits your list of prompts, or pull them from a Google Sheet. n8n treats each prompt as a separate item; each item passes through the HTTP Request node independently, so one workflow execution processes the whole batch.

Q: What happens when the answer comes back empty?
An empty task_result means no answer was produced for that session-run. The IF node’s empty branch records the no-op and stops processing that item; the workflow doesn’t retry that same call. The next scheduled execution is the next opportunity to get a non-empty answer.

Q: Can I capture Gemini and Perplexity from the same workflow?
Yes. Duplicate the HTTP Request node and swap the actor string to scraper.gemini or scraper.perplexity. The endpoint, header, and the { status, task_id, task_result } response envelope remain the same, so downstream IF and storage nodes don’t need changes.

Q: When should I use the MCP Client node instead of the HTTP Request node?
Use the HTTP Request node for predictable, scheduled captures with known prompts. Use the MCP Client node (targeting the Scrapeless MCP server) when an n8n AI agent should autonomously decide whether to query and what to send — in that setup the scraper functions as a callable tool for the agent.

Q: Do I need a proxy or a browser running on my n8n host?
No. Scrapeless handles rendering, residential egress, and anti-bot measures server-side. Your n8n instance only issues an outbound HTTPS request; use the country field in the request body to select the egress market.

Q: Is collecting ChatGPT answers legal?
The API returns the same publicly visible answer any user would see. As with any scraping workflow, legality depends on jurisdiction and intended use — review applicable terms, consult legal counsel if needed, and limit collection to public answer and source data (do not collect personal data).
