What this builds
An n8n workflow that takes a list of prospects from Google Sheets, scrapes each prospect's website (homepage plus up to 3 internal pages), uses AI to extract meaningful business insights from each page, generates a hyper-personalised multi-line icebreaker for each prospect, and writes the icebreaker back to the Sheet — ready for your outreach tool.
Output example:
`"Hey Katie, love how KTL Graphics makes it easy to filter by acreage —
also a fan of your property update email option. Wanted to run
something by you..."`
The system found acreage filtering and email notification features by actually crawling the website — not from the company name or LinkedIn headline.
Workflow JSON download: Available on the blog
Architecture
`Manual Trigger
↓
Google Sheets — get all rows
↓
Filter — only rows with email AND website URL
↓
Loop Over Items (batch size 1)
↓
HTTP Request — scrape homepage (HTML)
↓
Edit Fields — extract html field
↓
Code node — convert to string
↓
HTML Extractor — pull all <a href> links
↓
Edit Fields — keep: first_name, last_name, email, website_url, links
↓
Split Out — one row per link
↓
Filter — links starting with /
↓
Code node — normalise relative/absolute URLs
↓
Remove Duplicates + Limit (max 3 pages)
↓
HTTP Request — fetch each internal page
↓
HTML to Markdown conversion
↓
AI Agent — summarise each page into abstract
↓
Merge all abstracts
↓
AI Agent — generate icebreaker from all abstracts
↓
Google Sheets — write icebreaker back to prospect row`
Step 1 — Google Sheets setup
Export your Apollo.io leads (or any source) to Google Sheets.
Required columns:
`first_name | last_name | email | website_url | icebreaker (empty, filled by workflow)`
The workflow reads from this sheet and writes icebreakers back to the icebreaker column.
Step 2 — Filter node (quality gate)
Add a Filter node after the Sheets Get Rows node.
Two conditions, both must be true:
`Condition 1: website_url exists and is not empty
Condition 2: email exists and is not empty`
Without this filter, the workflow attempts to scrape blank URLs and throws errors that cascade through the rest of the pipeline.
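The workflow uses the built-in Filter node for this gate, but the same logic can be sketched as plain JavaScript for clarity (the `hasEmailAndWebsite` helper below is illustrative, not part of the workflow):

```javascript
// Equivalent of the Filter node's two conditions as a predicate.
// Treats whitespace-only values as empty, same as "is not empty".
function hasEmailAndWebsite(row) {
  const email = (row.email || "").trim();
  const url = (row.website_url || "").trim();
  return email.length > 0 && url.length > 0;
}

// Sample rows shaped like the Google Sheets output (hypothetical data).
const rows = [
  { email: "katie@example.com", website_url: "https://example.com" },
  { email: "", website_url: "https://example.com" },
  { email: "bob@example.com", website_url: "" },
];
const valid = rows.filter(hasEmailAndWebsite);
console.log(valid.length); // 1
```

In an n8n Code node the same check would be `return $input.all().filter(item => hasEmailAndWebsite(item.json));`, but the Filter node keeps the canvas readable.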
Step 3 — Loop Over Items (batch size 1)
Add a Loop Over Items node.
Batch Size: 1
This processes one prospect at a time. Without it, the workflow tries to process all prospects simultaneously — rate limits on external sites cause failures, and the AI responses get mixed across prospects.
Step 4 — Scrape the homepage
Add an HTTP Request node.
`Method: GET
URL: {{ $json.website_url }}
Error handling: Continue on error
Redirects: Follow, max 21`
"Continue on error" is critical. Some websites block scraping. Without this setting, one blocked site kills the entire workflow run.
Step 5 — Extract and normalise HTML
Add an Edit Fields node:
Field: html → string → {{ $json.data }}
Add a Code node:
`// Coerce the response body to a plain string so the
// HTML Extractor in the next step can parse it.
return [{
  json: {
    html: $json.html.toString()
  }
}];`
This converts the raw response body into a usable string for the link extraction in the next step.
Step 6 — Extract all links from the homepage
Add an HTML Extractor node.
`CSS Selector: a
Attribute: href
Return: Array
Options: Trim values + clean text`
This pulls every link from the homepage — navigation, footer, internal pages. The homepage alone contains only part of the story. The About page, Services page, and Blog posts are where the personalisation gold lives.
Step 7 — Normalise URLs (Code node)
After splitting links into individual rows and filtering for links starting with /, add this Code node to normalise both relative and absolute URLs to relative paths:
`const items = $input.all();
const updatedItems = items.map((item) => {
const link = item?.json?.links;
if (typeof link === "string") {
if (link.startsWith("/")) {
item.json.links = link;
}
else if (link.startsWith("http://") || link.startsWith("https://")) {
try {
const url = new URL(link);
let path = url.pathname;
if (path !== "/" && path.endsWith("/")) {
path = path.slice(0, -1);
}
item.json.links = path || "/";
} catch (e) {
item.json.links = link;
}
}
else {
item.json.links = link;
}
}
return item;
});
return updatedItems;`
Why this is necessary: Websites use both relative links (/about) and absolute links (https://example.com/about). You need both normalised to the same format to deduplicate and combine with the base URL correctly in the next step.
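The per-link logic above can be lifted into a standalone function for quick testing outside n8n (same behaviour, just without the item-loop wrapper):

```javascript
// Standalone version of the Step 7 normalisation logic.
function normaliseLink(link) {
  if (typeof link !== "string") return link;
  if (link.startsWith("/")) return link; // already a relative path
  if (link.startsWith("http://") || link.startsWith("https://")) {
    try {
      const url = new URL(link);
      let path = url.pathname;
      // Strip a trailing slash so "/about/" and "/about" deduplicate
      if (path !== "/" && path.endsWith("/")) path = path.slice(0, -1);
      return path || "/";
    } catch (e) {
      return link; // malformed URL: leave untouched
    }
  }
  return link; // mailto:, tel:, #anchor etc. pass through unchanged
}

console.log(normaliseLink("/about"));                     // "/about"
console.log(normaliseLink("https://example.com/about/")); // "/about"
console.log(normaliseLink("mailto:hi@example.com"));      // unchanged
```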
Step 8 — Deduplicate and limit
Add a Remove Duplicates node (deduplicate on links field) followed by a Limit node:
`Max items: 3
Keep: First Items`
Three pages per prospect is the sweet spot. Enough to find specific details, not enough to run up excessive API costs or hit rate limits on the target site.
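In plain JavaScript, the two nodes together amount to this (a sketch of the behaviour, not what n8n runs internally):

```javascript
// Remove Duplicates + Limit, sketched as one function.
// Keeps first occurrences, then caps the list, matching
// the "Keep: First Items" setting.
function dedupeAndLimit(links, max = 3) {
  const seen = new Set();
  const unique = [];
  for (const link of links) {
    if (!seen.has(link)) {
      seen.add(link);
      unique.push(link);
    }
  }
  return unique.slice(0, max);
}

console.log(dedupeAndLimit(["/about", "/services", "/about", "/blog", "/contact"]));
// ["/about", "/services", "/blog"]
```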
Step 9 — Fetch internal pages
Add another HTTP Request node to fetch each filtered internal URL:
`Method: GET
URL: {{ $json.website_url }}{{ $json.links }}
Error handling: Continue on error`
The URL concatenates the base domain from the original lead data with the relative path extracted from the link list.
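One edge case worth knowing: if `website_url` was exported with a trailing slash (`https://example.com/`), the direct concatenation yields a double slash (`https://example.com//about`). Most servers tolerate it, but a defensive join is cheap. Sketched here as a hypothetical `joinUrl` helper, not part of the workflow as built:

```javascript
// Strip any trailing slashes from the base before appending
// the normalised relative path from Step 7.
function joinUrl(base, path) {
  return base.replace(/\/+$/, "") + path;
}

console.log(joinUrl("https://example.com", "/about"));  // https://example.com/about
console.log(joinUrl("https://example.com/", "/about")); // https://example.com/about
```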
Step 10 — HTML to Markdown
Add an HTML to Markdown node:
`Mode: HTML to Markdown
HTML: {{ $json.data ? $json.data : "<div>empty</div>" }}
Destination Key: data`
Markdown is significantly more token-efficient than HTML for AI processing. Stripping HTML tags typically reduces the content you pass to the AI model by 60–80%, which directly cuts API cost and improves the quality of the AI's analysis by removing markup noise.
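To see why, compare raw markup with its text content. The n8n node performs a proper conversion; this naive regex strip is only an illustration of how much of the payload is markup (the sample HTML is made up):

```javascript
// Illustrative only: real HTML-to-Markdown conversion preserves
// structure (headings, links), which a regex strip does not.
const html = `
  <div class="hero container-fluid" id="main" data-track="home">
    <h1 class="title display-4">We sell acreage</h1>
    <p class="lead text-muted">Filter listings by acreage and more.</p>
  </div>`;

const stripped = html.replace(/<[^>]+>/g, " ").replace(/\s+/g, " ").trim();
console.log(stripped); // "We sell acreage Filter listings by acreage and more."
console.log(html.length, stripped.length); // markup is most of the bytes
```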
Step 11 — AI page summariser
Add an AI Agent node with this system prompt:
You are a helpful, intelligent website scraping assistant.
You are provided a Markdown scrape of a website page.
Your task is to provide a two-paragraph abstract of what this page is about.
Return in this JSON format:
{"abstract":"your abstract goes here"}
Rules:
- Your abstract should be comprehensive — similar level of detail as an abstract to a published paper.
- Use a straightforward, factual writing style.
- Focus on what is unique or distinctive about this business or page.
- Note any specific features, products, services, or differentiators that would be useful for personalised outreach.
- Return ONLY the JSON object. No backticks. No explanation.
Model: GPT-4 mini via OpenRouter — approximately $0.0003 per page abstract.
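If you route the agent's output through a Code node before merging, a defensive parse guards against models that ignore the no-backticks rule. The `parseAbstract` helper below is a hypothetical addition, not an n8n built-in:

```javascript
// Defensive parse for the summariser's output. The prompt forbids
// backticks, but models occasionally wrap JSON in a code fence anyway.
function parseAbstract(raw) {
  const cleaned = raw.replace(/`{3}(?:json)?/g, "").trim();
  try {
    const parsed = JSON.parse(cleaned);
    return typeof parsed.abstract === "string" ? parsed.abstract : "";
  } catch (e) {
    return ""; // unparseable output is treated like an empty page
  }
}

console.log(parseAbstract('{"abstract":"A real-estate site with acreage filters."}'));
// "A real-estate site with acreage filters."
```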
Step 12 — Merge all abstracts
After the loop processes all three pages, combine the abstract outputs into a single item using a Merge node set to "Combine All Items."
Pass the merged abstracts to the final AI Agent.
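Flattening the merged items into one block of prompt text can be done in a Code node along these lines (the `combineAbstracts` helper is illustrative, and the field name assumes the summariser's `abstract` JSON key):

```javascript
// Join the per-page abstracts into one labelled text block
// for the icebreaker prompt.
function combineAbstracts(items) {
  return items
    .map((item, i) => `Page ${i + 1}: ${item.json.abstract}`)
    .join("\n\n");
}

// Sample merged items (hypothetical data).
const merged = combineAbstracts([
  { json: { abstract: "Homepage: property search with acreage filters." } },
  { json: { abstract: "About page: family-run brokerage." } },
]);
console.log(merged);
```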
Step 13 — Icebreaker generator
Add a final AI Agent node with this system prompt:
`You are an expert cold email copywriter specializing in personalized outreach.
You will receive multiple website page summaries for a prospect company.
Your task is to write a multi-line, personalized icebreaker for a cold email.
Rules:
- Reference 1-2 SPECIFIC details you found on their website
(features, initiatives, content, language they use about themselves)
- Sound like you actually explored the website — not like you read a summary
- Be conversational, warm, and curious — not salesy
- Keep it to 2-3 sentences maximum
- Address the first_name directly at the start
- End with a natural transition into your pitch ("Wanted to run something by you...")
First name: {{ $('Loop Over Items').first().json.first_name }}
Website summaries: [all abstract outputs concatenated here]
Return ONLY the icebreaker text. No JSON. No explanation.
`
Step 14 — Write back to Google Sheets
Add a Google Sheets node:
`Operation: Update Row
Match on: email = {{ $json.email }}
Update:
icebreaker: {{ $json.icebreaker }}`
The icebreaker column fills in for each row as the workflow processes it. When the run is complete, your Sheet has a personalised icebreaker for every prospect with a valid website URL.
What breaks
HTTP Request returns 403 on most sites: The site is blocking the default n8n user agent. Add a custom header: User-Agent: Mozilla/5.0 (compatible; outreach-research/1.0). This passes most basic anti-scraping checks.
AI Agent returns a JSON error instead of an abstract: The page content was empty (the HTTP request returned an error page). The {{ $json.data ? $json.data : "<div>empty</div>" }} fallback in the HTML to Markdown node keeps the run alive, but gives the AI nothing to summarise — check the HTTP Request output for that prospect to see why the page came back empty.
Loop produces mixed data across prospects: You are not using batch size 1. Set Loop Over Items batch size to exactly 1.
Icebreakers sound generic despite the workflow running: The AI is receiving summaries but not finding distinctive details. Check your Limit node — if it is set to 0 or is missing, you may only be scraping the homepage. Set it to 3 to ensure About and Services pages are included.
Running cost
OpenRouter GPT-4 mini: approximately $0.002 per prospect (3 page abstracts + 1 icebreaker generation).
100 prospects = $0.20 in API costs.
1,000 prospects = $2.00.
Compare this to $5–$50 per manually researched and written personalised opener. The cost case is immediate.
Workflow JSON at elevoras.com.
What niche are you targeting with cold outreach? Drop it in the comments — happy to suggest prompt adjustments for specific industries.